CDS 132 - 560 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="ShakeItOph_1" /note=Original Glimmer call @bp 132 has strength 10.99; Genemark calls start at 132 /note=SSC: 132-560 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 6.55332E-73 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Emotion]],,WGH21350,86.6197,6.55332E-73 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_C,55.6338,98.5 SIF-Syn: terminase small subunit, downstream is terminase large subunit, just like in phages VroomVroom (AZ4) and Emotion (AZ4) /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer and GeneMark both call the same start site: 132 /note=Coding Potential: Host-trained GeneMark and self-trained GeneMark both showed strong coding potential for this area on the forward strand, with an open reading frame that covers the entire length of this gene. /note=SD (Final) Score: -1.954 (most positive value) /note=Gap/overlap: There is a 131 bp gap between the start of the genome and this gene; however, there is no coding potential in that area. There is an 11 bp overlap with the following gene. However, since the stop site cannot be changed and this is less than 30 bp, no action needs to be taken. /note=Phamerator: Pham 133530. 1/15/2024. It is conserved, found in VroomVroom (AZ4) and Emotion (AZ4). /note=Starterator: This gene calls the “most called” start site. Start 40 is 132, 68/195 genes in this pham call this start site. (Start: 40 @132 has 44 MA`s). This agrees with Glimmer and GeneMark. /note=Location call: Based on this evidence, this is a real gene and the most likely start site is 132. /note=Function call: Terminase Small Subunit.
The top three phagesdb BLAST hits have the function of terminase small subunit (E-value 0), and 5 out of 5 top NCBI BLAST hits also have the function of terminase small subunit. (95-100% coverage, 67%+ identity, and E-value <9*10^-55). HHpred had a hit for terminase small subunit with 98.62% probability, and E-value of 4.2e-7. CDD had no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 550 - 2277 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="ShakeItOph_2" /note=Original Glimmer call @bp 550 has strength 17.85; Genemark calls start at 550 /note=SSC: 550-2277 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage VroomVroom]],,NCBI, q2:s3 99.8261% 0.0 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.711, -3.8130952303033587, yes F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage VroomVroom]],,WIC90152,97.9167,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,92.0,100.0 SIF-Syn: Terminase, large subunit, upstream gene is terminase, small subunit, downstream is portal protein, just like in phage Emotion. /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 550. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -3.813 and the z score is the highest at 2.711. /note=Gap/overlap: Gap: -11. 
An 11 bp overlap is somewhat unusual, but appears to be reasonable because all other start sites create a gap of at least 100 bp or an overlap of 71 bp. This overlap is also conserved in phage VroomVroom of the same cluster. /note=Phamerator: pham: 130372. Date 01/17/2024. It is conserved; found in VroomVroom (AZ), Emotion (AZ), and Yang (AZ). /note=Starterator: Start number 49 corresponds to 550 in ShakeItOph. It was manually annotated as the start site at the highest frequency (78 times). This start site was not the most conserved start site (62). The most conserved start site in the pham, however, is not present in ShakeItOph. VroomVroom, which is in the same cluster as ShakeItOph, also calls 49 as the start site and start site 49 is called 87.3% of the time when present. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 550. /note=Function call: This is most likely the large subunit of the terminase. Both PhagesDB BLASTp and NCBI BLASTp contained numerous top hits corresponding to a terminase large subunit with e-values of 0. CDD came back with a hit for a “phage terminase-like protein, large subunit” with an e-value of 1.61e-8. Additionally, HHPRED showed multiple hits with 100% coverage and e-values of 6.6e-31 or lower that corresponded to a terminase large subunit. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: I agree with the above annotation.
CDS 2299 - 3663 /gene="3" /product="gp3" /function="portal protein" /locus tag="ShakeItOph_3" /note=Original Glimmer call @bp 2299 has strength 18.32; Genemark calls start at 2299 /note=SSC: 2299-3663 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Tweety19] ],,NCBI, q4:s3 96.4758% 0.0 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -2.033982896655645, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Tweety19] ],,YP_010678395,86.5639,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_B,92.0705,100.0 SIF-Syn: portal protein, upstream gene is terminase, downstream gene is NKF, just like in phage Emotion and VroomVroom. /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Both Glimmer and GeneMark call the start site 2299. /note=Coding Potential: Strong coding potential for this ORF is in the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.034. This is the best final score on PECAAN. /note=Gap/overlap: 21bp gap. The gap is reasonable. /note=Phamerator: Pham: 131677. Date: 1/18/2024. The pham is highly conserved and found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 103 was called in 327/1762 non-draft phages in the pham. This start site is not present in ShakeItOph. Site 120 is present and is called 54.2% of the time when present. Site 120 is at 2299 for ShakeItOph. Site 120 supports evidence from Glimmer and GeneMark. /note=Location call: This is a real gene based on the above evidence. The most likely start site is at 2299. /note=Function call: Portal protein. The top two phagesDB blasts have the function of portal protein with e-values of 0.
NCBI blast top hits also included Tweety19 (77.7533% identity, 96.4758% coverage, and e-value of 0) and MaGuCo (75.2174% identity, 97.5771% coverage, and e-value of 0). HHpred had hits for portal protein with 100% probability, at least 91% coverage, and e-values less than 1e-32. The top CDD hit was for Phate_prot_GP6 superfamily which is a portal protein. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains, so it is not a transmembrane protein. /note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I agree with this annotation and function call. CDS 3667 - 4464 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="ShakeItOph_4" /note=Original Glimmer call @bp 3667 has strength 10.9; Genemark calls start at 3667 /note=SSC: 3667-4464 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_4 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 4.98928E-152 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_4 [Arthrobacter phage Emotion]],,WGH21353,89.8113,4.98928E-152 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation: GeneMark and Glimmer both call the start site at 3667. /note=Coding Potential: Coding potential is found on the forward strand in both Host-Trained GeneMark and GeneMarkS. /note=SD (Final) Score: The final score is -2.601, which is the best from the options. The z-score is the highest, at 2.985. /note=Gap/overlap: There is a gap of 3bp. /note=Phamerator: Pham 133606 as of 1/16/24. There are 122 members in this pham. /note=Starterator: This start site is site 4 in starterator and it has 4 manual annotations. /note=Location call: Based on the evidence above, this is a real gene with a start site of 3667. 
/note=Function call: PhagesDB Blast returned 100 hits, all of which had e values below 10^-7, most of which were proteins of unknown function. NCBI Blast returned 100 hits, one having an e-value of 0, which was a protein of unknown function. There were not any hits on CDD. There were not any significant hits (coverage >35%) in HHPred. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: I agree with the auto-annotation and the primary annotation. The start site is at 3667 and based on the evidence given by PhagesDB, NCBI, and HHpred, I think this gene has an unknown function. CDS 4542 - 5126 /gene="5" /product="gp5" /function="scaffolding protein" /locus tag="ShakeItOph_5" /note=Original Glimmer call @bp 4542 has strength 16.31; Genemark calls start at 4542 /note=SSC: 4542-5126 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 2.02409E-104 GAP: 77 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.66371296573402, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage VroomVroom]],,WIC90155,92.8205,2.02409E-104 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,57.2165,96.9 SIF-Syn: Scaffolding protein, upstream gene is NKF, downstream is major capsid protein like VroomVroom and Emotion. /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: Both Glimmer and GeneMark call 4542 as the start site with codon ATG /note=Coding Potential: The start site appears to cover all of the coding potential /note=SD (Final) Score: The final score is -2.664. This is the best final score as there are no other possible start sites. /note=Gap/overlap: There is a gap of 77.
This is larger than the recommended 50, but there is no coding potential in the gap and other phages, such as Emotion and VroomVroom, do not have any genes in the gap. /note=Phamerator: 1-19-24, pham 1850. It is conserved, found in MaGuCo (AZ). /note=Starterator: Start site 19 is the most annotated. It is called in 47 of the 51 non-draft genes. Start site 19 is start site 4542 in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 4542. /note=Function call: Scaffolding protein. The top phagesdb BLAST hits were for scaffolding protein with e-values of 7e-96 and 2e-47. The HHpred hits include scaffold protein with 96.92% probability, 57.2% coverage, and an e-value of 0.083. Other results have e-values greater than 1. NCBI BLAST returned no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sass, Arielle /note=Secondary Annotator QC: After reviewing the PECAAN notes and referenced external resources, I agree with the start site call of bp 4542 and the function call of a scaffolding protein.
CDS 5155 - 6096 /gene="6" /product="gp6" /function="major capsid protein" /locus tag="ShakeItOph_6" /note=Original Glimmer call @bp 5155 has strength 16.06; Genemark calls start at 5155 /note=SSC: 5155-6096 CP: no SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 28 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.76, -3.123974469270304, no F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage VroomVroom]],,WIC90156,99.0415,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Both Glimmer and GeneMark agree with start site 5155 and start codon ATG /note=Coding Potential: There is significant coding potential in the forward strand, as seen in the line being over 0.5; there is no coding potential in the reverse direction. /note=SD (Final) Score: The final score for start site 5155 is -3.124, which is the best final score for this gene, along with a z score of 2.76 /note=Gap/overlap: There is a 28bp gap which is reasonable and acceptable. This is also the LORF with a length of 942bp /note=Phamerator: As of 1/19/2024, pham #228. There are 301 members in this pham with 43 of them as drafts /note=Starterator: As of 1/19/2024, the starterator report called start site 8 as the most frequently called start site. Found within 135 of the 253 non-draft genes in the pham. /note=Location call: Due to both Glimmer and Genemark agreeing with the start site of 5155, the final score being the best score at this site, and the reasonable coding potential, I believe this is a real gene with a start site at 5155 /note=Function call: There are no hits from phagesdb, but HHpred showed 81 hits, with the function of major capsid protein being shown as the most likely function based on the high % coverage. The e values are also very low.
NCBI also displayed 100 hits with the majority being the major capsid protein and 100% coverage from Arthrobacter phage VroomVroom with an e value of 0. CDD did not show any significant results. /note=Transmembrane domains: DeepTMHMM displays an absence of transmembrane regions, which suggests that this is probably not a transmembrane protein. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: I agree with all annotations made, this is a real gene with function likely being major capsid protein due to the HHpred and NCBI hits mentioned. CDS 6172 - 6573 /gene="7" /product="gp7" /function="head-to-tail adaptor" /locus tag="ShakeItOph_7" /note=Original Glimmer call @bp 6172 has strength 11.41; Genemark calls start at 6172 /note=SSC: 6172-6573 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 99.2481% 1.54726E-87 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage VroomVroom]],,WIC90157,99.2481,1.54726E-87 SIF-HHPRED: Phage_Gp19 ; Phage protein Gp19/Gp15/Gp42,,,PF09355.14,77.4436,99.4 SIF-Syn: Head-to-tail adaptor, upstream gene is major capsid protein, downstream is NKF, just like in phage VroomVroom /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: GeneMark and Glimmer agree on a start site at 6172 bp /note=Coding Potential: There is coding potential predicted by Host-trained and self-trained GeneMark and the chosen start site covers all of the coding potential. /note=SD (Final) Score: -1.954, the best final score of two options on PECAAN. The Z-score is the highest of two at 3.303. /note=Gap/overlap: Gap is 75 bp, which is longer than ideal; however, there is no upstream coding potential in the gap. /note=Phamerator: The gene is in pham 131852 as of 1/13/2024.
39 of 147 non-draft genes of the pham are from phages of the same cluster, AZ, and the gene is conserved in all other members of the subcluster AZ4 such as VroomVroom and Emotion. The phams database and Phamerator function calls for the gene are both head-to-tail adaptor. /note=Starterator: Start site 13 in Starterator was manually annotated in 1/144 non-draft genes in this pham, however it is called 100% of the time when present (4/4). Start 13 is 6172bp in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The gathered evidence suggests that the gene is real and starts at 6172 bp. /note=Function call: multiple phagesDB BLAST hits with the function head-to-tail adaptor with small e-values (<2e-54). Multiple NCBI BLASTp hits with the function head-to-tail adaptor with small e-values (<1.4e-64), high coverage (96.9%+), and high identity (72%+). Multiple HHPRED hits aligning with SPP1 gp15 (99.1% probability, 81% coverage, e-value of 4.5e-9) and Bacillus protein YqbG (99.2% probability, 82% coverage, e-value of 2.9e-9) corresponding with SEA-PHAGES function call requirements. /note=Transmembrane domains: No transmembrane proteins were predicted by DeepTMHMM. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: Possibly mention if CDD gave no hits (even if none were significant) in the function call. Double check Starterator site number for the most called start site (it says site 9 but maybe it just updated?) and maybe mention how many non-draft phages called the site. Follow lab manual template for location call, auto-annotation, and gap/overlap sections. Other than that, I agree with all of the annotations. 
CDS 6570 - 6671 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="ShakeItOph_8" /note=Genemark calls start at 6570 /note=SSC: 6570-6671 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_8 [Arthrobacter phage VroomVroom]],,NCBI, q5:s4 75.7576% 0.00475676 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.88, -2.8779978280336396, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_8 [Arthrobacter phage VroomVroom]],,WIC90158,63.6364,0.00475676 SIF-HHPRED: SIF-Syn: NKF, upstream gene is head-to-tail adaptor, downstream is head-to-tail stopper, just like in phage VroomVroom /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Only GeneMark calls the start site for this gene at #6570, calling start codon GTG. /note=Coding Potential: There is strong coding potential for this gene throughout its entire putative ORF in both the Host-Trained and Self-Trained graphs. In addition, the chosen start site does cover all of this coding potential. /note=SD (Final) Score: The SD score for the GeneMark start site is -2.878 (this score is the best; however, it is the only candidate start site called). Still, the value is reasonable for a potential ribosome binding site. /note=Gap/overlap: The gap score for this gene is ‘-4’, meaning a 4bp overlap, which is indicative of an operon. This value is therefore reasonable, since the gene is likely transcribed polycistronically. Interestingly, this gene is only 102bp long, which may not be acceptable. /note=Phamerator: As of January 16th, 2024 this gene is found within Pham: 132640. The pham is conserved in other members of its subcluster AZ4, including phage VroomVroom’s gene 8 and draft phage JasmineDragon’s gene 8. In addition, the phamerator database did not have a function called for this gene. /note=Starterator: There is a reasonable start site that is highly conserved within the pham.
The conserved start site is site number 1, which corresponds to base pair coordinate 6570 for ShakeItOph. There are approximately 8 members in this pham and 5/5 nondraft genes call site #1. /note=Location call: I believe this is a real gene due to conservation within its pham group and its reasonable coding potential with the putative ORF. The only issue is that it may be too small (102bp) to be considered reasonable for proper gene length (120); this may, however, be disregarded due to the aforementioned gene conservation within its cluster. Therefore, the gene’s start site candidate #1/6570 is reasonable. /note=Function call: Unfortunately no function can be called for this gene as all databases returned uninformative results. HHpred and NCBI BLASTp had decent hits with good % coverage, probability and identity, but they all belonged to hits assigned as “hypothetical protein” and “domain of unknown function” /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with this annotation. There were no significant hits so a function cannot be assigned. The only indicator that this is not a real gene is the small gene size. However, the coding potential and conserved nature of this gene indicate that it is likely real.
CDS 6658 - 6999 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="ShakeItOph_9" /note=Original Glimmer call @bp 6658 has strength 17.85; Genemark calls start at 6697 /note=SSC: 6658-6999 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.70171E-71 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.905, -2.827683592113848, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage VroomVroom]],,WIC90159,98.2301,4.70171E-71 SIF-HHPRED: Stopper protein Rcc01689; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_E,97.3451,99.7 SIF-Syn: Head-to-tail stopper, upstream gene is head-to-tail adaptor, downstream gene is tail terminator, just like for phage Emotion (cluster AZ4). /note=Primary Annotator Name: Samudrala, Vaishnavi /note=Auto-annotation: Both Glimmer and GeneMark. Glimmer calls the start site at 6658 (codon: ATG) and GeneMark calls the start site at 6697 (codon: TTG). It’s likely that the start site may be at the predicted Glimmer site considering that codon ATG is more commonly utilized. /note=Coding Potential: The start site called by Glimmer has reasonable coding potential in both Host and Self-Trained GeneMark. The start site also covers all of the coding potential. The start site called by GeneMark doesn’t completely cover the coding potential in the area nearby it (for both the Host and Self-Trained GeneMark). /note=SD (Final) Score: The start site called by Glimmer has a final score of -2.828 and a z-score of 2.905. The z-score for this start site is the best z-score out of all of the potential start sites. It also has the best or largest final score. This indicates that there is a high probability of ribosome binding at the start site and that this probability is higher when compared to other sites called.
/note=Gap/overlap: There are unreasonable overlaps/gaps between the gene and another gene with the start sites called by Glimmer (considering there is a 14bp overlap) and also for the GeneMark auto-annotated start site (25bp gap). The best start site candidate would still be considered the Glimmer site as it has a smaller overlap compared to the gap for the GeneMark start site. Also, the Glimmer start site allows for a reasonable length ORF with the gene’s length being 342 (GeneMark start site results in a smaller length ORF of 303). /note=Phamerator: The pham number the gene is found in is 133718 as of 1/16/24. The gene is conserved in other members of the cluster that the phage belongs to. These phages used for comparison were Emotion (AZ4) and VroomVroom (AZ4). No function was called in Phamerator/Pham Maps for the gene at the start site called by Glimmer. /note=Starterator: There is a reasonable start site (start site 8) conserved among members of the pham. This start site corresponds to the base pair site 6658 or the Glimmer called start site. There are 76 other members in this pham (25 being drafts). 43 of the 51 non-draft members of the pham call the start site number 8. /note=Location call: The gene with the called Glimmer start site is real as it has reasonable amounts of coding potential on both the Host and Self-Trained GeneMark. It also only has coding potential for one of the direct strands (doesn’t flip in coding orientation) at the start site. The gene is also conserved in Phamerator and has a reasonable length of 342 bp (>150 bp) for adequately coding a protein. The best start site candidate is consistently shown to be the Glimmer called site at 6658. It covers more coding potential than the GeneMark site, allows for a longer ORF, and it`s the most called start site in Phamerator. 34 MAs have called the start site, which supports the accuracy of the call.
/note=Function call: PhagesDB BLASTp shows at least two significant hits with non-draft phages that have strong E-values (<1e-57) and reasonable percentage identity values (>73%). Both these hits call the function of the gene to be a “head-to-tail stopper”. NCBI BLASTp also shows at least two significant results (E values<3e-55 and query coverage of 100% as well as percentage identity values greater than 73.45%). The hits also call the function of the gene to be a “head-to-tail stopper”. HHPRED does show two significant results but these hits call different functions of the gene, either a “head completion protein” or a “stopper protein”. Since one of the HHPRED results shows alignment of the protein structure to the SPP1 16 crystal structure, the function is still likely a “head-to-tail stopper” (refer to SEA-PHAGES approved function list). There are no significant hits for the CDD database. /note=Transmembrane domains: There were no predicted TMRs (through DeepTMHMM). /note=Secondary Annotator Name: Vazquez, Eunice /note=Secondary Annotator QC: /note=Agree with the primary annotator CDS 7008 - 7289 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="ShakeItOph_10" /note=Original Glimmer call @bp 7008 has strength 15.01; Genemark calls start at 7008 /note=SSC: 7008-7289 CP: yes SCS: both ST: SS BLAST-Start: [neck protein [Arthrobacter phage SWEP2]],,NCBI, q3:s2 94.6237% 1.89902E-9 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.755, -3.7234890726758456, yes F: hypothetical protein SIF-BLAST: ,,[neck protein [Arthrobacter phage SWEP2]],,USL85081,55.0,1.89902E-9 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and Genemark both call the start site at 7008. Start codon is ATG. /note=Coding Potential: The ORF has good coding potential on the forward strand, indicating that it is a forward gene. Coding potential is found on both GeneMark Self and Host. /note=SD (Final) Score: -3.723.
It is the best final score on PECAAN. Strong z-score of 2.755 for this start site. /note=Gap/overlap: Gap: 8. This is a reasonable gap size. /note=Phamerator: Pham: 133532. Date: 1/13/24. It is conserved. Found in VroomVroom (AZ4) and Emotion (AZ4). /note=Starterator: Date: 1/13/24. Start site 37 in Starterator was manually annotated in 48/159 non-draft phages. It is called 100% of the time when it is present. Start site 37 is at 7008 bp which agrees with Glimmer and Genemark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 7008 bp. /note=Function call: NKF. The only PhagesDB BLAST hit is for a head-to-tail connector (E-value: 3e-07) The only NCBI hit was for a neck protein (E-value: 1e-06, Coverage: 96.4%, Identity: 46%) HHpred had no significant hits. CDD had no hits. The lack of hits does not allow me to confidently make a function call. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aguirre, Austin Leon /note=Secondary Annotator QC: Additional information to include: Called start codon was ATG, which is a common start site and strengthens our confidence that the called start site is correct. Synteny box should be filled out. Pham Starterator drop-down says N/A, but it should say suggested start site. Should also mention that the Z-score for start site 7008 has a strong value of 2.755. Should include the date that starterator was called. Great work besides these few edits I agree with your location and function call. /note=I don`t think synteny boxes need to be filled out for NKF. 
CDS 7286 - 7681 /gene="11" /product="gp11" /function="tail terminator" /locus tag="ShakeItOph_11" /note=Original Glimmer call @bp 7286 has strength 12.27; Genemark calls start at 7286 /note=SSC: 7286-7681 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.9747E-83 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.551, -4.3934175870462076, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage VroomVroom]],,WIC90161,96.9466,1.9747E-83 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,96.9466,99.5 SIF-Syn: Tail terminator, upstream gene is NKF, downstream gene is major tail protein, just like in phage VroomVroom /note=Primary Annotator Name: Vazquez, Eunice /note=Auto-annotation: Glimmer and Genemark. Both call the start at 7286. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating this is a forward gene. The coding potential is covered by the start site. /note=SD (Final) Score: -4.393 is the final score, and it is the best final score on PECAAN. The z score is 2.551. /note=Gap/overlap: -4 bp overlap. This is a reasonable overlap considering it is not 50bp or more. /note=Phamerator: As of 01/17/24 the pham number is 133659. Other phages in this pham belong to clusters AZ4 which are either tail terminators or NKF, and AZ1 which are also tail terminators or NKF. /note=Starterator: Start site 8 in starterator. There are more than 10 manual annotations for this start site. Start site 8 is 7286 which corresponds to the start site called by Glimmer and Genemark. /note=Location call: With all the evidence above this gene is a real gene and has a start site at 7286 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Tail terminator. The top two phagesDB Blastp hits with a known function were tail terminator (E-values: 9e-70 and 1e-54).
HHpred showed that the first hit, a tail terminator, had a high probability, low e-value, and high coverage. In order to call this a tail terminator it must have an alignment with 5A21 chain G, and we can see that this is true as it has high probability, high coverage, and a low e value. The NCBI Blast had two good hits which had a low e value, high identity and coverage. They were also tail terminators. /note=Transmembrane domains: There are no TMRs noted by DeepTMHMM. /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: I agree with what the primary annotator called for function and location. I would add to the function call evidence the HHpred hit for 5A21 since a 5A21 or 3F2Z hit is a requirement (see seaphages function requirement list). I wouldn`t check the 3F2Z because it does not have as good of e-value. Also, since you are calling a function you would need to fill in the synteny box. CDS 7693 - 8247 /gene="12" /product="gp12" /function="major tail protein" /locus tag="ShakeItOph_12" /note=Original Glimmer call @bp 7693 has strength 17.72; Genemark calls start at 7693 /note=SSC: 7693-8247 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.28369E-127 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -1.993391246735709, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage VroomVroom]],,WIC90162,98.913,1.28369E-127 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_M,90.7609,98.4 SIF-Syn: Shows strong synteny with phage Emotion. Upstream is a tail terminator and downstream is a tail assembly chaperone. /note=Primary Annotator Name: Aguirre, Austin /note=Auto-annotation: Glimmer and Genemark both called the same start site at 7693. 7693 contains the longest ORF. Start codon was ATG.
/note=Coding Potential: Genemark host has very strong coding potential on a single frame with no other coding potential present. Start site covers the coding potential. Genemark self also shows coding potential that is covered by the start codon. There is a dip in coding potential around 8000. /note=SD (Final) Score: The called start site of 7693 has a high Z-score of 3.314 and a strong final score of -1.993. /note=Gap/overlap: Gene has a gap of 11 bp, which is acceptable. /note=Phamerator: Pham 120341 contains 133 members from multiple different phages. This gene is in cluster AZ4, and the most common clusters in this pham were AZ, Q, DI, EH, and CD. /note=Starterator: Start site 7693 is start number 8, and has 100 MA’s. Start site 8 was found in 133 of 133 genes in this pham and was called 98.5% of the time when present. /note=Location call: Based on the strong Z-score and final score, I think this is a real gene. Not only does this start site give the LORF, but it also has a strong history of manual annotations in non-draft phages. /note=Function call: PhagesDB BLAST has multiple hits to major tail protein with very strong e-values. HHpred also calls a major tail protein, but the E-value is 0.000036. NCBI BLASTp also has two major tail protein hits with very strong e-values. /note=Transmembrane domains: No TMD. /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: I agree with the location and function call of this gene. There seems to be a lot of evidence pointing to this gene as a major tail protein. It is also located (in the genome) near other tail proteins. 
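The GAP values recorded in these notes appear to follow simple coordinate arithmetic: the start of a gene minus the stop of the previous gene, minus one, with negative values meaning an overlap. The sketch below illustrates that assumption using coordinates copied from the records above; it is not PECAAN's own code, and `gap_bp` and `features` are hypothetical names introduced only for this example.

```python
def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases strictly between two features; a negative value means overlap."""
    return next_start - prev_stop - 1


# (label, start, stop) taken from the CDS coordinates in these notes
features = [
    ("gp11", 7286, 7681),
    ("gp12", 7693, 8247),
    ("gp13", 8354, 8620),
]

# Compare each feature with the one immediately upstream of it
for (prev_label, _, prev_stop), (label, start, _) in zip(features, features[1:]):
    print(f"{prev_label} -> {label}: {gap_bp(prev_stop, start)} bp")
```

Under this assumption the 11 bp gap before gp12 and the 106 bp gap before gp13 are reproduced. A gene whose start coincides with the previous stop coordinate would give -1, which matches how a 1 bp overlap is recorded in this file.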
CDS 8354 - 8620 /gene="13" /product="gp13" /function="tail assembly chaperone" /locus tag="ShakeItOph_13" /note=Original Glimmer call @bp 8354 has strength 14.7; Genemark calls start at 8354 /note=SSC: 8354-8620 CP: no SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 6.41971E-47 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.0111200136961407, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VroomVroom]],,WIC90163,92.0455,6.41971E-47 SIF-HHPRED: Phage_TAC_10 ; Phage tail assembly chaperone,,,PF10963.12,79.5455,94.3 SIF-Syn: This gene is called as a tail assembly chaperone in VroomVroom and Emotion. VroomVroom and Emotion have a second tail assembly chaperone gene that overlaps with it. There is a major tail protein upstream and the tape measure downstream of the gene in VroomVroom and Emotion. /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and Genemark both call the start at 8354. /note=Coding Potential: There is significant coding potential that is captured by the start at 8354. There is only coding potential in the forward direction and not in the reverse. /note=SD (Final) Score: The final score is -2.011 and the Z-score is 3.314. These are the only scores on PECAAN, but the Z-score surpasses the minimum and the final score is relatively good. /note=Gap/overlap: There is a gap of 106 base pairs. However, if this gene were removed there would be a significant gap. This gap is also conserved in Emotion and VroomVroom. /note=Phamerator: 1/15/2023 Pham 133703 with 81 members. This gene is found in other phages of the same cluster (AZ4) and demonstrates synteny. This gene can be found in VroomVroom and in Emotion. /note=Starterator: Start 5 @8354 is the only start called for ShakeItOph and it is the most annotated start in Starterator with 49 calls. 
/note=Location call: From the evidence above and the lack of any other predicted starts, the best start is at 8354. /note=Function call: There were hits on NCBI Protein BLAST for tail assembly chaperone from VroomVroom and Emotion (two non-draft phages) with good e-values. In the phagesdb BLAST there were also hits for tail assembly chaperones from Maureen and Liebe. HHpred also revealed hits from Pfam for tail assembly chaperones. This gene also has the same slippery sequence that is found in VroomVroom, “AGGGAAA”, towards the end of the gene. The evidence suggests that this is a tail assembly chaperone. /note=Transmembrane domains: There are no transmembrane domains. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: I agree with the annotations above. You might also want to check if the gap is conserved in other phages since the gap is a bit large. CDS join(8354..8611,8611..8961) /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="ShakeItOph_14" /note= /note=SSC: 8354-8961 CP: no SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.94597E-133 GAP: -267 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.0111200136961407, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VroomVroom]],,WIC90164,96.0396,5.94597E-133 SIF-HHPRED: SIF-Syn: CDS 8971 - 11214 /gene="15" /product="gp15" /function="tape measure protein" /locus tag="ShakeItOph_15" /note=Original Glimmer call @bp 8971 has strength 13.34; Genemark calls start at 8971 /note=SSC: 8971-11214 CP: yes SCS: both ST: NI BLAST-Start: [tape measure protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.139, -4.325675514574479, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage VroomVroom]],,WIC90165,94.5114,0.0 SIF-HHPRED: Tape Measure 
Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BF,12.5837,99.9 SIF-Syn: VroomVroom, Emotion, Tweety19, Snek, Liebe, Maureen, MaGuCo /note=Primary Annotator Name: WANG, JORDAN /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 8971. /note=Coding Potential: There is good coding potential shown by the 3rd row between 8971 (potential start) and 11214 (stop). /note=SD (Final) Score: For start site 8971, the final score is -4.326. There is one start site with a higher Z-score and better final score (-3.836) at 9160; however, it does not account for all of the coding potential. /note=Gap/overlap: 0 bp gap. /note=Phamerator: Pham 135251. Date: Jan 21, 2024. Found across 294 phages (56 drafts). /note=Starterator: Start site 6 in Starterator was manually annotated in 40/238 non-draft genes in this pham. Start 6 is 8971 in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is likely a real gene and the most likely start site is 8971. /note=Function call: Tape measure protein. Non-draft phagesdb BLAST and NCBI BLAST hits call the function as tape measure protein with E-value of 0. HHpred hits call the function as tape measure protein with E-values from 6.4e-15 to 8.9e-7. The top hit on CDD calls the function as unknown with E-value = 2.22e-21, and another CDD hit calls tape measure protein with E-value = 0.00087. /note=Transmembrane domains: DeepTMHMM predicts 8 TMDs, therefore it is likely a membrane protein. /note=I agree with this annotation. All of the evidence categories have been considered. 
CDS 11227 - 12087 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="ShakeItOph_16" /note=Original Glimmer call @bp 11227 has strength 13.56; Genemark calls start at 11227 /note=SSC: 11227-12087 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90166,95.8042,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_CC,98.951,100.0 SIF-Syn: Minor tail protein, upstream is pham 133481 and downstream is another minor tail protein just like VroomVroom. /note=Primary Annotator Name: Richard, Ketan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 11227. /note=Coding Potential: Coding potential in this ORF is primarily found on the forward strand, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -1.954. It is the best final score (most positive). /note=Gap/overlap: There is a gap of 12 base pairs upstream of the gene. However, there is no coding potential in the gap, and the gap is too small to hold a potential gene. All of the other potential start sites create very large gaps. /note=Phamerator: pham: 133598. Date: 01/13/2024. It is conserved and found in other AZ4 phages (Emotion and VroomVroom). /note=Starterator: Start site 30 in Starterator. There are 14 phages found with this start number and there are 3 manually annotated phages with this start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 11227. /note=Function call: Minor tail protein. There are many hits on PhagesDB BLASTp, including VroomVroom (e-value = E-152) and Emotion (e-value = E-119). 
NCBI has the same two hits with e-values of 0 and 2e-135. HHpred has similar calls: its hits are to distal tail proteins, which are characteristic of minor tail proteins. There are no hits for CDD. There is synteny with VroomVroom, which calls the minor tail protein and has the same phams on either side of this gene: upstream is pham 133481 and downstream is another minor tail protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Soan, Jessica /note=Secondary Annotator QC: I agree with this location and functional call. There is sufficient evidence to support the function of this gene as a minor tail protein. All evidence has been correctly noted. CDS 12097 - 13074 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="ShakeItOph_17" /note=Original Glimmer call @bp 12097 has strength 12.38; Genemark calls start at 12097 /note=SSC: 12097-13074 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Brevibacterium phage LuckyBarnes] ],,NCBI, q1:s1 99.6923% 2.70528E-62 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.0111200136961407, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Brevibacterium phage LuckyBarnes] ],,YP_009792200,54.2857,2.70528E-62 SIF-HHPRED: SIF-Syn: The upstream gene is a minor tail protein and the downstream gene is a minor tail protein, just like in phage VroomVroom /note=Primary Annotator Name: Zamora, Alexandra /note=Auto-annotation: The gene is called by both Glimmer and GeneMark. They both agree the auto-annotated start site is at 12097 bp. The start codon for this start site is ATG. /note=Coding Potential: There is good coding potential on the Host-trained GeneMark that is within the ORF; the auto-annotated start codon covers all this coding potential. Coding potential of the Host-trained GeneMark is on the forward strand only. 
There is good coding potential on GeneMark self that is within the ORF; the auto-annotated start codon covers all this coding potential. Coding potential of the GeneMark self is on the forward strand only. /note=SD (Final) Score: The final score for the likely start site is -2.011 and the z-score is 3.314. This start site has the best final score. /note=Gap/overlap: Gap: 9bp. The gap with the upstream gene is ideal. There are no other start sites with a better gap/overlap. There is no coding potential in this gap, indicating there is no need for the addition of a gene within this gap. The likely start site contains the longest ORF of 978 bp; the length is acceptable given the likely start site. /note=Phamerator: As of 1/16/24, the gene is found in pham number 132135. The pham of this gene is conserved among other AZ4 phages; Phages Emotion and MiniMommy contain this pham number. /note=Starterator: There are 55 members in this pham, 24 are drafts. The most annotated start is number 6, with it being called in 30 of 31 non-draft genes. This conserved start (number 6) corresponds to start site 12097 for ShakeItOph gene 18. /note=Location call: Based on the good coding potential within the host-trained and self GeneMark and the conserved start site within the phamerator/starterator report, this gene is real with likely start site at 12097. /note=Function call: The top two phagesdb hits have minor tail protein function with e-values less than 1e-165. NCBI BLAST had two strong hits with function minor tail protein (coverage: 99% and 98%; e-value: 3e-62 and 7e-53). CDD had no relevant hits. HHpred had no relevant hits. Based on this information, the function of this gene is a minor tail protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bhattarai, Aryan. /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator. 
This is a real gene and the location call is correct. The functional call is also correct. There is sufficient evidence to determine that the function of this gene is a minor tail protein. /note= /note=Below are my notes: /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 12097. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -2.011. It is the best final score on PECAAN. /note=Gap/overlap: 9 bp, which is not large and contains no coding potential. /note=Phamerator: pham: 132135. Date 1/22/2024. 55 members exist in this pham, and the majority of them are in Subcluster AZ. /note=Starterator: There are 55 total members, 21 are drafts. Start number 6 is found in 53 of 55 genes in the pham. There are 32 of 33 manual annotations of this start. It was called 100.0% of the time when present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 12097. /note=Function call: Minor tail protein. This gene has strong synteny with non-draft phages VroomVroom (Gene 17) and Emotion (Gene 79), which are both known to be minor tail proteins. For both programs, PhagesDB BLASTp and NCBI BLASTp, the top hits have scores that are >580 and strong e-values that are all < 1e-165. HHpred and CDD hits were not significant given the large e-values. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 13074 - 14306 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="ShakeItOph_18" /note=Original Glimmer call @bp 13074 has strength 15.11; Genemark calls start at 13074 /note=SSC: 13074-14306 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.236, -6.166667396375102, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90168,98.2968,0.0 SIF-HHPRED: Sipho_Gp37 ; Siphovirus ReqiPepy6 Gp37-like protein,,,PF14594.10,93.4146,99.9 SIF-Syn: Exhibits synteny with VroomVroom. Both genes on either side of this are minor tail proteins, which adds to the viability of the function call of minor tail protein. /note=Primary Annotator Name: Soan, Jessica /note=Auto-annotation: Both Glimmer and Genemark call the start site at 13074. /note=Coding Potential: Significant coding potential in this ORF is only on the forward strand, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -6.167 is not the best final score on PECAAN (Z-score 1.236), but this start has a very favorable overlap of 1 bp; therefore, the final score is not as relevant. This still supports the start call. /note=Gap/overlap: There is a 1bp overlap with an ATG start codon. This is reasonable as the -1 gap is very small. This favorable overlap also indicates the presence of an operon. /note=Phamerator: Pham 130626. Date 1/19/24. It is conserved; found in VroomVroom (AZ4) and two draft genomes: JasmineDragon_Draft (AZ4) and MiniMommy_Draft (AZ4). /note=Starterator: There are 110 total members, 32 are drafts. Start number 12 is found in 20/110 genes in the pham. Start 12 is 13074 in ShakeItOph. 14 of 78 are manual annotations of this start site. Called 100.0% of time when present. 
/note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 13074bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Minor tail protein. The top two phagesdb BLAST hits have a minor tail protein function call: VroomVroom (AZ4) (e = 0) and Emotion (AZ4) (1e-166). The top NCBI BLAST hit also has a function call of minor tail protein (100% coverage, 96.5% identity with VroomVroom, and E-value 0). HHpred has significant hits with high probability (>99%) and low E-values to support that the function is a minor tail protein. Also, BLASTp comparison with VroomVroom and Emotion, both of which have a similar protein sequence, shows that the gene is conserved and has been annotated as a minor tail protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kang, Alix /note=Secondary Annotator QC: I agree with this location and function call. All the evidence categories have been considered. I would write the below for Phamerator, since non-draft genes are more significant. /note=Phamerator: The pham number as of January 14, 2024 is 130626. The gene is conserved in phages Emotion and VroomVroom, all in the same cluster as ShakeItOph. The function call for the gene is a minor tail protein and it is consistent between Phamerator and the phams database. It is on the approved SEA-PHAGES list. /note=As for function, CDD has a hit (accession: cl24272 - Sipho_Gp37 super family), which seems to align with the HHPRED evidence that you checked off (PF14594.10). However, I could not find it in the PECAAN Conserved Domain Database. For NCBI Blast, I would also check Emotion as evidence (currently only VroomVroom is checked as evidence). 
CDS 14321 - 17332 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="ShakeItOph_19" /note=Original Glimmer call @bp 14321 has strength 18.03; Genemark calls start at 14321 /note=SSC: 14321-17332 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.364, -3.9301544432541915, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90169,96.1117,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein. Upstream gene is a minor tail protein, downstream gene has NKF. This gene, the upstream gene, and the downstream gene all had synteny with non-draft phages Emotion and VroomVroom. Gene 20 of ShakeItOph had synteny with gene 19 of Emotion and gene 19 of VroomVroom. All genes were minor tail proteins. The upstream and downstream genes of Emotion and VroomVroom had the same functions as the ones upstream and downstream of ShakeItOph (Gene 20). /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 14321. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.930. It is the best final score on PECAAN. /note=Gap/overlap: 14 bp, which is not large and contains no coding potential, so a new gene in the gap is unlikely. /note=Phamerator: pham: 131361. Date 1/19/2024. 6 members exist in this pham, with 5 being in Subcluster AZ, and one (Gilgamesh_71) being a singleton. /note=Starterator: There are 6 total members, 3 are drafts. Start number 2 is found in 5 of 6 genes in the pham. There are 2 of 3 manual annotations of this start. It was called 100.0% of the time when present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 14321. 
/note=Function call: Minor tail protein. This gene has strong synteny with non-draft phages VroomVroom (Gene 19) and Emotion (Gene 19), which are both known to have a minor tail protein. For both programs, PhagesDB BLASTp and NCBI BLASTp, the top hits have scores that are >700 and strong e-values that are all 0. HHpred and CDD hits were not significant given the large e-values. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Arnav Saud /note=Secondary Annotator QC: /note=I agree with the location call as all evidence points to the called start site being correct. This is a real gene. /note=The function call is also correct, despite all evidence coming from Blast results. CDS 17408 - 17749 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="ShakeItOph_20" /note=Original Glimmer call @bp 17408 has strength 12.57; Genemark calls start at 17444 /note=SSC: 17408-17749 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_20 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.74818E-67 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.808, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_20 [Arthrobacter phage VroomVroom]],,WIC90170,95.5752,4.74818E-67 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer calls the start at 17408. GeneMark calls the start at 17444. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score for Glimmer start site at 17408 is the best option at -3.490 and the z score is 2.331. /note=Gap/overlap: Gap: 75bp. Somewhat large. However, there is no coding potential or ORFs in the gap that might be a new gene. 
/note=Phamerator: The pham number as of January 13, 2024 is 2534. The gene is conserved in non-draft phages VroomVroom and Emotion, all in the same cluster as ShakeItOph. /note=Starterator: Start site 14 was manually annotated in 23/31 non-draft genes in this pham. Start 14 is 17408 in ShakeItOph. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 17408. /note=Function call: Multiple phagesDB BLAST hits are to hypothetical proteins, with small e-values of 1e-15 to 2e-53. NCBI BLAST hits are also to hypothetical proteins (89-100% coverage, 43.69%+ identity, E-value <2e-15). CDD and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation and chosen start site. CDS 17763 - 18488 /gene="21" /product="gp21" /function="endolysin" /locus tag="ShakeItOph_21" /note=Original Glimmer call @bp 17763 has strength 13.91; Genemark calls start at 17763 /note=SSC: 17763-18488 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.07114E-110 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.2821867859826788, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage VroomVroom]],,WIC90171,89.2562,4.07114E-110 SIF-HHPRED: LysT endolysin; Endolysin, thermophilic, N-terminal domain, T7-like fold, ANTIMICROBIAL PROTEIN; 1.95A {Thermus phage 2631} SCOP: d.118.1.0,,,6FHG_A,61.4108,99.1 SIF-Syn: endolysin, downstream is a membrane protein, just like in phages VroomVroom (AZ4) and Emotion (AZ4) /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and Genemark both call the start site to be 17763. /note=Coding Potential: There is a high coding potential on both GeneMark Host and Self. 
Coding potential in this ORF is on the forward strand only. /note=SD (Final) Score: -2.282. This is the best final score called by PECAAN. /note=Gap/overlap: There is a 13 bp gap between our gene of interest and the previous gene’s stop site. /note=Phamerator: Date: 1/13/24. The gene is conserved. It is found in Emotion (AZ) and VroomVroom (AZ). /note=Starterator: Date: 1/13/24. Start site 3 was manually annotated in 6/6 non-draft genes in this pham. Start site 3 is present in ShakeItOph. Start site 3 was also the most annotated start site. There is sufficient evidence to use Starterator to determine the start site of this gene. /note=Location call: The gene is 726 bp long, which is a sufficient length for a possible gene. Based on the evidence above, this is most likely a real gene. /note=Function call: Endolysin. The top 5 phagesdb BLAST results with a known function indicate that this gene has the function of endolysin (<3e-43). The top 2 NCBI BLAST results have been assigned the function endolysin (>92% coverage, >71.66% identity, e-value < 2e-95). HHpred has a hit listed as 6FHG_A which has the function of phage endolysin (99.1% probability, 61.4% coverage, e-value: 5e-9). CDD yielded no hits that had a sufficient % identity and % coverage. /note=Based on the evidence listed above, this gene is most likely an endolysin. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Akkinepally, Mrudula /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Notes: Check box for proof on NCBI Blast. 
CDS 18506 - 18772 /gene="22" /product="gp22" /function="membrane protein" /locus tag="ShakeItOph_22" /note=Original Glimmer call @bp 18506 has strength 10.79; Genemark calls start at 18491 /note=SSC: 18506-18772 CP: yes SCS: both-gl ST: SS BLAST-Start: [membrane protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s6 100.0% 1.18465E-49 GAP: 17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.134, -4.687276900021401, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VroomVroom]],,WIC90172,89.3617,1.18465E-49 SIF-HHPRED: SIF-Syn: Membrane protein. Upstream gene is an endolysin, which is conserved in VroomVroom (AZ) and Emotion (AZ). /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Glimmer calls the start site at 18506 and GeneMark calls the start at 18491. /note=Coding Potential: Coding potential is on the forward strand and there is good potential indicated on both GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.687. It is the third best final score found on PECAAN. /note=Gap/overlap: 17 bp. The gap with the upstream gene is 17 bp, a relatively small gap. /note=Phamerator: Pham number 132066. Date 01/13/24. It is conserved; it is found in the phages Lego (AZ) and VroomVroom (AZ). /note=Starterator: The start site found at start number 14 was manually annotated for ShakeItOph and called in 13 of the 21 non-draft genes in this pham. This start site is also found in the starterator report for this ShakeItOph gene at start site 18491. /note=Location call: The above evidence points towards this gene being real with a likely start site of 18491. /note=Function call: Membrane protein. There are no relevant phagesdb BLAST hits of known function. The top 3 NCBI BLAST hits have the membrane protein function (100% coverage, 74%+ identity, E-value < 10^-42). HHpred and CDD didn’t have any relevant hits. /note=Transmembrane domains: DeepTMHMM predicted one TMD, so this gene may code for a membrane protein. 
/note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree with this location and functional call. Note: (1) Make sure to check the box for the start site you want to call (2) make sure the FS and the gap section refer to the start site you end up calling (3) I may be mistaken, but I believe you don’t fill out the synteny box unless the phage you’re comparing it to also calls the function you’re attempting to call (4) a bit nit-picky, but I would say why the hits from CDD, HHPRED, and PhagesDB BLASTp were not relevant (high e-values, poor coverage, etc.) CDS 18773 - 19057 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="ShakeItOph_23" /note=Original Glimmer call @bp 18773 has strength 14.33; Genemark calls start at 18773 /note=SSC: 18773-19057 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 2.976E-52 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.957, -4.7784408226142965, yes F: hypothetical protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VroomVroom]],,WIC90173,97.8723,2.976E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer and Genemark both call the same start site: 18773 /note=Coding Potential: Host trained Genemark and Self trained Genemark had really good coding potential for this area on the forward strand with an open reading frame that covers the entire length of this gene. /note=SD (Final) Score: -4.778 (most positive value) /note=Gap/overlap: There is a 324 bp gap with the following gene, but considering it is on the reverse strand, at least a 50 bp gap is normal. There is a 0 bp gap with the previous gene. /note=Phamerator: pham: 132106. Date 01/15/2024. It is conserved, found in VroomVroom (AZ4) and Emotion (AZ4). /note=Starterator: ShakeItOph does not have the most annotated start site. 
Manual Annotations of this start: 2 of 38 and called 100% of the time when present. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on this evidence, this is a real gene and the most likely start site is 18773. /note=Function call: NKF. The top three phagesdb BLAST hits have unknown function (E-value <4e-33), and 5 out of 5 top NCBI BLAST hits have the function membrane protein (73-100% coverage, 76%+ identity, and E-value <4*10^-27). HHpred had a hit for putative antiholin with 94.97% probability and an E-value of 0.44, and two hits for holin protein with <75% probability and E-values of 19 and 46. CDD had no hits. Despite the presence of 2 transmembrane domains and the hits to holins, given the high e-values and the lack of endolysins adjacent to the gene, I think the best function call is NKF. /note=Transmembrane domains: 2 transmembrane domains predicted. 1 is 20 aa long and the second is 18 aa long. /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The gap using the chosen start site should be 0bp. CDS complement (19109 - 19339) /gene="24" /product="gp24" /function="membrane protein" /locus tag="ShakeItOph_24" /note=Original Glimmer call @bp 19339 has strength 4.19; Genemark calls start at 19339 /note=SSC: 19339-19109 CP: no SCS: both ST: NI BLAST-Start: GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.948, -5.367638368103365, no F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Notes from same gene in VroomVroom: "AF: Tricky call. We examined this area in VroomVroom and determined that VV should have a reverse gene in this location, and I think the same is true here. The HHpred hits are better and match to a membrane protein, and there is one TMD predicted. Good coding potential. F gene has less supporting evidence." 
/note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 19339, but the start site was manually changed to 19381 due to a better Z-score (2.64) and FS (-3.306) (19339 has a Z-score of 1.948 and an FS of -5.368). /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, but coding potential is found on both the forward and reverse strands. /note=SD (Final) Score: The final score for start site 19381 is the best option at -3.306 and the z score is the highest at 2.64. /note=Gap/overlap: gap: 59 for start site 19381, which is a sufficient gap (>50) for genes that switch directions. /note=Phamerator: pham: 100306. Date 01/19/2024. It is conserved in MiniMommy (AZ4) and JasmineDragon (AZ4), but both of these are draft genomes. /note=Starterator: 19339 is start site 2 in pham 100306 and is the start site that is called in the rest of the phages, but there are only 2 other phages in the pham and both are drafts. There are no manual annotations throughout the pham. 19381 is start site 1 in ShakeItOph. /note=Location call: This could be a real gene and the most likely start is 19381. /note=Function call: All significant PhagesDB BLASTp hits were returned as function unknown from draft genomes. NCBI BLASTp and CDD returned no hits. HHPRED had no significant hits (all e-values were 0.88 and above). As such, the exact function of this gene cannot currently be determined. /note=Transmembrane domains: DeepTMHMM predicts one TMD. 
/note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: CDS 19441 - 19806 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="ShakeItOph_25" /note=Original Glimmer call @bp 19441 has strength 18.01; Genemark calls start at 19441 /note=SSC: 19441-19806 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD70_gp080 [Brevibacterium phage Cantare] ],,NCBI, q2:s12 99.1736% 8.91124E-40 GAP: 101 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD70_gp080 [Brevibacterium phage Cantare] ],,YP_010676655,71.3178,8.91124E-40 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Both Glimmer and GeneMark call the start 19441. /note=Coding Potential: Strong coding potential for this ORF is in the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.443. This is the best final score on PECAAN. /note=Gap/overlap: The gap is a bit large at 101 bp but is reasonable given the other strong evidence that the gene is real. /note=Phamerator: Pham: 6677. Date: 1/19/24. It is found in two other AZ draft phages: JasmineDragon and MiniMommy. Pham 6677 is present in 8 non-draft phages. /note=Starterator: Start 8 was called the most at 2/8 non-draft phages but is not present in ShakeItOph. Start 9 was found in all of the AZ phages in the pham, but these are draft phages. Start 9 in ShakeItOph is start site 19441. This would agree with the site predicted by Glimmer and GeneMark. /note=Location call: This is a real gene based on the above evidence. The most likely start site is at 19441. /note=Function call: No known function. PhagesDB blast had strong hits with low e-values. No strong HHpred hits returned. No significant NCBI blast hits returned. No CDD hits returned. 
/note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains, so it is not a transmembrane protein. /note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I agree with the annotation and function call. I think you could check off some evidence in phagesDB blast because the e-values are good and they also have NKF. CDS 19923 - 20534 /gene="26" /product="gp26" /function="deoxynucleoside monophosphate kinase" /locus tag="ShakeItOph_26" /note=Original Glimmer call @bp 19923 has strength 17.08; Genemark calls start at 19923 /note=SSC: 19923-20534 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.28317E-128 GAP: 116 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.523003374675015, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage VroomVroom]],,WIC90176,93.6274,5.28317E-128 SIF-HHPRED: DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE; TRANSFERASE, PHOSPHOTRANSFERASE; HET: DGP, OCS; 2.0A {Enterobacteria phage T4} SCOP: c.37.1.1,,,1DEK_A,91.133,99.8 SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation: GeneMark and Glimmer both call the start site at 19923. /note=Coding Potential: Coding potential is found on the forward strand in Host-Trained GeneMark and GeneMarkSelf. /note=SD (Final) Score: The final score is -2.523, which is the best from the options. The z score is 3.063. /note=Gap/overlap: There is a gap of 116 bp, which is acceptable. /note=Phamerator: Pham 97531 as of 1/17/24. There are 231 members of this pham. /note=Starterator: This start site is site 51 in starterator, and it has 68 manual annotations. /note=Location call: Based on the evidence above, this is a real gene with a start site of 19923. /note=Function call: PhagesDB Blastp returned 100 hits with significant e values. 
The two most significant hits (e values of 1e-103 and 7e-82) were both for deoxynucleoside monophosphate kinase genes in phages VroomVroom and Emotion26. HHPred returned 250 hits, with the two most significant hits (coverage >90%, e value <10^-3) being deoxynucleoside monophosphate kinase genes. NCBI Blast returned significant hits for deoxynucleoside monophosphate kinases. CDD returned one significant hit (coverage >80%, e-value of 6.5e-18) for a deoxynucleoside monophosphate kinase. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: I agree with the start site of 19923 based on the auto-annotation, primary annotation, and the evidence provided by the SD score and the gap/length. I also agree with the function call of this gene based on the PhagesDB blast and the NCBI blast. I think this gene has a deoxynucleoside monophosphate kinase function. CDS 20637 - 21230 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="ShakeItOph_27" /note=Original Glimmer call @bp 20637 has strength 14.08; Genemark calls start at 20637 /note=SSC: 20637-21230 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_27 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 8.89384E-125 GAP: 102 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.897, -6.857740919865087, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_27 [Arthrobacter phage VroomVroom]],,WIC90177,94.9239,8.89384E-125 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: Both Glimmer and GeneMark called the same start site, 20637, which has the start codon ATG. /note=Coding Potential: The start site covers all of the coding potential shown in both the host and self trained GeneMarks. /note=SD (Final) Score: The final score is -6.858 which is not the best final score. 
/note=Gap/overlap: There is a gap of 102 bp, which is quite large; however, it is the smallest possible gap. /note=Phamerator: 1-19-24, pham: 1819. It is conserved and found in DrSierra and Emotion which are both part of the AZ cluster. /note=Starterator: The most annotated start number is 21 and it was called in 39 of the 55 non-draft genes in the pham. This start number is called in ShakeItOph and so this evidence agrees with Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 20637. /note=Function call: NKF. The top phagesdb blast hits for non-draft genes are NKF with e-values of 3e-96 and 4e-64 for VroomVroom and Emotion. HHpred has several hits; however, the highest probability is 58.8% with e-values in the 100s and only 30% coverage. No CDD hits. /note=Transmembrane domains: There were no TMDs predicted by DeepTMHMM which means this is not a membrane protein. /note=Secondary Annotator Name: Sass, Arielle /note=Secondary Annotator QC: After reviewing the PECAAN notes and referenced external resources, I agree with the start site call of bp 20637; however, it must be noted that the final score and z-score of the selected start are not ideal (there are better options in the list). 
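The Gap/overlap values quoted in these notes can be cross-checked against the CDS coordinates. The following is a minimal sketch, assuming the gap is the number of bases strictly between the previous stop and the next start (so a negative value is an overlap). The helper name `gap_bp` is hypothetical and is not part of PECAAN. It only applies to adjacent genes on the same strand.

```python
# Sketch: recompute the gap (or overlap) between adjacent forward-strand genes
# from the CDS coordinates listed in the records above.
# Assumption: gap = next_start - previous_stop - 1; negative means overlap.
# gap_bp is a hypothetical helper, not a PECAAN function.

def gap_bp(previous_stop: int, next_start: int) -> int:
    """Bases strictly between two features (negative value = overlap)."""
    return next_start - previous_stop - 1

# (gene number, start, stop) for forward-strand genes 25-27 as listed above.
genes = [(25, 19441, 19806), (26, 19923, 20534), (27, 20637, 21230)]

# Map each gene to the gap with the gene immediately upstream of it.
gaps = {
    nxt[0]: gap_bp(prev[2], nxt[1])
    for prev, nxt in zip(genes, genes[1:])
}
print(gaps)  # {26: 116, 27: 102}
```

Both values agree with the GAP fields recorded for genes 26 and 27.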
CDS 21397 - 22305 /gene="28" /product="gp28" /function="Cas4 exonuclease" /locus tag="ShakeItOph_28" /note=Original Glimmer call @bp 21397 has strength 12.19; Genemark calls start at 21397 /note=SSC: 21397-22305 CP: no SCS: both ST: SS BLAST-Start: [Cas4 exonuclease [Arthrobacter phage VroomVroom]],,NCBI, q24:s1 91.3907% 0.0 GAP: 166 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.136, -6.898293885147568, no F: Cas4 exonuclease SIF-BLAST: ,,[Cas4 exonuclease [Arthrobacter phage VroomVroom]],,WIC90178,98.5663,0.0 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_C,76.4901,99.8 SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Both Glimmer and GeneMark agree with start site 21397 and start codon GTG /note=Coding Potential: The coding potential is only shown in the forward strand, not the reverse and has high activity /note=SD (Final) Score: The SD (final) score is not the best for that auto annotated start site. It has a score of -6.898 and a z score of 1.136 whereas start site 21466 has an SD score of -4.669 and a z score of 1.971. /note=Gap/overlap: There is a 166bp gap on auto-annotated start site of 21397 which is outside of the acceptable/reasonable range. It is the LORF with 909bp in length /note=Phamerator: As of 1/19/2024, this gene belongs to pham #130576 with 181 other members 38 of which are drafts /note=Starterator: As of 1/19/2024, the starterator report called start site 43 as the most frequently called start site. Found within 53 of the 127 non-draft genes in the pham. /note=Location call: This is a tricky call since the SD score is not necessarily the best for the auto annotated start and the gap is a little too long, however I will say that I do believe this is a real gene just because it does not switch between forward and reverse coding potential and because it is the longest ORF. 
/note=Function call: Phagesdb displayed 100 blast hits with some of them showing as no known function and others as an exonuclease. HHpred had a bit more significance in its results since there were 69 entries with almost all of them as exonucleases and a fair % coverage (75-76%). NCBI blast had 100 hits, with 91% coverage to an exonuclease in Arthrobacter phage VroomVroom and an e value of 0. CDD showed no hits. This gene likely has an exonuclease function. /note=Transmembrane domains: DeepTMHMM displays an absence of transmembrane regions, which suggests that this is probably not a transmembrane protein. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: I agree with the primary annotation in that the poor SD and z-score are alarming but the LORF and conserved pham point towards this gene being real. In addition I agree with location and function call but I believe this gene is more specifically a Cas4 exonuclease given that the aforementioned NCBI BLAST hit is labeled as Cas4 exonuclease and a phagesDB BLAST hit with a very low e-value of 1e-59 calls for the same function within phage VroomVroom's genome. CDS 22315 - 22620 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="ShakeItOph_29" /note=Original Glimmer call @bp 22315 has strength 15.7; Genemark calls start at 22315 /note=SSC: 22315-22620 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_29 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.34219E-59 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.264, -4.150999100097377, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_29 [Arthrobacter phage VroomVroom]],,WIC90179,95.0495,1.34219E-59 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: GeneMark and Glimmer agree on a start site at 22315 bp. 
/note=Coding Potential: There is coding potential predicted by Host-trained and self-trained GeneMark and the chosen start site covers all of the coding potential. /note=SD (Final) Score: -4.151, the best final score of four options on PECAAN. The Z-score is 2.264. /note=Gap/overlap: Gap is 9 bp, which is below the recommended 50 bp limit and therefore acceptable. /note=Phamerator: The gene is in pham 134201 as of 1/13/24. 1 of 8 non-draft genomes with this pham is in the same cluster AZ4, phage VroomVroom. No function called by the Phams database or Phamerator. /note=Starterator: Start site 1 in Starterator was manually annotated in 1/8 non-draft genes in this pham, but is called 100% of the time when present (4/4). Start 1 is 22315 bp in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The gathered evidence suggests that the gene is real and starts at 22315 bp. /note=Function call: NKF. Significant hits (e-value 120bps) taking into account both the Glimmer and GeneMark start site. /note=Phamerator: The pham number the gene is found in is 107560 (as of 1/16/24). Other members of the cluster AZ4 (which ShakeItOph belongs to) also belong to the same pham. Non-draft phages used for comparison include Emotion and VroomVroom. No function was called by the Phamerator/Pham Maps for the gene coded by the called start site. /note=Starterator: There is a reasonable, chosen start site (at site 1) conserved in other members of the pham the target gene belongs to. The start number for the site in the phage ShakeItOph is 22997 (the same site as the one called by Glimmer and GeneMark). There were 2 non-draft phages which called this start site and these phages were the only non-draft phages within the total five members of the pham. 
The Starterator report is informative, but may not be as relevant to the assessment of the start site due to the lack of data associated with the pham. /note=Location call: The target gene is real. It is associated with a called start site, doesn’t switch in its coding orientation, has an adequate length of 320 bp (for coding a protein), and reasonable coding potential at or near the start site. The gene is also conserved in Phamerator to an extent (although more data may be needed here). The best possible start site for the gene would be the 22997 start site. This site is called by both auto-annotation programs and covers all of the coding potential near the site. Also, it can be noted that the site is conserved and called in at least two non-draft phages (with 2 MAs). /note=Function call: PhagesDB BLASTp shows a minimum of two significant hits with reasonable E-values (<7e-44) and percentage identity values (>78%). Both of these hits call the protein coded by the gene as having an “unknown function”. NCBI BLASTp also shows at least two significant hits with E-values<1e-51 and high query coverage values (>99%). These hits have similar percentage identity values to the PhagesDB BLASTp hits (>78.3%). The hits call the function to be a “hypothetical protein”. CDD showed no significant hits. HHPRED shows at least two significant hits but the functions called for the protein are not listed on the SEA-PHAGES approved functions list (“DNA-packaging protein”) (as of 1/16/24). /note=Transmembrane domains: TMRs were not predicted in DeepTMHMM. 
/note=Secondary Annotator Name: Vazquez, Eunice /note=Secondary Annotator QC: /note=I agree with primary annotator CDS 23314 - 23712 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="ShakeItOph_32" /note=Original Glimmer call @bp 23314 has strength 15.89; Genemark calls start at 23314 /note=SSC: 23314-23712 CP: yes SCS: both ST: SS BLAST-Start: [site-specific recombination directionality factor RDF [Arthrobacter phage Mudcat] ],,NCBI, q2:s3 98.4848% 1.46711E-37 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.905, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[site-specific recombination directionality factor RDF [Arthrobacter phage Mudcat] ],,YP_009300750,66.4234,1.46711E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and Genemark both call the start site at 23314. /note=Coding Potential: The ORF has good coding potential on the forward strand, indicating that it is a forward gene. Coding potential is found on both GeneMark Self and Host. /note=SD (Final) Score: -2.845. This is the best and only final score on PECAAN. /note=Gap/overlap: Overlap: 4. This is a reasonable size for an overlap. /note=Phamerator: Pham: 130432. Date: 1/14/24. It is conserved. Found in VroomVroom (AZ4) and Maureen (AZ2). /note=Starterator: Start site 126 was manually annotated in 211/342 non-draft genes in this pham. Start site 126 is 23314 and agrees with the site predicted by Glimmer and Genemark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 23314. /note=Function call: NKF. All the phagesdb BLASTp hits were NKF. NCBI had a few hits for recombination directionality factor (E-value: 1.46711e-37, Coverage: 98.5%, Identity: 48.2%). HHpred had no significant hits. CDD had no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Aguirre, Austin Leon /note=Secondary Annotator QC: Additional information to include: Mention that this start site has a very good Z-score of 2.905. Could also mention that this is the only called start site present. Another piece of evidence to note for this start is that it’s the LORF. Also mention that the overlap of 4 suggests that this is part of an operon, which is great evidence for this being a real start. Should also describe how many different genes are in the pham and what clusters they belong to; is function conserved? What cluster is this gene in ShakeItOph? Again, what date was Starterator called? Synteny box should be filled out; what genes are upstream and downstream in phages with synteny? Great work. CDS 23733 - 24083 /gene="33" /product="gp33" /function="nucleoside deoxyribosyltransferase" /locus tag="ShakeItOph_33" /note=Original Glimmer call @bp 23733 has strength 12.27; Genemark calls start at 23709 /note=SSC: 23733-24083 CP: yes SCS: both-gl ST: SS BLAST-Start: [nucleoside deoxyribosyltransferase [Arthrobacter phage VroomVroom]],,NCBI, q1:s9 100.0% 7.94235E-66 GAP: 20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.763, -5.092497832398658, no F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside deoxyribosyltransferase [Arthrobacter phage VroomVroom]],,WIC90183,85.4839,7.94235E-66 SIF-HHPRED: SIF-Syn: Upstream gene for VroomVroom is NKF just like ShakeItOph and downstream gene is LAGLIDADG endonuclease just like ShakeItOph /note=Primary Annotator Name: Vazquez, Eunice /note=Auto-annotation: Glimmer calls the start site at 23733 bp with a start codon of ATG and Genemark calls the start site at 23709 bp with the start codon GTG. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating this is a forward gene. The coding potential is covered by the start site. 
/note=SD (Final) Score: -5.092 is the final score that corresponds to the start site called by Glimmer, but it is not the best final score. The corresponding z score is 1.763. /note=Gap/overlap: The gap is 20 bp. This is a reasonable gap considering it is not 50 bp or more. /note=Phamerator: As of 1/22/24 the pham number is 135523. Other phages in this pham belong to the AZ4 cluster (VroomVroom and Emotion) and their function is nucleoside deoxyribosyltransferase. /note=Starterator: Start site 33 in Starterator. There is 1 manual annotation for this start site. Start site 33 is 23733 called by Glimmer. /note=Location call: With all the evidence above this gene is a real gene and has a start site at 23733 bp. Starterator agrees with Glimmer. /note=Function call: nucleoside deoxyribosyltransferase. The top two phagesDB Blastp hits with a known function were nucleoside deoxyribosyltransferase (E-values: 2e-58 and 4e-40) from phages VroomVroom and Emotion. HHpred had no good hits. NCBI blast had one good hit with phage VroomVroom, with a low e value, high percent identity, and high coverage. /note=Transmembrane domains: There are no TMRs noted by DeepTMHMM. /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: I agree with the function call, but I do not agree with the start site. The selected start site does not have the best Z or final score. The best scores belong to start 23709, which is called by GeneMark; it captures all coding potential, has a small overlap, has 2 MAs in Starterator (same as the selected call), and is called by Emotion and VroomVroom. I think this would be the better start. Also, if the primary author assigns a function, they should also fill out the synteny box. 
CDS 24080 - 24496 /gene="34" /product="gp34" /function="LAGLIDADG endonuclease" /locus tag="ShakeItOph_34" /note=Original Glimmer call @bp 24080 has strength 7.9; Genemark calls start at 24080 /note=SSC: 24080-24496 CP: yes SCS: both ST: SS BLAST-Start: [LAGLIDADG endonuclease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.73394E-91 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.551, -4.3934175870462076, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG endonuclease [Arthrobacter phage VroomVroom]],,WIC90184,97.1014,3.73394E-91 SIF-HHPRED: DNA endonuclease I-CreI; protein, DNA, HYDROLASE-DNA COMPLEX; 1.6A {Chlamydomonas reinhardtii} SCOP: d.95.2.1,,,1T9I_B,73.1884,99.5 SIF-Syn: Contains synteny with phage VroomVroom, upstream is nucleoside deoxyribosyltransferase, downstream is ThyX-like thymidylate synthase. /note=Primary Annotator Name: Aguirre, Austin /note=Auto-annotation: Glimmer and GeneMark both called the start site at 24080. This start site contains the longest ORF. Called start codon was ATG. /note=Coding Potential: GeneMark Host and Self both show strong coverage with the called start codon in a single frame. No other frames had strong coding potential. /note=SD (Final) Score: Called start site had least negative final score (-4.393) and a significant Z score over 2 (2.551). /note=Gap/overlap: There was a gap of -4 (a 4 bp overlap), suggesting that this gene is part of an operon. /note=Phamerator: On 01/19/24, there were 79 genes in this Pham group 129258. Most pham members belonged in cluster AZ, some EH, and one FP. This gene is in cluster AZ4. /note=Starterator: On 01/19/24, the Phamerator showed that the called start site was in group 17. This start site was found in 5 of 78 of the genes in this pham, and it was called 100% of the time when present. There were 2 manual annotations. /note=Location call: Based on the above evidence, this is a real gene. 
The start site 24080 has by far the most supporting evidence, given that it has an outstanding Z score and final score and a -4 gap. This start site also covers all of the coding potential for the gene. /note=Function call: PhagesDB BLAST shows three hits to a LAGLIDADG endonuclease from VroomVroom, Emotion, and Liebe, each with very strong e-values. HHPRED shows a hit to a DNA endonuclease, also with a strong e-value. Two more strong hits to LAGLIDADG endonuclease were observed in NCBI BLAST. /note=Transmembrane domains: No TMDs observed. /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: I agree with the primary annotator about the location and function of this gene. There does also seem to be synteny. I would just add in the information about CDD since there does seem to be a relevant hit suggesting the same function. CDS 24496 - 25200 /gene="35" /product="gp35" /function="ThyX-like thymidylate synthase" /locus tag="ShakeItOph_35" /note=Original Glimmer call @bp 24496 has strength 15.67; Genemark calls start at 24496 /note=SSC: 24496-25200 CP: yes SCS: both ST: SS BLAST-Start: [ThyX-like thymidylate synthase [Arthrobacter phage VroomVroom]],,NCBI, q2:s3 99.5726% 2.84776E-165 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.659, -4.174392870950849, no F: ThyX-like thymidylate synthase SIF-BLAST: ,,[ThyX-like thymidylate synthase [Arthrobacter phage VroomVroom]],,WIC90185,98.2979,2.84776E-165 SIF-HHPRED: Thymidylate synthase ThyX; Tetramer, UMP/dUMP methylase, ThyX homolog, TRANSFERASE; HET: FAD, 5BU; 1.76A {Streptomyces cacaoi subsp. asoensis} SCOP: d.207.1.0,,,4P5A_D,97.8633,100.0 SIF-Syn: There is synteny with phage VroomVroom. Upstream is a LAGLIDADG endonuclease, and downstream is a recombination directionality factor. /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and Genemark both call start at 24496. 
/note=Coding Potential: There is significant coding potential that is captured by the start at 24496 on both the host and self-trained reports in the forward direction. There is a small peak right next to the start (upstream), but the majority of the potential falls within the start and the stop. There is no coding potential in the reverse direction. /note=SD (Final) Score: The final score for this start is -4.174 (not the best score) and the Z score is 2.659. There are two better final scores (-3.873 @24976 and -3.867 @25159); however, these starts would create much larger gaps. /note=Gap/overlap: This start has an overlap of one base pair, which may suggest that it is part of an operon and does not raise any suspicion. /note=Phamerator: 1/16/2024 Pham 122432 with 413 members. This gene is found in another phage of the same cluster, VroomVroom. It is called a ThyX-like thymidylate synthase. /note=Starterator: ShakeItOph calls the most annotated start 69 @ 24496. It is called by 244 non-draft genes, such as VroomVroom. /note=Location call: Based on the evidence above, the best start site to call is 24496. /note=Function call: This gene has multiple hits on CDD for thymidylate synthase, two of which are ThyX-like. In the NCBI and Phagesdb protein blasts, there were hits for ThyX-like thymidylate synthase in phages with significant values. Interestingly enough, some were Gordonia phages, but there was also phage VroomVroom (3e-165) from the same cluster. There were also significant hits on HHpred for ThyX-like thymidylate synthase. It is with this evidence that I call this gene a ThyX-like thymidylate synthase. /note=Transmembrane domains: This gene does not have any transmembrane domains. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: I agree with the annotations above. 
CDS 25346 - 26047 /gene="36" /product="gp36" /function="recombination directionality factor" /locus tag="ShakeItOph_36" /note=Original Glimmer call @bp 25346 has strength 18.56; Genemark calls start at 25346 /note=SSC: 25346-26047 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.38009E-169 GAP: 145 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.561, -4.374347681057712, yes F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage VroomVroom]],,WIC90186,99.5708,5.38009E-169 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.4,87.5536,100.0 SIF-Syn: There is synteny with phage VroomVroom, specifically with a synthase protein upstream of the same pham, and another protein downstream with identical phams as well. /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Both Glimmer and GeneMark call this gene at 25346; this is likely the start site. /note=Coding Potential: On both GeneMark self and host there is ample coding potential in the forward strand and there isn't really much potential on the reverse strand. /note=SD (Final) Score: The best final score (-4.374) and the best z-score (2.561) are associated with this start site. /note=Gap/overlap: The gap is 145 bp, which is a pretty significant gap. There could maybe be a gene before this one. /note=Phamerator: As of 1/18/2024, this gene is in pham 848, with over 120 final phages. /note=Starterator: The most manually annotated start site is 39, and this gene calls it. This start site is also called in a significant number of other non-draft genes. This start number is also associated with the one called by Glimmer and GeneMark. /note=Location call: Given the information from Starterator, the coding capacity from GeneMark, and the data on PECAAN, the most likely start site is 39 at 25346 bp. 
/note=Function call: PhagesDB comes up with significant hits suggesting a recombination directionality factor. HHpred only has one significant result, but it does suggest a recombination directionality factor. NCBIblast also has a couple of strong suggestions for a recombination directionality factor. CDD has no hits. Based off this evidence, this is likely a recombination directionality factor. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane regions, so this is likely not a membrane protein. /note=Secondary Annotator Name: Wang, Jordan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function. CDS 26049 - 26249 /gene="37" /product="gp37" /function="membrane protein" /locus tag="ShakeItOph_37" /note=Original Glimmer call @bp 26049 has strength 17.96; Genemark calls start at 26049 /note=SSC: 26049-26249 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 7.61941E-13 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.145, -4.9668027763900575, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VroomVroom]],,WIC90187,84.8485,7.61941E-13 SIF-HHPRED: SIF-Syn: There is synteny with JasmineDragon and VroomVroom on pham 38. There was no designated function on JasmineDragon or VroomVroom. /note=Primary Annotator Name: Yao, Alice /note=Auto-annotation: Both Glimmer and GeneMark have start sites at 26049. The starting codon is GTG. /note=Coding Potential: There is a high coding potential in this ORF in the forward strand. Coding potential is found in both Genemark Self and Host. /note=SD (Final) Score: -4.967. It is the best final score on PECAAN. /note=Gap/overlap: Gap of 1bp. Could be an indication of an operon. /note=Phamerator: pham: 128414. Date 01/13/24. It is conserved; found in JasmineDragon (AZ4) and MiniMommy (AZ4). /note=Starterator: Start site 7 in Starterator was found in 3 out of 7 genes in this pham. 
None were manually annotated. Start 7 is 26049 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26049. /note=Function call: Membrane protein. The top two PhageDB BLASTp results show unknown function with e-values of 3*10^-18 and 5*10^-7 respectively. NCBI BLASTp has one hit with e-value of 8*10^-13 and a percentage identity of 72.73% to a membrane protein. Because DeepTMHMM predicts two TMD and PhageDB BLASTp predicted unknown functions, it points to the possibility of a membrane protein according to the SEA Official list function. NCBI BLASTp supports it being a membrane protein too. /note=Transmembrane domains: DeepTMHMM predicts two TMD. Based on this evidence this gene can be assumed to have two real TMD and is therefore a membrane protein. /note=Secondary Annotator Name: RICHARD, KETAN LEONARD /note=Secondary Annotator QC: I agree that this is a real gene and a membrane protein based on the transmembrane domains. CDS 26246 - 26371 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="ShakeItOph_38" /note=Original Glimmer call @bp 26246 has strength 12.12 /note=SSC: 26246-26371 CP: yes SCS: glimmer ST: NI BLAST-Start: [hypothetical protein SEA_VROOMVROOM_38 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 8.46373E-17 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.236, -6.307665910037288, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_38 [Arthrobacter phage VroomVroom]],,WIC90188,92.6829,8.46373E-17 SIF-HHPRED: SIF-Syn: VroomVroom, Emotion /note=Primary Annotator Name: WANG, JORDAN /note=Auto-annotation: Glimmer calls the start at 26246, however Genemark does not call a start. /note=Coding Potential: There is good coding potential shown between 26246 (potential start) and 26371 (stop). 
Note that there is another open reading frame around 26150, however there is no coding potential to account for the earlier ORF. /note=SD (Final) Score: For start site 26246, Final Score is -6.308 and it is the best start site, however Z-score is 1.236. /note=Gap/overlap: 4 bp overlap, indicates possible operon. /note=Phamerator: Pham 48566. Date: Jan 21, 2024. Found across 5 phages (3 drafts). /note=Starterator: Start site 5 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 5 is 26246 in ShakeItOph. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is likely a real gene and the most likely start site is 26246. /note=Function call: Unknown. All phagesdb BLAST and NCBI BLAST hits are of unknown function. HHpred hits have high e-values. CDD has no available data. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Zamora, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All evidence has been considered. 
CDS 26463 - 26750 /gene="39" /product="gp39" /function="NrdH-like glutaredoxin" /locus tag="ShakeItOph_39" /note=Original Glimmer call @bp 26463 has strength 16.7; Genemark calls start at 26463 /note=SSC: 26463-26750 CP: yes SCS: both ST: SS BLAST-Start: [NrdH-like glutaredoxin [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 97.8947% 2.66047E-53 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.583959800616441, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[NrdH-like glutaredoxin [Arthrobacter phage VroomVroom]],,WIC90189,80.1802,2.66047E-53 SIF-HHPRED: GLUTAREDOXIN-LIKE PROTEIN NRDH; ELECTRON TRANSPORT, NRDH, THIOREDOXIN, GLUTAREDOXIN, REDOX PROTEIN; 1.7A {ESCHERICHIA COLI} SCOP: c.47.1.1,,,1H75_A,83.1579,99.2 SIF-Syn: The upstream gene has no known function and the downstream gene is a Holliday junction resolvase, just like phage VroomVroom /note=Primary Annotator Name: Zamora, Alexandra /note=Auto-annotation: This gene is auto-annotated by both Glimmer and GeneMark. The agreed auto-annotated start site is at 26463 bp. The start codon for this start site is ATG. /note=Coding Potential: There is good coding potential on the Host-trained GeneMark. The auto-annotated start site covers all this coding potential; the coding potential is on the forward strand only. There is good coding potential within GeneMark self. The auto-annotated start site covers all this coding potential; the coding potential is on the forward strand only. /note=SD (Final) Score: The final score for the likely start site is -2.584 and the z-score is 3.063. The final score and the z-score of this start site are the best. /note=Gap/overlap: Gap: 91bp. The gap with the upstream gene is fairly large. However, there is no coding potential within this gap to indicate that there should be an addition of a gene. There are no alternative start sites that will reduce this gap size. The likely start site contains the longest ORF of 288bp; this is an acceptable gene length. 
/note=Phamerator: As of 1/16/24, the gene is found in pham number 133401. The pham is conserved among other members of the AK2 cluster; Phages MiniMommy and VroomVroom contain genes found in this pham. /note=Starterator: Pham number 133401 contains 819 members, of which 78 are drafts. The most annotated start number for this pham is 101. This conserved start number corresponds to start site 26463 of ShakeItOph gene 41. /note=Location call: Based on the good coding potential of both the host-trained and self GeneMark and the information found in the phamerator/starterator report, this is a real gene with likely start site at 26463bp. /note=Function call: The top two phagesdb blast hits have a function of NrdH-like glutaredoxin with e-values less than 1e-30. CDD had one relevant hit of a protein within the NrdH-redoxin family; the e-value is 2.29e-11. The top two NCBI BLAST hits have a NrdH-like glutaredoxin function (coverage: 97% and 98%; e-values: 3e-53 and 1e-38). The next two NCBI BLAST hits were proteins found in the glutaredoxin family (coverage: 81%; e-values: 4e-13 and 1e-12). Two of the top HHpred hits were NrdH-like glutaredoxin proteins with probability greater than 99% and e-values less than 1e-8. Based on this information, the function of this gene is an NrdH-like glutaredoxin protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bhattarai, Aryan /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator. This is a real gene and the location call is correct. The functional call is also correct. There is sufficient evidence to determine that the function of this gene is a NrdH-like glutaredoxin. Note: during the time of my QC (1/22/24), the Pham number changed, as well as the gap of this gene. Note: Primary author needs to fill out the synteny box. 
/note= /note=Below are my notes: /note=Auto-annotation: Glimmer and GeneMark both call the start at 26463. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -2.584. It is the best final score on PECAAN. /note=Gap/overlap: 91 bp, which is not too large and contains no coding potential. There are also no alternative start sites that will reduce this gap size. /note=Phamerator: pham: 135195. Date 1/22/2024. 707 members exist in this pham, and the pham is conserved among members from various clusters (EK, A, V, AZ, N, K). /note=Starterator: There are 707 total members, 77 are drafts. Start number 93 is found in 15 of 707 genes in pham. There are 10 of 625 manual annotations of this start. It was called 93.3% of the time when present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26463. /note=Function call: NrdH-like glutaredoxin. This gene has strong synteny with non-draft phage VroomVroom (Gene 39), which is known to be a NrdH-like glutaredoxin. All of the programs have significant hits suggesting this is the function of this gene. The top 2 NCBI BLAST hits have a NrdH-like glutaredoxin function (coverage: 97-98%; e-values < 1e-37). Two NCBI BLAST hits were proteins found in the glutaredoxin family (coverage: 80%+; e-values < 1e-11). Two HHpred hits were NrdH-like glutaredoxin proteins with probability > 99% & e-values < 1e-7. CDD said it was part of the NrdH-redoxin (NrdH) family (percent coverage: 82%, percent identity: 38%, e value: 2.03728e-11). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 26737 - 27180 /gene="40" /product="gp40" /function="Holliday junction resolvase" /locus tag="ShakeItOph_40" /note=Original Glimmer call @bp 26737 has strength 13.24; Genemark calls start at 26737 /note=SSC: 26737-27180 CP: yes SCS: both ST: NI BLAST-Start: [holliday junction resolvase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.50367E-99 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.165, -6.374982888595457, no F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage VroomVroom]],,WIC90190,97.973,1.50367E-99 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: SO4, MSE; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_A,74.1497,99.6 SIF-Syn: Holliday junction resolvase, upstream gene is NrdH-like glutaredoxin, just like in phage VroomVroom. Downstream gene is NKF belonging to an orpham. /note=Primary Annotator Name: Soan, Jessica /note=Auto-annotation: Both Glimmer and Genemark call the start site at 26737. /note=Coding Potential: Significant coding potential in this ORF is only on the forward strand, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -6.375; this is the 4th best RBS score; there were better RBS scores but those caused there to be large gaps upstream of the start site where there is strong coding potential. /note=Gap/overlap: -14bp. This is a pretty large overlap and pretty unusual. However, all alternate start sites created large gaps where there was strong coding potential. /note=Phamerator: Pham 135496. Date 1/19/24. It is conserved; found in VroomVroom (AZ4). The function of this pham is Holliday Junction resolvase. It is on the approved SEA-PHAGES list. /note=Starterator: No Starterator data was found for this gene. 
/note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 26737bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Holliday junction resolvase. The top two phagesdb BLAST hits have a Holliday junction resolvase function call: VroomVroom (AZ4) (e = 1e-78) and Emotion (AZ4) (3e-56). The top NCBI BLAST hit also has a function call of Holliday junction resolvase (100% coverage, 95.2% identity with VroomVroom, and E-value 1.5e-99). HHpred has significant hits with high probability (>99%) and low E values to support the function call. Also, a BLASTp comparison with VroomVroom and Emotion, both of which have a similar protein sequence, shows that the gene is conserved and has been annotated as a Holliday junction resolvase. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kang, Alix /note=Secondary Annotator QC: I agree with this location and function call. All the evidence categories have been considered. I would check off “All GM Coding Capacity” to YES. For Phamerator: I would write the following: The pham number as of January 19, 2024 is 135496. The gene is conserved in non-draft phage VroomVroom in the same cluster as ShakeItOph. The function call for the gene is Holliday junction resolvase. It is on the approved SEA-PHAGES list. 
CDS complement (27177 - 27365) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="ShakeItOph_41" /note=Original Glimmer call @bp 27365 has strength 9.88; Genemark calls start at 27371 /note=SSC: 27365-27177 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_41 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 2.02746E-33 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.156, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_41 [Arthrobacter phage VroomVroom]],,WIC90191,98.3871,2.02746E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer calls the start site at 27365 and GeneMark calls the start at 27371. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.253. It is the best final score on PECAAN. This corresponds to the start site of 27365. /note=Gap/overlap: 124 bp, which is very large, however, this is on the reverse frame. There needs to be ~50 bp or more anytime there’s a switch between the reverse to the forward direction, to make space for promoters! Also, there is no coding potential indicating that there might be a new gene. Gap is overall reasonable because it’s conserved in non-draft phages, VroomVroom and Emotion, both part of the same subcluster and pham. /note=Phamerator: pham: 135598. Date 1/19/2024. 49 members exist in this pham, all of them are in the Subcluster AZ. /note=Starterator: As of 1/19/2024 there is no starterator report, it cannot be used. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 27365. /note=Function call: No known function. PhagesDB produced one significant hit that had an unknown function with an e-value of 4e-29. 
NCBI BLAST produced one significant hit with a hypothetical protein (95-98% coverage, 100% identity, E-value <2.046e-33). CDD and HHpred had no relevant hits. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Arnav Saud /note=Secondary Annotator QC: /note=Primary annotator still needs to fill out a synteny report for this gene. Annotator also needs to check off if starterator can be used or not. This is a real gene and the location call seems correct based on all available evidence. The function call is correct too. CDS 27490 - 30000 /gene="42" /product="gp42" /function="DNA primase/helicase" /locus tag="ShakeItOph_42" /note=Original Glimmer call @bp 27490 has strength 17.0; Genemark calls start at 27490 /note=SSC: 27490-30000 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.379, -3.9166971242758706, no F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage VroomVroom]],,WIC90192,96.747,0.0 SIF-HHPRED: Putative primase C962R; polymerase, primase, PrimPol, Helicase, DNA BINDING PROTEIN; HET: ANP;{African swine fever virus BA71V},,,8IQI_C,55.3828,100.0 SIF-Syn: DNA primase/helicase, upstream gene is NKF, downstream is NKF. /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 27490. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.917. It is the second-best final score on PECAAN. /note=Gap/overlap: Gap: 124bp. Somewhat large, but ultimately reasonable because the gap is conserved in other non-draft phages (VroomVroom, Emotion) and there is no coding potential in the gap that might be a new gene. 
/note=Phamerator: The pham number as of January 14, 2024 is 85082. The gene is conserved in phages VroomVroom and Emotion, both in the same cluster as ShakeItOph. The function call for the gene is DNA primase/helicase. It is on the approved SEA-PHAGES list. /note=Starterator: Start site 44 was manually annotated in 58/119 non-draft genes in this pham. Start 44 is 27490 in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 27490. /note=Function call: DNA primase/helicase. Multiple phagesdb BLAST hits have the suggested function DNA primase/helicase, with E-values of 0.0. The top 3 NCBI BLAST hits also have the function of DNA primase/helicase (100% coverage, 74.88%+ identity, E-values of 0.0). HHpred has hits that correspond to unique SEA-PHAGES requirements for this gene. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation and chosen start site. Make sure to fill out your synteny box. CDS 30010 - 30129 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="ShakeItOph_43" /note=Original Glimmer call @bp 30010 has strength 11.21; Genemark calls start at 30010 /note=SSC: 30010-30129 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_43 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 97.4359% 4.99449E-16 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.139, -4.325675514574479, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_43 [Arthrobacter phage VroomVroom]],,WIC90193,89.7436,4.99449E-16 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and Genemark both call the start site to be located at 30010. 
/note=Coding Potential: There is high coding potential on both GenemarkSelf and GenemarkHost. Coding potential in this ORF is on the forward strand only. /note=SD (Final) Score: -4.326. This is the best final score called by PECAAN. /note=Gap/overlap: /note=There is a gap of 9 bps between our gene of interest’s start site and the previous gene’s stop site. /note=Phamerator: pham: 89866. Date: 1/13/24. The gene is conserved. It is found in Emotion (AZ) and VroomVroom (AZ). /note=Starterator: Date: 1/13/24. Start site 2 was manually annotated in 5/5 non draft genes in this pham. Start site 2 is present in ShakeItOph. Start site 2 was also the most annotated start site. There is sufficient evidence to use Starterator in order to determine the start site of this gene. /note=Location call: The gene has a length of 120 bps. This is a sufficient length to make it a possible gene. Based on the evidence above, this is most likely a real gene. /note=Function call: NKF. The top 2 phagesdb BLAST results reported No Known Function (<6e-08). The only 2 NCBI BLAST hits that appeared had the reported function of hypothetical protein (>60.53% Identity, >70% Aligned, >97.3459% Coverage, e-value: <1e-6). HHpred had no statistically significant hits. CDD had no significant hits. /note=Based on the evidence above, this gene has NKF. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Akkinepally, Mrudula /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: The overlap might be 10 bp and not 9. Also, there is no need to fill out the synteny box if NKF. 
CDS 30221 - 30355 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="ShakeItOph_44" /note=Original Glimmer call @bp 30221 has strength 10.23 /note=SSC: 30221-30355 CP: no SCS: glimmer ST: NA BLAST-Start: [hypothetical protein SEA_VROOMVROOM_44 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.40137E-19 GAP: 91 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.905, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_44 [Arthrobacter phage VroomVroom]],,WIC90194,95.4545,4.40137E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Glimmer calls the start site at 30221. GeneMark does not call the start site. /note=Coding Potential: There is no good coding potential between the called start site and stop site, and no ORF that fits the STOP and potential START sites. It is possible that this gene is not real. Another gene may need to be added instead. /note=SD (Final) Score: The final score is -2.845, and this is the best final score reported. /note=Gap/overlap: 91 bps. There is a gap of 91 bps with the upstream gene. /note=Phamerator: Pham 88080. /note=Starterator: The start site found at start number 10 was manually annotated for ShakeItOph and called in 7 of the 7 non-draft genes in this pham. /note=Location call: N/A (Not a real gene) /note=Function call: I don’t believe this gene to be real. /note=Transmembrane domains: DeepTMHMM predicted no TMDs, so this gene does not code for a membrane protein. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree that this is most likely not a real gene. All of the reasons above detail why thoroughly, but I would also add that it is less than 140 bp long. I would maybe add a gene in the reverse direction since there appears to be coding potential. 
CDS 30557 - 32419 /gene="45" /product="gp45" /function="DNA polymerase I" /locus tag="ShakeItOph_45" /note=Original Glimmer call @bp 30557 has strength 19.79; Genemark calls start at 30557 /note=SSC: 30557-32419 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 201 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.303, -2.4811409279978642, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage VroomVroom]],,WIC90195,99.6774,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_B,96.7742,100.0 SIF-Syn: DNA Polymerase I, upstream gene is DNA Primase/Helicase, downstream is Binding Protein, just like in phage Emotion (AZ4) /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer and Genemark both call the same start site: 30557 /note=Coding Potential: Host trained Genemark and Self trained Genemark had really good coding potential for this area on the forward strand with an open reading frame that covers the entire length of this gene. /note=SD (Final) Score: -2.481 (most positive value) /note=Gap/overlap: There is a 201 bp gap between the previous gene and this gene. There is a little coding potential on both the forward and reverse strands between these genes and an open reading frame on self trained genemark. /note=Phamerator: Pham 133396. 1/15/2024. It is conserved, found in VroomVroom (AZ4) and Emotion (AZ4). /note=Starterator: Calls the most annotated start site. (Start: 230 @30557 has 860 MA`s). Found in 942 of 1707 ( 55.2% ) of genes in pham, Manual Annotations of this start: 860 of 1546, Called 98.5% of time when present. /note=Location call: Based on this evidence, this is a real gene and the most likely start site is 30557. /note=Function call: DNA Polymerase I. 
The top three phagesdb BLAST hits have the function of DNA Polymerase I (E-value 0), and 5 out of 5 top NCBI BLAST hits also have the function of DNA Polymerase I. (100% coverage, 75%+ identity, and E-value 0). HHpred had a hit for DNA Polymerase I with 100% probability, and E-value of 3.9e-73. Best hit on CDD was for DNA Polymerase Family A (accession number: pfam00476). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The gap notes need to be double checked (eg. gap is listed as 202bp but is 201bp). CDS 32416 - 32613 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="ShakeItOph_46" /note=Original Glimmer call @bp 32434 has strength 9.53; Genemark calls start at 32434 /note=SSC: 32416-32613 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_47 [Arthrobacter phage Emotion]],,NCBI, q3:s2 89.2308% 1.69041E-28 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.987, -4.717114992838803, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_47 [Arthrobacter phage Emotion]],,WGH21396,74.6479,1.69041E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 32434, but the start site was manually changed to 32416 due to a better Z-score (1.987) and FS (-4.717) (32434 has a Z-score of 1.487 and an FS of -6.307), better gap (-4 for 32416 and 14 for 32434), and it being the most annotated start site on Starterator. /note=Coding Potential: Coding potential in this ORF (start site 32416) is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -4.717 and the z score is the highest at 1.987. 
/note=Gap/overlap: The gap for start site 32416 is -4, which is a favorable gap, likely indicating it is part of an operon. The gap for start site 32434 is 14, which is less favorable. /note=Phamerator: pham: 85757. Date 01/19/2024. It is conserved; found in Emotion (AZ), Iter (AZ), and Adolin (AZ). /note=Starterator: Start number 4 corresponds to 32416 in ShakeItOph. It was manually annotated as the start site at the highest frequency (23 times). This start site does not agree with the site predicted by Glimmer and GeneMark (start 7 in Starterator), but is the most conserved start site. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 32416. /note=Function call: All hits in PhagesDB BLASTp returned with function unknown. All but one hit in NCBI BLASTp returned with hypothetical protein. CDD showed no hits and HHPRED showed no significant hits (all e-values were 42 or above). A hit in NCBI BLASTp did return as a DNA helicase with an e-value of 8e-20, 70.1% identity, and 81.5% coverage, but there were no other hits similar to this one in any program. As such, the function of this protein cannot be known at this time. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: Based on the evidence provided above, this is likely a real gene with a start site of 32416 and no known function. 
CDS 32601 - 32918 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="ShakeItOph_47" /note=Original Glimmer call @bp 32601 has strength 11.56; Genemark calls start at 32601 /note=SSC: 32601-32918 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE15_gp41 [Arthrobacter phage KeAlii] ],,NCBI, q1:s1 97.1429% 7.52521E-16 GAP: -13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE15_gp41 [Arthrobacter phage KeAlii] ],,YP_010678158,59.4059,7.52521E-16 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Both Glimmer and GeneMark call the start 32601. /note=Coding Potential: Strong coding potential for this ORF is in the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.584. This is the best final score on PECAAN. /note=Gap/overlap: Overlap: 13bp. While the overlap is too large to predict the presence of an operon, it is not large and therefore, is still reasonable. /note=Phamerator: Pham: 965. Date: 1/22/24. The pham is conserved. It is found in two non-draft AZ4 phages: VroomVroom (e-value 3e-47) and Emotion (e-value 7e-30). /note=Starterator: Start site 36 was present in 103/108 non-draft phages in pham 965. Start 36 is 32601 which agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: This is a real gene based on the above evidence. The most likely start site is at 32601. /note=Function call: NKF. CDD and HHpred returned no strong hits. The top two phagesDB blastp hits were VroomVroom (e-value 3e-47) and Emotion (e-value 7e-30) with no known function. The top NCBI blastp hits list no known function (identity 36% and greater, at least 87% coverage, e-values less than e-14). /note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains, so it is not a transmembrane protein. 
/note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I agree with this annotation and function call. CDS 32967 - 33116 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="ShakeItOph_48" /note=Original Glimmer call @bp 32967 has strength 0.4 /note=SSC: 32967-33116 CP: yes SCS: glimmer ST: SS BLAST-Start: GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -3.9067510144203146, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation: Glimmer calls the start site at 32967. GeneMark did not call a site. /note=Coding Potential: There is coding potential found on the forward strand in both Host-Trained GeneMark and GeneMark Self. /note=SD (Final) Score: The final score is the best option, at -3.907. The z score for this start site is the highest at 2.985. /note=Gap/overlap: There is a gap of 48 bp. /note=Phamerator: Pham 100344 as of 1/17/24. There are 3 draft members of this pham. /note=Starterator: Uninformative (due to the other two pham members being drafts). /note=Location call: Based on this evidence, this is likely a real gene with a start site of 32967. /note=Function call: PhagesDB Blastp, CDD, HHpred, and NCBI Blast did not return any significant hits. /note=Transmembrane domains: There were not any TMDs predicted by DeepTMHMM. /note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: I agree with the location call with a start site at 32967 due to the final score and gap. I also agree with the function call due to no significant hits by PhagesDB, NCBI, HHpred and CDD and can also agree that this gene has no known function. 
CDS 33177 - 33989 /gene="49" /product="gp49" /function="DNA binding protein" /locus tag="ShakeItOph_49" /note=Original Glimmer call @bp 33177 has strength 19.7; Genemark calls start at 33132 /note=SSC: 33177-33989 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.22942E-167 GAP: 60 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.964, -2.996490344739583, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage VroomVroom]],,WIC90197,94.4444,3.22942E-167 SIF-HHPRED: SIF-Syn: DNA binding protein, upstream gene is NKF, downstream is DNA binding protein, in VroomVroom and Emotion this gene and the downstream are a single gene. /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: Glimmer calls 33177 and GeneMark calls 33132, both with start codon GTG. /note=Coding Potential: Both start sites cover all of the coding potential in the self and host trained GeneMarks. /note=SD (Final) Score: The final score for 33177 is -2.996 and for 33132 is -4.931. The final score for 33177 is good. /note=Gap/overlap: The gap for 33177 is 60 which is within the ~50 range. /note=Phamerator:1-19-24. Pham: 131891. It is conserved in Ascela (AZ) and Iter (AZ). /note=Starterator: The most annotated start is 39. It is included in 47 of 109 non-draft genomes. It is not present in ShakeItOph. ShakeItOph has start site 38 annotated which is present in 9 of 146 genes in the pham and called 100% of the time it is present. This evidence agrees with the start site predicted by Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 33177. /note=Function call: DNA binding protein. CDD has a hit for DNA-directed RNA polymerase specialized sigma subunit with e-value 6.96e-07 with 92.6% coverage. HHpred top 18 hits were for RNA polymerase sigma factors with probabilities over 99% and e-values over 6.7e-21. 
The top hits for Phagesdb were function unknown or DNA binding protein with e-values ranging from 3e-58 to 1e-133. NCBI blast top 3 hits were also DNA binding protein and the fourth is RNA polymerase sigma factor with over 99% coverage and e values from 3.23e-167 to 1.1e-48 and 3.15e-48 respectively. SEA-PHAGES does not accept RNA polymerase sigma factor as a function name so it is DNA binding protein. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs. This is most likely not a membrane protein. /note=Secondary Annotator Name: Sass, Arielle /note=Secondary Annotator QC: After reviewing the PECAAN notes and referenced external resources, I agree with the start site call of bp 33,177 and the function call of a DNA binding protein. CDS 34166 - 35137 /gene="50" /product="gp50" /function="DNA binding protein" /locus tag="ShakeItOph_50" /note=Original Glimmer call @bp 34166 has strength 14.52; Genemark calls start at 34166 /note=SSC: 34166-35137 CP: no SCS: both ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s11 99.6904% 0.0 GAP: 176 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.763, -5.092497832398658, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage VroomVroom]],,WIC90198,92.8144,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Both Glimmer and GeneMark agree with start site 34166 and start codon GTG /note=Coding Potential: The coding potential is only shown in the forward strand, not the reverse and has high activity /note=SD (Final) Score: The final score for the auto annotated start site is not the best since it is -5.092 with a z score of 1.763. The start site with the best score of -2.765 is start site 34799, although it does not necessarily have the best z score which is 2.905. Neither of those are the LORF however. The LORF has a start site of 34136, but a final score of -8.335 and a z score of 0.493. 
/note=Gap/overlap: There is a 10bp gap for the auto annotated start site of 34166 and the start site with the best score which is 34799. The LORF has a 14bp gap. All of these gaps are within reasonable range. /note=Phamerator: As of 1/19/2024, this gene belongs to pham #131891 along with 147 other members, 37 of which are drafts. /note=Starterator: As of 1/19/2024, the starterator report called start site 39. Found in 47 of the 109 non-draft genes in the pham /note=Location call: This is a tricky call since the SD score shows a lot of variability between start sites, however I believe that the auto annotated start site of 34166 is the correct start site due to it having good coding potential with no switching between forward and reverse. /note=Function call: Phagesdb displayed 100 hits with VroomVroom as evidence of a DNA binding protein as shown by the low e value and relatively high score. HHpred and CDD did not have any significant results that could be used for our function call. I believe this gene has a DNA binding protein function. /note=Transmembrane domains: DeepTMHMM displays an absence of transmembrane regions which suggests that this is probably not a transmembrane protein. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: I agree with the primary annotation; the low SD and z-score for the auto-annotated start site are alarming, yet the conserved nature of this gene within the pham suggests otherwise. I also agree with location and function call being DNA binding protein given the data provided. 
CDS 35204 - 35473 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="ShakeItOph_51" /note=Original Glimmer call @bp 35204 has strength 14.95; Genemark calls start at 35204 /note=SSC: 35204-35473 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_51 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 1.23029E-37 GAP: 66 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_51 [Arthrobacter phage Emotion]],,WGH21400,85.3933,1.23029E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: GeneMark and Glimmer agree on a start site at 35,204 bp. /note=Coding Potential: There is coding potential predicted by Host-trained and self-trained GeneMark and the chosen start site covers all of the coding potential. /note=SD (Final) Score: -2.034, the best final score of two options on PECAAN. The Z-score is 3.303. /note=Gap/overlap: Gap is 66 bp which is longer than ideal; however, there is no upstream coding potential in the gap. /note=Phamerator: The gene is in pham 80399 as of 1/13/24. There is only one non-draft gene in the pham, in the phage Emotion, which is in the same cluster AZ4. /note=Starterator: Start site 2 in Starterator was manually annotated in 1/1 non-draft genes in this pham, but found and called in 4 of 4 genes in the pham. Start 2 is 35204 bp in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The gathered evidence suggests that the gene is real and starts at 35,204 bp. /note=Function call: NKF. One significant hit from PhagesDB BLASTp (3e-33) and one significant hit from NCBI BLASTp (1e-37) but neither with suggested functions. /note=Transmembrane domains: No transmembrane proteins were predicted by DeepTMHMM. 
/note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: The chosen start site doesn't cover all coding potential on the Self-trained GeneMark; try to completely follow the template in the lab manual for the auto-annotation, Starterator, gap/overlap, and location call sections. More evidence for NCBI BLASTp possible. Maybe mention that there were no significant hits for HHPRED and CDD in the function call section just for reference. Other than that, agree with everything else. CDS 35594 - 36457 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="ShakeItOph_52" /note=Original Glimmer call @bp 35594 has strength 17.75; Genemark calls start at 35594 /note=SSC: 35594-36457 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_49 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 2.67275E-155 GAP: 120 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_49 [Arthrobacter phage VroomVroom]],,WIC90199,87.4564,2.67275E-155 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Both Glimmer and GeneMark are utilized, they both agree at start site #35594, with the start codon ATG. /note=Coding Potential: Within both the Host-Trained and Self-Trained coding potential graphs, the gene’s ORF contains substantial coding potential activity and the start site does cover all the coding potential. /note=SD (Final) Score: The SD score for the potential start site at #35594 is -2.601 which is the best out of all the potential start site candidates. /note=Gap/overlap: The gap for this gene is 120 base pairs which is an abnormally large value (enough to fit a gene, however there is absolutely no coding potential upstream). The length of this gene with start site at #35594 is 864 bp. The alternative start site candidates produce smaller gene lengths and larger gap values. 
/note=Phamerator: As of January 19th, 2024 this gene is found within pham #135878. The pham is conserved with other members within its subcluster AZ4, including phage VroomVroom’s gene 49. There was no conserved function in this pham. /note=Starterator: There is a reasonable start site that is highly conserved within the pham. The conserved start site is site number 6, which corresponds to base pair coordinate 35594 for ShakeItOph. There are approximately 19 members in this pham and 9/15 nondraft genes call site #6. /note=Location call: I believe this is a real gene at start site #35594 due to high coding potential within the putative ORF, a reasonable SD score and a long gene length. /note=Function call: All PhagesDB BLAST hits returned with no function assigned. HHpred results all possessed both poor e-values and poor %probability. Finally, NCBI hits returned with “hypothetical protein” assignment. Consequently, no function may be called. /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I noticed that there are other pham members and it is not an orpham. Try reloading the starterator page. I agree with the rest of the annotation and with the function call of NKF due to the lack of hits. 
CDS 36578 - 37183 /gene="53" /product="gp53" /function="SprT-like protease" /locus tag="ShakeItOph_53" /note=Original Glimmer call @bp 36578 has strength 14.27; Genemark calls start at 36578 /note=SSC: 36578-37183 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 6.12856E-132 GAP: 120 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.985, -2.6013996449736907, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage VroomVroom]],,WIC90200,97.0149,6.12856E-132 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: FLC, MLZ, ADP; 1.5A {Homo sapiens},,,6MDW_A,93.0348,99.8 SIF-Syn: SprT-like protease, upstream gene is DNA polymerase I and downstream gene is HNH endonuclease, like in phage Emotion. /note=Primary Annotator Name: Samudrala, Vaishnavi /note=Auto-annotation: Both Glimmer and GeneMark. Both call the start site at 36578 (codon: ATG). /note=Coding Potential: There is reasonable coding potential on one of the direct strands near the start site called by both Glimmer and GeneMark when viewed on Self or Host Trained GeneMark. The start site covers all of the coding potential on only the Self-Trained GeneMark and partially for the Host Trained GeneMark. /note=SD (Final) Score: The Glimmer and GeneMark called start site has a z-score of 2.985 and a final score of -2.601. The final and z-scores are the best when compared to the other start site candidates listed. /note=Gap/overlap: There is an unreasonable gap between the gene and another upstream gene (120 bps). Although the final score is much smaller for the 36509 start site, it has a more reasonable gap value of 51 bps and allows for a larger length gene of 675 bps. Since the final and z-score are irrelevant, the 36509 start site could be a better alternative. 
The 36509 start site would allow for an appropriate length of the gene to be able to code a protein adequately (675 bps >120 bps). This site would also completely cover the coding potential in both Host and Self-trained GeneMark as opposed to the auto-annotated start site. /note=Phamerator: The pham number the gene is found in is 1210 (as of 1/18/24). Other members of the cluster AZ4 (ShakeItOph’s cluster) are included as a part of this pham. The non-draft members the phage was compared to include Emotion and VroomVroom. No functions were called on Phamerator/PhamMaps for this gene in ShakeItOph. /note=Starterator: There is a reasonable start site conserved in the members of the pham associated with the gene at site 48. For ShakeItOph, the site corresponds to the 36578 bp site or the auto annotated site (has 1 MA). The start site was called in 44 of the 83 non-draft phages of the pham. /note=Location call: The gene associated with the auto-annotated start site is real. There are no changes in orientation of its coding (only coding potential on the direct strand), the gene is more than 120 bps long and can adequately code a protein, and there is reasonable coding potential in its location. Considering that the auto-annotated start site (36578) is conserved and called most often in Phamerator (by pham members), has 1 MA, and is called by both Glimmer and GeneMark, there is more evidence towards it being the best start site candidate. The only benefits of the alternative start site proposed above would be a more reasonable gap value (but there may be another gene in this location). Overall, the auto-annotated start site is the better choice. /note=Function call: PhagesDB BLASTp has at least two significant hits with strong E-values (<5e-84) and reasonable percentage identity values (>78%). Both of these hits call the protein to have an “SprT-like protease” function. 
NCBI BLASTp also has at least two significant hits with strong E-values (<3e-106) and query coverage values (>95%). The percentage identity values for these hits are greater than 78.87%. NCBI hits call the function to also be “SprT-like protease”. CDD shows one significant hit (E-value= 3.1e-8) which assigns “no known function” to the protein but classifies the protein as a “SprT family zinc-dependent metalloprotease” (irrelevant). HHPRED shows a minimum of two significant hits with E-values<1.28e-8 but only one of the hits identifies the protein to be a “SprT-like domain-containing” protein (this function isn’t listed on the SEA PHAGES approved functions list as of 1/18/23). /note=Transmembrane domains: No TMRs were predicted using DeepTMHMM. /note=Secondary Annotator Name: Vasquez, Eunice /note=Secondary Annotator QC: /note=I agree with the primary annotator CDS 37230 - 37544 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="ShakeItOph_54" /note=Original Glimmer call @bp 37314 has strength 15.79; Genemark calls start at 37314 /note=SSC: 37230-37544 CP: no SCS: both-cs ST: SS BLAST-Start: GAP: 46 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.915, -4.845388397338397, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and Genemark both call the start site at 37314. /note=Coding Potential: The ORF has good coding potential on the forward strand, indicating that it is a forward gene. Coding potential is found on both GeneMark Self and Host. /note=SD (Final) Score: -4.708. It is the best final score on PECAAN. /note=Gap/overlap: Gap: 46. Gap is conserved in other phages (VroomVroom and Emotion) and there is no coding potential in the gap. /note=Phamerator: Pham: 9258. Date: 1/14/24. It is conserved; found in Maureen (AZ2) and Liebe (AZ2). /note=Starterator: Start site 5 in Starterator has no manual annotations, however, it is found in 3/8 genes in the pham. 
It calls the start site at 37314 bp and therefore agrees with Glimmer and Genemark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 37314. /note=Function call: NKF. All the top phagesdb BLAST hits have NKF. NCBI hits all came up as hypothetical proteins. HHpred has no hits with reasonable e-values. CDD has no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aguirre, Austin Leon /note=Secondary Annotator QC: I think I disagree with the called start site. Starterator seems to be uninformative because there are so few genes in this pham that it's not very helpful, but based on PECAAN, the start site 37230 has a smaller gap of 46 compared to the called start site’s gap of 130. Without sufficient evidence of a conserved start, I don’t think that a gap bigger than 100 can be justified. 46 is still very large, but less so. Also, start 37230 has the strongest z-score, a final score very close to the strongest, and is the longest ORF. GTG is also a common, reasonable start codon. I do see on Genemark that your called start site is closer to the beginning of coding potential, but we unfortunately do not have a lot of information to explore that on starterator. I think I may have noticed an error: you said the gap of 46 is conserved in other phages, but the gap for 37314 is 130. The gap for 37230 is 46. I suggest looking at this data again or getting another opinion. Should also describe how many different genes are in the pham and what clusters they belong to; is function conserved in pham? What cluster is this gene in ShakeItOph? Again, what date was starterator called? Synteny box should be filled out; what genes are upstream and downstream in phages with synteny? Phamerator should be marked as not informative. 
CDS 37686 - 37970 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="ShakeItOph_55" /note=Original Glimmer call @bp 37686 has strength 20.3; Genemark calls start at 37626 /note=SSC: 37686-37970 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_52 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 97.8723% 8.08688E-48 GAP: 141 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_52 [Arthrobacter phage VroomVroom]],,WIC90202,81.5534,8.08688E-48 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: vazquez, eunice /note=Auto-annotation: Glimmer calls the start site at 37686 with ATG start codon and GeneMark calls the start at 37626 with the start codon TTG. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating this is a forward gene. The coding potential is covered by the start site. /note=SD (Final) Score: -2.443 is the final score, and it is the best final score on PECAAN. The z score is 3.063. /note=Gap/overlap: 141 bp gap. This is a big gap which makes us question this gene, but it could be that there is another gene that needs to be added. /note=Phamerator: As of 01/19/24 the pham number is 4404. Other phages in this pham belong to clusters AZ4 which are NKF, and FP which are also NKF. /note=Starterator: Start site 11 in starterator. There are more than 10 manual annotations for this start site. Start site 11 is 37686 which corresponds to the start site called by Glimmer. /note=Location call: With all the evidence above this gene is a real gene and has a start site at 37686 bp which was called by Glimmer. Starterator agrees with Glimmer. /note=Function call: NKF. The top two phagesDB Blastp hits had NKF (E-values: 3e-38 and 2e-16). HHpred had no good hits since the e-values were too high, so no function could be called from them. 
The NCBI Blast had one good hit with phage VroomVroom with an e value of 8.08688e-48 and high percent coverage and identity, and it aligned with the function hypothetical protein. /note=Transmembrane domains: There are no TMRs noted by DeepTMHMM. /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: I agree with the primary annotator's start and function call. CDS 38032 - 38379 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="ShakeItOph_56" /note=Original Glimmer call @bp 38032 has strength 10.6; Genemark calls start at 38032 /note=SSC: 38032-38379 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp.]],,NCBI, q16:s75 86.087% 3.26215E-38 GAP: 61 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.164, -4.354778061539587, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp.]],,MCU1522918,45.6647,3.26215E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aguirre, Austin /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 38032. The called start site is the longest ORF. Start codon called was ATG. Auto Annotated gene does not have synteny with any manually annotated phages. /note=Coding Potential: Called start site does not appear to be shown on GeneMark Self and Host. Coding potential begins before called start site. No other possible start sites are present in the region. /note=SD (Final) Score: Called start site has a final score of -4.355 and a Z-score of 2.164. This called start site is also the longest ORF. One other called start site has a slightly better Z-score of 2.269 and a final score of -4.143, but it is not the LORF. /note=Gap/overlap: Gap of 61bp, which is fairly high. The only other called start site, 38254, has a gap of 283. /note=Phamerator: As of 1/13/24, there were 3 members in this group of cluster AC, but all three were draft phages. 
/note=Starterator: As of 1/13/24, there were only 3 members in the pham and all three members are drafts. No other manual annotations were called. /note=Location call: I think this might be a real novel gene. There appears to be a lack of evidence pointing towards this gene being real but it does have strong coding potential, a start site called by both glimmer and genemark, and it fills up a large gap in the genome. /note=Function call: NKF, no significant hits in any database. Mostly hits of unknown function genes or proteins. /note=Transmembrane domains: Two transmembrane domains present, suggesting that this gene could be a membrane protein. /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: Seeing as this gene does not have a lot of evidence pointing towards a real function, I think that the location call and function call are correct. There is evidence pointing to an unknown function though, so I would check the evidences from the corresponding options, CDD, HHpred, NCBIblast, etc. Also it could be a membrane protein given the membrane regions so that would be good to mention too I think. CDS 38704 - 39057 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="ShakeItOph_57" /note=Original Glimmer call @bp 38704 has strength 13.21; Genemark calls start at 38704 /note=SSC: 38704-39057 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_53 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.47228E-38 GAP: 324 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.379, -3.898968357315439, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_53 [Arthrobacter phage VroomVroom]],,WIC90203,75.8333,4.47228E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and Genemark both call start at 38704. /note=Coding Potential: The start site captures significant coding potential. 
However, in the forward direction there is a dip in coding potential that corresponds with a spike in coding potential in the reverse direction. This spike in the reverse direction is one lone peak and does not have any surrounding coding potential near it. /note=SD (Final) Score: The start 38704 has the best Final score (-3.899) and the only Z score surpassing 2 (2.379). /note=Gap/overlap: There is a significant gap of 324 base pairs, but this gap can also be seen in VroomVroom and Emotion. /note=Phamerator: 1/16/2024 Pham:133767 with 60 members. This gene is found in other members of the same cluster, such as VroomVroom and Emotion. /note=Starterator: This gene does call the most annotated start (start 13 @38704). This start is called by 34 of the 36 published phages with this gene. /note=Location call: Based on the evidence above (best Z and final score, synteny with VroomVroom and Emotion, good coding potential), I call this gene at 38704. /note=Function call: On almost all of the databases (HHpred and Phagesdb), there are no hits for any function except for one hit for a tail assembly chaperone on the NCBI Protein blast. However, this e-value is too large (0.024). All the other hits are for hypothetical or no known function proteins. There are no conserved domains in this gene. Based on this evidence, the function I assign to this protein is NKF. /note=Transmembrane domains: There are no transmembrane domains /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: For the most part, I agree. Just remember to also check off 1-2 pieces of evidence in the Phagesdb BLAST that support what you are saying. 
CDS 39054 - 39932 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="ShakeItOph_58" /note=Original Glimmer call @bp 39054 has strength 18.45; Genemark calls start at 39054 /note=SSC: 39054-39932 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_54 [Arthrobacter phage VroomVroom]],,NCBI, q3:s1 91.7808% 5.91847E-82 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_54 [Arthrobacter phage VroomVroom]],,WIC90204,70.3297,5.91847E-82 SIF-HHPRED: SIF-Syn: There is synteny with Vroomvroom, wherein the gene upstream is also of unknown function. /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Both Glimmer and GeneMark call this gene at 39054, this is the likely start site. /note=Coding Potential: There is significant coding potential seen in both GeneMark host and self. Additionally, there is no significant coding potential on the reverse strand for this interval. /note=SD (Final) Score: The best final score is associated with a different start site (-2.601), but the one associated with the recommended start site is also very good in value (-2.742) and identical in z-score (2.985). /note=Gap/overlap: The gap is -4, which is an acceptable gap. /note=Phamerator: As of 1/18/2024, the pham for this gene is 48281, with only 5 members, 2 of them being non-draft phages. /note=Starterator: The start site with the most manual annotations is start site 1, and it is called in all non-draft phages of this pham. This is the start site at 39054, also associated with the recommended start site by Glimmer and GeneMark. /note=Location call: From the coding capacity, Glimmer, GeneMark, and starterator, the likely start site of this gene is at 39054. /note=Function call: CDD has no significant hits, the HHpred hits are not significant in terms of e-value. 
NCBIblast gives a couple of significant hits for a hypothetical protein. PhagesDB blast has three significant hits suggesting a protein of unknown function. This protein likely has no known function. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane regions, so this is likely not a membrane protein. /note=Secondary Annotator Name: Wang, Jordan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function. CDS 39929 - 40144 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="ShakeItOph_59" /note=Original Glimmer call @bp 39929 has strength 17.34; Genemark calls start at 39929 /note=SSC: 39929-40144 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Arthrobacter mobilis] ],,NCBI, q3:s4 94.3662% 8.53186E-4 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.299, -6.565547700744353, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter mobilis] ],,WP_168488253,53.7313,8.53186E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Yao, Alice /note=Auto-annotation: Both Glimmer and GeneMark have start sites at 39929. The starting codon is GTG. /note=Coding Potential: There is a high coding potential in this ORF in the forward strand. Coding potential is found in both Genemark Self and Host. /note=SD (Final) Score: -6.566. It is not the best final score but it is the final score associated with the gap of -4 which could be indicative of an operon. /note=Gap/overlap: Gap: -4. Could indicate an operon. /note=Phamerator: Pham 100410. Date 01/13/24. It is conserved; found in JasmineDragon (AZ4). /note=Starterator: Start site 2 in Starterator was found in 3 out of 3 genes in this pham. None were manually annotated. Start 2 is 39929 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39929. 
/note=Function call: Unknown function. Top two Phagesdb BLASTp results show unknown function. None of the HHPRED results that are not unknown made the probability, percent coverage, and e-value cutoffs. NCBI BLASTp had a HIT of a hypothetical protein with 46.26% identity, 53.73% alignment, 94.36% coverage, and an e-value of 8.5*10^-4. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: RICHARD, KETAN LEONARD /note=Secondary Annotator QC: I agree that this is a real gene based off the coding potential and I agree with the start site since it encompasses the entire coding potential. And I agree that you cannot call a function based on the evidence, but it is a real gene and therefore a hypothetical protein. CDS 40141 - 40287 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="ShakeItOph_60" /note=Original Glimmer call @bp 40141 has strength 19.81; Genemark calls start at 40141 /note=SSC: 40141-40287 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_VROOMVROOM_56 [Arthrobacter phage VroomVroom]],,NCBI, q1:s16 83.3333% 7.92563E-19 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_56 [Arthrobacter phage VroomVroom]],,WIC90206,63.4921,7.92563E-19 SIF-HHPRED: SIF-Syn: VroomVroom, Emotion /note=Primary Annotator Name: WANG, JORDAN /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 40141. /note=Coding Potential: There is good coding potential shown between 40141 (potential start) and 40287 (stop). /note=SD (Final) Score: For start site 40141, Final Score is -2.584 and it is the best start site. /note=Gap/overlap: 4 bp overlap, indicates possible operon. /note=Phamerator: Pham 135301. Date: Jan 21, 2024. Found across 220 phages (34 drafts). 
/note=Starterator: Start site 39 in Starterator was manually annotated in 5/186 non-draft genes in this pham. Start 39 is 40141 in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is likely a real gene and the most likely start site is 40141. /note=Function call: Unknown. All phagesdb BLAST and NCBI BLAST are of unknown function. HHpred hits have high e-values. CDD has no available data. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Zamora, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All evidence has been considered. CDS 40280 - 40492 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="ShakeItOph_61" /note=Original Glimmer call @bp 40280 has strength 12.25; Genemark calls start at 40280 /note=SSC: 40280-40492 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_58 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 98.5714% 1.54603E-39 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.529, -6.097136544885063, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_58 [Arthrobacter phage VroomVroom]],,WIC90208,97.1429,1.54603E-39 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Richard, Ketan /note=Auto-annotation: Glimmer and Genemark; Both marked start site as 40280 /note=Coding Potential: There is primarily only coding potential in the forward direction in both the host and self GeneMark. The coding potential covers most of the ORF with the suggested start site. /note=SD (Final) Score: -6.097; It is the only score listed on PECAAN. /note=Gap/overlap: The gap is -8 base pairs which is not too much of a concern. This overlap is found in another final genome, VroomVroom. /note=Phamerator: Pham: 100283. Date: 1/13/24 This is not a common pham, but it is found in final genome VroomVroom in Cluster AZ. 
/note=Starterator: Start site 2 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start site of 2 is the most annotated start site. For ShakeitOph, it calls a start site of 40280. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 40280. /note=Function call: NKF. Phagesdb Blastp had 1 hit for VroomVroom with an unknown function. NCBI Blastp also had the same hit. There were no significant CDD or HHpred hits with any e-values less than 1. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Soan, Jessica /note=Secondary Annotator QC: I would check off VroomVroom as evidence within the Phagesdb BLAST data (e-value = 7e-33). I would also check off phage VroomVroom as evidence within the NCBI BLAST data as it has 92.85% identity, 98.57% coverage, and an e-value = 1.54603e-39. There were no significant hits within HHPRED. I agree with the start site and function call. Make sure to check off all the significant evidence. /note=I made all the necessary changes. CDS 40489 - 40872 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="ShakeItOph_62" /note=Original Glimmer call @bp 40489 has strength 12.61; Genemark calls start at 40489 /note=SSC: 40489-40872 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_59 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 92.126% 7.90147E-68 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.106, -4.744199388040119, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_59 [Arthrobacter phage VroomVroom]],,WIC90209,84.252,7.90147E-68 SIF-HHPRED: SIF-Syn: The upstream and downstream gene both have unknown functions, just like in phage VroomVroom. /note=Primary Annotator Name: Zamora, Alexandra /note=Auto-annotation: This gene is auto-annotated by Glimmer and GeneMark. 
Both agree on an auto-annotated start site at 40489 bp. The start codon of this start site is ATG. /note=Coding Potential: There is good coding potential on the host-trained GeneMark. The auto-annotated start site covers all this coding potential; the coding potential is on the forward strand only. There is good coding potential on the self-trained GeneMark. The auto-annotated start site covers all this coding potential. There is some coding potential on the reverse strand, but considering that this gene is surrounded by other forward genes, this coding potential is not significant. /note=SD (Final) Score: The final score for the likely start site is -4.744; the z-score is 2.106. This is the best final score and z-score. /note=Gap/overlap: Overlap: -4bp. The gap with the upstream gene is reasonable. The start is likely within an operon. This is not the longest ORF. This start site is favored over one with a longer ORF because moving the start site upstream would increase the overlap. /note=Phamerator: As of 1/16/24, this gene is located in pham number 48283. The pham is conserved among other members of the AK2 cluster. Phages Emotion and MiniMommy contain genes found in this cluster. /note=Starterator: Pham number 48283 has 5 members, with 3 of them being drafts. The most annotated start number is 2. This start number was annotated in 2/2 non-draft genes. This conserved start number corresponds to the likely start site at 40489 bps in ShakeItOph gene 64. /note=Location call: Based on the good coding potential of the host and self trained GeneMark and the information from the phamerator/starterator report, this is a real gene with likely start site at 40489. /note=Function call: Two of the top phagesdb hits have unknown function with e-values less than 1e-52. CDD had no relevant hits. HHpred had no relevant hits. The top two NCBI BLAST hits are hypothetical proteins with unknown function (coverage: 90% and 92%; e-values: 8e-68 and 1e-56). 
Based on this evidence, this gene has no known function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bhattarai, Aryan /note=Secondary Annotator QC: /note=Based on the above evidence, I agree with the primary annotator. This is a real gene and the location call is correct. There isn't sufficient evidence to determine the function of this gene. The function call of NKF is correct. /note= /note=Secondary Annotator QC: Bhattarai, Aryan /note=Auto-annotation: Glimmer and GeneMark both call the start at 40489. /note=Coding Potential: Coding potential in this ORF is primarily on the forward strand, indicating that this is a forward gene. There is some coding potential on the reverse strand, but this is not significant considering that this gene is surrounded by other forward genes. Coding potential is found both in GeneMark Self and Host. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -4.744. It is the best final score on PECAAN. /note=Gap/overlap: -4 bp, which is highly favorable. Gene is likely part of an operon. /note=Phamerator: pham: 48283. Date 1/22/2024. 5 members exist in this pham, and all of them are in the AZ cluster. /note=Starterator: There are 5 total members, 3 are drafts. Start number 2 is found in 5 of 5 genes in pham. There are 2 of 2 manual annotations of this start. It was called 100% of the time when present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 40489. /note=Function call: NKF. Two of the top phagesdb hits have unknown function with e-values < 1e-52. The top two NCBI BLAST hits are hypothetical proteins with unknown function (coverage: 90%+; e-values < 1e-56). Both CDD and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 40869 - 40967 /gene="63" /product="gp63" /function="membrane protein" /locus tag="ShakeItOph_63" /note=Genemark calls start at 40869 /note=SSC: 40869-40967 CP: yes SCS: genemark ST: SS BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.985, -2.9525085049809894, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Genes upstream and downstream of the gene have no known function, as seen in Emotion and JasmineDragon. /note=Primary Annotator Name: Soan, Jessica /note=Auto-annotation: Only GeneMark calls a start site, at 40869 bp, with a start codon of ATG. /note=Coding Potential: This ORF has good coding potential on the forward strand in both the Self and Host Genemark; the chosen start site contains all the coding potential. /note=SD (Final) Score: -2.953 is the best final score with a Z-score of 2.985. This was not the LORF, but it had the best final score and one of the best Z-scores. There is a 4bp overlap with an ATG start codon. This favorable overlap also indicates the presence of an operon. All other possible start sites either create a large gap or overlap. /note=Gap/overlap: 4bp overlap. This is a small overlap, especially compared to the possible start sites which have significantly larger overlaps, at 238bp, or gaps of 50 or 68bp. An overlap of 4bp is a sign of an operon. /note=Phamerator: As of 1/22/2024, this gene belongs to pham 100474. It is conserved; found in only one other phage, JasmineDragon (AZ). /note=Starterator: There are 2 total members, both are drafts. Start number 2 was found in both of these draft genomes. There are no manual annotations of this start site. JasmineDragon is the only other phage in this pham. /note=Location call: The start site called by GeneMark at 40869 bp seems to be the best, as it covers all the coding potential and is the only start site with a favorable overlap. /note=Function call: Membrane protein. DeepTMHMM predicts one TMD. 
Phagesdb BLAST shows one hit with a draft genome, JasmineDragon, which also has one TMD and is listed as a membrane protein. Phagesdb function frequency and CDD had no hits. All HHPRED hits had e-values too high to be considered. /note=Transmembrane domains: DeepTMHMM predicts one TMD. /note=Secondary Annotator Name: Kang, Alix /note=Secondary Annotator QC: The function is membrane protein, not NKF. DeepTMHMM predicts just one TMD. Based on this evidence this gene can be assumed to have a real TMD and is therefore a “membrane protein.” If you look at the same pham in JasmineDragon, you can see that its gene also has one TMD and lists membrane protein as its function. Hence you would also need to fill out the synteny box. CDS 41082 - 41342 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="ShakeItOph_64" /note=Original Glimmer call @bp 41082 has strength 19.8; Genemark calls start at 41082 /note=SSC: 41082-41342 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_61 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.2374E-23 GAP: 114 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_61 [Arthrobacter phage VroomVroom]],,WIC90211,70.9302,3.2374E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer and GeneMark both call the start at 41082. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -2.443. It is the best final score on PECAAN. /note=Gap/overlap: 114 bp which is somewhat large, but ultimately reasonable because the gap is conserved in other non-draft phages, VroomVroom and Emotion. 
Also, there is no coding potential in the gap that would indicate a new gene. /note=Phamerator: pham: 48339. Date 1/20/2024. 5 members exist in this pham. All of them are in Subcluster AZ. /note=Starterator: There are 5 total members, 3 are drafts. Start number 7 is found in 4 of 5 genes in the pham. There are 1 of 2 manual annotations of this start. It was called 100.0% of the time when present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 41082. /note=Function call: NKF. The top 2 phagesDB hits both have an unknown function with small e-values that are both less than 2e-14. NCBI BLAST also has hits to 2 hypothetical proteins (97-100% coverage, 44-58% identity, e-value <1.0191e-15). CDD and HHpred had no relevant hits. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Saud, Arnav /note=Secondary Annotator QC: /note=I agree that this is a real gene, and the location call is correct. I agree with the function call. CDS 41521 - 42096 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="ShakeItOph_65" /note=Original Glimmer call @bp 41521 has strength 15.33; Genemark calls start at 41521 /note=SSC: 41521-42096 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_62 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 98.4293% 7.12014E-36 GAP: 178 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_62 [Arthrobacter phage VroomVroom]],,WIC90212,57.2139,7.12014E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 41521. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. 
/note=SD (Final) Score: -1.954. It is the best final score on PECAAN. /note=Gap/overlap: Gap: 178bp. Somewhat large, but ultimately reasonable because the gap is conserved in other non-draft phages (VroomVroom, Emotion) and there is no coding potential in the gap that might be a new gene. /note=Phamerator: The pham number as of January 14, 2024 is 48567. The gene is conserved in non-draft phages VroomVroom and Emotion, all in the same cluster as ShakeItOph. /note=Starterator: Start site 7 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 7 is 41521 in ShakeItOph. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 41521. /note=Function call: The top 2 phagesDB hits have unknown function with small E-values of 3e-29 to 1e-35. NCBI BLAST also has hits to hypothetical proteins (95-98% coverage, 40.59%+ identity, E-value <5e-25). CDD and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation and chosen start site. 
CDS 42093 - 42320 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="ShakeItOph_66" /note=Original Glimmer call @bp 42099 has strength 11.57; Genemark calls start at 42099 /note=SSC: 42093-42320 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_EMOTION_68 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 5.26148E-20 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.878, -4.857784300446364, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_68 [Arthrobacter phage Emotion]],,WGH21417,74.6835,5.26148E-20 SIF-HHPRED: SIF-Syn: NKF more information is needed before a synteny call can be made /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and Genemark both call the start site to be located at 42099. /note=Coding Potential: There is high coding potential on both GenemarkSelf and GenemarkHost. Coding potential in this ORF is on the forward strand only. /note=SD (Final) Score: -4.858. This is the best final score called by PECAAN. /note=Gap/overlap: /note=There is an overlap of 4 bp between our gene of interest’s start site and the previous gene’s stop site. This indicates an operon is present. /note=Phamerator: pham: 3945. Date: 1/13/24. The gene is conserved. It is found in Emotion (AZ4) and VroomVroom (AZ4). /note=Starterator: Date: 1/15/24. The most annotated start site was start site 13, and it was called in 10/20 non-draft genes in this pham. Start site 13 was present in ShakeItOph, but it was not called. Instead, Start site 15 was called in ShakeItOph. The Starterator report is not indicative of where the start site could be. /note=Location call: The start site for this gene is different from that which is determined by Glimmer and Genemark. Based on the evidence above, the start site is most likely 42093. Additionally, this is a real gene. /note=Function call: NKF. The top 2 phagesdb BLAST results reported No Known Function (<3e-16). 
The top 2 NCBI BLAST hits that appeared had the reported function of hypothetical protein (>57.77% Identity, >64.44% Aligned, >98.63% Coverage, e-value: <2.24e-18). HHpred had no statistically significant hits. CDD yielded no results. /note=Based on the evidence above, this gene has NKF. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Akkinepally, Mrudula /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 42320 - 42616 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="ShakeItOph_67" /note=Original Glimmer call @bp 42320 has strength 12.39; Genemark calls start at 42320 /note=SSC: 42320-42616 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_VROOMVROOM_64 [Arthrobacter phage VroomVroom]],,NCBI, q12:s14 88.7755% 1.79045E-32 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.057, -3.284137222879636, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_64 [Arthrobacter phage VroomVroom]],,WIC90214,71.5789,1.79045E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Glimmer and GeneMark both call the start site at 42320. /note=Coding Potential: Coding potential is on the forward strand and there is good potential indicated on both GeneMark Self and Host between the STOP site and called START site. /note=SD (Final) Score: The final score is -3.284, and this is the best final score reported. /note=Gap/overlap: -1 bp. There is an overlap of 1 bp with the upstream gene, indicating that this gene may be part of an operon. /note=Phamerator: Pham 133755. /note=Starterator: The most frequently called start site at start number 18 was not manually annotated for ShakeItOph. Rather, the start site number 27 at 42320 was called. 
/note=Location call: The above evidence points towards this gene being real with a likely start site of 42320. /note=Function call: NKF. There are a few phagesdb BLAST hits for the HNH endonuclease function (E-value < 10^-13); however, the protein sequence does not satisfy the HNH endonuclease criteria. Similarly, there are also a few NCBI BLAST hits for the HNH endonuclease function, but they are irrelevant for the same reason. CDD didn’t have any relevant hits. /note=Transmembrane domains: DeepTMHMM predicted no TMDs, so this gene does not code for a membrane protein. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree with this location and functional call. Note: (1) I could be mistaken, but I believe the dropdown from Starterator should read “SS” since the start site that was suggested by Starterator was the one that was annotated; (2) make sure to put the accessed date for the Pham section and whether it’s conserved in other phages; (3) include information about whether HHPRED had relevant hits; (4) a bit nit-picky, but I would say why the hits from CDD were not relevant (high e-values, poor coverage, etc.); (5) I would also maybe state a little more specifically that NCBI BLASTp called top hits of hypothetical protein. 
tRNA 42824 - 42900 /gene="68" /product="tRNA-Trp(cca)" /locus tag="SHAKEITOPH_68" /note=tRNA-Trp(cca) CDS 43056 - 43256 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="ShakeItOph_69" /note=Original Glimmer call @bp 43056 has strength 8.59; Genemark calls start at 43056 /note=SSC: 43056-43256 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_66 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 7.58932E-19 GAP: 439 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_66 [Arthrobacter phage VroomVroom]],,WIC90215,75.3623,7.58932E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer and Genemark both call the same start site: 43056 /note=Coding Potential: Host trained Genemark and Self trained Genemark had really good coding potential for this area on the forward strand with an open reading frame that covers most of the length of this gene. /note=SD (Final) Score: -2.523 (most positive value) /note=Gap/overlap: There is a 439 bp gap between the previous gene and this gene. There is a little coding potential on both the forward and reverse strands between these genes and an open reading frame on self trained genemark, however not enough to be a potential gene. There is a 14 bp overlap with the following gene, but we can’t change the stop site and it is less than 30 bp so it should be fine. /note=Phamerator: Pham 48236. 1/16/2024. It is conserved, found in VroomVroom (AZ4) and Emotion (AZ4). /note=Starterator: Found in 5 of 5 (100.0%) of genes in pham. Manual Annotations of this start: 2 of 2. Called 100.0% of time when present. (Start: 1 @43056 has 2 MA`s). This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on this evidence, this is a real gene and the most likely start site is 43056. /note=Function call: NKF. 
The top non-draft phagesdb BLAST hit has no known function (E-value 5e-17), and the only NCBI BLAST hit also has no known function (100% coverage, 60.87% identity, and E-value 8e-19). HHpred had no relevant hits (E-values >99). CDD had no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Gap should be listed as 439bp. CDS 43243 - 43605 /gene="70" /product="gp70" /function="HNH endonuclease" /locus tag="ShakeItOph_70" /note=Original Glimmer call @bp 43243 has strength 0.12; Genemark calls start at 43513 /note=SSC: 43243-43605 CP: yes SCS: both-gl ST: NA BLAST-Start: [HNH endonuclease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.15379E-71 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.905, -2.827683592113848, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage VroomVroom]],,WIC90216,91.7355,1.15379E-71 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,62.5,95.7 SIF-Syn: HNH endonucleases, upstream gene has NKF and is in pham 48236, just like in phage VroomVroom. /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation: Glimmer calls the start at 43243 and GeneMark calls the start at 43513. /note=Coding Potential: Coding potential in this ORF (start site 43243) is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -2.828 and the z score is the highest at 2.905. /note=Gap/overlap: Gap: -14. An overlap of 14 is somewhat strange, but appears to be reasonable because all other start sites create a gap of at least 55 bp or an overlap of 155. 
This overlap of 14 is also conserved in phage VroomVroom of the same cluster. /note=Phamerator: pham: 135185. Date 01/19/2024. It is conserved; found in Emotion (AZ), Iter (AZ), and Adolin (AZ). /note=Starterator: Starterator was not available as of 01/19/2024. Will check back throughout the weekend. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 43243. /note=Function call: PhagesDB BLASTp and NCBI BLASTp both returned numerous hits with e-values 3e-22 or smaller that corresponded to an HNH endonuclease. CDD returned a hit to an HNH endonuclease with an e-value of 1.60e-03. HHPRED had a few hits corresponding to an HNH endonuclease with coverage 52.5% or greater and probability 92.7% or greater, but e-values 0.11 or greater. Overall, though, because they match the function calls of the BLASTp hits, the function of this protein is likely an HNH endonuclease. It also satisfies the SEA-PHAGES requirement for HNH endonucleases that the protein sequence has an H-N-H motif across a 30 aa space. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: Based on the evidence provided, this is a real gene with a start site of 43243, encoding an HNH endonuclease.