CDS complement (join(1. .573;61794. .62027)) /gene="1" /product="gp1" /function="helix-turn-helix DNA binding domain" /locus tag="RomansRevenge_1" /note=Original Glimmer call @bp 624 has strength 10.17 /note=SSC: 573-61794 CP: no SCS: glimmer-cs ST: NI BLAST-Start: GAP: 611 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.257, -6.166667396375102, no F: helix-turn-helix DNA binding domain SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=QC notes: For function call, there were no hits on HHpred, NCBI Blast, PhagesDB or CDD that had small enough e-values to be considered. When I looked at the SEA-PHAGES requirement of "2-3 alpha helices in the sequence separated by small spacer (turn) regions of 3-4 amino acids," it almost satisfies it, except the spacer between the first and second alpha helix is 5 amino acids long instead of 3-4. Should this still be considered satisfactory? /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Only Glimmer calls it. Start site 624 /note=Coding Potential: On the reverse strand, there is strong coding potential from the proposed start to position 0, however, there is no coding potential from the last nucleotide to the stop site on the GeneMark accessed through PECAAN. GeneMark from PhagesDB shows a weak potential from around the last nucleotide to the stop site. /note=SD (Final) Score: -4.101 (most positive value) /note=Gap/overlap: 561 bp gap between start of this gene and start of next gene on the forward strand. 561 is a large gap, however, there is no coding potential on GeneMark for this area. There is also a 185 bp gap between the stop site and the start site of the next gene on the reverse strand. However, since we would be required to add two copies of a gene in this area (because this is a tandem repeat region) it is impractical to add a gene there. /note=Phamerator: pham: 100382. Not shown on pham maps/no synteny. 
/note=Starterator: Found in 1 of 1 ( 100.0% ) of genes in pham, No Manual Annotations of this start, Called 100.0% of time when present, Phage (with cluster) where this start called: RomansRevenge_92 (singleton). It is not found in any other phages besides RomansRevenge. /note=Location call: start site: 624 stop site: 61794 /note=Function call: The top three phagesdb BLAST hits: Rizwana_102, function unknown, 250 (e value 0.030), Fowlmouth_68, integrase, 266 (e value 0.20), MrMiyagi_67, helix-turn-helix DNA-binding domain protein, 266 (e value 0.33); No significant hits on NCBI Blast. HHpred has hits for helix-turn-helix transcriptional regulators with >95% probability, and E-values ranging 0.16-0.36. CDD had one hit - accession number cl39134 - which is a helix turn helix containing protein - transcriptional regulator - (e-value 7.03e-03). But after looking at the HHpred alignment, there is evidence of 2-3 alpha helices in the sequence separated by small spacer (turn), which is the criterion set forth by SEA-PHAGES. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Salinas, Juan Carlos /note=Secondary Annotator QC: I agree with the location call made by the primary annotator. All available evidence has been considered. Most convincing are the z-score and final score, which are the most favorable of the provided location predictions. I also agree with the function call based on the information found on HHPRED and the predictions of 3 different sequences that favor a helix turn helix function. All evidence has been considered and the gene is most likely a helix-turn-helix DNA binding domain. The only missing detail is marking the coding potential (Y/N) on the gene candidate box and marking all evidence if possible. 
CDS 1185 - 1436 /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="RomansRevenge_2" /note=Original Glimmer call @bp 1185 has strength 9.22; Genemark calls start at 1185 /note=SSC: 1185-1436 CP: yes SCS: both ST: NA BLAST-Start: GAP: 611 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.022, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 1185. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -2.443 and the z score is the highest at 3.022. /note=Gap/overlap: Gap: 560bp, which is pretty large, but appears reasonable because the gene prior is a reverse gene, which usually necessitates a larger gap. The gap is also conserved at the end of the genome, which is significant because RomansRevenge has direct terminal repeat ends. /note=Phamerator: pham: 100686. Date: 01/08/2023. It is an orpham. /note=Starterator: This gene is an orpham and therefore does not have a Starterator report. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 1185. /note=Function call: Only 4 PhagesDB BLASTp hits were returned, all of which had e-values 3.4 or greater. Additionally, the top 3 had unknown functions. All HHpred hits returned with e-values of 37 or higher. Two hits returned with coverage 69% or greater, but had e-values 150 or greater. NCBI BLASTp and CDD both returned no hits. Consequently, the function of this gene cannot currently be determined. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: I agree with the primary annotator on the start site and their function call. They may choose to elaborate that their gap is also due to repeats (mentioned by professor, because it is also similar at the end of the genome). CDS 1433 - 1741 /gene="3" /product="gp3" /function="hypothetical protein" /locus tag="RomansRevenge_3" /note=Genemark calls start at 1493 /note=SSC: 1433-1741 CP: yes SCS: genemark-cs ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.946, -2.6814417326944517, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: GeneMark calls the start at 1493. Glimmer does not call a start site. /note=Coding Potential: Strong coding potential in this ORF is in the forward direction only and therefore, this is a forward gene. Coding potential is found on both GeneMark Self and Host. /note=SD (Final) Score: The best final score on PECAAN is -2.681. /note=Gap/overlap: Overlap is 4bp which indicates the presence of an operon for start at 1433. /note=Phamerator: Pham: 100711. Date: 01/10/2024. Gene is an orpham. /note=Starterator: There is no Starterator report because the gene is an orpham. /note=Location call: Based on the above evidence, this is a real gene. The most likely start site is 1433 rather than the automated call at 1493, because the 4bp overlap is stronger evidence, since it suggests an operon. Additionally, start site 1433 has the strongest z-score of 2.946 and final score of -2.681, compared to the automated start 1493 z-score of 0.562 and final score of -7.632 which are both weak. /note=Function call: Unknown function. Only one other PhagesDB Blastp hit returned but the e-value was high (2.6). No NCBI Blastp hits returned. No strong HHpred hits returned since all hits had e-values of 30 or greater. No CDD hits returned. 
/note=Transmembrane domains: DeepTMHMM does not predict the presence of any TMDs, and therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kibria, Kamille /note=Secondary Annotator QC: Coding potential supports start site at 1433. Auto annotated start site has a gap of 56, while Hannah's call is -4. This seems reasonable to me. None of the BLAST databases returned anything significant. No TMHs. I agree completely with the primary annotator. CDS 1829 - 2017 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="RomansRevenge_4" /note=Original Glimmer call @bp 1829 has strength 15.89; Genemark calls start at 1829 /note=SSC: 1829-2017 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PBI_BEAGLE_12 [Arthrobacter phage Beagle] ],,NCBI, q1:s1 70.9677% 0.00406335 GAP: 87 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.526, -3.5676719685030167, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_BEAGLE_12 [Arthrobacter phage Beagle] ],,QGJ92772,58.8235,0.00406335 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation: GeneMark and Glimmer, both call the start site at 1829. /note=Coding Potential: Coding potential is found only in the forward strand in both GeneMark Host and Self, meaning that this is a forward gene. /note=SD (Final) Score: The final score for this site is -3.568, which is the best. The z score is also the highest, at 2.526 /note=Gap/overlap: A gap of 87bp is present between the previous gene and this start site. However, this start site has the best final score and z score, and there is no coding potential found in the gap, nor synteny (due to this being a very novel genome) to suggest that a gene is missing in that gap. /note=Phamerator: Pham 131250 (1/9/24). There are 8 members total, which all belong to cluster AP except for this gene. 
The gene is conserved in phages Beagle and Odyssey395 /note=Starterator: This start site, start site 2 in the report, is found in all 8 members and manually annotated in 6. The start site appears to be conserved. /note=Location call: Based on the evidence above, this is a real gene with a likely start site of 1829. /note=Function call: There were 10 hits on PhagesDB, all of which had e scores above 10^-6 and unknown functions. NCBI Blastp returned one hit, which had an e score of 0.004. CDD did not return any hits. All HHPred hits had e scores greater than 4. Due to this, no function can be assigned to this gene. /note=Transmembrane domains: It is not a membrane protein. There are not any transmembrane domains predicted by DeepTMHMM. /note=Secondary Annotator Name: Dweik, Qaiss /note=Secondary Annotator QC: I agree with the location and function call for this gene's annotation. CDS 2014 - 2301 /gene="5" /product="gp5" /function="membrane protein" /locus tag="RomansRevenge_5" /note=Original Glimmer call @bp 2020 has strength 11.42; Genemark calls start at 2020 /note=SSC: 2014-2301 CP: no SCS: both-cs ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.441, -3.8079208184165134, no F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=--Switched to start with gap -4, possible operon-- /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: GeneMark and Glimmer both call the start site at 2020 with start codon GTG. /note=Coding Potential: Coding potential is found only in the forward strand indicating that this is a forward gene. Coding potential is found in both GeneMark Host and Self. /note=SD (Final) Score: The final score is -4.685 which is not the best. Z-score is close to but not greater than 2. /note=Gap/overlap: Gap of 2, small enough to be reasonable. /note=Phamerator: pham: 100780. 1/10/24. It is a singleton. /note=Starterator: Page/report not found. 
/note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 2020. /note=Function call: There are a few hits from PhagesDB with tape measure protein however the e-values are greater than 1 which means they are not good indicators of function. HHpred has a few hits for membrane protein with the highest probability being 78.12 % and e-values of 29, 64, 48, and more over 1 which make them unlikely indicators of function. There were no CDD hits. DeepTMHMM predicts 3 TMDs, because this gene has at least 2 TMDs it is a membrane protein. /note=Transmembrane Domains: DeepTMHMM predicts 3 transmembrane domains. This gene is likely a membrane protein. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: I agree with the start site of 2020. There is high coding potential and a reasonable gap. The function call is likely also a membrane protein given the predicted TMDs. CDS 2298 - 2507 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="RomansRevenge_6" /note=Original Glimmer call @bp 2298 has strength 7.58; Genemark calls start at 2298 /note=SSC: 2298-2507 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.78, -3.303727606004721, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Glimmer and GeneMark agree with start site 2298 however GTG, TTG, and ATG are all listed as start codons /note=Coding Potential: There is reasonable coding potential. Typical coding potential is displayed by the black line and it does extend upward with a start uptick so possible TTG start. /note=SD (Final) Score: The start site at 2298 has a final score of -3.304 which is good considering the other start sites are in the -6 to -7 range so it's the closest to zero, therefore the most optimal. 
/note=Gap/overlap: 4bp overlap however the start site is not the LORF due to it being 210 bp long, however the longer orfs have overlaps in the >120bp range so it is not considered. Start 2214 (GTG) has a 88bp overlap but a length of 294. Start 2289 (TTG) has a 13bp overlap which is fairly acceptable and a length of 219bp. /note=Phamerator: 01/10/2024, pham #100812. It is an orpham/belongs to cluster singleton. /note=Starterator: No starterator report as of 01/10/2024 /note=Location call: I believe this is a real gene at start site 2298 due to the 4bp overlap and the final score of -3.304 /note=Function call: 16 blastp hits all with relatively low scores and high e values. Highest coverage was 66% and was called ssDNA binding protein. No HHpred, NCBI, or CDD hits. No function can be called until further research is done. /note=Transmembrane domains: No transmembrane domains /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: Annotation is correct and complete. I agree the function of the protein cannot be called currently. CDS 2504 - 2764 /gene="7" /product="gp7" /function="hypothetical protein" /locus tag="RomansRevenge_7" /note=Original Glimmer call @bp 2504 has strength 11.78; Genemark calls start at 2504 /note=SSC: 2504-2764 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.022, -4.049342652064859, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: GeneMark and Glimmer agree on a start site at 2504. /note=Coding Potential: There is coding potential predicted by Host-trained and self-trained GeneMark and the chosen start site covers all of the coding potential. /note=SD (Final) Score: -4.049, the best final score on PECAAN. The Z-score is the highest possible at 3.022. /note=Gap/overlap: Gap of -4 is reasonable and indicates a potential operon. /note=Phamerator: The gene is in an orpham numbered 100552 as of 1/9/2024. 
/note=Starterator: This gene is an orpham so no Starterator report could be produced. /note=Location call: The gathered evidence suggests that the gene is real and starts at 2504bp. /note=Function call: No known function. No programs returned statistically informative results. /note=Transmembrane domains: No transmembrane proteins were predicted by DeepTMHMM. /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: Agree with location and function call. Maybe mention that it is orpham in synteny box, and change location box to suggested start. CDS 2769 - 3038 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="RomansRevenge_8" /note=Original Glimmer call @bp 2769 has strength 17.46; Genemark calls start at 2769 /note=SSC: 2769-3038 CP: yes SCS: both ST: NA BLAST-Start: GAP: 4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.926, -2.996490344739583, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Both Glimmer and GeneMark are utilized, both agreeing with the start site at #2769, calling start codon ATG. /note=Coding Potential: Within the host-trained coding potential graph there is consistent reasonable coding potential throughout the entire putative ORF, the same can be said for the self-trained coding potential graph. /note=SD (Final) Score: The SD score for the putative start site is -2.996, this is an optimal score as it is the closest out of all the potential start sites to zero. In addition it has a z-score of 2.926 (the most positive) As a result this site is the most credible for ribosome binding. /note=Gap/overlap: With candidate start site #2769, there is a reasonable gap of 4 base pairs. The gene length for the candidate start site is 270 which is acceptable. There is one alternative start site which produces a larger ORF (315bp) but it results in a 41bp overlap which is unacceptable and as a result is dismissed. 
/note=Phamerator: As of January 9th, 2024 this gene is an orpham, pham #100789. There is no function called. /note=Starterator: This gene is an orpham and as a result there is no starterator report to compare other start sites with. /note=Location call: Due to high coding potential within the putative ORF, a reasonable SD score and a valid gap of 4bp, I believe this is a real gene at start site #2769. /note=Function call: There was one phagesDB BLAST hit, which had an unacceptably high e-value of 7.6, the function called was a DNA binding protein. All HHpred hits returned with e-values of greater than 10 (also unacceptable), of which the highest had 47% coverage and only a probability of 60.3%. In addition there were no hits from NCBI BLASTp or CDD. As a result, no function may be determined. /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. /note=Secondary Annotator Name: Tran, Michelle /note=Secondary Annotator QC: I agree with the location and function calls shown above. 
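The gap and length figures quoted in the gp8 notes can be reproduced with simple coordinate arithmetic. The following is a minimal sketch, assuming 1-based inclusive coordinates on the forward strand; the helper names `orf_length` and `gap_to_previous` are hypothetical and are not part of PECAAN. The gp7 stop (2764), the gp8 start and stop (2769, 3038), and the 315 bp alternative ORF are taken from the records above.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand ORF with inclusive coordinates."""
    return stop - start + 1


def gap_to_previous(prev_stop: int, start: int) -> int:
    """Bases between the previous stop and this start.

    A negative value is an overlap of that many bases.
    """
    return start - prev_stop - 1


# Coordinates copied from the gp7 and gp8 records.
GP7_STOP = 2764
GP8_START, GP8_STOP = 2769, 3038

called_length = orf_length(GP8_START, GP8_STOP)
called_gap = gap_to_previous(GP7_STOP, GP8_START)

# The notes mention one alternative start giving a 315 bp ORF;
# its position is derived here from the shared stop codon.
alt_start = GP8_STOP - 315 + 1
alt_gap = gap_to_previous(GP7_STOP, alt_start)

print(f"called start {GP8_START}: length {called_length} bp, gap {called_gap} bp")
print(f"alternative start {alt_start}: length 315 bp, gap {alt_gap} bp")
```

With these inputs the called start gives a 270 bp ORF with a 4 bp gap, and the alternative start gives a 41 bp overlap, matching the figures in the Gap/overlap note.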
CDS 3035 - 3610 /gene="9" /product="gp9" /function="ParB-like nuclease domain" /locus tag="RomansRevenge_9" /note=Original Glimmer call @bp 3035 has strength 18.51; Genemark calls start at 3035 /note=SSC: 3035-3610 CP: yes SCS: both ST: SS BLAST-Start: [ParB-like nuclease domain protein [Arthrobacter phage BruhMoment]],,NCBI, q2:s3 99.4764% 2.85562E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.178, -2.253377680616507, yes F: ParB-like nuclease domain SIF-BLAST: ,,[ParB-like nuclease domain protein [Arthrobacter phage BruhMoment]],,UOK18329,66.1538,2.85562E-45 SIF-HHPRED: ParB family protein; DNA-binding protein, CTP, Myxococcus, DNA-segregation, DNA BINDING PROTEIN; HET: GOL, UFQ; 1.7A {Myxococcus xanthus (strain DK 1622)},,,7BNR_A,93.7173,99.8 SIF-Syn: ParB-like nuclease domain protein, downstream is helix-turn-helix DNA binding domain protein, just like in phage Odyssey365 /note=Primary Annotator Name: Samudrala, Vaishnavi /note=Auto-annotation: Both Glimmer and GeneMark. Both call the start site at 3035 (codon: ATG). /note=Coding Potential: The gene has high coding potential in one of the ORFs of a forward strand. The start site suggested by both Glimmer and GeneMark is shown to cover all of the coding potential in the Host-Trained GeneMark but slightly does not in the Self-Trained GeneMark. /note=SD (Final) Score: The SD score for the start site called by both Glimmer and GeneMark has a best final score of -2.253 and the best z-score at 3.178. The z-score indicates that the binding of ribosomes to the site is much more probable as compared to other start sites. Overall, this site can be considered a good start site candidate. However, considering there is an overlap of four base pairs (gap value = -4), the RBS score may not be relevant, since the gene coded at the site may be part of an operon. /note=Gap/overlap: There is a four base pair overlap of the gene coded by the start site and another gene upstream. 
This overlap is reasonable as it suggests the gene could be coded as part of an operon. Even with this overlap, the length of the gene coded by the suggested start site is 576 bp. This is a reasonable length for adequate coding of a protein (above 120 bp). /note=Phamerator: /note=The pham the gene is found in is 132059 as of 1/9/24. Phages used for comparison include Ranunculus_10 (AP), Beagle_15 (AP2), BruhMoment_12 (AP3), MellowYellow_11 (AP2). The Pham Maps/ Phamerator data doesn’t call a function of this gene. Pham Maps/Phamerator also doesn’t call a function for this gene in phages RomansRevenge has a synteny with. /note=Starterator: The reasonable start site, that is conserved among the members of the pham to which the gene belongs, is site 15 (isn’t present in RomansRevenge). There are 69 total members in this pham of which 8 are drafts. 21 non-draft members out of the 61 non-draft genes call site 15 as the start site/ number. The Starterator is informative since there is a highly agreed upon start site between the many non-draft phages in the pham. This start site also has 21 MAs, which builds more evidence towards the idea it’s a valid start site within the other pham members that RomansRevenge is a part of. The Starterator isn’t as relevant to assessing the candidate start site for RomansRevenge in comparison to other non-draft phages in its pham. However, it tells us that there are 2 MAs which call the 3035 bp site as the start site. In this way, the start site called by both Glimmer and GeneMark is supported by other manual annotations as well. This builds confidence towards the idea of calling it as a start site. /note=Location call: The gene examined is a real gene as it has high coding potential on one of the forward strands. There are also no switches in the orientation of the coding for the gene as there isn’t the same amount of coding potential on the complementary strands. 
The gene is an adequate length for a coded protein (576 bp, greater than 120 bp). Both Glimmer and GeneMark call the same start site at 3035. Although not conserved in other phages of the same pham as RomansRevenge, Starterator suggests that there are some manual annotations supporting the call. The start site also covers all of the coding potential in the Host-Trained GeneMark. This makes the site at 3035 the best start site candidate taking into account its singleton status. /note=Function call: NCBI BLASTp results show there are at least 4 phages with high query coverage values (>99%), significant E-values (<1e-49), but with low percentage identity values (<45.41%) which all call the function of the protein coded by the gene to be “ParB-like nuclease domain protein”. PhagesDB BLASTp also shows at least two significant hits with good E-values (<1e-36) associated with non-draft phages. CDD shows that there is one significant hit within a “ParB-like nuclease domain” of the gene (E-value = 1.72e-17) which suggests the protein acts as a ParB-like nuclease. HHPred has a minimum of two significant hits (E-values <2.7e-19). Both hits call the function of the protein as “ParB-like nuclease domain protein”. Overall, HHPred and CDD hits support the function calls made by NCBI BLASTp and the protein possibly has a “ParB-like nuclease” function. NCBI BLASTp, HHPRED, and CDD all show significant hits (E-values < 1e-3) which consistently call the function of the protein as a "ParB-like nuclease" protein. NCBI BLASTp hits show high query coverage values (>99%) which suggests that close to all of the gene sequence was aligned with the query sequence; even considering low percentage identity values (<45.41%). /note=Transmembrane domains: There are no TMRs predicted by DeepTMHMM. /note=Secondary Annotator Name: Kim, Abby /note=Secondary Annotator QC: I agree with this annotation and all the evidence has been considered. 
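The function-call reasoning for gp9 amounts to checking which database hits fall below an E-value cutoff. The sketch below illustrates that check under stated assumptions: the `Hit` class, the `supporting_sources` helper, and the cutoff of 1e-3 are hypothetical choices for illustration, not SEA-PHAGES or PECAAN definitions. The E-values are copied from the gp9 notes (two of them are the upper bounds quoted there), and the gp8 PhagesDB hit (7.6) is included as a contrasting case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    e_value: float  # exact value, or the upper bound quoted in the notes


# Assumed cutoff, chosen only for illustration.
E_VALUE_CUTOFF = 1e-3

GP9_HITS = [
    Hit("NCBI BLASTp (BruhMoment)", 2.85562e-45),
    Hit("PhagesDB BLASTp (upper bound)", 1e-36),
    Hit("CDD", 1.72e-17),
    Hit("HHpred (upper bound)", 2.7e-19),
]

# Contrasting case from the gp8 record: a single hit with E-value 7.6.
GP8_HITS = [Hit("PhagesDB BLAST", 7.6)]


def supporting_sources(hits, cutoff=E_VALUE_CUTOFF):
    """Return the sources whose E-value is below the cutoff."""
    return [hit.source for hit in hits if hit.e_value < cutoff]


print("gp9:", supporting_sources(GP9_HITS))
print("gp8:", supporting_sources(GP8_HITS))
```

Under this cutoff all four gp9 sources count as support, while the gp8 hit does not, which is consistent with gp9 receiving a function call and gp8 remaining a hypothetical protein. Coverage, identity, and HHpred probability are not modeled here.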
CDS 3610 - 4101 /gene="10" /product="gp10" /function="helix-turn-helix DNA binding domain" /locus tag="RomansRevenge_10" /note=Original Glimmer call @bp 3610 has strength 13.45; Genemark calls start at 3610 /note=SSC: 3610-4101 CP: yes SCS: both ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Odyssey395] ],,NCBI, q16:s28 83.4356% 1.18544E-27 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.606, -4.420240956945979, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Odyssey395] ],,QOP66769,52.907,1.18544E-27 SIF-HHPRED: HTH_58 ; Helix-turn-helix domain,,,PF19575.3,22.0859,98.1 SIF-Syn: Helix-turn-helix, upstream gene is ParB-like nuclease domain, just like in phage Odyssey395. /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and Genemark both call the start site at 3610. /note=Coding Potential: The ORF has good coding potential on the forward strand, indicating that it is a forward gene. Coding potential is found on both GeneMark Self and Host. /note=SD (Final) Score: -4.420. This is the best final score of ORFs producing genes with an appropriate length of at least 120 bp on PECAAN. /note=Gap/overlap: Overlap: 1bp. Overlap is very small and because it is 1bp this could be an operon. /note=Phamerator: Pham: 133612. Date: 1/16/2024. RomansRevenge is a singleton so it has no other cluster members. Other members of the pham are Affeca_1 (DE) and Ailee_1 (DE). /note=Starterator: Start site 8 in starterator has no manual annotations out of the 11 genes in the pham. Start site 8 is 3610 which corresponds to the start site called by Glimmer and Genemark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 3610 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Helix-turn-helix binding domain. 
The top two phagesDB Blastp hits with a known function were terminase small subunit and helix-turn-helix DNA binding domain protein (E-values: 2e-24). However, there were more overall hits for a helix-turn-helix DNA binding protein. NCBI BLASTp also had a hit for a helix-turn-helix binding protein (83% coverage, 35% identity, and E-value of 9.31997e-28). HHpred had a hit for a helix-turn-helix domain (98% probability, coverage 22%, and E-value of 2.9*10^-5). CDD had no hits. /note=Transmembrane Domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Nathan Joseph To /note=Secondary Annotator QC: I have QCed this annotation and I agree with the location call due to the convincing z-score and final score, favoring this start as well as having a reasonable start codon and LORF with good coding potential. I also agree with the function call, with strong blast and HHpred hits pointing to this gene being a helix-turn-helix DNA binding domain. CDS 4085 - 4486 /gene="11" /product="gp11" /function="ribbon-helix-helix DNA binding domain" /locus tag="RomansRevenge_11" /note=Original Glimmer call @bp 4085 has strength 13.33; Genemark calls start at 4085 /note=SSC: 4085-4486 CP: yes SCS: both ST: NI BLAST-Start: [ribbon-helix-helix DNA binding domain protein [Arthrobacter phage BruhMoment]],,NCBI, q19:s18 86.4662% 2.27209E-31 GAP: -17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.649, -4.331019961814701, yes F: ribbon-helix-helix DNA binding domain SIF-BLAST: ,,[ribbon-helix-helix DNA binding domain protein [Arthrobacter phage BruhMoment]],,UOK18331,58.0153,2.27209E-31 SIF-HHPRED: Putative; Helicobacter pylori, repressor, transcriptional regulator, DNA-binding, ribbon-helix-helix, HP0564, JHP0511, UNKNOWN FUNCTION, GENE REGULATION; NMR {Helicobacter pylori},,,2K1O_B,33.8346,99.0 SIF-Syn: Some synteny with Odyssey395 with the upstream gene function being a helix-turn-helix DNA binding domain. 
/note=Primary Annotator Name: Vazquez, Eunice /note=Auto-annotation: Glimmer and Genemark. Both call the start at 4085. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating this is a forward gene. The coding potential is covered by the start site. /note=SD (Final) Score: -4.331 is the final score, and it is the best final score on PEECAN. The z score is 2.649. /note=Gap/overlap: -17 bp overlap. This is a reasonable overlap considering it is not 50bp or more. /note=Phamerator: As of 01/09/24 the pham number is 87648. Other phages in this pham belong to the AP cluster like SilentRX (DNA binding protein), Pureglobe5 (ribbon-helix-helix DNA binding domain protein), and BruhMoment (ribbon-helix-helix DNA binding domain protein). /note=Starterator: Start site 6 in starterator. There are no manual annotations for this start site, so it is noninformative, and it is the only phage with this start site in the pham. Start site 6 is 4085 which corresponds to the start site called by Glimmer and Genemark. /note=Location call: With all the evidence above this gene is a real gene and has a start site at 4085 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: ribbon-helix-helix DNA binding domain protein. The top two phagesDB Blastp hits with a known function were ribbon-helix-helix DNA binding domain protein (E-values: 2e-27 and 1e-23). HHpred showed that the first hit a ribbon helix helix had a high probability , low e-value , and although the coverage is lower than 35 it is 33 which is fairly close. The NCBI Blast had only one good hit which had a low e value, but the identity and coverage is low. This was also a ribbon helix helix. /note=Transmembrane domains: There are no TMRs noted by DeepTMHMM. /note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: Check CCD and HHpred, even if there are no hits or an error mention that just to be safe. 
Note whether or not all of the coding potential is covered by the start site (I think it is). Also need to fill out the synteny box if there is a function, make a note on whether or not ORF is the longest with the chosen start site for location call (I don’t think it is the longest ORF). CDS 4536 - 6038 /gene="12" /product="gp12" /function="terminase, large subunit" /locus tag="RomansRevenge_12" /note=Original Glimmer call @bp 4548 has strength 15.17; Genemark calls start at 4548 /note=SSC: 4536-6038 CP: yes SCS: both-cs ST: NI BLAST-Start: [terminase large subunit [Arthrobacter phage SilentRX] ],,NCBI, q7:s6 93.8% 0.0 GAP: 49 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.111, -2.3335289980954053, yes F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage SilentRX] ],,YP_010656397,59.1228,0.0 SIF-HHPRED: Large subunit terminase; large terminase, VIRAL PROTEIN; 2.2A {Deep-sea thermophilic phage D6E},,,5OE8_B,81.0,100.0 SIF-Syn: This gene shows synteny with BruhMoment and SilentRX in the downstream gene with the function of a membrane protein. /note=Primary Annotator Name: Aguirre, Austin /note=Auto-annotation: Both Glimmer and GeneMark were called and agreed on start site: 4548. Called start codon was ATG. 4548 is not the LORF at 1491 bp. The start site 4536 was the LORF at 1503 bp, though the called codon was TTG, which is less likely. /note=Coding Potential: Gene has strong coding potential and the start site contains all of the coding potential. However, there is a significant dip in coding potential around 4700-4900. /note=SD (Final) Score: The Final score for the start site 4548 is -4.574 with a Z score of 2.773. I disagree with this called start site as 4563 has a greater Final score of -2.334 with a greater Z score of 3.111. /note=Gap/overlap: 4536 has the smallest gap at 49 bp, while 4548 has a gap of 61 bp.These are, however, in a lower range than the other called start sites, which range in gaps of 1000bp. 
/note=Phamerator: Pham group 132044 as of 1/09/24. Function unknown and lacked synteny with other phages. No identified sub cluster. 73 pham members present. RomansRevenge has an unknown cluster, but genes within the cluster overwhelmingly point towards a terminase function. No synteny is present. /note=Starterator: Staterator is uninformative as there are no manual annotations. /note=Location call: Data points towards the start site being located at 4563 as shown by the Z and Final score as well as the smallest gap, and had the longest ORF. This is a real gene as it has strong coding potential. /note=Function call: terminase, large subunit. No synteny is present, but HHPRed and PhagesDBp BLAST both have hits of this function with significant e-values. /note=Transmembrane domains: No transmembrane domains are present. /note=Secondary Annotator Name: Laureano, Ryan /note=Secondary Annotator QC: I agree with the location call and function call. Some numbers in the PECAAN notes are incorrect such as the gap being 49 and not 45 but other than those minor issues this is a good entry. CDS 6049 - 6360 /gene="13" /product="gp13" /function="membrane protein" /locus tag="RomansRevenge_13" /note=Original Glimmer call @bp 6049 has strength 7.8; Genemark calls start at 6049 /note=SSC: 6049-6360 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Rizwana]],,NCBI, q1:s1 98.0583% 6.00268E-40 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.497, -4.458092987303931, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Rizwana]],,QWY81312,77.6699,6.00268E-40 SIF-HHPRED: SIF-Syn: This gene does appear in other phages of cluster AP like Beagle and BruhMoment and show synteny but they are NKF. 
There is no synteny with function of membrane protein /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and Genemark both call start at 6049 /note=Coding Potential: There is significant coding potential in the forward direction on both the host and self-trained maps, but on the self-trained map, there are one to two significant peaks in one of the ORFs for the reverse direction. /note=SD (Final) Score: The best final score (-4.458) belongs to 6049. It also has a Z score of 2.497 (meets the minimum). The codon is also an ATG codon. /note=Gap/overlap: The smallest gap (10 bp) belongs to the start at 6049 /note=Phamerator: 1/9/2024. Pham: 131787 contains 253 members. (3/4/2024: pham 144134) /note=Starterator: The phage does not have the most annotated start. Start 60 @ 6049 is called by other phages, such as MellowYellow and Tank, that belong to the AP clusters and are used as comparison phages based on the Phagesdb BLAST. (3/4/2024: has 42 MA at start 17 @ 6049) /note=Location call: Based on the evidence above, this is a real gene, and the call is at 6049. /note=Function call: Based on the three membrane domains found on DeepTMHMM, this gene most likely codes for a membrane protein. Nothing else can be said about the function as of now. /note=Transmembrane domains: There are 3 TMhelix membrane domains predicted. /note=Secondary Annotator Name: Ryan, Kaitlin /note=Secondary Annotator QC: I agree with all of the evidence presented above and see no changes to be made at this time.
Just remember to check the GM coding potential box CDS 6378 - 8228 /gene="14" /product="gp14" /function="portal protein" /locus tag="RomansRevenge_14" /note=Original Glimmer call @bp 6378 has strength 17.48; Genemark calls start at 6363 /note=SSC: 6378-8228 CP: yes SCS: both-gl ST: SS BLAST-Start: [portal protein [Arthrobacter phage BruhMoment]],,NCBI, q8:s10 97.4026% 0.0 GAP: 17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.895, -5.1701681203124, no F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage BruhMoment]],,UOK18335,65.9306,0.0 SIF-HHPRED: Portal protein; portal, HK97, bacteriophage, packaging, VIRAL PROTEIN;{Hendrixvirus},,,8CEZ_K,60.8766,100.0 SIF-Syn: This gene’s function is portal protein. There is synteny in the upstream and downstream phams with phages BruhMoment (AP3) and SilentRX (AP4). /note=ES: /note=Location Call: @6378 because more MA, but also better covers all coding potential. This site also has more reasonable z-score and coding potential than (79, 6462) /note= /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Glimmer calls the gene at 6378, while GeneMark calls it at 6363. /note=Coding Potential: There is significant coding potential on the forward strand of the gene, while there is no significant coding potential on the reverse strand. This is likely a real gene. /note=SD (Final) Score: The best final score and z-score are not associated with either recommended start site but with a start site at 7845. The best final score and z-score for this start site are -2.292 and 3.093, respectively. However, I do not think these correspond to the best start site, for which the final score is -5.170 and the z-score is 1.895. /note=Gap/overlap: The gap/overlap for the best final score/z-score option is 10bp, while the gap for the suggested site is 17bp. /note=Phamerator: The pham number for this gene as of 1/9/2023 is 131802. This pham is found in other phages from various clusters, coding for a portal protein.
One of these is Ailee, which is in cluster DE, it seems fairly conserved in this cluster as well as others. /note=Starterator: Start site 5 was manually annotated in 93/199 non-draft genes of this pham. Start site 5 is called in phage Ailee at 7526bp, but there is variety mostly in the 6000-7500bp range, which somewhat supports the site called by Glimmer and GeneMark. However, start site 34 is the one associated with RomansRevenge, with 7 manual annotations, and it corresponds to the start site given by Glimmer. /note=Location call: I think this is a real gene due to the coding potential and the pham which seems to code for a portal protein. The likely start site is Start site 34 at 6378bp, as mostly indicated by Glimmer and Starterator. /note=Function call: The likely function of this gene is a portal protein, as shown by the top 5+ phagesDB blast, with very low e-values indicating likelihood. The HHpred results have the top 3 phages with a low e-value (70% coverage, <1e-16 e-value). CDD and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a transmembrane protein. /note=Secondary Annotator Name: Woodward, Lauren /note=Secondary Annotator QC: I agree with both the function and location calls. CDS 14092 - 14508 /gene="19" /product="gp19" /function="hypothetical protein" /locus tag="RomansRevenge_19" /note=Original Glimmer call @bp 14092 has strength 13.39; Genemark calls start at 14092 /note=SSC: 14092-14508 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_RIZWANA_21 [Arthrobacter phage Rizwana]],,NCBI, q4:s6 85.5072% 5.5732E-12 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.033, -4.879204572359809, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_RIZWANA_21 [Arthrobacter phage Rizwana]],,QWY81318,44.2177,5.5732E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Soan, Jessica /note=Auto-annotation: Both Glimmer and Genemark call the start site at 14092. 
/note=Coding Potential: Coding potential in this ORF is only on the forward strand, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.879 is the best final score on PECAAN (Z-score 2.033). This supports the start called. /note=Gap/overlap: There is a 4bp gap. This is reasonable as the gap is very small. /note=Phamerator: Pham 5987. Date 1/9/24. All phages within Pham 5987 are in cluster AP. Found in Beagle (AP), BruhMoment (AP), and Odyssey395 (AP). No functions called for this gene. /note=Starterator: There are 11 total members, 3 are drafts. Start number 6 found in 1/11 of genes in pham (singleton). No manual annotations of this start. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 14092bp. Starterator agrees with Glimmer and Genemark. /note=Function call: No known function. The top two phagesdb BLAST hits have no known function call: BruhMoment (AP) (e = 2e-16) and Wilde (AP) (e = 1e-15). The top NCBI BLAST hit showed no known function call (97% coverage, 38% identity with BruhMoment, and E-value 1e-10). No significant HHpred hits (lowest e-value was 34 with 33.33% coverage). CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: Accidentally annotated this gene. Notes below are mine and they match. I agree with the final conclusions. /note= /note= /note= /note=Auto-annotation: Glimmer and Genemark both call the start site at 14092. /note=Coding Potential: Coding potential for this gene in Host-Trained GeneMark is valid in this region and codes only in the forward direction. Self-Trained GeneMark is also valid, as this region is coded in the forward direction. /note=SD (Final) Score: -4.879. This is the best final score on PECAAN (Z-score 2.033).
/note=Gap/overlap: The gap is 4 base pairs. This is reasonable as the gap is very small. /note=Phamerator: Pham 5987. Date 1/8/23. This gene is conserved in the AP cluster. Beagle, BruhMoment, Odyssey395, Pureglobe5, Rizwana, SilentRX, Tank, and Wilde all have this pham. /note=Starterator: Start site 6 found in 1/11 of genes in pham. No manual annotations of this start. RomansRevenge is the only phage that calls this start site (singleton). /note=Location call: Based on the evidence, the likely start site for this gene is at 14092 /note=Function call: No known function. BLASTp resulted in two best matches with BruhMoment (AP) (e = 2e-16) and Wilde (AP) (e = 1e-15). No known function was assigned in BLASTp. NCBI Blastp had 97% query cover, 1e-10, and 38% identity with BruhMoment (no known function), and 85% query cover, 6e-12, and 38% identity with Rizwana (no known function). There are no listed CDD hits. No significant HHpred hits (lowest e-value was 34). /note=Transmembrane domains: No transmembrane domains. CDS 14509 - 15429 /gene="20" /product="gp20" /function="major tail protein" /locus tag="RomansRevenge_20" /note=Original Glimmer call @bp 14509 has strength 15.51; Genemark calls start at 14509 /note=SSC: 14509-15429 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage SilentRX] ],,NCBI, q1:s1 99.3464% 2.69834E-120 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.022, -2.583959800616441, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage SilentRX] ],,YP_010656405,73.4899,2.69834E-120 SIF-HHPRED: SIF-Syn: Major tail protein. The downstream gene had no synteny with non-draft phage SilentRX, but showed upstream synteny with NKF. Gene 20 of RomansRevenge and gene 24 of SilentRX, which show synteny, are both in the same pham (131693). /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 14509.
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.584. It is the best final score on PECAAN. /note=Gap/overlap: The gap with the upstream gene is non-existent at 0 bp. The gap with the downstream gene is a little large at a 124bp gap. However, there is no coding potential in the gap that indicates there might be a new gene. /note=Phamerator: pham: 131693. Date 1/10/2024. 664 members exist in this pham and it seems there are various clusters. /note=Starterator: There are 664 total members, 81 are drafts. Start number 20 found in 11 of 664 of genes in pham (singleton). There are 8 of 583 manual annotations of this start. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 14509. /note=Function call: Major tail protein. This gene has a strong synteny with non-draft phages SilentRX and Odyssey395, which are both known to have a Major tail protein. For both of the programs, PhagesDB BLASTp & NCBI BLASTp, the top hits have scores that are >400 and strong e-value that are < 1e-93. HHpred and CDD hits were not significant given the large e-values. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Chamorro, Marco /note=Secondary Annotator QC: I agree with the start site called by the primary annotator. The Z-score and final score of the start site are the strongest, and the gene has coding potential in the forward region. I agree with the major tail protein function because PhagesDB Blastp and NCBI Blastp had strong hits to other Major Tail Protein genes. 
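The gap, overlap, and ORF-length figures quoted in these notes can be cross-checked from the CDS coordinates alone. The following is a minimal sketch, assuming 1-based inclusive coordinates and forward-strand genes; the function names `orf_length` and `intergenic_gap` are hypothetical helpers for illustration, not part of PECAAN or any other tool named in these notes.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand ORF with 1-based inclusive coordinates."""
    return stop - start + 1


def intergenic_gap(upstream_stop: int, downstream_start: int) -> int:
    """Bases between two forward-strand genes; a negative value is an overlap."""
    return downstream_start - upstream_stop - 1


# Coordinates taken from the CDS lines in this record set.
print(orf_length(4536, 6038))        # gene 12 with start 4536
print(orf_length(4548, 6038))        # gene 12 with start 4548
print(intergenic_gap(14508, 14509))  # gene 19 stop to gene 20 start
print(intergenic_gap(15429, 15554))  # gene 20 stop to gene 21 start
print(intergenic_gap(15964, 15964))  # gene 21 stop to gene 22 start (overlap)
```

Under these assumptions the results agree with the values recorded in the notes: 1503 bp and 1491 bp for the two gene 12 candidates, a 0 bp gap before gene 20, a 124 bp gap before gene 21, and a -1 bp gap (1 bp overlap) before gene 22.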
CDS 15554 - 15964 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="RomansRevenge_21" /note=Original Glimmer call @bp 15554 has strength 9.36; Genemark calls start at 15554 /note=SSC: 15554-15964 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Cytophagales bacterium]],,NCBI, q37:s44 40.4412% 6.57557E-7 GAP: 124 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.129, -4.325675514574479, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Cytophagales bacterium]],,MDX2195338,15.1751,6.57557E-7 SIF-HHPRED: Tail fiber protein; bacteriophae T7 tail proteins, VIRUS, VIRAL PROTEIN; 3.4A {Escherichia phage T7},,,7EY9_h,55.1471,92.8 SIF-Syn: /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 15554. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.326. It is the best final score on PECAAN. /note=Gap/overlap: The gap with the upstream gene is a little large at a 124bp gap. However, there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 100811. Date 1/10/2024. It is not conserved; it is an orpham. /note=Starterator: Orphams do not have Starterator reports. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 15554. /note=Function call: No known function. NCBI BLAST, CDD, and HHpred had no relevant hits. However, the top two PhagesDB BLAST hits (not including those with an unknown function) have minor tail protein function and E-values of 1e-9. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I agree with the primary annotator’s location and function call, but please change the gene’s function to NKF rather than TBD to align with your function call. Also, I believe the Starterator drop-down should say not informative rather than N/A because of the lack of data, since it’s an orpham. CDS 15964 - 17505 /gene="22" /product="gp22" /function="phosphoesterase" /locus tag="RomansRevenge_22" /note=Original Glimmer call @bp 15964 has strength 13.5; Genemark calls start at 15964 /note=SSC: 15964-17505 CP: yes SCS: both ST: NI BLAST-Start: [glycerophosphodiester phosphodiesterase family protein [Microbacterium oxydans]],,NCBI, q246:s205 51.6569% 2.0E-58 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.02, -4.6353190821692705, no F: phosphoesterase SIF-BLAST: ,,[glycerophosphodiester phosphodiesterase family protein [Microbacterium oxydans]],,WP_282847225,25.4808,2.0E-58 SIF-HHPRED: Glycerophosphoryl diester phosphodiesterase; hydrolase, metal binding protein; HET: G3P; 1.48A {Bacillus subtilis} SCOP: c.1.18.0,,,5T9C_E,46.7836,100.0 SIF-Syn: This gene has the function of phosphoesterase. There is no synteny found with other genes. Although there is conservation of the downstream (head-to-tail adaptor) genes found in phages Blab (EG) and Rasputia (GC), the upstream gene does not have the same pham numbers for the NKF function. /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and Genemark both call the start site to be 15964. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene only. High coding potential is seen on both GenemarkHost and GenemarkSelf. /note=SD (Final) Score: -4.635. This is the best final score listed on PECAAN. /note=Gap/overlap: There is a 1 bp overlap with the preceding gene. This suggests that an operon may be present.
There are no significant gaps or overlaps that suggest that this gene needs to be deleted or adjusted. /note=There is a gap of 63 bp between this gene’s stop site and the following gene’s start site. /note=Phamerator: pham: 2178, Date: 1/8/23. The gene is conserved; it is found in Huwbert (GG) and Triscuit (GG). /note=Starterator: Date: 1/8/23. Start site 12 was manually annotated in 35/56 non-draft genes in this pham. Start site 12 is not present in RomansRevenge, but start site 8 is the auto-annotated start site. There is not enough evidence in Starterator to use it as a tool for making a determination. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 15964. /note=Function call: Phosphoesterase. The top two phagesDB BlastP hits have the function of phosphoesterase (E-value < 2e-48), and multiple species of bacteria have this protein function called as phosphodiesterase, with the top 2 NCBI BLAST hits having the following: (51.65%+ coverage, 18.58%+ identity, and e-value < 1.4e-57). HHpred had a hit for a phosphodiesterase with 99.95% probability, 46.78% coverage, and E-value of 2.3e-26. CDD had one valid hit that placed the gene within the PI-PLCc_GDPD_SF superfamily. This superfamily is characterized as having genes that are assigned the function of phosphodiesterases. /note=Based on this and the approved functions list, the gene most likely has the function of phosphoesterase. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: I agree with the primary annotator’s decisions on start site and function call.
CDS 17569 - 18318 /gene="23" /product="gp23" /function="head-to-tail adaptor" /locus tag="RomansRevenge_23" /note=Original Glimmer call @bp 17569 has strength 13.39; Genemark calls start at 17569 /note=SSC: 17569-18318 CP: yes SCS: both ST: NI BLAST-Start: [head-tail adaptor [Arthrobacter phage SilentRX] ],,NCBI, q6:s3 97.992% 7.48095E-61 GAP: 63 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.361, -3.9166971242758706, no F: head-to-tail adaptor SIF-BLAST: ,,[head-tail adaptor [Arthrobacter phage SilentRX] ],,YP_010656407,60.2459,7.48095E-61 SIF-HHPRED: Adaptor protein Rcc01688; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_C,85.1406,99.7 SIF-Syn: Upstream gene is phosphoesterase and downstream gene is unknown. The upstream gene is conserved in Blab (EG) and Marcie (EG). /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Glimmer and GeneMark both call the start at 17569. /note=Coding Potential: Coding potential is on the forward strand and there is good potential as indicated on both GeneMark Self and Host. /note=SD (Final) Score: The final score is -3.917. It is the second best final score found on PECAAN. /note=Gap/overlap: 63 bp. The gap with the upstream gene is 63 bp which is slightly large. However, this gap is somewhat conserved in some other complete genomes displaying synteny with this gene such as with Tank from Cluster AP. /note=Phamerator: Pham number 125390. Date 01/05/24. It is conserved; it is found in the phages Adriana (B) and Barb (DC), in addition to many other phages from various clusters. /note=Starterator: The start site found at start number 96 was manually annotated and called in 303 of the 598 non-draft genes in this pham. This conflicts with the site predicted by both Glimmer and Genemark which called the start site at start number 137. /note=Location call: The above evidence points towards this gene being real with a likely start site of 17569 /note=Function call: Head-to-tail adaptor. 
The top three phagesdb BLAST hits of known function have the function of head-to-tail adaptor (E-value < 10^-35). 4 out of 5 of the top NCBI BLAST hits also have the head-to-tail adaptor function (95%+ coverage, 35%+ identity, E-value < 10^-39). CDD didn’t have any relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Indiresan, Neeti /note=Secondary Annotator QC: I agree with the chosen start site and functional call. I know your function is currently TBD and you’re waiting on help from Dr. Freise, but based on the evidence I agree with the tentative functional call of head-to-tail adaptor. I would double check the HHPred results list because I think there might be some evidence that you can check. Other than that, all other dropdowns and checkboxes are completed and the PECAAN notes look good. CDS 18318 - 18812 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="RomansRevenge_24" /note=Original Glimmer call @bp 18318 has strength 11.71; Genemark calls start at 18318 /note=SSC: 18318-18812 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP641_gp027 [Arthrobacter phage SilentRX] ],,NCBI, q1:s1 98.7805% 3.1684E-34 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.612, -3.3861058366776207, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP641_gp027 [Arthrobacter phage SilentRX] ],,YP_010656408,61.2121,3.1684E-34 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: Both Glimmer and GeneMark call and agree on a start site at bp 18318 with a start codon of ATG (Met). /note=Coding Potential: Gene has great coding potential within the putative ORF, as indicated by Self and Host GeneMark maps with high levels of typical coding potential.
The start site does cover all of the coding potential, as it is placed to the left of the beginning of the high peaks of the coding potential and the stop site is placed to the right on the maps. /note=SD (Final) Score: The original start site has the best SD score (least negative) at -3.386. /note=Gap/overlap: The overlap with the preceding gene (-1 bp) is reasonable and is indicative of this gene being part of an operon, strengthening the evidence for this start site. This proposed start site also creates the gene with the second longest ORF out of all the alternative start site candidates. The proposed length of the gene (495 bp) is acceptable as well. /note=Phamerator: As of 01/05/2024, this gene is found in pham 132595, which has 5 final members. Those genes belong to phages in the AP cluster. Some of these phages include BruhMoment, Rizwana, SilentRX, Tank, and Wilde. /note=Starterator: There are 5 non-draft members in this pham and an additional 3 draft members. Of the 5 non-draft members, 3 of them call start site 8. However, start site 8 is not present in RomansRevenge_24 and cannot be called. Instead, this gene calls start site 11 (at bp 18318), which no other member calls. Therefore, this Starterator report is mostly uninformative, as there does not seem to be an agreeable start site that is present in RomansRevenge_24. Also, there are very few members within this pham, which is indicative of a lack of synteny across different phages and their clusters. /note=Location call: The gathered evidence suggests this is a real gene with the original start site @ bp 18318 being correct due to its complete encompassing of the coding potential, very little overlap with the preceding gene (-1 bp), and creation of the longest reasonable ORF of all the start site candidates. /note=Function call: No program has returned any informative results as to the function of this gene and no TMDs were found.
/note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: I have QC’ed this annotation and agree with this annotation. CDS 18816 - 19310 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="RomansRevenge_25" /note=Original Glimmer call @bp 18816 has strength 12.44; Genemark calls start at 18816 /note=SSC: 18816-19310 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_RIZWANA_27 [Arthrobacter phage Rizwana]],,NCBI, q10:s6 92.6829% 4.97933E-9 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.869, -2.7653702713535186, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_RIZWANA_27 [Arthrobacter phage Rizwana]],,QWY81324,49.0566,4.97933E-9 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is identified as 18816 by both GeneMark and Glimmer. /note=Coding Potential: Based on Host-trained GeneMark and Self-Trained GeneMark there is coding potential predicted by both because the region of coding falls between the start site and the stop site. /note=SD (Final) Score: The Z score is 2.869 and the Final Score is -2.765. This is the second highest Z score however this is the gene with the least negative Final score. /note=Gap/overlap: The gap is 3bp /note=Phamerator: /note=Starterator: On 1/9/23 the pham is 6046. There are 11 members in this pham. While this gene has the most annotated start it is not called. In addition only 2 members call start site 5. /note=Location call: Start site is likely 18816 /note=Transmembrane domains: None. /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: Because a function has been called, the synteny box should be filled out. 
There is also nothing stated in the Phamerator section above, but it seems like some of the Phamerator information bled over into the Starterator explanation-- I would suggest re-checking the example notes found in the annotation lab manual to see how to divide that up. Other than that, the evidence provided looks solid, and I agree with the calls made for the start site and function. CDS 19370 - 19843 /gene="26" /product="gp26" /function="tail assembly chaperone" /locus tag="RomansRevenge_26" /note=Original Glimmer call @bp 19370 has strength 6.32; Genemark calls start at 19370 /note=SSC: 19370-19843 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly protein [Arthrobacter phage SilentRX] ],,NCBI, q1:s1 78.9809% 1.01703E-15 GAP: 59 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.253, -2.033982896655645, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly protein [Arthrobacter phage SilentRX] ],,YP_010656411,45.2055,1.01703E-15 SIF-HHPRED: Phage_TAC_10 ; Phage tail assembly chaperone,,,PF10963.11,55.414,97.7 SIF-Syn: Tail assembly chaperone, upstream gene is major tail protein, downstream gene is tape measurer protein, similar to PureGlobe5 /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Both Glimmer and GeneMark call the start site of this gene as 19370. The codon is GTG and the selected range is the LORF. /note=Coding Potential: The Host-Trained and Self-Trained GeneMark suggest that this is a real gene as both display strong coding potential that is covered between the suggested start site of 19370 and the stop of 19843. /note=SD (Final) Score: The selected gene has the highest z-score (3.253) and the least negative final score (-2.034) /note=Gap/overlap: The selected gene has a gap of 59bp which may be indicative of the need to add an additional gene, especially since there is coding potential for this gap on another frame. /note=Phamerator: As of 01/08/24, Phamerator calls the pham as 87695. 
All of the other members of this pham are in cluster AP including BruhMoment and PureGlobe5, both of which have functions listed as tail assembly chaperones. /note=Starterator: As of 01/08/24, Starterator calls the pham as 87695 and demonstrates this gene contains the most annotated start site at 19370. This pham has 11 members, 3 of which are drafts and all of which are in cluster AP besides RomansRevenge. /note=Location call: Based on evidence from the auto-annotation and Starterator, this gene is real and the start site for this gene is most likely 19370. /note=Function call: There are over 10 hits with eligible e-values that suggest the function of this gene is a tail assembly chaperone, including those found in SilentRX and Wilde. HHPRED also displays hits with probabilities around 97% with genes listed as tail assembly chaperones. /note=Transmembrane domains: There is no evidence for transmembrane properties. /note=Secondary Annotator Name: Salinas, Juan Carlos /note=Secondary Annotator QC: I agree with the function call made by the annotator. Strong coding potential within the ORF is found on both host-trained and self-trained GeneMark. Additionally, this start site has the most convincing z-score (3.253) and final score (-2.034). I also agree with the function call based on information from BLAST (SilentRX probability of 97%). NOTE: The only thing missing is selecting your evidence for the function call, selecting if Starterator was informative or not, and if the coding potential covers the whole ORF (Y/N). CDS 20099 - 23632 /gene="27" /product="gp27" /function="tape measure protein" /locus tag="RomansRevenge_27" /note=Original Glimmer call @bp 20099 has strength 12.04; Genemark calls start at 20099 /note=SSC: 20099-23632 CP: yes SCS: both ST: NA BLAST-Start: [tape measure protein [Lactobacillus sp.
UCMA15818] ],,NCBI, q579:s311 39.6771% 1.07499E-66 GAP: 255 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.671, -3.833788790802489, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Lactobacillus sp. UCMA15818] ],,WP_289977403,29.9659,1.07499E-66 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,62.9567,99.8 SIF-Syn: Gene 27 in RomansRevenge aligns with 29 in Rizwana and there is a gap present before both genes. Gene 28 in RomansRevenge aligns with gene 30 in Rizwana and both are minor tail proteins. Similar to gene 31 in Wilde. Gene 28 of RomansRevenge aligns with 32 in Wilde. Filled gap in Wilde, but not before gene 27 in RomansRevenge. /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation start source: Glimmer and Genemark both call start at 20099. /note=Coding Potential: Good coding potential is present in Self-Genemark and HostGenemark. It is present throughout the entire reading frame and only on the forward strand. /note=SD (Final) Score: -3.834. This is the largest final score among the start sites. /note=Gap/overlap: The gap is 255 bp, which is large, but there is no coding potential between this gene and the previous gene. /note=Phamerator: Pham as of 1/9/24 is 100431 and is unrelated to other genes since it is an orpham. Although the phage is not part of any cluster, a similar gene is found in Rizwana, Wilde, and Tank (phages in cluster AP1). /note=Starterator: N/A gene is orpham /note=Location call: Most likely start site is 20099 since it was called by both programs and has the highest z-score and final score. /note=Function call: Most likely function is tape measure protein. Good hits on phagesdb BLAST with e values ranging from 3e-81 to 4e-53. 4 good hits on HHpred with e values 1.1e-21 to 3.5e-10. Many good hits on NCBI BLAST ranging from 9e-53 to 2e-69. /note=Transmembrane domains: DeepTMHMM predicts 2 TMRs. This indicates that this is a membrane protein.
/note= /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: Consider looking over HHPred since percent coverage is very low for some of them (the coverage should be at least 35%). Also look at your Phagesdb BLAST: Some of these are not evidence for the function you have called (tape measure protein), and you only need to check a few (you have checked too many). However, I agree with your start site and your function call. You might choose to elaborate that NCBI BLAST does not require high identity % or alignment since we are comparing to a bacteria. CDS 23638 - 24843 /gene="28" /product="gp28" /function="minor tail protein" /locus tag="RomansRevenge_28" /note=Original Glimmer call @bp 23638 has strength 11.7; Genemark calls start at 23638 /note=SSC: 23638-24843 CP: no SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage SilentRX] ],,NCBI, q1:s1 99.7506% 1.26886E-65 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.366, -4.733054803473993, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage SilentRX] ],,YP_010656413,52.9703,1.26886E-65 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,35.9102,78.2 SIF-Syn: Minor tail protein; upstream gene is tape measure protein, downstream gene is minor tail protein, just like in phages SilentRX and Beagle. /note=Primary Annotator Name: Tran, Michelle /note=Auto-annotation: Both Glimmer and GeneMark designate the start site for this gene as 23638. /note=Coding Potential: The coding potential for this gene is indicated on both the host-trained and self-trained GeneMark in the assigned ORF. This coding potential is only found on the forward strand, which supports the annotation of this as a forward gene. /note=SD (Final) Score: The final score for this start site is -4.733, which is the best available final score for this gene on PECAAN. 
/note=Gap/overlap: There is a 5-bp gap between this gene’s proposed start site and the stop site of the upstream gene. This is a reasonably sized gap. /note=Phamerator: As of January 10, 2024, this gene is in pham 86486. This is conserved in Beagle and SilentRX (both AP). /note=Starterator: Start site 16, which corresponds to 23638 in RomansRevenge, was manually annotated in 18/21 of the non-draft genes in the pham. This evidence would agree with the auto-annotations predicted by Glimmer and GeneMark. /note=Location call: This gene is most likely real, with its start site at 23638. /note=Function call: Minor tail protein. The top three PhagesDB hits are consistent with this function (e-value <10e-47), as well as the top three NCBI Blast hits (e-value <10e-50). HHPred and the Conserved Domain Database didn’t yield any useful results due to the e-value being too high (>1). /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Kamille, Kibria /note=Secondary Annotator QC: Pham 86486 with 27 members. Conserved in many AP phages, the cluster which shows the most similarity/synteny with RomansRevenge. The start number called the most often in the published annotations is 16, and was called in 18 of the 21 non-draft genes in the pham. Start 16 corresponds with auto annotated start site at 23638. Phages DB Blast has very compelling evidence that this gene is a minor tail protein because significant hits are in the same pham and in a subcluster of AP. NCBI blast also has compelling evidence for a minor tail protein. High e values in HHPred, so nothing significant. I agree with the primary annotator`s call. 
CDS 24847 - 25974 /gene="29" /product="gp29" /function="minor tail protein" /locus tag="RomansRevenge_29" /note=Original Glimmer call @bp 24847 has strength 9.77; Genemark calls start at 24847 /note=SSC: 24847-25974 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage BruhMoment]],,NCBI, q1:s1 100.0% 1.17736E-136 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.344, -4.40029072999159, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage BruhMoment]],,UOK18350,71.3904,1.17736E-136 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,97.8667,99.7 SIF-Syn: Upstream gene is minor tail protein, the downstream gene is minor tail protein, just like in phage BruhMoment (the one immediately downstream has no function, however the one right after is minor tail protein) /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 24847. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.400. It is the second best final score on PECAAN. /note=Gap/overlap: 3 bp gap which is very small. There is no coding potential in the gap that might be a new gene as well. /note=Phamerator: pham: 131919. Date 1/10/24. It is conserved and found in FuzzBuster (Singleton) and TurkishDelight (Singleton), in addition to many other phages from various clusters /note=Starterator: Start site 33 in Starterator was manually annotated in 8 out 106 genes in this pham. Start 33 agrees with the site predicted by Glimmer and GeneMark 24847. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 24847. 
/note=Function call: Minor tail protein. The second and third phagesDB BlastP hits have the function of a minor tail protein (e-111 and e-106). The top NCBI BLAST hit and the third has the function call of a minor tail protein as well with the top hit having a query cover of 100% (e-value: 1e-136 and 57.74% identity). HHpred had a hit for tail protein with a 99.7% probability and an e-value of 2e-13. No available data in CDD. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Akkinepally, Mrudula /note=Secondary Annotator QC: I agree with the location call as it is the largest open reading frame and there is evidence of an open reading frame and good coding potential on Genemark. Furthermore, Genemark and Glimmer call the same start site. Although it doesn`t have the most annotated start site, 10 other phages share the same start site. Considering the synteny with BruhMoment and the low e-values found in BlastP, NCBI Blast, and HHPred, I also agree with the function call. CDS 25974 - 30983 /gene="30" /product="gp30" /function="minor tail protein" /locus tag="RomansRevenge_30" /note=Original Glimmer call @bp 25974 has strength 7.6; Genemark calls start at 25974 /note=SSC: 25974-30983 CP: yes SCS: both ST: NA BLAST-Start: [minor tail protein [Arthrobacter phage SilentRX] ],,NCBI, q6:s7 85.7999% 1.15334E-52 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.803, -3.0457372370466467, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage SilentRX] ],,YP_010656416,34.051,1.15334E-52 SIF-HHPRED: Probable central straight fiber; Bacteriophage, Siphophage, T5, baseplate, VIRAL PROTEIN; 3.88A {Escherichia phage T5},,,7ZQB_i,9.82624,98.6 SIF-Syn: There is synteny found in SilentRX, with the upstream genes being other minor tail proteins. 
/note=Primary Annotator Name: To, Nathan /note=Auto-annotation: Both Glimmer and Genemark call this start site at 25974 with start codon ATG. /note=Coding Potential:Both Genemark Self and Host show coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -3.046, this is the second best final score available, and the better one has an unreasonable gap. /note=Gap/overlap: Overlap of -1, this is reasonable and the preferable gap, it also allows for the largest ORF. In addition, -1 gap is indicative of a potential operon, making this start site more likely. Additionally, start sites that would change the gap have much worse Z scores and final scores. /note=Phamerator: This gene is an orpham as of 1/9/2024. No function call is given. /note=Starterator: This is an orpham so there is no starterator report. /note=Location call:The gathered evidence suggests that this is a real gene, with start site @25974. This gene has good coding potential, and does not have large gaps before or after it. The start site 25974 seems most likely due to both Glimmer and Genemark calling it, its good Z score and final score, and a start codon of ATG. /note=Function call: Predicted function is minor tail protein, based on multiple NCBI and PhagesDB BLASTs with predicted function minor tail protein and extremely low E-values, (2e-70 for minor tail protein on phage SilentRX). HHpred and CDD do not make any calls for minor tail proteins, but neither make any calls with very low e values. Additionally, the length of this gene is consistent with minor tail protein genes and is in the syntenic region of minor tail protein genes. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree with this location and functional call. 
Note: (1) You shouldn’t check more than 3 pieces of evidence for Phagesdb BLAST (2) the Starterator drop down should say “NA,” since it’s an orpham (3) HHPRED evidence shouldn’t be checked since it isn’t evidence for a minor tail protein (4) the final score is not the best final score (5) I would include coverage, identity, and e-values for the NCBI BLASTp hits. Additionally, this is a bit more nit-picky, but I would say orpham instead of orphan under the Starterator note and I wouldn’t say that the other start sites have worse Z-scores and final scores since one of them does have better scores for both (though a horrible gap). CDS 31074 - 31361 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="RomansRevenge_31" /note=Original Glimmer call @bp 31074 has strength 14.21; Genemark calls start at 31074 /note=SSC: 31074-31361 CP: yes SCS: both ST: NA BLAST-Start: GAP: 90 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.595, -4.170102567841746, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on start site 31074. /note=Coding Potential: The ORF has reasonable coding potential. The chosen start site includes all of the coding potential in the forward direction. /note=SD (Final) Score: The final score of -4.170 is the best final score provided, given that it is the least negative score available. /note=Gap/overlap: There is a 90 base pair gap. Alternate start sites do not create a longer ORF; this start site with the 90 bp gap provides the longest ORF. The gene length is also reasonable at 288 base pairs. /note=Phamerator: As of 1/9/2024, the pham is listed as 100551. There are no other members of this pham on phagesDB, and Starterator returns an error message. 
/note=Starterator: Error message provided by Starterator, since the start site cannot be compared to other start sites in similar genes in the pham, since there are no other members of the pham. /note=Location call: Based on the above evidence, being auto-annotation and the length of the ORF, the start site can be called at 31074.This appears to be a real gene. /note=Function call: 3 out of the top 5 phagesDB hits on PECAAN list a tape measure protein, while the other two are a major capsid protein and a scaffolding protein. The first hit, scaffolding protein, has an e-value of 0.91, and identities = 30%. The 2nd through 10th hits on phagesDB Blastp show tape measure proteins, with e-values ranging from 1.2-1.5. CDD showed one hit (hsdR super family, accession cl36022) for a type 1 restriction enzyme EcoKI subunit R. HHpred showed an error code, a domain of unknown function. Based on this evidence, the most supported function call would be a tape measure protein. However, this evidence is not adequate, there is extreme variability in similar phams, and incredibly poor e-values. This evidence cannot be confidently supported. For primary annotation, final call would be NKF. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with the location and function calls. All of the evidence categories have been considered for this annotation. 
CDS 31532 - 31825 /gene="32" /product="gp32" /function="membrane protein" /locus tag="RomansRevenge_32" /note=Original Glimmer call @bp 31532 has strength 6.44; Genemark calls start at 31532 /note=SSC: 31532-31825 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage BruhMoment]],,NCBI, q4:s22 86.5979% 3.53937E-6 GAP: 170 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.924, -4.837739279947301, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage BruhMoment]],,UOK18358,46.4912,3.53937E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: Genemark and Glimmer call start at 31532. Start is ATG. /note=Coding Potential: The putative ORF start site covers good coding potential and more than 80% is covered in the proposed start to stop site. /note=SD (Final) Score: The SD score is not the highest possible. The highest possible SD score is -3.306 which corresponds to the start site of 31634. /note=Gap/overlap: The gap for this gene is the lowest for the proposed stop site at 170. This suggests the possibility of a gene upstream from this one since that gap is rather large. The length of the actual gene is sufficient at 294 bp. /note=Phamerator: This gene is in pham 12066 as of 1/10/24. It is a singleton. /note=Starterator: There is a conserved start site that is reasonable. The start number is 2 and the start site is 31532 . 4/4 call start #2. /note=Location call: The gene is real as indicated by coding potential and the start site has sufficient SD score, Z score and is backed by starterator. The LORF is made via this proposed start site. /note=Function call: Based on HHpred and NCBI blast there is NKF for this gene. /note=Transmembrane domains: There is a predicted TMH indicating it may be a membrane protein. /note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: I agree with this annotation, due to the evidence and completeness. 
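The trade-off described in the gene 32 notes (the called start at 31532 gives the longest ORF, while the alternative start at 31634 has the better final score) can be sketched in a few lines. This is a minimal illustration, not part of the annotation workflow: the `StartCandidate` type and `orf_length` helper are hypothetical names, and the only numbers used are the two start sites, their final scores, and the stop coordinate quoted in the record above.

```python
from typing import NamedTuple


class StartCandidate(NamedTuple):
    """Hypothetical container for one candidate start site."""
    start: int          # start coordinate (bp) on the forward strand
    final_score: float  # RBS final score; less negative is better


STOP = 31825  # stop coordinate of gene 32, from the record


def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand ORF, counting both end coordinates."""
    return stop - start + 1


# The two candidates mentioned in the notes for gene 32.
candidates = [
    StartCandidate(start=31532, final_score=-4.838),  # auto-annotated start
    StartCandidate(start=31634, final_score=-3.306),  # best final score
]

# "Best" final score means the least negative value, i.e. the maximum.
best_by_score = max(candidates, key=lambda c: c.final_score)

# The longest ORF on the forward strand comes from the smallest start coordinate.
longest_orf = min(candidates, key=lambda c: c.start)

for c in candidates:
    print(c.start, c.final_score, orf_length(c.start, STOP))
```

Here the best-scoring start and the longest-ORF start differ, which is why the location call leans on the other evidence lines (auto-annotation agreement, Starterator, coding potential) rather than the score alone.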
CDS 31825 - 32055 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="RomansRevenge_33" /note=Original Glimmer call @bp 31825 has strength 10.35; Genemark calls start at 31825 /note=SSC: 31825-32055 CP: yes SCS: both ST: NA BLAST-Start: GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.015, -3.3633184689272606, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Glimmer and GeneMark. Both call the start site as 31825. /note=Coding Potential: High coding potential in the forward direction as indicated by only the direct sequence on both GeneMark Host and GeneMark S outputs. /note=SD (Final) Score: -3.363. This is the second best final score on PECAAN since it is the second least negative score. /note=Gap/overlap: Overlap of 1 base pair. This is a tiny overlap indicative of an operon and there is no coding potential indicating a new gene. /note=Phamerator: Pham: 100561. Date: 1/8/2024. RomansRevenge showed some similarity to phages in Cluster FH via protein BLAST hits, but when comparing the pham maps of RomansRevenge to these phages (Bolt007, Prairie, and Lilmac1015), there was no synteny observed. /note=Starterator: Starterator was not applicable (NA) due to RomansRevenge being an orpham, although it did agree with PhagesDB Phamerator that the pham was 100561. /note=Location call: Based on the above evidence, this is a real gene and the start site is 31825. /note=Function call: Function unknown, as called by the majority of the PhagesDB BLAST hits; NCBI BLAST also did not find a known function nor similarities with other proteins and their functions. HHPRED and CDD results were irrelevant. /note=Transmembrane domains: Deep TMHMM does not predict any TMDs; therefore, this is not a membrane protein. /note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I agree with the start site and no known function call. 
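The gap and overlap values quoted in these records follow from the feature coordinates. The sketch below assumes the convention that the gap is the number of bases strictly between one gene's stop and the next gene's start, so a negative value is an overlap; `gap_bp` is a hypothetical helper, and the coordinates are the forward-strand CDS ranges for genes 31 through 36 as listed in this file.

```python
def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases strictly between two adjacent forward-strand genes.

    A negative result means the genes overlap: -1 when the next start
    coincides with the previous stop coordinate, -4 when they share
    four bases, and so on.
    """
    return next_start - prev_stop - 1


# (gene number, start, stop) taken from the CDS lines for genes 31-36.
genes = [
    ("31", 31074, 31361),
    ("32", 31532, 31825),
    ("33", 31825, 32055),
    ("34", 32052, 32363),
    ("35", 32363, 33472),
    ("36", 33483, 33725),
]

# Gap reported for each gene relative to the gene immediately upstream.
gaps = {}
for (_, _, prev_stop), (name, start, _) in zip(genes, genes[1:]):
    gaps[name] = gap_bp(prev_stop, start)

print(gaps)
```

Under this convention the computed values agree with the GAP fields recorded for genes 32 through 36 (170, -1, -4, -1, and 10 bp).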
CDS 32052 - 32363 /gene="34" /product="gp34" /function="membrane protein" /locus tag="RomansRevenge_34" /note=Original Glimmer call @bp 32052 has strength 9.35; Genemark calls start at 32052 /note=SSC: 32052-32363 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Zhihengliuella halotolerans]],,NCBI, q7:s6 94.1748% 0.00261983 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.153, -4.337049294579155, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein [Zhihengliuella halotolerans]],,WP_102158664,54.0816,0.00261983 SIF-HHPRED: SIF-Syn: /note=Three TMDs /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Both Glimmer and GeneMark. Both call the start site at 32052. /note=Coding Potential: Good coding potential is found both in GeneMark Self and Host. Our start and stop are outside of the coding potential for Host. Coding potential is in the forward region so we can say that our gene is a forward gene. /note=SD (Final) Score: -4.337. It is the best final score on PECAAN because it is the least negative value. /note=Gap/overlap: Overlap of 4 bp. This overlap suggests that it can be an operon. Acceptable gene length of 312bp. /note=Phamerator: pham: 100461. Date 9/11/23. It is a singleton so there are no other phages that have this pham so it is an orpham. /note=Starterator: Starterator was not found because it is an orpham. /note=Location call: Based on the good coding potential and the reasonable overlap, this gene is real and the most likely start site is 32052. /note=Function call: Function unknown. The top two PhagesDB BLASTp hits have an unknown function; however, they have a large e-value, and the top two NCBI BLAST hits have the function of a hypothetical protein but have low identity and a high e-value. The hits given to us by HHpred all had very large e-values and low probability (<80%). /note=Transmembrane domains: DeepTMHMM does predict TMDs, therefore it could be a membrane protein. 
/note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: I agree with all the calls made from the annotator. I agree with the function call based on the blast hits. Just be sure to fill out your starterator box CDS 32363 - 33472 /gene="35" /product="gp35" /function="endolysin" /locus tag="RomansRevenge_35" /note=Original Glimmer call @bp 32363 has strength 14.87; Genemark calls start at 32363 /note=SSC: 32363-33472 CP: yes SCS: both ST: NA BLAST-Start: [N-acetylmuramoyl-L-alanine amidase [Glutamicibacter sp. FBE19]],,NCBI, q1:s1 51.7615% 4.6225E-79 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.153, -4.337049294579155, no F: endolysin SIF-BLAST: ,,[N-acetylmuramoyl-L-alanine amidase [Glutamicibacter sp. FBE19]],,MBF6672457,44.6746,4.6225E-79 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase amiD; ZINC AMIDASE, PGRP, Peptidoglycan Recognizing Protein, AmpD, N-ACETYLMURAMYL-L-ALANINE AMIDASE, Cell wall biogenesis/degradation, Hydrolase, Lipoprotein, Membrane, Metal-binding; HET: GOL, AH0; 1.75A {Escherichia coli},,,3D2Y_A,85.9079,99.8 SIF-Syn: /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 35 (stop@33472F) /note=Coding Potential: There is good coding potential demonstrated by both host-trained and self-trained GeneMark. Glimmer and GeneMark both call the start site as 32363. There are also no switches in gene orientation and the length of the gene itself is greater than 120bp. /note=SD (Final) Score: The start codon is ATG. The final score of -4.337 and z-score of 2.153 indicate a good RBS score. Thus, the original start site of 32363 is kept. /note=Gap/overlap: The overlap of -1 indicates that no gene needs to be added. The spacer of 11 is negligible. /note=Phamerator: As of 1/10/2024, there are no other genes in the pham, 100670, indicating that this gene is an orpham. /note=Starterator: As of 1/10/2024, there are no other genes in the pham, 100670, indicating that this gene is an orpham. 
Both starterator and phamerator call 100670 as the pham. /note=Location call: Strong coding potential is found between the start site of 32363 and stop site of 33472. The start site is supported by both Glimmer and GeneMark. Additionally, the final score of -4.337 and z-score of 2.153 are optimal values. The phamerator and starterator, both calling 100670, indicate an orpham. The overlap of -1 indicates no insertion of a gene. /note=Function call: According to the blastp hits, other genes indicate that the function is an endolysin. Furthermore, HHPred had a strong hit of 3D2Y_A, in which the function is N-acetylmuramoyl-L-alanine amidase amiD. This is a specific endolysin. Thus, the approved function of this gene is likely endolysin, N-acetylmuramoyl-L-alanine amidase domain. The CDD did not have any hits. /note=Transmembrane domains: According to DeepTMHMM, this is not a transmembrane protein. This is synonymous with how the protein, according to the function call, functions independently of the membrane. /note=Secondary Annotator Name: Sass, Arielle /note=Secondary Annotator QC: I agree with both the location and function calls. However, evidence from PhagesDB BLASTp, NCBI BLASTp, and HHPRED results can be selected to support the call. Auto-annotation start site agrees on 32363 bp. Coding potential covered fully. Final score is least negative and z-score is greater than 2. Gap is -1, good, indicative of an operon. Gene is in an orpham numbered 100670 as of 1/10/24. No starterator report could be produced. Two best PhagesDB BLASTp hits have e-value of 2e-12 and function of endolysin. NCBI BLASTp top hit has e-value of 4.6225e-79 and function of amidase, and second bst hit has e-value of 4.85447e-71 and unknown function. CDD top hit has e-value of e-5 and function of amidase but low coverage. Top HHPRED hit has e-value of 1.5e-16 and probability 99.8% of with function amidase/Cell wall biogenesis/degradation. 
The second best HHPRED hit has an e-value of 2.1e-10 and probability of 99.4% with function hydrolase. If there is another gene with the function lysin B called, then we can call the function lysin A, N-acetylmuramoyl-L-alanine amidase domain, but otherwise we can call the function endolysin, N-acetylmuramoyl-L-alanine amidase domain. No transmembrane domains predicted by Deep TMHMM. CDS 33483 - 33725 /gene="36" /product="gp36" /function="holin" /locus tag="RomansRevenge_36" /note=Original Glimmer call @bp 33483 has strength 14.49; Genemark calls start at 33483 /note=SSC: 33483-33725 CP: yes SCS: both ST: NI BLAST-Start: [holin [Pseudarthrobacter sp. HLT1-5] ],,NCBI, q11:s6 87.5% 1.98921E-14 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.926, -3.473611599459246, yes F: holin SIF-BLAST: ,,[holin [Pseudarthrobacter sp. HLT1-5] ],,WP_252576518,65.3333,1.98921E-14 SIF-HHPRED: Phage_r1t_holin ; Putative lactococcus lactis phage r1t holin,,,PF16945.8,83.75,99.9 SIF-Syn: This gene is a Holin Protein. No function called for the upstream gene. Downstream gene is a RusA-like resolvase (endonuclease) /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation: Glimmer and GeneMark both call the same start site of 33483. The start codon is an ATG. /note=Coding Potential: In the forward direction, the coding potential map shows coding potential covering a large proportion of the ORF. This coding potential covers over 90% of the gene, and the start site is just before where the coding potential begins, which is a great sign. /note=SD (Final) Score: -3.474, which is the least negative value. The other final scores are near -6, well below this one, so this score is a promising indicator that the gene is real. The z-score is 2.926, which is over 2, also indicating that this gene is most likely real. /note=Gap/overlap: There is a 10 base pair gap between the start of this gene and the end of the gene upstream. 
This gap is very reasonable and indicates this start site is in a reasonable position. This results in a gene that is 243 base pairs long, which is a reasonable length for a gene. /note=Phamerator: This gene is found in pham 130421 as of 1/10/2024. This phage is a singleton, but the clusters in which this gene is found include F, ED, BD, and many more. I used the phages Aubs and Avani to compare. The function for many other genes within the pham was a Holin protein. /note=Starterator: Start site 62 is only called by 1/412 genes within the pham, so it is not a very conserved start site. This start site corresponds to a base pair start site of 33483. Though this isn’t very strong evidence at all, the start site agrees with Glimmer and GeneMark, which is a good sign. /note=Location call: Based on all of the evidence, this gene is very likely a real gene. There is a conserved function across the other genes within the pham. The coding potential is very good and both Glimmer and GeneMark agree on the start site of 33483. Starterator also called this start site, indicating 33483 is the most likely start site of this gene. /note=Function call: HHPred has a hit with 99.9 probability and 83.75% coverage of the gene with a function of the Holin Protein. NCBI Blast also has another promising hit with 49.33% probability and over 87% coverage also indicating a Holin Protein. /note=Transmembrane domains: There are 2 TMDs predicted and one of them is 19 base pairs long. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: I agree with all the calls made from the original annotator. The start site aligns with the coding potential from the putative ORF. The SD and the gap scores are all reasonable values. And the functional call is accurate based upon all the cited data from HHpred and NCBI BLASTp. 
CDS complement (33783 - 34253) /gene="37" /product="gp37" /function="RusA-like resolvase" /locus tag="RomansRevenge_37" /note=Original Glimmer call @bp 34253 has strength 14.7; Genemark calls start at 34253 /note=SSC: 34253-33783 CP: yes SCS: both ST: SS BLAST-Start: [RusA-like resolvase [Arthrobacter phage Rizwana]],,NCBI, q9:s8 81.4103% 5.84683E-25 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.874, -5.515861183442005, no F: RusA-like resolvase SIF-BLAST: ,,[RusA-like resolvase [Arthrobacter phage Rizwana]],,QWY81336,57.554,5.84683E-25 SIF-HHPRED: Crossover junction endodeoxyribonuclease rusA; Homologous recombination, DNA repair, resolvase, HYDROLASE; 1.2A {Escherichia coli} SCOP: d.79.6.1,,,2H8E_A,85.8974,99.7 SIF-Syn: No observed synteny since there are no final comparisons that can be made as the other phages are drafts. /note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Glimmer identifies the start site at 34253. GeneMark identifies the start site to be 34253. Both start sites match, indicating a greater likelihood of this being the correct start site. /note=Coding Potential: There is good coding potential observed within the first open reading frame for this gene on the GeneMarkS output report. However, on the GeneMark.hmm it appears that the coding potential is minimal but still observed in the first ORF. This is still supportive evidence of coding potential and that this is a reverse gene. /note=SD (Final) Score: The SD (final) score is -5.516, though this is not the best-observed score it appears to have more supportive evidence compared to other calls. The best observed final score is -4.836. The z-score is 1.874 which is not the best of those called but is relatively close to the optimal z-score of 2. The best z-score observed is 1.996. /note=Gap/overlap: This gene has an 11bp overlap with the gene downstream. The upstream gap may be indicative of the switch from forward to reverse. 
/note=Phamerator: #132528 observed on 1/10/24. This gene has many others within its pham and it appears they are all consistently in the AP cluster with the function call of RusA-like endonuclease. Examples of these phages compared include Chubster and Beagle. /note=Starterator: In this case, I think the “most annotated” start is also the best start site because the starterator data demonstrates that this is the most called and shown in the most non-draft phages. This start has the most calls and a call number that matches the auto-annotated start. Thus, the collective evidence shows that this gene likely starts at the most annotated start. /note=Location call: Based on the above evidence, it appears this is a real gene and the start site is 34253. /note=Function call: RusA-like resolvase - based on the PhagesDB Blastp, NCBI Blastp, and HHPRED there are several matching hits indicating this gene’s function is RusA-like resolvase. The hits that contain the highest percentage of coverage and probability are those with this function. CDD does not have any hits. /note=Transmembrane domains: There are no transmembrane domains observed. All the graph data supports inside signals indicating that this gene is not a transmembrane component. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: No comments. Agree with the annotations. Just add some of the names of the phages compared to for the Phamerator. CDS complement (34243 - 34764) /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="RomansRevenge_38" /note=Original Glimmer call @bp 34764 has strength 11.02; Genemark calls start at 34764 /note=SSC: 34764-34243 CP: yes SCS: both ST: NI BLAST-Start: GAP: -7 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.844, -2.8970853586582503, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer and GeneMark both called the start site at 34764. 
/note=Coding Potential: There is coding potential in the third ORF of the reverse direction in both Host and Self-trained GeneMark, indicating that this is a reverse gene. /note=SD (Final) Score: The final score is -2.897 and the z-score is 2.844. There is one potential start site with a higher final score and z-score, but it does not include all of the coding potential. /note=Gap/overlap: There is an overlap of 7 bp with the adjacent gene, which is a reasonable size. /note=Phamerator: As of 1/8/24 this gene is a member of pham #100683. It is the only member of this pham. /note=Starterator: The gene is the only member of its pham so there are no comparisons for Starterator. /note=Location call: Based on the evidence listed above this is a real gene and the start site is 34764. The start site that had a higher final score and z-score would leave a gap of 158 bps and would not include all of the coding potential shown in the Host and Self-trained GeneMark. /note=Function call: NKF. There were no BLASTp hits with an e-value less than 10^-7. There were no significant hits for NCBI BLAST. There were no hits for CDD or HHpred. /note=Transmembrane domains: There are no TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with this annotation. None of the blast hits had significant e-values. Therefore I would agree with the call of NKF. CDS complement (34758 - 35285) /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="RomansRevenge_39" /note=Original Glimmer call @bp 35285 has strength 14.22; Genemark calls start at 35285 /note=SSC: 35285-34758 CP: yes SCS: both ST: SS BLAST-Start: [DUF5664 domain-containing protein [Arthrobacter sp. VKM Ac-2550] ],,NCBI, q4:s3 72.0% 2.91571E-44 GAP: 89 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.955, -2.6623718267059564, yes F: hypothetical protein SIF-BLAST: ,,[DUF5664 domain-containing protein [Arthrobacter sp. 
VKM Ac-2550] ],,WP_264669376,59.6026,2.91571E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: Both Glimmer and Genemark call the start site at 35285. /note=Coding Potential: Coding potential is present in both host and self genemark in the reverse direction only and covers the entire gene. /note=SD (Final) Score: -2.662. Z-score is 2.955. These are the best scores. /note=Gap/overlap: Gap is 89 base pairs. This is large, however, the coding potential matches the length of the gene and gaps in between neighboring genes. /note=Phamerator: Pham 97793 as of 1/9/24. Synteny for this pham is shown among phages in the AU and AW cluster. Gene is conserved. /note=Starterator: Report as of 1/9/24. Has start site 8 (found 1/50 genes in this pham), not a manually annotated start site. /note=Location call: The likely start site of this gene is at 35285. /note=Function call: No known function. Blastp results list the two best matches, NapoleanB (5e-36) and Circum (5e-36), as NKF. NCBI Blastp indicates dATP/dGTP diphosphohydrolase domain-containing protein from Arthrobacter sp (e-value 3e-44, 59% identity) and hypothetical protein match from phage Zeina (8e-42, 63% identity). CDD specific hit for DUF5664 superfamily (e-value 5.01e-39) for NKF protein in siphoviruses. HHpred indicates hit for dATP/dGTP diphosphohydrolase (e-value 8.6e-30, 99% probability). Overall, needs more experimental evidence to officially determine function. Other phages have this protein, but the main consensus is that the role of this protein in phages is unknown, but is known in host. /note=Transmembrane domains: None /note=Secondary Annotator Name: vazquez, eunice /note=Secondary Annotator QC: /note=Auto-annotation: Glimmer and Genemark both call the start at 35285. /note=Coding Potential: Coding potential in this ORF is found on the reverse strand only, indicating this is a reverse gene. 
/note=SD (Final) Score:-2.662 is the final score and it is the best final score on PECAAN. The z score is 2.955. /note=Gap/overlap: there is a gap of 89bp and it is large but the coding potential corresponds to this gap. /note=Phamerator: As of 1/10/24 the pham is 97793. Other phages in this pham belong to clusters AU6 and AU3 /note=Starterator: Start site 8 in starterator. There are no manual annotations for this start site and it is the only phage with this start site in the pham . Start site 8 is 35285 which corresponds to the start site called by Glimmer and Genemark. /note=Location call: With all the evidence above this gene is a real gene and has a start site at 35285 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. Other genes like Ingrid and Zeina in this pham have NKF with an e-value of ( 1e-35 and 2e-35). /note=Transmembrane domains:There are no TMRs noted by DeepTMHMM. CDS complement (35375 - 36241) /gene="40" /product="gp40" /function="thymidylate synthase" /locus tag="RomansRevenge_40" /note=Original Glimmer call @bp 36241 has strength 14.27; Genemark calls start at 36241 /note=SSC: 36241-35375 CP: yes SCS: both ST: SS BLAST-Start: [thymidylate synthase [Brevibacterium phage Cantare] ],,NCBI, q10:s4 96.875% 2.64556E-130 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.178, -2.17469248771465, yes F: thymidylate synthase SIF-BLAST: ,,[thymidylate synthase [Brevibacterium phage Cantare] ],,YP_010676624,77.6596,2.64556E-130 SIF-HHPRED: Thymidylate synthase; methyltransferase, ternary complex, dihydrofolic acid, Transferase-Transferase Inhibitor complex; HET: DHF, NOH, GOL; 1.17A {Mus musculus} SCOP: d.117.1.1,,,4EZ8_A,99.3056,100.0 SIF-Syn: /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Both Glimmer and GeneMark both called the same start site at 36241. This is a ATG codon. /note=Coding Potential: Host trained GeneMark shows coding potential in the reverse direction. 
The Self trained GeneMark (GeneMarkS) shows coding potential in the same direction. There are currently no final draft phages that are comparable, so this gene does not have synteny with others. /note=SD (Final) Score: This start site has a final score of -2.175. This is the best score, and it has the smallest gap (-8). This may indicate that the start site is within an operon. Additionally, this start site aligns well with both of the coding regions from the Host-Trained/Self-Trained Algorithms. /note=Gap/overlap: This start site contains a gap of -8. /note=Phamerator: As of 1/9/23, this gene is part of pham#127616. There are 92 members in this pham, and 7 are draft phages. /note=Starterator: Start site 50 was manually annotated in 20/80 non draft genes in this pham. Start site 50 is not present in RomansRevenge, but start site 38 is the auto-annotated start site. Starterator suggests other start sites such as (13, 36568), (16, 36433), (26, 36331), however these start sites rest outside of the region of coding potential. /note=Location call: Based on the evidence listed above this is a real gene and the start site is 36241. The start site that had the strongest final score and z-score. The gap is reasonable at -8 bps. The coding potential shown in the Host and Self-trained GeneMark align to this start site. /note=Function call: Thymidylate Synthase. NCBI Blast shows similar sequences to the brevibacterium phage Cantare, with an e-value of 3e-130, which is also a thymidylate synthase. BlastP produced several hits with Cantare, Bantam, and Arcadia, all of which suggest a Thymidylate Synthase function. CDD also produces hits that suggest that this is a Thymidylate Synthase. /note=Transmembrane domains: TmHmm predicts 0 transmembrane domains. /note=Secondary Annotator Name: Aguirre, Austin /note=Secondary Annotator QC: Annotation looks great and I agree with the location start site. However, the e-values for the HHPred hits are greater than 10^-3, which is our goal. 
I would double check if we are able to use those hits as evidence. Also, don't forget to fill out the synteny box. Great work! CDS complement (36234 - 36629) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="RomansRevenge_41" /note=Original Glimmer call @bp 36629 has strength 7.62; Genemark calls start at 36629 /note=SSC: 36629-36234 CP: yes SCS: both ST: NA BLAST-Start: GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.61, -3.662072209942156, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 36629, and the start codon is ATG. /note=Coding Potential: From the Host Trained GeneMark, this gene has reasonable coding potential in the fifth open reading frame. There appears to be a sharp drop in coding potential around 36520, but it does not seem to have an impact because coding potential increases again at 36480. From the Self Trained GeneMark, there is also reasonable coding potential in the fifth open reading frame. /note=SD (Final) Score: Among the 4 possible start sites found by auto-annotation, the start site at 36629 has the least negative final score and a reasonably high Z score. The Z score of 2.61 at start site 36629 is lower than the Z score of 2.749 for the start site at 36443. However, this is not a large enough difference to not call the start site at 36629. /note=Gap/overlap: The length of the gene is reasonable at 396 base pairs. The gap between this gene and the previous gene is 1 base pair, which is a reasonable gap. The gap for the other possible start sites is much higher, which leads me to believe that they are incorrect. /note=Phamerator: This gene did not have a useful page on the PhagesDB Pham page as it is in an orpham. /note=Starterator: The Starterator link led to an Error 404 page, which is because this gene is an orpham and does not have other genes in its pham to compare it to.
/note=Location call: By comparing the auto-annotated start sites and the coding potential graphs, I think the best start site for this gene is the ATG start site at 36629. I believe this is a real gene. /note=Function call: While Phagesdb calls this gene as a helix-turn-helix DNA binding protein in comparison to the Malibo phage, the high e-value and the lack of data from HHPRED, NCBI Blast and CDD lead me to believe this gene has no known function. /note=Transmembrane domains: Deep TmHmm predicts no transmembrane domains for this gene, so this is not a transmembrane protein. /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: I agree with the primary annotator's call for site and function. All I would change is the Starterator drop-down: since you don't have a page, it would be not informative. You also don't have to fill in the synteny box if there is no function assigned. Also you may want to include coding potential information from the self-based genemark. CDS complement (36631 - 36870) /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="RomansRevenge_42" /note=Original Glimmer call @bp 36870 has strength 11.19; Genemark calls start at 36861 /note=SSC: 36870-36631 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: 5 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.735, -5.238109312333945, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Glimmer called the start at 36870 with codon ATG (which was chosen as most likely), while GeneMark called the start at 36861 with codon TTG. This seems like a real gene, based on the auto-annotation calls. /note=Coding Potential: There seems to be a solid amount of coding potential throughout the entire gene in the reverse direction.
/note=SD (Final) Score: Auto annotation calls the Start site of 36870 with a final score of -5.238 /note=Gap/overlap: 5 /note=Phamerator: No info /note=Starterator: Starterator has a 404 error /note=Location call: The start at 36870 seems the most accurate because of the ATG start codon and the small gap of 5 bp. Although the final score of -5.238 is not ideal, it is the best of the candidates shown by auto-annotation. The Z-score of 1.735 is also fairly close to our ideal threshold of 2. No Shine-Dalgarno sequence (AAGGAGG) was observed, which raises some concern. /note=Function call: HHPred function shows nothing better than a 14 E-Value, PhagesDB shows a couple capsid protein variants that seem plausible but nothing in BLAST. Assuming NKF /note=Transmembrane domains: None /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: /note=Auto-annotation: Glimmer calls this gene at 36870, while GeneMark calls it at 36861. /note=Coding Potential: There is significant coding potential in both GeneMark self and host, indicating this is likely a real gene. /note=SD (Final) Score: The best final score is -5.238, which corresponds to this selected start site, while the best z-score is on a different start site. The z-score for the selected gene is 1.735, which is less than 2. /note=Gap/overlap: The gap for the selected gene is 5, which is in the accepted range. /note=Phamerator: As of 1/10/2024, this gene is in pham 100479. It seems this gene is an orpham, as no other hits pop up for phamerator. /note=Starterator: Since this gene is an orpham, there is no starterator information since there are no other genes to compare to. /note=Location call: I believe the best start site for this gene is the one currently indicated, at 36870. This covers all of the coding potential and has an acceptable final score and z-score. /note=Function call: HHpred did not reveal any significant results.
NCBI blast and phagesDB blast also did not have any significant matches given the very high e-values. This is likely a protein with an unknown function. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane regions, so this is most likely not a membrane protein. CDS complement (36876 - 37589) /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="RomansRevenge_43" /note=Original Glimmer call @bp 37589 has strength 15.46; Genemark calls start at 37589 /note=SSC: 37589-36876 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Arthrobacter sp. ok362] ],,NCBI, q13:s12 88.6076% 3.60356E-27 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.005, -2.4787699911121788, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. ok362] ],,WP_091559367,50.2262,3.60356E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Indiresan, Neeti /note=Auto-annotation: Both Glimmer and Genemark, start site 37589, start codon ATG /note=Coding Potential: The gene shows coding potential in the reverse direction according to both host and self. The chosen start site covers this coding potential. /note=SD (Final) Score: -2.479. It is the best RBS Final Score on PECAAN. /note=Gap/overlap: 2 bp, which is not large and contains no coding potential. /note=Phamerator: pham: 100484. Date 01/09/2024. It is an orpham. /note=Starterator: No starterator report due to gene being an orpham. /note=Location call: Based on the evidence shown above, this is likely a real gene and the most reasonable start site is 37589. /note=Function call: NKF. PhagesDB Blastp had no hits, and NCBI Blastp had no informative hits (identity did not meet the cutoff). CDD and HHPred also had no informative hits (e-values did not meet the cutoff). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: I agree with this annotation. 
All of the evidence categories have been considered. CDS complement (37592 - 37735) /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="RomansRevenge_44" /note=Original Glimmer call @bp 37735 has strength 12.27; Genemark calls start at 37735 /note=SSC: 37735-37592 CP: yes SCS: both ST: NI BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.702, -3.2575878860394365, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kalliomaa, Kira /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 37735. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: -3.258. This is the best final score on PECAAN. /note=Gap/overlap: There is an overlap of 4 base pairs. This suggests that there may be an operon present. /note=Phamerator: 100494. Date accessed 01/09/2024. Not conserved in any phages; orpham. /note=Starterator: Orpham, no data present. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 37735. /note=Function call: NKF /note=Transmembrane domains: 0; DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wang, Jordan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function.
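/note=Worked example (editorial sketch, not part of the PECAAN output): the GAP values quoted in the records above appear to follow from the CDS coordinates alone. The snippet below is a minimal sketch under that assumption. It treats each reverse-strand gene's start as its higher coordinate and measures the distance to the lower coordinate of the neighbor at higher coordinates, with a negative result meaning an overlap. The names `Cds` and `gap_to_upstream` are hypothetical helpers introduced for illustration; the coordinates are the ones printed in the gp42 to gp45 records.

```python
from typing import NamedTuple


class Cds(NamedTuple):
    """A CDS given by its 1-based, inclusive coordinates (hypothetical helper)."""
    name: str
    left: int   # lower coordinate (the stop side for a reverse-strand gene)
    right: int  # higher coordinate (the start side for a reverse-strand gene)


def gap_to_upstream(gene: Cds, upstream: Cds) -> int:
    """Bases between a reverse gene's start and the neighbor at higher coordinates.

    A positive value is a gap in bp; a negative value is an overlap in bp.
    Assumption: this is how the GAP field in the notes is derived.
    """
    return upstream.left - gene.right - 1


# Coordinates copied from the CDS lines of the records.
gp42 = Cds("gp42", 36631, 36870)
gp43 = Cds("gp43", 36876, 37589)
gp44 = Cds("gp44", 37592, 37735)
gp45 = Cds("gp45", 37732, 38103)

for gene, neighbor in [(gp42, gp43), (gp43, gp44), (gp44, gp45)]:
    print(gene.name, gap_to_upstream(gene, neighbor))
```

With these coordinates the sketch gives 5, 2, and -4 bp, which agree with the GAP fields recorded for gp42, gp43, and gp44.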
CDS complement (37732 - 38103) /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="RomansRevenge_45" /note=Original Glimmer call @bp 38103 has strength 6.96; Genemark calls start at 38103 /note=SSC: 38103-37732 CP: yes SCS: both ST: NA BLAST-Start: GAP: -29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.253, -2.0162541296952132, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Potter, Sofia /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 38103. ATG is the start codon for this site. /note=Coding Potential: GeneMark Self and Host both show good coding potential within the complementary strand for this gene, extending through the vast majority of the ORF. The direct sequence does show a peak of possible coding potential towards the end of this gene, but it does not show consistent coding potential in the region like that in the complementary strand (which is ideal, because this is a reverse gene). /note=SD (Final) Score: The final score for the selected start site is -2.016. This is the best final score on PECAAN. /note=Gap/overlap: The autoannotated start site shows an overlap of 29 bp, which is not ideal, but the other two options show large gaps of 55 and 91 bp, respectively. /note=Phamerator: As of January 10, 2024, this gene is in pham 100494, which is an orpham. /note=Starterator: Because this gene is a member of an orpham, Starterator does not yield results. /note=Location call: Despite the 29 bp overlap, given the options of start sites, the most likely start site is still 38103, because the final score is significantly better than the other options, this is the LORF, and the Z-score is above 2. With this phage being a singleton, there is no Phamerator data to support (or deny) it being a real gene, but the coding potential suggests that it is real.
/note=Function call: Due to all top Phagesdb BLAST hits being unknown function, and the top HHPred result (with an e-value of 0.03, which is still not ideal) being a family of unknown function, this gene is too ambiguous to definitively call a function. NCBI BLAST does not return any results, and the CDD results only return a provisional family, which again supports a call of NKF. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains. /note=Secondary Annotator Name: Richard, Ketan /note=Secondary Annotator QC: This all looks very good. I agree with all of the comments and annotations. CDS complement (38075 - 38389) /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="RomansRevenge_46" /note=Original Glimmer call @bp 38389 has strength 8.55; Genemark calls start at 38389 /note=SSC: 38389-38075 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Glutamicibacter creatinolyticus]],,NCBI, q4:s1 90.3846% 4.84614E-17 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.391, -3.914968956777623, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Glutamicibacter creatinolyticus]],,WP_346921319,61.2903,4.84614E-17 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Salinas, Juan Carlos /note=Auto-annotation: Glimmer and GeneMark both call the start site at 38389. Although this is not the LORF, the statistical values such as the Z-score (2.391) and final score (-3.195, highest) favor this call. At this start site, the gene is 330bp long, which meets the expected length of 120 for a typical protein-coding gene. /note=Coding Potential: Coding potential exists in both GeneMark and Host-Trained GeneMark within the start and end site of the predicted gene. In both cases, this ORF is on the reverse strand only, suggesting that this is a reverse gene. This is also evidence that this is a real gene. /note=SD (Final) Score: -3.195. This is the highest (least negative) score on PECAAN. 
All other suggestions are more negative. The Z-score is 2.391, a favorable value as it is over 2, providing significant evidence for this call. /note=Gap/overlap: -4 bp. This is a favorable overlap that is not excessive. All other gaps suggested by other start site predictions on PECAAN are over 50 bp, suggesting that the start site at 38389 is the most likely given that it is most favorable. /note=Phamerator: 100796. Date 01/09/2024. This gene is the only member of this pham according to phagesdb. Since this is a singleton phage, conservation of this pham does not exist in any other cluster or subcluster for any phage. /note=Starterator: There is no starterator information for this gene as it is an orpham, and there is no information available to compare it to. In this case, the starterator is uninformative. /note=Location call: Given the available information, the most likely start site is 38389. This was called by both Glimmer and GeneMark. Additionally, the coding potential extends throughout the ORF in both GeneMark and Host-Trained GeneMark. This call also holds the most favorable Final score (-3.195) and Z-score (2.391). /note=Function call: NKF. No programs provided informative values of function calls. NCBI and CDD contained no data as to a function prediction for this gene. PhagesDB BLAST provided two hits with a relatively high e-value of 2, while HHPRED’s hit was less significant with an e-value of 10. Both programs were inconsistent with their function call (PhagesDB: DNA Binding Protein; HHPRED: Transcription DNA Complex). None of these programs provide significant evidence of a function call, hence, this is being called NKF. /note=Transmembrane domains: 0. This is not a membrane protein and DeepTMHMM does not predict any transmembrane domains. /note=Secondary Annotator Name: Zamora, Alexandra /note=Secondary Annotator QC: I agree with this location and function call. All the evidence categories have been considered.
This is a real gene with no known function. I agree with the annotation of this gene. CDS complement (38386 - 38715) /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="RomansRevenge_47" /note=Original Glimmer call @bp 38715 has strength 9.76; Genemark calls start at 38715 /note=SSC: 38715-38386 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Cryobacterium zhongshanensis] ],,NCBI, q3:s11 93.578% 3.88788E-7 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.946, -2.66371296573402, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Cryobacterium zhongshanensis] ],,WP_243013106,48.6726,3.88788E-7 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Zaragoza, Evelin /note=Auto-annotation: GeneMark and Glimmer. Both call the start at 38715, which is reasonable because, while not the longest possible ORF, the start site has a large Z-score (statistical analysis of all final scores of that gene) of 2.946, where a more positive number above 2 is better. The final score is the highest at -2.664, which suggests the best sequence match to the Shine-Dalgarno sequence (likelihood ribosome binds to it). The start codon, ATG, is common. Length is acceptable at 330bp. LORF with start site 38772 is not chosen since it has a smaller final score. /note=Coding Potential: Gene likely exists as it is greater than 120bp with this start site. Chosen start site (38715) has most of the coding potential (in Self GeneMark and Host GeneMark) with the ORF showing reasonable potential so there is no need to push the start site back. No significant overlapping coding potential on the direct sequence. /note=SD (Final) Score: Final score is -2.664 with the z-score being 2.946. The z-score being over 2 is favorable and the final score is the highest out of all the other possibilities so this suggests this start site is likely. /note=Gap/overlap: There is little overlap (4bp) with gene to the right and little overlap with the gene to the left (4 bp). 
Gene may be part of an operon. While there is overlap, this is still reasonable as it is very small and almost insignificant. Coding potential does not significantly overlap with that of the gene to the right. The high z-score and high final score support this conclusion. /note=Phamerator: The pham number is 9414 as of 01/05/2024. It is conserved and found in non-draft phages such as Dewdrop, Hendrix, Leaf, and Rasputia, which are all part of the cluster GC. However, RomansRevenge is a singleton. /note=Starterator: Start site 8 in Starterator was manually annotated in 4/4 non-draft genes in pham 9414. However, RomansRevenge is a singleton that does not have start site 8 as a possibility and thus, for RomansRevenge_46, it is not called. This evidence suggests that the site predicted by Glimmer and GeneMark (which corresponds to start site 9 in Starterator) is still the best option. Other candidate starts for RomansRevenge_46 include: (4, 38772), (5, 38763), and (13, 38487). However, these are not the best options since all these start sites have lower final scores. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 38715. The gene does not need to be deleted because the coding potential proves that there needs to be a gene there and the start site called for this particular gene has most of the coding potential. /note=Function call: NKF. There is no significant evidence on HHPred, CDD, Phagesdb BLAST, or NCBI BLAST that gives us information on the function (e-values too high). /note=Transmembrane domains: DeepTMHMM predicts no TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Soan, Jessica /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
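/note=Worked example (editorial sketch, not part of the PECAAN output): several notes above judge a gene by whether its length clears 120 bp. The snippet below is a minimal sketch of that check, assuming length is counted inclusively from the two CDS coordinates. The additional requirement that the length be a multiple of three is an assumption added here for illustration and is not stated in the notes. `cds_length` and `passes_length_check` are hypothetical names.

```python
MIN_GENE_LENGTH_BP = 120  # minimum length quoted in the annotator notes


def cds_length(left: int, right: int) -> int:
    """Length in bp of a CDS with 1-based, inclusive coordinates."""
    return right - left + 1


def passes_length_check(left: int, right: int,
                        minimum: int = MIN_GENE_LENGTH_BP) -> bool:
    """True if the CDS meets the minimum length and is a whole number of codons.

    The codon check (length % 3 == 0) is an illustrative assumption.
    """
    length = cds_length(left, right)
    return length >= minimum and length % 3 == 0


# gp47 coordinates from its CDS line: complement (38386 - 38715)
print(cds_length(38386, 38715))           # length used in the gp47 notes
print(passes_length_check(38386, 38715))
```

For gp47 this gives 330 bp, matching the length stated by the primary annotator, and the check passes.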
CDS complement (38712 - 38999) /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="RomansRevenge_48" /note=Original Glimmer call @bp 38999 has strength 7.31; Genemark calls start at 38999 /note=SSC: 38999-38712 CP: yes SCS: both ST: NA BLAST-Start: GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.253, -2.606079664606164, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kibria, Kamille /note=Auto-annotation: Glimmer and GeneMark both call the start site at 38999 /note=Coding Potential: Self trained and Host Trained GeneMark shows reasonable coding potential, and show a start site predicted by GeneMark and Glimmer. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -2.606. Best score out of all candidates. /note=Gap/overlap: -1. This is reasonable and indicative of potential operon. /note=Phamerator: Pham 100705 on 1/8/24. Only gene in the pham. AP cluster appears to be most related but no phamerator page for final AP phages. /note=Starterator: No starterator report for orphams. 1/8/24. /note=Location call: The gene is real and has a start site of 38999. /note=Function call: NKF. From PhagesDB Blast, there were not “good” hits, but there were a few hits with a score of 38 and an e value of 0.007. Juliet approved to check as evidence. NCBI pBLAST generated one “okay” hit with a score of 47.8, coverage of 80%, e value of 0.001, and percent identity of 39.47%. There is a screenshot in my notebook but this hit was not present in PECAAN, however, so I could not “check” it as evidence. CDD generated no hits at all. HHpred generated no significant hits (e values >30). /note=Transmembrane domains: No TMDs detected by TMHMM. Not a membrane protein. /note=Secondary Annotator Name: Bhattarai, Aryan /note=Secondary Annotator QC: /note=Based on the above evidence, I agree with the primary annotator. This a real gene and the location call is correct. 
There isn't sufficient evidence to determine the function of this gene. The function call of NKF is correct. Note: I would mention that this gene is an orpham. /note=Below are my notes: /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 38999. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -2.606. It is the best final score on PECAAN. /note=Gap/overlap: -1 bp, which is not large and contains no coding potential. This is indicative of a potential operon. /note=Phamerator: pham: 100705. Date 1/15/2024. It is not conserved; it is an orpham. /note=Starterator: Orphams do not have Starterator reports. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 38999. /note=Function call: No known function. Phagesdb BLAST, NCBI BLAST, CDD, and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. CDS complement (38999 - 39250) /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="RomansRevenge_49" /note=Original Glimmer call @bp 39250 has strength 10.8; Genemark calls start at 39250 /note=SSC: 39250-38999 CP: yes SCS: both ST: NA BLAST-Start: GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.022, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer and Genemark both call the start site 39250. /note=Coding Potential: Coding potential in this ORF is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.095 /note=Gap/overlap: There is a 14 bp overlap with the previous gene and 0 bp overlap with the following gene.
This is greater than the 30 bp overlap, but it looks like the best option out of those shown on the list. /note=Phamerator: Pham: 100609. Date: 01/09/24. It is a singleton. /note=Starterator: Page not found when attempted to access through PECAAN. Also not found in the Washington University St. Louis database of uploaded starterator pdfs. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39250. /note=Function call: The top three phagesdb BLAST hits: YungMoney_19, RNA ligase, 183 (e value 2.0), Mutzi_19, RNA ligase, 184 (e value 2.0) TillyBobJoe_17, RNA ligase, 182 (e value 2.6); No significant hits on NCBI Blast. HHpred had no good hits. CDD had no hits. Given that the RNA Ligase hits were low on the list, and there were no hits with good e-values, the best call for this gene is unknown function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kang, Alix /note=Secondary Annotator QC: /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.095. It is the best final score on PECAAN and the z score is the highest at 3.022. /note=Gap/overlap: The overlap with the upstream gene is 14. /note=Starterator: Orphams do not have Starterator reports. /note=Function call: No known function. Phagesdb BLAST, NCBI BLAST, CDD, and HHpred had no relevant hits. 
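/note=Worked example (editorial sketch, not part of the PECAAN output): the function calls above repeatedly discard hits whose e-values miss a cutoff before deciding on NKF. The snippet below is a minimal sketch of that filtering step. The cutoffs (10^-7 for BLAST, 10^-3 for HHpred) are the ones mentioned in the annotator notes; applying the BLAST cutoff to both PhagesDB and NCBI, using "less than or equal", and taking the function of the lowest e-value hit are simplifying assumptions. `Hit`, `informative_hits`, and `call_function` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str    # "phagesdb_blast", "ncbi_blast", or "hhpred"
    subject: str   # name of the matched gene or protein
    function: str  # function reported for the match
    evalue: float


# Cutoffs as quoted in the notes; how they map to each tool is an assumption.
CUTOFFS = {"phagesdb_blast": 1e-7, "ncbi_blast": 1e-7, "hhpred": 1e-3}


def informative_hits(hits, cutoffs=CUTOFFS):
    """Keep only hits whose e-value is at or below the cutoff for their source."""
    return [h for h in hits if h.evalue <= cutoffs[h.source]]


def call_function(hits):
    """Return the function of the best surviving hit, or 'NKF' if none survive."""
    kept = informative_hits(hits)
    if not kept:
        return "NKF"
    return min(kept, key=lambda h: h.evalue).function


# The three PhagesDB BLAST hits listed for gp49.
gp49_hits = [
    Hit("phagesdb_blast", "YungMoney_19", "RNA ligase", 2.0),
    Hit("phagesdb_blast", "Mutzi_19", "RNA ligase", 2.0),
    Hit("phagesdb_blast", "TillyBobJoe_17", "RNA ligase", 2.6),
]
print(call_function(gp49_hits))
```

None of the gp49 hits clears the cutoff, so the sketch returns NKF, in line with the annotators' call of unknown function.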
CDS complement (39237 - 39452) /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="RomansRevenge_50" /note=Genemark calls start at 39452 /note=SSC: 39452-39237 CP: yes SCS: genemark ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.88, -3.091876957011931, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation start source: GeneMark calls the gene and calls it at 39452. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, but coding potential is found on both the forward and reverse strands. A gene is only officially called, however, on the reverse strand, indicating that this is likely a reverse gene. /note=SD (Final) Score: The final score is the best option at -3.092 and the z score is the highest at 2.88. /note=Gap/overlap: Gap: -4, which is a very favorable overlap, likely indicating it is part of an operon. /note=Phamerator: Pham: 100712. Date: 01/09/2023. It is an orpham. /note=Starterator: This gene is an orpham and therefore does not have a Starterator report. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 39452. /note=Function call: HHpred returned hits whose e-values were all 21 or above and whose probabilities were 58 or below. Some hits returned with coverage greater than 35%, but had e-values and probabilities that were not sufficient to claim as evidence. PhagesDB BLASTp, NCBI BLASTp, and CDD all returned no hits. Consequently, the function of this protein cannot be determined at this time. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Saud, Arnav /note=Secondary Annotator QC: /note=Based on evidence, I agree with the primary annotator. This is a real gene and the location call is correct.
There is not enough evidence to determine the function of this gene; the function call of NKF is correct. CDS complement (39449 - 39751) /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="RomansRevenge_51" /note=Original Glimmer call @bp 39751 has strength 14.8; Genemark calls start at 39751 /note=SSC: 39751-39449 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Glutamicibacter arilaitensis] ],,NCBI, q49:s167 52.0% 8.51089E-15 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.366, -3.905685530420168, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Glutamicibacter arilaitensis] ],,WP_102598081,19.7248,8.51089E-15 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Glimmer and GeneMark both call the start at 39751. /note=Coding Potential: Coding potential in this ORF is only on the reverse strand which indicates that this is a reverse gene. Coding potential is found in GeneMark Self and Host. /note=SD (Final) Score: -3.854. It is the best final score on PECAAN. /note=Gap/overlap: The overlap is 8 bp. The overlap is reasonable. /note=Phamerator: Pham: 100616. Date 01/09/2024. Gene is an orpham. /note=Starterator: There is no Starterator report because the gene is an orpham. /note=Location call: Based on the above evidence, this is a real gene. The most likely start site is 39751. /note=Function call: Unknown Function. No strong phagesDB BLAST hits. The best e-value was 2.6 so the hit was not considered. No strong HHpred hits. Probability for hits was 57.39% and below, with high e-values (greater than 101). Only one NCBI Blastp hit returned for unknown function (coverage 52%, e-value 6.68968e-15, identity 16.5138%). No strong CDD hits were returned. /note=Transmembrane domains: DeepTMHMM does not predict the presence of any TMDs, and therefore, it is not a membrane protein. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation. 
All of these evidence categories have been considered. However, the primary annotator needs to make sure both drop-down menus have had options selected (missing All GM Coding Capacity). CDS complement (39744 - 40049) /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="RomansRevenge_52" /note=Original Glimmer call @bp 40049 has strength 5.19; Genemark calls start at 40049 /note=SSC: 40049-39744 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Glutamicibacter sp. BW80] ],,NCBI, q4:s15 60.396% 1.05254E-13 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.631, -3.2669815101762634, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Glutamicibacter sp. BW80] ],,WP_096284063,38.5321,1.05254E-13 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation: GeneMark and Glimmer both call the start site at 40049. /note=Coding Potential: Coding potential is found in the reverse strand in Host Trained GeneMark and GeneMarkS, meaning this is a reverse gene. /note=SD (Final) Score: The final score is -3.267, which is the best option. The z score is also the best option, at 2.631. /note=Gap/overlap: There is a gap of -8bp, which appears to be large but is fine so long as it is not more than 30bp. /note=Phamerator: Pham 100515 as of 1/9/24. It is an orpham. /note=Starterator: There is not a starterator report for an orpham. /note=Location call: Based on the evidence above, this is likely a real gene with a start site of 40049. /note=Function call: PhagesDB blast returned 35 hits, all of which had e values above 0. NCBI Blastp returned 2 hits with e values below 10^-7, both of which were hypothetical proteins. There were not any CDD hits. All HHPred hits e had scores of 99+. A function cannot be assigned to the gene. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore this is not a membrane protein. 
/note=Secondary Annotator Name: Akkinepally, Mrudula /note=Secondary Annotator QC: Considering Glimmer and Genemark both call the same start site and Genemark shows an open reading frame with good coding potential, I agree with this start site. Considering the lack of hits from CDD, the high e-values in PhagesDB Blast and HHpred, and the low e-values for hypothetical proteins on NCBI BlastP, I agree with the no known function call. CDS complement (40042 - 40359) /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="RomansRevenge_53" /note=Original Glimmer call @bp 40359 has strength 8.84; Genemark calls start at 40209 /note=SSC: 40359-40042 CP: yes SCS: both-gl ST: NA BLAST-Start: [hypothetical protein [Glutamicibacter arilaitensis] ],,NCBI, q60:s68 35.2381% 2.70313E-4 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.869, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Glutamicibacter arilaitensis] ],,WP_102598536,24.1379,2.70313E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation start source: Glimmer says 40359 with start codon ATG and GeneMark says 40209 with start codon GTG. /note=Coding Potential: The gene has some coding potential within the ORF which is shown in both Host and Self GeneMarks in the reverse direction. The 40359 start site covers all of the coding potential. /note=SD (Final) Score: Final score is -2.906 for 40359. /note=Gap/overlap: Overlap of -4 for 40359. This is within the acceptable range. /note=Phamerator: 100624. 1/10/24, orpham. /note=Starterator: Page not found because it is an orpham. /note=Location call: Based on the evidence above, this is a real gene and the start site is 40359. /note=Function call: NKF. PhagesDB has several hits for PAPS reductase-like domain protein with e-values of 0.52. 
The top two HHpred hits are for cytochrome c-type biogenesis protein with 70.83% and 67.63% probability and e-values of 8.4 and 12, which are too weak to count as evidence. This function is also not included in the SEA-PHAGES function list. No CDD hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs. This is not a membrane protein. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree with this location and functional call. Note: (1) the final score is the best final score (2) the drop down menu that says that the start site covers all of the coding potential needs to be selected (3) I would clarify that the Starterator report is not available because it is an orpham (4) I wouldn’t necessarily indicate what the functions are of the various hits - I would just say that the e-values, coverage, probabilities, and identities are too low to be significant. Additionally, this is a bit nit-picky, but I would indicate which start site is the one that covers all of the coding potential under start site and I would mention why the gap is acceptable under gap. Under Phamerator, I would also say “orpham” instead of “singleton” because singleton refers to the phage as a whole and not just the gene. 
CDS complement (40356 - 40898) /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="RomansRevenge_54" /note=Original Glimmer call @bp 40898 has strength 17.31; Genemark calls start at 40898 /note=SSC: 40898-40356 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Zhihengliuella flava] ],,NCBI, q7:s6 92.2222% 1.91471E-35 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.738, -3.0409675880108447, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Zhihengliuella flava] ],,WP_196834724,57.4586,1.91471E-35 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Both Glimmer and GeneMark agree with start site 40898 and start codon ATG. /note=Coding Potential: Reasonable coding potential in reverse direction due to solid black line. /note=SD (Final) Score: Start site 40898 has a final score of -3.041, which is the closest to zero and therefore the best choice. The Z-score for this start site is 2.738, which is okay, but the other start sites with lower z scores have much lower final scores. /note=Gap/overlap: There is a 4 bp overlap, which is acceptable although this start site is not the LORF. It is 543 bp in length, which is not too far from the LORF, which is 660 bp in length. /note=Phamerator: As of 01/10/2024, pham #100825. It is an orpham/belongs to cluster singleton. /note=Starterator: No starterator report as of 01/10/2024 due to the gene being an orpham. /note=Location call: Due to both Glimmer and Genemark agreeing with the start site of 40898, the final score being the best score at this site, and the reasonable coding potential, I believe this is a real gene with a start site at 40898. /note=Function call: According to blastp, there are about 18 hits all with unknown functions and low scores. HHpred has 90% coverage on no known function. Same goes for NCBI hits with no known functions and low coverage %. There are no CDD hits. No function can be called until further research is done. 
/note=Transmembrane domains: No transmembrane domains /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: The starterator menu has not been filled out yet. Also, I think you could add to the coding potential notes by specifying which direction you found coding potential in and adding what database stated there were no TMDs. I think also you should add that the gene is an orpham in explaining why there is no starterator report. CDS complement (40895 - 41248) /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="RomansRevenge_55" /note=Original Glimmer call @bp 41248 has strength 10.32; Genemark calls start at 41248 /note=SSC: 41248-40895 CP: yes SCS: both ST: NA BLAST-Start: GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.253, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: GeneMark and Glimmer agree on a start site at 41248 bp. /note=Coding Potential: There is coding potential predicted by Host-trained and self-trained GeneMark and the chosen start site covers all of the coding potential. /note=SD (Final) Score: -2.034, the best final score on PECAAN. The Z-score is the highest possible at 3.253. /note=Gap/overlap: Gap is 21 which is below the recommended 50bp limit and therefore acceptable. /note=Phamerator: The gene is in an orpham numbered 100524 as of 1/9/2024. /note=Starterator: This gene is an orpham so no Starterator report could be produced. /note=Location call: The gathered evidence suggests that the gene is real and starts at 41248 bp. /note=Function call: No known function. No programs returned statistically informative results. /note=Transmembrane domains: No transmembrane proteins were predicted by DeepTMHMM. 
/note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: I agree with the primary annotation; This appears to be a real gene with a start site of 41248 bp with coding potential on the reverse strand. CDS complement (41270 - 42046) /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="RomansRevenge_56" /note=Original Glimmer call @bp 42046 has strength 13.47; Genemark calls start at 42046 /note=SSC: 42046-41270 CP: yes SCS: both ST: NA BLAST-Start: GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.457, -3.9844811943414205, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Both Glimmer and GeneMark are utilized, both agreeing with the start site at #42046, calling start codon ATG. /note=Coding Potential: Within both the host-trained and self-trained coding potential graphs there is consistent reasonable coding potential throughout the entire putative ORF. /note=SD (Final) Score: The SD score for the candidate start site at #42046 is -3.984, which is the second best final score from all potential start sites. Suggesting potential for the most credible ribosome binding site. /note=Gap/overlap: There is a 15bp gap which is a reasonable value. This start site creates the longest ORF (777bp) and is the best candidate. All alternative start sites produce smaller ORFs with larger unacceptable gaps (greater than 57 base pair gaps). /note=Phamerator: As of January 9th, 2024 this gene is an orpham, pham #100432. There is no function called. /note=Starterator: This gene is an orpham and as a result there is no starterator report to compare other start sites with. /note=Location call: I believe this is a real gene at start site #42046 due to high coding potential within the putative ORF, a reasonable SD score and a valid gap of 15bp. 
/note=Function call: There were multiple phagesDB BLAST hits, which had e-values of about 1e-5, all of which had no known function. The various HHpred hits all returned with poor e-values of at least 13, the highest reported coverage was 22%. In addition there were no hits from NCBI BLASTp or CDD. Consequently, no function may be called. /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. /note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I agree with the call that 42046 is the best start site as it has the smallest gap and covers all of the coding potential. CDS complement (42062 - 42310) /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="RomansRevenge_57" /note=Original Glimmer call @bp 42310 has strength 10.44; Genemark calls start at 42310 /note=SSC: 42310-42062 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.462, -6.3867082620457305, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Samudrala, Vaishnavi /note=Auto-annotation: Both Glimmer and GeneMark. Both call the start site at 42310 (codon: GTG). /note=Coding Potential: The gene has high coding potential in one of the complementary or reverse strands. The start site suggested by both Glimmer and GeneMark covers all of the coding potential. The start site suggested by both Glimmer and GeneMark covers all of the coding potential in the Host-Trained GeneMark but only partially in the Self-Trained GeneMark. /note=SD (Final) Score: The SD score for the start site called by both Glimmer and GeneMark has a final score of -6.387 and a z-score at 1.462. For the start site, The z-score isn’t the best z-score out of all of the potential start sites and the final score isn’t the best as well. 
However, it does have the potential to be a ribosome binding site since the z-score implies that the ribosomes will bind to the site at a much higher probability than average. Also, due to this 4 bp overlap, the gene may be coded as a part of an operon (the final score may not be as relevant for assessment of the start site). /note=Gap/overlap: Since there is only a 4 bp overlap, there is reasonable overlap between the target gene and the gene beside it. This is because the overlap can be used to deduce that the gene may be a part of an operon. The site called by Glimmer and GeneMark could be the best start site candidate because it has the most reasonable overlap value (4 bp) in comparison to other start sites. The length of the gene is 249 bp, taking the called start site into consideration. This length is reasonable for coding a protein. /note=Phamerator: The gene is found in the pham 100639 (as of 1/9/24). It’s likely an orpham since no Starterator report has been generated. /note=Starterator: The Starterator report has not been generated as of 1/9/24. There may be no other phages to compare this gene to. /note=Location call: The target gene is real considering that a start site is called for it by both the Glimmer and GeneMark auto-annotation software. There are no switches in the orientation in which the gene is coded since high coding potential is detected in a complementary strand only (when examined in Self and Host Trained GeneMark). The gene has an appropriate length to code for a protein (249 bp > 150 bp). Considering that the overlap value of the GeneMark and Glimmer called start site is in a reasonable range of values and covers all of the coding potential in (at least) the Host-Trained GeneMark, the start site 42310 is the best so far. /note=Function call: There were no significant hits on PhagesDB BLASTp (the only hit was with RomansRevenge itself) and on NCBI BLASTp as well. There were also no significant hits on CDD and HHPred. 
/note=Transmembrane domains: There are no TMRs noted by DeepTMHMM. /note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: I agree with your calls. Just make sure to fill out the starterator box. Also maybe mention that there is no starterator report due to it being an orpham if you'd like. CDS complement (42307 - 42633) /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="RomansRevenge_58" /note=Original Glimmer call @bp 42633 has strength 9.82; Genemark calls start at 42633 /note=SSC: 42633-42307 CP: yes SCS: both ST: NA BLAST-Start: GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.458, -5.821000469731741, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and Genemark both call the start site at 42633. /note=Coding Potential: The ORF has good coding potential on the reverse strand, indicating that this is a reverse gene. Coding potential is found on both GeneMark Self and Host. /note=SD (Final) Score: -5.821. This is the best final score of ORFs producing genes with an appropriate length of at least 120 bp on PECAAN. /note=Gap/overlap: Overlap: 1 bp. The overlap is very small, and because it is 1 bp this could be an operon. /note=Phamerator: Pham: 100539. Date: 1/09/24. RomansRevenge is a singleton and this gene is the only member of this pham. /note=Starterator: No information because the gene is an orpham. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 42633 bp. /note=Function call: NKF is supported. The top phagesDB BLAST hits all had high e-values and were therefore insignificant. NCBI BLASTp had no significant hits. HHpred had no significant hits. CDD had no hits. Since the only hits did not have good e-values a function cannot be called. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Sass, Arielle. /note=Secondary Annotator QC: I agree with location call and function call. Auto-annotation start site agrees on 42,633 bp. Coding potential covered fully by both Self and Host GeneMark on the reverse strand. Final score is the more negative of two options and z-score is the slightly lower of two options; however, the start site selected results in the longest ORF and has a -1 gap, indicative of an operon. The only alternative start site offered has a gap of 290 bp. Gene is in an orpham numbered 100539 as of 1/10/24. No starterator report could be produced. No statistically significant PhagesDB BLASTp hits with functions. No NCBI BLASTp hits. No CDD hits. HHPRED top two hits were for Nuclear pore complex component (e-value of 13) and Inner membrane protein import complex subunit (e-value of 18); however, these had high (non-significant) e-values and low probability. NKF. No transmembrane protein predicted by DeepTMHMM. CDS complement (42633 - 42974) /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="RomansRevenge_59" /note=Original Glimmer call @bp 42974 has strength 11.53; Genemark calls start at 42974 /note=SSC: 42974-42633 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Nesterenkonia sp. PF2B19] ],,NCBI, q11:s13 71.6814% 0.00742941 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.022, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Nesterenkonia sp. PF2B19] ],,WP_070158444,43.1193,0.00742941 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vazquez, Eunice /note=Auto-annotation: Glimmer and Genemark. Both call the start at 42974. /note=Coding Potential: Coding potential in this ORF is found on the reverse strand only, indicating this is a reverse gene. /note=SD (Final) Score: -2.523 is the final score, and it is the best final score on PECAAN. The z score is 3.022. /note=Gap/overlap: -4 bp overlap. This is a reasonable overlap considering it is not 50 bp or more. 
/note=Phamerator: As of 01/09/24 the pham number is 100440. This gene is an orpham. /note=Starterator: No information because it is an orpham. /note=Location call: With all the evidence above, this gene is a real gene and has a start site at 42974 bp. /note=Function call: NKF; there are no significant hits with PhagesDB, NCBI BLAST, HHpred, or CDD. /note=Transmembrane domains: There are no TMRs noted by DeepTMHMM. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: I agree with all the calls made by the original annotator. The gene is real based upon the start site called by Glimmer and GeneMark as it aligns well with the coding potential from the putative ORF. I would only consider noting that this gene is a part of an operon, as an overlap of -4 is indicative of a polycistronic gene. In addition I agree with the inability to call a function as there are no informational hits from HHpred, CDD and NCBI BLASTp. CDS complement (42971 - 43210) /gene="60" /product="gp60" /function="toxin in toxin/antitoxin system, HicA-like" /locus tag="RomansRevenge_60" /note=Original Glimmer call @bp 43111 has strength 6.65; Genemark calls start at 43111 /note=SSC: 43210-42971 CP: yes SCS: both-cs ST: NA BLAST-Start: [HicA-like toxin [Arthrobacter phage SilentRX] ],,NCBI, q1:s1 92.4051% 1.17444E-12 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.817, -5.512695393419078, no F: toxin in toxin/antitoxin system, HicA-like SIF-BLAST: ,,[HicA-like toxin [Arthrobacter phage SilentRX] ],,YP_010656436,62.8205,1.17444E-12 SIF-HHPRED: mRNA interferase toxin HicA; toxin-antitoxin, TA, protein complex, DNA-binding, TA complex, TOXIN; HET: MSE, SO4; 2.28A {Escherichia coli str. K-12 substr. MG1655},,,6HPB_A,73.4177,99.4 SIF-Syn: No synteny observed, ORPHAM. /note=Primary Annotator Name: Aguirre, Austin /note=Auto-annotation: Glimmer and GeneMark were called and agree on the start site: 43111. Called start codon was ATG. Not the longest ORF, which is 43345. 
/note=Coding Potential: 43111 does not have strong coding potential at this start site. It looks like 43210 covers the coding potential better. /note=SD (Final) Score: Final score for 43111 is -5.337 with a Z score of 1.836. 43210 has a Z score of 1.817 and a final score of -5.513. /note=Gap/overlap: 43111 has a gap of 85 while 43210 has an overlap of -14. This is the strongest evidence for 43210 as it contains the smallest gap/overlap of all called start sites, as the ORF has an overlap of -149 and the called start site 43111 has a gap of 85. /note=Phamerator: No other members in Phamerator, gene is singleton. /note=Starterator: No starterator report was generated, Orpham. /note=Location call: This is a real gene as it has strong coding potential. The start site is located at 43111 as its Z-score is close to 2 and was called by both Glimmer and GeneMark, despite not being the longest ORF. This start site has a gap of 85, and this is better than the second-best start site which contained an overlap of -100. /note=Function call: No known function based on pBLAST. Some functions point towards HicA-like toxin, but there are no significant hits in the databases. No significant hits in NCBI BLASTp, CDD, or HHpred. /note=Transmembrane domains: No TMDs present. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: Select an option for the coding capacity box. Also include info about the NCBI BLASTp, CDD, and HHPRED in the function call of the notes (even if you had no significant hits, mention that; E-values <10^-3 are still significant). Not sure about whether there is coding potential so double check that. Other than that, I agree with all the annotations. 
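The gap/overlap comparison in the gp60 notes above can be reproduced from coordinates alone. The following is a minimal Python sketch, not part of the PECAAN export: the function names `reverse_strand_gap` and `orf_length` are hypothetical, and the formula (neighbor stop minus start minus 1) is an assumption inferred from the GAP values and coordinates recorded in this file, where the neighbor is the adjacent reverse-strand gene at higher coordinates (gp61, stop at 43197).

```python
def reverse_strand_gap(start: int, neighbor_stop: int) -> int:
    """Gap (positive) or overlap (negative), in bp, between the start of a
    reverse-strand gene and the stop of the adjacent reverse-strand gene
    located at higher coordinates. Formula assumed from this file's GAP values."""
    return neighbor_stop - start - 1


def orf_length(left: int, right: int) -> int:
    """Length in bp of a feature spanning left..right, both ends inclusive."""
    return right - left + 1


# Candidate starts for gp60, compared against the stop of gp61 (43197).
GP61_STOP = 43197
for candidate_start in (43111, 43210):
    gap = reverse_strand_gap(candidate_start, GP61_STOP)
    kind = "gap" if gap >= 0 else "overlap"
    print(f"start {candidate_start}: {gap} bp ({kind})")

# Length of gp60 with the selected start (coordinates 42971 - 43210).
print("gp60 length:", orf_length(42971, 43210), "bp")
```

With these inputs the sketch gives an 85 bp gap for start 43111 and a -14 bp overlap for start 43210, matching the values in the notes.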
CDS complement (43197 - 43517) /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="RomansRevenge_61" /note=Original Glimmer call @bp 43517 has strength 10.71; Genemark calls start at 43517 /note=SSC: 43517-43197 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PBI_BEAGLE_77 [Arthrobacter phage Beagle] ],,NCBI, q4:s6 91.5094% 3.71104E-7 GAP: -38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.253, -2.0162541296952132, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_BEAGLE_77 [Arthrobacter phage Beagle] ],,QGJ92836,43.6893,3.71104E-7 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and Genemark both call start at 43517. /note=Coding Potential: There is high and significant protein coding potential that is captured by the start site in the reverse direction. There is no coding potential in the forward direction. /note=SD (Final) Score: The best final score is for the selected start at 43517 (-2.016). This also has the best Z score (3.253), surpassing the minimum. /note=Gap/overlap: This gene does have a large overlap of 38 base pairs with a gene stop @43480. If this gene were removed, there would be a significant gap. /note=Phamerator: 1/9/2024. Pham: 5917 contains 11 members, 10 of which are from AP cluster. /note=Starterator: RomansRevenge does not have the most annotated start, but this start is close in location to other starts called in other phages of the AP cluster such as phage Beagle and SilentRX. /note=Location call: Based on the evidence above, this gene is a real gene and the selected start is at 43517. It has the best Z and final score and contains all the coding potential despite the relatively big overlap. We did look at whether the gene downstream is real and it could not be removed. /note=Function call: The phagesdb blast showed hits for unknown function proteins, there were no conserved domains, and NCBI blast reveals hits for only hypothetical proteins. 
There is one hit on HHPRED but the e value is not significant, therefore there is no known function for this gene. /note=Transmembrane domains: There are no transmembrane domains found on DeepTmHmm /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with this annotation. All the hits for phagesdb were of unknown function except for one hit for an exonuclease, however, there was only slight alignment and this is not enough evidence. The other significant hits were for hypothetical proteins. Therefore function cannot be determined. CDS complement (43480 - 43635) /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="RomansRevenge_62" /note=Original Glimmer call @bp 43635 has strength 9.88; Genemark calls start at 43635 /note=SSC: 43635-43480 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.346, -4.008839636156049, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Both Glimmer and GeneMark call this gene at 43635, so this is the likely start position of the gene. /note=Coding Potential: From the host-trained GeneMark, it does seem like there is coding potential in this rather short interval. GeneMark self agrees with this and shows significant coding potential in this region. There is only coding potential on the reverse strand, indicating this is likely a reverse gene. /note=SD (Final) Score: The best final score (-4.009) is the one already indicated, its the best since it is the most positive. The best z-score is also the one indicated (2.346), since it is greater than 2 and the highest. /note=Gap/overlap: The gap is -4, which is fairly typical. There is no coding potential in the gap so this is fine. /note=Phamerator: The Pham as of 1/9/2024 is 100562. There are no other phages on phagesDB that have this pham, which is interesting. It might not be conserved. 
/note=Starterator: This pham is likely an orpham since the starterator report did not generate. This likely means this pham is not present in any other phages so it cannot be compared to anything. /note=Location call: Based on the above evidence, this is likely a real gene starting at 43635 bp. /note=Function call: The only hits for this gene on phagesDB are as a putative membrane protein, under cluster CR1. The phagesDB blast top hits are for unknown protein functions, only one of which has a significant e-value (5e-22). The NCBI blast shows no hits, and HHpred shows top hits having an unknown function. There are no CDD hits. The gene itself is also very short (156 bp, slightly larger than the 120 bp mark). This evidence points me towards thinking this is a hypothetical protein (putative protein or NKF). /note=Transmembrane domains: DeepTMHMM does predict 1 transmembrane region, indicating it could play a role in the membrane. According to DeepTMHMM and the requirements you can call this a membrane protein. /note= /note=Secondary Annotator Name: vazquez, eunice /note=Secondary Annotator QC: Check the starterator and coding capacity boxes; according to DeepTMHMM and the requirements you can call this a membrane protein. /note=Auto-annotation: Glimmer and Genemark. Both call the start at 43635. /note=Coding Potential: There is coding potential present in the reverse strand. This is what was found in both Genemark Self and Host. /note=SD (Final) Score: -4.009 is the final score, and it is the best final score on PECAAN. The z score is 2.346. /note=Gap/overlap: -4 bp overlap. This is a reasonable overlap considering it is not 50 bp or more. /note=Phamerator: As of 1/10/24 the pham is 100562. This gene is an orpham. /note=Starterator: No information because it is an orpham. /note=Location call: With all the evidence above, this gene is a real gene and has a start site at 43635 bp. 
/note=Function call: Membrane protein, confirmed with DeepTMHMM, but there are no significant hits on PhagesDB, HHpred, NCBI BLAST or CDD for a function call. /note=Transmembrane domains: DeepTMHMM does predict 1 transmembrane region, indicating it could play a role in the membrane. CDS complement (43632 - 44645) /gene="63" /product="gp63" /function="Cas4 exonuclease" /locus tag="RomansRevenge_63" /note=Original Glimmer call @bp 44645 has strength 17.58; Genemark calls start at 44645 /note=SSC: 44645-43632 CP: yes SCS: both ST: SS BLAST-Start: [Cas4 family exonuclease [Arthrobacter phage BruhMoment]],,NCBI, q33:s22 89.6142% 1.1805E-89 GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.1, -2.2763497933341483, yes F: Cas4 exonuclease SIF-BLAST: ,,[Cas4 family exonuclease [Arthrobacter phage BruhMoment]],,UOK18385,54.3767,1.1805E-89 SIF-HHPRED: Uncharacterized protein; Cas4, CRISPR, MCSG, Exonuclease, PSI-Biology, STRUCTURAL GENOMICS, UNKNOWN FUNCTION, Midwest Center for Structural Genomics; HET: MN, SF4, MSE; 2.35A {Sulfolobus solfataricus},,,4IC1_J,68.8427,99.6 SIF-Syn: RecA-like DNA recombinase in RomansRevenge is located downstream of Cas4 exonuclease like what is seen in phage Rizwana on 1/24/23. /note=Auto-annotation: Both Glimmer and GeneMark called the start at 44645. The start codon is ATG and it is the LORF. This is the best start site because it has the smallest gap, the best Z-score, and the best final score of all the candidates. /note=Coding Potential: There is coding potential present in the reverse strand. This is what was found in both Genemark Self and Host. /note=SD (Final) Score: -2.276 (Best option since it is the least negative among the gene candidates). Best Z score is 3.1. /note=Gap: Has a gap of 78 bp. This gap is justified because it seems to appear in other phages like Pureglobe5. A gap of 78 is also the smallest gap in PECAAN. /note=Phamerator: Pham 5924. It has 11 members and 3 of them are drafts. 
It is conserved in MellowYellow and Odyssey. Date: 1/5/2024. /note=Starterator: The start number is 11. Has 2 out of the 8 manual annotations. Found in 6 of the 11 genes in pham. There are no other phages with this start number at 44645. However, the start site is close to MellowYellow whose start number is 44636. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 44645. /note=Function call: Cas4 exonuclease. There are a few BLAST hits with low e-values (10^-77) on other similar phages whose function is exonuclease. Top 2 NCBI BLAST hits have the function Cas4 exonuclease (89% coverage and e values <10^-89). HHPRED has hits that correspond to unique SEA-PHAGES requirements for this gene (gene only includes exonuclease region, alignments to the crystal structures 4R5Q_A and 4IC1_A, and to the PD-(D/E)XK nuclease superfamily (PF12705.7, among others)). In HHPRED, Cas4 exonuclease has a 99.71% coverage and a low e-value of 4*10^-15. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: AGUIRRE, AUSTIN LEON /note=Secondary Annotator QC: Auto-annotation should be explained more. What is the called codon? Is it the LORF? How does this compare to other possible start sites? Synteny box should comment on upstream and downstream genes in phages with synteny. On phagesDB BLAST, a RecB-like exonuclease/helicase was marked as evidence, but the called function was Cas4 exonuclease. 
CDS complement (44724 - 45479) /gene="64" /product="gp64" /function="DNA binding protein" /locus tag="RomansRevenge_64" /note=Original Glimmer call @bp 45422 has strength 13.48; Genemark calls start at 45422 /note=SSC: 45479-44724 CP: yes SCS: both-cs ST: NI BLAST-Start: [DNA recombinase [Arthrobacter phage BruhMoment]],,NCBI, q25:s7 88.8446% 9.05591E-54 GAP: 57 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.482, -3.72200618970992, no F: DNA binding protein SIF-BLAST: ,,[DNA recombinase [Arthrobacter phage BruhMoment]],,UOK18386,53.0909,9.05591E-54 SIF-HHPRED: DNA repair and recombination protein RAD52; Recombination mediator protein, DNA repair, Apo structure, Decamer, RECOMBINATION;{Saccharomyces cerevisiae S288C},,,8G3G_G,70.1195,99.9 SIF-Syn: Rizwana /note=Primary Annotator Name: WANG, JORDAN /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 45422. /note=Coding Potential: There is good coding potential shown by the 5th row between 45422 (potential start) and 44724 (stop). Will note that there may be another start site that appears to be closer to where the coding potential begins slightly under 45400. /note=SD (Final) Score: For start site 45422, Final Score is -4.409. There are two start sites with better Final Scores (start@45164, final score: -3.103) and (start@45479, final score: -3.722) however the start@45164 would result in a 372 bp gap and the start@45479 would result in the inclusion of a segment with no coding potential. As a result, 45422 appears to be the best start site. /note=Gap/overlap: 114 bp gap /note=Phamerator: Pham 132898. Date: Jan 9, 2024. Also found in phage Rizwana. /note=Starterator: Start site 3 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start 3 does not correspond to a start site in RomansRevenge. This does not provide evidence to support or refute a start site at 45422. 
/note=Location call: Based on the above evidence, this is likely a real gene and the most likely start site is 45422. /note=Function call: DNA Recombinase. The top three phagesdb BLAST hits have the function of DNA recombinase (E-value <10^-37), and top two NCBI BLAST hits also have the function of DNA recombinase. (96-97% coverage, 38% identity, and E-value <10^-48). HHpred had a hit for a DNA recombinase protein with 100% probability, 72% coverage, and E-value of 1.9e-29. CDD suggests family RAD52/22 family double strand break repair protein with an E value of 3.63e-21. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: JACOBS, SARISHA MALAIKA /note=Secondary Annotator QC: I agree with the primary annotator`s call on the start and function call. Just make sure to fill out the dropdown for the starterator maps. CDS complement (45537 - 45746) /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="RomansRevenge_65" /note=Original Glimmer call @bp 45737 has strength 15.86; Genemark calls start at 45737 /note=SSC: 45746-45537 CP: yes SCS: both-cs ST: NA BLAST-Start: [hypothetical protein [Bradyrhizobium canariense] ],,NCBI, q3:s2 94.2029% 6.8418E-5 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.252, -4.128569837189469, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Bradyrhizobium canariense] ],,WP_085352846,51.5625,6.8418E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Richard, Ketan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 45737. /note=Coding Potential: Coding potential in this ORF is found only on the reverse strand, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.129. It is the best final score for the start site of 45746, not the start site called by Glimmer and GeneMark. /note=Gap/overlap: There is a gap of -4. 
This would suggest that the gene is a part of an operon. /note=Phamerator: pham: 100760. The phamerator is not found because this gene is not a part of a pham yet. /note=Starterator: There is no starterator information since this gene is not a part of a pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 45746. This is different from the suggested start site by GeneMark and Glimmer, but since operons are so common in phages and the start site had a better z score and final score, this is most likely the start site. /note=Function call: Unknown Function. Based on the phagesdb Blastp, this protein is not found in other phages as far as the database is concerned and the phages with any similarity have a high e-value. HHpred provides no evidence since all of the e-values are extremely high; NCBI BLAST returns a hypothetical protein, but its e-value is also too high. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: /note=Auto-annotation: Both Glimmer and GeneMark call the gene at 45537. /note=Coding Potential: According to both GeneMark hosts and self, there is significant coding potential in this open reading frame, meaning this is likely a real gene. /note=SD (Final) Score: The best final is -4.129, and the best z-score is 2.252, both of which correspond to the selected start site. /note=Gap/overlap: The gap is -4 which is not too high, it seems realistic. /note=Phamerator: As of 1/10/2024, this gene is in pham 100760, it is an orpham and there are no other genes in this pham. /note=Starterator: There is no starterator report since this gene is an orpham and there is nothing to compare it to. /note=Location call: I agree with the primary annotator in choosing a non-suggested start site. 
The final score and z-score are slightly better, and the gap makes sense as well. /note=Function call: Since this is an orpham, and the hits from hhpred and phagesdb blast have e-values that are far too great, I don`t think we can surmise the function of the protein. NCBI blast does pop up with one hit for a hypothetical protein, but again, the e-value is non-significant. /note=Transmembrane domains: DeepTMHMM does not predict any membrane domains, meaning this is likely not a membrane protein. CDS complement (45743 - 46180) /gene="66" /product="gp66" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="RomansRevenge_66" /note=Original Glimmer call @bp 46180 has strength 9.3; Genemark calls start at 46180 /note=SSC: 46180-45743 CP: yes SCS: both ST: NI BLAST-Start: [MAG TPA: NTP-PPase-like protein [Caudoviricetes sp.]],,NCBI, q4:s3 97.931% 2.07546E-22 GAP: 8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.129, -4.4057176022952405, no F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[MAG TPA: NTP-PPase-like protein [Caudoviricetes sp.]],,DAE95137,63.1579,2.07546E-22 SIF-HHPRED: Hypothetical protein; MazG, Vibrio, NTP-PPase, HYDROLASE; 1.8A {Vibrio sp. DAT722} SCOP: a.204.1.0,,,2Q73_A,93.1034,99.3 SIF-Syn: Gene is found in a neighborhood of other DNA modification genes. /note=Primary Annotator Name: Zamora, Alexandra /note=Auto-annotation: Glimmer and Genemark both call the gene and the agreed start site is at 46180 bp. The start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential within the predicted ORF. The coding potential is in the reverse direction. The chosen start site covers all this coding potential. /note=SD (Final) Score: The final score is -4.406 and the Z-score is 2.129. The final score is not the best score but it is a reasonable one. The score is good and suggests the presence of a credible ribosome binding site. /note=Gap/overlap: Gap: 8bp. 
The gap with the upstream gene is reasonable and within the acceptable range. The ORF is not the longest, but is acceptable. This start site is chosen over one with a longer reading frame because it eliminates a large overlap with the upstream gene. /note=Phamerator: As of 1/8/24, the gene is found in pham number 84995. /note=Starterator: There are 155 non-draft members of this pham. The start site number for this gene is 20 and the start site is at 46180. 8/155 non-draft members called site number 20; none were manually annotated. Considering this information, the starterator report was not informative. /note=Location call: Based on the information, this is a real gene with a likely start site at 46180. /note=Function call: MazG-like nucleotide pyrophosphohydrolase. The top three phagesdb BLAST hits have the function of MazG-like nucleotide pyrophosphohydrolase with scores greater than 93 and e-values less than 1e-19. NCBI BLAST hits showed alignments with an NTP-PPase like protein (97% coverage, 2e-22 e-value) and a MazG-like pyrophosphatase (100% coverage, 3e-22 e-value). CDD showed a hit with function nucleotide triphosphate pyrophosphohydrolase with e-value 5.96e-6. Two of the top three HHpred hits have the function of hydrolase with scores greater than 99 and e-values less than 1e-9. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: I agree with this annotation with all evidence considered. However, don`t forget to explain why Starterator is not informative. 
CDS complement (46189 - 46389) /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="RomansRevenge_67" /note=Original Glimmer call @bp 46389 has strength 7.29; Genemark calls start at 46389 /note=SSC: 46389-46189 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.46, -4.535884094130711, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Soan, Jessica /note=Auto-annotation: Both Glimmer and Genemark call the start site at 46389bp. /note=Coding Potential: Coding potential in this ORF is only on the reverse strand, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.536 is the best final score on PECAAN (Z-score 2.46). This supports the start called. /note=Gap/overlap: There is a -4bp overlap. This small overlap is favorable. /note=Phamerator: Pham 100684. Date 1/9/23. No other phages within this pham were found. No known function call. /note=Starterator: Pham 100684 is not reported on starterator as of 1/9/24. Gene is an orpham. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 46389bp. /note=Function call: No known function. The top two phagesdb BLAST hits have a function call of tyrosine integrase; Amgine (Cluster K6) (e =1.1) and Ellie (Cluster K6) (1.1). However, this data is not significant due to high e-values. NCBI BLAST lacks any data/evidence. No significant HHpred hits (lowest e-value was 21 with 15.15% coverage). CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Wang, Jordan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function. CDS complement (46386 - 46817) /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="RomansRevenge_68" /note=Original Glimmer call @bp 46817 has strength 5.83; Genemark calls start at 46817 /note=SSC: 46817-46386 CP: yes SCS: both ST: NA BLAST-Start: GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.517, -3.5690131075310805, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 46817. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.569. Although this is not the best-observed score it appears to have more supportive evidence compared to other calls. The best-observed final score is -2.605, and the z-score is 3.111, however it is not the true LORF, the gap is larger, and a start site of 46772 exists which neither Glimmer nor GeneMark calls. /note=Gap/overlap: 2 bp, which is not large and contains no coding potential. /note=Phamerator: pham: 10073. Date 1/10/2024. It is not conserved; it is an orpham. /note=Starterator: Orphams do not have Starterator reports. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 46817. /note=Function call: No known function. Phagesdb BLAST, NCBI BLAST, CDD, and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator QC: I agree with all of your annotations. There is a question about the start site with the better final score, but it creates a large gap, so good job. 
CDS complement (46820 - 47203) /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="RomansRevenge_69" /note=Original Glimmer call @bp 47203 has strength 14.97; Genemark calls start at 47203 /note=SSC: 47203-46820 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp. fls2-241-R2A-200]],,NCBI, q4:s41 80.315% 0.0308468 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.168, -2.274496637415597, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. fls2-241-R2A-200]],,WP_284977604,35.3293,0.0308468 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 47203. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.274. It is the best final score on PECAAN and the z score is the highest at 3.168. /note=Gap/overlap: Overlap: 4bp. Gene is likely part of an operon. /note=Phamerator: pham: 106668. Date 1/10/2024. It is conserved; found in Abidatro (AS) and Andrew (AS). /note=Starterator: Start site 8 in Starterator was manually annotated in 6/11 non-draft genes in this pham. However, Start 15 agrees with the site predicted by Glimmer and GeneMark, which correlates to a start site of 47203 bp for RomansRevenge. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 47203. /note=Function call: No known function. Phagesdb BLAST, NCBI BLAST, CDD, and HHpred had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Zamora, Alexandra Michal /note=Secondary Annotator QC: I agree with this location and function call. All the evidence categories have been considered. This is a real gene with no known function. 
I agree with the annotation of this gene. CDS complement (47200 - 47385) /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="RomansRevenge_70" /note=Original Glimmer call @bp 47385 has strength 1.78 /note=SSC: 47385-47200 CP: yes SCS: glimmer ST: NI BLAST-Start: [HNH endonuclease [Arthrobacter phage Beagle]],,NCBI, q1:s51 100.0% 1.80148E-11 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.99, -4.970213310205538, yes F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Beagle]],,QGJ92845,34.5455,1.80148E-11 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Only Glimmer has a listed start site, which is at 47385. /note=Coding Potential: On Genemark self, there is a high level of coding potential in the atypical ps. On Genemark host, there is a low level of coding potential. Coding potential in this ORF is on the reverse strand only. /note=SD (Final) Score: -4.970. This is the best final score called by PECAAN. /note=Gap/overlap: There is a 3 bp gap between the start site of our gene of interest and the previous gene`s stop site. /note=Phamerator: pham: 132644. Date: 1/8/24. The gene is conserved within the phage Bubbles123 (F1). /note=Starterator: Date: 1/8/23. Start site 21 is the most annotated start and was only manually annotated in 1 out of 1 non draft genes within this pham. Start site 21 is not present in RomansRevenge, but Start Site 22 is present. This site was auto annotated. There is not enough evidence present in Starterator in order to use it as a tool to make a sufficient determination. However, it is important to note that the start site 22 does call 47385 as the suggested start site, which is the same call made by Glimmer. /note=Location call: Based on the above evidence, this is a real gene, and the most likely start site is located at 47385. /note=Function call: NKF. 
The top two phagesDB BlastP hits have the listed function of HNH endonuclease (E-value < 2e-7), and the top 2 NCBI BLAST hits that are from phages have been assigned the function of HNH endonuclease (>80% coverage, >23% identity, and e-value < 9.9e-11). HHpred yielded no statistically significant results, and the length of the amino acid sequence produced by this gene is too short to qualify to be an HNH endonuclease. CDD yielded no hits. /note=The BLAST hits indicate that this is a real gene, but because the HHpred and CDD data conflict with the function suggested by the BLAST hits, the gene is called NKF. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Soan, Jessica Hyunsil /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (47389 - 47985) /gene="71" /product="gp71" /function="HNH endonuclease" /locus tag="RomansRevenge_71" /note=Original Glimmer call @bp 48084 has strength 5.6; Genemark calls start at 47931 /note=SSC: 47985-47389 CP: no SCS: both-cs ST: NA BLAST-Start: [HNH endonuclease [Gordonia phage Soos]],,NCBI, q2:s28 68.1818% 2.07711E-15 GAP: -22 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.134, -5.222075281493363, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage Soos]],,WNM68813,44.1718,2.07711E-15 SIF-HHPRED: a.4.1.2 (C:) HIN recombinase (DNA-binding domain) {Synthetic},,,d1ijwc_,21.7172,98.5 SIF-Syn: HNH endonuclease; upstream and downstream gene functions are unknown. /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Glimmer calls the start site at 48084 and GeneMark calls the start at 47931. 
/note=Coding Potential: Coding potential is on the reverse strand and there is good potential indicated in GeneMark Self, but coding potential is somewhat weak in GeneMark Host, with a large absence of coding potential closer to the side of the start site. /note=SD (Final) Score: The final score is -3.550. It is the best final score found on PECAAN. The z-score is 2.526, which is the best z-score listed. /note=Gap/overlap: -121 bp. The overlap of 121 bp is large and not found to be conserved in other complete genomes like Beagle (AP) or Soos (CP). The gene itself does fill a large gap however. /note=Phamerator: Orpham 100480; no Phamerator report found. /note=Starterator: No Starterator report found. /note=Location call: The above evidence points towards this gene being real with a likely start site of 48084. /note=Function call: HNH endonuclease. The top three phagesdb BLAST hits of known function have the function of HNH endonuclease (E-value < 10^-22). 1 out of the 5 top NCBI BLAST hits also has the function of HNH endonuclease (72.7%+ coverage, 30.6%+ identity, E-value < 10^-19). HHpred had a hit for HNH homing endonuclease with 98.5% probability, 46.3% coverage, and E-value of 2.5*10^-8. The amino acid sequence also meets the qualifying condition for an HNH endonuclease by having H-N-H over a 30 amino acid span. CDD has no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Bhattarai, Aryan /note=Secondary Annotator QC: /note=Based on the above evidence, this is a real gene. Additionally, I don`t agree with the location call by primary annotator. The selected start site has a huge overlap with the next gene. The evidence is sufficient to determine the function of this gene. The function call of HNH endonuclease is correct. /note=Below are my notes: /note=Auto-annotation: Glimmer calls the start site at 48084 and GeneMark calls the start at 47931. 
/note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is weaker in GeneMark Host. /note=SD (Final) Score: -5.117. Although this is not the best-observed score it appears to have more supportive evidence compared to other calls. The best-observed final score is -3.550, and the z-score is 2.526, however the gap is significantly larger and overlaps with the next gene. /note=Gap/overlap: 32, which isn’t that large, and contains no coding potential. /note=Phamerator: pham: 100480. Date 1/15/2024. It is not conserved; it is an orpham. /note=Starterator: Orphams do not have Starterator reports. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 47931. /note=Function call: HNH endonuclease. For the programs PhagesDB BLASTp, HHpred, and NCBI BLASTp, the top hits have scores that are >100 and strong e-values that are < 2.4e-8. CDD hits were not significant given the large e-values. The amino acid sequence also meets the qualifying condition for an HNH endonuclease by having H-N-H over a 30 amino acid span. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS complement (47964 - 48362) /gene="72" /product="gp72" /function="HNH endonuclease" /locus tag="RomansRevenge_72" /note=Genemark calls start at 48185 /note=SSC: 48362-47964 CP: yes SCS: genemark-cs ST: NA BLAST-Start: [HNH endonuclease [Arthrobacter phage Beagle]],,NCBI, q3:s2 80.303% 1.52509E-29 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.111, -2.7806870294376242, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Beagle]],,QGJ92845,61.8182,1.52509E-29 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,46.9697,95.1 SIF-Syn: /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: GeneMark calls a start site at bp 48185 with an ATG start codon but Glimmer does not call a start site at all. /note=Coding Potential: Gene does have reasonable coding potential in the reverse orientation within the putative ORF in the Self-trained GeneMark but does not demonstrate much coding potential in the Host-trained GeneMark. This may be because of the Host-trained GeneMark`s reliance on A. sp. as the host. /note=SD (Final) Score: The original start site actually has the worst SD score (-7.447). The start site with the best SD score (least negative) is the bp 48362 start site with a score of -2.781. /note=Gap/overlap: The gap with the preceding gene is too large (166 bp). The start site at bp 48362 has a reasonable overlap with the preceding gene (-11 bp) and provides the longest reasonable ORF of the alternative start site candidates. The length of the gene for the original start site (bp 48185) is 222 bp (which is reasonable) but the length for the gene with the start site at bp 48362 is longer (399 bp). /note=Phamerator: This gene is an orpham with no defined function. /note=Starterator: Starterator program is not informative because this gene is an orpham (no results shown). 
/note=Location call: The gathered evidence suggests this is a real gene with a start site at bp 48362 instead of the original start site at bp 48185 due to this proposed start site having the least negative SD score, the highest Z-score, and the longest reasonable ORF. /note=Function call: This gene is an HNH endonuclease. The top phagesdb BLAST hit (Beagle_86) has the function of HNH endonuclease (E-value <1e-5), and the top three NCBI BLAST hits (Beagle_86, Rizwana_65, Tank_65) also have the function of HNH endonuclease (80% coverage, 40%+ identity, and E-value <1e-14). HHpred had a hit for HNH endonuclease with 96.5% probability and E-value of 0.012. CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. /note=Secondary Annotator Name: Kang, Alix /note=Secondary Annotator QC: I agree with the function called. I agree with the annotation of this gene. However, I am uncertain of the start site between 38374 bp and 48362 bp. The synteny box needs to be filled out. /note=Function call: The top 3 phagesdb BLAST hits have the function of HNH endonuclease (E-value <3e-15), and the top 3 NCBI BLAST hits also have the function of HNH endonuclease (>80% coverage, >39.81% identity, and E-value <3e-15). HHpred had 2 hits for HNH endonuclease with >96.58% probability and E-value <0.0099. CDD had no relevant hits. 
CDS complement (48352 - 48855) /gene="73" /product="gp73" /function="DNA binding protein" /locus tag="RomansRevenge_73" /note=Original Glimmer call @bp 48858 has strength 10.25; Genemark calls start at 48855 /note=SSC: 48855-48352 CP: yes SCS: both-gm ST: NA BLAST-Start: [DNA binding protein [Arthrobacter phage Beagle] ],,NCBI, q14:s17 55.6886% 8.00406E-11 GAP: 42 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.253, -1.953940808934884, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage Beagle] ],,QGJ92849,38.961,8.00406E-11 SIF-HHPRED: a.4.5.28 (A:) automated matches {Yersinia pseudotuberculosis [TaxId: 502800]} | CLASS: All alpha proteins, FOLD: DNA/RNA-binding 3-helical bundle, SUPFAM: `Winged helix` DNA-binding domain, FAM: MarR-like transcriptional regulators,,,SCOP_d4aiha_,67.6647,99.4 SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is identified as 48855 by GeneMark and 48858 by Glimmer. /note=Coding Potential: Based on Host-trained GeneMark and Self-Trained GeneMark there is coding potential predicted by both start sites because the region of coding falls between the start site and the stop site. It is hard to differentiate between the two start sites. /note=SD (Final) Score: The Z score is 3.253 and the Final Score is -2.782 for start site 48858. The Z score is 3.253 and the Final Score is -1.954 for start site 48855. The second start site has the highest Z score with the least negative Final score. /note=Gap/overlap: The gap is 39bp for start site 48858 and 42bp for 48855. /note=Phamerator: /note=Starterator: /note=Location call: Start site is likely 48855 /note=Function call: Transcriptional Regulator. The highest ranked phages in HHPred have the function of a transcriptional regulator with an e value of 1.9e-11 at 99.3% probability. In NCBI BLAST the top phages say hypothetical protein. There are no hits for CDD. /note=Transmembrane domains: None. 
This makes sense because according to HHpred this is a transcriptional regulator. /note=Secondary Annotator Name: Saud, Arnav /note=Secondary Annotator QC: /note=The primary annotator needs to add both the phamerator and starterator reports. This is a real gene, but the final score is higher in the second start site at -1.954, so it should be 48855 instead of 48858. The function call is correct as a DNA binding protein is a type of transcriptional regulator, and phagesDB blastP hits also have DNA binding protein as the listed function. CDS complement (48898 - 50487) /gene="74" /product="gp74" /function="DNA primase/helicase" /locus tag="RomansRevenge_74" /note=Original Glimmer call @bp 50487 has strength 13.07; Genemark calls start at 50487 /note=SSC: 50487-48898 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage Pureglobe5]],,NCBI, q1:s1 100.0% 0.0 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.773, -4.0178693334748665, no F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage Pureglobe5]],,UYL87456,75.4682,0.0 SIF-HHPRED: DNA primase/helicase; helicase, ATPase, hexamer, DNA replication, HYDROLASE, TRANSFERASE-DNA complex; HET: TTP; 3.2A {Enterobacteria phage T7},,,6N7I_F,76.3705,100.0 SIF-Syn: DNA primase/polymerase/helicase, upstream gene is DNA Binding Protein, downstream is HNH endonuclease. Shares synteny with gene of same function in BruhMoment which doesn`t have assigned functions for flanking genes. /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Both Glimmer and GeneMark call the start site of this gene as 50487. The start codon is ATG, all GM coding capacity is covered and the selection is the LORF. /note=Coding Potential: The Host-Trained and Self-Trained GeneMark suggest that this is a real gene as both display strong coding potential that is covered between the suggested start site of 50487 and the stop of 48898. 
/note=SD (Final) Score: The selected gene has the highest z-score (2.773) and the least negative final score (-4.018). /note=Gap/overlap: The selected gene has an overlap of -17. /note=Phamerator: As of 01/08/24, Phamerator calls the pham as 5919. There are 11 members in this pham, 3 of which are drafts and all of which are in the cluster AP, such as Beagle and Wilde. The finalized genes have the function listed as DNA primase, helicase, and/or polymerase. /note=Starterator: As of 01/08/24, Starterator calls the pham as 5919. There are 11 members in this pham, 3 of which are drafts and all of which are in the cluster AP, such as Beagle and Wilde. All of the genes call start number 3, which is 50487 for RomansRevenge, in agreement with the auto annotated start site. /note=Location call: Based on the evidence gathered above, the start site for this gene is most likely 50487. /note=Function call: There are numerous PhagesDB hits suggesting the function of this gene is a DNA primase/helicase/polymerase, including hits with Beagle and BruhMoment which have e-values of 0. HHPRED demonstrates strong hits with other proteins of DNA primase/helicase as well as ATPase such as that with 6N7I_C. NCBI Blast also demonstrates hits with 100% coverage and 60%+ identity such as that with Pureglobe5 and Odyssey395. /note=Transmembrane domains: There is no evidence for transmembrane properties. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation. All of these evidence categories have been considered. However, the primary annotator needs to make sure both drop-down menus have had options selected (missing Starterator and All GM Coding Capacity). Also make sure to mention which programs you pulled evidence from for functional call as well as other supporting evidence (coverage, identity, etc.), as well as select the applicable evidence down below. Make sure to fill out your synteny box. 
CDS complement (50471 - 50908) /gene="75" /product="gp75" /function="HNH endonuclease" /locus tag="RomansRevenge_75" /note=Original Glimmer call @bp 50908 has strength 2.19; Genemark calls start at 50908 /note=SSC: 50908-50471 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease signature motif containing protein [Nocardia wallacei]],,NCBI, q24:s21 79.3103% 3.1076E-7 GAP: -22 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.022, -2.794070146961553, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease signature motif containing protein [Nocardia wallacei]],,WP_280332887,42.069,3.1076E-7 SIF-HHPRED: HNH homing endonuclease; HNH catalytic motif, Helix-turn-helix DNA binding domain, protein-DNA complex, DNA binding protein-DNA COMPLEX; HET: EDO; 2.92A {Bacillus phage SPO1} SCOP: d.285.1.1, d.4.1.3,,,1U3E_M,44.8276,98.9 SIF-Syn: Gene 75 in RomansRevenge aligns with gene 48 in Angelicage (cluster DE). Gene 75 in RomansRevenge aligns with gene 10 in Phamished (cluster B1). No other synteny since gene 76 in RomansRevenge is NKF and previous genes in other two phages don’t align either. /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation start source: Glimmer and Genemark both call start at 50908. /note=Coding Potential: Good coding potential is present in Self-Genemark and HostGenemark. It is present throughout the entire reading frame and only in the reverse strand. /note=SD (Final) Score: -2.794 this is the largest final score among the start sites. /note=Gap/overlap: Gap is -22 which is relatively large overlap, but it is smallest overlap and without it coding potential would not be covered and there would be large gap between previous gene. /note=Phamerator: Pham as of 1/10/24 is 131754 and includes 314 members across various clusters (14 of which are drafts). /note=Starterator: There are 313 members of this Pham (20 drafts). 143/293 non-draft members call start site 98, but it is not present in RomansRevenge. 
Start 142 at 50908 is called, and is found only in 2 phages. It is not most annotated in Starterator, but was chosen because it had the smallest gap and was present in phage. /note=Location call: Most likely start site is 50908 since it was called by both programs and has the highest z-score and final score. /note=Function call: Most likely function is HNH endonuclease. Good hits on phagesdb BLAST with e-values ranging from 1e-87 to 0.004. 6 good hits on HHpred with e-values ranging 2.4e-18 to 6.2e-9. Many good hits on NCBI BLAST ranging from 3e-8 to 0.003. Also fulfills 30 AA requirement for call. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note= /note=Secondary Annotator Name: Dweik, Qaiss /note=Secondary Annotator QC: I agree with the location and function call for this gene`s annotation but I don`t think the synteny box needs to be filled out for an HNH endonuclease gene. CDS complement (50887 - 51348) /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="RomansRevenge_76" /note=Original Glimmer call @bp 51348 has strength 9.17; Genemark calls start at 51348 /note=SSC: 51348-50887 CP: no SCS: both ST: NA BLAST-Start: [DNA binding protein [Arthrobacter phage Beagle] ],,NCBI, q76:s170 50.9804% 5.06789E-8 GAP: 235 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.955, -3.109529858048176, yes F: hypothetical protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage Beagle] ],,QGJ92855,18.2186,5.06789E-8 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tran, Michelle /note=Auto-annotation: Both Glimmer and GeneMark call the start site for this gene as 51348. /note=Coding Potential: The coding potential for this gene is indicated on both the host-trained and self-trained GeneMark in the assigned ORF. This coding potential is only found on the reverse strand, which supports the annotation of this as a reverse gene. 
/note=SD (Final) Score: The final score for this gene’s start site is -3.110 (z-score 2.955), which is the best final score for this gene available on PECAAN. /note=Gap/overlap: 235-bp gap. This gap is large, but it is the smallest gap available for this gene on PECAAN. /note=Phamerator: As of January 10, 2024, this gene is in pham 100800. It is an orpham. /note=Starterator: Because pham 100800 is an orpham, the Starterator page for it was unavailable. /note=Location call: Based on the evidence above, this gene is real and is located at start site 51348. /note=Function call: NKF. PhagesDB yields some strong hits (e-value <10e-10), but these hits do not count because they are in different phams (and this gene’s pham is currently an orpham). HHPred and the Conserved Domain Database similarly do not yield any useful results. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. It is impossible to determine the protein’s function from there because it currently has NKF. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: I agree with the start selected as it has the highest Z-score (2.955) and Final score (-3.110). There were also no useful data for function determination, so I agree with the NKF function call. 
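The z-score and final score quoted in the notes can be read straight from the `RBS:` part of the SSC note. The following is a minimal sketch, assuming the two numbers after the two text labels are the z-score and the final score, in that order (this matches the secondary annotator's values for gene 76). The helper name `parse_rbs` and the dictionary keys are hypothetical, and the trailing yes/no flag is kept without interpretation.

```python
import re

# Hypothetical helper: pulls the numeric scores out of the "RBS:" part of an SSC note.
# Assumption: the layout is "RBS: <label>, <label>, <z-score>, <final score>, <yes|no>".
RBS_PATTERN = re.compile(
    r"RBS:\s*([^,]+),\s*([^,]+),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(yes|no)"
)

def parse_rbs(note: str) -> dict:
    match = RBS_PATTERN.search(note)
    if match is None:
        raise ValueError("no RBS field found in note")
    label_1, label_2, z_score, final_score, flag = match.groups()
    return {
        "label_1": label_1.strip(),
        "label_2": label_2.strip(),
        "z_score": float(z_score),
        "final_score": float(final_score),
        "flag": flag,  # meaning not interpreted here
    }

# Fragment of the gene 76 SSC note, copied from the record above.
note = ("GAP: 235 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.955, "
        "-3.109529858048176, yes F: hypothetical protein")
rbs = parse_rbs(note)
print(f"z-score {rbs['z_score']:.3f}, final score {rbs['final_score']:.3f}")
```

Rounding to three decimals reproduces the values the annotators quote by hand, which makes it easy to spot a note where the two scores have been swapped.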
CDS complement (51584 - 51955) /gene="77" /product="gp77" /function="helix-turn-helix DNA binding domain" /locus tag="RomansRevenge_77" /note=Original Glimmer call @bp 51955 has strength 3.18; Genemark calls start at 51955 /note=SSC: 51955-51584 CP: yes SCS: both ST: NA BLAST-Start: [helix-turn-helix domain-containing protein [Saccharospirillaceae bacterium] ],,NCBI, q1:s1 74.7967% 1.82552E-15 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.869, -2.9063687850157054, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix domain-containing protein [Saccharospirillaceae bacterium] ],,MCD8523084,31.7204,1.82552E-15 SIF-HHPRED: DNA REPLICATION PROTEIN DNAD; PRIMOSOME, DNA-BINDING PROTEIN, DNA BINDING PROTEIN; HET: CL; 2.0A {BACILLUS SUBTILIS},,,2V79_B,78.0488,97.9 SIF-Syn: No synteny with other phages /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 51955. /note=Coding Potential: Coding potential is on the reverse strand, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.906. It is the best final score on PECAAN. /note=Gap/overlap: 2 bp gap, which is very small. There is also no coding potential in the gap that might indicate a new gene. /note=Phamerator: pham: 100706. Date 1/10/24. It is the only member in this pham (orpham). /note=Starterator: There is no Starterator report because the gene is an orpham. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 51955. /note=Function call: DNA binding domain protein. The second and third phagesDB BlastP hits have the function of a DNA binding domain protein (LittleTokyo (AS2): 2e-17 and RedFox (AS3): 4e-17). The top NCBI BLAST hit has the function call of a DNA binding domain protein with a query cover of 74% (e-value: 2e-18 and 46.74% identity). 
HHpred had a hit for DNA binding protein (2V79_B) with a 97.9% probability and an e-value of 0.0021. CDD does not produce any hits, so there is no available data from it. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: Please add the phage's name + cluster to your evidence for function hits. Also, check CDD again for evidence or please explain why you aren't including it (check using a link outside of PECAAN). Lastly, consider including "HTH" (helix turn helix) in your function call as SEAPHAGES encourages. CDS complement (51958 - 52251) /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="RomansRevenge_78" /note=Original Glimmer call @bp 52251 has strength 7.92; Genemark calls start at 52251 /note=SSC: 52251-51958 CP: yes SCS: both ST: NA BLAST-Start: GAP: 224 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.955, -2.644643059745525, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: To, Nathan /note=Auto-annotation: Both Glimmer and Genemark call this start site at 52251 with start codon ATG. /note=Coding Potential: Both Genemark Self and Host show that coding potential in this ORF is on the complementary strand only, indicating that this is a reverse gene. /note=SD (Final) Score: -2.645. This is the best final score available and is much higher than other options, with the second highest being -5.555. /note=Gap/overlap: 224 bp. This is a relatively large gap but is still reasonable. Additionally, the only other start site that could decrease it still leaves a gap of 167 bp and has a much worse Z-score, final score, and start codon. /note=Phamerator: This gene is an orpham as of 1/9/2024. No function call is given. /note=Starterator: This is an orpham so there is no Starterator report. 
/note=Location call: The gathered evidence suggests that this is a real gene, with start site @52251. This gene has good coding potential, and does not have large gaps before or after it. The start site 52251 seems most likely due to both Glimmer and Genemark calling it, its good Z score and final score, and a start codon of ATG. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: I agree with the location call. I would mention pham number even though it is orpham (it is 100610 as of 1/16/24). I am not sure of the function call as there are no hits with low e-values (the lowest is 0.062 on phagesdb and other programs don't show any below 1). Would mention orpham in synteny box. Make sure to check suggested start box. CDS complement (52476 - 52994) /gene="79" /product="gp79" /function="hypothetical protein" /locus tag="RomansRevenge_79" /note=Original Glimmer call @bp 52994 has strength 14.14; Genemark calls start at 52994 /note=SSC: 52994-52476 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein PP641_gp069 [Arthrobacter phage SilentRX] ],,NCBI, q100:s28 38.9535% 1.63824E-18 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.265, -4.390184094340439, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP641_gp069 [Arthrobacter phage SilentRX] ],,YP_010656450,37.2263,1.63824E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Both Glimmer and GeneMark call this gene. They both agree on start site 52994. /note=Coding Potential: Almost the entire coding potential is covered by this start site. There is a very small piece not contained within this range, but changing the start site would not affect this. This gene is in the reverse direction, with the given start site providing the longest ORF. /note=SD (Final) Score: The final score is -4.390, which is the best (least negative) score provided. 
/note=Gap/overlap: There is a 72 base pair gap with the upstream gene. The given start site of 52994 provides the longest ORF. The length of the gene is acceptable at 519 base pairs. /note=Phamerator: As of 1/9/2024, the pham is listed as 100713. There are no other members of this pham on phagesDB, and Starterator returns an error message. /note=Starterator: Starterator returns an error message because there are no other members of the pham, so the start site cannot be compared to start sites in similar genes. /note=Location call: Based on the above evidence, namely the auto-annotation and the length of the ORF, the start site can be called at 52994. This appears to be a real gene. /note=Function call: The top 10 Blastp hits from phagesDB show function unknown. The top hit has an e-value of 2E-17 and identities = 64%, while the second hit has an e-value of 6E-16 and identities = 47%. CDD showed no results. The top two HHpred hits showed De novo designed protein, with e-values of 0.63 and 27 respectively. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tran, Michelle /note=Secondary Annotator QC: I agree with the location and function calls shown above, but the Starterator box should be updated to NA. The synteny box should also be left blank due to the protein having NKF. 
CDS complement (53067 - 53660) /gene="80" /product="gp80" /function="SSB protein" /locus tag="RomansRevenge_80" /note=Original Glimmer call @bp 53660 has strength 11.7; Genemark calls start at 53660 /note=SSC: 53660-53067 CP: yes SCS: both ST: NI BLAST-Start: [single-stranded DNA-binding protein [Nocardioidaceae bacterium]],,NCBI, q12:s13 64.467% 1.06398E-16 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.735, -5.158067224613184, no F: SSB protein SIF-BLAST: ,,[single-stranded DNA-binding protein [Nocardioidaceae bacterium]],,MBA2419405,44.382,1.06398E-16 SIF-HHPRED: Single-stranded DNA-binding protein; Single strand DNA-binding domain, SSB, RecO, ExoI, RecQ, DnaG, HolC, DNA BINDING PROTEIN; 2.2A {Escherichia coli} SCOP: b.40.4.3,,,4MZ9_D,57.3604,100.0 SIF-Syn: No synteny with other phages. /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: GeneMark and Glimmer call start site at 53660. The codon is ATG. /note=Coding Potential: Start to stop site covers good coding potential. More than 80% is covered. /note=SD (Final) Score: This start site does not have the highest final score. The current final score is -5.158 while the highest is -4.093. /note=Gap/overlap: The gap for this start site is 6, which is the smallest gap. This start site provides the longest ORF at 594 bp, which is rather large. /note=Phamerator: As of 1/10/24 this gene is in pham 131706. It is a singleton in this pham. Genes in this pham include Bruin_97 and Cookies_97. /note=Starterator: The most annotated start site is #93, which was called 289/464 times. The RomansRevenge gene was called not at start site #93 but at #91, which is at 53660. /note=Location call: The gene is real and has Starterator data and the LORF information to back the proposed start site. /note=Function call: The predicted function is a ssDNA binding protein. BLASTp runs show hits with SadLad_62 with a low e-value at 4e-23 and RubyRalph at 2e-22. 
HHpred data with high probability, high coverage, and low e-value states the function as ssDNA binding protein. /note=Transmembrane domains: There are no TMRs, which indicates the protein is not a transmembrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: I agree with this annotation, except that for the Phamerator portion I would include a few of the other members in the same pham instead of just stating that it's a singleton. (Check for some spelling errors.) CDS complement (53667 - 53969) /gene="81" /product="gp81" /function="hypothetical protein" /locus tag="RomansRevenge_81" /note=Original Glimmer call @bp 53969 has strength 10.65; Genemark calls start at 53969 /note=SSC: 53969-53667 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Paeniglutamicibacter kerguelensis] ],,NCBI, q8:s9 86.0% 6.02584E-13 GAP: 109 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.888, -4.897971560974249, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Paeniglutamicibacter kerguelensis] ],,WP_210001901,53.6842,6.02584E-13 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Glimmer and GeneMark. Both call the start site as 53969. /note=Coding Potential: High coding potential in the reverse direction, as indicated by coding potential only on the complementary sequence in both GeneMark Host and GeneMark S outputs. /note=SD (Final) Score: -4.898. This is the best final score on PECAAN since it is the least negative score. /note=Gap/overlap: Gap of 109 base pairs. This is the shortest possible gap and there is no coding potential in the area indicative of a new gene. /note=Phamerator: Pham: 87868. Date: 1/8/2024. This gene does show some minor synteny with phages Pureglobe5 and Odyssey395, both of which show some similarity via protein BLAST hits to RomansRevenge, despite RomansRevenge being a singleton. /note=Starterator: Start site 8 was the most annotated in 4 out of 7 non-draft genes in this pham. 
However, this start site was not manually annotated for this phage gene. /note=Location call: Based on the above evidence, this is a real gene and the start site is 53969. /note=Function call: Function unknown, as called by the majority of significant hits from PhagesDB protein BLAST and NCBI BLAST, as supported also by relevant hits from phage genomes used for comparison from cluster AP (Beagle, Odyssey395, Pureglobe5). HHPRED and CDD were irrelevant. /note=Transmembrane domains: Deep TMHMM does not predict any TMDs; therefore, this is not a membrane protein. /note=Secondary Annotator Name: To, Nathan /note=Secondary Annotator QC: I have QCed this annotation and I agree with the location call due to the convincing z-score and final score, favoring this start as well as having a reasonable start codon and LORF with good coding potential. I also agree with the function call, there are no strong HHpred hits and blast hits show NKF, demonstrating that this gene cannot have a function call made for it. CDS complement (54079 - 54471) /gene="82" /product="gp82" /function="hypothetical protein" /locus tag="RomansRevenge_82" /note=Original Glimmer call @bp 54471 has strength 20.33; Genemark calls start at 54471 /note=SSC: 54471-54079 CP: no SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_PUREGLOBE5_98 [Arthrobacter phage Pureglobe5]],,NCBI, q6:s38 93.0769% 1.26731E-5 GAP: 122 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.005, -2.4787699911121788, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PUREGLOBE5_98 [Arthrobacter phage Pureglobe5]],,UYL87461,40.8537,1.26731E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Both Glimmer and Genemark. Both call the start at 54471. /note=Coding Potential: Good coding potential is found both in GeneMark Self and Host. Our start and stop is outside of the coding potential. Coding potential is in the reverse region so we can say that our gene is a reverse gene. 
/note=SD (Final) Score: -2.479. It is the best final score on PECAAN because it is the least negative value. /note=Gap/overlap: Gap of 122 bp. This is a large gap but there is no coding potential in any of the other reading frames so it is not a sign for concern. Acceptable gene length of 393 bp. /note=Phamerator: pham: 100625. Date 9/11/23. No other phages have a gene in this pham, so it is an orpham. /note=Starterator: No Starterator report was found because it is an orpham. /note=Location call: Based on the good coding potential and the small e-value, this gene is real and the most likely start site is 54471. /note=Function call: Function unknown. The top two PhagesDB BLASTp hits have an unknown function with e-values of 8e-10 and 5e-9, and the top two NCBI BLAST hits have the function of a hypothetical protein but have low identity and a high e-value. The hits given to us by HHpred all had very large e-values. No hits on CDD. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. It predicts that the protein is found on the inside of the membrane. /note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: Explain gap a bit more/discuss gap (check for coding potential in the gap, check other genomes, indicate that you have checked both). Anything from CDD? Select NA for Starterator drop-down. 
CDS complement (54594 - 56051) /gene="83" /product="gp83" /function="hypothetical protein" /locus tag="RomansRevenge_83" /note=Original Glimmer call @bp 56051 has strength 13.48; Genemark calls start at 56051 /note=SSC: 56051-54594 CP: yes SCS: both ST: NA BLAST-Start: GAP: 49 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.276, -4.016212351373635, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 83 (stop@54594R) /note=Coding Potential: There is good coding potential demonstrated by both host-trained and self-trained GeneMark. Glimmer and GeneMark both call the start site as 56051. There are also no switches in gene orientation and the length of the gene itself is greater than 120 bp. It is also the LORF. /note=SD (Final) Score: The start codon is ATG. The final score of -4.016 and z-score of 2.276 indicate a good RBS score. Thus, the original start site of 56051 is kept. /note=Gap/overlap: The gap of 49 is large, but due to having strong coding potential, this gap is likely associated with a non-coding region or extraneous nucleotides. /note=Phamerator: As of 1/13/2024, there are no other genes in the pham, 100826, indicating that this gene is an orpham. /note=Starterator: As of 1/13/2024, there are no other genes in the pham, 100826, indicating that this gene is an orpham. Both Starterator and Phamerator call 100826 as the pham. /note=Location call: Strong coding potential is found between the start site of 56051 and stop site of 54594. The start site is supported by both Glimmer and GeneMark. Additionally, the final score of -4.016 and z-score of 2.276 are optimal values. This gene is the LORF. Phamerator and Starterator, both calling 100826, indicate an orpham. The gap of 49 is large but negligible, as it can be attributed to a non-coding region, evident by the location of surrounding genes. 
/note=Function call: According to the BLASTp hits, similar genes have no known function. CDD did not have any domain hits. HHPred had hits that had E-values > 0, indicating that this gene is not likely to have any of these functions. In conclusion, this gene has no known function. /note=Transmembrane domains: According to DeepTMHMM, this is not a transmembrane protein, as no TMDs are predicted. /note=Secondary Annotator Name: Laureano, Ryan /note=Secondary Annotator QC: I agree with this annotation, just make sure to mark boxes that have good e-values. CDS complement (56101 - 57114) /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="RomansRevenge_84" /note=Original Glimmer call @bp 57114 has strength 8.92; Genemark calls start at 57114 /note=SSC: 57114-56101 CP: yes SCS: both ST: NA BLAST-Start: GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.836, -5.024417463850961, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: This gene, along with the one upstream and downstream, does not have a defined function. /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation: Both Glimmer and GeneMark call a start site of 57114. The start codon predicted is an ATG. /note=Coding Potential: The gene has strong coding potential in the reverse direction over the open reading frame, which is a great indicator that the gene is real because this is a reverse gene. The start site is just outside of the coding potential, which is something we’d expect. /note=SD (Final) Score: The final score is -5.024 and the z-score is 1.836. This is not the least negative final score; there is one closer to -4. Also, there are other start sites with z-scores over 2, which this start doesn’t have. The reason I think this is still the best start site is that this start site results in the longest gene and the most reasonable gap. The other start sites have a gap of +50 while this is the only one that is lower than 50. 
/note=Gap/overlap: This gene only has a 43 base pair gap, which is slightly large but reasonable considering the size of the gene itself. The other values were far too high, which means this gap is the best option. I would consider this strong evidence to keep the start site 57114. /note=Phamerator: This gene is found in pham 100525 as of 1/10/2024. There are no other members of this pham and no function call. /note=Starterator: Starterator is uninformative because there are no other members of the pham to compare this gene to. There is no website that pops up when I click Starterator for this gene. /note=Location call: Based on the high coding potential, the Glimmer and GeneMark start site, and the logical basis of the gap distance, I believe the start site for this gene is 57114. I believe this gene is real because of these reasons as well. /note=Function call: The function of this gene remains unknown as there isn’t enough information to build off of. Since there are no other genes in the pham, we cannot predict a function. /note=Transmembrane domains: There are no hits for TMD proteins, indicating that this is most likely not a transmembrane protein gene. /note=Secondary Annotator Name: Ryan, Kaitlin /note=Secondary Annotator QC: I agree with all of the above evidence. Just make sure to note the drop-down menu for Starterator as "NA" since no Starterator report was found due to this gene being an orpham. CDS complement (57158 - 57361) /gene="85" /product="gp85" /function="hypothetical protein" /locus tag="RomansRevenge_85" /note=Original Glimmer call @bp 57352 has strength 5.35; Genemark calls start at 57352 /note=SSC: 57361-57158 CP: no SCS: both-cs ST: NA BLAST-Start: GAP: 208 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.399, -4.584232102922595, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: No observed synteny since there are no final comparisons that can be made as the other phages are drafts. 
/note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Glimmer identifies the start site at 57352. GeneMark identifies the start site to be 57352. Both start sites match, indicating a greater likelihood of this being the correct start site. /note=Coding Potential: There is good coding potential observed within the fourth open reading frame for this gene on the GeneMarkS output report and GeneMark.hmm. This is supportive evidence that this gene is real and the coding potential on the reverse confirms this is a reverse gene. /note=SD (Final) Score: The SD (final) score is -4.584, though this is not the best-observed score it appears to have more supportive evidence compared to other calls. The best observed final score is -4.351 but it prompts a larger gap. The z-score is 2.142 which is not the best of those called but is relatively close to the optimal z-score of 2. The best z-score observed is 2.399 but other components of this start site are not preferable. /note=Gap/overlap: This gene has a 217bp gap with the gene downstream. This gap was investigated and no coding potential was observed within this gap. Myself and the instructional team agreed no gene modification or gene addition is needed. /note=Phamerator: #100433 observed on 1/10/24. This gene does not have any other members within its pham. This gene is also not a part of any particular subcluster observed from phamerator. /note=Starterator: An error message occurred indicating that this gene is an orphan and does not have any other matches within its identified pham. /note=Location call: Based on the above evidence, it appears this is a real gene and the start site is 57352. /note=Function call: Based on the lack of functional evidence it appears that this gene has no known function (NKF). There are no significant e-values demonstrated by PhagesDB blastp data beside NKF. There are no NCBI or CDD hits. 
The HHPRED data demonstrates calls in homo sapiens and other eukaryotes thus demonstrating no significant functions for this gene. /note=Transmembrane domains: There are no transmembrane domains observed. All the graph data supports inside signals indicating that this gene is not a transmembrane component. /note=Secondary Annotator Name: Sanchez, Kayla /note=Secondary Annotator QC: I agree with the calls the primary annotator has made. CDS complement (57570 - 57863) /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="RomansRevenge_86" /note=Original Glimmer call @bp 57863 has strength 12.57; Genemark calls start at 57863 /note=SSC: 57863-57570 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BRUHMOMENT_102 [Arthrobacter phage BruhMoment]],,NCBI, q1:s3 78.3505% 1.00979E-13 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.372, -3.8137921535956054, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BRUHMOMENT_102 [Arthrobacter phage BruhMoment]],,UOK18418,30.8642,1.00979E-13 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer and GeneMark both called the start site at 57863. /note=Coding Potential: There is coding potential in the second ORF of the reverse direction in both Host and Self-trained GeneMark, indicating that this is a reverse gene. The coding potential extends past the suggested start site. /note=SD (Final) Score: The z-score is 2.372 and the final score is -3.814. These were not the highest scores of the suggested sites, but the only site that had higher scores would result in a gene length of 57 base pairs, which is too small. /note=Gap/overlap: There is a gap of 9 base pairs. This is the smallest possible gap. /note=Phamerator: This gene is a member of pham #10875 as of 1/8/24. There are three non-draft members of this pham, which are all part of cluster AP. 
/note=Starterator: Start site 6 was manually annotated in 2/3 non-draft members of the pham; however the RomansRevenge gene does not have this start site. The start site suggested for RomansRevenge is not present in any other members of the pham. /note=Location call: Based on the above evidence, this is a real gene, and the start site is 57863. /note=Function call: NKF; BLASTp had 8 significant non-draft hits, but none had a known function. NCBI BLAST had 6 significant hits; none had a known function. HHpred and CDD had no significant hits. /note=Transmembrane domains: There were no TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Hon, Darren Wing-Hang /note=Secondary Annotator QC: I agree with the location and function call. The BLASTp, NCBI BLAST, HHPred, and CDD results support NKF. CDS complement (57873 - 58130) /gene="87" /product="gp87" /function="hypothetical protein" /locus tag="RomansRevenge_87" /note=Original Glimmer call @bp 58130 has strength 9.81; Genemark calls start at 58130 /note=SSC: 58130-57873 CP: yes SCS: both ST: NI BLAST-Start: GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.137, -5.217052647771982, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: Glimmer and Genemark both call the start site at 58130. /note=Coding Potential: There is coding potential in both Host and Self Genemark only in the reverse direction. /note=SD (Final) Score: -5.217 is the score for this gene. Z-score is 2.137. /note=Gap/overlap: There is a 3 base pair gap. This is reasonable. /note=Phamerator: Orpham. No phamerator report available as of 1-9-24. No synteny /note=Starterator: Orpham. No starterator report available as of 1-9-24. /note=Location call: It is likely the start site is at 58130. /note=Function call: No known function. No hits in Blastp, NCBI Blastp, CDD, and insignificant hits in HHpred (lowest e-value is 4). 
/note=Transmembrane domains: None. /note=Secondary Annotator Name: Claire, Monjov /note=Secondary Annotator QC: I agree that this is a real gene and the start site is 58130. There is high coding potential in the reverse direction like we'd expect, along with both Glimmer and GeneMark agreeing on a start site. One thing to look at is that the final score mentioned above is not the best possible score, but it is an acceptable one. I think that start site is the best one but the wording needs to change. There is little evidence to support a function for this gene as it is not in a pham. This is expected for a singleton, so NKF is an acceptable conclusion. Overall, the primary annotator was correct in my opinion. CDS complement (58134 - 58454) /gene="88" /product="gp88" /function="hypothetical protein" /locus tag="RomansRevenge_88" /note=Original Glimmer call @bp 58454 has strength 9.94; Genemark calls start at 58454 /note=SSC: 58454-58134 CP: yes SCS: both ST: NA BLAST-Start: GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.836, -4.944375376130201, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Glimmer and GeneMark both called the start site at 58454. This is an ATG codon. /note=Coding Potential: Host trained GeneMark shows coding potential in the reverse direction. The Self trained GeneMark (GeneMarkS) shows coding potential in the same direction. There are currently no final draft phages that are comparable, so this gene does not have synteny with others. /note=SD (Final) Score: This start site has a final score of -4.944. Relative to the other start sites, this is not the best score, but it has the smallest gap (47) and aligns well with both of the coding regions from the Host-Trained/Self-Trained Algorithms. /note=Gap/overlap: The gap of this start site is 47, which is not unreasonable. /note=Phamerator: As of 1/9/24, this gene is part of pham #100441, and it is the only member. 
/note=Starterator: Since this gene is the only member of its pham, there are no comparisons for Starterator /note=Location call: Based on the evidence above, this is a real gene with a start site of 58454. This start site had the third best final score and z-score (1.836). This is the most likely start site because it minimizes the gap (47), and it fits the best within the coding potential shown in the Host and Self-trained GeneMark. Additionally, the 58391 start site was considered because it had a strong Z-score and final score, however it was ruled out due to the large gap it created, and it would exclude regions of coding potential on GeneMarkS/GeneMark. /note=Function call: Unknown Function. There were no CDD hits, and no NCBI blast hits. There are BlastP hits, however none of the hits have an e-value under 1e-7. The HHPred hits have high e-values which makes them unreliable. The predicted function yielded by PhagesDB is accurate as it is based on a separate pham. This is a real gene due to the coding potential but it is likely no one has investigated this gene before. /note=Transmembrane domains: TmHmm predicts 0 TMDs /note=Secondary Annotator Name: Labib, Youstina /note=Secondary Annotator QC: The location and function call appear correct. The start site is confirmed to be 58454 and there is no supportive evidence of a known function thus I agree with NKF as the function call. CDS complement (58502 - 59452) /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="RomansRevenge_89" /note=Original Glimmer call @bp 59452 has strength 15.41; Genemark calls start at 59452 /note=SSC: 59452-58502 CP: yes SCS: both ST: SS BLAST-Start: GAP: 42 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.614, -5.4744737427355155, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Glimmer and GeneMark both call the start site at 59452 as the best start site for this gene. 
The called start codon is ATG. /note=Coding Potential: This gene has reasonable coding potential in the fourth reading frame of the Host Trained GeneMark. /note=SD (Final) Score: Several possible start sites were auto-annotated for this gene. While the Z score and final score are better for the start site at 59284, the presence of coding potential after 59284 that drops before the auto-annotated start site at 59452 leads me to believe that the start site at 59452 was correct. The Z score and final score of the 59452 start site are 1.614 and -5.474 respectively. /note=Gap/overlap: The gap is 42 base pairs for the 59452 start site, which is the smallest gap among possible start sites. /note=Phamerator: This gene did not have a useful page on the PhagesDB Pham page as it is in an orpham. /note=Starterator: The Starterator link led to an Error 404 page, which is because this gene is an orpham and does not have other genes in its pham to compare it to. /note=Location call: Based on the Z score and final score, I believe the auto-annotated start site at 59452 is the best start site. I believe this is a real gene because of the coding potential on GeneMark, the reasonable gap, and gene length. /note=Function call: PhagesDB has hits in several different clusters, with the most common hit as DNA polymerase I. The PhagesDB BLAST however did not support this. HHPRED had several hits of various protein functions and CDD did not have any hits. Based on the data from PhagesDB, HHPRED, CDD and the NCBI BLAST, I believe that this gene has NKF. /note=Transmembrane domains: According to the Deep TmHmm this gene has no transmembrane domains. /note=Secondary Annotator Name: Woodward, Lauren /note=Secondary Annotator QC: I agree with both the location and function calls. 
CDS complement (59495 - 59713) /gene="90" /product="gp90" /function="hypothetical protein" /locus tag="RomansRevenge_90" /note=Original Glimmer call @bp 59713 has strength 8.08; Genemark calls start at 59713 /note=SSC: 59713-59495 CP: yes SCS: both ST: NA BLAST-Start: GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.265, -4.03907523433314, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 59713 with a start codon of ATG. This is the LORF. /note=Coding Potential: In Host-Trained GeneMark, there is solid coding potential throughout the region between the start and stop sites, in the reverse direction. /note=SD (Final) Score: -4.039 for the called site, which is not great but is one of the better scores among the candidates. /note=Gap/overlap: 33 bp gap. /note=Phamerator: No data as of 1-8-24. /note=Starterator: Page not found (Error 404) as of 1-8-24. /note=Location call: Both auto-annotations call the site at 59713. The final score is one of the better ones among the candidates, the start codon is ATG, and the Z-score of 2.265 is a good sign. This is enough for me to call the start location correct. /note=Function call: Based on HHPred and Blastp, there seems to be no known function. HHPred only has an E-value of 39, which is outside of the normal threshold. Based on that, NKF. /note=Transmembrane domains: None /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: Also accidentally annotated this gene. I agree with this annotation. My notes are below. /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 59713. /note=Coding Potential: Host GeneMark covers the coding potential for this entire gene. Self GeneMark also covers the entire length of the gene. /note=SD (Final) Score: -4.039 isn't the best score, but it is reasonable, as the lowest final score has a 165 base pair gap (z-score 2.265). 
/note=Gap/overlap: There is a 33 base pair gap, which is reasonable. /note=Phamerator: Orpham. No phamerator report available as of 1-8-24. No synteny /note=Starterator: Orpham. No starterator report available as of 1-8-24. /note=Location call: The start site is likely at 59713 based on the evidence. /note=Function call: No known function. No Blastp hits. No NCBI Blastp hits. No CDD hits. No significant HHpred hits (lowest e-value was 27). /note=Transmembrane domains: None CDS complement (59747 - 59986) /gene="91" /product="gp91" /function="hypothetical protein" /locus tag="RomansRevenge_91" /note=Original Glimmer call @bp 59986 has strength 10.63; Genemark calls start at 59986 /note=SSC: 59986-59747 CP: yes SCS: both ST: NA BLAST-Start: GAP: 185 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.706, -3.1085431521568267, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Indiresan, Neeti /note=Auto-annotation: Both Glimmer and Genemark, start site 59986, start codon ATG /note=Coding Potential: The gene shows coding potential in the reverse direction according to both host and self. The chosen start site covers this coding potential. /note=SD (Final) Score: -3.109. It is the best RBS Final Score on PECAAN. /note=Gap/overlap: 185 bp, which is quite large, but contains no coding potential. Therefore, the gap cannot be filled by changing the start site. /note=Phamerator: pham: 100563. Date 01/09/2024. It is an orpham. /note=Starterator: No starterator report due to gene being an orpham. /note=Location call: Based on the evidence shown above, this is likely a real gene and the most reasonable start site is 59986. /note=Function call: NKF. PhagesDB Blastp, NCBI Blastp, and CDD had no hits. HHPred also had no informative hits (e-values did not meet the cutoff). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Chamorro, Marco /note=Secondary Annotator QC: I agree with the start site called by the primary annotator. The Z-score, final score, and coding potential support this. I agree with the NKF function call because PhagesDB Blastp, NCBI Blastp, CDD, and HHpred resulted in no informative hits. CDS complement (60172 - 60543) /gene="92" /product="gp92" /function="hypothetical protein" /locus tag="RomansRevenge_92" /note=Original Glimmer call @bp 60543 has strength 9.84; Genemark calls start at 60543 /note=SSC: 60543-60172 CP: yes SCS: both ST: NI BLAST-Start: GAP: 349 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.265, -4.390184094340439, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kalliomaa, Kira /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 60543. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: -4.390. This is the best final score on PECAAN. /note=Gap/overlap: 349 bp. This is a large gap but is acceptable since there is no coding potential in this gap that indicates a new gene. /note=Phamerator: 100462. Date Accessed 01/10/2024. Not conserved in any phages; orpham. /note=Starterator: Orpham, no data present. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 60543. /note=Function call: NKF /note=Transmembrane domains: 0; DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I agree with the primary annotator's decision to mark this gene as NKF. 
CDS complement (60893 - 61699) /gene="93" /product="gp93" /function="helix-turn-helix DNA binding domain" /locus tag="RomansRevenge_93" /note=Original Glimmer call @bp 61609 has strength 16.73; Genemark calls start at 61609 /note=SSC: 61699-60893 CP: no SCS: both-cs ST: NI BLAST-Start: GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.058, -6.586097562032079, no F: helix-turn-helix DNA binding domain SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Potter, Sofia /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 61609. The start codon for this site is ATG. /note=Coding Potential: There is good coding potential throughout the entirety of the putative ORF in GeneMark, suggesting that this is a real gene. /note=SD (Final) Score: The final score is -4.101 for the selected site, the highest on PECAAN. /note=Gap/overlap: This is the final gene listed for this phage and it is in the reverse direction, so gap/overlap is not applicable. /note=Phamerator: As of 1/10/24, this gene is in pham 100382. This pham is shared by this gene and the gene immediately before it in this same phage genome, presumably due to a duplication event. /note=Starterator: Because this gene is effectively within an orpham (along with its duplicate before it), Starterator does not provide valuable information. /note=Location call: Because the auto-annotated start site provides the highest final score, has a Z-score above 2, and encompasses all coding potential of the gene, the start site can be called at this position, 61609. /note=Function call: Even though the e-values and percent coverage on the top HHPred hits are less than ideal, there is evidence to call this gene a helix-turn-helix DNA binding domain. The SEA-PHAGES criterion for a helix-turn-helix DNA binding domain is met by the HHPred results: the hits cover multiple alpha helices with spacer regions between them. 
This is further supported by the same call being reached on the first gene, with the PhagesDB entry for this phage listing the character of genome ends as direct terminal repeat. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: After review of the evidence for both start site and function calls, I agree with the primary annotator on the start site of 61609 and the function of a helix-turn-helix DNA binding domain. Also, no transmembrane domains were observed.
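The gap values reported in the Gap/overlap notes for genes 89 through 92 (42, 33, 185 and 349 bp) can be cross-checked against the CDS coordinates. The following is a minimal sketch, not part of PECAAN or DNA Master. It assumes the gap for a reverse-strand gene is the count of bases strictly between its start (right coordinate) and the left coordinate of the next listed gene. The names `REVERSE_GENES`, `gap_bp`, `reported_gaps` and `orf_length` are hypothetical and introduced only for illustration.

```python
# Coordinates (left, right) copied from the "CDS complement (left - right)" records above.
# For reverse-strand genes the start codon sits at the right coordinate.
REVERSE_GENES = [
    ("gp89", 58502, 59452),
    ("gp90", 59495, 59713),
    ("gp91", 59747, 59986),
    ("gp92", 60172, 60543),
    ("gp93", 60893, 61699),
]


def gap_bp(right_of_gene, left_of_next_gene):
    """Bases strictly between two genes (assumed convention; negative means overlap)."""
    return left_of_next_gene - right_of_gene - 1


def reported_gaps(genes):
    """Gap between each gene's start and the next listed gene's left coordinate."""
    gaps = {}
    for (name, _left, right), (_next_name, next_left, _next_right) in zip(genes, genes[1:]):
        gaps[name] = gap_bp(right, next_left)
    return gaps


def orf_length(left, right):
    """ORF length in bp, counting both end coordinates."""
    return right - left + 1


if __name__ == "__main__":
    for name, gap in reported_gaps(REVERSE_GENES).items():
        print(f"{name}: {gap} bp gap")
    for name, left, right in REVERSE_GENES:
        length = orf_length(left, right)
        # A complete ORF should span a whole number of codons.
        print(f"{name}: {length} bp, multiple of 3: {length % 3 == 0}")
```

Under this convention the computed gaps match the values in the SSC notes, and every listed ORF length is a multiple of 3. The check only compares coordinates; it does not evaluate coding potential or RBS scores.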