CDS 271 - 495 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="Clarkson_1" /note=Original Glimmer call @bp 271 has strength 15.92; Genemark calls start at 274 /note=SSC: 271-495 CP: yes SCS: both-gl ST: NI BLAST-Start: [hypothetical protein SEA_PRINGAR_1 [Mycobacterium phage Pringar]],,NCBI, q2:s1 98.6487% 9.1355E-44 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.823, -3.0301037898918786, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRINGAR_1 [Mycobacterium phage Pringar]],,QFP96865,98.6301,9.1355E-44 SIF-HHPRED: SIF-Syn: /note=Start site call notes: Coding potential does extend slightly past start site of 271, but not all the way back to 121. There are only 3 non-draft members in pham, and 3 of the 3 non-draft members call start site 4, which correlates to a start site of 274 bp for Clarkson. However, site 3 (271) is just one codon earlier, GTG, and has a better Z/RBS score. Glimmer also calls 271. Calling that one. -AF /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation:Glimmer calls the start at 271 and Genemark calls the start at 274. /note=Coding Potential:The coding potential in this ORF is predominantly seen in the forward strand which shows that this is a forward gene. There is coding potential found in both the Host GeneMark and the Self GeneMark. Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer start site. /note=SD (Final) Score: -3.030 is the best final score on PECAAN and the z-score is the highest at 2.823. /note=Gap/overlap: There is no gap/overlap upstream of the gene since this is the first gene in the phage genome. Moreover downstream of the gene there is a small gap but there is not much of a gap for a new gene to be inserted. /note=Phamerator: 3 non-draft members in pham. All call a start site with length of 222. 
/note=Function call: The first and fifth hits on phagesdb BLAST are LittleLaf and Pringar, which both have an unknown function. They have e-values of 6e-35 and 6e-36, which are low. In NCBI BLAST the first hit suggests that the gene is a hypothetical protein and has 100% coverage, 94.5205% identity, and e-value of 8.02797e-44. Decent hits in HHpred to Portal, but there is another call for portal later on that is more compelling. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Chang, Julia/ Cho, Emily /note=Secondary Annotator QC: Fill out Starterator drop-down menu, but yes I agree with the location call/ I agree with the location call and function call. Make sure to check NCBI BLAST hit evidence. CDS 557 - 1057 /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="Clarkson_2" /note=Original Glimmer call @bp 557 has strength 14.53; Genemark calls start at 578 /note=SSC: 557-1057 CP: yes SCS: both-gl ST: NI BLAST-Start: [hypothetical protein SEA_LITTLELAF_2 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 1.41125E-115 GAP: 61 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.865, -5.484122476401707, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_2 [Mycobacterium phage LittleLaf] ],,AYB69812,99.3976,1.41125E-115 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in pham 18639 and is called as NKF, downstream gene is in pham 14698 and is called as NKF. The upstream gene in phage LittleLaf is also in pham 18639 and the downstream gene in phage LittleLaf is also in pham 14698. However, the functions of the upstream and downstream genes in LittleLaf have not been called. 
/note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Both: Glimmer Start: 557 (chosen), Glimmer Start Codon: GTG, GeneMark Start: 578, GeneMark Start Codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF; the chosen start site covers all this coding potential /note=SD (Final) Score: The score is -5.484, which is not the best score, but is still reasonable to suggest a credible ribosome binding site /note=Gap/overlap: There is a reasonable 61bp gap between the start site and the upstream gene. This start site creates the longest ORF. The length of this gene is acceptable (501 bp). /note=Phamerator: As of 1/13/2021 this gene is found in Pham 95124. This pham is present in other members of the same cluster (S) which my phage belongs to. Some examples are Beelzebub, Blackbeetle, and Corazon. This start site is shared by other members of the cluster. Others call the site at 578 for a final length of 480bp. /note=Starterator: Yes, there is a conserved start site choice: 7@578. There are 44 members in this pham and 15 out of the 32 non-draft genes called start site 7. /note=Location call: Based on the gathered evidence, this gene is a real gene and has a start site at 557. /note=Function call: The top 2 NCBI BLASTp hits and the top 2 phagesdb hits, sorted by E-value, suggested that this is a real gene, but that it has no known function as of yet, with high query coverage (>99.6%), high % identity (>97%), and low E-values (0). CDD and HHpred hits were not helpful in determining function. I do not have enough information to make a conclusion on this gene's function. /note=Transmembrane domains: /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I agree with this start site. Might want to add that this start site is shared by other members of the S cluster. /note=Transmembrane domains: None called by either TMHMM or TOPCONS. This is not a transmembrane protein. 
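The gap, overlap, and length figures quoted in these notes (for example the 61 bp gap and 501 bp length for gene 2, or the -4 bp overlap for gene 3) follow directly from the start and stop coordinates. The sketch below shows that arithmetic, assuming 1-based inclusive coordinates on the forward strand; the function names are hypothetical and are not part of PECAAN or DNA Master.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand ORF with 1-based inclusive coordinates."""
    return stop - start + 1


def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Bases between the upstream gene's stop and this gene's start.

    A positive value is a gap; a negative value is an overlap
    (for example -4 when the two genes share four bases).
    """
    return start - upstream_stop - 1


# Gene 2 (557-1057), with gene 1 ending at 495.
print(orf_length(557, 1057), gap_to_upstream(557, 495))
# Gene 3 starts at 1054 while gene 2 ends at 1057.
print(gap_to_upstream(1054, 1057))
```

The same two functions reproduce the other GAP values in the summary notes, such as -1 for gene 5 (start 1565, upstream stop 1565) and -11 for gene 6 (start 1824, upstream stop 1834).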
CDS 1054 - 1314 /gene="3" /product="gp3" /function="hypothetical protein" /locus tag="Clarkson_3" /note=Original Glimmer call @bp 1054 has strength 7.93; Genemark calls start at 1054 /note=SSC: 1054-1314 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp005 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.35532E-54 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.203, -4.3978601770686225, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp005 [Mycobacterium phage MosMoris] ],,YP_009031515,98.8372,1.35532E-54 SIF-HHPRED: SIF-Syn: NKF, upstream gene has NKF and is in pham 98435, downstream gene has NKF and is in pham 6061, just like in phages LittleLaf, MosMoris, and Corazon /note=Primary Annotator Name: Bartolome, Alexandra /note=Auto-annotation: Glimmer and GeneMark both call the same start site at 1054, with a start codon of ATG. /note=Coding Potential: Glimmer and GeneMark both show coding potential that is included in the predicted start site at 1054. The coding potential is only in the forward strand of this ORF, so this must be a forward gene. /note=SD (Final) Score: The final score is -4.398, which is not the best option but still favorable. The Z-score is irrelevant because it is very likely that the gene is part of an operon, but is still favorable at 2.203. /note=Gap/overlap: The gap is -4 bp, which suggests that it is likely part of an operon. This is strong evidence for the start site. /note=Phamerator: As of 01/12/22, the gene is part of pham 14699. It is conserved in phages Marvin, Corazon, and Tesla, which are all in Cluster S with Clarkson. /note=Starterator: Starterator shows start site number 6 as highly conserved. Start site number 6 was manually annotated by 13/14 non-draft genes in the same pham. In Clarkson, start site number 6 is at 1054 bp, which is the start site predicted by both GeneMark and Glimmer. 
/note=Location call: The evidence suggests that this is a real gene with a start site at 1054. It is highly conserved in Starterator, includes all coding potential, and is likely part of an operon. /note=Function call: NKF. Phagesdb BLAST, NCBI BLAST, HHpred, and CDD hits were not informative and showed no known function for this gene. /note=Transmembrane domains: TMHMM and TOPCONS both do not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: QC complete, agree with location and function call. Need to check the boxes for 2 NCBI blast hits with very low e-values. The Z-score is still important and it is greater than the critical value of 2, giving it support for the start site even if it is not the highest value. Note the function of the upstream and downstream genes as well for synteny box. CDS 1311 - 1565 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="Clarkson_4" /note=Original Glimmer call @bp 1311 has strength 5.06; Genemark calls start at 1311 /note=SSC: 1311-1565 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp004 [Mycobacterium phage Marvin] ],,NCBI, q1:s9 100.0% 2.30251E-46 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.156, -2.338663114094478, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp004 [Mycobacterium phage Marvin] ],,YP_009614122,86.9565,2.30251E-46 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (Pham 14699) and downstream gene (Pham 22829) is NKF, just like RedRaider77 and Beelzebub /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: GeneMark and Glimmer both call start sites of 1311 /note=Coding Potential: The coding potential of this ORF is on the forward strand, which means that this is a forward gene. 
I found coding potential in both GeneMark Self and Host. /note=SD (Final) Score: The final score was -2.339 and this was the best final score according to PECAAN /note=Gap/overlap: -4, which is well within the guidelines and indicates an overlap of 4 bp, which is fairly good. /note=Phamerator: 6061 was the pham of the gene as of 1/17/22. It has also been found in phages Beelzebub_5 and Blackbeetle_5 /note=Starterator: Start site 3 was manually annotated 13/14 times in non-draft genomes. Start site 3 corresponds to start coordinate 1311 in Clarkson, which agrees with the Glimmer and GeneMark auto-annotation. /note=Location call: Based on the evidence specified above, this is a real gene with a start site of 1311. /note=Function call: NKF. All of the evidence for this gene indicates that there is no known function. Both phagesdb and NCBI BLAST indicated that the function was unknown, with relatively low e-values and several NKF hits. Some examples of phages from Phagesdb BLAST include Beelzebub, which had no known function and an e-value of 3e-40, and Corazon, which had no known function and an e-value of 7e-39. The CDD offered no hits and this was uninformative. HHpred offered hits that had high e-values, low probabilities, and low % coverage and therefore was also not informative. Therefore, the function call is NKF because there is no definitive function. /note=Transmembrane domains: No transmembrane domains found /note=Secondary Annotator Name: Araque, Colette /note=Module 9: Abana, Juana /note=Secondary Annotator QC:I have QC’ed this location call and agree with the first annotator. /note=Module 9: I have QC’ed this and agree with the function call; however, for Phagesdb BLAST include the names of the phages as well as their e-values. 
CDS 1565 - 1834 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Clarkson_5" /note=Original Glimmer call @bp 1565 has strength 12.43; Genemark calls start at 1565 /note=SSC: 1565-1834 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRINGAR_4 [Mycobacterium phage Pringar] ],,NCBI, q1:s1 100.0% 5.25792E-58 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -3.769471247018311, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRINGAR_4 [Mycobacterium phage Pringar] ],,QFP96868,98.8764,5.25792E-58 SIF-HHPRED: SIF-Syn: My gene has NKF, and downstream gene 6 has NKF as well and is in the Pham 9483, and gene 4 has not updated a function and is in Pham 6061. Compared to LittleLaf, there is no function listed up or downstream. However, LittleLaf has the same Pham numbers for the genes upstream and downstream. /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Glimmer and GeneMark both produced the correct start site at 1565 /note=Coding Potential: The ORF does have coding potential and is shown in the forward direction. /note=SD (Final) Score: The final score is -3.769, which is the second best (best is -3.756) /note=Gap/overlap: The gap/overlap is -1, which is a very good sign that this is the correct start site because it cannot move upstream and no gene can be added. /note=Phamerator: Pham: 22829 called on 1/13/22; it is also found in Corazon and Beelzebub /note=Starterator: Start number 3 is located in 14/14 of non-draft genes, which correlates to a start site of 1565, which is good evidence that this is the correct start site /note=Location call: 1565 is the best start site based on the guiding principles. Therefore, the gene should be kept. /note=Function call: NKF. Based on all the evidence presented, including BLAST and HHPRED not having any hits, the gene does not have a function to be called. /note=Transmembrane domains: There are no transmembrane domains as shown by TMHMM and TOPCONS. 
Thus, there is still NKF. /note=Secondary Annotator Name: Bartolome, Alexandra /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=New Secondary Annotator Name: Araque, Colette /note=New Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. However, please check the synteny box. The annotator includes "gene 5 has not updated a function," but I think the annotator meant "gene 4" which is the correct upstream gene for the annotator`s gene (gene 5). Gene 4 has now been called: NKF. CDS 1824 - 2318 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="Clarkson_6" /note=Original Glimmer call @bp 1824 has strength 9.49; Genemark calls start at 1824 /note=SSC: 1824-2318 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRINGAR_5 [Mycobacterium phage Pringar]],,NCBI, q1:s1 100.0% 8.60666E-112 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.495, -3.712290173411778, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRINGAR_5 [Mycobacterium phage Pringar]],,QFP96869,98.7805,8.60666E-112 SIF-HHPRED: SIF-Syn: This gene at Stop 2318 has NKF. In the phage, the genes immediately upstream and downstream have NKF. In Beelzebub, the gene of the same pham has no listed function, but its immediate upstream and downstream genes correspond to the same phams as the immediate upstream and downstream genes in Clarkson respectively. /note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Both Glimmer and GeneMark Call the Start at position 1824, the Start Codon is an ATG. /note=Coding Potential: Coding Potential for this ORF is found primarily in the forward direction in a single reading frame in both Host-Trained and Self-Trained GeneMark. 
Noteworthy is a small spike of coding potential centered around position 2025 in the reverse strand in the Host-Trained GeneMark, however, there is no associated reading frame that contains this coding potential. This spike and lack of reading frame is also reflected in typical and atypical coding potential in the Self-Trained GeneMark. /note=SD (Final) Score: -3.712. It is the second-highest Final Score in PECAAN, below -3.423 for position 2292. The Z-score is not the best, at the 4th highest (2.495), being beat out by start 1881 and two other start sites that leave gaps greater than 300 base pairs from the previous gene. /note=Gap/overlap: Overlap of 11bp. The overlap is reasonable as it is conserved in other phages, such as BlackBeetle (S) and Beelzebub (S). /note=Phamerator: pham 9483 as of 01/12/2022 at 1436 hours Pacific Time. This pham is conserved in all non-draft genomes of Cluster S on PECAAN as of 01/12/2022, and is found in at least Corazon, Blackbeetle, and Beelzebub. /note=Starterator: Start Site 2 was manually annotated in 14 of 14 non-draft genomes in Cluster S. In Clarkson, Start Site 2 corresponds to position 1824. /note=Location call: Based on the above, this gene is real and the most likely start site is at position 1824. /note=Function call: UNKNOWN FUNCTION. Based on the BLAST hits on both PhagesDB, and NCBI, I cannot conclude any function for this gene as all hits as of 1/30/22 @ 0218 hours PST have no known function listed. Additionally, searches on CDD yielded no recorded homologs, and the only two returns on HHpred listed proteins with low probability and coverage, and e-values greater than 500. /note=Transmembrane domains: This gene has no predicted transmembrane domains according to the TMHMM, so it cannot be assigned as a membrane protein. /note=Secondary Annotator Name: Bharadwaj, Shreya, Bartolome, Alexandra /note=Secondary Annotator QC: I have QC`ed this location call and agree with the first annotator. 
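Each record carries a compact summary note of the form `SSC: 1824-2318 CP: yes SCS: both ST: SS ... GAP: -11 bp gap LO: no ...`. A minimal sketch of pulling the coordinates, gap, and yes/no flags out of such a note with a regular expression is given below. The field meanings (SSC as start/stop coordinates, CP as coding potential, LO as longest ORF) are assumptions inferred from how the notes use them, and `parse_summary` is a hypothetical helper, not a PECAAN function.

```python
import re

# Assumed layout: "SSC: <start>-<stop> CP: <yes|no> ... GAP: <n> bp gap LO: <yes|no>"
SUMMARY_RE = re.compile(
    r"SSC:\s*(?P<start>\d+)-(?P<stop>\d+)\s+CP:\s*(?P<cp>\w+)"
    r".*?GAP:\s*(?P<gap>-?\d+)\s*bp gap\s+LO:\s*(?P<lo>\w+)"
)


def parse_summary(note: str) -> dict:
    """Extract start, stop, gap, and the CP/LO flags from one summary note."""
    match = SUMMARY_RE.search(note)
    if match is None:
        raise ValueError("summary fields not found in note")
    return {
        "start": int(match["start"]),
        "stop": int(match["stop"]),
        "coding_potential": match["cp"].lower() == "yes",
        "gap": int(match["gap"]),
        "longest_orf": match["lo"].lower() == "yes",
    }


note = (
    "SSC: 1824-2318 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein "
    "SEA_PRINGAR_5 [Mycobacterium phage Pringar]],,NCBI, q1:s1 100.0% 8.60666E-112 "
    "GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.495, -3.712290173411778, no"
)
print(parse_summary(note))
```

Parsing the summary this way makes it easy to cross-check the GAP value against the coordinates of the neighbouring genes instead of reading it off by eye.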
CDS 2311 - 2583 /gene="7" /product="gp7" /function="hypothetical protein" /locus tag="Clarkson_7" /note=Original Glimmer call @bp 2374 has strength 0.77 /note=SSC: 2311-2583 CP: yes SCS: glimmer-cs ST: SS BLAST-Start: [hypothetical protein SEA_VASUNZINGA_8 [Mycobacterium phage VasuNzinga] ],,NCBI, q1:s1 100.0% 4.5922E-58 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.505, -3.769471247018311, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VASUNZINGA_8 [Mycobacterium phage VasuNzinga] ],,AYB70746,100.0,4.5922E-58 SIF-HHPRED: SIF-Syn: NKF; downstream gene is pham number 9483, while the upstream gene is pham number 98616 just like in other non-draft phages Beelzebub and Pringar. The upstream and downstream genes also have NKF. /note=Primary Annotator Name: Chang, Julia /note=Auto-annotation: Glimmer calls the start site at 2374bp with the codon TTG (rare as it only appears in 7% of start codons). /note=Coding Potential: The ORF has little coding potential based on the start site at 2311bp and stop site at 2583bp. Coding potential is in the forward direction, indicating it is a forward gene. /note=SD (Final) Score: The final score of -3.769 is the best option and the z-score is the highest at 2.505. /note=Gap/overlap: The overlap upstream is 8bp long, which is reasonably close to the guidelines for upstream gene overlaps. This overlap is also conserved in other non-draft phages like Beelzebub and Blackbeetle. /note=Phamerator: The Pham number as of 01/17/2022 is 95469. The gene is also conserved in phages Beelzebub and VasuNzinga. /note=Starterator: Start site 82 is conserved among members of this pham at position 2311bp. 235 of the 264 non-draft phage annotations called this start site. /note=Location call: The gathered evidence suggests that this is a real gene based on full coding potential and being conserved in Phamerator. 
Start site number 82 at position 2311bp seems to be the most likely start site candidate because it is conserved in Starterator and covers all coding potential. /note=Function call: NKF. The top two hits for both phagesDB BLAST and NCBI BLAST had small e-values ranging from 1e-54 to 2e-50, but all hits have unknown functions. CDD and HHpred had no significant hits. /note=Transmembrane domains: Since neither TMHMM nor TOPCONS predicted any TMDs, it can be assumed that this gene is not a membrane protein. /note=Secondary Annotator Name: Bovee, Alyson, Shreya Bharadwaj (2) /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator, I have QC’ed this location call and agree with the first annotator (2) CDS 2580 - 2957 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="Clarkson_8" /note=Original Glimmer call @bp 2580 has strength 7.97; Genemark calls start at 2580 /note=SSC: 2580-2957 CP: yes SCS: both ST: SS BLAST-Start: [gp6 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.04102E-40 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.167, -2.394485424036831, yes F: hypothetical protein SIF-BLAST: ,,[gp6 [Mycobacterium phage Bxz1] ],,NP_818082,72.3077,2.04102E-40 SIF-HHPRED: SIF-Syn: This gene has NKF. The gene upstream is NKF in pham 95469. The gene downstream is NKF as well in pham 16200. This same gene structure is seen in phages Beelzebub and Pringar as well. /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation: Glimmer and Genemark both call for the start at 2580. /note=Coding Potential: There is strong evidence for coding potential in the forward direction using the third reading frame. This evidence is seen in both GeneMark Self and Host. /note=SD (Final) Score: -2.394. This start site has the best Z-score as well (3.167). There is a slightly better final score than this site; however, that site incurs a large gap of 242bp. /note=Gap/overlap: -4bp. 
There is a 4bp overhang between this and the upstream gene. This indicates it is part of an operon, which is supported by the overhang sequence of ATGA. /note=Phamerator: The Pham as of 1/12/22 is 95640. This gene is found in other phages in cluster S such as JoieB and Marvin /note=Starterator: Start site 120 is manually annotated in 205 of 609 phages of this pham. 9 of 12 phages in S have this start site. This start site corresponds with a start @2580 for Clarkson. This site is also called for by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene and a start site @2580 is well supported. /note=Function call: Unknown. There are no known functions in the PhagesDB BLAST, and the few identified proteins from NCBI BLAST do not have known function either. /note=Transmembrane domains: None. Neither TMHMM nor TOPCONS predict any TMDs. /note=Secondary Annotator Name: Carreon, Justin; Bovee, Alyson (I have QCd this gene and agree with the start site) /note=Secondary Annotator QC: I have QCd this gene and agree with the start site. Update Notes (Original Notes were unsaved): As of 1/28/22, Pham number has been updated to 98616. As of 1/21/22, Start Number is 117 from 120, start site bp is still 2580. The start number was called in 224 of 718 MA genes. CDS 2954 - 3115 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="Clarkson_9" /note=Original Glimmer call @bp 2954 has strength 4.46; Genemark calls start at 2954 /note=SSC: 2954-3115 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PBI_OVECHKIN_87 [Mycobacterium phage Ovechkin] ],,NCBI, q1:s1 100.0% 2.15199E-29 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -3.769471247018311, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_OVECHKIN_87 [Mycobacterium phage Ovechkin] ],,YP_009211251,100.0,2.15199E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start site at 2954 with start codon GTG. /note=Coding Potential: Only the ORF on the forward strand has reasonable coding potential, indicating that this is a forward gene. The chosen start site also does include all of the coding potential. /note=SD (Final) Score: -3.769. It is the best final score on PECAAN. /note=Gap/overlap: 4 bp overlap. However, because the overlap is GTGA rather than ATGA, this gene might not be part of an operon. This overlap is also conserved in other phages. /note=Phamerator: Pham: 16200. Date: 1/30/2022. It is conserved and found in Corazon and LittleLaf. /note=Starterator: Start site 19 in Starterator was manually annotated in 53/81 non-draft genes in this pham. Start 19 is 2954 in Clarkson. This evidence agrees with the start site predicted by Glimmer and GeneMark. As of 2/1/22, start site 19 was changed to start site 20, and this start site was manually annotated in 57/81 non-draft genomes. /note=Location call: Based on the data presented above, this is a real gene with 2954 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits all have unknown functions (E-value 2e-27), and 3 out of 18 NCBI BLASTp hits (100% coverage, 98%+ identity, and E-value ) seems to be conserved in many other phage genomes, and it shows no coding potential, meaning no gene should be added. /note=Function call: Top two hits from BLASTp from PhagesDB sorted by e-value suggested there is no known function for this protein, and it showed perfect query coverage and % identity. Top two hits from BLASTp from NCBI also sorted by e-value suggested there is no known function for this protein, and it showed perfect or nearly perfect query coverage and % identity. In HHpred, all hits are bad calls (green, blue). All e-values were greater than 4, and the greatest probability was 73.23%, which is below our standard (>80-90). The coverage was also low. This indicates that the protein might be a new protein with no known homologs. 
Moreover, the CDD showed no conserved domains in other genomes. /note=Transmembrane domains: There are no transmembrane domains predicted. As a novel protein that has not been tested in wet lab, it is possible that there is no database yet. HHpred reports similar domains are parts of nuclear receptor coactivator, which also does not include any TMDs, so this is expected. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I agree with this call. As of 1/24/2022 the Pham number has changed to 96713 and the start site has changed to 103. 393 of 476 MA phages call for this site. /note=Secondary Annotator Name: Chang, Julia /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. Note: You can probably shorten the description for the Function Call and touch upon the main points as to why it is NKF. Look at the PECAAN notes template for more concise descriptions. For the transmembrane domains section, also follow the annotation guidelines and use the evidence from databases like TMHMM and TOPCONS to explain why there are no TMDs. CDS 3529 - 3936 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="Clarkson_11" /note=Original Glimmer call @bp 3529 has strength 4.44; Genemark calls start at 3541 /note=SSC: 3529-3936 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_10 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 8.1581E-90 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.65, -3.389881189012514, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_10 [Mycobacterium phage LittleLaf] ],,AYB69820,100.0,8.1581E-90 SIF-HHPRED: SIF-Syn: NKF; upstream gene is NKF (pham 96713), downstream gene is MazG-like nucleotide pyrophosphohydrolase (pham 11065), just like in other cluster S phages like Corazon, LittleLaf, Marvin, and Pringar /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. 
Glimmer lists ATG start site at 3529, whereas GeneMark lists a GTG start site at 3541. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. In the first panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe high coding potential throughout the entirety of the gene sequence, and both chosen start sites capture all the coding potential. /note=SD (Final) Score: Glimmer start site (3529) has best final score (-3.390) and z-score (2.65) on PECAAN. GeneMark start site (3541) also has a really good final score (-4.708) and z-score (2.015) compared to other potential start sites. /note=Gap/overlap: 8bp overlap for Glimmer start site (3529) and 4bp gap for GeneMark start site (3541) with the upstream gene. The gap/overlap size for both potential starts is reasonably small and meets the Guiding Principles of Bacteriophage Genome Annotation. /note=Phamerator: Pham 95431 has 307 members with 294 non-draft members (01/13/2021); gene is conserved and present in other cluster S phages like JoieB and Corazon. /note=Starterator: Start number 100, corresponding to the basepair coordinate 3529, is the most annotated start in Starterator; this site has manual annotations in 232/292 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 3529. /note=Function call: NKF. The top 9 phagesdb BLAST hits (e-values < 10^-68) come from other cluster S phages and have unknown function. The top 4 NCBI BLAST hits (e-values < 10^-45, 98%+ coverage, 94%+ identity, score 244+) also come from cluster S phages and have NKF associated with them. 
In both databases, there are other strong hits for HNH endonuclease proteins; however, they come from cluster A phages, and their respective e-values, query coverage, final scores, and percent identities are significantly lower than the top BLAST hits with NKF, which come from cluster S phages. CDD had no hits. HHpred has no relevant hits (all e-values > 10^-3). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Dr. Freise /note=Secondary Annotator QC: Dr. Freise QC'ed my gene call and reviewed my PECAAN notes and said I could go forward with auto-annotation. /note=Secondary Annotator Name: Chris Charton /note=Secondary Annotator QC: I have QC'ed this gene and agree with location and function call. CDS 3933 - 4214 /gene="12" /product="gp12" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="Clarkson_12" /note=Genemark calls start at 3933 /note=SSC: 3933-4214 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_11 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 3.40472E-58 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -3.769471247018311, yes F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_11 [Mycobacterium phage LittleLaf] ],,AYB69821,100.0,3.40472E-58 SIF-HHPRED: NTP-PPase_iMazG; Nucleoside Triphosphate Pyrophosphohydrolase (EC 3.6.1.8) MazG-like domain found in integron-associated MazG (iMazG) proteins.,,,cd11536,89.2473,99.4 SIF-Syn: MazG-like nucleotide pyrophosphohydrolase, upstream gene is NKF (but the same pham for all phages - 98004), downstream is NKF (but the same pham for all phages - 12851), just like in phage Beelzebub and JoieB. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Genemark was the only one that annotated a start site which was at 3933 (GTG) but there is a TTG site a little bit further at 3837. 
/note=Coding Potential: There is a lot of coding potential in the gene; both the start site chosen by GeneMark and the TTG one are covered by that coding potential. /note=SD (Final) Score: For the start at 3933 called by GeneMark, the SD score is -3.769 which is a good score since it’s fairly high and also the highest among all start sites. This gene is probably part of an operon so the SD score is not very valuable. /note=Gap/overlap: The overlap is -4, meaning that this is probably an operon. This chosen start site makes our gene 282 base pairs long, which is plausible. The 2 start sites with longer ORFs had huge overlaps, Z-scores below 2, and not the highest final score. /note=Phamerator: Gene was found in pham 11065 as of 01/13/2022. This pham mostly contained phages from cluster S (with a few from cluster DE); these S phages included Beelzebub and LittleLaf. The function called for the ones that had a call was “MazG-like nucleotide pyrophosphohydrolase”, and this function is found in the list of approved functions. It was called for all the ones in the DE cluster and only 2 in the S cluster. /note=Starterator: The start site is conserved among the members of this pham. This corresponds to the start site that we had also manually checked. The start site here is start site number 7 and corresponds to basepair coordinate 3933. In this pham, 6/11 call the same site #7 (all 6 phages are in cluster S) /note=Location call: There is good coding potential in this gene and the start site 3933 (called by GeneMark) includes it. This suggests that it is a real gene; in addition, it has a conserved start site within its pham (start site 3933). /note=Function call: The top hits in BLASTp corresponded to genes of either function unknown or MazG-like nucleotide pyrophosphohydrolase. All of these calls had good scores (187) and the e-values were all very low (8e-48). It was decided that the function was not unknown based on the information from CDD and HHpred. 
CDD gave only one output, the NTP-PPase MazG-like domain superfamily, but this helped to sway us toward MazG-like nucleotide pyrophosphohydrolase. Finally, the output of HHpred was split but most of the relevant descriptions mentioned MazG. /note=Transmembrane domains: No transmembrane domains. /note=Secondary Annotator Name: Cho, Emily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator that the start site is 3933. CDS 4264 - 4479 /gene="13" /product="gp13" /function="hypothetical protein" /locus tag="Clarkson_13" /note=Original Glimmer call @bp 4264 has strength 10.7; Genemark calls start at 4264 /note=SSC: 4264-4479 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_12 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 6.29258E-42 GAP: 49 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.7254235724530456, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_12 [Mycobacterium phage LittleLaf] ],,AYB69822,100.0,6.29258E-42 SIF-HHPRED: SIF-Syn: The upstream gene is MazG-like nucleotide pyrophosphohydrolase (pham 11065), downstream gene is uncalled (pham 16522). This is similar to other cluster S phages including Corazon, JoieB, Beelzebub, MosMoris, Poise, Pringar, Raela, Tesla, and RedRaider77 /note=Primary Annotator Name: Enos, Alexander /note=Auto-annotation: Both. Glimmer Start: 4264, Genemark Start: 4264 (same). GTG is the start codon /note=Coding Potential: The gene does have coding potential /note=SD (Final) Score: -2.725 which is the best of all the start sites on PECAAN /note=Gap/overlap: 49. Somewhat large but the smallest of all possible gaps of this gene. Also conserved. /note=Phamerator: Pham 12851 as of 1/17/22. Beelzebub_15 and Blackbeetle_12 show conservation. /note=Starterator: The start number called the most often in the published annotations is 4; it was called in 14 of the 14 non-draft genes in the pham.
Start: 4 @4264 has 14 MA's /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 4264. /note=Function call: NKF. The top three hits on phagesdb BLAST and NCBI BLAST indicated that there was no known function. Each of these had E-values of 9e-34 or less, strongly indicating NKF. CDD and HHpred had no hits. /note=Transmembrane domains: none called by TMHMM or TOPCONS /note=Secondary Annotator Name: Cini, Victoria/ Cho, Emily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator/ I have QC’ed this location call and function call and agree with the first annotator CDS 4508 - 4795 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="Clarkson_14" /note=Original Glimmer call @bp 4508 has strength 6.29; Genemark calls start at 4508 /note=SSC: 4508-4795 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp013 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 1.66845E-62 GAP: 28 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.072, -4.59040903311284, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp013 [Mycobacterium phage Marvin] ],,YP_009614131,100.0,1.66845E-62 SIF-HHPRED: SIF-Syn: The gene upstream is a part of pham 12851 with a stop site at 4479. It is also NKF. The downstream gene is in pham 16399 with a stop site at 5154. It is also NKF. /note=Primary Annotator Name: Fields, Brooke /note=Auto-annotation: Both Glimmer and GeneMark list ATG start site 4508. /note=Coding Potential: There is substantial evidence of coding potential in the Host-GeneMark and Self-GeneMark maps. Both start sites include all coding potential. /note=SD (Final) Score: Glimmer and Genemark both called for gene at start site 4508 (stop@4795 F) RBS Final Score: -4.590 Z-value: 2.072 (higher than 2 is good.) It does have the best final score.
/note=Gap/overlap: There is a gap of 28 bp with the upstream gene /note=Phamerator: Pham 16522 and has 17 members, 3 are drafts. Compared to Beelzebub and JoieB /note=Starterator: The start position is 2, which corresponds to start 4508. It is the most annotated start in Starterator. It received 14 manual annotations from all 14 of the non-draft members. /note=Location call: Based on evidence the most likely start site is 4508 and this is a real gene. /note=Secondary Annotator Name: De Schutter, Elena & Cini, Victoria (module 9) /note=Secondary Annotator QC: Please input whether your chosen start site includes all coding potential (drop down menu). Please expand your notes on Phamerator, are the genes in it part of the same cluster, is there a function called, what pham group is it in? How many are drafts? What you wrote for phamerator is better suited for starterator notes. Please expand on location call by summarizing your evidence (coding potential, conserved start site, etc). I would tentatively say that the start site does seem correct since 14/14 genes from the pham called it. CDS 4804 - 4896 /gene="15" /product="gp15" /function="hypothetical protein" /locus tag="Clarkson_15" /note= /note=SSC: 4804-4896 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_RAELA_14 [Mycobacterium phage Raela] ],,NCBI, q1:s1 100.0% 6.82277E-13 GAP: 8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.448, -5.885962410576086, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_RAELA_14 [Mycobacterium phage Raela] ],,AZS06778,100.0,6.82277E-13 SIF-HHPRED: SIF-Syn: /note=Added gene. CP present on GM-self. Blackbeetle, Huphlepuff, Poise, Raela, RedRaider77 all have one here. (Others do not; Variable genome region within cluster S) /note=Chose longest SS that would result in reasonable gap/overlap.
-AF CDS 4918 - 5154 /gene="16" /product="gp16" /function="hypothetical protein" /locus tag="Clarkson_16" /note=Original Glimmer call @bp 4918 has strength 4.4; Genemark calls start at 4918 /note=SSC: 4918-5154 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JOIEB_15 [Mycobacterium phage JoieB] ],,NCBI, q1:s1 100.0% 1.70274E-48 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.677, -5.347481225442977, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JOIEB_15 [Mycobacterium phage JoieB] ],,QFP94154,100.0,1.70274E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: West, Julie /note=Auto-annotation: Glimmer and Genemark. Both agree on start site of 4918 /note=Coding Potential: Coding potential exists in the ORF in both host and self-trained Genemark. The start site covers all coding potential. /note=SD (Final) Score: -5.347. This is not the best final score on PECAAN, but this start site allows for the largest ORF and smallest gap. /note=Gap/overlap: There is a 122 bp gap with the preceding gene, which could warrant the addition of a gene, but there is no coding potential in this area. Additionally, other phages, such as Corazon, show synteny with respect to this gap. /note=Phamerator: Pham 16399. Several other S phages have genes in this pham. /note=Starterator: Analysis was run 02/01/22. The start at 4918 was called 81% of the time when present and had 12/14 manual annotations. /note=Location call: Altogether, the above evidence indicates this gene starts at 4918. /note=Function call: NKF. BLASTp only gives hits for unknown/hypothetical proteins. No significant hits in CDD or HHpred. /note=Transmembrane domains: No TMD predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: De Schutter, Elena /note=Secondary Annotator QC: I agree with the location call and also the function call. Really the only thing would be adding the date for the pham but great notes for the rest.
I know the upstream gene isn't complete yet but I'm just noting that the synteny box isn't fully filled out yet. Just update that whenever it becomes possible also with possibly a second example of a phage. CDS 5151 - 5441 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="Clarkson_17" /note=Original Glimmer call @bp 5151 has strength 9.46; Genemark calls start at 5151 /note=SSC: 5151-5441 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_16 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 1.68744E-64 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.505, -3.6907860541164537, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_16 [Mycobacterium phage LittleLaf] ],,AYB69825,100.0,1.68744E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hatashita, Anthony /note=Auto-annotation: Both Glimmer and Genemark agreed on the same start site at position 5151. The start codon is ATG. /note=Coding Potential: The gene has substantial coding potential within the putative ORF and the start site covers all of the coding potential. /note=SD (Final) Score: The final score is not the best of the options; however, it is close to the best and is still reasonable to suggest the presence of a credible ribosome binding site. While this is true, the final score is irrelevant for the start call because this gene is likely organized in an operon as there is a 4 base pair overlap between itself and the upstream gene. /note=Gap/overlap: There is an overlap of 4 base pairs with the upstream gene which is very reasonable and likely indicates the presence of an operon. The length of the gene is great (291 base pairs) and other start site options had upstream gaps too large to be plausible. /note=Phamerator: The gene is found in pham 8405 as of 1/12/21. The gene is conserved in other members of the cluster that the phage belongs to. Phages used for comparison are Beelzebub and Blackbeetle.
There is no function called for the gene. /note=Starterator: There is a reasonable start site that is conserved among all members of the pham that the phage belongs to. The start site number that is conserved is start site 1 which corresponds to base pair 5151 in the phage. There are 16 total members of the pham, 2 of which (including Clarkson) are drafts, and every single one calls start site 1. /note=Location call: Altogether, evidence suggests that this is a real gene as it has strong coding potential and is conserved in Phamerator. The potential start site that seems most likely to be the actual start site is the suggested start site of base pair 5151. /note=Function call: The top 3 NCBI BLASTp hits and all PhagesDB hits (8), sorted by E-value, suggest that the function is unknown, with high query coverage (100%), high % identity (>96.8%), and low E-values (<2e-50). /note=Transmembrane domains: There are no TMDs in this protein. /note=Secondary Annotator Name: Fields, Brooke /note=Secondary Annotator QC: I have QC'd this location call and I agree with the primary annotator CDS 5434 - 5700 /gene="18" /product="gp18" /function="hypothetical protein" /locus tag="Clarkson_18" /note=Original Glimmer call @bp 5434 has strength 6.29; Genemark calls start at 5434 /note=SSC: 5434-5700 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp016 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.30891E-55 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.563, -3.859592806742189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp016 [Mycobacterium phage MosMoris] ],,YP_009031526,100.0,1.30891E-55 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hu, Yixiao /note=Auto-annotation: The Glimmer start is 5434 and the GeneMark start is 5434 as well, with the stop codon being at 5700.
It’s in the forward direction and the start codon is TTG /note=Coding Potential: The coding potential is high at both the start site 5434 and the stop codon at site 5700 /note=Phamerator: Pham 16683 on 2022.1.12, all S cluster /note=Starterator: start site 4 with 15/15 manual annotations. /note=Function call: NKF /note=Transmembrane domains: no /note=Secondary Annotator Name: ENOS, ALEXANDER /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 5687 - 5881 /gene="19" /product="gp19" /function="hypothetical protein" /locus tag="Clarkson_19" /note=Original Glimmer call @bp 5687 has strength 15.19; Genemark calls start at 5687 /note=SSC: 5687-5881 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_RAELA_18 [Mycobacterium phage Raela]],,NCBI, q1:s1 100.0% 5.17412E-35 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.921, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_RAELA_18 [Mycobacterium phage Raela]],,AZS06782,98.4375,5.17412E-35 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer and Genemark both call a start site of 5687bp. /note=Coding Potential: Good coding potential is found in both Self- and Host-Trained Genemark in the forward reading frame. The chosen start site includes all of the coding potential for the ORF. /note=SD (Final) Score: The final score is the best option, -2.828. It also has the highest z-score, 2.921. /note=Gap/overlap: The 14 bp overlap with the upstream gene is larger than what is typical. However, this gene is conserved in other phages, such as phage Corazon and Gattaca. /note=Starterator: Start site 1 in Starterator was manually annotated in 14/14 non-draft genes in this Pham. Start site 1 is at position 5687 in Clarkson. This evidence agrees with the sites predicted by Glimmer and GeneMark.
/note=Location call: Based on the evidence, this is a real gene, and most likely starts at 5687 bp. /note=Function call: No known function. The top two phagesdb BLAST hits have the function listed as function unknown in Mycobacteriophage Raela_18 (E-value of 5e-30 and 98% identity) and Mycobacteriophage Marvin_17 (E-value of 1e-29 and 96% identity). The six NCBI BLAST hits are listed as hypothetical proteins (100% coverage, 94%+ identity, and E-value <1e-32). There were no hits given by CDD and the HHpred hits were not significant; the best hits had 79% probability, <34% coverage, and E-values greater than 6. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hatashita, Anthony / Enos, Alexander /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. / I have QC’ed this location call and agree with the first annotator. CDS 5883 - 6023 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="Clarkson_20" /note=Original Glimmer call @bp 5883 has strength 5.77; Genemark calls start at 5883 /note=SSC: 5883-6023 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_19 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 8.2331E-26 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.921, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_19 [Mycobacterium phage LittleLaf] ],,AYB69828,100.0,8.2331E-26 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Glimmer and GeneMark, both calling a start site at position 5883. /note=Coding Potential: Coding potential in this ORF is only on the forward strand, and so this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.828. This is the most favorable score on PECAAN as it is the closest to zero.
/note=Gap/overlap: There is a one nucleotide gap with the gene immediately upstream. This is certainly a reasonable observation and is also more likely than other gap/(large) overlap options. /note=Phamerator: Pham 11728. Date: 01/07/22. It is highly conserved. For example, it is present in both phage Beelzebub and phage Corazon. /note=Starterator: Start site 4 in Starterator was manually annotated in 12 of 12 non-draft genomes. This site agrees with that which was predicted by GeneMark and Glimmer. /note=Location call: Given the above evidence, this is a real gene with a start site at position 5883 and stop site at position 6023. /note=Function call: No known function. A BLAST was executed using the PhagesDB and NCBI databases; however, there were zero BLAST hits that contained a functional call. Further, there were no significant HHpred or CDD hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict a TMD for this gene, and so it is not membrane associated. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 6072 - 6209 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="Clarkson_21" /note=Original Glimmer call @bp 6072 has strength 7.91; Genemark calls start at 6072 /note=SSC: 6072-6209 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp019 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 5.36274E-22 GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.156, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp019 [Mycobacterium phage MosMoris] ],,YP_009031529,100.0,5.36274E-22 SIF-HHPRED: SIF-Syn: NKF, both upstream and downstream genes are NKF. Upstream gene is in pham 11728. Downstream gene is in pham 98353. Pham numbers for upstream and downstream genes are the same in phages Beelzebub and Corazon. /note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the start site of 6072.
/note=Coding Potential: Strong coding potential is found in the self-trained GeneMark but not the host-trained GeneMark. The entire high coding potential region is encompassed by the gene. There is some coding potential on the reverse strand, but it is not as high as the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: -2.276. This is the best final score on PECAAN. /note=Gap/overlap: 48bp. This is a large gap but there are no earlier possible start sites and no coding potential in the gap that might be another gene. This gap is also not large enough for another gene. /note=Phamerator: Pham number 17100 as of 1/12/2022. The gene is conserved in 14 published phages, all of cluster S, including Blackbeetle and Beelzebub. /note=Starterator: 14/14 of the non-draft members in this pham call start site 2. This correlates to position 6072 in Clarkson and is confirmed by Glimmer and GeneMark /note=Location call: This is likely a real gene based on the above information. The start site is 6072, which Glimmer and GeneMark agree on and is confirmed by Starterator. /note=Function call: NKF. All PhagesDB and NCBI Blast hits were for proteins of undefined function. There are no hits in CDD and all HHpred hits have extremely high e-values, with bacterial genes that are unlikely to be present in a phage. /note=Transmembrane domains: None /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator.
CDS 6250 - 6405 /gene="22" /product="gp22" /function="hypothetical protein" /locus tag="Clarkson_22" /note=Original Glimmer call @bp 6250 has strength 6.12; Genemark calls start at 6250 /note=SSC: 6250-6405 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp020 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 1.19657E-23 GAP: 40 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.335, -7.160850644804691, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp020 [Mycobacterium phage Marvin] ],,YP_009614138,100.0,1.19657E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark predicted the same starting sites at 6250. /note=Coding Potential: Via both Glimmer and GeneMark, there is coding potential found in both the maps and there is no violation of the guiding principles /note=SD (Final) Score: Final Score of -7.161 and a Z-score of 1.335 for the start site @6250. It is important to note that compared to Start@6289, Start@6250’s Final Value is the most negative and its Z-score is less than 2. /note=Gap/overlap: There is a 40bp gap with the upstream gene. /note=Phamerator: This gene is in pham 3367 as of the date 1/14/2022. There are 67 members, 6 of which are non-final drafts. While the other members are found in either B, G, S, or V, Clarkson belongs to the S cluster and shares it with phages like Corazon and Gattaca. It is important to note that in Pham 98353 as of 1/21/22, there are 18 members instead. /note=Starterator: as of the date 1/21/22, Start @6250 is Start Number 4 and the site is under the “Most Annotated” and is conserved in 14 of the 15 non-draft genes. /note=Location call: Clarkson 21 gene seems to be a real gene given that Start 15 @6250 covers the coding potential, has a gap of -4, and has 14 MA’s done compared to the other potential Start site; hence, the location call is at Start @6250. 
It is important to note that Start@6289 has a Z-score of 2.094 and a Final Score of -4.545, while Start@6250 has a Z-score of 1.335 and a Final Score of -7.161. Both Glimmer and Genemark agree on Start @6250. /note=Function call: The function is unknown. In PhagesDB, the hits with strong e-values listed no known function. For NCBI BLAST hits, there are strong e-values listed for hypothetical proteins; however, it is important to note that there are some NCBI BLAST hits with an e-value of 8e-06 or higher that lists “Rho termination factor” as a function. For HHpred, the e-values are high and either listed a probability under 80% and/or coverage under 40%. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS. /note=Secondary Annotator Name: Koetters, Owen / Sabrina Juarez /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. / I have QC’ed this location call and agree with the first annotator. CDS 6402 - 6551 /gene="23" /product="gp23" /function="membrane protein" /locus tag="Clarkson_23" /note=Original Glimmer call @bp 6402 has strength 9.51; Genemark calls start at 6402 /note=SSC: 6402-6551 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp021 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 2.46784E-23 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.733, -3.807177799446732, no F: membrane protein SIF-BLAST: ,,[hypothetical protein FDI61_gp021 [Mycobacterium phage Marvin] ],,YP_009614139,100.0,2.46784E-23 SIF-HHPRED: SIF-Syn: The gene downstream of this gene consistently shows synteny as a helix-turn-helix DNA binding domain in other non-draft phages, including Blackbeetle, Beelzebub, Corazon, Gattaca, etc. The gene upstream is NKF, pham 99494. This gene itself is NKF in other non-draft phages, pham 56364. /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site at 6402 bp. 
/note=Coding Potential: Good coding potential in the forward direction is indicated for this ORF on Genemark for both the host- and self-trained algorithms. /note=Gap/overlap: There is an overlap of 4 bp with the preceding gene. The other candidate start sites with better RBS and Z-values, as well as longer ORFs, have much larger overlaps (150+ bp) that make them unlikely candidates. /note=Phamerator: As of 1/19/22, the pham number is 56364. This gene is conserved in 16 other phages, 14 of which are published. All are in cluster S, the same as Clarkson. /note=Starterator: Start site 4 in Starterator was manually annotated in 14/14 non-draft genes in this pham. Start site 4 is at position 6402 bp on Clarkson. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 6402. /note=Functional call: Membrane protein - NKF - In PhagesDB, every hit except for two hits states “function unknown”, and the two hits that call it a tape measure protein have extremely high e-values (6.6), so they are poor evidence of function for this gene. The NCBI blast only had two matches and both were also “hypothetical protein”. CDD returned no hits, and HHpred returned only DUFs or hits with unreasonably large e-values. However, both TMHMM and TOPCONS detected at least one TMD, so this protein can be tentatively identified as a membrane protein for now. /note=Transmembrane domains: 1 /note=Secondary Annotator Name: Li, Shally /note=Secondary Annotator QC: I agree with the location call. There is a possible typo on the Starterator notes. Primary annotator notes that the most called start, start 4, corresponds to position 6305. Start 4 on starterator actually corresponds to position 6402, which is the start site agreed upon by Glimmer and GeneMark and the annotator in location call. I agree with all else. 
[fixed] CDS complement (6548 - 6781) /gene="24" /product="gp24" /function="helix-turn-helix DNA binding domain" /locus tag="Clarkson_24" /note=Original Glimmer call @bp 6715 has strength 8.99; Genemark calls start at 6811 /note=SSC: 6781-6548 CP: yes SCS: both-cs ST: NA BLAST-Start: [helix-turn-helix DNA binding protein [Mycobacterium phage Beelzebub]],,NCBI, q1:s11 100.0% 8.2082E-49 GAP: 74 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.067, -5.4460838369057685, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Mycobacterium phage Beelzebub]],,AZF93292,88.5057,8.2082E-49 SIF-HHPRED: Endothelial differentiation-related factor 1; EDF1, HMBF1alpha, helix-turn-helix, Structural Genomics, NPPSFA, National Project on Protein Structural and Functional Analyses, RIKEN Structural; NMR {Homo sapiens} SCOP: l.1.1.1, a.35.1.12,,,1X57_A,80.5195,99.0 SIF-Syn: Helix-turn-helix DNA binding domain; upstream gene is pham 56364 and downstream gene is pham 5237. Phages Beelzebub and Corazon also show this order of gene function. /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Glimmer and GeneMark called different start sites. Glimmer called the start site at 6715 and the start codon ATG is called here. GeneMark called the start site at 6811 and the start codon ATG is called here. /note=Coding Potential: The ORF has reasonable coding potential, which is found in both self-trained and host-trained GeneMark in the reverse reading frame. Both of the start sites called by Glimmer and GeneMark cover all of the coding potential. /note=SD (Final) Score: Start site at 6715 has both the lowest RBS final score (-7.967) and lowest Z-value (0.946). Start site at 6811 has a better RBS final score (-6.557) and Z-value (1.723) compared to the start site at 6715. However, compared with the other candidate start sites, 6811 is still not that competitive.
Start site at 6781 is better than 6715 and 6811 because its Z-value is 2.067 and RBS Final Score is -5.446. Even though start site at 6781 doesn’t have the highest Z-value and final score, other factors like gene length and gap size make it a very good start site. /note=Gap/overlap: 140 bp gap for start site at 6715. This is reasonable because two genes transcribed in opposite directions require a gap of at least 50 bp. 44 bp gap for start site at 6811; this doesn't meet the 50 bp minimum. 74 bp gap for start site at 6781; this is reasonable because it meets the 50 bp minimum. This gap is conserved in other phages. /note=Phamerator: The pham number as of January 12, 2022 is 21975. The gene is conserved in 16 phages and all of them belong to cluster S. /note=Starterator: There's no conserved start site among the members of the pham. The start number called the most often in the published annotations is 5; it was called in 11 of the 14 non-draft genes in the pham. For Clarkson, it calls for start number 7 at the start site of 6715. The start 5 in Clarkson calls for the start site at 6781. /note=Location call: Considering all of the evidence above, the gene is a real gene, and the start site seems most likely at 6781, because this start site is the one chosen by most people and it has a good balance between the gap size and the final score. Its gap size is 74 bp, which is reasonable because this gene is a reverse gene and requires at least a 50 bp gap. RBS Final score for 6781 is the second-best (-5.446) among all of the other start sites. Moreover, its Z-score is 2.067 which is bigger than 2 and it’s 234 base pairs long. It covers all of the coding potential. All of these make 6781 the best start site for this gene. /note=Function call: The top two phagesdb hits have e-values of 2e-38 and suggest the function is helix-turn-helix DNA binding protein.
Top two NCBI BLASTp hits suggested function is helix-turn-helix DNA binding protein, with high query coverage (100%), high % identity (one is 88.5057% and another one is 98.7013%), and low e-values (2e-48 and 8e-49). CDD hits also suggest the same function as phagesdb and NCBI with low e-value (2.94e-09). One of the most significant hits from HHpred suggests the function is helix-turn-helix DNA binding protein with e-value of 6.3e-8, 98.96% probability, and 80.5159% coverage. All of these hits suggest that the function of this gene is helix-turn-helix DNA binding protein. /note=Transmembrane domains: None. There's no transmembrane domain shown on TMHMM or TOPCONS; both of them called 0 TMDs. /note=Secondary Annotator Name: Likwong, Chloe; Li, Shally /note=Secondary Annotator QC: The chosen start site is accurate and it reflects the guiding principles of annotation. Start@6781 seems to be the most likely start site, given its RBS Final Score and the data acquired from Starterator. I agree with the function call. CDS 6856 - 7233 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="Clarkson_25" /note=Original Glimmer call @bp 6856 has strength 8.45; Genemark calls start at 6886 /note=SSC: 6856-7233 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_24 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 1.63377E-83 GAP: 74 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.39, -3.867841122493862, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_24 [Mycobacterium phage LittleLaf] ],,AYB69832,100.0,1.63377E-83 SIF-HHPRED: SIF-Syn: NKF, but is also NKF in other members of pham 5237. Upstream gene is a helix-turn-helix DNA binding protein, just like in phages Pringar and Tesla. The downstream gene is NKF, just like in other members of pham 19355. /note=Primary Annotator Name: Maraziti, Gabriela /note=Auto-annotation: Glimmer and Genemark. Glimmer calls 6856, but Genemark calls 6886.
/note=Coding Potential: The gene has reasonable coding potential within the ORF, and the chosen start encapsulates all typical and atypical coding potential. /note=SD (Final) Score: The final score is the best option at -3.868. The z-score is the second highest at 2.39. /note=Gap/overlap: 74 bp; this is a relatively large gap but ultimately reasonable with strong support. Both this gene and the gap are conserved in several other phages, including LittleLaf and Blackbeetle. /note=Phamerator: pham 5237 as of 1/18/22. This gene is conserved and found in phages such as Corazon and Gattaca. There is no function called for this gene. /note=Starterator: 13/14 non-draft genomes call the most conserved start, number 2. Start 2 corresponds to position 6856 on the Clarkson phage, is the auto-annotated start site, and agrees with the site predicted by Glimmer. /note=Location call: The evidence suggests that this gene is real and has a start site at 6856 bp. /note=Function call: NKF; phagesdb BLAST, HHpred and CDD did not return any informative hits. /note=Transmembrane domains: Not a membrane protein; no hits predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. /note=Tertiary Annotator: Likwong, Chloe /note=Tertiary Annotator QC: I agree with the findings of the primary and secondary annotator.
CDS 7217 - 7768 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="Clarkson_26" /note=Original Glimmer call @bp 7217 has strength 10.95; Genemark calls start at 7217 /note=SSC: 7217-7768 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BEELZEBUB_29 [Mycobacterium phage Beelzebub] ],,NCBI, q1:s1 100.0% 2.97233E-127 GAP: -17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.311, -2.0162541296952132, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEELZEBUB_29 [Mycobacterium phage Beelzebub] ],,AZF93294,100.0,2.97233E-127 SIF-HHPRED: SIF-Syn: This gene is NKF in pham 19355. Upstream gene NKF, pham 5237. Downstream gene NKF, pham 21399. /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 7217. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF encompasses the whole proposed gene, from 7217 bp to 7768 bp. /note=SD (Final) Score: The start site at 7217 has the best final score on PECAAN of -2.016 along with the highest Z-score of 3.311. /note=Gap/overlap: There is an overlap of 17 bp; however, it is under the 30 bp limit and can be accepted. /note=Phamerator: The pham number as of 01/12/2022 is 19355. The gene is conserved; it is found in phages Corazon, Blackbeetle, and Beelzebub, which are also included in the same cluster as Clarkson. /note=Starterator: Start site 2 is called in 14/14 non-draft members of the Pham. Start site two is located at 7217 bp as called by Glimmer and GeneMark. /note=Location call: Considering the evidence, this is a real gene with the start site of 7217 bp. /note=Function call: There is no known function for this gene. All evidence was inconclusive. There were no hits in HHpred, CDD or BLASTp that could help conclude the function. /note=Transmembrane domains: None were found in TMHMM.
/note=Secondary Annotator Name: Mao, Xuanting /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. One correction: -17 in the gap column does not mean a 17 bp gap; it means a 17 bp overlap. This does not affect the first annotator's conclusion, because an overlap of up to about 30 bp is legitimate. CDS 7765 - 8046 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="Clarkson_27" /note=Original Glimmer call @bp 7765 has strength 7.77; Genemark calls start at 7771 /note=SSC: 7765-8046 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FDI61_gp025 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 4.89156E-58 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.159, -5.174867876921511, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp025 [Mycobacterium phage Marvin] ],,YP_009614143,100.0,4.89156E-58 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: Glimmer and GeneMark. Both call the gene, but at different start sites (7765 and 7771, respectively). However, both sites use the same start codon, TTG. /note=Coding Potential: Only the ORF on the forward strand has reasonable coding potential, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -5.175. It is not the best final score, but this preferred start site has a favorable score among all the potential start codons. /note=Gap/overlap: 4 bp overlap. However, because the overlap is TTGA rather than ATGA, this gene might not be part of an operon. This overlap is also conserved in other phages. /note=Phamerator: Pham: 21399. Date: 1/12/2022. It is conserved and found in Marvin and Tesla. /note=Starterator: Start site 3 in Starterator was manually annotated in 14/14 non-draft genes in this pham. Start 3 is 7765 in Clarkson. This evidence agrees with the start site predicted by Glimmer. 
/note=Location call: Based on the data presented above, this is a real gene with 7765 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits all have unknown functions (E-value 3e-48), and 5 out of 5 NCBI BLASTp hits (100% coverage, 98%+ identity, and E-value 99%), and low E-values (much less than 0). HHPRED and CDD support this call, with the closest matches for sequence similarity pointing towards DNA methyltransferase (matches being from the superfamilies S-adenosyl-L-methionine-dependent methyltransferases and AdoMet_MTases). /note=Transmembrane domains: No evidence to suggest a transmembrane domain. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. (See comments in spreadsheet) CDS 10423 - 10857 /gene="34" /product="gp34" /function="terminase, small subunit" /locus tag="Clarkson_34" /note=Original Glimmer call @bp 10423 has strength 8.37; Genemark calls start at 10504 /note=SSC: 10423-10857 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_GATTACA_32 [Mycobacterium phage Gattaca] ],,NCBI, q1:s1 100.0% 8.94825E-100 GAP: 24 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.965, -7.4801240167582215, no F: terminase, small subunit SIF-BLAST: ,,[hypothetical protein SEA_GATTACA_32 [Mycobacterium phage Gattaca] ],,ANM46255,100.0,8.94825E-100 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,51.3889,98.8 SIF-Syn: terminase, small subunit, upstream DNA methylase, downstream NKF, just like in Huphlepuff. /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: The gene was called by both Glimmer and GeneMark, but the two calls do not start at the same site. The preferred program, Glimmer, has a start site at 10423 and calls ATG. GeneMark has a start site at 10504 and calls TTG. 
/note=Coding Potential: Yes, this gene has reasonable coding potential predicted within the first RF forward. This chosen start site does cover all this coding potential in both the Host and Self GeneMark. /note=SD (Final) Score: -7.480; this is the third lowest final score present. The Z score for this start is 0.965, which is fairly low for a Z score. /note=Gap/overlap: The Glimmer start leaves a gap of 24 bp, which appears to be conserved in the Pham Maps of other phages, including Beelzebub, Blackbeetle, and VazuNzinga. /note=Phamerator: Pham 95633; 01/12/22. Pham 95633 is present in other members of cluster S. I used Beelzebub, Corazon and 11 others from cluster S. None of the other similar members had a function called. /note=Starterator: This is a reasonable conserved start site. The start site number is 42, and the coordinate base pair number is 10423. While it is not the most annotated gene, it has 7 manual draft annotations, is called 71% of the time when present, and has 159 members in its pham. This is good evidence that start site 42 with bp number 10423 is the correct start site. /note=Location call: The overall evidence shows that this gene is most likely real at start 42 and bp number 10423 due to high coding potential in its first RF forward, a conserved gap in other cluster S phages, and evidence from Starterator. The Glimmer start site of 10423 is the most likely start site, as it gives the longest ORF and the smallest gap. /note=Function call: The top 5 NCBI hits sorted by e-value had low e-values and high % identity. For example, phage Gattaca has a 100% identity and an e-value of 7.76149e-100. This points to the fact that this gene is most likely a terminase small subunit. The most related protein on PhagesDB had a 51% match, also as a terminase small subunit. 
While there is no data available from CDD, this is the most likely function I can determine since the top two hits from HHpred were terminase, small subunits. Module 8 also shows evidence of terminase, small subunit. Thus, this gene is most closely related to terminase, small subunit. /note=Transmembrane domains: No TMDs are predicted by TOPCONS or TMHMM, which makes sense because the hypothesized protein function is a terminase, small subunit. This protein would package the DNA through the portal vertex, a process that would probably take place in the cytoplasm. /note=Secondary Annotator Name: Patel, Sahaj; Rishi Patel /note=Secondary Annotator QC: I have QC’ed this location call and agree with this annotation. However, much more detail can be given in "Coding Potential"; you didn't talk about the details of the Host or Self GeneMark. In "SD (Final) Score", you never stated whether that score was the best of all the Final Scores present. Under "Gap/Overlap", you could mention whether or not that gap is conserved amongst other phages to justify its existence, as well as the possibility that an overlap might indicate that the gene is part of an operon when it is -4 or less. Under "Starterator", you didn't mention whether the evidence agrees with Glimmer/GeneMark, or how many draft genes were manually annotated (to clarify, you did mention how many draft genes there are, but not whether or not they were manually annotated). 
CDS 10858 - 11115 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="Clarkson_35" /note=Original Glimmer call @bp 10858 has strength 4.86; Genemark calls start at 10858 /note=SSC: 10858-11115 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TESLA_33 [Mycobacterium phage Tesla] ],,NCBI, q1:s1 100.0% 2.24033E-53 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.64, -3.348261428751315, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TESLA_33 [Mycobacterium phage Tesla] ],,AVE00779,100.0,2.24033E-53 SIF-HHPRED: SIF-Syn: NKF, upstream gene is terminase (small subunit), downstream gene is terminase (large subunit) just as in Beelzebub. /note=Primary Annotator Name: Pham, Britney /note=Auto-annotation: Glimmer and GeneMark, Start site 10858, Start codon TTG /note=Coding Potential: Yes, but does not cover all of the coding potential /note=SD (Final) Score: The gene has a final score of -3.348 and a Z-score of 2.64, which are the best of all the start sites. /note=Gap/overlap: There is a gap of 0, which means that this gene is the longest reasonable ORF with a sufficient sequence length. /note=Phamerator: Gene is found in pham number 10858 as of 1/13/2022. It appears to be conserved (commonly annotated) in Cluster S, apparently for the entire pham. I looked further into comparisons for Beelzebub and Blackbeetle. Phamerator did not have a function called. /note=Starterator: There is a reasonable start site choice that is conserved among members of the pham to which my gene belongs. Start site number 1 corresponds to 10858 for Clarkson. There are 14 other members in this pham and 14 non-draft phages call site 1 as the start site. /note=Location call: The gene seems to start at 10858 and stop at 11115. This gene is most likely real. /note=Function call: Unknown. In both NCBI and phagesDB the function of the gene is undefined, although there have been strong hits. 
The function frequency table is empty, meaning that no one has called the gene's function yet. CDD had no hits called, phagesDB said the ORF was of unknown function, and HHpred had too large an e-value to consider the functions. So overall, the function is deemed unknown. /note=Transmembrane domains: From TMHMM and TOPCONS, there are no hits for any amino acids spanning the membrane. This corresponds with the sequence's lack of a called function, as the sequence cannot signal transduce. /note=Secondary Annotator Name: Pay, Iona /note=Secondary Annotator QC: I agree with this annotation -- all of the evidence categories have been considered. CDS 11120 - 12862 /gene="36" /product="gp36" /function="terminase, large subunit" /locus tag="Clarkson_36" /note=Original Glimmer call @bp 11120 has strength 15.03; Genemark calls start at 11120 /note=SSC: 11120-12862 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 0.0 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.495, -3.712290173411778, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Mycobacterium phage MosMoris] ],,YP_009031543,100.0,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,95.5172,100.0 SIF-Syn: Terminase, large subunit, the upstream gene is NKF, downstream is portal protein, similarly found in phages Tesla and Poise. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: Both GeneMark (Self and Host) and Glimmer call the same start site, 11,120, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential is in an ORF with a forward direction (at the second frame). The coding potential includes all possible start sites and is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.712. It is the third-best Final Score on PECAAN. 
/note=Gap/overlap: There is a reasonably small gap of 4 bp between the upstream gene and the start site. This gap is conserved in other phages such as Poise and there is no coding potential in the gap between the upstream gene and the potential start site. This start site gives the longest ORF. /note=Phamerator: As of 1/13/2022 the gene is found in Pham 91114. It is conserved in cluster S (Poise and Beelzebub) but also found in other clusters such as cluster A (20ES and 40AC). /note=Starterator: There are 935 non-draft genes in this pham. Start site 139 is conserved in manually annotated 47/935 non-draft genes. Start site 139 has a position of 11120 bp in Clarkson. The start site agrees with both Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 11120 bp. Starterator agrees with both Glimmer and GeneMark. /note=Function call: Terminase, large subunit. The top 2 matches from PhagesDB BLASTp both have the function terminase, large subunit. Both have high % identity (100%), low E values (0), and high query coverage. Similarly, the 2 hits from NCBI BLASTp call the same function of terminase, large subunit, with low E values (<2e-164), reasonable % identity (>46.30%), and high query coverage. CDD did not give any matches/hits. The top 3 matches from HHpred all have high probability (100), high % coverage (>94.1379%), and low E-values (<4.8e-31). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, so it is not a transmembrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: I have QC'ed the above annotation and believe that 11,120 bp is the correct start site. There is good coding potential evidence given, Starterator agrees with Glimmer and GeneMark, and the gap is only 4 bp. However, the annotation did not mention that there are manually edited draft genomes present (I believe 83?) 
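The GAP values reported in the SSC notes of this record can be reproduced from the CDS coordinates alone. The following is a minimal sketch, not part of PECAAN or DNA Master: the helper name `gap_bp` is hypothetical, and it assumes forward-strand genes with 1-based inclusive coordinates, so that a negative result is read as an overlap.

```python
# Hypothetical helper (not a PECAAN function): distance between adjacent forward genes.
def gap_bp(upstream_stop: int, start: int) -> int:
    """Return the number of bases strictly between two genes; negative means overlap."""
    return start - upstream_stop - 1

# (gene number, start, stop) copied from the CDS lines of this record.
cds = [
    (35, 10858, 11115),
    (36, 11120, 12862),
    (37, 12859, 14424),
    (38, 14421, 15065),
    (39, 15195, 15827),
]

# Pair each gene with the gene immediately upstream of it.
gaps = {cur[0]: gap_bp(prev[2], cur[1]) for prev, cur in zip(cds, cds[1:])}

for gene, gap in gaps.items():
    kind = "overlap" if gap < 0 else "gap"
    print(f"gp{gene}: {abs(gap)} bp {kind}")
```

Under these assumptions the results agree with the GAP fields of the corresponding SSC notes (4, -4, -4, and 129 bp for genes 36 through 39).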
CDS 12859 - 14424 /gene="37" /product="gp37" /function="portal protein" /locus tag="Clarkson_37" /note=Original Glimmer call @bp 12859 has strength 13.35; Genemark calls start at 12859 /note=SSC: 12859-14424 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.64, -4.255672789525901, no F: portal protein SIF-BLAST: ,,[portal protein [Mycobacterium phage MosMoris] ],,YP_009031544,100.0,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_Q,82.9175,100.0 SIF-Syn: Portal protein, upstream gene is terminase, downstream is MuF-like minor capsid protein. Similar to phage Beelzebub and Tesla. /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation start source: Both Glimmer and GeneMark. They both agree on the same start site at 12859. The start codon is ATG. /note=Coding Potential: Coding potential in the ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -4.256 and the z score is the highest at 2.64. /note=Gap/overlap: -4 bp. This is reasonable and evidence that the gene is part of an operon. This is also conserved in other phages as well, such as LittleLaf and Raela. /note=Phamerator: Pham: 95678. Date 1/12/2022. It is conserved; found in Corazon (S) and Raela (S). /note=Starterator: Start site 69 in Starterator was manually annotated in 278/1356 non-draft genes in this pham. This is the most annotated gene, but Start 40 is 12859 in Clarkson. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 12859. /note=Function call: Portal protein. 
The top 5 phagesdb BLAST hits have the function of portal protein (E-value of 0.0), and the top 4 NCBI BLAST hits also have the function of portal protein. The first NCBI BLAST hit has 100% coverage and an E-value of 0. The next couple of hits go down in coverage (averaging ~50% coverage) but still name the protein function as portal protein. HHpred had a hit for portal protein with 100% probability, 82.9175% coverage, and E-value of 2.8e-33. CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains. This makes sense because this is a portal protein, which helps the phage transfer genomic content and does not function as a transmembrane protein. /note=Secondary Annotator Name: Pham, Britney /note=Secondary Annotator QC: The GeneMark and Glimmer start sites are the same. The Starterator Map indicated a most common start site, while Phamerator indicates that this gene is conserved in other genomes too. The start site also has the best Z-value and Final Score and no large overlaps. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: All sections of PECAAN have been filled out. For evidence boxes, most have been checked off. However, you compared your synteny to Beelzebub, and it is not checked on phagesDB blast. Is there a reason for this? If it is similar enough, I would say to check it off for evidence since you are using it as a comparison. All drop down menus have been filled out and checked off. Also, maybe explain why having no TM domains would make sense with your gene. 
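The rounded scores quoted in the annotator notes can be cross-checked against the machine-generated SSC summary note. The sketch below is illustrative only: the function name `parse_summary` and the regular expression are hypothetical, the BLAST-Start field is omitted from the sample string for brevity, and the order of the two numeric RBS fields (Z-score, then final score) is inferred from how the notes in this record quote them. The meaning of the trailing yes/no flag is not assumed.

```python
import re

# Hypothetical pattern for the summary note; the field order is inferred from this record.
SUMMARY = re.compile(
    r"SSC: (?P<start>\d+)-(?P<stop>\d+)"
    r".*?GAP: (?P<gap>-?\d+) bp gap"
    r".*?RBS: [^,]+, [^,]+, (?P<z>-?\d+(?:\.\d+)?), (?P<final>-?\d+(?:\.\d+)?), (?P<flag>yes|no)"
)

def parse_summary(note: str) -> dict:
    """Extract coordinates, gap, Z-score and final score from one summary note."""
    m = SUMMARY.search(note)
    if m is None:
        raise ValueError("summary note does not match the assumed layout")
    return {
        "start": int(m["start"]),
        "stop": int(m["stop"]),
        "gap": int(m["gap"]),
        "z": float(m["z"]),
        "final": float(m["final"]),
        "flag": m["flag"],
    }

# gp37 summary note from this record, with the BLAST-Start field left out.
note = ("SSC: 12859-14424 CP: yes SCS: both ST: SS GAP: -4 bp gap LO: yes "
        "RBS: Kibler 6, Karlin Medium, 2.64, -4.255672789525901, no F: portal protein")

fields = parse_summary(note)
print(fields["z"], round(fields["final"], 3))
```

For gp37 the parsed values round to the Z-score of 2.64 and final score of -4.256 quoted in the SD (Final) Score note.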
CDS 14421 - 15065 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="Clarkson_38" /note=Original Glimmer call @bp 14421 has strength 9.57; Genemark calls start at 14421 /note=SSC: 14421-15065 CP: yes SCS: both ST: SS BLAST-Start: [MuF-like minor capsid protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 3.03547E-156 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.222, -4.278576812492247, no F: hypothetical protein SIF-BLAST: ,,[MuF-like minor capsid protein [Mycobacterium phage MosMoris] ],,YP_009031545,100.0,3.03547E-156 SIF-HHPRED: SIF-Syn: NKF, upstream gene is portal protein, downstream is scaffolding protein, just like in the phages LittleLaf and Corazon. /note=Primary Annotator Name: Rodriguez, Sean /note=Auto-annotation: Glimmer and GeneMark. Both of them agree on start site at 14421 bp. /note=Coding Potential: Coding potential found in both GeneMark Self and Host, with Self having coding potential over a wider range than Host. Putative ORF has coding potential in the forward strand only, indicating that it is a forward gene. The chosen start site includes all coding potential. /note=SD (Final) Score: The final score is the third-best option at -4.279 and the z score is fourth-best at 2.222. The RBS score is irrelevant for the start call given the overlap information. /note=Gap/overlap: The putative ORF overlaps the upstream gene by 4 bp, suggesting that this gene is part of an operon. The gap between this gene and the downstream gene is a little large at 129 bp, but the gene is conserved in several other phages and the gap is seen in other phages such as MosMoris and Gattaca with the exact same length. The gene with the autoannotated start site is 645 bp long, which is acceptable. /note=Phamerator: The pham number is 8316 as of January 12, 2022. The gene is conserved in many other Cluster S phages, including Corazon and Gattaca. 
The function call for the gene is a MuF-like minor capsid protein, which is consistent between Phamerator and the Phams database. The gene is on the SEA-PHAGES list as a hypothetical protein. /note=Starterator: There are 14 non-draft members in this pham, and 14/14 call the most conserved site (site #1). This site corresponds to a start site of 14421 bp for Clarkson. /note=Location call: Considering the above evidence, this is a real gene that has a start site at 14421 bp. /note=Function call: MuF-like minor capsid protein. 5 out of the top 6 phagesdb BLAST hits have this function (all e values < e-124), with all of these matches showing 100% identity. 3 out of the top 4 NCBI BLAST hits also call the function of MuF-like minor capsid protein (100% Query Coverage, 99%+ identity, and e value < 2e-151). HHpred had a hit for MuF-like protein with 96.9% probability, 33% coverage, and e value of 0.0058. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Pramana, Martin and Pham, Britney /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. Both GeneMark and Glimmer called the same start site of 14421 bp, which includes all coding potential. This gene contains the most annotated start site, which is start site 1. It has 14 manual annotations of 14 non-draft genomes. The gene has an overlap of 4 bp, which is favorable. Although the Z score and SD score are not the best, both are still reasonable to support the called start site. I have QC'ed this location call and agree with the first and second annotators. Both GeneMark and Glimmer called the same start site of 14421, which includes all the coding potential. The start site is the most annotated and was called in 14 manual annotations of 14 non-draft genomes. The gene has a favorable overlap, and although the Z and SD scores are not the best, they are both reasonable. 
The function call seems accurate based on PhagesDB and HHpred, and it is consistent with the lack of transmembrane domains. CDS 15195 - 15827 /gene="39" /product="gp39" /function="scaffolding protein" /locus tag="Clarkson_39" /note=Original Glimmer call @bp 15195 has strength 8.26; Genemark calls start at 15195 /note=SSC: 15195-15827 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 2.47769E-146 GAP: 129 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.039, -4.595786194553911, no F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Mycobacterium phage MosMoris] ],,YP_009031546,100.0,2.47769E-146 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_f,58.0952,99.2 SIF-Syn: Scaffolding protein; gene upstream is NKF and gene downstream is a capsid decoration protein, as seen in phages Beelzebub and Corazon /note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 15195. /note=Coding Potential: For both GeneMark Host and Self-Trained the coding potential in this ORF is mainly on the forward strand, indicating that this is a forward gene. It accurately shows an upward hash around 15195 and ends with a downward hash at 15827. The typical and atypical coding potential also spans the region between the start and end sites. This gene has good coding potential. /note=SD (Final) Score: The final score is -4.596. This is not the best final score on PECAAN; it is only the fourth best option. The z score, 2.039, is also the fourth best. /note=Gap/overlap: There is a 129 bp gap, which is large, but it is conserved in other phages such as Beelzebub and Corazon and there is no coding potential in the gap that may be a new gene. /note=Phamerator: pham: 13679. Date 1/13/2021. It is conserved and found in Beelzebub, JoieB, Tesla, and LittleLaf. 
They are all in cluster S and are 633 bp long based on the phagesdb pham page. /note=Starterator: Start site number is 1, which correlates to start site 15195 bp for Clarkson. It was called in 14 of the 14 non-draft genes in the pham. /note=Location call: Based on the above evidence, this may not be a real gene. More evidence is needed. The gap is large and it does not have the best final and Z score. /note=Function call: Function is scaffolding protein. Top 7 of the phagesdb BLAST hits have the function of scaffolding protein (E value e-116, 100% identity, 100% positives). For NCBI BLAST, the top hit had the same function and good e value (3e-146), 100% identity and 100% coverage. CDD was not helpful in determining function and HHpred top hit agreed on scaffolding protein and had a decent e value (3.2 e-10). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Reyes, Glania; Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. The GeneMark and Glimmer start sites are the same. The Starterator Map indicated a most common start site, and Phamerator indicates that this gene is conserved in cluster S. The start site does not have the best Z-value or Final Score, and there are no large overlaps. This is possibly not a real gene, and we need stronger evidence. However, the Starterator data shows that this gene has the most annotated start, which could trump all other evidence. /note=I have QC'ed this gene and agree with the primary annotator. Both GeneMark and Glimmer call the same start site of 15195 bp. This gene belongs to pham 13679 and has the most annotated start site of 1 conserved in the other 14 non-draft genes. There is a conserved gap upstream of this gene that is found in other phages such as Beelzebub and Corazon. 
Although the E-values for the start site are the second from the lowest among possible start sites, Starterator does confirm that it is the most manually annotated start site. PhagesDB BLAST and NCBI BLAST both provide evidence for this gene as a scaffolding protein with reasonably low e-values. CDD did not have any relevant hits. HHpred did get one hit with a low e value of 4.3e-9 that states scaffolding protein as the function of this gene. TMHMM and TOPCONS did not provide any evidence that this gene has a transmembrane domain. However, I think that the primary annotator should mention the names of phages in which the gene is conserved in the gap/overlap and Phamerator sections. I also notice there is a slight typo in the synteny box for phage Beelzebub. CDS 15831 - 16235 /gene="40" /product="gp40" /function="capsid decoration protein" /locus tag="Clarkson_40" /note=Original Glimmer call @bp 15831 has strength 10.03; Genemark calls start at 15831 /note=SSC: 15831-16235 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp037 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 8.00324E-94 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.076, -2.442961286954254, yes F: capsid decoration protein SIF-BLAST: ,,[hypothetical protein FH33_gp037 [Mycobacterium phage MosMoris] ],,YP_009031547,100.0,8.00324E-94 SIF-HHPRED: HDPD ; Bacteriophage lambda head decoration protein D,,,PF02924.17,75.3731,99.4 SIF-Syn: capsid decoration protein, upstream gene is scaffolding protein, downstream is major capsid protein, just like in phage Corazon /note=Primary Annotator Name: Saha, Atul /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 15831 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is best at -2.443 and the z score is the highest at 3.076. 
/note=Gap/overlap: The gap is 3 base pairs, which is realistic. There is no coding potential in the gap, and the ORF includes all coding potential. /note=Phamerator: pham: 94496. Date 1/13/22. It is conserved; found in Beelzebub and Blackbeetle. /note=Starterator: 96/116 members of this pham are non-draft members. 86/90 non-draft members call start site 4, which correlates to a start site of 15831 bp for Clarkson. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 15831 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Clarkson_38 has the function of “capsid decoration”. Both the phagesDB and NCBI BlastP searches yielded highly matched alignments with low e values (phagesDB E < e-73), supporting the idea that the gene is conserved across multiple phages as a capsid decoration protein. PhagesDB showed that in Cluster S, which includes Clarkson, multiple other phages such as Tesla and Gattaca had 100% matches with the Clarkson sequence. Across clusters, 9/10 of the top function hits on PhagesDB BLAST (combining for 98% of all hits) correspond to capsid decoration proteins. NCBI BlastP supported these findings. HHpred shows that there are multiple hits that share the head decoration function. One such hit, 1TD4_A, had a 99.4% probability, 79.9% coverage, and an E value of 1.9e-11. /note=Transmembrane domains: TMHMM predicts 0 transmembrane domains. /note=Secondary Annotator Name: Rodriguez, Sean; Reyes, Glania /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. Suggestions: Add to "Coding potential" whether or not the start site covers all of the coding potential. Select options for the coding potential and Starterator drop-down menus. Mention the length of the putative ORF in the "gap/overlap" section. Mention the function call within the pham. 
This new pham information slightly changes the Starterator data although the overall conclusions would be the same. CDS 16247 - 17317 /gene="41" /product="gp41" /function="major capsid protein" /locus tag="Clarkson_41" /note=Original Glimmer call @bp 16247 has strength 16.64; Genemark calls start at 16247 /note=SSC: 16247-17317 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 0.0 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.907, -2.8564937087383147, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Mycobacterium phage MosMoris] ],,YP_009031548,100.0,0.0 SIF-HHPRED: YSD1_17 major capsid protein; capsid protein, VIRAL PROTEIN; 2.6A {Bacteriophage sp.},,,6XGP_B,97.191,100.0 SIF-Syn: Major capsid protein. Upstream gene is capsid decoration protein, downstream gene is head-to-tail adaptor, like in phages Tesla and Raela, both in Cluster S with Clarkson. /note=Primary Annotator Name: Scavetti, Alexa /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 16247 with start codon ATG. /note=Coding Potential: Yes. Both Self- and Host-Trained GeneMark show coding potential in the forward direction in the region from 16247 bp to 17317 bp. The chosen start site of 16247 does cover all of the coding potential for this region. /note=SD (Final) Score: -2.856 with a Z-value of 2.907. This is the best possible SD score and Z-value according to PECAAN. /note=Gap/overlap: There is a reasonable 11 bp gap between this gene and the upstream gene, creating the longest possible ORF and a reasonable gene length. /note=Phamerator: Pham number 56987 as of 1/13/2022. Conserved in other S cluster phages, including Corazon and Blackbeetle. Function call is major capsid protein, which is consistent between Phamerator and the phams database and is included on the approved SEA-PHAGES list. 
/note=Starterator: Start site 4 is conserved among 56/100 non-draft genomes in the pham; however, start site 1 is conserved among 13/14 non-draft genomes in same cluster as Clarkson (S). Start site 1 corresponds to 16247 in Clarkson and was indicated by GeneMark and Glimmer. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 16247. Starterator agrees with Glimmer and GeneMark for S cluster phages. /note=Function call: The top two hits on both PhagesDB and NCBI BLASTp, sorted by e-value, suggested function is major capsid protein, with high query coverage (100%), high % identity (>99%), and low e-values (0.0). Both CDD and HHpred also called major capsid protein, with high HHpred probability (100%), high coverage (>99%), and low e-values (<10^-37). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. This agrees with the functional call of major capsid protein, which is not associated with the membrane. /note=Secondary Annotator Name: Ruiz, Paola, Rodriguez, Sean /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=For Phamerator, mention the programs that the function call is consistent between. Fill out the transmembrane domains section even though it is uninformative. Mention 2-3 of the informative HHPRED hits. Please mention the CDD entry as the probability is low but the hit still makes sense for the function of the gene. Fill out the synteny box and function window. Check the evidence boxes for HHpred and CDD. 
CDS 17398 - 17823 /gene="42" /product="gp42" /function="head-to-tail adaptor" /locus tag="Clarkson_42" /note=Original Glimmer call @bp 17398 has strength 11.75; Genemark calls start at 17398 /note=SSC: 17398-17823 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Mycobacterium phage Pringar]],,NCBI, q1:s1 100.0% 3.69107E-96 GAP: 80 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.416, -3.8137921535956054, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Mycobacterium phage Pringar]],,QFP96904,99.2908,3.69107E-96 SIF-HHPRED: 15 PROTEIN; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_C,85.1064,98.1 SIF-Syn: Head to tail adaptor, upstream gene is a major capsid protein and downstream gene is a head to tail stopper, just like in phage Corazon which is in the same cluster. /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Glimmer and GeneMark, both at start site 17398 /note=Coding Potential: There is reasonable coding potential within the putative ORF, and the chosen start site covers all this coding potential. /note=SD (Final) Score: -3.814, the best final score on PECAAN /note=Gap/overlap: 80 bp, second lowest possible gap on PECAAN. Large but reasonable as this is preserved in synteny and no significant coding potential is seen in that region. /note=Phamerator: The pham as of 01/12/22 is 64323. A lot of other phages in this cluster have this pham present, as seen in Beelzebub_45, Marvin_39, and Pringar_41. The function called in Phamerator is head to tail adaptor. /note=Starterator: Start site 2 is conserved across many manually annotated genomes in the pham, and represents the start site at bp 17398. 13/14 non-draft genes call this start site, which is strong evidence. 
/note=Location call: Based on evidence the gene is real and has a start site of 17398, which covers all coding potential and has the most reasonable gap and final score, as well as evidence from phamerator as conserved within a pham. /note=Functional call: Head-to-tail adaptor: The top 5 PhagesDB BLAST hits call the function as a head to tail adaptor with e values of 1e-79 for all of them, constituting strong evidence. In the NCBI BLAST, only two of the top five calls sorted by e value called it as a head to tail adaptor, with e values of 4e-96 and 1e-08, the rest calling function unknown, so the evidence is much weaker with the NCBI BLAST, though still decent. While no CDD hits were found, HHpred found one hit with an e value of 0.00022, 98.14% probability, and 85% coverage that called it as a head to tail connector, and was listed as required evidence for this gene’s function to be a head to tail adaptor. There are no TMDs, which makes sense with this given function. Thus, there is strong evidence for this function. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Saha, Atul; Ruiz, Paola /note=Secondary Annotator QC: no additional comments /note=I have QC’ed this gene and agree with the first and second annotator. 
CDS 17823 - 18194 /gene="43" /product="gp43" /function="head-to-tail stopper" /locus tag="Clarkson_43" /note=Original Glimmer call @bp 17823 has strength 12.21; Genemark calls start at 17877 /note=SSC: 17823-18194 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FH33_gp040 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 5.24007E-84 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.326, -4.352304813682479, no F: head-to-tail stopper SIF-BLAST: ,,[hypothetical protein FH33_gp040 [Mycobacterium phage MosMoris] ],,YP_009031550,100.0,5.24007E-84 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,91.8699,92.8 SIF-Syn: head-to-tail stopper; upstream gene is head-to-tail adaptor; downstream gene has NKF, similar to phage Corazon, where genes 41 and 42 are head-to-tail complex proteins and gene 43 has NKF. /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: Both Glimmer and Genemark call this gene, with the start site for Glimmer at 17823, and the start site for Genemark at 17877 with an ATG start codon. /note=Coding Potential: This gene has reasonable coding potential, which can be seen in the host-trained and self-trained GeneMark. The chosen start site covers the coding potential. /note=SD (Final) Score: -4.352 is the highest final score and is located at start site 17823 /note=Gap/overlap: There is a 1 bp overlap at start site 17823, which is very small and is the smallest overlap out of all of the possible start sites, so it is likely that this is the correct start site. /note=Phamerator: The pham is 83108 as of Jan 12, 2022. Several genes in this cluster also have this pham present. /note=Starterator: Start site 4 is the most highly conserved, with 62/77 non-draft genomes calling this site. 
/note=Location call: It is very likely that this gene starts at 17,823. /note=Function call: head-to-tail stopper /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any transmembrane domains for this gene. /note=Secondary Annotator Name: Scavetti, Alexa /note=Secondary Annotator QC: I agree with the location call given the above evidence. Note: Starterator and Coding Potential menus have not been filled out. I would also suggest adding the start codon to `Auto-annotation,` adding rationale to the `Gap/overlap` section, adding the function call and specific comparison phages to `Phamerator,` and adding the specific base pair corresponding to the start site in `Starterator.` See Lab Manual: PECAAN Notes for more detail. There also appears to be a typo in the base pair listed in the `Location call` section (should be 17823 instead of 17328). CDS 18194 - 18517 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="Clarkson_44" /note=Original Glimmer call @bp 18194 has strength 10.43; Genemark calls start at 18194 /note=SSC: 18194-18517 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp041 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.31423E-72 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.754, -5.188777976577085, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp041 [Mycobacterium phage MosMoris] ],,YP_009031551,100.0,1.31423E-72 SIF-HHPRED: SIF-Syn: My gene is NKF. upstream gene is head to tail stopper, downstream is tail terminator, like in Corazon. /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: I used Glimmer and Genemark to determine if my gene is real. They both agree with each other and the start site number is 18194. The start codon is called ATG. /note=Coding Potential: The gene has very good coding potential predicted within the potential open reading frame. This is seen in the host trained gene mark and the self trained gene mark. 
The start site covers the entire coding potential. Furthermore, since I am looking at the graphs and synteny and see little to no gap or overlap from my genes to neighboring genes, I can say that I believe 18194 would be my start site. There is a decent final and z score as well. I also think that my gene could be part of an operon because there was a -1 overlap instead of a gap. /note=SD (Final) Score: -5.189. This is not the best SD score; however, it gives the least gap between the previous sequence and my sequence. I think the RBS score may be irrelevant for the start call because the overlap and the agreement between Glimmer and Genemark are more convincing. /note=Gap/overlap: -1. This overlap is reasonable because genes in an operon commonly overlap in this way. I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is the LORF. The length of the gene is acceptable being around 400 base pairs long. /note=Phamerator: Pham 20891 found on 01/12/2022. The pham in which my gene is found is conserved in other members of the cluster/subcluster. The phages I used for comparison were Beelzebub_47 and Blackbeetle_43. /note=Starterator: The start site number in this pham that is conserved is 1 and the base pair coordinate that this corresponds to is 18194. 14 members are in this pham and 14/14 call site #1. /note=Location call: I think the best start site is 18194. There is good coding potential and this may be part of an operon. I will be able to determine this when I find out the function of my gene. /note=Function call: PhagesDB and BLAST did not give a good indication of what the function of my gene is. 
CDD did not return any hits, while HHpred did; however, few conclusions could be drawn from either program. /note=Transmembrane domains: No transmembrane domains were predicted /note=Secondary Annotator Name: Shah, Aayushi; Scavetti, Alexa /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. Noticed a typo in starterator notes but otherwise the annotation looks good. All boxes are completed and checked. CDS 18521 - 19027 /gene="45" /product="gp45" /function="tail terminator" /locus tag="Clarkson_45" /note=Original Glimmer call @bp 18521 has strength 6.84; Genemark calls start at 18521 /note=SSC: 18521-19027 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Mycobacterium phage Pringar]],,NCBI, q1:s1 100.0% 1.17266E-119 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.243, -4.234594972733653, no F: tail terminator SIF-BLAST: ,,[tail terminator [Mycobacterium phage Pringar]],,QFP96907,98.8095,1.17266E-119 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,79.1667,98.7 SIF-Syn: Tail terminator, upstream and downstream genes function unknown, just like in phages Beelzebub and Corazon. /note=Primary Annotator Name: Sun, Xingzheng /note=Auto-annotation: Both Glimmer and Genemark called start at 18521. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. The ORF has coding potential on the forward strand, and the chosen start site also includes all of the coding potential. /note=SD (Final) Score: The Final Score is the best at -4.235, and the z-value is above 2. /note=Gap/overlap: Gap of 3. It is a small value and is acceptable. /note=Phamerator: The Pham number is 57340 on the date of 1/12/2022. It is conserved in other phages such as Beelzebub and Blackbeetle that also belong to cluster S. 
/note=Starterator: Start site 7 at 18521 in Starterator was manually annotated in 32 genes in this Pham. It is the same as the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, it is a real gene with the start site most likely to be 18521. /note=Function call: Tail terminator. The PhagesDB BLAST hits have the function of tail terminator at 100% frequency, with a very small e-value of 2e-95. NCBI BLAST also suggests it to be a tail terminator on the second top hit with an e-value of 1e-119. HHpred has the topmost hit corresponding to this result with a probability of 98.75, percent coverage of 79.1667, and e-value of 1.3e-6. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. It is not a membrane protein, which is reasonable given the function call of tail terminator. /note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree with the location call for this gene and in the annotation module notebook. Great work on the detail in your PECAAN notes and rationale behind item. /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: I agree with this annotation and functional call. All of the evidence categories have been considered, including coding potential, gap/overlap, synteny, and final score, and function call evidence is presented. Explain a bit more for starterator. 
CDS 19043 - 19708 /gene="46" /product="gp46" /function="major tail protein" /locus tag="Clarkson_46" /note=Original Glimmer call @bp 19043 has strength 14.72; Genemark calls start at 19043 /note=SSC: 19043-19708 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 5.95379E-161 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.311, -2.033982896655645, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Mycobacterium phage MosMoris] ],,YP_009031553,100.0,5.95379E-161 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_M,83.7104,98.1 SIF-Syn: Major tail protein, upstream gene is tail terminator, downstream is tail assembly chaperone, just like in phage Beelzebub /note=Primary Annotator Name: Taylor, Amaya /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 19043 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host /note=SD (Final) Score: -2.034 - best final score on PECAAN /note=Gap/overlap: 15 - reasonable for a gene /note=Pham: 94448. Date 01/07/22. It is conserved. /note=Starterator: Start site 16 in Starterator was manually annotated in 107 of 197 non-draft genes in this pham. Start 16 is 19043 in Clarkson. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene with its start site at 19043 /note=Function call: There is sufficient evidence to conclude the function of this gene is major tail protein. BLAST results showed a very low E-value of 6e-161 and a high score of 453. HHpred results also showed a very low E-value of 0.00037 and called it a major tail protein. CDD had no relevant hits. /note=Transmembrane domains: No predicted TMDs in TMHMM or TOPCONS. Can conclude there are no transmembrane domains. 
/note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I see that you have updated the starterator and coding potential drop down menu in PECAAN notes but make sure you fill out the menu. Also, I see that the starterator notes are incorrect or mistyped. I found that in starterator, it says "Found in 107 of 197 ( 54.3% ) of genes in pham" and you have typed something else. Otherwise, it looks great! /note= /note=New secondary annotator (module 9): Shaikh, Iman /note=Secondary Annotator QC: I have QC’ed the function call for this gene and I agree with the primary annotator. Based on all of the evidence, I also believe that this gene encodes a major tail protein. CDS 19842 - 20414 /gene="47" /product="gp47" /function="tail assembly chaperone" /locus tag="Clarkson_47" /note=Original Glimmer call @bp 19842 has strength 11.53; Genemark calls start at 19842 /note=SSC: 19842-20414 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 5.86077E-138 GAP: 133 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.245, -6.897232859360425, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Mycobacterium phage MosMoris] ],,YP_009031555,100.0,5.86077E-138 SIF-HHPRED: SIF-Syn: Upstream, major tail protein. Downstream, tail assembly chaperone (fusion protein). Just like in Beelzebub and JoieB. /note=Primary Annotator Name: Torres Espinosa, Michael /note=Auto-annotation: Auto annotation is present for both Glimmer and Genemark. The start site is 19842 for both. Start codon is ATG. /note=Coding Potential: The gene does have reasonable coding potential. The chosen start site does cover this coding potential. /note=SD (Final) Score: The RBS final score is -6.897. This is not the best final score on PECAAN. It is lower than other RBS final scores and similar to others. 
The next most reasonable start site has an RBS final score of -4.666. /note=Gap/overlap: The gap is 133 base pairs. Though this gap is large, it is still reasonable because it is also present in other phages within the same cluster. Additionally, this gap also lacks coding potential. /note=Phamerator: As of January 13, 2022, the pham for this gene is 2466. This gene is conserved in the phages Beelzebub, Corazon, and Gattaca. Each of these is in the same cluster as Clarkson, namely cluster S. /note=Starterator: There are 14 non-draft annotations for this pham. 13/14 non-draft annotations call start site 1, which corresponds to base pair coordinate 19842 for Clarkson. This agrees with the start site predicted by Glimmer and Genemark. /note=Location call: Prior evidence suggests this is a real gene with a start site at 19842. Starterator agrees with Glimmer and Genemark. /note=Function call: This is a tail assembly chaperone. The top 28 hits in PhageDB BLAST have the function of tail assembly chaperone with e-values ranging from 3e-88 to e-105. The top 9 hits in NCBI BLAST also have the function of tail assembly chaperone with e-values ranging from 5e-122 to 6e-138, coverage ranging from 92% to 100%, and percent identity ranging from 99% to 100%. CDD was not informative because it did not show any hits, and HHpred was also uninformative since the hits it provided had very high E-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS call any TMDs for this gene. This is not a membrane protein. /note=Secondary Annotator Name: Sun, Xingzheng; Devshi Sharma /note=Secondary Annotator QC: I have QC’ed this annotation and agree with the first annotator on location and functional call; I have QC’ed this location call and agree with the first and second annotator. 
CDS join(19842..20369,20369..20911) /gene="48" /product="gp48" /function="tail assembly chaperone" /locus tag="Clarkson_48" /note= /note=SSC: 19842-20911 CP: yes SCS: neither ST: NA BLAST-Start: [tail assembly chaperone [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 0.0 GAP: -573 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.245, -6.897232859360425, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Mycobacterium phage MosMoris] ],,YP_009031554,100.0,0.0 SIF-HHPRED: GP24_25 ; Mycobacteriophage tail assembly protein,,,PF17388.5,33.427,98.6 SIF-Syn: /note=second tail assembly chaperone. Annotated using Corazon as an example. CDS 20889 - 26438 /gene="49" /product="gp49" /function="tape measure protein" /locus tag="Clarkson_49" /note=Original Glimmer call @bp 20931 has strength 10.11; Genemark calls start at 20931 /note=SSC: 20889-26438 CP: yes SCS: both-cs ST: SS BLAST-Start: [tape measure protein [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 0.0 GAP: -24 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.875, -3.7666523115065305, yes F: tape measure protein SIF-BLAST: ,,[tape measure protein [Mycobacterium phage Corazon]],,QFP97602,100.0,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,14.1157,100.0 SIF-Syn: Tape Measure Protein, upstream is minor tail protein (pham 98709), downstream is pham 18949 but the next known protein function is gene 45 as Tail Assembly Chaperone (Pham 2466), as observed similarly in Raela Phage. There is a gene listed with pham 88602 that is observed in other final phages, it overlaps with gene downstream of gene 47 and similar in other phages. /note=Primary Annotator Name: Vazquez, Gilda /note=Auto-annotation: Glimmer and Genemark were both used. Both agree on same start site. Start site number called was 20931. /note=Coding Potential: This gene does have reasonable coding potential predicted within the putative ORF. 
The chosen start site does cover all the coding potential of the gene, observed in lines before the stop site. /note=SD (Final) Score: -7.190. This is the final score for the called gene, but it is not the best final score; a better one is observed for start site 20889. /note=Gap/overlap: There is a 19 bp gap (upstream). If another reasonable start site is used to narrow the gap, a -23 overlap occurs, which still remains under the 30 bp threshold. /note=Phamerator: The gene is found in pham 20561 as of 01/13/2022. All phages of the pham were compared. It is conserved in other members within S cluster, such as Corazon and Beelzebub phages. The function call of this gene is not listed for this phage. /note=Starterator: Start site number 1 was conserved in 12/14 non-draft phage genes. Start site number 1 corresponds to coordinate 20889. /note=Location call: Based on the above information and values, this may be a real gene, and start site 1 should be used instead of the auto-annotated start site. /note=Function call: The PhagesDB BLAST hits displayed tape measure protein as function. E-values were significant and were 2e-82 & 2e-81 for evidence calls. 100% identities were observed for calls with e-value of 0.0 but for evidence calls, the identities were 31% and 30% respectively and similarly for other results. For NCBI BLASTp, this trend was also found. The first results below 10^-6 e-values were evidence calls with 32% & 30% for identities with 3e-97 & 8e-95 e-values. From all data collected, the function is a tape measure protein. HHpred, PhagesDB BLAST, and NCBI BLAST support that this function is the most likely due to high % coverage, low E-values listed, high probability, & % identity. /note=Transmembrane domains: There are no TMDs predicted from TMHMM or TOPCONS. The absence of TMDs supports that the function of this gene is tape measure protein, a more specific name & of which is an inner membrane protein. Therefore, this is not a membrane protein. 
/note=Secondary Annotator Name: Torres Espinosa, Michael /note=Secondary Annotator QC: I have QC`d this annotation, and I agree with the primary annotator. CDS 26438 - 27400 /gene="50" /product="gp50" /function="minor tail protein" /locus tag="Clarkson_50" /note=Original Glimmer call @bp 26438 has strength 10.47; Genemark calls start at 26438 /note=SSC: 26438-27400 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.702, -3.2200995065294116, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage MosMoris] ],,YP_009031557,100.0,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BD,92.8125,99.9 SIF-Syn: The upstream gene is tape measure protein (pham 20561), downstream gene is Minor tail protein (pham 97046). This is similar to other cluster S phages including Beelzebub and Blackbeetle. /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: Glimmer and Genemark Start sites are both at 26438. Start codon is ATG. /note=Coding Potential: Coding Potential of this ORF is only on the forward strand, which indicates it is a forward gene. Coding Potential is indicated by both Genemark Self and Host. /note=SD (Final) Score: SD score is -3.220 and is the best score available which is a good indication this is the best start site. /note=Gap/overlap: Small overlap of 1 is good standing and is preferred as it falls under 50 bp range. /note=Phamerator: pham:95315. Date 01/12/2022. It is conserved; found in phages Beelzebub_107, Blackbeetle_104, Corazon_97. /note=Starterator: Start site 52 in Starterator was manually annotated in 288/627 non-draft genes in this pham. Start 52 is at 26438 in Clarkson. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Confidently confirm this gene’s start site at 26438. I would keep this start site based on the guideline criteria: coding potential, agreement of start site predictions, synteny with other phages, lack of gap/small overlap, probable start codon of ATG, as well as good final score and z-score. Based on the above evidence, this is a real gene and the most likely start site is 58282. /note=Function call: According to the data obtained by PhagesDB BLASTp and NCBI BLASTp, the suggested function of the gene is minor tail protein, as there is evidence with high query coverage, high percentage of identity (100% and >45%) as well as extremely low E-values of 0 and 3e-82. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=Secondary Annotator Name: Torres Espinosa, Michael /note=Secondary Annotator QC: I have QC’ed this annotation. Please verify that you have not pasted information from another gene into these notes. The reported start and stop sites are incorrect. This mistake shows up through the entirety of your notes since you continuously refer to a different start and stop site. You have reported conservation in Beelzebub_107, Blackbeetle_104, and Corazon_97. However, this is also incorrect. Please fix spelling mistakes as well, such as "Beelebub" to "Beelzebub". 
CDS 27400 - 29112 /gene="51" /product="gp51" /function="minor tail protein" /locus tag="Clarkson_51" /note=Original Glimmer call @bp 27400 has strength 11.41; Genemark calls start at 27400 /note=SSC: 27400-29112 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage Tesla]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.81, -3.1358149394540633, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage Tesla]],,AVE00795,99.8246,0.0 SIF-HHPRED: Protein gp18; NP_465809.1, prophage tail protein gp18, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative; HET: MSE, MLY; 1.7A {Listeria monocytogenes EGD-e},,,3GS9_A,93.8596,98.9 SIF-Syn: minor tail protein, upstream gene is minor tail protein and downstream gene is an endonuclease. Phages Raela and RedRaider_77 show this order of gene function as well. /note=Primary Annotator Name: West, Julie /note=Auto-annotation: Glimmer and Genemark. Both programs agree on a start site of 27,400 for the gene. /note=Coding Potential: Coding potential exists in this ORF in the forward direction and is indicated by host and self-trained GeneMark. This start covers all coding potential. /note=SD (Final) Score: -3.136. This is the best Final score out of the other suggested scores. /note=Gap/overlap: There is a 1 bp overlap, suggesting this gene is part of an operon. /note=Phamerator: pham: 95768. Date:01/14/2022. The pham is conserved and found in 1113 other phage, including several of the S cluster. /note=Starterator: Start site 110 at 27400 was the most manually annotated (18 MA`s) site out of the candidate start sites for this gene. This agrees with the start at 27400 called by Glimmer and GeneMark. /note=Location call: Altogether, evidence indicates that the most probable start site is 27,400. /note=Function call: minor tail protein. 
Most of the PhagesDB BLASTp hits gave the function of minor tail protein, and several had an e-value of 0. 4/5 NCBI BLAST hits also stated minor tail protein as their function, each with 100% coverage and 99%+ identity. Additionally, there were many strong function calls for tail domains given by HHpred (recorded hits have an e-value of 10e-6 or lower, several with probability greater than 90%, and at least 89% coverage). /note=Transmembrane domains: No TMD predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Vazquez, Gilda /note=Secondary Annotator QC: I have reviewed and QC’ed the annotation and I agree with the primary annotator. CDS 29146 - 29619 /gene="52" /product="gp52" /function="HNH endonuclease" /locus tag="Clarkson_52" /note=Original Glimmer call @bp 29146 has strength 6.47; Genemark calls start at 29176 /note=SSC: 29146-29619 CP: yes SCS: both-gl ST: SS BLAST-Start: [HNH endonuclease [Mycobacterium phage VasuNzinga] ],,NCBI, q1:s1 100.0% 1.37393E-111 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.95, -6.920054240625416, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Mycobacterium phage VasuNzinga] ],,AYB70686,100.0,1.37393E-111 SIF-HHPRED: HNH homing endonuclease; HNH catalytic motif, Helix-turn-helix DNA binding domain, protein-DNA complex, DNA binding protein-DNA COMPLEX; HET: EDO; 2.92A {Bacillus phage SPO1} SCOP: d.4.1.3, d.285.1.1,,,1U3E_M,78.9809,99.8 SIF-Syn: HNH endonuclease, upstream gene is a minor tail protein and downstream gene is an NKF gene which belongs to pham 87827. Phages Beelzebub_55 and Blackbeetle_51 show this order of gene function as well. /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer calls the start at 29146 and Genemark calls the start at 29176 /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand which shows that this is a forward gene. There is coding potential found in both the Host GeneMark and the Self GeneMark. 
However, only the Self GeneMark includes all of the coding potential from the chosen Glimmer start site. /note=SD (Final) Score: -6.920 is the final score on PECAAN and the z-score is 0.95; these are not the best scores but are reasonable. /note=Gap/overlap: There is a gap of 33 bp upstream of my gene which is too small for a gene to be inserted. Moreover, downstream of my gene there is a gap of 46 which is also too small for a gene to be inserted. /note=Phamerator: The pham number as of January 18, 2022 is 96673. The gene is conserved in phages Raela and Pringar, which are both part of cluster S. /note=Starterator: There are 684 non-draft members in pham 96673, and 9 of the 684 non-draft members call start site 239, which correlates to a start site of 29146 bp for Clarkson. Other S phages in this pham call a gene with length ~474 very consistently. /note=Location call: Based on the information above this is a real gene and the most likely start is 29146. /note=Function call: HNH endonuclease. The top three PhagesDB BLAST hits have the function of HNH endonuclease and all have an e-value of 4e-91. NCBI BLAST hits also have the function of HNH endonuclease; for instance, the first hit has 100% coverage, 100% identity, and e-value of 1.37393e-111. HHpred also had a hit for HNH endonuclease with a 99.8% probability, 78.9809% coverage, and an e-value of 1.5e-20. CDD had one relevant hit, which indicated that a conserved domain is part of the HNH superfamily, with an e-value of 1.20e-05. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Villarreal, Alexia. Vazquez, Gilda /note=Secondary Annotator QC: Agree with the conclusions stated above in that this is a real gene at the indicated start site considering the data regarding coding potential, synteny, reasonable scores, and starterator program. 
I have completed the second QC, and agree with the primary annotator in regards to the location call, function call, synteny and evidence boxes for function. CDS 29656 - 30003 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Clarkson_53" /note=Original Glimmer call @bp 29665 has strength 15.2; Genemark calls start at 29665 /note=SSC: 29656-30003 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_JOIEB_53 [Mycobacterium phage JoieB] ],,NCBI, q1:s1 100.0% 1.08134E-72 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.179, -6.46371945197481, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JOIEB_53 [Mycobacterium phage JoieB] ],,QFP94191,100.0,1.08134E-72 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in pham 98484 and is called as an HNH endonuclease, downstream gene is in pham 10336 and is called as NKF. Both Beelzebub and Blackbeetle have upstream genes in pham 98484 called as an HNH endonuclease and have downstream genes in pham 10336. The functions of the downstream gene in both Beelzebub and Blackbeetle have not been called however. /note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Both, Glimmer Start: 29665, Glimmer Start Codon: ATG, GeneMark Start: 29665, GeneMark Start Codon: GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF; the chosen start site covers all this coding potential /note=SD (Final) Score: The score is -6.464, which is not the best score, but is still reasonable to suggest a credible ribosome binding site /note=Gap/overlap: There is a reasonable 39bp gap between the start site and the upstream gene. This start site gives the longest ORF. The length of this gene is acceptable (348 bp). /note=Phamerator: As of 1/13/2021 this gene is found in Pham 87827. This pham is present in other members of the same cluster (S) which my phage belongs to. Some phages used for comparison are Beelzebub, Gattaca, and LittleLaf. 
/note=Starterator: Yes, there is a conserved start site choice: 3. 13/15 non-draft genes called start site 3. /note=Location call: Based on the gathered evidence, this gene is a real gene with a start site of 8@29656. /note=Function call: The top 2 NCBI BLASTp hits and the top 2 PhagesDB hits, sorted by E-value, suggested that this is a real gene, but that it has no known function as of yet, with high query coverage (>99.6%), high % identity (>99%), and low E-values (0). CDD and HHpred hits were not helpful in determining function. I do not have enough information to make a conclusion on this gene`s function. /note=Function call: NKF /note=Transmembrane domains: None called by either TMHMM or TOPCONS. This is not a transmembrane protein. /note=Secondary Annotator Name: West, Julie /note=Secondary Annotator QC: Reconsider start site of 29,656. Seems to fit within coding potential, has better z and final values than the start given by Glimmer and GeneMark, and shortens the preceding gap. Also, Starterator shows more manual annotations (12 vs 2 for 29,665) for the start site at 29,656 than for 29,665. /note=For the start site at 29,665, there is a 45 bp gap. For the alternative site at 29,656 which should be reconsidered, there is a 36 bp gap (typo in gap/overlap line). /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: Would consider going over the notes left by first QC in regards to reconsideration of start site. 
CDS 30029 - 30706 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Clarkson_54" /note=Original Glimmer call @bp 30029 has strength 11.59; Genemark calls start at 30029 /note=SSC: 30029-30706 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp051 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 9.39073E-158 GAP: 25 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.307, -5.344996894520007, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp051 [Mycobacterium phage MosMoris] ],,YP_009031561,100.0,9.39073E-158 SIF-HHPRED: SIF-Syn: NKF, upstream gene is Pham 87827 and has NKF, downstream gene is Pham 95921 and has NKF, just like in phages Corazon, LittleLaf, and MosMoris /note=Primary Annotator Name: Bartolome, Alexandra /note=Auto-annotation: Glimmer and GeneMark both call the same start site at 30029, with a start codon of ATG. /note=Coding Potential: Glimmer and GeneMark both show coding potential that is all included in the predicted start site at 30029. The coding potential is present in only the forward strand of this ORF, indicating that this is a forward gene. /note=SD (Final) Score: The final score is -5.345, which is not the best but still reasonable. The RBS score is 2.307, which is not the highest but still very favorable. /note=Gap/overlap: The gap is 25 bp, which is still reasonable and creates the longest ORF. There is no coding potential in the gap that suggests the presence of a new gene. /note=Phamerator: As of 01/18/22, this gene is part of pham 10336. It is conserved in phages like Corazon, Gattaca, and Marvin, in Cluster S with Clarkson. /note=Starterator: Start site number 1 is highly conserved. It was manually annotated by 14/14 of the non-draft genes in the same pham. In Clarkson, start site number 1 is at 30029 bp. This is the same start site predicted by Glimmer and GeneMark. /note=Location call: This is likely a real gene based on the collected evidence. 
The start site is most likely at 30029, which was predicted by Glimmer and GeneMark and manually annotated by all the non-draft genes in the same pham. /note=Function call: NKF. Phagesdb BLAST, NCBI BLAST, HHpred, and CDD hits were not informative and showed no known function for this gene. /note=Transmembrane domains: TMHMM and TOPCONS both do not predict any TMDs, so this gene is not a membrane protein. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 30703 - 32082 /gene="55" /product="gp55" /function="lysin A" /locus tag="Clarkson_55" /note=Original Glimmer call @bp 30703 has strength 8.04; Genemark calls start at 30703 /note=SSC: 30703-32082 CP: yes SCS: both ST: SS BLAST-Start: [lysin A [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.921, -3.6727816321281046, yes F: lysin A SIF-BLAST: ,,[lysin A [Mycobacterium phage LittleLaf] ],,AYB69917,99.7821,0.0 SIF-HHPRED: d.118.1.1 (A:1-157) N-acetylmuramoyl-L-alanine amidase PlyG {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]},,,d1yb0a1,31.8083,99.6 SIF-Syn: Lysin A, upstream gene (Pham 10336) is NKF and downstream gene (Pham 5931) is NKF, just like RedRaider77 and Beelzebub /note=AF: lots of evidence for lysin A. /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: GeneMark and Glimmer both call start sites of 30703, with a start codon of ATG. /note=Coding Potential:The coding potential of this ORF is on the forward strand, which means that this is a forward gene. I found coding potential in both GeneMark Self and Host. The start site contains all of the coding potential. /note=SD (Final) Score: The final score was -3.673 and this was the best final score according to PECAAN. The z-score was 2.921 which was the best z-score. 
/note=Gap/overlap:-4, which is well within the guidelines and indicates an overlap of 4 bp which is fairly good. /note=Phamerator: 95921 was the pham of the gene as of 1/17/22. It has also been found in phages Blackbeetle_54 and MosMoris_52. /note=Starterator: Start site 58 was manually annotated 159/300 times in non-draft genomes. The start site 58 corresponds to start coordinate 30703 in Clarkson which agreed with the Glimmer and GeneMark auto-annotation. It has been manually annotated 14 times. /note=Location call: Based on the evidence specified above, this is a real gene with a start site of 30703. /note=Function call: NKF, My ultimate decision is that the function call is NKF. Both the phagesdb and NCBI BLAST gave semi-strong evidence to suggest that the function was Lysin A. However, all of the phages in the Cluster S had e-values of 0 which was not very convincing. However, the CDD and HHpred hits gave evidence for peptidoglycan recognition proteins. However, peptidoglycan recognition protein is not an approved function while LysM-like peptidoglycan binding protein is an approved function. LysM-like peptidoglycan binding protein was not a function for any of the hits on any of the data bases. Therefore, I conclude that the function is unknown. /note=Transmembrane domains: No transmembrane domains found /note=Secondary Annotator Name: Abana, Juana; Bovee, Alyson (I have QC’ed this location call and agree with the first annotator-- things to fix on excel) /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator however, some information needs to be added/fixed. The details are on the google spread sheet. 
CDS 32079 - 32354 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Clarkson_56" /note=Original Glimmer call @bp 32079 has strength 6.58; Genemark calls start at 32103 /note=SSC: 32079-32354 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FH33_gp053 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 2.66762E-58 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.435, -3.8540125308361968, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp053 [Mycobacterium phage MosMoris] ],,YP_009031563,97.8022,2.66762E-58 SIF-HHPRED: SIF-Syn: My gene has NKF, and the gene downstream (55) functions as lysin B and is in pham 20226. This is compared to LittleLaf, which also has lysin B downstream in pham 20226 (as of 1/31/22). The gene upstream of this gene also has NKF, though they are both in pham 95921. /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Glimmer shows the correct start site, while GeneMark does not. /note=Coding Potential: The ORF shows the start and stop sites in the correct locations with high coding potential. /note=SD (Final) Score: The final score is -3.854, which is the best final score and indicative of a good start site. /note=Gap/overlap: The gap is -4, meaning a 4 bp overlap, which is indicative of an operon. This provides good evidence of a proper start site /note=Phamerator: Pham: 5931 as of 1/13/22; it is conserved and also found in Beelzebub and Corazon /note=Starterator: Start site 3 is located in 14/14 of the non-draft genes. Start site 3 is at 32079 in Clarkson, as also called by Glimmer, which is good evidence this is the correct start site /note=Location call: 32079 is the best start site based on the guiding principles and evidence shown by Phamerator and Starterator. Therefore, the gene should be kept. /note=Function call: NKF. Based on the BLAST evidence and the absence of transmembrane domains, the best call is no known function. 
The first two PhagesDB hits were Gattaca and VasuNzigna, with e-values on the order of e-49; however, both have unknown function. /note=Transmembrane domains: There are no transmembrane domains as shown by TMHMM and TOPCONS. Thus, there is still NKF. /note=Secondary Annotator Name: Araque, Colette /note=Module 9: Abana, Juana /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. /note=Module 9: I agree with the primary annotator's function call; however, I think that the function call should be more specific. For instance, mention that Phagesdb BLAST had 2 hits (include phage name and their e-values). Also mention that HHPRED and CDD were not helpful in calling the function. In synteny box include upstream gene. CDS 32354 - 33508 /gene="57" /product="gp57" /function="lysin B" /locus tag="Clarkson_57" /note=Original Glimmer call @bp 32354 has strength 11.5; Genemark calls start at 32354 /note=SSC: 32354-33508 CP: yes SCS: both ST: SS BLAST-Start: [lysin B [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.681, -3.4030855957987485, no F: lysin B SIF-BLAST: ,,[lysin B [Mycobacterium phage MosMoris] ],,YP_009031564,100.0,0.0 SIF-HHPRED: Gene 12 protein; alpha/beta sandwich, CELL ADHESION; 2.0A {Mycobacterium phage D29},,,3HC7_A,55.2083,99.9 SIF-Syn: The function of this gene is lysin B, and it demonstrates conserved synteny with its downstream gene, holin; in Clarkson these correspond to genes 55 and 56 respectively. This is also seen in the phage Beelzebub, where the gene in the same pham as this gene (Pham 20226) is gene 60 and is also a lysin B gene. The downstream gene of gene 60 in Beelzebub is in the same pham as the downstream gene of this gene in Clarkson, pham 22732, and both genes are annotated as holin. 
/note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Start site called by both Glimmer and GeneMark as position 32354, Start is an ATG. /note=Coding Potential: There is reasonable coding potential within the ORF for the gene, and the suggested reading frame does include all coding potential. There is essentially only coding potential in one reading frame in the forward direction, however, there is a thin spike in coding potential centered around position 32650 in the reverse strand in the Host-Trained GeneMark, although it does not extend further into the gene. /note=SD (Final) Score: -3.403. This is still reasonable as it is among the highest amongst other start site candidates. It is the 3rd highest, being beat by two other proposed start sites that have gaps between the gene and its preceding gene in the hundreds. It has the second best Z-score of 2.681. /note=Gap/overlap: Overlap of 1 bp. This is reasonable, as some other viable candidates have poor RBS scores, and the candidates with better RBS scores have gaps in excess of 700 bp. /note=Phamerator: Pham 20226 as of 1/14/2022 at 1101 hours PT. The gene is conserved in other members of the cluster such as Blackbeetle and Beelzebub, but it is also conserved in members of other clusters such as 32HC (Z) and Indlovu (B). This gene has no function auto-called, but other members of the pham are listed as being the gene for lysin B. /note=Starterator: Of the 17 non-draft genes in the pham, 15 members call start site 4. In Clarkson, start site 4 correlates to position 32354. /note=Location call: Given the coding potential for the reading frame, the conservation of the gene across members of the cluster and others, and the conservation of the start site in the gene between members of its pham from phages in the same cluster, this gene is real and starts at bp 32354. Starterator, Glimmer, and GeneMark agree on the start location. /note=Function call: Lysin B. 
The rationale for the function is a result of its similarity and conservation amongst a significant number of bacteriophages both within and outside of Clarkson’s cluster, indicating that it is a vital protein necessary for survival, reproduction, and/or infection. Furthermore, in manually annotated genomes of phages both within and without Cluster S, the gene is called for that of Lysin B. According to CDD, a region of the predicted protein corresponds to the conserved domain of Cutinase, an enzyme that is involved in the degradation of cellular walls high in fatty acid content, such as those found in mycobacterium. Further analysis using HHpred calls a domain’s similarity to a region of 3HC7, a lysin B protein found in the Mycobacterium phage D29, and specifically, the region corresponds to the domain responsible for adhesion of the enzyme to the cellular wall, facilitating catalytic activity. /note=Transmembrane domains: There were no transmembrane domains predicted for this gene, which is reasonable given it is a lysin B, whose purpose is to exit the cell into the space between the peptidoglycan layer and the mycolic acid layer to cut the mycolic acid free from the peptidoglycan, allowing for cellular lysis. /note=Secondary Annotator Name: Bartolome, Alexandra /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=New Secondary Annotator Name: Araque, Colette /note=New Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 33528 - 33818 /gene="58" /product="gp58" /function="holin" /locus tag="Clarkson_58" /note=Original Glimmer call @bp 33528 has strength 10.27; Genemark calls start at 33528 /note=SSC: 33528-33818 CP: yes SCS: both ST: SS BLAST-Start: [holin [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 2.90059E-61 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.921, -2.7653702713535186, yes F: holin SIF-BLAST: ,,[holin [Mycobacterium phage MosMoris] ],,YP_009031565,100.0,2.90059E-61 SIF-HHPRED: SIF-Syn: Upstream gene is Lysin B as in numerous other phages. /note=Primary Annotator Name: Julia Chang /note=Auto-annotation: Glimmer and GeneMark both call the start site at 33528bp with the codon ATG. /note=Coding Potential: There is high coding potential found both on self and host within Glimmer and GeneMark. Coding potential is on the forward strand only, suggesting that it is a forward gene. /note=SD (Final) Score: The SD score is best at -2.765, and the z-score is best at 2.921. /note=Gap/overlap: The gap with the upstream gene is reasonable at 19 bp. Downstream of the gene, there is a large gap of 407bp, but it has no coding potential, and the gene shares synteny with several other non-draft phages. /note=Phamerator: The Pham number as of 01/17/2022 is 22372. The gene is also conserved in Tesla and MosMoris, which are also in the same cluster as Clarkson (S cluster). /note=Starterator: Start site 4 is conserved among members of this pham at position 33528bp. 14 of the 16 non-draft phage annotations called this start site. /note=Location call: Based on the gathered evidence, this is a real gene because of its full coding potential and its conservation in both Phamerator and Starterator. The start site is at 33528bp. /note=Function call: Holin; The top two hits on phagesDB BLAST had a called function of holin with a high score of 187 and small e-value of 6e-48. 
Two of the four hits found on NCBI BLAST also showed that the gene had a function of holin, also having very small e-values from 6e-16 to 3e-61. CDD and HHpred hits were not informative. /note=Transmembrane domains: TMHMM and SOSUI each predicted two TMDs. /note=Secondary Annotator Name: Bharadwaj, Shreya /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator /note=Secondary Annotator Name: Bartolome, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. /note=Notes: (1) Phamerator: don’t forget to mention if the other phages you compared yours to are in the same cluster as your phage! Also, I think there might be a typo, MosMoris instead of MasMorris (2) Location call: include start site location (3) Synteny box: upstream gene has function of lysin B, include function of downstream gene CDS 33968 - 34228 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Clarkson_59" /note= /note=SSC: 33968-34228 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_BLACKBEETLE_58 [Mycobacterium phage Blackbeetle] ],,NCBI, q1:s1 100.0% 1.34467E-53 GAP: 149 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.234, -4.2529836510151195, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BLACKBEETLE_58 [Mycobacterium phage Blackbeetle] ],,QBQ71348,100.0,1.34467E-53 SIF-HHPRED: SIF-Syn: /note=Added gene. CP on GM-self only. Gene found in several other S phages. Chose longest start site w/ reasonable gap and RBS score. 
CDS 34225 - 34734 /gene="60" /product="gp60" /function="membrane protein" /locus tag="Clarkson_60" /note=Original Glimmer call @bp 34225 has strength 4.0; Genemark calls start at 34225 /note=SSC: 34225-34734 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp056 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.0203E-117 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.264, -4.192397292939198, no F: membrane protein SIF-BLAST: ,,[hypothetical protein FH33_gp056 [Mycobacterium phage MosMoris] ],,YP_009031566,100.0,1.0203E-117 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation: Glimmer and Genemark both call for the start at 34225. /note=Coding Potential: There is strong evidence for coding potential in the forward direction using the first reference frame. This evidence is seen in both GeneMark Self and Host. /note=SD (Final) Score: -4.192. This site has the best Final Score and second best Z-score (2.264 vs 2.723) of the possible start sites. /note=Gap/overlap: 406. This is quite a large gap, but this start site represents the smallest possible gap. There is some modest coding potential in the unmarked upstream region, and related phages, notably JoieB, have a gene in this upstream region. It is possible an unmarked gene exists here. /note=Phamerator: The Pham as of 1/12/22 is 20204. This gene is found in other phages of cluster S such as JoieB and Marvin. /note=Starterator: Start site 1 is manually annotated in 14 of 14 phages of this pham, all of cluster S. This start site corresponds with a start @34225 for Clarkson. This site is also called for by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene with a strongly supported start site @34225. /note=Function call: Membrane protein. There are no known functions in either the PhagesDB or NCBI databases. However, we have evidence there are real TMDs associated with this protein. 
/note=Transmembrane domains: TMHMM calls for 4 TMDs. TOPCONS itself calls for 1, while the other algorithms, Philius and PolyPhobius, call for 4 TMDs. With this evidence we conclude that this is a membrane protein. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: I have QC'd this annotation and agree with the location call. CDS 34721 - 35059 /gene="61" /product="gp61" /function="membrane protein" /locus tag="Clarkson_61" /note=Original Glimmer call @bp 34721 has strength 8.33; Genemark calls start at 34721 /note=SSC: 34721-35059 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp057 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 5.26249E-73 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.255, -5.055463787873258, no F: membrane protein SIF-BLAST: ,,[hypothetical protein FH33_gp057 [Mycobacterium phage MosMoris] ],,YP_009031567,100.0,5.26249E-73 SIF-HHPRED: SIF-Syn: Membrane protein (pham 16220), downstream gene is hydrolase (pham 55313), upstream gene is membrane protein (function not seen but in the same pham - 2024), just like in phage Corazon and MosMoris. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Glimmer and GeneMark both call the gene at a start site of 34721. /note=Coding Potential: The coding potential covers the entire gene for both GeneMark Self and Host and is very high. /note=SD (Final) Score: The final score is -5.055 and the z score is 2.255; these are very good values and are some of the highest on the list, which gives us confidence that this is the right start site. /note=Gap/overlap: The gap is -14bp which is rather large, but the other gap with good values is -8 (still pretty big) and the only start site with a reasonable gap (16) does not have good SD scores or z values. 
/note=Phamerator: pham 16220 as of 02/04, this pham has only S cluster members like JoieB and Beelzebub (no function call) /note=Starterator: The most called start site is #1 and it was called 14/14 times for the non-draft genomes. This corresponds to start site 34721 and was also predicted by Glimmer and GeneMark. /note=Location call: Because starterator provides such strong evidence, we are calling 34721 as the start site. /note=Function call: Unknown function, the top hits on phagesDB all have unknown function (5e-56). HHPRED did not have any relevant hits, CDD did not have any hits at all, and NCBI blast only had hits for hypothetical proteins which had good values. /note=Transmembrane domains: There is one transmembrane domain for TMHMM and TOPCONS also calls a domain in one of its programs so we can call this gene as having a membrane domain. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: With the exception of possible confusion over the use of `overlap of -14bp` contextually, I have otherwise read through the annotation and agree with the location, function, synteny, and transmembrane domain calls. CDS 35069 - 36046 /gene="62" /product="gp62" /function="hydrolase" /locus tag="Clarkson_62" /note=Original Glimmer call @bp 35069 has strength 15.57; Genemark calls start at 35069 /note=SSC: 35069-36046 CP: yes SCS: both ST: NI BLAST-Start: [hydrolase [Mycobacterium phage Beelzebub] ],,NCBI, q1:s1 100.0% 0.0 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.113, -4.442181083645917, no F: hydrolase SIF-BLAST: ,,[hydrolase [Mycobacterium phage Beelzebub] ],,AZF93328,100.0,0.0 SIF-HHPRED: LEVANASE; HYDROLASE, LEVAN; HET: FRU; 1.65A {BACILLUS SUBTILIS},,,4B1L_A,60.6154,95.9 SIF-Syn: /note=Primary Annotator Name: Cho, Emily /note=Auto-annotation: Both Glimmer and GeneMark call 35,069 as the start site. /note=Coding Potential: Reasonable coding potential covered by the start site. 
/note=SD (Final) Score: -4.442, fifth best value among 14 candidates. /note=Gap/overlap: 9 bp, within the acceptable gap range (50 bp). /note=Phamerator: Found in Pham 55313 as of 13th Jan, 2022. The pham is in other members of the cluster but mostly found in different clusters (L, M). The function called for S cluster was hydrolase. For other clusters (T, L, M), it was minor tail protein. /note=Starterator: Start site number 3 (bp 35069) is a reasonable start site that is conserved in other pham members, although it was not the most called start number. The most called site was 6, which was called by 35 out of 99 genomes, a relatively low percentage, and start number 3 also seems to be conserved in other genomes. Also, start site 3 is called by all S cluster phages (16 total including Clarkson) /note=Location call: The gene is very likely real based on Starterator (which showed the most conserved start site) and Phamerator, with a start site of 35069 and good coding potential. The gene also shows synteny with many other genomes like Corazon and Pringar. /note=Function call: Top two hits from BLASTp from PhagesDB sorted by e-value (0.0) suggested function to be hydrolase, and they showed perfect query coverage and % identity. Top two hits from BLASTp from NCBI also sorted by e-value (0.0) suggested function to be hydrolase, and they showed perfect or nearly perfect query coverage and % identity. Although some hits showed function calls of minor tail proteins, they had low frequency of being called, low identities and coverage, and they were in different clusters. The NCBI and PhagesDB BLASTp hits showed that the expected function would be hydrolase based on amino acid sequence, and HHpred showed that our protein includes parts that structurally function as glycoside hydrolase. On HHpred, the e-value is much worse than the standard (<10e-3). However, the probability and coverage were acceptable, and the e-value was the best among the glycoside hydrolase hits. 
Therefore, the function seems to be glycoside hydrolase. /note=Transmembrane domains: No TMDs reported. Given the function of the enzymes, they are not known to be transmembrane proteins, and having TMDs does not seem necessary for their function. Therefore, the absence of TMDs makes sense. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I agree with this start site. Add in note that Start site 3 is called by all S cluster phages (16 total including Clarkson), giving strong support to use this start site. /note=Secondary Annotator Name: Chang, Julia /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. Note: You can probably shorten the description for the Function Call and touch upon the main points as to why it is NKF. Look at the PECAAN notes template for more concise descriptions. For the transmembrane domains section, also follow the annotation guidelines and use the evidence from databases like TMHMM and TOPCONS to explain why there are no TMDs. For the Phamerator section, I would remove the information about the function call. For the location call section, add in examples of what other phages it shares synteny with. 
CDS 36090 - 36383 /gene="63" /product="gp63" /function="helix-turn-helix DNA binding domain" /locus tag="Clarkson_63" /note=Original Glimmer call @bp 36090 has strength 6.47; Genemark calls start at 36090 /note=SSC: 36090-36383 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding protein [Mycobacterium phage Raela]],,NCBI, q1:s1 100.0% 1.25234E-60 GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.482, -6.661315563983987, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Mycobacterium phage Raela]],,AZS06826,98.9691,1.25234E-60 SIF-HHPRED: Regulatory protein cox; helix-turn-helix, DNA binding, VIRAL PROTEIN; 2.401A {Enterobacteria phage P2},,,4LHF_A,59.7938,98.9 SIF-Syn: helix-turn-helix DNA binding domain protein; upstream gene is hydrolase (pham 55313), downstream gene is NKF (pham 5416), just like in other cluster S phages like Corazon, JoieB, Beelzebub, MosMoris, Poise, Pringar, Raela, Tesla, and RedRaider77 /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call ATG start site at 36090. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. In the third panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe high coding potential throughout the entirety of the gene sequence, and the chosen start of 36090 captures all the coding potential. /note=SD (Final) Score: Start site (36090) has one of the worst final scores (-6.661) and z-scores (1.482) compared to other potential starts on PECAAN, however, it has the smallest gap (43bp) compared to other potential starts, which have gap sizes upward of 127bp. /note=Gap/overlap: The gap with the upstream gene is a little large at 43 bp. 
However, this gene, as well as its upstream gap, is conserved in several related phages and there is no coding potential in the gap, which means no gene should be added. /note=Phamerator: Pham 28258 has 16 members with 14 non-draft members (01/13/2021); gene is conserved and present in other cluster S phages like JoieB and Corazon. /note=Starterator: Start number 1, corresponding to the basepair coordinate 36090, is the most annotated start in Starterator; this site has manual annotations in 14/14 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36090. /note=Function call: helix-turn-helix DNA binding domain. The top 8/14 phagesdb BLAST hits (e-values = 10^-49, 100% identity, 100% coverage) come from other cluster S phages and have the function helix-turn-helix DNA binding domain assigned to them. Of the 3 strongest NCBI BLAST hits, 2 have no known function (NKF) associated with them, and 1 is associated with the function helix-turn-helix DNA binding domain. The graphic summary on NCBI BLAST shows that putative conserved domains have been detected with specific hits for HTH_17, which is a DNA-binding helix-turn-helix domain. HHpred had two strong hits: one for a helix-turn-helix DNA binding domain (from bacteria) with 99.32% probability, 62.89% coverage, and e-value of 10^-11 and one for a helix-turn-helix DNA binding domain (from phage) with 98.99% probability, 59.79% coverage, and e-value of 10^-8. CDD had a specific hit for HTH_17, which is a helix-turn-helix domain-containing protein with superfamily HTH_MerR-SF. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Dr. Friese /note=Secondary Annotator QC: Dr. Freise QC'ed my gene call and reviewed my PECAAN notes and said I could go forward with auto-annotation. 
/note=Secondary Annotator Name: Chris Charton /note=Secondary Annotator QC: I have QCed this gene and agree with location and function call. CDS 36397 - 36777 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="Clarkson_64" /note=Original Glimmer call @bp 36403 has strength 5.2 /note=SSC: 36397-36777 CP: yes SCS: glimmer-cs ST: SS BLAST-Start: [hypothetical protein FH33_gp060 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.12989E-85 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.421, -6.021325347704108, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp060 [Mycobacterium phage MosMoris] ],,YP_009031570,100.0,1.12989E-85 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Glimmer shows the presence of a start site even if GeneMark does not. The start that is called is 36403 and it’s an ATG codon. /note=Coding Potential: There is coding potential present even if it is rather low in the host-trained GeneMark. Coding potential in both graphs covers the start site. /note=SD (Final) Score: There are two final scores: the start 35424 (GTG), which has a better final score (-4.193) than the one chosen by Glimmer (36403, ATG) with an SD score of -6.021. /note=Gap/overlap: For ATG the gap is +13 bp, which is plausible; the GTG has a gap that is too large at +40 bp. For both, the length of the gene is reasonable and there are no large gaps. /note=Phamerator: The pham is 5416 as of 01/13/2022. Within this pham there are only phages from the S cluster like Tesla and Gattaca. /note=Starterator: The start site is conserved among the members of this pham. The auto-annotated start site (#2 - 36403) is conserved in all genes within this pham but our gene also contained the most annotated start site (#1 - 36397). #2 is never called while #1 is called in 10/14 non-draft genomes. /note=Location call: There is good coding potential in either case with the start codon being included in it. 
Since it’s well conserved in phamerator we can call that it is a real gene. The most likely start site is the one most called in Starterator (36397) as it corresponds to the longest open reading frame and is most conserved across the members of the pham. /note=Function call: The top hits in BLASTp were all of unknown function with good e-value (around 1e-61) and with high scores (around 230). In addition, HHpred and CDD did not provide any further information as none of them had relevant hits. Thus the function is currently unknown. /note=Transmembrane domains: No transmembrane domains. /note=Secondary Annotator Name: Cho, Emily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator that the start site is 36397. CDS 36722 - 37060 /gene="65" /product="gp65" /function="membrane protein" /locus tag="Clarkson_65" /note=Original Glimmer call @bp 36848 has strength 6.29; Genemark calls start at 36722 /note=SSC: 36722-37060 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein FH33_gp061 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 6.3247E-74 GAP: -56 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.979, -5.2484617369160205, no F: membrane protein SIF-BLAST: ,,[hypothetical protein FH33_gp061 [Mycobacterium phage MosMoris] ],,YP_009031571,100.0,6.3247E-74 SIF-HHPRED: SIF-Syn: The upstream gene is NKF (pham 5416), the downstream gene is uncalled (pham 11018). This is similar to other cluster S phages including Corazon, JoieB, Beelzebub, MosMoris, Poise, Pringar, Raela, Tesla, and RedRaider77 /note=Primary Annotator Name: Enos, Alexander /note=Auto-annotation: Glimmer Start: 36848, GeneMark Start: 36722. 36722 seems more likely. Start codon GTG /note=Coding Potential: The gene does have coding potential /note=SD (Final) Score: 1.979. Although this is not the best Final Score, the other start site with a better score had a very large gap, over 200 bp. /note=Gap/overlap: -56. 
Although this gap is somewhat large and negative, the other gaps are large and positive. 70 bp was the smallest and did not cover all coding potential. /note=Phamerator: It is in pham 56952. Date: 1/14/22. It is conserved, found in Beelzebub_68, and Blackbeetle_64 /note=Starterator: The start number called the most often in the published annotations is 3; it was called in 13 of the 15 non-draft genes in the pham. Start: 3 @36722 has 13 MAs /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36722. /note=Function call: membrane protein. The top three hits on phagesdb BLAST and NCBI BLAST indicated that there was no known function. Each of these had E-values of 3e-60 or less strongly indicating NKF. Additionally, HHpred and CDD indicated NKF. However, TMHMM and TOPCONS indicated there were transmembrane domains and therefore we can conclude it is a membrane protein. /note=Transmembrane domains: TMHMM and TOPCONS indicated there were transmembrane domains. 
TMHMM indicated 2, TOPCONS indicated 1 /note=Secondary Annotator Name: Cini, Victoria, Cho, Emily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator/ I have QC’ed this location call and function call and agree with the first annotator CDS 37060 - 37386 /gene="66" /product="gp66" /function="helix-turn-helix DNA binding domain" /locus tag="Clarkson_66" /note=Original Glimmer call @bp 37060 has strength 8.38; Genemark calls start at 37060 /note=SSC: 37060-37386 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding protein [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.17403E-72 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.039, -5.69701758134461, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Mycobacterium phage MosMoris] ],,YP_009031572,100.0,1.17403E-72 SIF-HHPRED: a.35.1.3 (A:1-68) SinR repressor, DNA-binding domain {Bacillus subtilis [TaxId: 1423]},,,d1b0na2,50.9259,98.3 SIF-Syn: The gene downstream is a part of Pham 15208 with a stop site at 38336. It is a RecB-like exonuclease/helicase. The upstream gene is a part of Pham 56952 at stop site 37060. It is a membrane protein /note=Primary Annotator Name: Fields, Brooke /note=Auto-annotation: Both Glimmer and Genemark call for this gene at ATG start site 37060 /note=Coding Potential: There is substantial evidence of coding potential in the Host- Genemark map and the Self- Genemark map /note=SD (Final) Score: -5.697. It is not the start with the best RBS score; however, given the other starts’ gap values and z-scores, start site 37060 appears to be the best one. /note=Gap/overlap: Has an overlap of 1bp with the upstream gene /note=Phamerator: Pham 11018. Has 17 members and 3 are drafts. Compared to Beelzebub and JoieB. /note=Starterator: Start position 1 coordinates with base pair coordinate 37060 which is the most annotated start with 14 manual annotations from all 14 of the non-draft genes. 
This evidence agrees with Glimmer and Genemark information. /note=Location call: The most likely start site is 37060 /note=Function call: DNA binding protein. Both databases show a strong hit for helix-turn-helix DNA binding proteins that come from cluster S. HHpred provided significant hits from the PDB databases that support the function of a DNA binding domain. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore it is not a membrane protein. /note= /note=Secondary Annotator Name: De Schutter, Elena & Cini, Victoria (Module 9) /note=Secondary Annotator QC: Please remove the gaps between the different lines and select in the drop down menu whether your chosen start site includes all the coding potential. The lack of coding potential is so interesting! For your SD score, could you please elaborate on it since it seems rather low. You should also include the Z score and a discussion of it. Since this is probably an operon these scores don’t matter too much and I’m assuming that was your thought process but I would love for you to confirm that. For phamerator, please add the pham group and the date at which you wrote that down (please include a small discussion of the function as well if there is any, is it conserved). For your location call, please provide a brief summary of your evidence especially because there is no coding potential so it would be good to see why you decided it was a real gene despite it. It’s a little bit hard for me to call anything since I feel like I’m missing some information, I would like some more detail especially with the lack of coding potential. /note=Module 9 QC: Needs attention; Select Y/N from all GM coding capacity dropdown box, add to synteny box, add function for your gene, please check evidence for BLAST hits, please add to your notes (see secondary annotator’s comments because you didn’t address that yet :) ! 
) CDS 37383 - 38336 /gene="67" /product="gp67" /function="RecB-like exonuclease/helicase" /locus tag="Clarkson_67" /note=Original Glimmer call @bp 37383 has strength 11.58; Genemark calls start at 37383 /note=SSC: 37383-38336 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp062 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.999, -6.285651950023102, no F: RecB-like exonuclease/helicase SIF-BLAST: ,,[hypothetical protein FDI61_gp062 [Mycobacterium phage Marvin] ],,YP_009614180,100.0,0.0 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_C,82.6498,99.8 SIF-Syn: /note=Primary Annotator Name: Dr. Freise /note=Starterator: Start 12 at 37383 has 16 MAs, all in Cluster S phages. -4 gap: favorable (operon) /note=Function: several HHpred hits include exonuclease and helicase region, therefore calling this RecB /note=however note that PD-(D/E)XK nuclease superfamily hits also present further down in HHpred list (common evidence for Cas4) /note=Secondary Annotator Name: De Schutter, Elena /note=Secondary Annotator QC: I agree with the start site and also with the function. I’m not sure in this case but should the boxes still be selected? Like the coding potential for example? 
CDS 38333 - 39415 /gene="68" /product="gp68" /function="DNA binding protein" /locus tag="Clarkson_68" /note=Original Glimmer call @bp 38333 has strength 5.82; Genemark calls start at 38333 /note=SSC: 38333-39415 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CORAZON_65 [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -3.5527928455068705, no F: DNA binding protein SIF-BLAST: ,,[hypothetical protein SEA_CORAZON_65 [Mycobacterium phage Corazon]],,QFP97620,99.4444,0.0 SIF-HHPRED: DNA repair protein RAD52 homolog; DNA annealing protein, ssDNA binding, multimeric ring formation, DNA BINDING PROTEIN; 2.405A {Homo sapiens} SCOP: d.50.1.3,,,5JRB_F,40.2778,99.7 SIF-Syn: /note=Primary Annotator Name: Hatashita, Anthony /note=Auto-annotation: Both Glimmer and Genemark agreed on the same start site at position 38333. The start codon is ATG. /note=Coding Potential: The gene has substantial coding potential within the putative ORF and the start site covers all of the coding potential. /note=SD (Final) Score: The SD score is not the best, however it is still reasonable to suggest the presence of a credible ribosome binding site. This gene also might have an irrelevant RBS score as its upstream gap is 4 base pairs which is common for genes included within an operon. /note=Gap/overlap: The gap with the upstream gene of 4 base pairs is completely reasonable and could indicate the presence of an operon. There is an alternative start site, however this one has too large a gap between the upstream gene and itself. The length of the gene is acceptable given the autoannotated start site. /note=Phamerator: The pham the gene is found in is 19632 as of 1/12/21. The pham in which the gene is conserved is in other members of the cluster that the phage belongs to. Beelzebub and Blackbeetle were other phages in cluster S that were used for comparison. There is no function called for the gene. 
/note=Starterator: There is a reasonable start site that is conserved among the members of the pham that the gene belongs to. The start site number that is conserved is start site 1 which corresponds to base pair 38333. There are 16 total members of the pham, 2 of which (including Clarkson) are drafts, and every single one calls the most conserved site, site 1. /note=Location call: Altogether, the gathered evidence suggests that the gene is real and starts at position 38333. This gene is conserved in phamerator and has great coding potential. The start site of 38333 seems most likely as it covers all coding potential. /note=Function call: The 2 best NCBI BLASTp hits, sorted by E-value, suggest that the function is unknown, with high query coverage (100%), high % identity (>99%), and low E-values (0). However, HHpred has decent hits to functions related to DNA binding. /note=Transmembrane domains: There are no TMDs in this protein. /note=Secondary Annotator Name: Fields, Brooke /note=Secondary Annotator QC: I have QC’d this gene and agree that the start site is 38333 CDS 39415 - 39840 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="Clarkson_69" /note=Original Glimmer call @bp 39415 has strength 14.51; Genemark calls start at 39415 /note=SSC: 39415-39840 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CORAZON_66 [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 7.55089E-95 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.264, -5.037495332953455, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CORAZON_66 [Mycobacterium phage Corazon]],,QFP97621,100.0,7.55089E-95 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hu, Yixiao /note=Auto-annotation: the start site for the gene is 39415 and the stop codon is 39840, with the start codon ATG /note=Coding Potential: the coding potential at the start codon 39415 is really high for the first frame and becomes 0 when it goes to the stop codon 39840, so it should be 
in forward direction /note=Transmembrane domains: No /note=Secondary Annotator Name: ENOS, ALEXANDER /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 39837 - 40190 /gene="70" /product="gp70" /function="HNH endonuclease" /locus tag="Clarkson_70" /note=Original Glimmer call @bp 39837 has strength 7.94; Genemark calls start at 39837 /note=SSC: 39837-40190 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 3.30878E-81 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.41, -4.653873557426368, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Mycobacterium phage MosMoris] ],,YP_009031576,100.0,3.30878E-81 SIF-HHPRED: d.4.1.8 (A:513-673) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Actinomyces naeslundii [TaxId: 1115803]},,,d4ogca2,59.8291,96.4 SIF-Syn: /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer and Genemark both call a start site of 39837 /note=Coding Potential: Good coding potential is found in both Self- and Host-Trained Genemark in the forward reading frame. The chosen start site includes all the coding potential for the ORF. /note=SD (Final) Score: This call has a final score of -4.654, which is not the best. It has the second-best z-score, 2.41. /note=Gap/overlap: The 4 bp overlap with the upstream gene can be common. This slight overlap is conserved in other phages, such as phage Blackbeetle and Gattaca. /note=Phamerator: The Pham number as of Jan 12, 2022, is 39837. The gene is conserved in several phages within the same cluster, such as Gattaca, Blackbeetle, and Marvin. /note=Starterator: The most called start site is not present in Clarkson. However, start site 68 in Starterator was manually annotated in 14/14 non-draft genes for cluster S in this Pham. Start site 68 is at position 39837 in Clarkson. This evidence agrees with the sites predicted by Glimmer and GeneMark. 
/note=Function call: HNH endonuclease. The top two phagesdb BLAST hits have the function listed as HNH endonuclease in Mycobacteriophage Raela_69 (E-value of 3e-67 and 100% identity) and Mycobacteriophage Poise_69 (E-value of 3e-67 and 100% identity). The top four NCBI BLAST hits are listed as HNH endonuclease proteins (100% coverage, 98%+ identity, and E-value <8e-79). CDD had a hit for an HNH endonuclease with 28% identity, 36% coverage, and an E-value of 0.00000416336. HHpred had a few hits, the best for a CRISPR-associated HNH endonuclease found in Actinomyces naeslundii with 96% probability, 60% coverage, and an E-value of 0.011. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hatashita, Anthony / Enos, Alexander /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. / I have QC’ed this location call and agree with the first annotator. CDS 40256 - 40426 /gene="71" /product="gp71" /function="hypothetical protein" /locus tag="Clarkson_71" /note= /note=SSC: 40256-40426 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_RAELA_70 [Mycobacterium phage Raela] ],,NCBI, q1:s1 100.0% 6.94554E-33 GAP: 65 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.582, -4.774753077400917, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_RAELA_70 [Mycobacterium phage Raela] ],,AZS06834,100.0,6.94554E-33 SIF-HHPRED: SIF-Syn: /note=added gene based on CP present in GM-self /note=Gene also found in Beelzebub, Blackbeetle /note=SS at 40256 has best stats and LORF CDS 40466 - 40747 /gene="72" /product="gp72" /function="hypothetical protein" /locus tag="Clarkson_72" /note=Original Glimmer call @bp 40466 has strength 5.59; Genemark calls start at 40466 /note=SSC: 40466-40747 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp066 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 3.91695E-62 GAP: 39 bp gap 
LO: yes RBS: Kibler 6, Karlin Medium, 2.008, -6.267683495103299, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp066 [Mycobacterium phage Marvin] ],,YP_009614184,100.0,3.91695E-62 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Glimmer and GeneMark, both calling a start site at position 40466. /note=Coding Potential: Coding potential in this ORF is only on the forward strand, and so this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -6.268. This is far from the most favorable score on PECAAN. /note=Gap/overlap: GAP: 275. There is a very large gap immediately upstream of the predicted start site that shows coding potential in the self GeneMark and that is filled by annotated genes in other Pham maps. With that being said, although other start site predictions do have more favorable RBS final scores or Z-values, their length is too short to be a real gene and the gap from the upstream stop codon only widens. /note=Phamerator: pham: 96342. Date 01/14/22. This gene is conserved; it is present in phage Beelzebub and phage Corazon. /note=Starterator: Although start site 10 was called in 40/54 non-draft genomes and was the most annotated start site, this well-conserved site is not present in Clarkson. Thus, Starterator disagreed with the site predicted by GeneMark and Glimmer. Instead, the predicted start site was site 9. This site was called 100% of the time that it was present, supporting the validity of this prediction. /note=Location call: Given the above evidence, this is a real gene and the auto-annotated start site at position 40466 is the best option listed. However, this predicted start site at position 40466 requires further analysis to be validated, as an additional gene potentially needs to be added immediately upstream. /note=Function call: NKF but possible Hydrolase based on PhagesDB BLAST. 
A BLAST was executed using the PhagesDB and NCBI databases, and the only functional call that was made in either database was hydrolase function: 2 excellent Blast hits on PhagesDB (E-values>-50, 100% coverage, identity 100%) and 2 excellent Blast hits on NCBI (E-values>-61, 100% coverage, identity> 98%). There were no significant HHpred or CDD hits. /note=Transmembrane domains: There are no TMDs predicted by TOPCONS or TMHMM, suggesting that this gene product is not membrane associated or bound. CDS 40744 - 41109 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="Clarkson_73" /note=Original Glimmer call @bp 40744 has strength 12.7; Genemark calls start at 40744 /note=SSC: 40744-41109 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CORAZON_69 [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 1.06272E-83 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.147, -4.722970737487855, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CORAZON_69 [Mycobacterium phage Corazon]],,QFP97624,100.0,1.06272E-83 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the start site of 40744. /note=Coding Potential: There is high coding potential in both the self and host trained GeneMarks in the first reading frame in the forward direction. The high coding potential region encompasses the entire gene. There is no coding potential in any other reading frame. This is a forward gene. /note=SD (Final) Score: -4.723. This is the best final score on PECAAN with a reasonable gap/overlap. /note=Gap/overlap: -4bp. This 4bp overlap indicates that this gene is in an operon. /note=Phamerator: The pham number is 5323 as of 1/12/2022. The gene is conserved in 14 non-draft phages, all of cluster S, including Blackbeetle and Beezelbub. /note=Starterator: 14/14 non-draft members of the pham call start site 5, which corresponds to position 40744 in Clarkson. 
This start position is agreed upon by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with the start site at 40744. Glimmer and GeneMark agree on this start site and it is confirmed by Starterator. /note=Function call: NKF. All PhagesDB and NCBI Blast hits were for proteins of undefined function. There were no CDD hits. HHpred hits had extremely high e-values and low probabilities. /note=Transmembrane domains: None /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 41106 - 41369 /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="Clarkson_74" /note=Original Glimmer call @bp 41106 has strength 10.8; Genemark calls start at 41106 /note=SSC: 41106-41369 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_PRINGAR_71 [Mycobacterium phage Pringar] ],,NCBI, q1:s1 100.0% 1.91016E-58 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.723, -5.77913802305459, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRINGAR_71 [Mycobacterium phage Pringar] ],,QFP96933,100.0,1.91016E-58 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark predicted the same start sites, which is 41106 /note=Coding Potential: Via both Glimmer and GeneMark, there is coding potential found in both the maps and there is no violation of the guiding principles /note=SD (Final) Score: Final Score of -5.779 and a Z-score of 1.723 for the start site @41106. It is important to note that the Final Value is not the most positive and the Z-score is less than 2. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene–the gene might be part of an operon. /note=Phamerator: This gene is in pham 3367 as of the date 1/14/2022. There are 16 members that share the same cluster (S) of the gene, 14 of which are final drafts. 
/note=Starterator: Starterator states that the most annotated start is 5 (41154 bp; 10/15 manual annotations). Clarkson auto-annotates Start 4 @41106, which agrees with Glimmer and GeneMark’s start sites. Start Number 4 has 4 MA’s and is called 83% of the time when present, possibly due to the -4 gap/operon /note=Location call: The gene seems to be a real gene, and the most likely Start site is @41106; it covers the coding potential, has a gap of -4, and is the auto-annotated Start site compared to the other potential Start sites. Both Glimmer and Genemark agree on Start @41106. It is important to note that Start@41106 has a Final score of -5.779, while Start@41154 has -5.552 and that no Z-score is >2. Furthermore, Start@41154 does not cover the entire coding potential. /note=Function call: The function is unknown. In PhagesDB, the hits with low e-values listed no known function. Similarly, in NCBI BLASTP, all the hits list “hypothetical function.” In CDD Hits, there were no hits found. For HHpred, the e-value is high and lists either a probability under 80% and/or coverage under 40%. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS. /note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: Although start number 4 @41106 does have a -4 overlap, which would often be the best start-site call, starterator site number 5 @41154 is conserved in all genes in the pham. I believe this uniformly conserved start site is more likely to be correct despite its 44 nucleotide gap. / I have QC’ed this location call and agree with the first annotator. 
CDS 41423 - 41626 /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="Clarkson_75" /note=Original Glimmer call @bp 41423 has strength 7.3; Genemark calls start at 41423 /note=SSC: 41423-41626 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp070 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 2.16392E-41 GAP: 53 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.246, -4.2297005446939515, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp070 [Mycobacterium phage Marvin] ],,YP_009614188,100.0,2.16392E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site at 41423 bp. /note=Coding Potential: Good coding potential in the forward direction is indicated for this ORF on Genemark for both the host- and self-trained algorithms. /note=SD (Final) Score: The final score was -4.230, which was the best (least negative) of all the candidate start sites. /note=Gap/overlap: Gap of 53 bp. While this gap might be a little large, this gap is reasonable for two reasons: 1) this is the smallest gap of all the candidate start sites and 2) a similar gap is conserved in several other non-draft phages as well, including Beelzebub, JoieB, and Pringar. /note=Phamerator: As of 1/12/22, the pham number is 18412. This gene is conserved in 16 other phages, 14 of which are published. All are in cluster S, the same as Clarkson. /note=Starterator: Start site 4 is the most conserved start site and is found in 16/16 genes in this pham. It was manually annotated in 8/14 non-draft genes in the pham. Start site 4 is at position 41423 bp on Clarkson. This evidence agrees with the start site predicted by Glimmer and Genemark. The next most called start site is 3, but all genes that do not have start site 3 (including this gene on Clarkson) call start site 4, which gives additional confidence in start site 4 for this gene in Clarkson. 
/note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 41423 bp. /note=Function call: NKF - In PhagesDB and NCBI, all hits are listed as “function unknown” or “hypothetical protein” respectively, so there is no information gained from the BLAST results. CDD did not return any hits, and HHpred did not have any hits with sufficient e-value to be considered good evidence. /note=Transmembrane domains: No transmembrane domains were predicted by TMHMM or TOPCONS, so it is not a membrane protein. /note=Secondary Annotator Name: Li, Shally /note=Secondary Annotator QC: I agree with the location call and all other evidence categories. CDS 41760 - 42686 /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="Clarkson_76" /note=Original Glimmer call @bp 41760 has strength 3.93; Genemark calls start at 41760 /note=SSC: 41760-42686 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BEELZEBUB_79 [Mycobacterium phage Beelzebub]],,NCBI, q1:s1 100.0% 0.0 GAP: 133 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.611, -3.4080892535965868, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEELZEBUB_79 [Mycobacterium phage Beelzebub]],,AZF93340,100.0,0.0 SIF-HHPRED: Phage_rep_org_N ; N-terminal phage replisome organiser (Phage_rep_org_N),,,PF09681.13,43.1818,99.0 SIF-Syn: NKF. Upstream gene is pham 18412. Downstream gene is pham 99085, just like in phage Beelzebub and Corazon. /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site at 41760 bp. The start codon ATG is called here. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site (41760) does include all of the coding potential, which is found in both self-trained and host-trained GeneMark in the forward frame. 
/note=SD (Final) Score: SD Score (-3.408) is the best compared with other potential genes. Z-score is 2.611. /note=Gap/overlap: There’s a 133 bp gap. However, there’s no coding potential in the gap that might be a new gene. The gap was also seen in other phages as well, such as Beelzebub. /note=Phamerator: The pham number as of January 12, 2022 is 21577. The gene is conserved in 16 phages and all of them belong to cluster S. /note=Starterator: The start number called the most often in the published annotations is 2; it was called in 13 of the 14 non-draft genes in the pham. The start site number for Clarkson is also 2. The start site at this start number is 41760 and it has 13 MAs. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 41760. /note=Function call: The function of this gene is unknown because almost all of the NCBI hits suggest a hypothetical function for this gene with 100% coverage, high % identity (~100%), and e-value of 0. /note=PhagesDB only has a 44% function frequency to say that this gene has the function as RepA-like replication initiator. CDD doesn’t even have a conserved domain for this gene. HHpred also says to mark this ORF as NKF because the significant hits are the ones with high e-value (1.4) and low coverages (one is 43.18% and another one is 38.31%). By combining all of the data from these blasts, it’s not convincing to say that this ORF has a function. /note=Transmembrane domains: None. There’s no transmembrane domain shown on TMHMM and TOPCONS in which both of them called 0 TMD. /note=Secondary Annotator Name: Likwong, Chloe; Li, Shally /note=Secondary Annotator QC: The chosen start site is accurate, and it reflects the guiding principles of annotation. I agree with the function call. 
CDS 42688 - 43308 /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="Clarkson_77" /note=Original Glimmer call @bp 42688 has strength 5.4; Genemark calls start at 42688 /note=SSC: 42688-43308 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp072 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 9.04548E-152 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.65, -3.389881189012514, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp072 [Mycobacterium phage Marvin] ],,YP_009614190,100.0,9.04548E-152 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Maraziti, Gabriela /note=Auto-annotation: Glimmer and Genemark both call the start at 42688. /note=Coding Potential: The gene has reasonable coding potential within the ORF, and the start encapsulates all typical and atypical coding potential. /note=SD (Final) Score: -3.390; this is the best final score on PECAAN. The z-score is also the highest at 2.65. /note=Gap/overlap: 1 bp; perfectly reasonable gap. The length of the gene is acceptable as is. /note=Phamerator: pham 94219 as of 2/18/22. The gene is conserved in other members of the same cluster (S) such as Beelzebub and Blackbeetle. It is mostly found in clusters F and Y, some of which called the function as a DnaC-like helicase loader. /note=Starterator: 61/78 non-draft members call start site 14. The Clarkson phage does not contain this start. Start number 1 has the most manual annotations in the Clarkson phage, and it is also the auto-annotated start at position 42688. /note=Location call: Gene is real considering all the evidence above, and has a start site at 42688 bp. /note=Function call: NKF; phagesdb BLAST, HHpred and CDD did not return any informative hits. /note=Transmembrane domains: Not a membrane protein; no hits predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
/note=Tertiary Annotator: Likwong, Chloe /note=Tertiary Annotator QC: I agree with the primary and secondary annotator. The most likely start site is Start @42688 and there is no known function; while HHPRED lists hits with known function and has e-values of 10^-9, the hits in other databases like PhagesDB blast and NCBI BLAST have stronger e-values, with hits as low as e^-119 and e^-152 respectively. There are also no hits in Topcons and TmHmm. CDS 43305 - 43634 /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="Clarkson_78" /note=Original Glimmer call @bp 43305 has strength 7.75; Genemark calls start at 43305 /note=SSC: 43305-43634 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp073 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 9.3689E-74 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.921, -3.8154491356968365, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp073 [Mycobacterium phage Marvin] ],,YP_009614191,100.0,9.3689E-74 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 43305. /note=Coding Potential: There is coding potential in the forward direction of this gene. The ORF encompasses the whole gene from 43305 bp to 43632 bp in both Host and Self GeneMark. /note=SD (Final) Score: The start site at 43305 has the second best final score on PECAAN of -3.815, along with the highest Z-score of 2.921. /note=Gap/overlap: There is a 4 bp overlap that suggests this gene is part of an operon. The overlap is conserved in phages Corazon and Beelzebub. /note=Phamerator: The pham number as of 01/12/2022 is 19355. The gene is conserved; it is found in phages Corazon, Blackbeetle, and Beelzebub which are also included in the same cluster as Clarkson. /note=Starterator: Start site 3 is called in 14/14 non-draft members of the Pham. Start site 3 is located at 43305 bp as called by Glimmer and GeneMark. 
/note=Location call: Considering the evidence, this is a real gene with the start site of 43305 bp. /note=Function call: There is no known function for this gene. All evidence was inconclusive. There were no hits in HHpred, CDD or BLASTp that could help conclude the function. /note=Transmembrane domains: None were found in TmHmm. /note=Secondary Annotator Name: Dr. Freise /note=Secondary Annotator QC: CDS 43631 - 43774 /gene="79" /product="gp79" /function="hypothetical protein" /locus tag="Clarkson_79" /note=Original Glimmer call @bp 43631 has strength 2.46 /note=SSC: 43631-43774 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_BEELZEBUB_82 [Mycobacterium phage Beelzebub] ],,NCBI, q1:s1 100.0% 1.08326E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.921, -3.6727816321281046, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEELZEBUB_82 [Mycobacterium phage Beelzebub] ],,AZF93343,100.0,1.08326E-25 SIF-HHPRED: SIF-Syn: Since this gene has NKF will refer to pham numbers: 2/3/2022 42501, upstream gene is 16325, downstream is 99194, just like in phage Poise. 2/9/2022 42501, upstream gene is 16325, downstream is 99194, just like in phage Blackbeetle. /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: Glimmer. Only Glimmer calls the gene at start site 43631 with start codon ATG. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -3.673. It is the best final score on PECAAN. /note=Gap/overlap: 4 bp overlap (ATGA). This indicates this gene is most likely part of an operon. This overlap is also conserved in other phages. /note=Phamerator: Pham: 42501. Date: 1/16/2022. It is conserved and found in Blackbeetle and Poise. /note=Starterator: Start site 11 in Starterator was manually annotated in 7/7 non-draft genes in this pham. Start 11 is 43631 in Clarkson. 
This evidence agrees with the start site predicted by Glimmer. /note=Location call: Based on the data presented above, this is a real gene with 43631 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits all have unknown functions (E-value 2e-24), and 2 out of 2 NCBI BLASTp hits (100% coverage, 97%+ identity, and E-value 100%), high % identity (>97.92%), and low E-values (0). HHPRED and CDD confirm NCBI BLASTp's conclusion. /note=Transmembrane domains: No evidence of any transmembrane domains. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. (See comments in spreadsheet) CDS 45440 - 46327 /gene="87" /product="gp87" /function="glycosyltransferase" /locus tag="Clarkson_87" /note=Original Glimmer call @bp 45467 has strength 12.31; Genemark calls start at 45422 /note=SSC: 45440-46327 CP: yes SCS: both-cs ST: SS BLAST-Start: [glycosyltransferase [Mycobacterium phage Beelzebub] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.929, -4.966271270034708, no F: glycosyltransferase SIF-BLAST: ,,[glycosyltransferase [Mycobacterium phage Beelzebub] ],,AZF93349,100.0,0.0 SIF-HHPRED: SIF-Syn: glycosyltransferase, upstream O-methyltransferase, downstream glycosyltransferase. Similar to phage VasuNzinga. /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: The gene is called by both Glimmer and GeneMark, but they do not start at the same site. The preferred program, Glimmer, has a start site at 45467 and calls GTG. GeneMark starts at 45422 and calls GTG. /note=Coding Potential: Yes, this gene has reasonable coding potential predicted within the second RF forward. This chosen start site does cover all this coding potential. /note=Phamerator: Pham 21260; 01/15/22. Pham 21260 is present in other members of cluster S. I used Blackbeetle, Beelzebub, and 13 others in cluster S. 
None of the other similar members had a function called. /note=Location call: The overall evidence shows that this gene is most likely real at bp number 45467, as it has no large noncoding gaps, high coding potential in its second RF forward, a conserved gap in other cluster S phages, and evidence from Starterator. /note=Function call: The top 5 NCBI hits sorted by e-value had low e-values and high % identity. For example, phage VasuNzinga has a 99% identity and an e-value of 0. This suggests that this gene is most likely a glycosyltransferase. On PECAAN, most of the similar sequences indicated that it was a glycosyltransferase. The HHpred hits support the function of glycosyltransferase, with various hits supporting this function. On the other hand, there are some hits that support the function as an N-acetylgalactosaminyltransferase. However, since module 6 supports the glycosyltransferase function more heavily, glycosyltransferase is the more supported hypothesis. There was no evidence from CDD present. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicts any TMDs, which makes sense in the context of the hypothesized function, glycosyltransferase. The function of this protein is to catalyze the transfer of carbohydrate from nucleotide sugar substrates to incomplete glycolipid or glycoprotein acceptors, which would not be involved in the membrane. /note=Secondary Annotator Name: Patel, Sahaj; Rishi Patel /note=Secondary Annotator QC: I have QC’ed this location call and agree with this annotation. All of the evidence categories have been considered. Note that under "Gap/Overlap", you could mention whether or not that gap is conserved amongst other phages to justify its existence. Under "Starterator", you did not mention how many draft genes were manually annotated. You also have not done the drop-down menu options for the Starterator and the GM Coding Capacity. 
Also, you did not include the Z score under SD (Final Score), and you have a typo or two. CDS 46327 - 47055 /gene="88" /product="gp88" /function="glycosyltransferase" /locus tag="Clarkson_88" /note=Original Glimmer call @bp 46327 has strength 14.05; Genemark calls start at 46327 /note=SSC: 46327-47055 CP: yes SCS: both ST: SS BLAST-Start: [glycosyltransferase [Mycobacterium phage VasuNzinga] ],,NCBI, q1:s1 100.0% 2.89718E-178 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.214, -4.760068129112225, no F: glycosyltransferase SIF-BLAST: ,,[glycosyltransferase [Mycobacterium phage VasuNzinga] ],,AYB70716,100.0,2.89718E-178 SIF-HHPRED: Polypeptide N-acetylgalactosaminyltransferase; GalNAc-Ts, GalNAc-T3, long-range glycosylation preference, (glyco)peptides, Molecular dynamics, specificity, enzyme kinetics, FGF23, phosphate homeostasis, TRANSFERASE; HET: EDO, NAG, UDP, NGA; 1.96A {Taeniopygia guttata},,,6S22_A,92.562,99.9 SIF-Syn: Glycosyltransferase, with the upstream gene being another glycosyltransferase and the downstream gene being NKF (Stop Site 47235). This is similarly found in Beelzebub, with the upstream gene being glycosyltransferase and the downstream gene also being NKF. The genes are found in the same order with the same function, indicating synteny. /note=Primary Annotator Name: Pham, Britney /note=Auto-annotation: Glimmer and GeneMark agree on the gene starting at bp coordinate 46327. The start codon is ATG. /note=Coding Potential: Yes, covers all of the coding site. /note=SD (Final) Score: Final score is -4.760 with Z value of 2.214, which was the most favorable of all the start sites, with the Z value being over 2 and the final score being the closest to 0. /note=Gap/overlap: There is an overlap of -1 which is favorable. This is the most reasonable length found. The next was a gap of 14, which is too large. /note=Phamerator: Pham 94683; 01/13/22. 
Pham 94683 is present in other members of cluster S, but the pham also appears in many other clusters. I used Beelzebub, Corazon, and Gattaca as samples of the S cluster. Glycosyltransferase was the called function of the gene. /note=Starterator: There is a reasonably conserved start site. The start site number is 37, and its corresponding base pair coordinate is 46327. It is called 100% of the time when present, all in S phages. /note=Location call: The start is at 46327. The length of the sequence is 729 bp. This result corresponds with all the other function calls and has the best e-value, Z-value and SD score. /note=Function call: The sequence seems to code for a glycosyltransferase; these are the enzymes responsible for the initiation and elongation of glycan chains on mucins, as they transfer activated sugar residues to the proper acceptor. Based on the Function Frequency table and how many times the gene function is called, it is reasonable to determine the function of this sequence. The rationale for the function is based on the Function Frequency table, which identifies the number of times the gene was called for that function, along with how PhagesDB identified the ORF’s function as glycosyltransferase. Although CDD did not have any hits, which is normal, HHpred had many hits for the gene with the same function and a small qualifying e-value. This is sufficient evidence to declare the gene’s function. /note=Transmembrane domains: None of the amino acids span the membrane, which does help confirm the function of the sequence. Glycosyltransferases catalyze the transfer of carbohydrate from nucleotide sugar substrates to incomplete glycolipid or glycoprotein acceptors. They are predominantly located in the Golgi apparatus. /note=Secondary Annotator Name: Pay, Iona /note=Secondary Annotator QC: I agree with this annotation. All evidence categories have been considered. 
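The 729 bp length quoted in the gp88 location call can be checked from the SSC coordinates. The sketch below is illustrative only and is not part of the PECAAN output; it assumes 1-based inclusive coordinates on the forward strand, and the helper names `cds_length` and `codon_count` are hypothetical.

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand CDS given 1-based inclusive coordinates."""
    return stop - start + 1


def codon_count(start: int, stop: int) -> int:
    """Number of codons (stop codon included); rejects lengths not divisible by 3."""
    length = cds_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# Coordinates taken from the SSC field of Clarkson_88 (46327-47055).
length = cds_length(46327, 47055)
codons = codon_count(46327, 47055)
print(f"gp88: {length} bp, {codons} codons including the stop codon")
```

A length that is not a multiple of 3 would suggest a mistyped coordinate rather than a real feature, which is why the helper raises instead of rounding.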
CDS 47065 - 47235 /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="Clarkson_89" /note=Original Glimmer call @bp 47065 has strength 8.57; Genemark calls start at 47065 /note=SSC: 47065-47235 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CORAZON_83 [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 2.03985E-32 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.444, -5.061353357642258, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CORAZON_83 [Mycobacterium phage Corazon]],,QFP97638,100.0,2.03985E-32 SIF-HHPRED: SIF-Syn: This gene has NKF, the upstream gene is glycosyltransferase, and the downstream gene is glycosyltransferase, just like in phages Tesla and MosMoris. This gene with NKF belongs to Pham 12166 which is also present in phages Tesla and MosMoris. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: Both GeneMark (Self and Host) and Glimmer call the same start site, 47,065, with start codon ATG. /note=Coding Potential: The coding potential is located in an ORF in the forward direction (the first frame); it covers all possible start sites and is found in both GeneMark Self and Host. /note=SD (Final) Score: -5.061. It is the lowest possible Final score compared to other possible start sites. /note=Gap/overlap: There is a relatively small gap of 9 bp between the upstream gene and the start site. The gap is conserved in other phages such as Corazon and there is no coding potential between the upstream gene and the start site. It is the longest ORF. /note=Phamerator: As of 1/13/2022 the gene is grouped in Pham 12166. It is conserved in cluster S phages such as Poise, Corazon, Gattaca, and Tesla. /note=Starterator: There are 14 non-draft phages in this pham. Start site 1 is manually annotated in 14/14 non-draft members. Start site 1 has a position of 47259 bp in Corazon. The start site agrees with both Glimmer and GeneMark. 
/note=Location call: Considering all of the evidence above, this is a real gene and the start site is most likely 47065 bp. Starterator agrees with both GeneMark and Glimmer. /note=Function call: NKF. The top 2 results from both NCBI and PhagesDB BLAST state that this gene doesn't have a known function. The protein matches with phages Corazon, VasuNzinga, and Marvin have high query coverage of 100%, high % identities (>98.21%), and low E-values (<1e-27). CDD shows no matches/hits for this gene, and HHpred has no good hits in the databank. The first match from HHpred has a high E-value (>15), a low probability (55%), and low coverage (35.7143%). Therefore, there are no relevant hits from CDD or HHpred. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs for this protein, therefore this protein is not a transmembrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: I have QC'ed the above annotation and agree with the chosen start site. All of the evidence points to this, including agreement with GeneMark and Glimmer, strong Starterator evidence, and having the lowest final score. However, the annotation did not mention the number of manually edited draft genomes present. 
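The GAP values reported in the SSC fields of neighbouring genes can be reproduced from the CDS coordinates. The following is a minimal sketch, assuming forward-strand genes with 1-based inclusive coordinates and the convention that a negative result is an overlap; the function names are hypothetical and not part of PECAAN or DNA Master.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int]  # (name, start, stop), 1-based inclusive


def upstream_gap(prev_stop: int, start: int) -> int:
    """Untranslated bases between two genes; negative values mean an overlap."""
    return start - prev_stop - 1


def gaps(features: List[Feature]) -> List[Tuple[str, int]]:
    """Gap upstream of every feature except the first, in genome order."""
    return [
        (name, upstream_gap(prev[2], start))
        for prev, (name, start, _stop) in zip(features, features[1:])
    ]


# Coordinates copied from the CDS lines of Clarkson genes 88 to 91.
CDS = [
    ("gp88", 46327, 47055),
    ("gp89", 47065, 47235),
    ("gp90", 47238, 47900),
    ("gp91", 47897, 48451),
]

for name, gap in gaps(CDS):
    kind = "overlap" if gap < 0 else "gap"
    print(f"{name}: {gap} bp {kind}")
```

Under this convention a start that shares four bases with the upstream gene gives -4, the value the annotators cite as evidence for an operon.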
CDS 47238 - 47900 /gene="90" /product="gp90" /function="glycosyltransferase" /locus tag="Clarkson_90" /note=Original Glimmer call @bp 47238 has strength 6.64; Genemark calls start at 47238 /note=SSC: 47238-47900 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp083 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 1.35291E-161 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.73, -3.2409984623528394, yes F: glycosyltransferase SIF-BLAST: ,,[hypothetical protein FDI61_gp083 [Mycobacterium phage Marvin] ],,YP_009614201,100.0,1.35291E-161 SIF-HHPRED: Glyco_transf_25 ; Glycosyltransferase family 25 (LPS biosynthesis protein),,,PF01755.20,67.2727,99.8 SIF-Syn: Glycosyltransferase, upstream gene is unknown, downstream is unknown, just like in phage Beelzebub. But the gene upstream of the upstream gene is also a glycosyltransferase, just like in Beelzebub and LittleLaf. /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Both Glimmer and GeneMark. Both agree on the start site at 47238. The start codon is GTG. /note=Coding Potential: Coding potential in the ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -3.241 and the z score is the highest at 2.73. /note=Gap/overlap: 2 bp. This is reasonable and evidence that the gene is part of an operon. This is also conserved in other phages, such as LittleLaf and Raela. /note=Phamerator: pham: 13742. Date 1/12/2022. It is conserved; found in Corazon (S) and Marvin (S). /note=Starterator: Start site 59 in Starterator was manually annotated in 149/395 non-draft genes in this pham. This is the most annotated start, but Start 57 is 47238 in Clarkson. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 47238. 
/note=Function call: Glycosyltransferase. The top 2 phagesdb BLAST hits have the function of portal protein (E-value of 1e-128), and the top 2 NCBI BLAST hits also have the function of portal protein. The first NCBI BLAST has 100% coverage, and E-value of 1.35291e-161. HHpred had a hit for portal protein with 99.8% probability, 71.8182% coverage, and E-value of 1.6e-18. CDD had no relevant hits. /note=Transmembrane domains: No called transmembrane domains. This makes sense because this is a glycosyltransferase, which does not need to interact with the membrane. /note=Secondary Annotator Name: Pham, Britney /note=Secondary Annotator QC: The GeneMark and Glimmer start sites are the same. The Starterator Map indicated a most common start site, while Phamerator indicates that this gene is conserved in other genomes too. The start site also has the best Z-value and Final Score and no large overlaps. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: PECAAN notes have been completely filled out. See annotation on other gene for similar note on Beelzebub evidence marker. All drop down menus have been successfully filled out and checked. Maybe also describe why having no TM domains makes sense with the function of your gene. CDS 47897 - 48451 /gene="91" /product="gp91" /function="hypothetical protein" /locus tag="Clarkson_91" /note=Original Glimmer call @bp 47897 has strength 17.15; Genemark calls start at 47897 /note=SSC: 47897-48451 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDI61_gp084 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 1.16808E-133 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.432, -6.685694065809379, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp084 [Mycobacterium phage Marvin] ],,YP_009614202,100.0,1.16808E-133 SIF-HHPRED: SIF-Syn: NKF, upstream gene is glycosyltransferase, downstream gene belongs to Pham 4673. 
Both of these qualities are found in the phages Blackbeetle and Gattaca. /note=Primary Annotator Name: Rodriguez, Sean /note=Auto-annotation: Glimmer and GeneMark. Both of them agree on start site at 47897 bp. /note=Coding Potential: Coding potential found in both GeneMark Self and Host. Putative ORF has coding potential in the forward strand only, indicating that it is a forward gene. Chosen start site includes all coding potential. /note=SD (Final) Score: The final score is the third-worst option at -6.686 and the z score is the fourth-worst at 1.432. These scores are irrelevant for the start call based on the overlap information. /note=Gap/overlap: The putative ORF overlaps the upstream gene by 4 bp, suggesting that this gene is part of an operon. No abnormal gap/overlap exists between this gene and the downstream gene. The gene with the autoannotated start site is 555 bp long which is acceptable. /note=Phamerator: The pham number as of January 12, 2022 is 13514. The gene is conserved in the phages Marvin and Pringar, both of which are in the same cluster as Clarkson (S). No function call present. /note=Starterator: Start site 1 in Starterator was manually annotated in 14/14 non-draft genes in the pham. Start site 1 is at 47897 bp in Clarkson. /note=Location call: Considering the above evidence, this is a real gene that has a start site at 47897 bp. /note=Function call: Multiple phagesdb BLAST hits with e values of < e-102, but no hits with actual functions have sufficiently low e values. 4 out of the top 5 NCBI BLAST hits are associated with Mycobacteriophages (Query Coverage 100%, 96%+ Identity, e < 1e-129), but all function calls were for hypothetical proteins. Neither CDD nor HHpred had relevant hits due to lack of function call or e-values > 8.3, respectively. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, so this cannot be a membrane protein. 
/note=Secondary Annotator Name: Pramana, Martin and Pham, Britney /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. Both GeneMark and Glimmer call for the same start site which includes all coding potential. The start site has a favorable gap of -4 bp that indicates an operon. Although the SD score and Z score aren't the best, Starterator notes it as the Most Annotated start site with 14 manual annotations out of 14 non-draft phages. I have QC'ed this location call and agree with the first annotator. Both GeneMark and Glimmer call the same start site which included all of the coding potential. The site has a small overlap of -4 indicating that it could be an operon. Starterator notes this start site is the most annotated with 14/14 non-draft annotations. The function call seems accurate going off of what was called on CDD and HHpred which fits the function of a sequence that is non-transmembrane. CDS 48473 - 48886 /gene="92" /product="gp92" /function="hypothetical protein" /locus tag="Clarkson_92" /note=Original Glimmer call @bp 48473 has strength 4.82; Genemark calls start at 48479 /note=SSC: 48473-48886 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FDI61_gp085 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 9.60233E-98 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.854, -3.55687757905574, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp085 [Mycobacterium phage Marvin] ],,YP_009614203,100.0,9.60233E-98 SIF-HHPRED: SIF-Syn: NKF; upstream gene is pham 13514, downstream gene is pham 98967 which is HNH endonuclease, just like in phage Beelzebub and Corazon. /note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: Glimmer calls start site at 48473, and GeneMark at 48479. /note=Coding Potential: There appears to be strong typical coding potential for this gene in the forward direction indicating that this is a forward gene for both GeneMark host and Self-Trained. 
There are upward hash marks for both start sites and a downward hash mark at the stop site. This gene has good coding potential. /note=SD (Final) Score: The final score is -3.557 and z score is 2.854. This is the best final score and z score on PECAAN. /note=Gap/overlap: The Gap is 21 bp which is very reasonable. This Gap is conserved in Beelzebub and LittleLaf and they are also 414 bp. /note=Phamerator: pham: 4673. Date 1/13/2021. It is conserved and found in Beelzebub, JoieB, Tesla, and LittleLaf. They are all in cluster S. /note=Starterator: Start site number is 1 which correlates to start site 48473 bp for Clarkson. It was called in 14 of the 14 non-draft genes in pham. /note=Location call: Based on the above evidence, this is a real gene that most likely starts at 48473. /note=Function call: Function is unknown. Top 3 hits of phagesdb BLAST with the best e value (3e-81) had unknown function. The hits that did have a listed function mostly agreed on DnaE-like DNA polymerase III but the e values were very high at 8.6 so this is most likely not the function. For NCBI BLAST hits, all of the hits were hypothetical and had unknown function despite good e values. HHpred and CDD were not helpful in determining function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Reyes, Glania; Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. Though the Glimmer and Genemark starts are not the same, this is a real gene whose start is likely at 48473. Both the final score and z score are best at this start. Phamerator suggests that this gene is conserved in cluster S, and Starterator calls start site 1, which is 48473. /note=I have QC'ed and agree with the first and secondary annotator. Although GeneMark and Glimmer called different start sites, the start site called by Glimmer is backed by the evidence from Starterator. 
This gene belongs to pham 4673 and has a conserved upstream gap of 21 bp. Start site 1 is conserved in 14 other manually annotated non-draft genomes with a start site of 48473. CDD and HHpred did not show any relevant hits. The hits from HHpred did not have any reasonable e-values. NCBI and PhagesDB BLAST did get reasonable e-values but the functions are either hypothetical proteins or function unknown. TMHMM and TOPCONS also did not predict any TMDs for this protein. All of the evidence above shows that this is a real gene with No Known Function (NKF). However, I think that the primary annotator should mention the names of the phages in which it is conserved in the gap/overlap and Phamerator sections. I also think that the primary annotator should mention the upstream and downstream gene functions in the synteny box. CDS 48873 - 49199 /gene="93" /product="gp93" /function="HNH endonuclease" /locus tag="Clarkson_93" /note=Original Glimmer call @bp 48915 has strength 7.69; Genemark calls start at 48915 /note=SSC: 48873-49199 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein FDI61_gp086 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 1.22238E-73 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.741, -3.2017655760970833, yes F: HNH endonuclease SIF-BLAST: ,,[hypothetical protein FDI61_gp086 [Mycobacterium phage Marvin] ],,YP_009614204,100.0,1.22238E-73 SIF-HHPRED: d.4.1.8 (A:513-673) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Actinomyces naeslundii [TaxId: 1115803]},,,d4ogca2,70.3704,96.4 SIF-Syn: HNH endonuclease, upstream gene has no known function, downstream gene is minor tail protein, just like phage Corazon. /note=Primary Annotator Name: Saha, Atul /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 48915 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. 
/note=SD (Final) Score: The final score is -4.213 and the z score is 2.291. This is not the best final score + z score combination, as the start site at 48873 has the highest final score at -3.202 and best z score at 2.741. /note=Gap/overlap: Start site 48915 bp has a gap of 28 base pairs, which is realistic. Start site 48873 has an overlap of 14 bp, which is abnormal but could warrant a closer look. /note=Phamerator: 95447. Date 1/13/22. It is conserved; found in Beelzebub and Blackbeetle. /note=Starterator: Start: 138 @48873 has 222 MA's - very compelling /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 48873 bp. Starterator does not agree with Glimmer and Genemark, but this start site is the most conserved in the S cluster and is also the most manually annotated. The chosen start site contains all of the coding potential in the Host-trained Genemark, but not for the self-trained Genemark. This information is slightly different, but the Starterator evidence is compelling enough to choose the start site at 48873 bp. /note=Function call: Clarkson_93 has the function of “HNH endonuclease”. Both the phagesDB and NCBI BlastP searches yielded highly matched alignments with low E values (phagesDB e<-52), supporting the idea that the gene is conserved across multiple phages as an HNH endonuclease protein. PhagesDB showed that in Cluster S, which includes Clarkson, multiple other phages such as Tesla and Gattaca had 100% matches with the Clarkson sequence and endonuclease functionality. NCBI BlastP supported these findings. CDD returned 3 specific hits with E < 2e-6, all of which indicated HNH endonuclease function. HHPred shows that there are multiple hits that share the HNH endonuclease functionality. One such hit, d4ogca2, had a 97.2% probability, 71.3% coverage, and an E value of 0.00028. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Rodriguez, Sean; Reyes, Glania /note=Secondary Annotator QC: I have QC'd this location call and agree with the first annotator. Starterator trumps other evidence. Suggestions: Mention whether the chosen start site includes all of the coding potential and mention the coding potential found in GeneMark Self and Host; it's important to mention that they give slightly different information on the coding potential. Fix the typos within location call and Starterator sections. SS at 48915 looks like it has a gap of 28, not 3, bp. According to Guiding Principles, an overlap of 14 bp is considered abnormal, not realistic. Like the other gene, the pham number has changed since 1/13. Mention any function call found within the Phamerator data. CDS 49226 - 53182 /gene="94" /product="gp94" /function="minor tail protein" /locus tag="Clarkson_94" /note=Original Glimmer call @bp 49226 has strength 10.6; Genemark calls start at 49226 /note=SSC: 49226-53182 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage LittleLaf]],,NCBI, q1:s1 100.0% 0.0 GAP: 26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.703, -6.344996894520007, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage LittleLaf]],,AYB69895,99.9241,0.0 SIF-HHPRED: e.3.1.1 (A:3-335) D-aminopeptidase, N-terminal domain {Ochrobactrum anthropi [TaxId: 529]},,,d1ei5a3,26.1002,100.0 SIF-Syn: Minor tail protein. Upstream gene is HNH endonuclease, downstream gene is minor tail protein, like in phages Tesla and Raela, both in Cluster S with Clarkson. /note=Primary Annotator Name: Scavetti, Alexa /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 49226 with start codon GTG. /note=Coding Potential: Yes. Both Self- and Host-Trained GeneMark show coding potential in the forward direction in the region from 49226 bp to 53182 bp. The chosen start site of 49226 does cover all of the coding potential for this region. 
/note=SD (Final) Score: -6.345 with a Z-value of 1.703. This is not the best possible SD score and Z-value, but other possible start sites with better scores would leave unreasonable (200+ bp) upstream gaps that were not indicated in similar phages. /note=Gap/overlap: There is a reasonable 26 bp gap between this gene and the upstream gene, creating the longest possible ORF and a reasonable gene length. /note=Phamerator: Pham number 95249 as of 1/13/2022. Conserved in other S cluster phages, including Corazon and Blackbeetle. Function call is minor tail protein, which is consistent between Phamerator and the phams database and is included on the approved SEA-PHAGES list. /note=Starterator: Start site 70 is the most conserved among 183/555 non-draft genomes in the pham but is not present in Clarkson. Start site 66 is the most conserved start present in Clarkson and was called in 14/14 phages in which it is present (all S cluster). Start 66 corresponds to 49226 in Clarkson and was indicated by GeneMark and Glimmer. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 49226. Starterator agrees with Glimmer and GeneMark for S cluster phages. /note=Function call: The top two hits on both PhagesDB and NCBI BLASTp, sorted by e-value, suggested function is minor tail protein, with high query coverage (100%), high % identity (99%), and low e-values (0.0). CDD and HHpred both called the function as beta-lactamase, but with low coverage (<27%), indicating that this result likely does not reflect the function of the entire gene as a whole, especially considering the more reliable results provided by PhagesDB and BLASTp. Furthermore, synteny with other S cluster phages (Tesla, Raela) indicates that this gene is a minor tail protein, so this function makes the most sense in context. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. 
This agrees with the functional call of minor tail protein, which is not associated with the membrane. /note=Secondary Annotator Name: Ruiz, Paola, Sean Rodriguez /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=For the Phamerator, mention the programs that the function call was consistent with. Fill out the transmembrane domains section, even if it is not informative. Fill out the synteny box and function window. CDS 53179 - 55317 /gene="95" /product="gp95" /function="minor tail protein" /locus tag="Clarkson_95" /note=Original Glimmer call @bp 53179 has strength 8.12; Genemark calls start at 53179 /note=SSC: 53179-55317 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.6, -5.589328413357676, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage Corazon]],,QFP97644,99.7191,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein, upstream gene is in pham 95249, downstream is in pham 50093, just like in phages Corazon and JoieB, which are in cluster S with Clarkson. /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Glimmer and GeneMark, both at start site 53179 /note=Coding Potential: There is reasonable coding potential within the putative ORF, and the chosen start site covers all this coding potential. /note=SD (Final) Score: -5.589, not the best but reasonable, lower as the gene is likely part of an operon. /note=Gap/overlap: 4 bp overlap, suggesting an operon. /note=Phamerator: The pham as of 01/12/22 is 21166. A lot of other phages in this cluster have this pham present, as seen in Blackbeetle_95, Corazon_89, and Gattaca_93. The function called in Phamerator is minor tail protein. /note=Starterator: Start site 3 is conserved across many manually annotated genomes in the pham, and represents the start site at bp 53179. 
14/19 non-draft genes call this start site, which is strong evidence. /note=Location call: Based on the evidence, the gene is real and has a start site of 53179, which covers all coding potential and has the most reasonable overlap and final score, as well as evidence from Phamerator that it is conserved within a pham. /note=Function call: Minor tail protein: The top 5 non-draft PhagesDB BLAST hits call the function as a minor tail protein with e values of 0.0 for all of them, which is very strong evidence. In the NCBI BLAST, nine of the top ten calls sorted by e value called it as a minor tail protein, all with e values of 0.0 and above 50% identity, also constituting strong evidence of the function being a minor tail protein. Not called by either CDD or HHpred with any significance, but the BLAST evidence is strong enough to call the function with confidence. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Saha, Atul; Ruiz, Paola /note=Secondary Annotator QC: typo in auto-annotation summary, glimmer & genemark predict 53,179. other than that looks good! /note=I have QC’ed this location call and agree with the first and second annotator. CDS 55340 - 55747 /gene="96" /product="gp96" /function="hypothetical protein" /locus tag="Clarkson_96" /note=Original Glimmer call @bp 55340 has strength 16.6; Genemark calls start at 55340 /note=SSC: 55340-55747 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_93 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 1.7506E-91 GAP: 22 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.778, -3.969072509284561, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_93 [Mycobacterium phage LittleLaf] ],,AYB69897,100.0,1.7506E-91 SIF-HHPRED: SIF-Syn: There is NKF for this gene in Pham 50093. 
Phages of the same Pham display the same pattern such as Phage Corazon, where gene 94 is in the same position as gene 90 from the Clarkson Phage. The previous gene is a minor-tail protein and the following gene has NKF. /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: both Glimmer and Genemark, with the start site at 55340 with an ATG start codon. /note=Coding Potential: This gene has reasonable coding potential which can be seen in the host-trained and self-trained GeneMark. The chosen start site covers the coding potential. /note=SD (Final) Score: -3.969 is the highest final score which corresponds to start site 55340. /note=Gap/overlap: There is a 22 base pair gap for start site 55340 and although this is not the smallest gap, this start site still has the highest final score and Z score. /note=Phamerator: The pham according to Jan 12, 2022 is 50093. /note=Starterator: Start: 6 @55340 has 54 MA's (out of 114) - most called /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains for this gene. /note=Secondary Annotator Name: Scavetti, Alexa /note=Secondary Annotator QC: I agree with the location call given the above evidence. Note: Starterator and Coding Potential menus have not been filled out. I would also suggest adding the start codon to 'Auto-annotation,' adding rationale to the 'SD (Final) Score' and 'Gap/overlap' sections, adding the function call and specific comparison phages to 'Phamerator,' and adding the specific base pair corresponding to the start site in 'Starterator.' See Lab Manual: PECAAN Notes for more detail. 
CDS 55757 - 55954 /gene="97" /product="gp97" /function="hypothetical protein" /locus tag="Clarkson_97" /note=Original Glimmer call @bp 55757 has strength 8.48; Genemark calls start at 55757 /note=SSC: 55757-55954 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_94 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 2.49852E-39 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.637, -5.511537306530897, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_94 [Mycobacterium phage LittleLaf] ],,AYB69898,100.0,2.49852E-39 SIF-HHPRED: SIF-Syn: My gene is NKF. The upstream gene is NKF in Clarkson and downstream is NKF. When comparing my sequence to Marvin, the same gene is NKF and upstream and downstream is NKF. This shows me that my gene sequence is real but unknown. The pham number for Clarkson and Marvin is 1321. My gene is NKF. The upstream gene is NKF in Clarkson and downstream is NKF. When comparing my sequence to Corazon, the same gene is NKF and upstream and downstream is NKF. This shows me that my gene sequence is real but unknown. The pham number for Clarkson and Marvin is 1321. /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: I used Glimmer and Genemark to determine if my gene is real. They both agree with each other and the start site number is 55757. The start codon is called ATG. /note=Coding Potential: The gene has very low coding potential predicted within the open reading frame. This is seen in the host trained gene mark and the self trained gene mark. The start site is seen in the coding potential in both these sequences too. When I compared the gene sequence to other sequences, there were many similarities which helped me decide if this is a real gene which I think it is. The start site covers the entire coding potential. 
Furthermore, since I am looking at the graphs and synteny and see little to no gap or overlap from my genes to neighboring genes, I can say that I believe 55757 would be my start site. There is a decent final and z score as well. /note=SD (Final) Score: -5.512 This is not the best SD score, however it gives the least gap between the previous sequence and my sequence. The RBS score I think may be irrelevant for the start call because seeing the overlap and agreement from Glimmer and Genemark is more convincing. /note=Gap/overlap: 9 This is a gap that is reasonable because it is so small and another gene can not fit. I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is the LORF. The length of the gene is acceptable being around 200 base pairs long. /note=Phamerator: The pham that my gene is found in 1321 found on 01/17/2022. The gene seems to be found in other clusters as well so it would be conserved and commonly annotated. The phages I used for comparison are Beezlebub_100 and Blackbeetle_97. There was no data for consistent functions for my gene or functions at all. /note=Starterator: I think the reasonable start site is 55757 which seems to be conserved among the members of the pham. The start site number in this pham is 8 and the base pair it corresponds to is 55757 for my phage. 24 members are in this pham and 13 of the 22 non-draft genes call site #8. /note=Location call: Taking in all of the information, I can tell that my start site is 55757. My gene is definitely a real gene based on a variety of evidence. I think the best start site is 55757 for my open reading frame. The evidence from this annotation tells me that the start site is 55757 and there is a lot of evidence proving that it is. I believe that my gene is a real gene and the best potential start site candidate seems to be 55757. This is a real gene and the start site is 55757. There is good coding potential. 
/note=Function call: PhagesDB and BLAST programs were used to predict the functions of my gene. These both however, did not give a good indication of what the function of my gene is. Since there are a lot of gene sequences that match mine, however, I think that my gene is still a real gene but no function has been found and determined. CDD and HHpred did not result in any hits so I was unable to make a conclusion about my function. At this moment, I am unsure about the function of my gene. Both TMHMM or TOPCONS did not provide any TMDs which did not help me figure out the function of my gene. I was unable to come up with a function for my gene sequence. /note=Transmembrane domains: No transmembrane domains were predicted and since originally I did not have a function either, I am unable to figure out what the function of my gene is. /note=Secondary Annotator Name: Shah, Aayushi; Scavetti, Alexa /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. I noticed your synteny box text is pasted twice, but otherwise the annotation looks good. CDS 55951 - 56196 /gene="98" /product="gp98" /function="hypothetical protein" /locus tag="Clarkson_98" /note=Original Glimmer call @bp 55951 has strength 7.55; Genemark calls start at 55951 /note=SSC: 55951-56196 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_95 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 1.43623E-48 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.182, -4.36137266396122, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_95 [Mycobacterium phage LittleLaf] ],,AYB69899,100.0,1.43623E-48 SIF-HHPRED: SIF-Syn: Function unknown. The Pham number is 5027, upstream Pham number 1321, downstream Pham number 12180, just like in phages Corazon and Blackbeetle. 
/note=Primary Annotator Name: Sun, Xingzheng /note=Auto-annotation: Both Glimmer and Genemark called start at 55951. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. The ORF has coding potential on the forward strand, and the chosen start site also includes all of the coding potential. /note=SD (Final) Score: The Final Score is the best at -4.361, and the z-value is above 2. /note=Gap/overlap: Overlap of 4. It is acceptable because it can be a part of an operon. /note=Phamerator: The Pham number is 5027 as of 1/12/2022. It is conserved in other phages such as Corazon and Gattaca that also belong to cluster S. /note=Starterator: Start site 1 at 55951 in Starterator was manually annotated in 14 out of 14 genes in this Pham and is the most annotated site. It is the same as the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, it is a real gene with the start site most likely to be 55951. /note=Function call: Function unknown. Both PhagesDB and NCBI BLAST called function unknown for the top hits with supportive e-values of 4e-40 and 1e-48. Neither CDD nor HHPRED provided any supporting evidence for any function to call. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. It is not a membrane protein. /note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree with the location call for this gene. Great work on the detail in your PECAAN notes and rationale behind each item. /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: I agree with this annotation and function call. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score, and the lack of evidence to call function has been correctly identified. 
CDS complement (56409 - 56579) /gene="99" /product="gp99" /function="hypothetical protein" /locus tag="Clarkson_99" /note=Genemark calls start at 56579 /note=SSC: 56579-56409 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_96 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 2.48595E-32 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.59, -3.978761577779152, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_96 [Mycobacterium phage LittleLaf] ],,AYB69900,100.0,2.48595E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Taylor, Amaya /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 19043 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host /note=SD (Final) Score: -2.034 - best final score on PECAAN /note=Gap/overlap: 15 - reasonable for a gene /note=Phamerator: 12180 as of 01/13/22; It is conserved and found in Beelzebub_102, Blackbeetle_99. /note=Starterator: Start site 1 was manually annotated in 13/13 of non-draft genes in Pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 56579. /note=Function call: Based on evidence from BLAST and HHPred, the function of this gene is unknown. HHpred hits showed very high E-values at 8.5 and 9. CDD had no relevant hits. /note=Transmembrane domains: No predicted TMDs in TMHMM or TOPCONS. Can conclude there are no transmembrane domains. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I see that you have updated the starterator and coding potential drop down menu in PECAAN notes but make sure you fill out the menu. 
/note=New secondary annotator (module 9): Shaikh, Iman /note=Secondary Annotator QC: I have QCd the function call for this gene and agree with the primary annotator that there is no known function for this gene. CDS complement (56566 - 56688) /gene="100" /product="gp100" /function="membrane protein" /locus tag="Clarkson_100" /note=Original Glimmer call @bp 56688 has strength 9.79; Genemark calls start at 56661 /note=SSC: 56688-56566 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FH33_gp095 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 8.07996E-19 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.64, -4.255672789525901, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein FH33_gp095 [Mycobacterium phage MosMoris] ],,YP_009031605,100.0,8.07996E-19 SIF-HHPRED: SIF-Syn: This gene possesses synteny with analogs in Corazon and Blackbeetle, which are both also part of Pham 20705. Like this gene, the analog in Blackbeetle also has the assigned function of membrane protein. However, the analog in Corazon has been assigned no known function (NKF). The gene upstream of the selected gene possesses synteny with other genomes in the same cluster. The gene directly upstream of this gene within the Clarkson genome is part of Pham 12180. The genes directly upstream of the analogs for Clarkson within the Corazon and Blackbeetle genomes are also part of Pham 12180. All of these genes have the function NKF. The gene downstream of the selected gene possesses synteny with other genomes in the same cluster. The gene directly downstream of this gene within the Clarkson genome is part of Pham 76587. The genes directly upstream of the analogs for Clarkson within the Corazon and Blackbeetle genomes are also part of Pham 76587. All of these genes have the function NKF. /note=Primary Annotator Name: Torres Espinosa, Michael /note=Auto-annotation: Auto annotation is present for both Glimmer and Genemark. 
Glimmer start site is 56688, and Genemark start site is 56661. Start codon is GTG. /note=Coding Potential: The gene does have reasonable coding potential. The Glimmer start site does cover this coding potential. /note=SD (Final) Score: The final score is -4.256, and this is the second best score. The best final score is -4.234. /note=Gap/overlap: There is an overlap of 4 base pairs. This is reasonable for a gene found in an operon. /note=Phamerator: As of January 13, 2022, the pham for this gene is 20705. This gene is conserved in the phages Beelzebub, Corazon, and Blackbeetle. Like Clarkson, each of these phages is also in cluster S. /note=Starterator: There are 14 non-draft members for this Pham. 13/14 members call start site 2, which corresponds to base pair 56688 for Clarkson. This agrees with the start site predicted by Glimmer. /note=Location call: The previous evidence suggests that this is a real gene with start site 56688. Starterator agrees with Glimmer. /note=Function call: This protein is a membrane protein. This is supported by TMHMM and TOPCONS evidence. The top 15 hits on PhagesDB BLAST had no known function with e-values ranging from 7e-18 to 1e-18. The top 3 hits from NCBI BLAST are all hypothetical proteins with e-values ranging from 7e-18 to 8e-19, coverage at 100%, and percent identity ranging from 97.5% to 100%. CDD did not provide any hits, and none of the hits from HHpred were related to any phage proteins. /note=Transmembrane domains: TMHMM called one TMD, and TOPCONS also called one TMD. Consequently, this is a membrane protein. /note=Secondary Annotator Name: Sun, Xingzheng, Devshi Sharma /note=Secondary Annotator QC: I have QC’ed this annotation and agree with the first annotator on location and functional call. I have QC’ed this location call and agree with the first and second annotator. 
CDS complement (56685 - 57755) /gene="101" /product="gp101" /function="hypothetical protein" /locus tag="Clarkson_101" /note=Original Glimmer call @bp 57755 has strength 16.59; Genemark calls start at 57755 /note=SSC: 57755-56685 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_98 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 0.0 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.418, -4.6382689799084815, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_98 [Mycobacterium phage LittleLaf] ],,AYB69902,99.7191,0.0 SIF-HHPRED: SIF-Syn: This gene has NKF. The upstream gene has NKF of pham 12642. The downstream gene is a membrane protein of pham 20705. This same gene structure is seen in phages Corazon and LittleLaf. /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation: Glimmer and GeneMark both call for the start site @ 57755 /note=Coding Potential: There is strong evidence of coding potential in the reverse direction in the fifth reference frame. This is seen in both Glimmer and GeneMark /note=SD (Final) Score: -4.638, with a Z-score of 2.418. There are a few sites with lower Final score, but none of these have strong Z-scores. This site has the longest ORF with a strong Z-score. /note=Gap/overlap: 124 bp. This is a large gap, however there are no reasonable start sites with a shorter gap. This sizeable gap is also seen in other phages of cluster S, such as Corazon and LittleLaf /note=Phamerator: The Pham as of 2/6/2022 is 76587. This gene is found in other members of cluster S such as the aforementioned Corazon and LittleLaf /note=Starterator: Start site 47 is manually annotated in 47 of 86 non-draft phages of this pham, including 14 of 16 of the other phages in cluster S. This corresponds to a start site @57755 for Clarkson. /note=Location call: Based on the evidence, this is a real gene and a start site @57755 is well supported. /note=Function call: NKF. 
The PhagesDB BLAST hits have no known function, and NCBI BLAST returns only a few hypothetical proteins. /note=Transmembrane domains: None. Neither TMHMM nor TOPCONS predict any TMDs. /note=Secondary Annotator Name: Sun, Xingzheng /note=Secondary Annotator QC: I have QC’ed this annotation and agree with the first annotator on location and functional call. CDS complement (57880 - 58083) /gene="102" /product="gp102" /function="hypothetical protein" /locus tag="Clarkson_102" /note=Original Glimmer call @bp 58083 has strength 16.57; Genemark calls start at 58083 /note=SSC: 58083-57880 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_99 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 4.08247E-39 GAP: 7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.07, -3.284137222879636, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_99 [Mycobacterium phage LittleLaf] ],,AYB69903,100.0,4.08247E-39 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vazquez, Gilda /note=Auto-annotation: Glimmer and Genemark were both used. Both agree on the same start site. Start site number called was 58083. /note=Coding Potential: The gene does have reasonable coding potential predicted within the putative ORF. There is some coding potential visible within the gene's coordinates. /note=SD (Final) Score: The final score for the called gene is -3.284. The Z-score value is 3.07. These are the best values observed for this gene. /note=Gap/overlap: There is a 7 bp gap (downstream). If the alternative LORF start site is used, a big overlap will occur. The chosen start site does cover all the coding potential for the gene. /note=Phamerator: The gene is found in pham 12642, as of 01/18/2022. It is conserved in other members of the S cluster. All other phages were used for comparison, like Corazon and Beelzebub phages. The function call of this gene is not listed. /note=Starterator: Reasonable start site number 2 is 58083. It is called by 14/14 non-draft genes. 
This is also the same start site called by both GeneMark and Glimmer. /note=Location call: This is likely a real gene based on above values for start site and data for coding potential. Thus, it may be a functional gene. The start site does not need to be changed. /note=Function call: Phagesdb Hit Gene displayed many products with function unknown, these were the highest E-values (such as 2e-32). 100% identities in evidence hits and in 5 total others. NCBI BLAST products were listed with hypothetical proteins, e-values such as 4e-39 & 2e-33. /note=Transmembrane domains: Considering that this is a real gene, it is still a NKF, the absence of both TMDs and TOPCON images support that the function of this protein is unknown. With 0 TMDs observed in TMHMM, it is concluded that it is also not a membrane protein. /note=Secondary Annotator Name: Torres Espinosa, Michael /note=Secondary Annotator QC: I have QC`d this annotation, and I agree with the primary annotator. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: I have QC`d this annotation, and overall agree with the primary annotator for the location, function, transmembrane, and synteny calls. CDS complement (58091 - 58282) /gene="103" /product="gp103" /function="RNA binding protein" /locus tag="Clarkson_103" /note=Original Glimmer call @bp 58282 has strength 9.87; Genemark calls start at 58267 /note=SSC: 58282-58091 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FH33_gp099 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 1.58812E-38 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -4.535884094130711, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein FH33_gp099 [Mycobacterium phage MosMoris] ],,YP_009031609,100.0,1.58812E-38 SIF-HHPRED: g.41.3.1 (A:) RBP9 subunit of RNA polymerase II {Thermococcus celer [TaxId: 2264]},,,d1qypa_,65.0794,96.2 SIF-Syn: The upstream gene is NKF (pham 12642), downstream gene is Membrane protein (pham 17746). 
This is similar to other cluster S phages including Beelzebub and Blackbeetle. /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: Glimmer Start begins at 58282; GeneMark Start begins at 58267. Difference between start sites is 15 base pairs. Start codon is ATG. /note=Coding Potential: Coding potential indicated through Glimmer and GeneMark programs with regard to the potential start sites and the stop site at 58091. This is a reverse gene. /note=SD (Final) Score: SD Score is -4.536, which is reasonably good even though it isn't the best one listed. /note=Gap/overlap: Small overlap of 14 base pairs is acceptable as it is under the 50 bp range. /note=Phamerator: pham: 12497. Date 01/19/2022. It is conserved; found in phages Beelzebub_107, Blackbeetle_104, Corazon_97. /note=Starterator: Start site 3 in Starterator was manually annotated in 14/14 non-draft genes in this pham. Start 3 is at 58282 in Clarkson. This evidence agrees with the site predicted by Glimmer. /note=Location call: Highly reasonable that this gene's start site is at 58282. Would advise keeping this start site, since the data obtained indicate a good Final score of -4.536 and a good Z-score of 2.505. The start codon being ATG indicates higher probability of usage. As well, this gene shows conservation through synteny when compared to other phage data. Based on the above evidence, this is a real gene and the most likely start site is 58282. /note=Function call: The function of this ORF is hypothesized to be associated with RNA binding, possibly involving metal binding, as this was a common characteristic among the strong hits in HHpred. The hits investigated displayed high probabilities, high coverage, and low e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. 
Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=Secondary Annotator Name: Torres Espinosa, Michael /note=Secondary Annotator QC: I have QC’ed this annotation and agree with the first annotator. CDS complement (58269 - 58442) /gene="104" /product="gp104" /function="membrane protein" /locus tag="Clarkson_104" /note=Original Glimmer call @bp 58442 has strength 9.74; Genemark calls start at 58442 /note=SSC: 58442-58269 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_101 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 4.87653E-34 GAP: 52 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.921, -2.8454123590742793, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_101 [Mycobacterium phage LittleLaf] ],,AYB69905,100.0,4.87653E-34 SIF-HHPRED: SIF-Syn: Membrane protein, downstream gene is in pham 12497, and upstream gene is in pham 4347. This order shows synteny with phage Corazon. /note=Primary Annotator Name: West, Julie /note=Auto-annotation:Glimmer and Genemark. Both predict a start site of 58442. /note=Coding Potential: There is coding potential in the reverse strand of the ORF. Coding potential is evident in both host- and self-trained GeneMark. The suggested start site covers all the coding potential of this gene. /note=SD (Final) Score: -2.845. This is the best final score (least negative) on PECAAN. /note=Gap/overlap: There is a gap of 52 bp preceding gene (stop@58269 R), but it is not large enough to fit another gene. /note=Phamerator: pham:17746. Date 01/14/2022. The pham is conserved in several Cluster S phage, including Corazon and Beelzebub. /note=Starterator: Start site 2 was manually annotated in 14/14 non-draft genes. This site agrees with the Glimmer and Genemark predicted start site at 58442. 
/note=Location call: Taking all evidence into account, this is a real gene with a start site at 58442. /note=Function call: Membrane protein, which is supported by hits in TmHmm and TOPCONS. PhagesDB and NCBI BLASTp show unknown functions/hypothetical proteins. These hits provide evidence for the existence of this gene. /note=Transmembrane domains: One transmembrane domain predicted by TOPCONS and TmHmm. /note=Secondary Annotator Name: Vazquez, Gilda /note=Secondary Annotator QC: I have reviewed and QC'ed the annotation and I agree with the Primary Author. CDS complement (58495 - 58926) /gene="105" /product="gp105" /function="hypothetical protein" /locus tag="Clarkson_105" /note=Original Glimmer call @bp 58926 has strength 10.65; Genemark calls start at 58926 /note=SSC: 58926-58495 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FH33_gp102 [Mycobacterium phage MosMoris] ],,NCBI, q1:s1 100.0% 3.80808E-100 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.921, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FH33_gp102 [Mycobacterium phage MosMoris] ],,YP_009031612,100.0,3.80808E-100 SIF-HHPRED: SIF-Syn: This gene shows synteny with Corazon, both have downstream genes in pham 11989 and upstream genes in pham 17746. /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both GeneMark and Glimmer call a start site of 58926bp with a codon of GTG. /note=Coding Potential: There is strong coding potential found by both GeneMark and GeneMarkS that is captured by this ORF, beginning with the predicted start site. /note=SD (Final) Score: The final score is -2.845, which is not ideal but is the largest of all start site candidates. The z-score is strong, at 2.921. /note=Gap/overlap: The start site of 58926bp minimizes the gap (2bp) while maximizing the ORF length (432bp). 
/note=Phamerator: As of 1/12/22 this gene was in pham 4347, which is composed entirely of phages from cluster S, all of which have a conserved gene length of 432bp. This pham includes 16 phages, 14 of which are non-drafts like Beelzebub and Blackbeetle. None of these genes have known functions at this time. /note=Starterator: Start 2, at 58926bp in Clarkson, is highly conserved among all members of this pham. This start site was called 100% of the time it was present, which happened to be the case for all members. /note=Location call: Based on the strong coding potential found in both host-trained and self-trained GeneMark and the synteny with other members of cluster S, like Blackbeetle, this is a real gene with a start site of 58926bp. This start site minimizes the upstream gap while maximizing the ORF length which is highly conserved among pham members; it also was associated with the largest z and final scores. This start site was found in all members of pham 4347 and called 100% of the time it was present, according to Starterator. /note=Function call: There are strong hits, with low e-values, though all hits have no known function. So as of now this gene has no known function. /note=Transmembrane domains: There are no transmembrane domains according to TOPCONS and TmHmm. /note=Secondary Annotator Name: Villarreal, Alexia. Vazquez, Gilda. /note=Secondary Annotator QC: Agree with the conclusions stated above in that this is a real gene at the indicated start site considering the data regarding strong coding potential, synteny, strong z score and reasonable final score, and starterator program. I am the second QC annotator and I agree with the primary annotator's location call, current synteny evidence, and other program data obtained for this gene like Starterator and Phamerator. Other data is currently missing regarding function call, additional synteny notes, and evidence boxes. 
CDS complement (58929 - 59087) /gene="106" /product="gp106" /function="hypothetical protein" /locus tag="Clarkson_106" /note=Original Glimmer call @bp 59087 has strength 6.76; Genemark calls start at 59087 /note=SSC: 59087-58929 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_103 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 7.08434E-30 GAP: 87 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.169, -5.631026884021104, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_103 [Mycobacterium phage LittleLaf] ],,AYB69907,100.0,7.08434E-30 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in pham 4347, downstream gene is pham 13464, just like in phage Blackbeetle. /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: Glimmer Start begins at 58282; GeneMark Start begins at 58267. Difference between start sites is 15 base pairs. Start codon is ATG. /note=Coding Potential: Coding potential indicated through Glimmer and GeneMark programs with regards the potential start sites and the stop site at 58091. This is a reverse gene. /note=SD (Final) Score: SD Score is -4.536, which is pretty good in consideration that it isn`t necessarily the best one listed. /note=Gap/overlap: Small overlap of 14 base pairs is in good standing as it it under 50 bp range. /note=Phamerator: pham: 12497. Date 01/19/2022. It is conserved; found in phages Beelebub_107, Blackbeetle_104, Corazon_97. /note=Starterator: Start site 3 in Starterator was manually annotated in 14/14 non-draft genes in this pham. Start 3 is at 58282 in Clarkson. This evidence agrees with the site predicted by Glimmer. /note=Location call: Highly reasonable that this gene`s start site is at 58282. Would advise to keep this start site for data obtained indicate good Final score of -4.536 and good Z-scorer of 2.505. The start codon being ATG indicates higher probability of usage. 
As well, this gene shows conservation through synteny when compared to other phage data. Based on the above evidence, this is a real gene and the most likely start site is 58282. /note=Function call: /note=Transmembrane domains: 0 /note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: Looks like annotation notes were copied from another mistakenly and therefore do not reflect evidence collected for this gene, would go back in note history and re-input original evidence and notes. CDS complement (59175 - 59597) /gene="107" /product="gp107" /function="membrane protein" /locus tag="Clarkson_107" /note=Original Glimmer call @bp 59597 has strength 6.57; Genemark calls start at 59597 /note=SSC: 59597-59175 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LITTLELAF_104 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 2.65321E-97 GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.924, -4.976656753876107, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_104 [Mycobacterium phage LittleLaf] ],,AYB69908,100.0,2.65321E-97 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is NKF, downstream is NKF, just like in phage Corazon /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: Glimmer and GeneMark agree on start 59597. Start is GTG. /note=Coding Potential: There is one reverse ORF for GeneMark Self and Host. The start site covers all coding potential. There is not a lot of coding potential in Host, but a good amount in Self. /note=SD (Final) Score: Final score is -4.977 which is good. Z score is 1.924 which is almost good. /note=Gap/overlap: There is a 13bp gap. This is not the LORF. The LORF has a large overlap and low Z score so this is likely to be the start. Length of 423 is acceptable. /note=Phamerator: pham 13464. Date 1/6/22. 
It is conserved in other cluster S non-draft phages such as Beelzebub and LittleLaf. /note=Starterator: The most conserved start site in pham 13464 is 2, and Clarkson has this start site. Start 2 is bp 59597 in Clarkson. 14/14 non-draft genes called site 2. /note=Location call: The above evidence suggests this is a real gene that has good coding potential and is conserved in Phamerator. Start site 2 is supported by Glimmer and GeneMark, is conserved in other non-draft genes of the same pham, and covers all coding potential. /note=Function call: Membrane protein. TmHmm and Topcons support this. Phagesdb blastp and NCBI blastp show the top hits (>1e-80 and >3e-97) with unknown functions/hypothetical protein. CDD and HHpred had no good hits. /note=Transmembrane domains: There is 1 TMD called on both TmHmm and Topcons. This is evidence that this may be a membrane protein. /note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (59611 - 59760) /gene="108" /product="gp108" /function="hypothetical protein" /locus tag="Clarkson_108" /note=Original Glimmer call @bp 59760 has strength 6.37; Genemark calls start at 59760 /note=SSC: 59760-59611 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BEELZEBUB_113 [Mycobacterium phage Beelzebub] ],,NCBI, q1:s2 100.0% 9.88852E-29 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.849, -5.132255893439239, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEELZEBUB_113 [Mycobacterium phage Beelzebub] ],,AZF93371,98.0,9.88852E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Both Glimmer and Genemark agree on the same start site, 59760. /note=Coding Potential: The coding potential for this ORF is covered, and the start and stop sites for this gene fall within the coding potential. It runs in the reverse direction, therefore it is a reverse gene. 
/note=SD (Final) Score: The start site has a final score of -5.071 and a z-score of 1.849, the second best option on PECAAN. It has a smaller overlap compared to the best option. /note=Gap/overlap: Overlap of -1 which could indicate an operon. /note=Phamerator: Belongs to pham 17406 as of 1/6/2022. It is found in 15 phages, 13 of which are non-draft phages from the same cluster as Clarkson. Found in Beelzebub_113 and Blackbeetle_110 and 11 others. /note=Starterator: (Start: 4 @59763 has 7 MA's), (Start: 5 @59760 has 7 MA's) Split right down the middle. /note=Location call: Glimmer, GeneMark, and Starterator all agree on the same start site of 59760. The -1 overlap makes this a slightly more likely start site for an operon. With the coding potential covered and a decent final score, all the evidence suggests this gene is a real gene. /note=Function call: Unknown function. According to PhagesDB blast, multiple hits with low e-values (<1e-25) indicate an unknown function. Similar results were seen on NCBI Blast, with low e-values of less than e-28, high percent identity (>97.95%), and 100% coverage, which indicate a hypothetical protein. CDD and HHPRED had no significant results. /note=Transmembrane domains: Both TmHmm and Topcons had no hits, indicating the gene is not a transmembrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Module 9: Abana, Juana- I agree with the function call that the primary annotator concluded as well as the fact that this gene doesn't have a transmembrane domain. However, the synteny box was not filled in. 
CDS complement (59760 - 60020) /gene="109" /product="gp109" /function="hypothetical protein" /locus tag="Clarkson_109" /note=Original Glimmer call @bp 60020 has strength 11.67; Genemark calls start at 60020 /note=SSC: 60020-59760 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VASUNZINGA_108 [Mycobacterium phage VasuNzinga] ],,NCBI, q1:s1 100.0% 1.68172E-58 GAP: 906 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.408, -3.8308929311341546, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VASUNZINGA_108 [Mycobacterium phage VasuNzinga] ],,AYB70739,100.0,1.68172E-58 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream is NKF, just like in phage Corazon /note=Primary Annotator Name: Parikh, Himani /note=Auto-annotation: Glimmer and GeneMark both call the start at 60020. /note=Coding Potential: Coding potential in this ORF is mostly in the reverse direction, with very small amounts of coding potential seen in the forward direction. This indicates that the gene is likely a reverse gene. Coding potential is found in GeneMark Self and Host. /note=SD (Final) Score: The final score is -3.831. While it is not the least negative, it is the second best score listed. /note=Gap/overlap: There is a large gap of 906. This gap does seem to be conserved in the final phages of Cluster S. /note=Phamerator: Pham: 21845; date: 1/12/22; It is conserved in other Cluster S phages: Poise, Beelzebub, Blackbeetle, Corazon, and Marvin. /note=Starterator: Start site 5 was manually annotated in 13/14 non-draft phages. Start site 5 is 60020, which corresponds to the call made by Genemark and Glimmer. /note=Location call: The evidence suggests that this is a real gene with a start site of 60020. /note=Function call: NKF; All phagesDB and NCBI blast hits call the same function /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so most likely it is not a membrane protein. 
/note=Secondary Annotator Name: Bartolome, Alexandra /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 60927 - 61205 /gene="110" /product="gp110" /function="hypothetical protein" /locus tag="Clarkson_110" /note=Original Glimmer call @bp 60927 has strength 9.24; Genemark calls start at 60783 /note=SSC: 60927-61205 CP: no SCS: both-gl ST: NI BLAST-Start: [hypothetical protein SEA_VASUNZINGA_109 [Mycobacterium phage VasuNzinga] ],,NCBI, q1:s66 100.0% 2.03729E-58 GAP: 906 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.574, -3.83711508627892, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VASUNZINGA_109 [Mycobacterium phage VasuNzinga] ],,AYB70743,58.5987,2.03729E-58 SIF-HHPRED: SIF-Syn: Like Corazon, this gene has NKF and both have upstream genes in pham 21845 and downstream genes with the function of glutamine amidotransferase domain protein. /note=stop @ 61205: tricky start site call. Small pham (only 2 other members). Chose 60927 based on better RBS scores and ATG start codon and starterator, but leaves large gap. /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Glimmer predicts a start site of 60927, a codon of ATG, and GeneMark predicts a start site of 60783 which also has codon ATG. /note=Coding Potential: There is coding potential in both host-trained and self-trained coding maps, which is strong evidence that this gene is real. Glimmer`s predicted start site of 60927 does not capture all coding potential seen in self-trained GeneMark, which supports an earlier start site. Host-trained GeneMark shows a large gap in coding potential from 60783bp to about 60975bp, and a small gap from 61165bp to 61205bp. /note=SD (Final) Score: The final score associated with start 60927 is -3.837, which is the second largest. The largest final score is associated with start site 60798, at -3.190. Both of these have strong z-scores of 2.574 and 2.885 respectively. 
/note=Gap/overlap: Start site 60927 has a very large gap of 906bp and the ORF length is not the largest, at 279bp. Start site 60798 has a slightly smaller upstream gap (777bp) and the ORF is 408bp. These are the most significant candidates given the z-scores and final scores. This large gap is conserved in other members of cluster S, like Blackbeetle and Beelzebub, so a gene should not be inserted here. /note=Phamerator: As of 1/12/22, this gene was in pham 13284 which contained 16 phages, 14 of which were non-drafts, like Beelzebub and Blackbeetle, all of which were in cluster S. The ORF lengths were slightly more variable, with 12 having an ORF larger than 400bp and the remaining 4 (2 of which were drafts) having an ORF length of 279bp. None of these had known functions at this time. /note=Starterator: Start 3, 60732bp in Clarkson, was the most annotated start with 10/14 non-draft phage annotations (62.5%) when present. However, given the poor z-score (<2) and very negative final score, this might not be the ideal start site. Start 8, 60927, has 2 manual annotations (one draft, one non-draft phage) and start 7 (60798bp) has 1 manual annotation, and all these start candidates were found in all phages. (Start: 8 @60927 has 2 MA's) /note=Location call: Given the strong coding potential and synteny with other members of cluster S, this is a real gene. The most likely start site candidate is at 60798bp, since it has the strongest z and final scores and it allows all coding potential to be captured within the ORF. This ORF length is not the longest of all candidates, though this start site allows for a length of 408bp, which is shared by Beelzebub (non-draft) in cluster S, and allows for a gap size that is conserved among other members of cluster S (777bp). This start site is manually annotated in only one other phage in its pham (Beelzebub); however, the most annotated start site is a poor option for this gene due to poor z and final scores. 
/note=Function call: There are some significant hits in phagesDB blast as well as NCBI blast, though none have known functions. HHpred and CDD showed no significant hits. This has a function of NKF. /note=Transmembrane domains: Neither TmHmm or TOPCONS detect a transmembrane domain. This is not a membrane protein /note=Secondary Annotator Name: Bharadwaj, Shreya /note=Secondary Annotator QC: I have QC`ed this location call and agree with the first annotator. CDS 61264 - 62625 /gene="111" /product="gp111" /function="glutamine amidotransferase domain" /locus tag="Clarkson_111" /note=Original Glimmer call @bp 61264 has strength 5.79; Genemark calls start at 61264 /note=SSC: 61264-62625 CP: yes SCS: both ST: SS BLAST-Start: [glutamine amidotransferase domain protein [Mycobacterium phage Pringar] ],,NCBI, q1:s1 100.0% 0.0 GAP: 58 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.311, -2.606079664606164, yes F: glutamine amidotransferase domain SIF-BLAST: ,,[glutamine amidotransferase domain protein [Mycobacterium phage Pringar] ],,QFP96969,100.0,0.0 SIF-HHPRED: Amidoligase_2 ; Putative amidoligase enzyme,,,PF12224.11,40.8389,99.9 SIF-Syn: Glutamine amidotransferase domain protein, upstream gene is in pham 13284, downstream gene is a glutamine amidotransferase domain protein, just like in phage Blackbeetle. /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and Genemark both agree on a start site of 61264. The corresponding start codon for this start site is ATG. /note=Coding Potential: There is coding potential present for the entire ORF. While this coding potential is present in both host and self-trained Genemark, there are some areas of the ORF that don’t have any coding potential (around 61500 and 62400). The other areas of the ORF all show coding potential. /note=SD (Final) Score: -2.606. While this is not the most negative SD score present on PECAAN, this score is still reasonable because it is more negative than the required threshold (<-2). 
/note=Gap/overlap: 59 bp gap. While this gap is relatively large, it is a reasonable gap because there is very likely no space for a gene to be inserted between this gene and the upstream gene. Additionally, when comparing this Clarkson gene to the analogous gene in Blackbeetle, there is no gene inserted within this space. /note=Phamerator: Information collected on 1/17/2022. The gene is found in pham 3019. All of the other phages that also had genes within this pham were within cluster S, the same cluster as Clarkson. The function listed for each gene that has been called is a glutamine amidotransferase domain protein. /note=Starterator: Information collected on 1/14/2022. Start site 12 was the most annotated start site for the genes that are in this pham, called for 14/14 (100%). For this particular gene, the corresponding start number to 12 is 61264. This is the same start site that was agreed upon by Glimmer and Genemark. /note=Location call: This gene seems like a real gene because start site 61264 covers all coding potential within the ORF and because Glimmer and Genemark agree on this start site. /note=Function call: Relevant PhagesDB blast hits all match to glutamine amidotransferase domain proteins. Interestingly, these PhagesDB blast hits all have an e-value of 0, which is normally much larger than optimal. Relevant NCBI blast hits also all match to some variation of glutamine amidotransferase domain proteins with high percent identity (>95%), but also have e-values of 0. The only relevant HHpred hit matches to a putative amidoligase enzyme, which is essentially the same function as a glutamine amidotransferase domain protein (albeit less specific). The function of this gene appears to be a glutamine amidotransferase domain protein. /note=Transmembrane domains: No transmembrane domains indicated by TMHMM or TOPCONS. 
/note=Secondary Annotator Name: Bovee, Alyson, Shreya Bharadwaj, /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. (Module 9) I have QC’ed this location call and agree with the first annotator. I have added relevant changes to the spreadsheet CDS 62705 - 63604 /gene="112" /product="gp112" /function="glutamine amidotransferase domain" /locus tag="Clarkson_112" /note=Original Glimmer call @bp 62705 has strength 7.73; Genemark calls start at 62705 /note=SSC: 62705-63604 CP: yes SCS: both ST: SS BLAST-Start: [glutamine amidotransferase domain protein [Mycobacterium phage Corazon]],,NCBI, q1:s1 100.0% 0.0 GAP: 79 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.823, -3.0301037898918786, yes F: glutamine amidotransferase domain SIF-BLAST: ,,[glutamine amidotransferase domain protein [Mycobacterium phage Corazon]],,QFP97661,100.0,0.0 SIF-HHPRED: d.153.1.1 (A:1-249) Glutamine PRPP amidotransferase, N-terminal domain {Escherichia coli [TaxId: 562]},,,d1ecfa2,57.1906,99.7 SIF-Syn: glutamine amidotransferase domain, upstream gene is glutamine amidotransferase domain, downstream is NKF, just like in phage Corazon /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: Glimmer and GeneMark both call start at 62705 with codon ATG /note=Coding Potential: Self trained GeneMark shows lots of coding potential with one forward ORF with the start site covering all coding potential. But Host trained shows less coding potential. /note=SD (Final) Score: The final score is -3.030 which is okay and is the best out of the potential starts. The Z score is 2.823 which is good and is the best out of the start sites. /note=Gap/overlap: There is a gap of 79 bp which is reasonable. This is not the LORF but the second longest. This is better than the LORF because the LORF has a 20bp overlap and lower Z and final scores. The length of the gene is 900bp which is reasonable. /note=Phamerator: 1/9/22 pham 15388. 
16/16 members of the pham, such as Marvin and Corazon, are also in cluster S like Clarkson. The genes with called functions were all called as glutamine amidotransferase domain protein, which is in the approved function list. /note=Starterator: The most annotated start was start 3 which was called on 13/13 non-draft genes in the pham. Clarkson also calls start 3 @62705. /note=Location call: Glimmer and GeneMark both agree on the start site and the start site has good Final and Z scores and covers all coding potential. The start is also conserved in Phamerator and Starterator. This is a real gene. /note=Function call: glutamine amidotransferase domain. The top HHpred hits suggest the function glutamine amidotransferase domain protein with good coverage (57%) and low e values (1.6e-14). The top NCBI BLASTp hits suggest glutamine amidotransferase domain protein with high coverage (100%), high identity (>98%), and low e values (0). CDD hits also suggest glutamine amidotransferase domain protein. /note=Transmembrane domains: No TMHs were predicted, so there were no TOPCONS hits to check. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator CDS 63601 - 63849 /gene="113" /product="gp113" /locus tag="Clarkson_113" /note= /note=SSC: 63601-63849 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_LITTLELAF_110 [Mycobacterium phage LittleLaf] ],,NCBI, q1:s1 100.0% 3.26451E-48 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.291, -4.134629096801125, yes F: SIF-BLAST: ,,[hypothetical protein SEA_LITTLELAF_110 [Mycobacterium phage LittleLaf] ],,AYB69914,98.7654,3.26451E-48 SIF-HHPRED: SIF-Syn: /note=Added Gene. CP on GM-Self and on GM-host. Gene also found in this location in many other S phages. SS at 63601 has gap of -4; possible operon. Also has best RBS scores. 
CDS 63846 - 64094 /gene="114" /product="gp114" /function="hypothetical protein" /locus tag="Clarkson_114" /note=Original Glimmer call @bp 63972 has strength 3.56; Genemark calls start at 63846 /note=SSC: 63846-64094 CP: no SCS: both-gm ST: NI BLAST-Start: [hypothetical protein FDI61_gp107 [Mycobacterium phage Marvin] ],,NCBI, q1:s1 100.0% 1.25058E-50 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.574, -3.54831954703195, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI61_gp107 [Mycobacterium phage Marvin] ],,YP_009614225,100.0,1.25058E-50 SIF-HHPRED: SIF-Syn: Downstream gene shows synteny with phage Reala, in the pham 15388 with a function of glutamine amidotransferase domain protein. /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: GeneMark and Glimmer do not agree on a single start site; Glimmer indicates the start site as 63972 and GeneMark's start site is 63846. /note=Coding Potential: No coding potential is seen in self or host trained GeneMark /note=SD (Final) Score: -3.548. Best final score with a high z-score of 2.574 /note=Gap/overlap: 241 bp. The smallest overlap present for this gene. There is no known synteny for this gene with other phages. /note=Phamerator: As of 2/9/2022 there are 3 members in pham 20717, all of which are draft genes in cluster S. /note=Starterator: There are no annotations to summarize since this pham consists entirely of draft annotations, so Starterator was not helpful. /note=Location call: Final score evidence and gap suggest the start site to be 63846, since it is the smallest gap for the gene. Looking at other phages in the cluster, they have genes that are not in the same pham, but do have large gaps between the final gene. /note=Function call: Unknown function; hits on PhagesDB and NCBI blast suggest that it is a hypothetical protein. PhagesDB hits were less than e-40, while NCBI blast had hits that had large e values. All other databases had no evidence. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: