CDS complement (907 - 1110) /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="PumpkinSpice_1" /note=Original Glimmer call @bp 1110 has strength 8.87; Genemark calls start at 1110 /note=SSC: 1110-907 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp001 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 8.89877E-40 GAP: 100 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.556, -3.50848394673489, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp001 [Streptomyces phage Karimac] ],,YP_009840174,100.0,8.89877E-40 SIF-HHPRED: SIF-Syn: This gene is NKF with pham number 7359; the upstream gene is NKF with pham number 63243; and there is no downstream gene, since this is the first gene in the genome and it is a reverse gene, just like in phage Karimac. /note=Primary Annotator Name: Beaudin, Catherine /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 1,110. /note=Phamerator: Pham 7359 as of 04/21/2021. The gene is conserved in phages Battuta, Bordeaux, and Karimac, which are all in the same subcluster as PumpkinSpice. Phamerator does not call a function for this gene. /note=Starterator: Start site 2 in Starterator was manually annotated in 38/38 non-draft genes in this pham. Start 2 is 1110 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential in self-trained GeneMark, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.508. It is the best final score on PECAAN. The Z-score is the highest at 2.556. /note=Gap/overlap: The 100 bp gap is somewhat large but ultimately reasonable because the gap is conserved in other phages (Starbow, Bordeaux). There is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 1,110. 
Glimmer, GeneMark, and Starterator all agree. /note=Function call: NKF (no known function). All PhagesDB BLAST hits have no known function. The top two PhagesDB hits have high query coverage (100%), high percent identity (100%), and low e-values (9e-34). The top two NCBI BLAST hits are hypothetical proteins with high query coverage (100%), high percent identity (98%+), and low e-values (<1e-39). There were no hits from CDD. There were no significant hits from HHpred: none of the hits had a probability above 80, coverage above 40%, an e-value below 10e-3, and a reasonable function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Howe, Kathryn /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. CDS complement (1211 - 1513) /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="PumpkinSpice_2" /note=Original Glimmer call @bp 1513 has strength 5.8; Genemark calls start at 1513 /note=SSC: 1513-1211 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_2 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.66491E-68 GAP: 97 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -2.970161406017234, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_2 [Streptomyces phage Starbow] ],,AXH66513,100.0,1.66491E-68 SIF-HHPRED: SIF-Syn: The downstream and upstream genes have no known function, just like in phage Bordeaux. /note=Primary Annotator Name: Bhatnagar, Keshav /note=Auto-annotation start source: Glimmer and GeneMark agree on a start site of 1,513. /note=Phamerator: As of April 22nd, the gene is in pham 19906. The gene is conserved among phages in my phage's subcluster and in different subclusters, such as Battuta, BoomerJR, Bordeaux, and Celia (different subcluster). Neither Phamerator nor the phams database listed a function for this gene. 
/note=Starterator: There is a reasonable start site conserved among members of the same and different subclusters. The conserved start is start site number 3 in the pham. In my phage, start site 3 is at position 33,051. There are 40 members in this pham; 18/28 final (non-draft) genes and 4/12 draft genomes called this start site. /note=Coding Potential: Strong coding potential on the forward strand in self-trained GeneMark. The start site covers all of the coding potential. /note=SD (Final) Score: The final score is -2.970. It is not the best SD score, but it is strong enough to suggest the presence of a credible RBS. /note=Gap/overlap: A 97 bp gap, which is too small for another gene to fit. Alternative start candidates give shorter ORFs. /note=Location call: This is a real gene with a likely start at 1,513. It is conserved in Phamerator and Starterator, has strong RBS and Z-scores and a small gap, and its start covers all of the coding potential. /note=Function call: There were no CDD hits, the NCBI and PhagesDB BLAST hits had unknown function, and HHpred had one hit, a human spliceosome, which is not relevant here. /note=Transmembrane domains: No hits from TMHMM or TOPCONS, consistent with HHpred, CDD, NCBI, and PhagesDB BLAST listing no known function for the ORF. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS complement (1611 - 2651) /gene="3" /product="gp3" /function="hypothetical protein" /locus tag="PumpkinSpice_3" /note=Original Glimmer call @bp 2651 has strength 15.33; Genemark calls start at 2651 /note=SSC: 2651-1611 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BORDEAUX_3 [Streptomyces phage Bordeaux] ],,NCBI, q1:s1 100.0% 0.0 GAP: 195 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.922, -4.9157003279346805, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BORDEAUX_3 [Streptomyces phage Bordeaux] ],,QGH79776,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF, pham 63243; the upstream gene is pham 19906 and the downstream gene is pham 64002, just like in phage Battuta. /note=AF: Start #65 (2651) is chosen by GeneMark and Glimmer, closes the gap, and is annotated 55.2% of the time when present (mostly by BE2 phages, 14 times). Many more genes in this pham call start #74 (2606 bp), which is annotated 75% of the time when present, by a mix of BE1 and BE2 phages. I am choosing 2651 for the above reasons. /note=Primary Annotator Name: Billings, Sophie /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 2651. /note=Phamerator: Pham 58232. Date: 4/16/21. It is conserved; found in Battuta (BE), Bordeaux (BE), and Boomer (BE). /note=Starterator: Start site 72 in Starterator was manually annotated in 53/234 non-draft genes in this pham. Start 72 is 2606 in PumpkinSpice. This evidence does not agree with the site predicted by Glimmer and GeneMark, but it is still the most logical choice. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score, -4.801, is the most logical option, and the Z-score is 1.976. /note=Gap/overlap: 240 bp. 
Somewhat large, but ultimately reasonable because the gap is conserved in other phages (Battuta, Yaboi) and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 2606. /note=Function call: No known function. All of the PhagesDB BLAST hits list the function as "unknown function" with E-values below 1e-7, and all NCBI BLAST hits also list it as "unknown function" with E-values below 1e-7. There were no relevant hits in CDD or HHpred. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Chang, Loren /note=Secondary Annotator QC: I agree with the primary annotator's call. Although the proposed start site goes against the predictions made by both Glimmer and GeneMark, the Starterator evidence is convincing enough to warrant calling 2606 the most likely start site. CDS complement (2847 - 3239) /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="PumpkinSpice_4" /note=Original Glimmer call @bp 3239 has strength 5.27; Genemark calls start at 3239 /note=SSC: 3239-2847 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.92894E-92 GAP: 111 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.815, -4.0178693334748665, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970936,100.0,2.92894E-92 SIF-HHPRED: SIF-Syn: [NKF; the immediately surrounding upstream and downstream genes are also of unknown function (pham 10814 upstream and pham 63243 downstream, as of 05/14/2021), matching phages Bordeaux and Genie2] /note=Primary Annotator Name: Bruns, James /note=Auto-annotation start source: Both Glimmer and GeneMark call the start at 3239. 
/note=Phamerator: As of 04/22/2021, the gene belongs to pham 59902 and is present in non-draft phages Wipeout and TomSawyer, both of which also belong to cluster BE. /note=Starterator: Start site 8, called in 31 of 42 non-draft genes, is at 3239 bp in PumpkinSpice. Both GeneMark and Glimmer also called this start, so they agree with the most annotated start in Starterator. /note=Coding Potential: Coding potential is present in both Host-Trained and Self-Trained GeneMark, with some atypical coding potential in Self-Trained GeneMark. The gene length predicted by Self-Trained GeneMark spans the whole gene, including the predicted start site. The gene is in the reverse orientation. /note=SD (Final) Score: -4.018, the least negative (best) score on PECAAN. /note=Gap/overlap: A gap of 111 bp; this start gives the longest reasonable ORF for this gene call. Synteny comparisons (Bordeaux, IchabodCrane) support this gap. /note=Location call: Based on the data listed above, this is highly likely to be a true gene with a start site of 3239. /note=Function call: Function unknown. More than ten of the top PhagesDB matches list the function of this gene as unknown; phages TomSawyer and Wipeout have a score of 277 and an e-value of 6e-75. The two top NCBI BLASTp matches concur. CDD and HHpred queries likewise show no satisfactory function hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS shows any hits; therefore, this is not a membrane protein. /note=Secondary Annotator Name: Liu, Lily Xiaoxi /note=Secondary Annotator QC: All evidence categories have been considered, and I agree with this annotation. 
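The GAP values in these records can be reproduced from the CDS coordinates. The Python sketch below assumes a gap is the number of bases strictly between two adjacent features; that convention is an assumption, but it does reproduce the 100, 97, 195, and 111 bp GAP fields for gp1 to gp4 and the 240 bp quoted for gp3 when its start is moved to 2606. The `intergenic_gap` helper is hypothetical, not part of PECAAN.

```python
# (left, right) coordinates of the first five complement CDS features in these notes.
genes = {
    "gp1": (907, 1110),
    "gp2": (1211, 1513),
    "gp3": (1611, 2651),
    "gp4": (2847, 3239),
    "gp5": (3351, 3698),
}


def intergenic_gap(left_gene: tuple[int, int], right_gene: tuple[int, int]) -> int:
    """Count the bases strictly between two features; a negative result means they overlap."""
    return right_gene[0] - left_gene[1] - 1


names = list(genes)
for left, right in zip(names, names[1:]):
    # For complement genes the start codon is at the right-hand coordinate,
    # so this gap lies just upstream of the left gene's start.
    print(f"{left}-{right}: {intergenic_gap(genes[left], genes[right])} bp")

# Moving the gp3 start from 2651 to the Starterator start at 2606 widens its gap.
print(intergenic_gap((1611, 2606), genes["gp4"]))
```

Because each gap sits on the start side of a reverse gene, a different start choice changes only that gene's gap, which is why gp3's Gap/overlap note (240 bp) differs from its auto-generated GAP field (195 bp).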
CDS complement (3351 - 3698) /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="PumpkinSpice_5" /note=Original Glimmer call @bp 3698 has strength 14.44; Genemark calls start at 3698 /note=SSC: 3698-3351 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 6.48102E-79 GAP: 42 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -4.801303705573801, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675371,100.0,6.48102E-79 SIF-HHPRED: SIF-Syn: NKF/unknown function; the upstream gene is in pham 59902 and the downstream gene is in pham 17523, just like in phages Battuta and BoomerJR. /note=Primary Annotator Name: Canio, Noah /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 3698 bp. /note=Phamerator: Pham 10814. Date: 04/21/2021. This pham is conserved in other BE cluster phages (Battuta and TomSawyer used for comparison). No function is defined for this pham across BE cluster phages in the phams database. /note=Starterator: Start site 15 is conserved in 30/66 non-draft phage annotations. The site corresponds to 3698 bp in PumpkinSpice. /note=Coding Potential: Coding potential in the ORF is on the reverse strand, and all of it is covered by the chosen start site. Coding potential is good in Self-Trained GeneMark but insufficient in Host-Trained GeneMark. /note=SD (Final) Score: -4.801. This is the best SD score of all the start site candidates listed. /note=Gap/overlap: 42 bp. This gap is not too large, and it is conserved in other phages such as BoomerJR and Genie2. There is no coding potential in the gap that would indicate a new gene. /note=Location call: The evidence above suggests that this is a real gene with the most likely start site at 3698. Starterator agrees with Glimmer and GeneMark. /note=Function call: No known function/unknown function. The top 4 PhagesDB BLAST hits have the function "function unknown" (E-value = 3e-59). 
The top 4 NCBI BLASTp hits list the function as unknown/hypothetical protein, with high query coverage (100%), high percent identity (>95%), and low E-values (between 7e-79 and 7e-76). The CDD and HHpred hits give insufficient information to suggest the ORF’s function: there are no relevant hits from CDD, and HHpred's best hit is poor, with 69.7% probability, 35.6522% coverage, and an e-value of 42. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. Therefore, it cannot be identified as a transmembrane protein based on this information. /note=Secondary Annotator Name: Ali Pour, Paria /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator. CDS complement (3741 - 3902) /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="PumpkinSpice_6" /note=Original Glimmer call @bp 3902 has strength 5.4; Genemark calls start at 3902 /note=SSC: 3902-3741 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp006 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.05667E-30 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.922, -5.186767100221219, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp006 [Streptomyces phage Karimac] ],,YP_009840179,100.0,2.05667E-30 SIF-HHPRED: SIF-Syn: (As of 04/23/2021) NKF; the upstream gene is pham 59902 and the downstream gene is pham 17523, just like in Battuta and Genie2. /note=Primary Annotator Name: Castillo, Salvador /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 3902. /note=Phamerator: As of 4/23/21, pham 17523. This gene is conserved with the same length (162 bp) in Battuta and Birchlyn. No function is known for this gene. /note=Starterator: The conserved start site is number 1, which was manually chosen in 100% (66/66) of non-draft annotations; for this gene in this phage it is 3902. The pham has 82 members, 16 of which are drafts. 
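The 162 bp length quoted for gp6 follows directly from its coordinates. This Python sketch assumes inclusive coordinates that include the stop codon (the usual GenBank CDS convention) and converts a span into nucleotide and amino-acid lengths, rejecting spans that are not a whole number of codons. The `cds_lengths` name is hypothetical.

```python
def cds_lengths(left: int, right: int) -> tuple[int, int]:
    """Return (length in bp, length in amino acids) for a CDS span.

    Assumes inclusive coordinates that include the stop codon.
    """
    length_bp = right - left + 1
    if length_bp % 3 != 0:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    return length_bp, length_bp // 3 - 1  # the stop codon adds no amino acid


print(cds_lengths(3741, 3902))  # gp6
print(cds_lengths(1611, 2651))  # gp3 with the Glimmer/GeneMark start
print(cds_lengths(1611, 2606))  # gp3 with Starterator start 72
```

The same check is a quick sanity test when comparing candidate starts: any candidate on the same reading frame as the stop codon must keep the length divisible by 3.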
/note=Coding Potential: Coding potential appears only on the reverse strand in Self-Trained GeneMark, and this start site covers the most coding potential. /note=SD (Final) Score: The final score, -5.187, is the third best. /note=Gap/overlap: The gap is 9 bp and is of no concern; it is a typical gap between genes. /note=Location call: There is good evidence that this is a real gene: it has good coding potential and is conserved in Phamerator. The start site, 3902, was also selected for similar genes in Starterator by every annotator. /note=Function call: Neither PhagesDB BLASTp nor NCBI BLASTp predicted a function for this protein, and neither HHpred nor CDD returned any good hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kelly, Samuel /note=Secondary Annotator QC: Agree. CDS complement (3912 - 4424) /gene="7" /product="gp7" /function="Lsr2-like DNA bridging protein" /locus tag="PumpkinSpice_7" /note=Original Glimmer call @bp 4424 has strength 8.12; Genemark calls start at 4424 /note=SSC: 4424-3912 CP: yes SCS: both ST: SS BLAST-Start: [Lsr2 family protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.68969E-118 GAP: 115 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.969, -3.1725816037952645, yes F: Lsr2-like DNA bridging protein SIF-BLAST: ,,[Lsr2 family protein [Streptomyces sp. JV178] ],,WP_099970937,100.0,7.68969E-118 SIF-HHPRED: Protein lsr2; DNA-binding domain, Immune response, DNA BINDING PROTEIN; NMR {Mycobacterium tuberculosis},,,2KNG_A,21.7647,98.5 SIF-Syn: Lsr2-like DNA bridging protein; the upstream and downstream genes have unknown function, same as in BoomerJR. /note=Primary Annotator Name: Cervantes, Richard /note=Auto-annotation start source: Glimmer and GeneMark both called the start site at 4424. /note=Phamerator: Pham 3538 (date 4/24/2021). 
Conserved in the following phages: BoomerJR (BE) and Karimac (BE). /note=Starterator: Start site 9 was called in 46/82 non-draft genes, for gene 7 in PumpkinSpice. /note=Coding Potential: There is coding potential on the reverse strand in self-trained GeneMark but not in host-trained GeneMark. /note=SD (Final) Score: -3.173; this is the best final score. /note=Gap/overlap: There is a 10 bp gap with the previous gene, which is under the 30 bp size threshold. /note=Location call: The auto-generated start site of 4424 appears correct, based on all of the notes above. This gene is real. /note=Function call: A DNA-binding protein was called in HHpred and CDD with strong e-values; in HHpred in particular, both the Pfam and the PDB hits had very small e-values. /note=Transmembrane domains: No TMDs were called by TMHMM or TOPCONS. /note=Secondary Annotator Name: Namaganda, Samali /note=Secondary Annotator QC: I agree with the location call. You could discuss whether typical or atypical coding potential showed up. CDS complement (4540 - 5262) /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="PumpkinSpice_8" /note=Original Glimmer call @bp 5262 has strength 5.3; Genemark calls start at 5262 /note=SSC: 5262-4540 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 6.68297E-178 GAP: 103 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.865, -4.954064444624791, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675372,99.5833,6.68297E-178 SIF-HHPRED: SIF-Syn: Pham 30103 (NKF) is also present in both Battuta (BE2) and Bordeaux (BE2), and in both phages the adjacent genes are also pham 3538 upstream and pham 6734 downstream. Pham designations as of 05/28/2021. /note=Primary Annotator Name: Chang, Loren /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 5262 bp. /note=Phamerator: Pham 30103 (04/21/2021). 
It is conserved; found in Battuta (BE2), Birchlyn (BE2), and Bordeaux (BE2). /note=Starterator: Start site 26 in Starterator was manually annotated in 142 out of 256 non-draft genes in this pham. Start site 26 is position 5262 in PumpkinSpice. This agrees with the auto-annotated start site predicted by Glimmer and GeneMark. /note=Coding Potential: At this ORF, coding potential is not found in Host-Trained GeneMark, but good coding potential is found in GeneMark Self (in a reverse reading frame). Almost all, but not quite all, of the coding potential is covered. /note=SD (Final) Score: The score is -4.954, the third highest final score on PECAAN. /note=Gap/overlap: Gap of 103 bp. Somewhat large, but reasonable; it is conserved in other phages like Battuta and Bordeaux. Also, there is no significant coding potential in this gap. /note=Location call: Given the above evidence, this appears to be a real gene with a start site at 5262 bp. /note=Function call: No known function. The top hits from PhagesDB BLAST (E-values 1e-140) are all proteins of no known function. The top hits from NCBI BLAST (100% coverage, 99%+ identity, E-values < 2.75e-177) are also proteins of no known function. CDD and HHpred both gave one shared significant hit: gene product 88, a protein with no known function. The HHpred hit had high probability (100%) and a low E-value (3.6e-28), although coverage was not particularly high (67%). The CDD hit also had a low E-value (1.58e-08). Ultimately, all of the evidence gathered shows that this ORF has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts the presence of any TMDs, suggesting that the protein is not a membrane protein. /note=Secondary Annotator Name: Haeri, Alliya /note=Secondary Annotator QC: I agree with this location call. All the appropriate sections have been correctly filled out. 
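The gp1 function call lists the HHpred cutoffs applied throughout these notes: a probability above 80, coverage above 40%, an e-value below 10e-3, and a reasonable function. The numeric part can be sketched in Python as below. Reading "10e-3" as 10^-3 is an assumption, the `HHpredHit` record is a hypothetical container rather than an HHpred output format, and the "reasonable function" test is left to the annotator.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    """Hypothetical container for one HHpred hit (not an HHpred output format)."""
    description: str
    probability: float  # HHpred probability, in percent
    coverage: float     # percent of the query covered by the alignment
    evalue: float


def passes_hhpred_cutoffs(hit: HHpredHit,
                          min_probability: float = 80.0,
                          min_coverage: float = 40.0,
                          max_evalue: float = 1e-3) -> bool:
    """Numeric cutoffs from the gp1 function call; 'reasonable function' is judged by hand."""
    return (hit.probability > min_probability
            and hit.coverage > min_coverage
            and hit.evalue < max_evalue)


# Best hits as reported in the gp5 and gp8 function calls above.
gp5_best = HHpredHit("gp5 best hit", probability=69.7, coverage=35.6522, evalue=42.0)
gp8_best = HHpredHit("gene product 88", probability=100.0, coverage=67.0, evalue=3.6e-28)

print(passes_hhpred_cutoffs(gp5_best))
print(passes_hhpred_cutoffs(gp8_best))
```

With these cutoffs the gp5 hit fails all three numeric tests while the gp8 hit passes them; gp8 is still called "no known function" because the shared hit (gene product 88) itself has no known function, which is exactly the judgment the code leaves out.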
CDS complement (5366 - 5530) /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="PumpkinSpice_9" /note=Original Glimmer call @bp 5530 has strength 6.56; Genemark calls start at 5530 /note=SSC: 5530-5366 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 9.22393E-29 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.986, -4.780746793811868, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304116,100.0,9.22393E-29 SIF-HHPRED: SIF-Syn: NKF (Pham 6734); the upstream gene is Pham 30103 and the downstream gene is Pham 54470, just like in phage MindFlayer. /note=Primary Annotator Name: Howe, Kathryn /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 5530. /note=Phamerator: As of April 21, 2021, this gene was part of Pham 6734. This gene is conserved with other members of cluster BE, such as Battuta and IchabodCrane. Neither Phamerator nor other databases listed a function for this gene. /note=Starterator: The most conserved start site for this gene was start site number 1, which corresponds to position 5530 in PumpkinSpice. This start site was conserved in 30 out of 30 non-draft phage annotations. /note=Coding Potential: Very high coding potential is seen in self-trained GeneMark, and the chosen start site at 5530 covers all of the coding potential. /note=SD (Final) Score: The SD score, -4.781, is the best: it is the least negative of all the start site options. /note=Gap/overlap: The gap with the upstream gene is 55 bp, which is reasonable. /note=Location call: Using the information gathered so far, it is safe to call this a real gene. Starterator shows that the auto-annotated start site is conserved. The auto-annotated start also has the best final score and a Z-score of 1.986. Using this information, the most probable start site is at bp coordinate 5530. 
/note=Function call: Based on the evidence, there is no known function yet for this gene. All of the PhagesDB BLAST hits, including those with e-values within our threshold, had no known function, and the NCBI BLAST hits were all hypothetical proteins. CDD and HHpred also did not produce any significant hits, which further suggests that this gene has no known function. /note=Transmembrane domains: Based on the TMHMM and TOPCONS data, this gene does not code for a transmembrane protein. /note=Secondary Annotator Name: Haeri, Alliya /note=Secondary Annotator QC: Considering all the available evidence, I agree with the start location call of 5,530. CDS complement (5586 - 5726) /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="PumpkinSpice_10" /note=Original Glimmer call @bp 5726 has strength 11.78; Genemark calls start at 5726 /note=SSC: 5726-5586 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.35426E-24 GAP: 132 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.713, -3.1816499351312086, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304117,100.0,3.35426E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Glimmer and GeneMark; both show the start at 5726. /note=Phamerator: Pham 54470 (run 04/21/21). /note=Starterator: The start number called most often in the published annotations is 9; it was called in 28 of the 28 non-draft genes in the pham. /note=Coding Potential: Host-Trained GeneMark shows coding potential on the reverse strand, indicating that this is a reverse gene. /note=SD (Final) Score: The SD score is -3.182, which is very good, and the Z-score (2.713) is greater than 2.5, which is also very good. /note=Gap/overlap: There is a gap of 132 bp, but it does not need to be filled for this gene. 
/note=Location call: With the given evidence, this appears to be a real gene with its start site at 5726. /note=Function call: No known function. /note=Transmembrane domains: N/A; there were no TMD predictions, and this gene appears to truly have no known function. /note=Secondary Annotator Name: Merlos, Andres Fernado /note=Secondary Annotator QC: After confirming the GeneMark and Glimmer start site, I checked Phamerator and Starterator, which showed that start site (9, 5726) is the most annotated start site and is well conserved. The final score and Z-score are sufficient, and the coding potential is logical. I would confirm that this gene is real with a start site at 5726. CDS complement (5859 - 6362) /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="PumpkinSpice_11" /note=Original Glimmer call @bp 6362 has strength 12.39; Genemark calls start at 6362 /note=SSC: 6362-5859 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.87835E-118 GAP: 130 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -3.30700010583914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970938,100.0,7.87835E-118 SIF-HHPRED: SIF-Syn: Gene in pham 45824 is upstream of gene in pham 54470 (NKF) and downstream of gene in pham 18063 (NKF) just like in phages Battuta, Birchlyn, Bordeaux, and IchabodCrane. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 6362. /note=Phamerator: Pham 45824. Date: 04/21/2021. It is conserved; found in Battuta_11, Birchlyn_8, and Bordeaux_11. /note=Starterator: Start site 3 in Starterator was manually annotated in 56/62 non-draft genes in this pham. Start 3 is 6362 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. 
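Several Starterator notes weigh the fraction of non-draft genes that use a start against the Glimmer/GeneMark call. The Python sketch below tabulates the counts reported for gp3, gp8, gp11, and gp14; the `StarteratorCall` record is a hypothetical helper, not part of Starterator, and the fractions are over all non-draft genes in the pham rather than only those where the start is present.

```python
from typing import NamedTuple


class StarteratorCall(NamedTuple):
    site: int          # Starterator start number
    coordinate: int    # position of that start in PumpkinSpice
    manual_calls: int  # non-draft genes manually annotated with this start
    non_draft: int     # non-draft genes in the pham
    auto_start: int    # start called by Glimmer and GeneMark

    @property
    def support(self) -> float:
        """Fraction of non-draft genes whose manual annotation uses this start."""
        return self.manual_calls / self.non_draft

    @property
    def agrees_with_auto(self) -> bool:
        return self.coordinate == self.auto_start


# Counts copied from the Starterator notes for these genes.
calls = {
    "gp3": StarteratorCall(72, 2606, 53, 234, 2651),
    "gp8": StarteratorCall(26, 5262, 142, 256, 5262),
    "gp11": StarteratorCall(3, 6362, 56, 62, 6362),
    "gp14": StarteratorCall(6, 7411, 30, 30, 7411),
}

for gene, call in calls.items():
    print(f"{gene}: start {call.site} at {call.coordinate}, "
          f"{call.support:.1%} of non-draft genes, "
          f"agrees with auto call: {call.agrees_with_auto}")
```

gp3 is the only case here where the Starterator start differs from the automated call, which is why its record carries the extra AF and QC discussion.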
/note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.307. It is the best final score on PECAAN. /note=Gap/overlap: 130 bp. The gap is slightly large but reasonable, because it is conserved in other phages (Battuta, IchabodCrane) and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 6362. /note=Function call: NKF. Even though HHpred provides good evidence for a potential function for this gene, CDD showed no significant hits. In addition, the top two PhagesDB BLAST hits have unknown function (E-value = 6e-93), and the top 3 NCBI BLAST hits also have unknown function (hypothetical protein) with 100% coverage, 98%+ identity, and E-values < 6e-117. Thus, the function of this protein is very likely to be unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Liu, Lily Xiaoxi /note=Secondary Annotator QC: In your Starterator data, you said that start site 3, which GeneMark and Glimmer agreed on, was the most annotated site. So why did you choose Not Informative in your Starterator menu? Other than that, all the evidence categories have been considered and I agree with this annotation. CDS complement (6493 - 6681) /gene="12" /product="gp12" /function="hypothetical protein" /locus tag="PumpkinSpice_12" /note=Original Glimmer call @bp 6681 has strength 10.51; Genemark calls start at 6681 /note=SSC: 6681-6493 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.27832E-36 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.577, -3.4645021069762962, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_099970939,100.0,2.27832E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dines, Lily /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 6681. /note=Phamerator: Pham 18063 on 04/21/21. All genes in this pham come from cluster BE phages. Used Battuta, Bordeaux, and Birchlyn for comparison. No function called. /note=Starterator: Reasonable start choice, highly conserved in other cluster BE members. Start coordinate: (2, 6681). 20/22 non-draft genes called site #2 (the gene is duplicated in all phages in this pham, so 10/11 phages call site #2 for each copy). /note=Coding Potential: The gene has strong coding potential within the ORF. The start site covers all of the coding potential. /note=SD (Final) Score: -3.465; 2nd highest score, but this is a reasonable start site because the best SD score gives a gap of 271 bp, while the chosen candidate gives a gap of 106 bp. /note=Gap/overlap: 106 bp; LORF. This gap is reasonable because another gene would realistically not fit, as genes are mostly over 130 bp. /note=Location call: The start site is 6681, and there are 10 start sites in Starterator. The evidence indicates that the predicted start site is the real start site and that this gene is real. The alternate start site begins with TTG, which is rare, whereas the original begins with ATG, which is common. Synteny maps indicate that the original start site is correct. The Z-score and final score are strong. Nearly every gene in the pham called this start site; each track has 2 as the manual start site. Manual annotations of this start site were 20 of 22 (10 of 11 without the duplication). /note=Function call: No program returned any informative results. There was 1 hit with a confidence value of 90; however, its e-value did not meet the threshold, and no function is known in any other phage that shares this pham. Therefore, there is no known function. 
/note=Transmembrane domains: None /note=Secondary Annotator Name: Castillo, Salvador /note=Secondary Annotator QC: I agree with the primary annotator's call. CDS complement (6788 - 7141) /gene="13" /product="gp13" /function="ParB-like nuclease domain" /locus tag="PumpkinSpice_13" /note=Original Glimmer call @bp 7141 has strength 14.73; Genemark calls start at 7141 /note=SSC: 7141-6788 CP: yes SCS: both ST: SS BLAST-Start: [ParB-like nuclease domain protein [Streptomyces phage Starbow] ],,NCBI, q1:s3 100.0% 8.13866E-82 GAP: 126 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.567, -3.5660483139923818, no F: ParB-like nuclease domain SIF-BLAST: ,,[ParB-like nuclease domain protein [Streptomyces phage Starbow] ],,AXH66524,98.3193,8.13866E-82 SIF-HHPRED: ParB domain protein nuclease; ParB-N, pnob8, partition, HYDROLASE; HET: MSE, CIT; 2.45A {Sulfolobus solfataricus},,,5K5D_C,62.3932,99.1 SIF-Syn: Gene 13 is part of Pham 64554, with a ParB-like nuclease domain function; the upstream gene is from Pham 18063 and is noted as a membrane protein, and the downstream gene is from Pham 17191, with no known function. This synteny is seen in both Starbow and MindFlayer, except that in these phages both the upstream and downstream genes are noted as having no known function. /note=Primary Annotator Name: Do, Vivian /note=Auto-annotation start source: Both Glimmer and GeneMark have a predicted start site of 7141. /note=Phamerator: This gene belongs to Pham 60181 based on an analysis run on 04/21/21. The gene appears relatively conserved in other phages such as Battuta, Birchlyn, and Bordeaux. Most of these genes code for a ParB-like nuclease domain protein. /note=Starterator: The most manually annotated start site (45 manual annotations) was start site 20, which in PumpkinSpice is located at 7141 bp. This is consistent with the auto-annotated start site. /note=Coding Potential: High coding potential in both Host-trained and Self-trained GeneMark, in the first reading frame of the reverse strand. 
/note=SD (Final) Score: -3.566. Though this is not the best score, it is the second best, and it is reasonable because it is the best SD score that encompasses the entirety of the coding potential. It also has a good z-score at 2.329. /note=Gap/overlap: 126 bp gap; this is a reasonable gap, as a ~120 bp gap is conserved in other phages such as LukeCage and Starbow. /note=Location call: We have maintained a location call at 7141 because it is favored by both Glimmer and GeneMark while maintaining a favorable z-score and RBS final score, having a particularly long ORF, and reducing the gap between neighboring genes. This start site was also the most manually annotated start site in Starterator, helping confirm this location as the true start site. /note=Function call: ParB-like nuclease domain protein, as the e-values are significantly below 10e-7, the lengths match, and the top hits are from the same host and cluster. Top hits for HHpred and CDD also agree with this function call, with top hits having a probability higher than 95%, a coverage percentage of 50-60%, and e-values much lower than 10e-3. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any transmembrane domains, which makes sense as nuclease activity generally occurs in the cytoplasm (the bacterial host has no nucleus). /note=Secondary Annotator Name: Cervantes, Richard /note=Secondary Annotator QC: Secondary Annotator agrees with the above calls. CDS complement (7268 - 7411) /gene="14" /product="gp14" /function="membrane protein" /locus tag="PumpkinSpice_14" /note=Original Glimmer call @bp 7411 has strength 8.8; Genemark calls start at 7411 /note=SSC: 7411-7268 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.46289E-22 GAP: 117 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.165, -4.85287563363746, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_180304118,100.0,3.46289E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Haeri, Alliya /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 7,411. /note=Phamerator: Pham: 17191. Date 4/21/2021. It is conserved; found in IchabodCrane (BE2) and Karimac (BE2). /note=Starterator: Start site 6 was manually annotated in 30/30 non-draft genes in this pham. Start 6 is 7,411 in PumpkinSpice. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, suggesting that this is a reverse gene. High levels of coding potential are found in both the Host-trained and Self-trained GeneMark. /note=SD (Final) Score: The final score is the best option at -4.853 and the z-score is the highest at 2.165. /note=Gap/overlap: The gap for this gene is somewhat large at 117 bp. However, no coding potential is being overlooked, and this gap is conserved in other phages, such as IchabodCrane and LukeCage. /note=Location call: This gene is likely a real gene with the start site of 7,411 bp. /note=Function call: Unknown function. All the PhagesDB BLAST hits have unknown functions, including the top two hits (e-value = 6e-21 for both). Similarly, the top two NCBI BLAST hits have their functions listed as hypothetical protein (100% coverage, 97%+ identity, e-value <3e-21). CDD had no hits and HHpred had no relevant hits. /note=Transmembrane domains: Because both TMHMM and TOPCONS predict a single TMD, there is justification to call this gene a membrane protein. /note=Secondary Annotator Name: Jakupova, Malika /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS complement (7529 - 7690) /gene="15" /product="gp15" /function="hypothetical protein" /locus tag="PumpkinSpice_15" /note=Original Glimmer call @bp 7750 has strength 11.39; Genemark calls start at 7747 /note=SSC: 7690-7529 CP: yes SCS: both-cs ST: NA BLAST-Start: [hypothetical protein SEA_BIRCHLYN_272 [Streptomyces phage Birchlyn]],,NCBI, q1:s21 100.0% 2.40618E-29 GAP: 114 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.556, -3.859592806742189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_272 [Streptomyces phage Birchlyn]],,QDF17390,72.6027,2.40618E-29 SIF-HHPRED: SIF-Syn: NKF; gene is conserved in phages Battuta and Wipeout. Both the upstream and downstream gaps are conserved, and no gene addition is necessary. /note=AF: Changed start site. Start: 3 @7690 has 27 MA’s /note=Primary Annotator Name: Hugo, Cristelle /note=Auto-annotation start source: Glimmer 7750; GeneMark 7747 /note=Phamerator: 4/24 Pham: 18618. It is conserved in 19/21 of the BE2 phages. No function listed. /note=Starterator: Start site 1 was manually annotated in 1/28 non-draft genes in this pham. It is 7750 in PumpkinSpice. It agrees with the site predicted by Glimmer. Although hardly any other phage uses this start site, the most annotated start site does not capture the full coding potential. /note=Coding Potential: Good coding potential on the reverse strand, with the start site covering all potential. /note=SD (Final) Score: -5.649. It is not the best, but all evidence points to 7750 being the best start site. /note=Gap/overlap: 54 bp. The 54 bp gap is larger than normal, but no other possible start site would shorten it. Adding a gene is not possible since it would be too small. /note=Location call: This is a real gene, and the most likely start site is 7750. /note=Function call: Not known. In last week’s analysis, all phages with significantly similar sequences had no known function. 
CDD returned no results, and HHpred had no significant hits (all under 60% probability). /note=Transmembrane domains: 0; no predictions, NKF /note=Secondary Annotator Name: Kim, James Joon /note=Secondary Annotator QC: All the annotations look very good and detailed. I agree with the above annotations and believe this gene is real. CDS complement (7805 - 8143) /gene="16" /product="gp16" /function="hypothetical protein" /locus tag="PumpkinSpice_16" /note=Original Glimmer call @bp 8143 has strength 5.23; Genemark calls start at 8143 /note=SSC: 8143-7805 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_15 [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 3.59287E-77 GAP: 113 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -2.5588120788329394, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_15 [Streptomyces phage Wipeout] ],,QGH74262,100.0,3.59287E-77 SIF-HHPRED: SIF-Syn: This gene has NKF (pham 5540) and is syntenic with a gene of the same pham (5540) that also has no function listed. Both genes are located at the same position and belong to the same pham. Synteny with phage Genie2 can also be seen for the upstream gene (Pham 18618) and the downstream gene (Pham 8995). /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 8143. /note=Phamerator: The pham is 5540. This analysis was run 04/16/21. It is conserved; found in Battuta_16 (BE) and Cross_17 (BE). /note=Starterator: The start number called most often in the published annotations is 9; it was called in 49 of the 54 non-draft genes in the pham. Start site 9 is 8143 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. The ORF does have reasonable coding potential on both GeneMark-Self and GeneMark-Host. 
The chosen start site does include all of the coding potential. /note=SD (Final) Score: -2.559. It is the best final score on PECAAN. /note=Gap/overlap: The gap with the upstream gene is a little large at 113 bp. However, this gene is conserved in several other phages and the gap is seen in those phages as well, such as MindFlayer and Karimac. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 8143 bp. Starterator agrees with the start site that was predicted by Glimmer and GeneMark. /note=Function call: NKF. CDD doesn’t provide any information about this gene. Most NCBI BLAST hits list this gene’s protein as hypothetical (e-values <10^-77, coverage 100%, identity 90%+). PhagesDB BLAST also indicates that this gene’s function is unknown (e-values <10^-61). HHpred is also uninformative: most of its hits are either uncharacterized or have no function, and all of its hits have high e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Liu, Lily /note=Secondary Annotator QC: Please indicate whether it is a forward or a reverse gene in the coding potential section, and please fill out the coding potential menu. Other than that, all the evidence categories have been considered and I agree with this annotation. 
CDS complement (8257 - 8523) /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="PumpkinSpice_17" /note=Original Glimmer call @bp 8523 has strength 8.77; Genemark calls start at 8523 /note=SSC: 8523-8257 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_17 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.32491E-55 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.19, -4.354778061539587, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_17 [Streptomyces phage Starbow] ],,AXH66528,100.0,2.32491E-55 SIF-HHPRED: SIF-Syn: Good pham synteny (lack of functional genes to observe immediately up/downstream from this gene) when compared to Genie2, IchabodCrane, and LukeCage. /note=Primary Annotator Name: Kelly, Samuel /note=Auto-annotation start source: Glimmer and GeneMark both have the start point marked at 8523. /note=Phamerator: (4/23/21) Pham 8995. This pham contains only genes from cluster BE, and no function was called. /note=Starterator: Start coordinate is (1, 8523). 100% (30/30) of manually annotated non-draft phages call start site #1, and this site is also highly conserved, suggesting it is likely correct. /note=Coding Potential: Good coding potential on both Host- and Self-trained GeneMark, with some atypical coding potential on the Self-. Coding potential is in the reverse orientation for this gene. /note=SD (Final) Score: -4.355, best of all the candidates. /note=Gap/overlap: Very small gap (34 bp) before gene, fairly large gap after gene but this is reasonable since a gene was excluded in this spot. /note=Location call: Considering the evidence, it seems reasonable to place the start site at 8523, which allows for coverage of all coding potential. Starterator and Phamerator data both support this start site as well, with high likelihood. 
/note=Function call: All of the top calls on both NCBI and PhagesDB suggest that this is a hypothetical protein/unknown function, with strong e-values of < 2e-43 and coverage of 98-100%. Neither CDD nor HHpred came up with significant hits, so the function is likely NKF. /note=Transmembrane domains: Zero hits in either TMHMM or TOPCONS. /note=Secondary Annotator Name: Namaganda, Samali /note=Secondary Annotator QC: I agree with the location call and the explanation looks good. CDS complement (8558 - 8836) /gene="18" /product="gp18" /function="hypothetical protein" /locus tag="PumpkinSpice_18" /note=Genemark calls start at 8836 /note=SSC: 8836-8558 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s3 100.0% 1.50103E-61 GAP: 120 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.653, -4.6114151184034835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970943,97.8723,1.50103E-61 SIF-HHPRED: SIF-Syn: Pham 47622 (as of 5/26/2021); upstream is a membrane protein (Pham 17191 as of 5/25/2021) and downstream is an HNH endonuclease (Pham 55565 as of 5/25/2021), just like in phages Starbow, Karimac, and IchabodCrane. /note=Primary Annotator Name: Lapurga, Kaira /note=Auto-annotation start source: Only GeneMark calls the start at 8836. Glimmer does not call the start. Possible suggested start sites include 8842, 8674, or 8650. /note=Phamerator: 47622. Date 4/22/2021. It is conserved; found in phages Battuta, IchabodCrane, and Starbow. /note=Starterator: Start site 4 in Starterator was manually annotated in 18/26 non-draft genes in this pham. Start 4 is 8836 in PumpkinSpice. This evidence agrees with the site predicted by GeneMark. /note=Coding Potential: Coding potential is only on the reverse strand, indicating a reverse gene. Coding potential is only found in the Self-Trained GeneMark. 
/note=SD (Final) Score: Final score is -4.611. /note=Gap/overlap: Gap is 120 bp. /note=Location call: This is a real gene, and the start site can remain at 8836 because it covers all the coding potential seen in GeneMark. /note=Function call: CDD lacked hits and HHpred had only weak e-values, so the function cannot be determined; however, the strongest call is terminase. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Wang, Yiyang /note=Secondary Annotator QC: CDS complement (8957 - 9586) /gene="19" /product="gp19" /function="hypothetical protein" /locus tag="PumpkinSpice_19" /note=Original Glimmer call @bp 9586 has strength 9.93; Genemark calls start at 9586 /note=SSC: 9586-8957 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp019 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 1.89074E-152 GAP: 146 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.556, -3.50848394673489, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp019 [Streptomyces phage Karimac] ],,YP_009840192,100.0,1.89074E-152 SIF-HHPRED: SIF-Syn: NKF. Upstream gene is pham 9696, downstream gene is pham 47622, just like in phages Battuta and Genie2. /note=Primary Annotator Name: Linares Cardona, Ninette /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and agree on the start site at 9586 bp. /note=Phamerator: pham: 13649. Date 04/21/2021. The gene is conserved in phages Battuta and Bordeaux, both in the same cluster as PumpkinSpice. /note=Starterator: Start site 7 in Starterator was manually annotated in 30/30 non-draft genes in this pham. Start 7 is 9586 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. 
The ORF does have reasonable coding potential, and the chosen start site does include all of the coding potential. /note=SD (Final) Score: The final score is the best option at -3.508 and the z-score is the highest at 2.556. /note=Gap/overlap: The gap with the upstream gene is somewhat large at 146 bp. However, this gene is conserved in several other phages and the gap is seen in those phages as well, such as TomSawyer and LukeCage. This is also the LORF. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 9586 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: No known function. The top two PhagesDB BLAST hits have no known function (e-value: e^-120), and the top two NCBI BLAST hits also have no known function (coverage 99.58%, identity 100%, E-value 1.32107e-175). PhagesDB BLAST listed the function as thymidylate synthase with low e-values. HHpred had one hit with a 100% probability, high coverage of 98.7%, and low e-value of 8.6e-39. CDD also had one hit with an identity of 36.8%, coverage of 98.7%, and e-value of 6.1e-29. All the data suggests the ORF function is a type of thymidylate synthase. /note=Transmembrane domains: The absence of TMDs makes sense because ThyX is an enzyme and thus does not need to interact with the cell membrane. /note=Secondary Annotator Name: Merlos, Andres /note=Secondary Annotator QC: CDS complement (13799 - 14062) /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="PumpkinSpice_30" /note=Original Glimmer call @bp 14062 has strength 6.61; Genemark calls start at 14065 /note=SSC: 14062-13799 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.20019E-56 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.47, -3.8294249377118383, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_143675283,100.0,3.20019E-56 SIF-HHPRED: SIF-Syn: NKF, pham 14434; upstream gene is thymidylate synthase, downstream gene is pham 6771, just like in phage Battuta. /note=Primary Annotator Name: Billings, Sophie /note=Auto-annotation start source: Glimmer and GeneMark. Glimmer calls the start at 14062 and GeneMark calls the start at 14065. /note=Phamerator: Pham 14434. Date 4/21/21. It is conserved; the pham only has phages from the BE cluster (Battuta, BoomerJR, Karimac, Starbow). /note=Starterator: Start site 15 in Starterator was manually annotated in 23/29 non-draft genes in this pham. Start 15 is 14062 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: Coding potential in this ORF is on the reverse strand, indicating that this is a reverse gene; however, there is a small atypical peak on the forward strand. Coding potential was not found in the GeneMark Host, but was found in the GeneMark Self. /note=SD (Final) Score: -3.829, and its z-score is 2.47. /note=Gap/overlap: -1 bp gap (1 bp overlap), which is common and reasonable; it is also highly conserved. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 14062. /note=Function call: Function unknown. All of the PhagesDB BLAST hits have the function listed as "unknown function" with E-values less than 1e-7, and all NCBI BLAST hits also have the function as "unknown function" with E-values less than 1e-7. CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Merlos, Andres /note=Secondary Annotator QC: I agree that this is a real gene with a start site at 14062. The start site number is the most manually annotated, the gap is minimal, the coding potential is sufficient, and the Final Score and Z-Score are sufficient. 
CDS complement (14062 - 14232) /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="PumpkinSpice_31" /note=Original Glimmer call @bp 14232 has strength 6.69; Genemark calls start at 14232 /note=SSC: 14232-14062 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_30 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.00564E-29 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.263, -5.427744181800115, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_30 [Streptomyces phage Starbow] ],,AXH66540,100.0,1.00564E-29 SIF-HHPRED: SIF-Syn: [NKF, upstream gene belongs to pham 12051 (NKF – PumpkinSpice; 05/14/2021); downstream gene belongs to pham 14434 (NKF – PumpkinSpice; 05/14/2021), matching phages MindFlayer and Starbow.] /note=Primary Annotator Name: Bruns, James /note=Auto-annotation start source: Both Glimmer and GeneMark concur with a start at 14232. /note=Phamerator: As of 04/22/2021 the gene in question belongs to pham 6771, and is present in non-draft phages Wipeout and TomSawyer. Both phages similarly belong to cluster BE. /note=Starterator: Start site 3, called in 26 of 33 non-draft genes, is located at 14232 bp in PumpkinSpice. This start was also called by both GeneMark and Glimmer, in agreement with the most annotated start given by Starterator. /note=Coding Potential: Coding potential is present in both the Host- and Self-Trained GeneMark analyses, with some atypical potential on the Self-Trained GeneMark page. The gene length predicted by the Self-Trained GeneMark analysis represents the whole gene, including the predicted start site. The gene is in the reverse orientation. /note=SD (Final) Score: Score of -4.018, the lowest negative score present on PECAAN. /note=Gap/overlap: Gap present totaling 111 bp, and this is the longest reasonable ORF for this gene call. 
Upon synteny comparisons (Bordeaux/IchabodCrane), the gap seen was considered correct. /note=Location call: Based on the data listed above, this is highly likely to be a true gene with a start site of 14232. /note=Function call: Function unknown. PhagesDB BLASTp search resulted in hits that indicated no known function (E-values of 88%), and low E-values (between 5e-30 and 2e-26). There are no relevant hits from CDD, while there was a somewhat relevant hit from HHpred (85.1% probability and 73.5849% coverage), but it has a high e-value (10) and it is not a phage protein. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. Therefore, it cannot be identified as a transmembrane protein based on this information. /note=Secondary Annotator Name: Merlos, Andres /note=Secondary Annotator QC: I agree with the location call, but the Z-Score should be added to the notes as well. Glimmer and GeneMark both agree on a start site, and the start site of 14232 corresponds to the most annotated start site number, 3. The Final Score is sufficient and the gap is minimal. CDS complement (14461 - 15336) /gene="33" /product="gp33" /function="membrane protein" /locus tag="PumpkinSpice_33" /note=Original Glimmer call @bp 15336 has strength 7.25; Genemark calls start at 15336 /note=SSC: 15336-14461 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Streptomyces phage MindFlayer]],,NCBI, q1:s1 100.0% 0.0 GAP: 58 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.311, -2.45827804503836, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Streptomyces phage MindFlayer]],,QPL13673,99.6564,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Castillo, Salvador /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 15336. /note=Phamerator: Date: 4/23/21 Pham: 15396. This gene is conserved in MindFlayer and Starbow. No function known. /note=Starterator: The conserved start site is start 12, at position 15336 in this phage. 
39/41 pham members call start site 12. /note=Coding Potential: Typical and atypical coding potential appears only on the reverse strand of the Self-Trained GeneMark, and the suggested start site 15336 covers all of the coding potential. /note=SD (Final) Score: This start, 15336, has the best (least negative) score, -2.458. /note=Gap/overlap: The gap with this start is 58 bp; however, it can’t be filled by choosing a longer start site because the added region would have no coding potential. /note=Location call: This is a real gene because it is shown to be conserved in the pham maps, with 15336 as the start, as this is the longest reasonable ORF. The start site 15336 covers all the coding potential, the gene is conserved in the pham maps, its length is conserved in other phages from the same cluster as shown on the pham page of PhagesDB, and Starterator shows conservation of this start site in the same pham for other phages. /note=Function call: Through NCBI BLASTp, the phage MindFlayer hit has 100% query coverage, a 99% identity match, and an e-value of 0 as a membrane protein. This was the only organism with a high-scoring match and a declared function. All other programs didn’t provide any valuable information. /note=Transmembrane domains: TMHMM called one TMD, while TOPCONS called 4 TMDs, which is good evidence that this protein is associated with the membrane and therefore can be called a membrane protein. /note=Secondary Annotator Name: Haeri, Alliya /note=Secondary Annotator QC: I agree that all provided evidence supports the start location call of 15,336. CDS complement (15395 - 15796) /gene="34" /product="gp34" /function="DNA binding protein" /locus tag="PumpkinSpice_34" /note=Original Glimmer call @bp 15796 has strength 8.44; Genemark calls start at 15796 /note=SSC: 15796-15395 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.6872E-91 GAP: 223 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -2.5052746077145835, yes F: DNA binding protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_180304082,100.0,1.6872E-91 SIF-HHPRED: a.25.1.1 (A:3-173) Dodecameric ferritin homolog {Lactococcus lactis, DpsB [TaxId: 1358]},,,d1zs3a1,99.2481,99.9 SIF-Syn: DNA binding protein, upstream gene is NKF, downstream is a membrane protein, just like in phage MindFlayer. /note=AF: Functional call on this one is tricky but HHpred/CDD evidence seems to support it. /note=Primary Annotator Name: Cervantes, Richard /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 15,796. /note=Phamerator: Pham 15743 (date 4/24/2021). Conserved in the following phages: BoomerJR (BE) and Evy (BE). /note=Starterator: Start Site 1 was called for 31/35 non-draft genes for Gene 33 in PumpkinSpice. /note=Coding Potential: There is coding potential in the self-trained GeneMark but not in the host-trained GeneMark. The coding potential was found on the reverse strand. /note=SD (Final) Score: -2.505 is the best final score. /note=Gap/overlap: 59 bp gap; there is no coding potential within the gap (in neither self-trained nor host-trained GeneMark). This gap is also present in Genie2 and BoomerJR and is thus conserved. It is also not very large in the first place, so it is acceptable. /note=Location call: I believe the auto-generated start site of 15796 is correct, based on all previous notes. This gene is real. /note=Function call: Electron Transport was called for the HHpred PDB hit, which had a very small E-value and a 99.6 probability. There was a CDD hit that simply said the protein was multifunctional, so I believe the HHpred hit is more specific and thus more accurate. However, on PECAAN the PhagesDB Function Frequency has DNA binding protein as a match with 100% frequency; thus, I believe this is the best function call. /note=Transmembrane domains: There were no TMDs called in either TMHMM or TOPCONS. /note=Secondary Annotator Name: Dines, Lily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS complement (16020 - 16226) /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="PumpkinSpice_35" /note=Original Glimmer call @bp 16226 has strength 4.12; Genemark calls start at 16226 /note=SSC: 16226-16020 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_33 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 2.67912E-41 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.249, -4.213516249738288, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_33 [Streptomyces phage Birchlyn] ],,QDF17210,100.0,2.67912E-41 SIF-HHPRED: SIF-Syn: Pham 22753 is also present in BE2 phages like BoomerJR and Genie2. Pham 22753 is also adjacent to pham 15743 upstream and pham 53592 downstream in all of these phages. Pham designations as of 05/28/2021. /note=Primary Annotator Name: Chang, Loren /note=Auto-annotation start source: Both Glimmer and GeneMark predict the start site to be at 16226 bp. /note=Phamerator: Pham 22753 (04/23/2021). It is conserved; found in BoomerJR (BE2), IchabodCrane (BE2), and Starbow (BE2). /note=Starterator: Start site 6 in Starterator was manually annotated in 14/30 non-draft genes in this pham. Start site 6 is position 16226 in PumpkinSpice. It is called 95.2% of the time when present. This agrees with the annotated start site provided by Glimmer and GeneMark. /note=Coding Potential: At this ORF, GeneMark does not show coding potential, while GeneMarkS shows good coding potential (in a reverse reading frame). Almost all, but not quite all, of the coding potential is covered. /note=SD (Final) Score: The final score is -4.214. It is the highest final score on PECAAN. /note=Gap/overlap: Has an overlap of 4 bp. This overlap is not large enough to be problematic. Additionally, this overlap is present in other phages like LukeCage and Starbow. This may indicate that the gene is part of an operon. /note=Location call: Given the above evidence, this is a real gene, and its most likely start site is at 16226 bp. 
/note=Function call: No known function. The top PhagesDB BLAST hits (E-values 2e-34) are all associated with proteins of unknown function. The top NCBI BLAST hits (100% coverage, 98.5%+ identity, E-values <2.68e-41) are also associated with proteins of unknown function. Both CDD and HHpred gave no significant hits. Thus, there is no known function at this time. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts the presence of any TMDs for this protein. This suggests that it is not a membrane protein. /note=Secondary Annotator Name: Dines, Lily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (16223 - 16384) /gene="36" /product="gp36" /function="membrane protein" /locus tag="PumpkinSpice_36" /note=Genemark calls start at 16384 /note=SSC: 16384-16223 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_35 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 5.79922E-27 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.684, -5.412417138717724, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_35 [Streptomyces phage Starbow] ],,AXH66545,100.0,5.79922E-27 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is Pham 22753, downstream gene is Pham 29588, just like in phages MindFlayer, LukeCage, and Karimac. /note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: GeneMark calls the start at 16,384. Glimmer does not call a start site. /note=Phamerator: Pham 53592. Conserved in some related phages, including TomSawyer, MindFlayer, and Bordeaux. No function. Date: 4/23/21. /note=Starterator: Site number 1, position 16,384 in PumpkinSpice. This is the only manually annotated start (called in 12 of 12 non-draft genes in the pham). This is the start site called by GeneMark. 
/note=Coding Potential: Good reverse coding potential on GeneMark Host, as well as typical and atypical reverse coding potential in GeneMarkS, all contained by the auto-annotated start site (16,384). /note=SD (Final) Score: The final score for the auto-annotated start site is -5.412. The best final score is -3.506 for start site 16297. The z-score for the auto-annotated start site is 1.684. The best z-score is 2.309 for start site 16276. /note=Gap/overlap: The -4 bp gap (4 bp overlap) with the upstream gene could indicate an operon with the previous gene. The gap is conserved in other final genomes, such as phages Starbow, MindFlayer, and LukeCage. /note=Location call: This gene is real, with a start site of 16,384. Although this start site does not have the best z-score or final score, the scores are still sufficient, and the location call is supported by Starterator and GeneMark data. /note=Function call: Membrane protein. All strong PhagesDB hits have unknown function. There are no strong HHpred or CDD hits. All strong NCBI BLAST hits are "hypothetical proteins," except for one hit with the function "membrane protein." TMHMM and TOPCONS both predict two transmembrane domains. /note=Transmembrane domains: TMHMM and TOPCONS both predict two transmembrane domains. /note=Secondary Annotator Name: Lapurga, Kaira /note=Secondary Annotator QC: Information seems to be in order, but further explanation appears to be missing in the auto-annotation start source, Starterator, and Phamerator portions. Please refer to the lab manual for the proper template. 
CDS complement (16381 - 16506) /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="PumpkinSpice_37" /note=Original Glimmer call @bp 16506 has strength 5.65; Genemark calls start at 16485 /note=SSC: 16506-16381 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein HWB80_gp037 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.44154E-21 GAP: 86 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.101, -4.987866211241783, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp037 [Streptomyces phage Karimac] ],,YP_009840210,100.0,2.44154E-21 SIF-HHPRED: SIF-Syn: Gene in pham 29588 is upstream of gene in pham 53592 (NKF) and downstream of gene in pham 9745, just like in phages Battuta, Birchlyn, Bordeaux, and BoomerJR. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Glimmer called the start at 16506 while GeneMark called the start at 16485. /note=Phamerator: pham: 29588, Date: 04/23/2021. It is conserved; found in Battuta_36, Birchlyn_35, and BoomerJR_38. /note=Starterator: Start site 3 in Starterator was manually annotated in 11/52 non-draft genes in this pham. Start 3 is 16506 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: -4.988. It is the second best final score on PECAAN. /note=Gap/overlap: 86 bp. The gap is somewhat large but reasonable, because the gap is conserved in other phages (Battuta, Bordeaux) and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 16506. /note=Function call: NKF. Neither CDD nor HHpred showed any significant hits for this gene. 
The top two PhagesDB BLAST hits have unknown function (E-value = 1e-18), and the top two NCBI BLAST hits also have unknown function (hypothetical protein), with 100% coverage, 80%+ identity, and E-values < 1e-15. Thus, the function of this gene is unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Ali Pour, Paria /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator. CDS complement (16593 - 17066) /gene="38" /product="gp38" /function="HNH endonuclease" /locus tag="PumpkinSpice_38" /note=Original Glimmer call @bp 17066 has strength 3.8 /note=SSC: 17066-16593 CP: yes SCS: glimmer ST: SS BLAST-Start: [HNH endonuclease [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 4.36431E-111 GAP: 40 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.31, -6.193723690050797, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Streptomyces phage Karimac] ],,YP_009840211,100.0,4.36431E-111 SIF-HHPRED: CRISPR-associated endonuclease Cas9; AcrIIC3, HYDROLASE-HYDROLASE INHIBITOR complex; HET: MSE; 2.606A {Neisseria meningitidis},,,6J9N_A,56.6879,98.7 SIF-Syn: This gene is part of Pham 9745 with an HNH endonuclease function; the upstream gene is from Pham 58806 with an endolysin function, and the downstream gene belongs to Pham 29588 with no known function. This pattern of synteny can also be seen in BE2 phages Karimac and IchabodCrane. /note=Primary Annotator Name: Do, Vivian /note=Auto-annotation start source: Glimmer calls the start site at 17066; GeneMark did not call a start site. /note=Phamerator: As of 04.21.21, this gene belongs to Pham 9745. This gene is highly conserved, with the majority of genes in the pham having a length of 474 bp, as seen in Battuta, Birchlyn, and Bordeaux. The majority of the genes have HNH endonuclease as the noted function. /note=Starterator: The most annotated start site, with 33 manual annotations, was start site 2. 
This corresponds to the start site at 17066 bp, which is the same as the auto-annotated start site. /note=Coding Potential: There is no coding potential in the Host-trained GeneMark, but there is some coding potential in the Self-trained GeneMark in the 2nd ORF in the reverse direction. /note=SD (Final) Score: -6.194; this is the worst SD score reported. This would typically be evidence against this start site; however, it is the only start site that encompasses all of the coding potential. /note=Gap/overlap: 40 bp, which is reasonable as it is also conserved in other phages (LukeCage and Starbow). There is no coding potential in the gap that indicates another gene, and Pham Maps shows a tRNA in the gap. /note=Location call: With the data that we currently have, I would confirm the start of this gene at 17066: its start codon is likely ATG, it encompasses the full coding potential, and, although its RBS final score and z-score are not the best, it gives the longest ORF, which helps minimize gaps between genes. Despite its SD final score, it is the only start site that allows for a gene that is at least 150 bp and includes all of the coding potential. This call is also supported by Starterator, as this site has the most manual annotations. /note=Function call: Our data show that the function of this gene is most likely HNH endonuclease, as the e-values are significantly below 10e-7, the lengths match, and the top hits are from the same host and cluster. CDD top hits were HNH endonucleases; HHpred top hits varied a little more, but most pointed towards an endonuclease and either mentioned no specific domain or an HNH domain. All CDD and HHpred top hits had a probability greater than 95%, coverage greater than 50%, and e-values significantly lower than 10e-3. 
/note=Transmembrane domains: TMHMM and TOPCONS each predict one transmembrane domain, reaching the threshold needed to consider this a membrane protein. This is an unusual result, considering that HNH endonucleases are generally involved in DNA recombination. /note=Secondary Annotator Name: Zuelch, Avery /note=Secondary Annotator QC: I have QC'd this gene and agree with the start site that has been called. All data suggest that this start site is the correct start site/the best start site available. The SD (Final) Score is not the best; however, it is acceptable because this is the only start site that covers all of the coding potential. CDS complement (17107 - 17652) /gene="39" /product="gp39" /function="endolysin" /locus tag="PumpkinSpice_39" /note=Original Glimmer call @bp 17652 has strength 7.52; Genemark calls start at 17619 /note=SSC: 17652-17107 CP: yes SCS: both-gl ST: SS BLAST-Start: [endolysin [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 1.61144E-128 GAP: 98 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.178, -4.826259462263871, no F: endolysin SIF-BLAST: ,,[endolysin [Streptomyces phage Birchlyn] ],,QDF17214,99.4475,1.61144E-128 SIF-HHPRED: MEMBRANE-BOUND LYTIC MUREIN TRANSGLYCOSYLASE F; HYDROLASE, LYTIC TRANSGLYCOSILOSE, CELL WALL RECYCLING; HET: MSE, EDO; 1.801A {PSEUDOMONAS AERUGINOSA PAO1},,,5A5X_B,46.9613,93.6 SIF-Syn: endolysin, just like in Bordeaux; upstream gene is NKF; downstream gene is HNH endonuclease /note=Primary Annotator Name: Haeri, Alliya /note=Auto-annotation start source: Glimmer calls the gene to start at 17,652. GeneMark calls the start site at 17,619. /note=Phamerator: Pham: 58806. Date 04/23/2021. It is conserved and is found in Bordeaux (BE2) and IchabodCrane (BE2). /note=Starterator: Start site 6 is manually annotated in 30/33 non-draft phages in the pham. Start 6 is 17,652 in PumpkinSpice. This evidence agrees with the start site proposed by Glimmer. 
/note=Coding Potential: Coding potential is found in the reverse strand only, suggesting that this is a reverse gene. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: The RBS score is one of the best combinations of z-score and Final score, with a z-score of 2.178 and a Final score of -4.826. /note=Gap/overlap: The gap is 98 base pairs. There is no coding potential within the gap, and the gap is conserved across other phages, such as LukeCage and StarPlatinum. /note=Location call: This is likely a real gene based on the above evidence, with a start site of 17,652. /note=Function call: Endolysin. The top PhagesDB BLAST hits have endolysin as the listed function (e-values = e-105), and many of the top NCBI hits have endolysin listed as the function (100% coverage, identity > 87%, e-value < 1e-99). Many top NCBI hits did have hydrolase as the listed function, but the top NCBI hit (100% coverage, identity > 99%, e-value = 2e-128) had endolysin listed as the function, making that the more probable function call. There was an HHpred hit with a hydrolase classification; however, the coverage was very low and not particularly compelling evidence. There were no significant CDD hits. /note=Transmembrane domains: TMHMM predicted one TMD, but TOPCONS predicted none. Some TMDs might be expected, since this gene was called as an endolysin and endolysins can interact with the membrane; however, endolysins are a broad category of proteins, and this particular protein may not have membrane interactions. /note=Secondary Annotator Name: Torres, Canela /note=Secondary Annotator QC: I agree with the primary annotator's location and function calls. 
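The HHpred cut-offs these notes rely on (quoted in the gp1 note as probability above 80, coverage above 40%, and e-value below 10e-3) can be written as a small filter. This is a hypothetical sketch, not part of PECAAN or HHpred: `HHpredHit` and `is_significant` are illustrative names, "10e-3" is read as 10^-3 (an assumption), and the values come from the SIF-HHPRED fields of gp38, gp42, and gp47, where the last two numbers appear to be coverage and then probability (inferred by comparing gp42's field with its function-call note). The last hit is made up to show a rejection, and the "reasonable function" criterion stays a human judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HHpredHit:
    target: str                      # PDB chain ID, e.g. "1XV5_A"
    coverage: float                  # query coverage, percent
    probability: float               # HHpred probability, percent
    e_value: Optional[float] = None  # None when the note does not record one


def is_significant(hit: HHpredHit,
                   min_probability: float = 80.0,
                   min_coverage: float = 40.0,
                   max_e_value: float = 1e-3) -> bool:
    """Apply the probability, coverage, and e-value cut-offs (assumed values).

    The e-value test is skipped when no e-value is recorded. Whether the
    hit's function is "reasonable" is not checked here.
    """
    if hit.probability <= min_probability or hit.coverage <= min_coverage:
        return False
    if hit.e_value is not None and hit.e_value >= max_e_value:
        return False
    return True


hits = [
    HHpredHit("6J9N_A", coverage=56.6879, probability=98.7),                    # gp38
    HHpredHit("1XV5_A", coverage=99.6894, probability=100.0, e_value=1.3e-32),  # gp42
    HHpredHit("6TE9_B", coverage=72.4665, probability=100.0),                   # gp47
    HHpredHit("hypothetical_hit", coverage=25.0, probability=97.0),             # made up
]

for hit in hits:
    print(hit.target, is_significant(hit))
```

Keeping the thresholds as keyword arguments makes it easy to re-run the check if the lab manual's cut-offs differ from the values assumed here.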
CDS complement (17751 - 17915) /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="PumpkinSpice_40" /note=Original Glimmer call @bp 17915 has strength 14.63; Genemark calls start at 17915 /note=SSC: 17915-17751 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp239 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 7.6152E-30 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.969, -2.7254235724530456, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp239 [Streptomyces phage Karimac] ],,YP_009840213,100.0,7.6152E-30 SIF-HHPRED: SIF-Syn: NKF (Pham 795), upstream gene is Pham 17415, downstream gene is endolysin, just like in phage MindFlayer. /note=Primary Annotator Name: Howe, Kathryn /note=Auto-annotation start source: Both Glimmer and GeneMark identified 17915 as the start site. /note=Phamerator: As of April 22, 2021, this gene was part of Pham 795. This gene is conserved in other members of cluster BE, such as Battuta and IchabodCrane. Phamerator listed endolysin as the function of this gene, which is found on the approved function list. /note=Starterator: The most conserved start site for this gene was start site number 7, which corresponds to start site 17915 in the PumpkinSpice phage. Start site #7 was called in 25 of 39 non-draft phage annotations. /note=Coding Potential: There is very good coding potential on the Self-trained GeneMark from the designated start site. The chosen start site at 17915 covers all of the coding potential. /note=SD (Final) Score: The SD final score is the best, with a value of -2.725. /note=Gap/overlap: The overlap with the upstream gene is only 1 bp, which is reasonable. /note=Location call: Based on the information gathered so far, it is safe to call this a real gene. Starterator shows that the auto-annotated start site is conserved among similar phages. The auto-annotated start also has the best final score and gives the longest open reading frame. 
This site also has the best z-score, 2.969. Based on this information, the most probable start site is at bp coordinate 17915. /note=Function call: Based on the evidence so far, there is no known function for this gene. All of the PhagesDB BLAST hits, including those with an e-value within our threshold, had no known function, and the NCBI BLAST hits were all hypothetical proteins. CDD and HHpred also did not produce any significant hits, which further suggests that this gene has no known function. /note=Transmembrane domains: Based on the TMHMM and TOPCONS data, this gene does not code for a transmembrane protein. /note=Secondary Annotator Name: Do, Vivian /note=Secondary Annotator QC: I agree with the start site called; I would just add more evidence in the Location Call section. You could add that this is also the LORF and that its z-score is the best one presented. I would also add information about whether or not the observed overlap is conserved in other phages. CDS complement (17915 - 18067) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="PumpkinSpice_41" /note=Original Glimmer call @bp 18067 has strength 2.92; Genemark calls start at 18067 /note=SSC: 18067-17915 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_41 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 5.53571E-27 GAP: 419 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.556, -3.588526034455651, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_41 [Streptomyces phage Starbow] ],,AXH66550,100.0,5.53571E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hugo, Cristelle /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 18067. /note=Phamerator: 4/24 Pham: 17415. It is conserved in 20/21 of the BE2 phages. No function listed. /note=Starterator: Start site 5 was manually annotated in 12/15 non-draft genes in this pham. It is 18067 in PumpkinSpice. It agrees with the site predicted by Glimmer and GeneMark. 
The only other possible site does not include all of the coding potential. /note=Coding Potential: Good coding potential on the reverse strand, with the start site covering all of the potential. /note=SD (Final) Score: -3.589. This is the best score. /note=Gap/overlap: 419 bp. This reverse gene is adjacent to a gene in the opposite direction, which accounts for this large gap. In all of the other finalized phages, these genes are also separated by a large gap of ~400 bp, so this seems normal. /note=Location call: This is a real gene, and the most likely start site is 18067. /note=Function call: No known function. There are no significant matches at all: HHpred matches had probabilities under 80%, and phage matches with similar sequences had no known function either. /note=Transmembrane domains: 0; no TMDs predicted. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 18487 - 19455 /gene="42" /product="gp42" /function="glycosyltransferase" /locus tag="PumpkinSpice_42" /note=Original Glimmer call @bp 18487 has strength 11.25; Genemark calls start at 18487 /note=SSC: 18487-19455 CP: yes SCS: both ST: SS BLAST-Start: [glycosyltransferase [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 0.0 GAP: 419 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.818, -3.4897410997597818, no F: glycosyltransferase SIF-BLAST: ,,[glycosyltransferase [Streptomyces phage Birchlyn] ],,QDF17217,99.6894,0.0 SIF-HHPRED: DNA alpha-glucosyltransferase; Transferase; HET: GOL, CME, UDP, EDO; 1.73A {Enterobacteria phage T4},,,1XV5_A,99.6894,100.0 SIF-Syn: This glycosyltransferase gene is syntenic with phage Bordeaux: it is found in the same position and has the same function as gene 42 in Bordeaux, and the upstream gene (pham 17415) and downstream gene (ribonucleotide reductase) are also found in the same positions with the same functions, just like in phage Bordeaux. /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer and GeneMark. 
Both call the start at 18487. /note=Phamerator: Pham 21430; this analysis was run 04/16/21. It is conserved and is found in Bmoc_40 (BE) and Cross_42 (BE). /note=Starterator: The start number called most often in the published annotations is 6; it was called in 54 of the 66 non-draft genes in the pham. Start site 6 is 18487 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The ORF has reasonable coding potential in GeneMark-Self, and the chosen start site includes all of the coding potential. There is no coding potential in GeneMark-Host. /note=SD (Final) Score: -3.490. It is the best final score on PECAAN. /note=Gap/overlap: The gap with the upstream gene is somewhat large at 419 bp. However, this gene is conserved in several other phages, and the gap is seen in those phages as well, such as phages Genie2 and LukeCage. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 18487. /note=Function call: All of the hits from PhagesDB BLAST suggest that it is a glycosyltransferase, with very small e-values of 0 to 4e-88, and most of the hits from NCBI BLAST also have the function glycosyltransferase. The two best hits have about 99% identity, 100% coverage, and an e-value of 0. HHpred has hits that meet the SEA-PHAGES requirements for this gene: glycosyltransferase B736L from Chlorella virus NY2A (100% probability, 99.6% coverage, 2.7e-33 e-value) and phage T4 DNA alpha-glucosyltransferase (100% probability, 99.7% coverage, 1.3e-32 e-value). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. 
/note=Secondary Annotator Name: Bruns, James Alan /note=Secondary Annotator QC: Good; I would just try to write your notes in complete sentences. Make sure to check the Gene Coding Capacity box. CDS 19533 - 21473 /gene="43" /product="gp43" /function="ribonucleotide reductase" /locus tag="PumpkinSpice_43" /note=Original Glimmer call @bp 19533 has strength 8.13; Genemark calls start at 19533 /note=SSC: 19533-21473 CP: yes SCS: both ST: SS BLAST-Start: [ribonucleotide reductase [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 0.0 GAP: 77 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.061661148509975, no F: ribonucleotide reductase SIF-BLAST: ,,[ribonucleotide reductase [Streptomyces phage Wipeout] ],,QGH74288,100.0,0.0 SIF-HHPRED: RIBONUCLEOSIDE TRIPHOSPHATE REDUCTASE; 10-stranded alpha-beta barrel, central finger loop, OXIDOREDUCTASE; 1.75A {Lactobacillus leichmannii} SCOP: c.7.1.4,,,1L1L_C,98.7616,100.0 SIF-Syn: Good pham synteny up- and downstream from this gene, and a conserved LysM-like peptidoglycan binding protein immediately downstream. This is true in all checked phages: MindFlayer, Starbow, and TomSawyer. /note=Primary Annotator Name: Kelly, Samuel /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 19533. /note=Phamerator: (4/23/21) Pham 14994. This pham contains only genes from clusters BE and BK. Nearly every gene has the function ribonucleotide reductase. /note=Starterator: Start coordinate is (13, 19533). Approx. 61% (30/49) of manually annotated non-draft phages call start site #13, and this site is also highly conserved, suggesting it is likely correct. /note=Coding Potential: There is no coding potential at all in the Host-trained GeneMark, but fairly strong coding potential in the Self-trained GeneMark. Coding potential is in the forward orientation for this gene. The chosen start site covers the coding potential. /note=SD (Final) Score: -3.062, second best of the candidates. 
/note=Gap/overlap: Small gaps before (77 bp) and after (86 bp) the gene, both of which are reasonable. /note=Location call: Considering the evidence, it seems reasonable to place the start site at 19533. Phamerator and Starterator both support this as well. /note=Function call: Ribonucleotide reductase. The top 2 (along with many other) BLAST hits on both PhagesDB and NCBI call this function, with strong coverage and identities of 97-100%. The e-values (0.0) are as strong as possible, and the overall evidence points to this function. CDD and HHpred also strongly support this function call, with all database hits pointing to RNR with 100% probability, good e-values, and solid coverage. /note=Transmembrane domains: Zero hits in both TMHMM and TOPCONS. /note=Secondary Annotator Name: Howe, Kathryn /note=Secondary Annotator QC: Based on the evidence present, I agree with the original annotator; however, the coding potential note should have stated whether or not the chosen start site covers the entirety of the coding potential. CDS 21560 - 22441 /gene="44" /product="gp44" /function="LysM-like peptidoglycan binding protein" /locus tag="PumpkinSpice_44" /note=Original Glimmer call @bp 21560 has strength 14.73; Genemark calls start at 21560 /note=SSC: 21560-22441 CP: yes SCS: both ST: SS BLAST-Start: [N-acetylmuramoyl-L-alanine amidase [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 0.0 GAP: 86 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.157, -3.16089827114923, no F: LysM-like peptidoglycan binding protein SIF-BLAST: ,,[N-acetylmuramoyl-L-alanine amidase [Streptomyces sp. JV178] ],,WP_099970889,100.0,0.0 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase; amidase, zinc binding, cell wall degradation, endolysine, hydrolase; HET: PO4, GOL; 1.21A {Clostridium intestinale},,,6SSC_A,67.9181,99.7 SIF-Syn: same upstream gene as in Bordeaux, BoomerJR /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 21560. 
/note=Phamerator: Pham 61245 as of May 4, 2021. The gene was compared to Battuta and Birchlyn for reference. /note=Starterator: Starterator shows that start number 59 is the most commonly annotated start in published genomes. Start 59 corresponds to 21560 in PumpkinSpice, where it has 71 manual annotations, which supports this as the actual start site. /note=Coding Potential: The Host-trained GeneMark data show coding potential on the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: The RBS final score is -3.161 and the z-score is 3.157; given these two strong scores, I think this start, which gives the LORF, is the best option. /note=Gap/overlap: The gap is 86 bp, which is acceptable compared with other phages; a gap of this size does not call for adding a new gene. /note=Location call: Based on the given evidence, this appears to be a real gene with its start site at 21560. The 71 manual annotations of start site 21560 strongly support this start, as does Cristelle's analysis showing that the called start site includes all of the coding potential. /note=Function call: LysM-like peptidoglycan binding protein. /note=Transmembrane domains: N/A; no TMDs were reported, so this LysM-like peptidoglycan binding protein is not considered a membrane protein. /note=Secondary Annotator Name: Hugo, Cristelle Linnette /note=Secondary Annotator QC: Phamerator and Starterator notes are missing. I would mention that the start site called is the only one that includes all of the coding potential, since I had to look at the coding potential maps to confirm this. 
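The GAP values in these records follow directly from the CDS coordinates: for two neighboring features written low-high (including complement features), the gap is the next feature's lower coordinate minus the previous feature's upper coordinate minus 1, and a negative result is an overlap. A minimal Python sketch of that arithmetic, using a hypothetical helper `gap_between` and coordinates copied from the records above:

```python
def gap_between(left: tuple[int, int], right: tuple[int, int]) -> int:
    """Bases between two adjacent features; negative means they overlap.

    `left` is the feature with the smaller coordinates. Strand does not
    matter, because complement features are also written low-high.
    """
    return right[0] - left[1] - 1


# (low, high) coordinates copied from the PumpkinSpice CDS lines.
cds = {
    "41": (17915, 18067),
    "42": (18487, 19455),
    "43": (19533, 21473),
    "44": (21560, 22441),
    "47": (24714, 26285),
    "48": (26282, 26740),
}

print(gap_between(cds["41"], cds["42"]))  # 419
print(gap_between(cds["42"], cds["43"]))  # 77
print(gap_between(cds["43"], cds["44"]))  # 86
print(gap_between(cds["47"], cds["48"]))  # -4, i.e. a 4 bp overlap
```

The same formula reproduces the -1 bp overlap between gp40 and gp41 and the -11 bp overlap between gp46 and gp47.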
CDS 22589 - 24325 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="PumpkinSpice_45" /note=Original Glimmer call @bp 22589 has strength 10.09; Genemark calls start at 22571 /note=SSC: 22589-24325 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 0.0 GAP: 147 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.546, -4.631219452820914, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970890,100.0,0.0 SIF-HHPRED: SIF-Syn: Pham 21321 (as of 5/25/2021), upstream is a LysM-like peptidoglycan binding protein (Pham 64647 as of 5/25/2021), downstream is a portal protein (Pham 3276 as of 5/26/2021), just like in phages IchabodCrane, StarPlatinum, and TomSawyer. /note=Primary Annotator Name: Lapurga, Kaira /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene but do not agree on the start site (Glimmer: 22589, GeneMark: 22571). Suggested start sites: 22511, 22565, 22589, 22658, 22781, 22841. /note=Phamerator: Pham 21321. Date 4/22/2021. It is conserved and is found in phages Cross, Evy, Enygma, and Peebs. /note=Starterator: Start site 4 in Starterator was manually annotated in 18/26 non-draft genes in this pham. Start 4 is 22589 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer; GeneMark called 22571. /note=Coding Potential: Coding potential in the ORF is only on the forward strand, indicating a forward gene. Coding potential is found in the Self-trained GeneMark. /note=SD (Final) Score: The final score is -4.631. /note=Gap/overlap: The 147 bp gap is fairly large, but acceptable given the lack of coding potential in the gap in GeneMark. /note=Location call: This is a real gene, and we would use start site 22589 because it covers the coding potential seen in GeneMark. /note=Function call: Minor tail protein is the most frequent function among the hits, but the function may be unknown because CDD and HHpred give contradictory function calls. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Castillo, Salvador Castillo /note=Secondary Annotator QC: I agree with the call. CDS 24389 - 24724 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="PumpkinSpice_46" /note=Original Glimmer call @bp 24389 has strength 15.19; Genemark calls start at 24383 /note=SSC: 24389-24724 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_MINDFLAYER_46 [Streptomyces phage MindFlayer]],,NCBI, q1:s3 100.0% 3.80341E-77 GAP: 63 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.213, -6.396143887828828, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MINDFLAYER_46 [Streptomyces phage MindFlayer]],,QPL13686,98.2301,3.80341E-77 SIF-HHPRED: SIF-Syn: NKF, upstream gene is pham 21321, downstream is pham 3276, just like in phages BoomerJr and Battuta. /note=Primary Annotator Name: Linares Cardona, Ninette /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, but they do not agree on the start site. The start site for Glimmer is at 24389 bp. The start site for GeneMark is at 24383 bp. /note=Phamerator: The pham number as of 04/21/2021 is 5504. The gene is conserved in phages MindFlayer and LilMartin, both in the same cluster as PumpkinSpice. /note=Starterator: Start site 6 in Starterator was manually annotated in 38/54 non-draft genes in this pham. Start 6 is 24389 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is only found in the Self-trained GeneMark. The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: The final score is not the best, but it is reasonable at -6.396, and the z-score is 1.213. 
/note=Gap/overlap: The gap with the upstream gene is somewhat large at 63 bp. However, the gene is conserved in several other phages, and the gap is seen in those phages as well, such as phages Battuta and BoomerJR. /note=Location call: Considering all of the evidence above, this is a real gene with a start site at 24389 bp. Starterator agrees with Glimmer. /note=Function call: No known function. The top two PhagesDB BLAST hits have no known function (e-value: 1e-61), and the top two NCBI BLAST hits also have no known function (e-value: 1e-77). CDD and HHpred didn't have any significant hits for a function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Quijada, Britney /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I only suggest stating the function of the pham in the Phamerator notes if there is one. CDS 24714 - 26285 /gene="47" /product="gp47" /function="portal protein" /locus tag="PumpkinSpice_47" /note=Original Glimmer call @bp 24714 has strength 15.02; Genemark calls start at 24714 /note=SSC: 24714-26285 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 0.0 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -2.970161406017234, yes F: portal protein SIF-BLAST: ,,[portal protein [Streptomyces phage Wipeout] ],,QGH74293,100.0,0.0 SIF-HHPRED: Phage portal protein, HK97 family; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_B,72.4665,100.0 SIF-Syn: Portal Protein, upstream gene belongs to pham 5504, downstream gene belongs to pham 59423, just like in phages Battuta and Wipeout. /note=Primary Annotator Name: Liu, Lily /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 24714 bp. /note=Phamerator: Pham 3276, date 04/21/2021. 
It is conserved in phages such as Battuta, BillNye, and BoomerJR. /note=Starterator: Start site 2 was manually annotated in 43/54 non-draft genes in this pham. Start 2 is 24714 bp, which agrees with the site called by GeneMark and Glimmer. Starterator was run on 4/16/21. /note=Coding Potential: The self-trained GeneMark shows both typical and atypical coding potential, but the host-trained GeneMark does not show any coding potential at all. The self-trained GeneMark coding potential falls in the third reading frame. /note=SD (Final) Score: This start site has the best final score (-2.970) and the best z-score (3.066). /note=Gap/overlap: There is an 11 bp overlap between this gene and the previous gene, which is not too large an overlap, and this start site covers all of the coding potential. The 11 bp overlap might indicate that this gene is part of an operon. /note=Location call: Based on the above evidence, this is a real gene and most likely has a start site at 24714 bp. /note=Function call: This is a portal protein. The majority of PhagesDB BLAST hits were listed as portal proteins, with the top two e-values both being 0.0. NCBI BLAST hits all listed this gene as a portal protein, with the top two e-values both being 0.0. All CDD hits listed this gene as a portal protein, the top two hits having e-values of 2.63e-33 and 8.06e-31. The majority of the HHpred hits also listed this gene as a portal protein, with the top two hits having e-values of 5.5e-33 and 1.8e-30. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Namaganda, Samali /note=Secondary Annotator QC: Do you think the gene might be part of an operon because of the overlap? If so, you might want to mention this. 
You might want to add the date when the Starterator file was run. CDS 26282 - 26740 /gene="48" /product="gp48" /function="membrane protein" /locus tag="PumpkinSpice_48" /note=Original Glimmer call @bp 26282 has strength 8.13; Genemark calls start at 26282 /note=SSC: 26282-26740 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Streptomyces phage MindFlayer]],,NCBI, q1:s1 100.0% 1.05383E-106 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.417, -4.4521226836255074, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Streptomyces phage MindFlayer]],,QPL13688,100.0,1.05383E-106 SIF-HHPRED: SIF-Syn: Upstream protein is a portal protein in pham 3276; downstream protein is a capsid maturation protease in pham 76037, just like in phage Battuta. /note=Primary Annotator Name: Merlos, Andres /note=Auto-annotation start source: Glimmer and GeneMark call a start site at 26282. /note=Phamerator: Pham 59423 (Phamerator run on 23 April 2021). This gene is highly conserved, seen in phages such as Battuta_48 and Birchlyn_47. There are 14 draft genes. /note=Starterator: (1, 26282). Start site number 1 is the most frequently annotated start in published genomes; it was called in 31 of 49 non-draft genes. /note=Coding Potential: There is coding potential in the self-trained GeneMark but not in the host-trained GeneMark. The start site is 26282 (forward strand), with a stop site of 26740. The start site is further supported by Starterator. The start site covers all of the coding potential. /note=SD (Final) Score: -4.452 is a good final score, along with a z-score of 2.417. /note=Gap/overlap: -4 bp (a 4 bp overlap), which is very acceptable. No gene needs to be added. /note=Location call: This gene is a real gene. The start site of 26282 is sufficient, as it minimizes any potential gap. Furthermore, the final score and the z-score are sufficient. The start site is further supported because the manually annotated start number matches the auto-annotated start. 
This gene is real, with a start site of 26282 and a stop site of 26740. /note=Function call: I would call this a membrane protein of some sort. Although all of the PhagesDB hits have unknown function, NCBI hits such as the MindFlayer gene suggest that this is a membrane protein. Beyond that, there is not enough information to determine a more specific function: CDD and HHpred do not produce the results needed to identify a specific type of membrane protein. Based on the 4 TMDs predicted by TMHMM, we can safely call this a membrane protein; TOPCONS is not needed because TMHMM already predicts 4 TMDs. /note=Transmembrane domains: 4 TMDs in TMHMM; TOPCONS was not needed because TMHMM predicts 4 TMDs. /note=Secondary Annotator Name: Zuelch, Avery /note=Secondary Annotator QC: After looking at this information, I agree with the start site mentioned above. CDS 26737 - 28029 /gene="49" /product="gp49" /function="capsid maturation protease" /locus tag="PumpkinSpice_49" /note=Original Glimmer call @bp 26737 has strength 10.79; Genemark calls start at 26737 /note=SSC: 26737-28029 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease [Streptomyces phage IchabodCrane]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.045, -6.729746247838821, no F: capsid maturation protease SIF-BLAST: ,,[capsid maturation protease [Streptomyces phage IchabodCrane]],,QFP97364,100.0,0.0 SIF-HHPRED: SIF-Syn: Capsid maturation protease; the upstream gene has no known function (is in Pham 59423) and the downstream gene is a major capsid protein, just like in Karimac and TomSawyer. /note=Primary Annotator Name: Namaganda, Samali /note=Auto-annotation start source: Glimmer and GeneMark call 26737. /note=Phamerator: As of April 23, 2021, the gene is found in Pham 59300. The Pham has 68 members, 14 of which are drafts. The gene is conserved in phages of the same cluster BE, such as Battuta, Cross, Evy, and Genie2. /note=Starterator: Analysis was run on 4/21/21. 
I do not think the most annotated start (7, 26743) is the best start, because start (5, 26737) is the most annotated start (annotated 14 times) in subcluster BE2, which PumpkinSpice is a part of. Start 5 also agrees with the Glimmer and GeneMark calls. /note=Coding Potential: The ORF has coding potential on the forward strand only, indicating that it is a forward gene. There is both atypical and typical coding potential, but only in the self-trained GeneMark. /note=SD (Final) Score: -6.730 is the best SD score among all the possible start sites with a gap/overlap of less than 7 bp. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene, which is conserved in most members of subcluster BE2 (TomSawyer, Battuta). This suggests that the gene is most likely part of an operon. /note=Location call: Start site 26737 is the LORF and is shared by genes in other members of subcluster BE2 that belong to the same pham, 59300. /note=Function call: There is enough data to hypothesize that the gene function is “Capsid maturation protease”. There were many strong hits with E = 0 in both the NCBI BLASTp and the PhagesDB BLAST with the same function assigned. /note=Transmembrane domains: Analysis run on 3/24/21 shows that TMHMM calls no TMHs. /note=Secondary Annotator Name: Zuelch, Avery /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator based on all of the information presented above.
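The GAP values in these records (for example, -4 bp between gp48 and gp49, and 32 bp before gp50) can be reproduced from the CDS coordinates. A minimal sketch in Python, assuming 1-based inclusive coordinates and that a negative value denotes an overlap; the function name is illustrative and not part of PECAAN:

```python
def intergenic_gap(upstream_stop: int, downstream_start: int) -> int:
    """Return the gap in bp between two adjacent forward-strand CDSs.

    Coordinates are 1-based and inclusive; a negative result is an overlap
    (for example, -4 means the two genes share 4 bp).
    """
    return downstream_start - upstream_stop - 1


# CDS coordinates (start, stop) copied from the gp48-gp50 records in this file.
cds = {
    "gp48": (26282, 26740),
    "gp49": (26737, 28029),
    "gp50": (28062, 29057),
}

for up, down in [("gp48", "gp49"), ("gp49", "gp50")]:
    gap = intergenic_gap(cds[up][1], cds[down][0])
    kind = "overlap" if gap < 0 else "gap"
    print(f"{up} -> {down}: {gap} bp ({kind} of {abs(gap)} bp)")
```

The formula only uses genomic positions, so it also gives the gap between two neighbouring complement genes when the left gene's right end and the right gene's left end are passed in.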
CDS 28062 - 29057 /gene="50" /product="gp50" /function="major capsid protein" /locus tag="PumpkinSpice_50" /note=Original Glimmer call @bp 28062 has strength 14.42; Genemark calls start at 28062 /note=SSC: 28062-29057 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 0.0 GAP: 32 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.713, -3.833788790802489, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Streptomyces phage Karimac] ],,YP_009840223,100.0,0.0 SIF-HHPRED: Major capsid protein Rcc01687; "capsid", "jelly roll", "spike", "HK97", VIRUS; 3.42A {Rhodobacter capsulatus DE442},,,6TSU_T4,96.6767,100.0 SIF-Syn: Major capsid protein; upstream gene is capsid maturation protease (pham 59300), downstream gene is from pham 20141, just like in phage Bordeaux. /note=Primary Annotator Name: Quijada, Britney /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 28062. /note=Phamerator: The pham number as of April 22, 2021 is 15120. The gene is conserved in phages BoomerJr, Bordeaux, and Genie2, all in the same cluster as PumpkinSpice. PhagesDB lists major capsid protein as the function of this gene, and this function is on the approved function list. /note=Starterator: Start site 2 in Starterator was the most manually annotated, in 45/54 non-draft genes in this pham, and corresponds to a start site of 28062 bp in PumpkinSpice. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in the self-trained GeneMark. The chosen start site includes all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.834 with a Z-score of 2.713. These are reasonable scores, the second best among the candidate starts, and this start gives the LORF. /note=Gap/overlap: 32 bp gap.
This relatively small gap is conserved in other phages (Yaboi, Karimac) and does not suggest a new gene, since an ORF shorter than 120 bp may not be a valid gene. GeneMark shows no coding potential within this gap. /note=Location call: Considering all the evidence above, this is a real gene and the most likely start site is at 28062 bp. Starterator's graphical output and summary report agree with GeneMark, Glimmer, and the manual annotations. /note=Function call: There are multiple PhagesDB BLASTp hits with the major capsid protein function; the two smallest e-values are 0 and 1e-153. HHpred has strong hits for the major capsid protein function, with query coverages of 96.6% and 94.8% and e-values of 1.1e-29 and 1.3e-26. CDD had no relevant hits. Multiple NCBI BLASTp hits also list major capsid protein, with e-values such as 0 and 1e-147, in alignments with Streptomyces phage Karimac and Streptomyces phage Muntaha (96%+ coverage, 75%+ identity). The function is therefore major capsid protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Canio, Noah Luke Picart /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 29130 - 29525 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="PumpkinSpice_51" /note=Original Glimmer call @bp 29130 has strength 12.71; Genemark calls start at 29130 /note=SSC: 29130-29525 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.42438E-90 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.888, -2.895726594994071, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675290,100.0,1.42438E-90 SIF-HHPRED: SIF-Syn: NKF; upstream gene is major capsid protein (pham 15120), downstream gene is from pham 56295, just like in phages Bordeaux and Wipeout.
/note=Primary Annotator Name: RAFAEL, ADRIANA NICOLE /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site 29130. /note=Phamerator: Pham 20141 on 4/23/21. Gene length is not conserved: only 20/63 of the phages have a gene length of 396 bp. /note=Starterator: Start site 14, the most called start site, is not called in PumpkinSpice, even though it is manually annotated in 32/49 non-draft genes in Pham 20141. The start site that Starterator calls is Site (12, 29130), which agrees with the start site called by Glimmer and GeneMark (29130). Site 12 is present in only 31.7% of genes in this Pham, but it is called 100% of the time when present. /note=Coding Potential: There is coding potential in the self-trained GeneMark but not in the host-trained GeneMark. The coding potential is on the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: -2.896; this is the best final score on PECAAN. /note=Gap/overlap: 72 bp; this gap is somewhat large, but a gene could not be added to fill it. Note: this start site (29130) is the LORF. /note=Location call: This start site should be kept. It covers all of the coding potential from the GeneMark map and has the best final score and Z-value. It is also the LORF. However, the 72 bp gap is concerning, and, according to Starterator, the most annotated start site (site 14) is not called in PumpkinSpice (see Starterator section). /note=Function call: No known function. NCBI and PhagesDB BLASTp results all list it as a “hypothetical protein”. There were no CDD hits, indicating that no known conserved domains match this gene. Also, the HHpred hits did not have a low enough e-value (< 10^-3) to be considered good evidence, although they did have a good probability (< 90).
Because neither TMHMM nor TOPCONS predicts any transmembrane domains, this protein also does not qualify as a membrane protein. /note=Transmembrane domains: TMHMM did not predict any transmembrane domains; therefore, it does not qualify as a membrane protein. /note=Secondary Annotator Name: HUGO, CRISTELLE LINNETTE /note=Secondary Annotator QC: Great thorough notes! CDS 29583 - 30452 /gene="52" /product="gp52" /function="head-to-tail adaptor" /locus tag="PumpkinSpice_52" /note=Original Glimmer call @bp 29583 has strength 12.32; Genemark calls start at 29583 /note=SSC: 29583-30452 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 0.0 GAP: 57 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.611, -4.699110570850454, yes F: head-to-tail adaptor SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675291,100.0,0.0 SIF-HHPRED: Adaptor protein Rcc01688; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_C,72.3183,99.8 SIF-Syn: Head-to-tail adaptor; upstream gene is from pham 20141, downstream gene is a head-to-tail stopper, just like in phage MindFlayer. /note=Primary Annotator Name: Rivera, Bryanna /note=Auto-annotation start source: Glimmer and GeneMark, which both called the start site at 29583. /note=Phamerator: As of 04/21/21 the pham number is 56295. This gene is conserved in subcluster BE2, and phages “Battuta” and “Genie2” were used for comparison. No function called. /note=Starterator: Start site 2 in Starterator was manually annotated in 49/54 non-draft genes in this pham. This gene has the most annotated start site, which corresponds to the start site at 29583 bp called by Glimmer and GeneMark. /note=Coding Potential: This gene had strong coding potential in the second forward frame. There appeared to be no coding potential in the other two forward frames or in the reverse frames.
Coding potential was displayed only in the self-trained GeneMark. /note=SD (Final) Score: The score of -4.699 is not the lowest negative score present on PECAAN; it is the second-lowest final score, but this start site has the highest Z-score. It also uses ATG, the most common start codon. /note=Gap/overlap: There is an acceptable gap of 57 bp, and this start gives the longest reasonable ORF for this gene call. This gap is also conserved in other phages. /note=Location call: Based on the data listed above, I would not change the start site of 29583. All the evidence collected supports the auto-annotated start site. As of 04/23, the evidence from Phamerator and Starterator again indicates that the auto-annotated start site is the correct call. /note=Function call: Based on all the data collected, I believe that head-to-tail adaptor is the correct function call because 1) the SEA-PHAGES list states that this call requires a 5A21 C or D chain in the macromolecular complex, and one was present in one of the hits' structures; the other hit was named “adaptor protein”, with the keywords “neck”, “portal”, and “capsid”. 2) There were two hits whose e-values, although they could have been lower, still met the threshold of < 10^-3; they also had high probability and high coverage. 3) Based on the notes from the previous module, three of the four best hits had a head-to-tail adaptor function, with strong values that met all the criteria. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Linares Cardona, Ninette /note=Secondary Annotator QC: I agree with this location call. A suggestion would be to state 'Battuta' and 'Genie2' in the Phamerator section instead of also including the gene numbers.
Also, in the Starterator section, you should state that 49 of the 54 non-draft genomes had the most annotated start site. In the SD (Final) score section, it may be useful to include and comment on the Z-score. In the gap/overlap section, you should state whether this gap is reasonable and whether it is conserved in other phages. In the location call section, you should include the start site. CDS 30449 - 30943 /gene="53" /product="gp53" /function="head-to-tail stopper" /locus tag="PumpkinSpice_53" /note=Original Glimmer call @bp 30449 has strength 5.62; Genemark calls start at 30449 /note=SSC: 30449-30943 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Rhodococcus phage NiceHouse]],,NCBI, q1:s2 99.3902% 2.60466E-15 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.21, -4.312910097770006, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Rhodococcus phage NiceHouse]],,QLF83308,61.4286,2.60466E-15 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,96.9512,95.6 SIF-Syn: Head-to-tail stopper; upstream gene is head-to-tail adaptor, downstream gene is in Pham 20616, just like in phages MindFlayer, Genie2, and Karimac. /note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: 30449 Glimmer and GeneMark /note=Phamerator: Pham 9829. Conserved in related phages, including TomSawyer, MindFlayer, and LukeCage. Function: head-to-tail stopper. Date: 4/23/21. /note=Starterator: Site number 3, position 30449 in PumpkinSpice. This is the most annotated start (called in 43 of 65 non-draft genes in the pham). This start site was auto-annotated by Glimmer and GeneMark. /note=Coding Potential: Good atypical forward coding potential in GeneMarkS. The coding potential is contained within the auto-annotated start site but does not cover the entire reading frame.
/note=SD (Final) Score: For the auto-annotated start site, the final score is -4.313 and the Z-score is 2.21. These scores are sufficient and the best possible. /note=Gap/overlap: The 4 bp upstream overlap (-4 gap) could indicate an operon with the previous gene. The overlap is conserved in other final genomes, such as phages LukeCage, MindFlayer, and Karimac. /note=Location call: This gene is real, with a start site of 30,449. /note=Function call: Head-to-tail stopper. There are two significant PhagesDB hits and two significant NCBI BLAST hits with the function "head-to-tail stopper" (e-values from 2e-92 to 2.60466e-15). It meets the specific HHpred alignment requirement set by SEA-PHAGES (alignment with structure 5A21_E). The upstream gene is a head-to-tail adaptor, which would be expected to occur near the head-to-tail stopper. /note=Transmembrane domains: No transmembrane domains predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 30936 - 31724 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="PumpkinSpice_54" /note=Original Glimmer call @bp 30936 has strength 6.45; Genemark calls start at 30936 /note=SSC: 30936-31724 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.491, -4.173268357864672, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675293,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF; upstream gene is in pham 9829, downstream gene is in Pham 77612, just like in phages MindFlayer and Genie2. /note=Primary Annotator Name: Dines, Lily /note=Auto-annotation start source: GeneMark and Glimmer: 30936 /note=Phamerator: Pham 20616 on 04/21/21. Gene conserved in cluster BE phages. No function called. /note=Starterator: Reasonable and conserved start site. Start coordinate: (6, 30936). 36 of 49 genes call site #6.
/note=Coding Potential: Reasonable coding potential is predicted within the ORF by the self-trained GeneMark; there is none in the host-trained GeneMark. The chosen start site covers all coding potential. /note=SD (Final) Score: -4.173; best/lowest score /note=Gap/overlap: -8 bp; reasonable. The LORF creates too large an overlap. /note=Location call: This is a real gene with a start site at 30936. The gene shows strong synteny, the e-values are very low, there is strong coding potential in the self-trained GeneMark, it is an appropriate length, and both GeneMark and Glimmer agree on the start site. Additionally, the gene has a common start codon (the LORF does not) and a common 8 bp overlap (the LORF's overlap is too large). /note=Function call: No program returned any informative results. Therefore, there is no known function. /note=Transmembrane domains: None /note=Secondary Annotator Name: Rivera, Bryanna /note=Secondary Annotator QC: 1) Starterator: state whether the 36/49 genes that called start site 6 were non-drafts or not. 2) State that the SD score is the best and lowest score. (I've made that small change already, but just a heads up.) Other than that, everything looks great! Based on all the data/evidence collected, I agree with the primary annotator that this is a real gene and the auto-annotated start site is the correct call.
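Starterator support is reported in these notes as a count of genes whose manual annotation uses a start site (for example, 49/54 non-draft genes for gp52). A small sketch that turns those counts into percentages so they can be compared across genes; the helper name is hypothetical, and the counts are copied from the gp51 to gp53 Starterator notes:

```python
def start_support(called: int, total: int) -> float:
    """Percent of pham members (as counted by Starterator) that use a given start site."""
    if total <= 0 or not 0 <= called <= total:
        raise ValueError("expected total > 0 and 0 <= called <= total")
    return 100.0 * called / total


# (genes calling the start, non-draft genes) as reported in the Starterator notes.
reports = {
    "gp51, site 14": (32, 49),
    "gp52, site 2": (49, 54),
    "gp53, site 3": (43, 65),
}

for label, (called, total) in reports.items():
    print(f"{label}: {called}/{total} = {start_support(called, total):.1f}%")
```

This pham-wide percentage is only one measure: the gp51 note also uses "called 100% of the time when present", which divides by the genes that contain the site rather than by all non-draft genes, and site 14 is not called in PumpkinSpice even though it has the highest pham-wide support.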
CDS 31721 - 32224 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="PumpkinSpice_55" /note=Original Glimmer call @bp 31721 has strength 12.14; Genemark calls start at 31721 /note=SSC: 31721-32224 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_55 [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 1.12164E-121 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.815, -2.967790469131549, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_55 [Streptomyces phage Wipeout] ],,QGH74301,100.0,1.12164E-121 SIF-HHPRED: SIF-Syn: This gene has no known function and is part of Pham 63963. The gene upstream of it is from Pham 20616, and the gene downstream of it is in Pham 55588 with the function of major tail protein. The same is true in phages IchabodCrane and Battuta. /note=Primary Annotator Name: Zuelch, Avery /note=Auto-annotation start source: Both Glimmer and GeneMark call for a start site of 31721. /note=Phamerator: Pham # 14194 on 4/22/21. Gene is conserved among other BE phages such as Mildred, MindFlayer, and MulchMason. There are 63 members of this Pham, 14 of which are drafts. /note=Starterator: Starterator calls an auto-annotated start at start number 12, position 31721. It is the most manually annotated start, with 19 MAs, and is called 100% of the time when it is present. /note=Coding Potential: No coding potential on GeneMark Host. Both typical and atypical coding potential on GeneMarkS in the second reading frame. The start site covers all of the coding potential. /note=SD (Final) Score: -2.968, which is the best possible. The Z-score is the best available as well. /note=Gap/overlap: Overlap of 4 bp, which is acceptable because it is under the 50 bp suggestion. /note=Location call: Start 12 @31721 by Starterator is the most conserved start throughout the Pham, has 19 MAs, and is called 100% of the time when it is present. It is also the auto-annotated start.
/note=Function call: No known function. Based on all of the data that has been collected, I think that this gene has no known function. All BLAST hits with good E-values and alignments are proteins of unknown function. The CDD report did not yield any hits, and the two hits found by HHpred were very poor, with high E-values and very low probabilities. /note=Transmembrane domains: 0 TMDs. Both TMHMM and TOPCONS showed 0 TMDs. This is consistent with the NKF function call for this gene. /note=Secondary Annotator Name: Chang, Loren /note=Secondary Annotator QC: After reviewing the above evidence, I agree with the primary annotator's location call for this gene. CDS 32331 - 32954 /gene="56" /product="gp56" /function="major tail protein" /locus tag="PumpkinSpice_56" /note=Original Glimmer call @bp 32331 has strength 11.37; Genemark calls start at 32331 /note=SSC: 32331-32954 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Streptomyces phage StarPlatinum] ],,NCBI, q1:s1 100.0% 1.06806E-146 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.713, -3.833788790802489, no F: major tail protein SIF-BLAST: ,,[major tail protein [Streptomyces phage StarPlatinum] ],,YP_009839495,99.5169,1.06806E-146 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_H,92.2705,99.3 SIF-Syn: This gene is a major tail protein with pham number 55588, the upstream gene is NKF with pham number 63963, and the downstream gene is a tail assembly chaperone with pham number 5495, just like in phages Starbow, MindFlayer, Karimac, and IchabodCrane. /note=Primary Annotator Name: Beaudin, Catherine /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 32,331. /note=Phamerator: Pham 55588 as of 04/23/2021.
The gene is conserved in phages IchabodCrane, Bordeaux, and Starbow, which are all in the same subcluster as PumpkinSpice. The function call for this gene is major tail protein, and it is consistent between Phamerator and the phams database. It is on the approved SEA-PHAGES list. /note=Starterator: Start site 1 in Starterator was manually annotated in 51/53 non-draft genes in this pham and was called 100% of the time when present. Start 1 is 32331 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential in the self-trained GeneMark, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.834. It is the second-best final score on PECAAN. The Z-score of 2.713 is the second highest. This is high enough to suggest the presence of a credible ribosome binding site. /note=Gap/overlap: The 106 bp gap is somewhat large but ultimately reasonable, because the gap is conserved in other phages (LukeCage, Karimac) and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32,331. Glimmer, GeneMark, and Starterator all agree. /note=Function call: Major tail protein. The top three PhagesDB hits are assigned the major tail protein function and have high query coverage (100%), high percent identity (94%+), and low e-values (1e-115). The top two NCBI BLAST hits are major tail proteins with high query coverage (100%), high percent identity (94%+), and low e-values (<2e-140). No hits were returned from CDD. HHpred has a hit for major tail protein with a probability of 99.3, 92% coverage, and an e-value of 7.6e-10, a significant hit that supports the function call. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs for this gene; therefore, it is not a membrane protein.
This is consistent with the hypothesized function of major tail protein for this gene. /note=Secondary Annotator Name: Rivera, Bryanna /note=Secondary Annotator QC: Based on all the data/evidence gathered, I agree with the primary annotator that this is a real gene and the auto-annotated start call is accurate. Great job! CDS 33051 - 33401 /gene="57" /product="gp57" /function="tail assembly chaperone" /locus tag="PumpkinSpice_57" /note=Original Glimmer call @bp 33051 has strength 16.52; Genemark calls start at 33051 /note=SSC: 33051-33401 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 7.33574E-77 GAP: 96 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.815, -4.0178693334748665, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Streptomyces phage Karimac] ],,YP_009840230,100.0,7.33574E-77 SIF-HHPRED: Phage_TAC_12 ; Phage tail assembly chaperone protein, TAC,,,PF12363.9,70.6897,98.4 SIF-Syn: The genes upstream and downstream had no known functions when compared to Starbow, despite having functions in PumpkinSpice (upstream being major tail protein and downstream being tail assembly chaperone). /note=Primary Annotator Name: Bhatnagar, Keshav /note=Auto-annotation start source: Glimmer and GeneMark agree at start site of 33,051. /note=Phamerator: As of April 22nd, the gene is in pham 5495. The gene is conserved among the same and different subclusters to which my phage belongs, such as Annadreamy (BK cluster), Battuta (BE cluster), Birchlyn (BE/same cluster), and Cross (BE/same cluster). Neither Phamerator nor the phams database listed a function for this gene; however, PhagesDB identifies a similar gene in other phages as a tail assembly chaperone. /note=Starterator: There is a reasonable start site conserved among members of the same and different subclusters. The conserved start is start site number 3 in the pham; in my phage, start site 3 is at position 33,051.
There are 63 members in this pham; 49/49 of the final genes call this start site, while none of the draft genomes call it. /note=Coding Potential: Strong coding potential on the forward strand in the self-trained GeneMark. The start site covers all coding potential. /note=SD (Final) Score: The final score is -4.018. It is not the best SD score, but it is strong enough to suggest the presence of a credible RBS. /note=Gap/overlap: The 96 bp gap is too short for a gene to be inserted (protein-coding genes are generally at least around 200 bp long). Alternative candidates have smaller ORFs. /note=Location call: It is likely a real gene with a start at 33,051. It is conserved in Phamerator and Starterator, has strong RBS and Z-scores and a small gap, and covers all the coding potential. /note=Function call: The top 2 NCBI BLASTp hits, sorted by E-value, suggest the function tail assembly chaperone, with high query coverage (>99.1%), high % identity (>99.1%), and low E-values (2.2e-75). The top 2 PhagesDB BLAST hits had small e-values and listed the function as tail assembly chaperone. Finally, CDD didn't have any hits, but HHpred did, with a 90.2% probability, a coverage of 85%, and a high e-value of 7.2 on PECAAN. Based on this data, I predict the function to be a tail assembly chaperone.
/note=Transmembrane domains: I would have expected the presence of a TMD, but because neither TMHMM nor TOPCONS predicts one, this tail assembly chaperone likely does not interact with the cell membrane. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 33431 - 33703 /gene="58" /product="gp58" /function="tail assembly chaperone" /locus tag="PumpkinSpice_58" /note=Original Glimmer call @bp 33431 has strength 8.5; Genemark calls start at 33461 /note=SSC: 33431-33703 CP: yes SCS: both-gl ST: SS BLAST-Start: [tail assembly chaperone [Streptomyces phage Genie2] ],,NCBI, q1:s1 100.0% 2.48552E-49 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.449, -5.823649089815757, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Streptomyces phage Genie2] ],,QAY08721,94.382,2.48552E-49 SIF-HHPRED: SIF-Syn: tail assembly chaperone, upstream gene is tail assembly chaperone, downstream gene is tape measure protein, just like in phage BoomerJR. /note=Primary Annotator Name: Billings, Sophie /note=Auto-annotation start source: Glimmer and GeneMark disagree: the GeneMark start site is at 33461 and the Glimmer start site is at 33431. /note=Phamerator: Pham 22821. Date 4/21/21. It is conserved; found in Battuta (BE), Bmoc (BE), Evy (BE), and Genie2 (BE). /note=Starterator: Start site 4 in Starterator was manually annotated in 23/44 non-draft genes in this pham. Start 4 is 33431 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is not found in GeneMark Host but is found in GeneMark Self. /note=SD (Final) Score: -5.824. It is the best final score and has the best Z-score, 1.449. /note=Gap/overlap: Gap: 29 bp.
Fairly small and reasonable, because the gap is conserved in other phages (StarPlatinum, Bordeaux) and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 33431. /note=Function call: Tail assembly chaperone. Top PhagesDB BLAST hits have the function tail assembly chaperone (E-value <2e-39), and 3 of the top 6 NCBI BLAST hits also have the function tail assembly chaperone (100% coverage, 91% identity, and E-value <3e-49). CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kelly, Samuel /note=Secondary Annotator QC: I agree with this location call. CDS 33796 - 40095 /gene="59" /product="gp59" /function="tape measure protein" /locus tag="PumpkinSpice_59" /note=Original Glimmer call @bp 33796 has strength 11.25; Genemark calls start at 33796 /note=SSC: 33796-40095 CP: no SCS: both ST: SS BLAST-Start: [tape measure protein [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 0.0 GAP: 92 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.357, -6.015902309101854, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Streptomyces phage Wipeout] ],,QGH74305,100.0,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_CF,41.4483,99.8 SIF-Syn: [tape measure protein, upstream gene belongs to pham 22821 (tail assembly chaperone - PumpkinSpice; 05/14/2021); downstream gene belongs to pham 7753 (minor tail protein – PumpkinSpice; 05/14/2021), matching phages StarPlatinum/Battuta.] /note=Primary Annotator Name: Bruns, James /note=Auto-annotation start source: Both Glimmer and GeneMark concur with a start at 33796.
/note=Phamerator: As of 04/22/2021 the gene belongs to pham 20865 and is present in non-draft phages Wipeout and TomSawyer, which also belong to cluster BE. /note=Starterator: Start 3, called in 40 of 49 non-draft genes, is located at 33796 bp in PumpkinSpice. Both GeneMark and Glimmer also call this start, which agrees with the most annotated start given by Starterator. /note=Coding Potential: Both typical and atypical coding potential are present in the self-trained GeneMark analysis. The gene predicted by the self-trained GeneMark covers most, but not all, of the coding potential present. /note=SD (Final) Score: The score is -6.016, which is not the lowest negative score present on PECAAN. With the other supporting evidence, however, this is the likely start site. /note=Gap/overlap: There is a 92 bp gap, and this start gives the longest reasonable ORF for this gene call. Based on synteny comparisons (Bordeaux/IchabodCrane), the gap was considered correct. /note=Location call: Based on the data listed above, this is highly likely to be a true gene with a start site of 33796. /note=Function call: Tape measure protein. Both PhagesDB and NCBI BLASTp queries resulted in hits with E-values of 0.0 and 100% identity/coverage. CDD and HHpred searches concurred with this hypothetical function, with E-values < 1e-11. /note=Transmembrane domains: TMHMM analysis shows 4 TMDs, and TOPCONS confirms this with hits in every analysis. Therefore, this is a membrane protein, but the functional call of tape measure protein is more specific. /note=Secondary Annotator Name: Rafael, Adriana Nicole /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator.
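Gene lengths in these notes (for example, the "gene length of 396" quoted for gp51) follow directly from the CDS coordinates, and the same arithmetic gives the protein size of the 6300 bp tape measure gene. A minimal sketch, assuming 1-based inclusive coordinates whose stop coordinate includes the stop codon, as is usual for GenBank CDS features; both function names are hypothetical:

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS with 1-based inclusive coordinates (either strand)."""
    return abs(stop - start) + 1


def protein_length(start: int, stop: int) -> int:
    """Number of amino acids encoded, assuming the stop coordinate includes the stop codon."""
    length = cds_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"{length} bp is not a whole number of codons")
    return length // 3 - 1  # the last codon is the stop codon


# Coordinates copied from the CDS lines for gp51 and gp59.
for name, (start, stop) in {"gp51": (29130, 29525), "gp59": (33796, 40095)}.items():
    print(f"{name}: {cds_length(start, stop)} bp, {protein_length(start, stop)} aa")
```

The multiple-of-3 check is a cheap sanity test when a start site is moved by hand: a candidate start in the same reading frame as the stop always keeps the length divisible by 3.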
CDS 40092 - 40487 /gene="60" /product="gp60" /function="minor tail protein" /locus tag="PumpkinSpice_60" /note=Original Glimmer call @bp 40092 has strength 11.5; Genemark calls start at 40092 /note=SSC: 40092-40487 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Streptomyces phage LukeCage] ],,NCBI, q1:s1 100.0% 9.66796E-88 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -3.544192673744953, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Streptomyces phage LukeCage] ],,YP_009839985,100.0,9.66796E-88 SIF-HHPRED: Phage_min_tail ; Phage minor tail protein,,,PF05939.14,80.916,98.3 SIF-Syn: Minor tail protein; upstream gene is tape measure protein, downstream is minor tail protein, just like in phage Cross (BE1). /note=Primary Annotator Name: Canio, Noah /note=Auto-annotation start source: Glimmer and GeneMark call start site at 40092 bp. /note=Phamerator: Pham 7753. Date: 04/21/2021. This Pham is conserved in other BE cluster phages (Battuta and Birchlyn used for comparison). The function is called "minor tail protein" in a few BE phages, but no function is called for most BE phages. /note=Starterator: Start site 4 is conserved in 16/51 non-draft phage annotations. The site corresponds to 40092 bp in PumpkinSpice. /note=Coding Potential: Coding potential in the ORF is on the forward strand, and it is all covered by the chosen start site. Good coding potential on Self, but not sufficient on Host. /note=SD (Final) Score: -3.544. It doesn't have the best final score. However, this is less important because this start gives a 4 bp overlap with the preceding gene. /note=Gap/overlap: The overlap is reasonable given that it is 4 bp, which suggests that this gene is part of an operon. /note=Location call: Based on the information provided, this is a real gene and its start site is 40092 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Minor tail protein.
While they are not the top hits, high-quality PhagesDB BLAST hits have the function "minor tail protein" (E-value < 1e-55). Of the top 5 NCBI BLASTp hits, the top 3 are of unknown function (hypothetical protein), while the lower 2 are called minor tail protein. Each hit has high query coverage (100%), high percent identity (>90%), and low E-values (between 4e-93 and 7e-87). There are no relevant hits from CDD, while the best HHpred hit identifies the ORF as a minor tail protein with 98.3% probability, 80.916% coverage, and an e-value of 9.6e-7. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. Therefore, it cannot be identified as a transmembrane protein based on this information. /note=Secondary Annotator Name: Howe, Kathryn /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. CDS 40491 - 43688 /gene="61" /product="gp61" /function="minor tail protein" /locus tag="PumpkinSpice_61" /note=Original Glimmer call @bp 40491 has strength 7.96; Genemark calls start at 40491 /note=SSC: 40491-43688 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.888, -3.916915894064009, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Streptomyces phage IchabodCrane] ],,QFP97376,100.0,0.0 SIF-HHPRED: b.106.1.1 (A:2-180) Baseplate protein gpP {Shewanella oneidensis [TaxId: 70863]},,,d3cdda2,17.3709,98.5 SIF-Syn: Minor tail protein, upstream gene belonged to pham 7753, downstream gene belonged to pham 22098, just like in Battuta /note=Primary Annotator Name: Castillo, Salvador /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 40491. /note=Phamerator: Pham 4170 (4/23/21). Phages Battuta and BoomerJR share this gene and its gene length. 
The genes in this pham are consistently called minor tail proteins. /note=Starterator: Start site 3, at position 40491 bp in this phage, is the conserved start site. There are 63 members in this pham, and 32 of the 63 call site 3. /note=Coding Potential: Coding potential is present only in the forward direction in Self-trained GeneMark, and the start covers all of it, although the coding potential may be weak at the beginning. /note=SD (Final) Score: The final score, -3.917, is the eighth-best but still reasonable. The Z-score, 2.888, is the fourth-best. This is acceptable because this start gives the LORF and covers all of the coding potential. /note=Gap/overlap: The gap is only 3 bp and cannot be filled with a different start site. /note=Location call: This is a real gene: it has good coding potential and synteny with other phages, such as Battuta and BoomerJR, which share its gene length and pham. The suggested start site has the best evidence, as it gives the LORF, covers all of the coding potential, and is the site called most often in Starterator, with the most manual annotations. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp returned hits with an e-value of 0, >99% query coverage, and >99% identity, all with the function minor tail protein. HHpred returned some hits with decent e-values (below 1.9e-5), 98% probability, and coverage greater than 17%, calling it a type of tail protein. This suggests that it is a minor tail protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS calls any TMDs, so this protein does not appear to interact with the membrane. /note=Secondary Annotator Name: Delgado, Yennifer /note=Secondary Annotator QC: The SD score is actually the third-best score. This might not be very relevant, but I wanted to point it out. I have QC'ed this location call and agree with the first annotator. 
CDS 43725 - 44093 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="PumpkinSpice_62" /note=Original Glimmer call @bp 43725 has strength 10.7; Genemark calls start at 43725 /note=SSC: 43725-44093 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.65888E-75 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.561, -6.242072432500853, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675378,100.0,7.65888E-75 SIF-HHPRED: SIF-Syn: This gene is NFK, upstream gene is a minor tail protein, downstream is a minor tail protein, just like in phage Starbow /note=Primary Annotator Name: Cervantes, Richard /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 43725. /note=Phamerator: Pham 22098 (date 4/24/2021). Conserved in the following phages: BoomerJR (BE) and Comrade (BE). /note=Starterator: Start site 4 was called in 49/49 non-draft genes for Gene 61 in PumpkinSpice. /note=Coding Potential: There is coding potential in the self-trained GeneMark but none in the host-trained GeneMark. This coding potential is on the forward strand. /note=SD (Final) Score: The RBS final score for the called start is -6.242, which is not the best. Start site 43899 has the highest Z-score and a slightly better final score of -6.098. However, the called start site of 43725 gives the longest ORF and uses the preferred start codon, so I deem it the best start. /note=Gap/overlap: 36 bp gap, which is close to the ~30 bp threshold for a reasonable gap. /note=Location call: Based on all of the evidence above, such as Starterator and coding potential, I believe this gene is real and the start is 43725. /note=Function call: There were no hits in CDD. The HHpred hits for both PDB and Pfam were very large (both above 300), and their coverage was very low (at most 33%). Thus, this gene has no known function. 
/note=Transmembrane domains: No TMDs were called by TMHMM or TOPCONS. /note=Secondary Annotator Name: Rivera, Bryanna /note=Secondary Annotator QC: 1) Phamerator: State whether or not the gene has a function. 2) SD Score: When stating that another start site has a better RBS score and Z-score, add a few more notes about why you still think your start is the best even though its score is not the best. For example, discuss whether it gives the longest ORF and what its start codon is. What sets this start apart from the one with slightly better scores? Other than that, I agree that this is a real gene and that this is the correct start call. Good job! CDS 44090 - 44887 /gene="63" /product="gp63" /function="minor tail protein" /locus tag="PumpkinSpice_63" /note=Original Glimmer call @bp 44090 has strength 11.79; Genemark calls start at 44090 /note=SSC: 44090-44887 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.11, -4.793289943653216, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Streptomyces phage Starbow] ],,AXH66572,100.0,0.0 SIF-HHPRED: SIF-Syn: Pham 20295 is also present in other BE2 phages like LukeCage and Starbow. In all of these phages, pham 20295 is also adjacent to pham 22098 upstream and pham 9209 downstream. Pham designations as of 05/28/2021. /note=Primary Annotator Name: Chang, Loren /note=Auto-annotation start source: Both Glimmer and GeneMark predict the start site to be at 44090 bp. /note=Phamerator: Pham 20295 (04/23/2021). It is conserved; found in Battuta (BE2), Birchlyn (BE2), and BoomerJr (BE2). /note=Starterator: Start site 3 in Starterator was manually annotated in 41 out of 63 non-draft genes in this pham. Start site 3 is position 44090 in PumpkinSpice. This agrees with the auto-annotated start site predicted by Glimmer and GeneMark. 
/note=Coding Potential: At this ORF, host-trained GeneMark shows no coding potential, while self-trained GeneMarkS shows good coding potential (in a forward reading frame). All of the coding potential is covered. /note=SD (Final) Score: The final score is -4.793. It is the highest final score on PECAAN. /note=Gap/overlap: There is an overlap of 4 bp. This overlap is not large enough to be problematic, and it is conserved in other phages such as MindFlayer and StarPlatinum. It may indicate that the gene is part of an operon. /note=Location call: Given the above evidence, this is a real gene, and the most probable start site is at 44090 bp. /note=Function call: Minor tail protein. The top PhagesDB hits (E-values 1e-147) are all associated with minor tail proteins. The top NCBI hits (100% coverage, 98.5%+ identity, E-values ~0) are also all associated with minor tail proteins. CDD gives no significant hits. Finally, HHpred gives a number of significant hits. The top two are associated with the Lynase N protein (probability 98.04%, E-value 2.9e-4, coverage 58.3%) and Endoglucanase H, a sugar-binding protein (probability 97.98%, E-value 3.6e-4, coverage 51.3%). The PhagesDB and NCBI BLAST results both suggest that the function should be minor tail protein, while the two most significant HHpred hits give varying functions. Thus, the preponderance of evidence suggests that the function is minor tail protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any transmembrane domains for this protein. Some minor tail proteins have transmembrane domains while others do not, so this result is not unexpected. /note=Secondary Annotator Name: Liu, Lily /note=Secondary Annotator QC: All of the evidence categories have been considered, and I agree with this annotation. 
CDS 44884 - 50898 /gene="64" /product="gp64" /function="minor tail protein" /locus tag="PumpkinSpice_64" /note=Original Glimmer call @bp 44884 has strength 7.87; Genemark calls start at 44884 /note=SSC: 44884-50898 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Streptomyces phage Wipeout]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.879, -5.064680329293257, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Streptomyces phage Wipeout]],,QGH74310,99.8503,0.0 SIF-HHPRED: Receptor-type tyrosine-protein phosphatase delta; Trans-synaptic complex, Synapse organizer, HYDROLASE-IMMUNE SYSTEM complex; HET: BMA, MAN, NAG; 4.4A {Mus musculus},,,4YH7_A,22.505,99.8 SIF-Syn: As of 5/14/21: Minor tail protein, upstream is pham 20295, downstream is 55842, just like in Boomer Jr. /note=Primary Annotator Name: Castillo, Salvador /note=Auto-annotation start source: Both Glimmer and GeneMark predict the start site to be at 44884 bp. /note=Phamerator: Pham 9209 (4/23/21). Compared with BoomerJR and Birchlyn, this gene shares the same gene length and overlap. The function of this gene is consistently minor tail protein. /note=Starterator: As of 4/21/21, start site 2 was called most often, in 33 of the 49 non-draft genes in the pham. In this phage, start site 2 is at 44884. /note=Coding Potential: At this ORF, the host-trained GeneMark shows no coding potential, while the self-trained GeneMark shows good coding potential in the forward reading frame. /note=SD (Final) Score: The final score is -5.065, the 21st lowest score on PECAAN for this gene. This may be less important because the gene may be part of an operon, as evidenced by the 4 bp overlap, so the score is reasonable. This start also gives the LORF. /note=Gap/overlap: There is an overlap of 4 bp. This overlap is not large enough to be problematic, and it is conserved in other phages such as MindFlayer. 
/note=Location call: Given the above evidence, this is a real gene: it is conserved among members of its pham, and the 4 bp overlap is conserved in the phages I compared it with. The start site, 44884, is conserved in Starterator; it is called most often and has the most manual annotations. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp returned hits from other phages with >99% query coverage, an e-value of 0, and >99% identity, with the function minor tail protein. HHpred and CDD have some significant hits, but they are not phage proteins. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, this protein is not a membrane protein and does not interact with the membrane, which is consistent with the function call. /note=Secondary Annotator Name: Do, Vivian /note=Secondary Annotator QC: I agree with the location call. I would just add details about this start site also giving the longest ORF, and specify which frame is being referenced when discussing the Coding Potential. I would also discuss how the Z-score and final score are not the best but are reasonable. 
CDS 50970 - 51173 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="PumpkinSpice_65" /note=Original Glimmer call @bp 50973 has strength 12.65; Genemark calls start at 50973 /note=SSC: 50970-51173 CP: no SCS: both-cs ST: NI BLAST-Start: [hypothetical protein HWB80_gp214 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 4.4539E-38 GAP: 71 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.049, -2.8298788511194775, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp214 [Streptomyces phage Karimac] ],,YP_009840238,100.0,4.4539E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: AF /note=Start: 8 @50970 has 32 MAs (out of 55 non-draft members) /note=best RBS score CDS 51148 - 51591 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="PumpkinSpice_66" /note=Original Glimmer call @bp 51148 has strength 1.5; Genemark calls start at 51148 /note=SSC: 51148-51591 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp213 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 9.54366E-105 GAP: -26 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.287, -4.134629096801125, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp213 [Streptomyces phage Karimac] ],,YP_009840239,100.0,9.54366E-105 SIF-HHPRED: SIF-Syn: Gene in pham 62410 is upstream of gene in pham 13787 (NKF) and downstream of gene in pham 55842 just like in phages Battuta, Birchlyn, Bordeaux, and BoomerJR. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 51148. /note=Phamerator: Pham 59869. Date: 04/23/2021. It is conserved; found in Battuta_66, Birchlyn_65, and IchabodCrane_65. /note=Starterator: Start site 25 in Starterator was manually annotated in 17/49 non-draft genes in this pham. Start 25 is 51148 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: -4.135. It is the second-best final score on PECAAN. /note=Gap/overlap: -26 bp. The overlap is somewhat large, but reasonable because it is conserved in other phages (BoomerJR, Battuta). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 51148. /note=Function call: NKF. HHpred provided only 1 good hit, while CDD showed no significant hits. In addition, the top two PhagesDB BLAST hits have unknown function (E-value = 2e-82), and the top 3 NCBI BLAST hits also have unknown function (hypothetical protein) with 100% coverage, 99%+ identity, and E-values < 6e-104. Thus, the function of this protein is very likely unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Ali Pour, Paria /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator. CDS 51552 - 51992 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="PumpkinSpice_67" /note=Original Glimmer call @bp 51552 has strength 8.24; Genemark calls start at 51552 /note=SSC: 51552-51992 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.05501E-105 GAP: -40 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675381,100.0,2.05501E-105 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dines, Lily /note=Auto-annotation start source: 51552 /note=Phamerator: Pham 13787 (04/21/21). Highly conserved; compared with Battuta, Birchlyn, and Bmoc. One of these calls it a minor tail protein. /note=Starterator: Reasonable start site, conserved. 
Start coordinate: (1, 51552). 48/49 call site #6. /note=Coding Potential: Strong coding potential in self-trained GeneMark covers the entire ORF, and the start site covers all of the coding potential. Host-trained GeneMark does not show coding potential. /note=SD (Final) Score: -3.095; best final score. /note=Gap/overlap: -40 bp; a fairly long overlap, but reasonable. The alternative candidate has a 317 bp gap, and the chosen candidate has strong synteny. /note=Location call: The evidence suggests that the original start site is the real start site. This gene is real, and the start site is 51552. It is odd that GeneMark does not show its predicted start site on its maps, but this start covers all of the coding potential, has synteny, and has the highest final score and Z-score. It is highly conserved among start sites in the pham and among cluster BE phages. /note=Function call: The programs yielded no useful results. Therefore, there is no known function. /note=Transmembrane domains: None /note=Secondary Annotator Name: Canio, Noah /note=Secondary Annotator QC: Some comments: 1.) Auto-annotation start site: say whether it's called by GeneMark, Glimmer, or both. 2.) Phamerator: state the cluster of the phages you compared. 3.) Do 48/49 non-draft or total phage genomes call the site? (Other than those, I agree with this annotation. All of the evidence categories have been considered.) 
CDS 52115 - 52276 /gene="68" /product="gp68" /function="HNH endonuclease" /locus tag="PumpkinSpice_68" /note=Original Glimmer call @bp 52142 has strength 4.36; Genemark calls start at 51992 /note=SSC: 52115-52276 CP: no SCS: both-cs ST: SS BLAST-Start: [HNH endonuclease [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 4.59302E-30 GAP: 122 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.113, -5.740639577273899, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Streptomyces phage Starbow] ],,AXH66577,100.0,4.59302E-30 SIF-HHPRED: SIF-Syn: Gene 67 belongs to Pham 57275, with an HNH endonuclease function; the upstream gene belongs to Pham 61910, with a metallophosphoesterase function, and the downstream gene belongs to Pham 65303, with no known function. This pattern of synteny is also seen in BE2 phages Birchlyn and LukeCage. /note=AF: Other SSs are available, but other members of the pham tend to choose the start at 52,276. This also corresponds with coding potential (earlier start sites include a lot of non-coding space). This start also has better RBS stats. /note=Primary Annotator Name: Do, Vivian /note=Auto-annotation start source: Glimmer called a start site of 52142 and GeneMark called a start site of 51992. /note=Phamerator: As of 04.21.21, this gene belongs to Pham 57275. This gene is decently conserved, with 285 bp, but there are some differences. The noted function is HNH endonuclease. /note=Starterator: As of 04.21.21, the most manually annotated start site was start site 12, with 27 manual annotations. This corresponds to a start site of 51992, which is different from the Glimmer auto-annotated site. /note=Coding Potential: There is coding potential in the self-trained GeneMark in the 2nd ORF of the forward strand, but not in the host-trained GeneMark. /note=SD (Final) Score: -6.366. Although this is far from the best SD score, it is the only start that covers the entirety of the normal coding potential. /note=Gap/overlap: -1 bp. 
This small overlap is fairly well conserved among genes in this cluster; it can also be seen in Birchlyn and MindFlayer. /note=Location call: The start site has been changed from the original 52142 to 51992 in order to encompass all of the observed coding potential. It also gives the longest ORF, with a 1 bp overlap and decent Z-scores and RBS final scores. The SD final score is not good, but this is the only start that covers all of the normal coding potential, and it corresponds to the most manually annotated start site. /note=Function call: HNH endonuclease. Our data show that the function of this gene is most likely HNH endonuclease: the e-values are significantly below 10e-7, the lengths match, and the top hits are from the same host and cluster. The top CDD hits were HNH endonucleases; the top HHpred hits varied a little more, but most pointed toward an endonuclease and either mentioned no specific domain or an HNH domain. All top CDD and HHpred hits had a probability greater than 99%, coverage greater than 50%, and e-values significantly lower than 10e-3. The top HHpred hits noted an endonuclease with no specified domain, but hits with slightly lower coverage (47%) noted an HNH domain. Therefore, in conjunction with our other data, we conclude that its function is HNH endonuclease. /note=Transmembrane domains: No transmembrane domains were predicted by either TMHMM or TOPCONS, meaning it does not meet the criteria to be classified as a membrane protein. This is consistent with the hypothesized function, as HNH endonucleases are involved in DNA recombination. /note=Secondary Annotator Name: Namaganda, Samali /note=Secondary Annotator QC: It looks amazing; good job! I agree with the location call and the data. 
CDS 52317 - 53078 /gene="69" /product="gp69" /function="metallophosphoesterase" /locus tag="PumpkinSpice_69" /note=Original Glimmer call @bp 52317 has strength 6.97; Genemark calls start at 52317 /note=SSC: 52317-53078 CP: yes SCS: both ST: SS BLAST-Start: [metallophosphoesterase [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 0.0 GAP: 40 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.203, -2.2364030944336752, yes F: metallophosphoesterase SIF-BLAST: ,,[metallophosphoesterase [Streptomyces phage Karimac] ],,YP_009840242,100.0,0.0 SIF-HHPRED: Hypothetical protein yfcE; yfce, structural genomics, phosphoesterase, PSI, Protein Structure Initiative, Midwest Center for Structural Genomics, MCSG, UNKNOWN FUNCTION; HET: SO4; 2.25A {Escherichia coli} SCOP: d.159.1.7,,,1SU1_C,93.6759,99.7 SIF-Syn: /note=Primary Annotator Name: Haeri, Alliya /note=Auto-annotation start source: Both Glimmer and GeneMark call this gene with a start site of 52,317. /note=Phamerator: Pham 57457. Date: 04/24/2021. It is conserved and is found in phages LukeCage (BE2) and Bordeaux (BE2). /note=Starterator: Start site 118 is found in only 38/685 phages within the pham, but it is found in all manual annotations for phages in subclusters BE1 and BE2 (33/33 phages). Start 118 is 52,317 in PumpkinSpice. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Coding Potential: Coding potential for this gene is only in the forward direction, suggesting that this is a forward gene. The coding potential is found only in Self-Trained GeneMark. /note=SD (Final) Score: -2.236. It is the best RBS score, with a good Z-score. /note=Gap/overlap: The gap is 40 bp. The start site covers all of the coding potential, and the gap is conserved across other phages, such as LukeCage and StarPlatinum. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site of 52,317. /note=Function call: Metallophosphoesterase. 
Both the top PhagesDB BLAST hits (e-value99%) have metallophosphoesterase listed as their function. There were no significant CDD hits. The HHpred hits with strong e-values and coverage all identified domains from organisms very distant from phages (e.g., humans or various amoeba parasites). As such, the BLAST hits were the primary evidence for the function call. /note=Transmembrane domains: There were no hits for a transmembrane domain by either TMHMM or TOPCONS, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Torres, Canela /note=Secondary Annotator QC: I agree with the primary annotator's location and functional calls. CDS 53129 - 53380 /gene="70" /product="gp70" /function="holin" /locus tag="PumpkinSpice_70" /note=Original Glimmer call @bp 53129 has strength 8.1; Genemark calls start at 53129 /note=SSC: 53129-53380 CP: yes SCS: both ST: SS BLAST-Start: [holin [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.24541E-49 GAP: 50 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.556, -3.859592806742189, yes F: holin SIF-BLAST: ,,[holin [Streptomyces sp. JV178] ],,WP_099970947,100.0,1.24541E-49 SIF-HHPRED: Phage_r1t_holin ; Putative lactococcus lactis phage r1t holin,,,PF16945.6,81.9277,99.9 SIF-Syn: Holin, upstream gene is metallophosphoesterase, downstream gene is Pham 55493, just like in phage Battuta. /note=Primary Annotator Name: Howe, Kathryn /note=Auto-annotation start source: Both Glimmer and GeneMark identified 53129 as the start site. /note=Phamerator: As of April 22, 2021, this gene was part of Pham 57925. This gene is conserved among other members of cluster BE, such as Battuta and IchabodCrane. Phamerator listed minor tail protein as the function of this gene; however, in PhagesDB, the notes for many similar phages list the function as holin. Both of these functions are on the approved function list. 
/note=Starterator: The most conserved start site for this gene is start site 59, which corresponds to 53129 in PumpkinSpice. Start site 59 was called in 242 of 281 non-draft phage annotations. /note=Coding Potential: There is a lot of good coding potential in the Self-trained GeneMark. The chosen start site covers all of the coding potential. /note=SD (Final) Score: The SD final score for the suggested start at 53129 is -3.860, which is the best. /note=Gap/overlap: The gap with the upstream gene is only 50 bp, which is reasonable. This gap is also conserved in other BE2 phages, such as BoomerJR and IchabodCrane. /note=Location call: Using the information gathered so far, it is safe to call this a real gene. Starterator shows that the auto-annotated start site is conserved among similar phages. The auto-annotated start also has the best final score. This start site also gives the longest open reading frame and has the best Z-score, 2.556. Using this information, the most probable start site is at bp coordinate 53129. /note=Function call: Based on the evidence from PhagesDB BLAST and NCBI BLAST, we can hypothesize that the function of this gene is holin. There are numerous alignments, such as those in MindFlayer, Battuta, LilMartin, and Gudmit, that call holin as the function, fall within the e-value cutoff, and have high query coverage and high identity values. HHpred and CDD hits with good e-values also indicate that the function is holin. /note=Transmembrane domains: Based on the TMHMM and TOPCONS data, this gene does not code for a transmembrane protein. This is somewhat unexpected, since holins are typically small membrane proteins with one or more transmembrane domains. /note=Secondary Annotator Name: Quijada, Britney /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I would suggest mentioning that this is the true LORF, and citing the Z-score, to strengthen the evidence. 
You should also mention which phages (at least 2) have the gap conserved! CDS 53367 - 53546 /gene="71" /product="gp71" /function="helix-turn-helix DNA binding domain" /locus tag="PumpkinSpice_71" /note=Original Glimmer call @bp 53367 has strength 7.04; Genemark calls start at 53394 /note=SSC: 53367-53546 CP: yes SCS: both-gl ST: SS BLAST-Start: [helix-turn-helix DNA-binding protein [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 1.5037E-36 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.77, -3.123974469270304, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA-binding protein [Streptomyces phage IchabodCrane] ],,QFP97386,100.0,1.5037E-36 SIF-HHPRED: Transcription initiation factor IIE, alpha subunit; ZINC FINGER, Transcription; HET: ZN; NMR {Homo sapiens} SCOP: g.41.3.1,,,1VD4_A,66.1017,98.4 SIF-Syn: /note=AF: hit to 3GA8_A sufficient for HTH functional call? /note=Primary Annotator Name: Hugo, Cristelle /note=Auto-annotation start source: Glimmer 53367; GeneMark 53394 /note=Phamerator: 4/24/21 Pham: 55493. It is conserved in 21/21 of the BE2 phages. No function listed. /note=Starterator: Start site 17 was manually annotated in 26/42 non-draft genes in this pham. It is 53367 in PumpkinSpice and agrees with the site predicted by Glimmer. The choice was between sites 17 and 18. Within cluster BE2, site 17 was chosen 10 times and site 18 was chosen 5 times. When start 18 was present, it was called 47.4% of the time; when start 17 was present, it was called 75.6% of the time. This suggests that start site 17 is preferable to site 18. /note=Coding Potential: Good coding potential on the forward strand in self-trained GeneMark, with the start site covering all of the potential. /note=SD (Final) Score: -3.124; this is the best score. /note=Gap/overlap: -14 bp. There is no upstream gap, because the start site (53367) causes an overlap. Other phages in the cluster have the same overlap. /note=Location call: This is a real gene, and the start site may be at 53367. 
This needs to be double-checked, though. /note=Function call: Possibly a helix-turn-helix DNA-binding protein? NKF to be safe. There are many results that all have very high probabilities (>90%), low e-values, and good coverage (>45%). The last module somewhat suggested that it could be a helix-turn-helix protein, but the evidence for that was not strong: that function was reported in only 1 phage, while the others all had NKF. In PhagesDB, 4 non-draft phages had the exact same score and E-value, and 2 of those reported the function helix-turn-helix DNA-binding protein. HHpred showed many good results that all meet the evidence criteria and are fairly close in e-value. A good number of them were related to zinc, but others were not. There does not seem to be enough evidence to make a clear call on this gene's function. /note=Transmembrane domains: 0; no TMDs predicted. The function is not defined, but helix-turn-helix is still possible. /note=Secondary Annotator Name: Do, Vivian /note=Secondary Annotator QC: I would add information about whether the overlap is conserved in other phages. As of 05.02, Starterator showed that the most called start site was 17, not 5 (this still corresponds with your start site, though). When discussing coding potential, you need to specify whether you mean host-trained or self-trained GeneMark, as well as which reading frame you are referring to (ORF 3 on self-trained). I would also note that this has the best Z-score, and that some phages have a function call of helix-turn-helix DNA-binding protein. You also say this agrees with the Glimmer and GeneMark start site, but GeneMark calls a different site. 
CDS 53622 - 54143 /gene="72" /product="gp72" /function="helix-turn-helix DNA binding domain" /locus tag="PumpkinSpice_72" /note=Original Glimmer call @bp 53622 has strength 5.83; Genemark calls start at 53622 /note=SSC: 53622-54143 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA-binding protein [Streptomyces phage IchabodCrane]],,NCBI, q1:s5 100.0% 2.02378E-125 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.429, -4.602200557842397, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA-binding protein [Streptomyces phage IchabodCrane]],,QFP97387,97.7401,2.02378E-125 SIF-HHPRED: Regulatory protein cox; helix-turn-helix, DNA binding, VIRAL PROTEIN; 2.401A {Enterobacteria phage P2},,,4LHF_A,36.9942,98.3 SIF-Syn: This gene ("helix-turn-helix DNA binding domain") shows synteny with phages Bordeaux, Genie2, and IchabodCrane, in which the upstream gene (pham 20588) and downstream gene (pham 55493) are also syntenic. All of these genes are located in the same place and in the same order. /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 53622. /note=Phamerator: Pham 2317. This analysis was run 04/16/21. It is conserved; found in Battuta_72 (BE) and BoomerJR_74 (BE). /note=Starterator: The start called most often in the published annotations is 12; it was called in 16 of the 54 non-draft genes in the pham. Start site 12 is 53622 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The ORF has reasonable coding potential in GeneMark-Self, and the chosen start site includes all of the coding potential. GeneMark-Host does not show any coding potential. /note=SD (Final) Score: -3.606 is the best SD final score on PECAAN. 
However, the start site with this final score does not agree with the start site predicted by Glimmer and GeneMark. The predicted start site 53622 has an SD final score of -4.602, which is also one of the best Final Scores on PECAAN. /note=Gap/overlap: The gap is 75 bp for start site 53622, which is ultimately reasonable because the gap is conserved in other phages (MindFlayer, Genie2) and there is no coding potential in the gap that might be a new gene. Start site 53586 does not cover all of the coding potential even though it gives the LORF. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 53622 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: PhagesDB provides strong hits for helix-turn-helix DNA binding domain (e-values 4e-99 and 5e-59). CDD does not have any information about this gene. NCBI also supports a "helix-turn-helix DNA" function for this protein (identity 50%+, coverage 100%, and very small e-values). HHpred has a strong hit suggesting that it is a helix-turn-helix protein (coverage 37%, e-value 0.0000042, probability 98%). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Canio, Noah /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 54147 - 54449 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="PumpkinSpice_73" /note=Original Glimmer call @bp 54147 has strength 5.65; Genemark calls start at 54147 /note=SSC: 54147-54449 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp206 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.14763E-64 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.855, -5.32613555225216, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp206 [Streptomyces phage Karimac] ],,YP_009840246,100.0,2.14763E-64 SIF-HHPRED: SIF-Syn: Good pham and functional gene synteny upstream and downstream with all phages that were checked (TomSawyer, Wipeout, and Wofford). DNA helicase, DNA primase, and DNA binding protein are all conserved downstream. /note=Primary Annotator Name: Kelly, Samuel /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 54147. /note=Phamerator: (4/23/21) Pham 20588. This pham contains only genes from clusters BE and BK. No function was listed for any phage. /note=Starterator: Start coordinate is (6, 54147). Approx. 32% (20/63) of manually annotated non-draft phages call start site 6. This is very close to the most-called start site (7, with 33.3%), but both percentages are fairly low. /note=Coding Potential: There is no coding potential in Host-trained GeneMark, but fairly strong coding potential in Self-trained GeneMark, including a decent amount of atypical coding potential. Coding potential is in the forward orientation for this gene. /note=SD (Final) Score: The start site with the best Z- and Final scores is 54294, but this site is refuted by the coding potential in Self-trained GeneMark, which supports start site 54147. /note=Gap/overlap: Very small gap (3 bp) before the gene and a 23 bp overlap with the downstream gene, which suggests that this gene is part of an operon.
/note=Location call: Considering the evidence, it seems reasonable to place the start site at 54147. Glimmer and GeneMark, along with Phamerator and Starterator, all call this start. /note=Function call: All of the top hits on both NCBI and PhagesDB suggest that this is a hypothetical protein of unknown function, with strong e-values of < 2e-43 and coverage of 98-100%. CDD yielded no conserved domains, and HHpred yielded only weak hits with low probabilities, which further supports this gene having NKF. /note=Transmembrane domains: Zero hits in both TMHMM and TOPCONS. /note=Secondary Annotator Name: Cervantes, Richard /note=Secondary Annotator QC: Since start site 54147 was called by both Glimmer and GeneMark, this would outweigh the different start site of 54294 suggested by the Z-score/Final Score. For the Final Score, please note whether it is evidence for or against the given start site, and state what the top score is. CDS 54427 - 54615 /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="PumpkinSpice_74" /note=Original Glimmer call @bp 54427 has strength 7.63; Genemark calls start at 54427 /note=SSC: 54427-54615 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp205 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 4.53141E-37 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.815, -4.273141838578173, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp205 [Streptomyces phage Karimac] ],,YP_009840247,100.0,4.53141E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Glimmer and GeneMark, which both call the start at 54427. /note=Phamerator: Pham 57149 as of April 21st, 2021. The phages used for comparison were Battuta and Birchlyn. /note=Starterator: Starterator says start number 23 (start site 54427) is the best fit for the gene.
This is because start number 23 has 11 manual annotations, which is sufficient evidence to support this start site. It is called 83.3% of the time when present. /note=Coding Potential: The coding potential is on the forward strand, which indicates that this is a forward gene; this was visualized in the Host-Trained GeneMark data. /note=SD (Final) Score: The RBS final score is -4.273 and the Z-score is 2.815; given these strong scores, I think this is the best option for the LORF. /note=Gap/overlap: The gap is -23 (a 23 bp overlap), which is acceptable compared with other phages; given this overlap, a new gene should not be added. /note=Location call: With the given evidence, this appears to be a real gene with its start site at 54427. /note=Function call: NKF. The PhagesDB and NCBI BLASTp results consistently show the protein's function as unknown, and the unknown function is further supported by a hit with an e-value of 10^-7, so there is no evidence for a known function. /note=Transmembrane domains: N/A; this ORF with 'no known function' is not predicted to be a membrane protein. /note=Secondary Annotator Name: Cervantes, Richard Javier /note=Secondary Annotator QC: Please mention the phages that were used in your comparison for Phamerator and Starterator. Please also give start site 11's actual coordinate (54,427 for example) under Phamerator. CDS 54612 - 54959 /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="PumpkinSpice_75" /note=Original Glimmer call @bp 54612 has strength 8.16; Genemark calls start at 54612 /note=SSC: 54612-54959 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp.
JV178] ],,NCBI, q1:s1 100.0% 2.83612E-80 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.812, -3.0541649530135073, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304122,100.0,2.83612E-80 SIF-HHPRED: SIF-Syn: Pham 5301 (as of 5/26/2021), upstream is a helix-turn-helix DNA binding domain (Pham 2317 as of 5/25/2021), downstream is a DnaB-like dsDNA helicase (Pham 20731 as of 5/25/2021), just like in phages LukeCage, Genie2, and BoomerJR. /note=Primary Annotator Name: Lapurga, Kaira /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and agree on the start site (Glimmer: 54612, GeneMark: 54612). Other candidate start sites: 54723, 54732, 54753, 54831, 54846, 54891, 54921. /note=Phamerator: Pham 5301. Date 4/22/2021. It is not conserved; found in Braelyn, Warpy, Sushi23, Daubenski, Mildred21, MulchMansion, Bmoc. /note=Starterator: Start site 1 in Starterator was manually annotated in 18/49 non-draft genes in this pham. Start 2 is 5301. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in the ORF is only on the forward strand, indicating a forward gene. Coding potential is found in Self-Trained GeneMark. /note=SD (Final) Score: The final score is -3.054. It is the best final score on PECAAN. /note=Gap/overlap: The gap is -4 (a 4 bp overlap). It is the best overlap and suggests that this is most likely the longest reasonable ORF for this gene call. /note=Location call: This is a real gene, and the original start site should be used because it covers the coding potential seen in GeneMark. /note=Function call: Unknown, based on inconclusive hits from both CDD and HHpred. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Bruns, James Alan /note=Secondary Annotator QC: CDS 54931 - 56151 /gene="76" /product="gp76" /function="DnaB-like dsDNA helicase" /locus tag="PumpkinSpice_76" /note=Original Glimmer call @bp 54931 has strength 8.99; Genemark calls start at 54931 /note=SSC: 54931-56151 CP: yes SCS: both ST: SS BLAST-Start: [DnaB-like dsDNA helicase [Streptomyces phage Wofford] ],,NCBI, q1:s1 99.7537% 0.0 GAP: -29 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.766, -3.2123487306667524, no F: DnaB-like dsDNA helicase SIF-BLAST: ,,[DnaB-like dsDNA helicase [Streptomyces phage Wofford] ],,YP_009839765,99.7537,0.0 SIF-HHPRED: DNAB-Like Replicative Helicase; ATPase, REPLICATION; 3.91A {Bacillus phage SPP1},,,3BGW_E,99.2611,100.0 SIF-Syn: DnaB-like dsDNA helicase, the upstream gene is pham 5301, the downstream gene is pham 5854, just like in phages Battuta, Birchlyn, BoomerJR, Bordeaux, and Genie2. /note=Primary Annotator Name: Linares Cardona, Ninette /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 54931 bp. /note=Phamerator: The pham number as of 04/21/2021 is 20731. The gene is conserved in phages Genie2 and LukeCage, both in the same cluster as PumpkinSpice. /note=Starterator: Start site 7 was manually annotated in 35/54 non-draft genes in this pham. This start site is not found in this gene. Instead, start site 6, which is near the same location as site 7 in other phages, is auto-annotated for this gene. Start site 6 is 54931 in PumpkinSpice; it is called 100.0% of the time when present. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is only found in Self-Trained GeneMark. The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: The final score is the best option at -3.212 and the z-score is the highest at 2.766.
/note=Gap/overlap: The overlap with the upstream gene is a little large at 29 bp. However, this gene is conserved in several other phages, and the overlap is also seen in other phages, such as BoomerJR and LukeCage. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 54931 bp. /note=Function call: DnaB-like dsDNA helicase. Multiple PhagesDB BLAST and NCBI BLAST hits have the suggested function DnaB-like dsDNA helicase, with e-values of 0. HHpred had hits for DNAB-like helicase with 100% probability, 99% coverage, and an e-value of 6.2e-41. CDD did not return specific hits. DNA helicase also produced strong hits, but more recently annotated phages concluded DnaB-like dsDNA helicase, suggesting that this is the correct function call. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lapurga, Kaira /note=Secondary Annotator QC: All information seems to be in order and explained well. CDS 56155 - 56457 /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="PumpkinSpice_77" /note=Genemark calls start at 56152 /note=SSC: 56155-56457 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein SEA_TOMSAWYER_77 [Streptomyces phage TomSawyer] ],,NCBI, q1:s2 100.0% 4.53668E-69 GAP: 3 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.745, -5.345053749965421, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TOMSAWYER_77 [Streptomyces phage TomSawyer] ],,QGH78964,99.0099,4.53668E-69 SIF-HHPRED: SIF-Syn: NKF, upstream gene is a DNA helicase from pham 20731, downstream gene is a DNA primase from pham 20966, just like in phages Battuta and Birchlyn. /note=Primary Annotator Name: Liu, Lily /note=Auto-annotation start source: GeneMark calls the start site at 56152 bp. Glimmer did not call a start site. /note=Phamerator: pham 5864, date 04/22/2021.
It is conserved in phages such as Birchlyn, BoomerJR, and Bordeaux. /note=Starterator: GeneMark called start site 4 for this gene, at 56152 bp, but Starterator showed that start site 4 was manually annotated in only 5/45 non-draft genes in this pham. (Glimmer did not call a start site.) Start site 5, at 56155 bp, was manually annotated in 40/45 non-draft genes in this pham, and this is the one I picked as the start site. /note=Coding Potential: The self-trained GeneMark shows both typical and atypical coding potential, but the host-trained GeneMark does not show any coding potential at all. Both the self-trained and host-trained GeneMarks correspond to the first reading frame. /note=SD (Final) Score: The final score is -5.345 and the z-score is 1.745 (these scores correspond to the chosen start site). /note=Gap/overlap: There is a 3 bp gap, which is conserved in other phages (such as Birchlyn, BoomerJR, and Bordeaux). /note=Location call: Based on the above evidence, this is a real gene and most likely has a start site at 56155 bp. /note=Function call: There is no known function for this gene. All PhagesDB BLAST hits had an unknown function, with the top two e-values both being 1e-54. NCBI BLAST hits all listed this gene as a hypothetical protein, with the top two e-values both being 5e-69. CDD did not have any hits for this gene, and HHpred did not show any relevant hits for this gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Zuelch, Avery /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator due to all of the evidence above.
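The gap/overlap values reported for these forward genes can be reproduced from the CDS coordinates. A minimal sketch, assuming 1-based inclusive coordinates on the forward strand (the convention the listed values appear to follow; for example, gene 76 ending at 56151 and gene 77 starting at 56155 give the 3 bp gap). `gap_between` is a hypothetical helper, and a negative result means an overlap.

```python
def gap_between(upstream_end: int, downstream_start: int) -> int:
    """Gap in bp between two adjacent forward-strand genes.

    Coordinates are 1-based and inclusive. A positive value is the number
    of bases between the genes; a negative value is the overlap length.
    """
    return downstream_start - upstream_end - 1


# Coordinates taken from the CDS lines in this section.
print(gap_between(56151, 56155))  # gene 76 -> gene 77: 3
print(gap_between(56457, 56382))  # gene 77 -> gene 78: -76
print(gap_between(56151, 56152))  # GeneMark's start for gene 77: 0
```

The same arithmetic gives the -4 bp value for gene 75 (start 54612 after a gene ending at 54615) and the -23 bp value for gene 82 (start 61946 after a gene ending at 61968).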
CDS 56382 - 57458 /gene="78" /product="gp78" /function="DNA primase" /locus tag="PumpkinSpice_78" /note=Original Glimmer call @bp 56400 has strength 10.42; Genemark calls start at 56454 /note=SSC: 56382-57458 CP: yes SCS: both-cs ST: SS BLAST-Start: [DNA primase [Streptomyces phage Battuta]],,NCBI, q1:s1 100.0% 0.0 GAP: -76 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.287, -4.422065872383915, no F: DNA primase SIF-BLAST: ,,[DNA primase [Streptomyces phage Battuta]],,QRI45766,99.7207,0.0 SIF-HHPRED: DNA primase; Zinc Ribbon, TOPRIM, RNA POLYMERASE, DNA REPLICATION, TRANSFERASE; 2.0A {Aquifex aeolicus},,,2AU3_A,96.9274,100.0 SIF-Syn: DNA primase, upstream is a gene of unknown function, downstream is a DNA binding protein, just like in phage IchabodCrane. /note=AF: Could this be a DNA primase/helicase? There are some HHpred hits to helicase, but not as strong as the first primase hit. /note=Primary Annotator Name: Merlos, Andres /note=Auto-annotation start source: Glimmer called a start site of 56400, while GeneMark called a start site of 56454. /note=Phamerator: Pham number 20966 (run on 23 April 2021). This gene is highly conserved within its pham; two examples are Battuta_78 and Birchlyn_77. The most likely function would be DNA primase. /note=Starterator: Start site 6 was the most frequently annotated start site in published annotations, called in 41 of 54 non-draft genes. (6, 56382) is the better option. /note=Coding Potential: There is coding potential in the self-trained GeneMark but not the host-trained GeneMark, with a start site of 56400 F and a stop site of 57458. /note=SD (Final) Score: The Final Score is -4.460, which is good, and the Z-score is 2.287, which is also sufficient. /note=Gap/overlap: -58 for start site 56400, meaning the gene overlaps the upstream gene by 58 bp (with the final start site 56382, the overlap is 76 bp). This is an acceptable start site, as there is no gap in which a new gene would need to be added. This overlap is conserved in other phages. /note=Location call: The gene is real with a start site of 56400 F and stop site of 57458.
This start site is acceptable, as its Final score and Z-score are sufficient, and there is an overlap rather than a gap. However, Starterator shows that the auto-annotated start site is not the best option. It would be best to switch to (6, 56382), as this covers the whole coding region, is the LORF, and is supported by Starterator. /note=Function call: There is strong evidence from both MindFlayer and Battuta that the function of this gene is DNA primase. In PhagesDB, there are strong matches with genes of known function, and NCBI gives further matches with the DNA primase function. CDD and HHpred hits also provide strong evidence that this gene is a DNA primase. Based on all the data, the function of this gene is DNA primase. TMHMM and TOPCONS predict no TMDs, which suggests that it is not a membrane protein and is consistent with a DNA primase function.
/note=Transmembrane domains: /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 57537 - 58349 /gene="79" /product="gp79" /function="DNA binding protein" /locus tag="PumpkinSpice_79" /note=Original Glimmer call @bp 57537 has strength 14.8; Genemark calls start at 57537 /note=SSC: 57537-58349 CP: yes SCS: both ST: SS BLAST-Start: [ssDNA-binding protein [Streptomyces phage LukeCage] ],,NCBI, q1:s1 100.0% 0.0 GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.165, -5.631026884021104, yes F: DNA binding protein SIF-BLAST: ,,[ssDNA-binding protein [Streptomyces phage LukeCage] ],,YP_009840004,98.513,0.0 SIF-HHPRED: gp32 single stranded DNA binding protein; Zn2+ binding subdomain, 5-stranded beta-sheet, OB fold, single-stranded DNA binding, DNA BINDING PROTEIN; 2.0A {Enterobacteria phage RB69} SCOP: b.40.4.7,,,2A1K_B,68.5185,99.9 SIF-Syn: DNA binding protein; upstream is a DNA primase and downstream is a DnaE-like DNA polymerase III, just like in Karimac and LukeCage. /note=Primary Annotator Name: Namaganda, Samali /note=Auto-annotation start source: Glimmer and GeneMark. Both call 57537 as the start site. /note=Phamerator: As of 4/23/21, the gene is part of pham 14552. The pham has 68 members, 14 of which are drafts. The gene is conserved in other members of cluster BE, such as Evy, Genie2, JimJam, and LukeCage. /note=Starterator: Analysis was run 4/21/21. The start site most often called in published annotations (37 of the 54 non-draft genes) was start (3, 57537), which is also the auto-annotated start site. It is the best start site because it agrees with the GeneMark and Glimmer calls, and it is the only start site called for BE2 members of pham 14552. /note=Coding Potential: The ORF has coding potential on the forward strand, indicating that it is a forward gene. The gene has both atypical and typical coding potential, only in the self-trained GeneMark. /note=SD (Final) Score: -5.631. It is the second-best SD score.
The other possible start site with the better score has a wider gap and a lower z-score, and is not shared by other genes in its pham. /note=Gap/overlap: All the gaps are conserved in other subcluster members, such as Battuta, LukeCage, Genie2, and TomSawyer. /note=Location call: Start site 57537 is the LORF, has the smallest gap and the best z-score, and is shared by other genes in its pham. It is also the only start site called for BE2 phages in pham 14552. /note=Function call: There is enough data to hypothesize that the gene function is “DNA-binding protein”. There were many strong hits (E < 1e-30) in both NCBI BLASTp and PhagesDB BLAST with the same function assigned. HHpred and CDD hits with e-values < 9.0e-23, 99.9% probability, and >60% coverage support that the gene is a DNA binding protein. /note=Transmembrane domains: Analysis run on 3/24 showed no TMH calls in TMHMM. /note=Secondary Annotator Name: Bhatnagar, Keshav /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
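The Starterator counts quoted in these notes (for example, 37 of 54 non-draft genes for gene 79) are simple proportions of the non-draft genes in a pham. A minimal sketch of that arithmetic; `percent_of_pham` is a hypothetical helper, and rounding to one decimal place is an assumption chosen to match how the percentages are written here.

```python
def percent_of_pham(manual_calls: int, non_draft_genes: int) -> float:
    """Percent of non-draft genes in a pham whose manual annotation uses a given start."""
    if non_draft_genes <= 0:
        raise ValueError("pham must contain at least one non-draft gene")
    return round(100 * manual_calls / non_draft_genes, 1)


# Counts quoted in this section.
print(percent_of_pham(37, 54))   # gene 79, start 3: 68.5
print(percent_of_pham(24, 144))  # gene 82, start 31: 16.7
```

This share of the whole pham appears to differ from Starterator's "called X% of time when present", which divides by only the genes in which that start exists; that is how a start can be rare in the pham (16.7%) yet called 83.3% of the time when present.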
CDS 58443 - 61763 /gene="80" /product="gp80" /function="DnaE-like DNA polymerase III (alpha)" /locus tag="PumpkinSpice_80" /note=Original Glimmer call @bp 58443 has strength 9.49; Genemark calls start at 58443 /note=SSC: 58443-61763 CP: yes SCS: both ST: SS BLAST-Start: [DnaE-like DNA polymerase III (alpha) [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 0.0 GAP: 93 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.8898912632369362, no F: DnaE-like DNA polymerase III (alpha) SIF-BLAST: ,,[DnaE-like DNA polymerase III (alpha) [Streptomyces phage Starbow] ],,AXH66588,99.9096,0.0 SIF-HHPRED: DNA POLYMERASE III ALPHA; TRANSFERASE, DNA REPLICATION, DNA POLYMERASE III ALPHA, DNA POLYMERASE III BETA, DNA POLYMERASE III EPSILON; 7.3A {ESCHERICHIA COLI K-12},,,5FKW_A,99.0054,100.0 SIF-Syn: DnaE-like DNA polymerase III (alpha), upstream gene is DNA binding protein (pham 14552), downstream gene is from pham 12461, just like in phage IchabodCrane. /note=Primary Annotator Name: Quijada, Britney /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 58443. /note=Phamerator: The pham number as of April 22, 2021 is 60182. The gene is conserved in phages BoomerJR, IchabodCrane, and Birchlyn, all in the same cluster as PumpkinSpice. PhagesDB Function Frequency listed DnaE-like DNA polymerase III (alpha) as the function of this gene. /note=Starterator: Start site 71 in Starterator was manually annotated in only 57/334 non-draft genes in this pham. However, it was the start site most often manually annotated in subcluster BE2, and it corresponds to 58443 bp in PumpkinSpice. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in the Self-Trained GeneMark. The chosen start site includes all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.890 with a Z-score of 2.77; these are reasonable values for the LORF. /note=Gap/overlap: 93 bp.
This is a moderate-sized gap that is well conserved in other phages (Karimac, Battuta), and it does not suggest a new gene, since any gene under 120 bp may be invalid. No coding potential was found within this gap in GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene and the most likely start site is at 58443 bp. Starterator's graphical output and summary report agree with GeneMark, Glimmer, and the manual annotations. /note=Function call: There are multiple PhagesDB BLASTp hits with the suggested function DnaE-like DNA polymerase III alpha, with top e-values of 0. HHpred has strong hits for the DNA polymerase III alpha function, with query coverages of 99% and 99.7% and e-values of 0. CDD had two alignment matches to this protein function, with reasonable query coverages of 81% and 38% (Pfam hit), e-values of 0, and percent identities of 29.3% and 43% (Pfam hit), so these were checked off. Multiple NCBI BLASTp hits also list the DnaE-like DNA polymerase III alpha function, with e-values of 0, in alignments with Streptomyces phage Starbow and Streptomyces phage Annadreamy (100% coverage, 59%+ identity). The function is therefore DnaE-like DNA polymerase III (alpha). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Jakupova, Malika /note=Secondary Annotator QC: I agree with this location call. CDS 61771 - 61968 /gene="81" /product="gp81" /function="hypothetical protein" /locus tag="PumpkinSpice_81" /note=Original Glimmer call @bp 61771 has strength 6.82; Genemark calls start at 61771 /note=SSC: 61771-61968 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.46375E-38 GAP: 7 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.165, -4.85287563363746, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp.
JV178] ],,WP_143675387,100.0,2.46375E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rafael, Adriana Nicole /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site 61771. /note=Phamerator: Pham 12461 on 4/23/21. The gene is somewhat conserved: 21/41 phages have a gene length of 198 bp (conserved in Jay2Jay_85 and BoomerJR_83). /note=Starterator: Start site 3 (site 61771 in PumpkinSpice) is the most annotated start and is called in 32/33 non-draft genes in Pham 12461. This agrees with the start site called by Glimmer and GeneMark (start site 61771). /note=Coding Potential: There is coding potential in the Self-trained GeneMark but not in the Host-Trained GeneMark. The coding potential is on the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: -4.853. This is not the best Final score on PECAAN; however, this start site gives the LORF, which is promising. /note=Gap/overlap: 7 bp. This is a small, acceptable gap. /note=Location call: Evidence suggests that this is the correct start site for this gene (site 61771). The coding potential reaches the start site, there is a minimal gap, and the Final score and Z-score are promising. It is also the LORF. According to Starterator, this is also the most annotated start site. /note=Function call: No known function. NCBI and PhagesDB BLASTp results all list it as a “hypothetical protein”. There were no CDD hits, indicating that there are no known conserved domains that match this gene. Also, the HHpred hits did not suggest a specific function, as they were all labeled “function unknown”. /note=Transmembrane domains: TMHMM does not predict any transmembrane domains, so it is not considered a membrane protein. /note=Secondary Annotator Name: Bhatnagar, Keshav /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
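The 198 bp gene length used in the Phamerator comparison above, like the 1032 bp ORF quoted for gene 83's alternative start, follows directly from the CDS coordinates. A minimal sketch, assuming 1-based inclusive coordinates that include the stop codon (the usual CDS convention); the helper names are hypothetical.

```python
def cds_length(start: int, end: int) -> int:
    """Length in bp of a CDS with 1-based inclusive coordinates (either strand)."""
    return abs(end - start) + 1


def protein_length(start: int, end: int) -> int:
    """Amino acids encoded, assuming the coordinates include the stop codon."""
    length = cds_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} bp is not a multiple of 3")
    return length // 3 - 1  # drop the stop codon


print(cds_length(61771, 61968))      # gene 81: 198
print(protein_length(61771, 61968))  # gene 81: 65
print(cds_length(62461, 63492))      # gene 83, alternative start 62461: 1032
```

Because `abs` is used, the same helper also works for complement features such as gene 1 (907 - 1110), whose length is 204 bp.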
CDS 61946 - 62296 /gene="82" /product="gp82" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="PumpkinSpice_82" /note=Original Glimmer call @bp 61946 has strength 13.57; Genemark calls start at 61946 /note=SSC: 61946-62296 CP: no SCS: both ST: SS BLAST-Start: [MazG-like nucleotide pyrophosphohydrolase [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.21211E-77 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.879, -5.751911930358031, no F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[MazG-like nucleotide pyrophosphohydrolase [Streptomyces phage Starbow] ],,AXH66590,100.0,1.21211E-77 SIF-HHPRED: Nucleotide pyrophosphohydrolase; MazG, nucleoside triphosphate pyrophosphohydrolase, HYDROLASE; 2.8A {Bacillus cereus} SCOP: a.204.1.2,,,5IE9_B,90.5172,99.6 SIF-Syn: MazG-like nucleotide pyrophosphohydrolase, upstream gene is from pham 12461, downstream gene is a RecA-like DNA recombinase, just like in phage Battuta. /note=Primary Annotator Name: Rivera, Bryanna /note=Auto-annotation start source: Glimmer and GeneMark, which both called the start site at 61946. /note=Phamerator: As of 04/21/21 the pham number is 57328. This gene is conserved in subcluster BE2, and phages “Genie2” and “MindFlayer” were used for comparison. No function called. /note=Starterator: Start site 31 in Starterator was manually annotated in 24 of 144 (16.7%) non-draft genes in this pham and was called 83.3% of the time when present. This corresponds to the start site 61946 bp called by Glimmer and GeneMark. /note=Coding Potential: There was very good coding potential, which appeared only in the second forward frame; the other forward frames and the reverse frames showed no coding potential. Coding potential was only found in the Self-Trained GeneMark. /note=SD (Final) Score: The score is -5.752, which is not the best (least negative) score on PECAAN. It was the second-best final score and second-best z-score.
/note=Gap/overlap: There is an overlap of 23 bp, and this is not the longest reasonable ORF for this gene call. Although another start site gives the LORF, its start codon is TTG, which has only about a 7% chance of being a start codon. This overlap is also conserved in other genes. /note=Location call: Based on all the data gathered so far, I would keep the original start site, as it has superb ribosome binding scores, a good start codon, and great coding potential. As of 04/23, based on the data in Phamerator and Starterator, I firmly believe this is the correct call. /note=Function call: Based on all the evidence collected, the correct function call for this ORF appears to be MazG-like nucleotide pyrophosphohydrolase. Not only was it classified as a nucleoside triphosphate pyrophosphohydrolase family protein, but there were also good CDD and HHpred hits. CDD displayed three domains, two of which were named MazG, which matches the protein structure as well. HHpred's top three hits also had good values; the top hit (PDB name: Nucleotide pyrophosphohydrolase; MazG, nucleoside triphosphate pyrophosphohydrolase) had a high probability and high percent coverage, with an e-value of 3.8e-14. Its structure is a crystal structure of the Bacillus-conserved MazG protein, a nucleotide pyrophosphohydrolase. In PhagesDB BLASTp, the top two hits both had the function nucleotide pyrophosphohydrolase, each with an e-value of 1e-60. NCBI also showed hits with the same function, with the two lowest e-values being 1e-77 and 8e-77. Taken together, the data indicate that the correct function call is MazG-like nucleotide pyrophosphohydrolase. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Chang, Loren /note=Secondary Annotator QC: After reviewing the above evidence, I agree with the primary annotator's gene location call. However, for the gap/overlap section, I believe that the proposed start site does give the longest reasonable ORF; only one start gives a longer ORF, and that ORF has a very low RBS Final Score and Z-score. CDS 62488 - 63492 /gene="83" /product="gp83" /function="RecA-like DNA recombinase" /locus tag="PumpkinSpice_83" /note=Original Glimmer call @bp 62488 has strength 14.17; Genemark calls start at 62488 /note=SSC: 62488-63492 CP: yes SCS: both ST: SS BLAST-Start: [RecA-like DNA recombinase [Streptomyces phage StarPlatinum] ],,NCBI, q1:s1 100.0% 0.0 GAP: 191 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.78, -3.948378948785431, no F: RecA-like DNA recombinase SIF-BLAST: ,,[RecA-like DNA recombinase [Streptomyces phage StarPlatinum] ],,YP_009839522,99.7006,0.0 SIF-HHPRED: Protein recA; Alpha and beta proteins (a/b, a+b), ATP-binding, Cytoplasm, DNA damage, DNA recombination, DNA repair, DNA-binding, Nucleotide-binding; 1.95A {Thermotoga maritima},,,3HR8_A,98.8024,100.0 SIF-Syn: RecA-like DNA recombinase, upstream gene is MazG-like nucleotide pyrophosphohydrolase, downstream gene is pham 7239, just like in phages MindFlayer, LukeCage, and Karimac. /note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: GeneMark and Glimmer called the gene start site at 62488. /note=Phamerator: Pham 55996. Conserved in related phages, including TomSawyer, MindFlayer, and IchabodCrane. Function: DNA recombinase. Date: 4/23/21. /note=Starterator: Site number 10, position 62488 in PumpkinSpice. This site is manually annotated in 32 of 157 non-draft genes in the pham. The most annotated site is number 7, which does not exist in PumpkinSpice. Start 10 is called 95.5% of the time when present.
/note=Coding Potential: Very good typical and atypical forward coding potential on GeneMarkS, all covered by the auto-annotated start site. /note=SD (Final) Score: For the auto-annotated start site (62488), the final score is -3.948 and the z-score is 2.78. These are the second-best scores. Start site 63085 has the best RBS final score (-3.866) and z-score (2.888), but it results in an unreasonably large gap (788 bp). /note=Gap/overlap: Gap of 191 bp. This gap is large, but it is conserved in other final genomes such as phages Karimac, LukeCage, and MindFlayer. Start site 62461 creates a longer ORF (1032 bp) and a smaller gap (164 bp), but it has poor RBS (-5.222) and Z-scores (1.767) compared to the auto-annotated start site. /note=Location call: This gene is real, with a start site of 62,488. Glimmer and GeneMark agree on this start site. The most annotated start site on Starterator does not exist in PumpkinSpice, but site 62,488 has 32 manual annotations. /note=Function call: RecA-like DNA Recombinase. This call is supported by many PhagesDB and NCBI BLAST hits with e-values of 0, many HHpred hits with e-values as low as 8.39994e-41, and many CDD hits with e-values as low as 0. /note=Transmembrane domains: No transmembrane domains predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Billings, Sophie /note=Secondary Annotator QC: I agree with this location call. All of the evidence categories have been considered. Note: for the gap, note which phages it is conserved in. CDS 63485 - 63682 /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="PumpkinSpice_84" /note=Original Glimmer call @bp 63485 has strength 4.68; Genemark calls start at 63485 /note=SSC: 63485-63682 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.98198E-40 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.479, -4.197870532213559, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_143675388,100.0,1.98198E-40 SIF-HHPRED: SIF-Syn: Upstream gene is in pham 55996 and is a RecA-like recombinase; Downstream gene is in pham 3723 (NKF) just like in MindFlayer and Bordeaux /note=Primary Annotator Name: Garcia Vedrenne, Ana /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree that the start site is 63485. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is only found in Self-Trained GeneMark. /note=SD (Final) Score: The final score is the best option at -4.198, and the z-score is good at 2.479. /note=Gap/overlap: Overlap of 8 bp. /note=Phamerator: Pham number 7239 on 9/17/2021. Conserved in other phages in cluster BE such as Starbow and Mindflayer. /note=Starterator: Pham number 7239 has 64 members, 11 of which are drafts. The start number called most often in the published annotations is 5. However, PumpkinSpice calls start 4, which is found in 20 of 64 (31.2%) of genes in the pham and is called 100.0% of the time when present. Start 4 @63485 has 15 MAs. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 63485 bp. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains were predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 63682 - 64002 /gene="85" /product="gp85" /function="Holliday junction resolvase" /locus tag="PumpkinSpice_85" /note=Original Glimmer call @bp 63682 has strength 12.9; Genemark calls start at 63682 /note=SSC: 63682-64002 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.73021E-72 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.165, -4.85287563363746, no F: Holliday junction resolvase SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_143675389,100.0,2.73021E-72 SIF-HHPRED: HOLLIDAY-JUNCTION RESOLVASE; HYDROLASE, ENZYME, HOMOLOGOUS RECOMBINATION, HOLLIDAY JUNCTION RESOLVING ENZYME, NUCLEASE, ARCHAEA, THERMOPHILE; HET: EDO, SO4; 1.8A {SULFOLOBUS SOLFATARICUS} SCOP: c.52.1.18,,,1OB8_A,88.6792,98.3 SIF-Syn: This gene has NKF and is in Pham 3723. The gene upstream of it is in Pham 7239 like in phage Battuta. The gene downstream from it is in Pham 1588 like in phage Birchlyn. /note=AF: Updated function to Holliday junction resolvase based on HHpred and BLAST hits /note=Primary Annotator Name: Zuelch, Avery /note=Auto-annotation start source: Both Glimmer and GeneMark call a start site of 63682. /note=Phamerator: Date: 4/23/21, Pham #3723, 68 phages in the pham, 14 of them drafts. The gene is conserved in other members of the BE cluster such as Samisti12, Starbow, and StarPlatinum. /note=Starterator: The auto-annotated start is Start site 10 @63682 bp. This is the most-annotated site; it is called 49 out of 54 times and has 49 MAs. /note=Coding Potential: High coding potential, both typical and atypical, in self-trained GeneMark, found in the first reading frame. No coding potential in GeneMark Host. The start site covers all of the coding potential. /note=SD (Final) Score: -4.853, which is the second best but still acceptable. The Z-score is also very strong. /note=Gap/overlap: Overlap of 1 bp, which is common and acceptable. /note=Location call: Start site 10 @63682 bp, based on all the information given above: it is the most annotated site with 49 MAs, it covers all of the coding potential, and it is conserved among the pham. /note=Function call: Based on all of the data collected, I believe that this gene has no known function. All of the BLAST hits with strong E-values and good alignments are to proteins of unknown function. There were no CDD hits, and the hits found on HHpred were poor, with weak E-values. 
/note=Transmembrane domains: 0 TMDs. Both TMHMM and TOPCONS showed 0 TMDs. This is acceptable given that the function of this gene is NKF. /note=Secondary Annotator Name: Dines, Lily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 63980 - 64192 /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="PumpkinSpice_86" /note=Original Glimmer call @bp 63983 has strength 9.16; Genemark calls start at 63980 /note=SSC: 63980-64192 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_86 [Streptomyces phage Starbow]],,NCBI, q1:s1 100.0% 5.13177E-43 GAP: -23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.815, -3.494990588194529, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_86 [Streptomyces phage Starbow]],,AXH66594,98.5714,5.13177E-43 SIF-HHPRED: SIF-Syn: This gene is NKF with pham number 1588, the upstream gene is NKF with pham number 3723, and the downstream gene is NKF with pham number 1093, just like in phages Starbow, MindFlayer, and Karimac. /note=Primary Annotator Name: Beaudin, Catherine /note=Auto-annotation start source: Glimmer and GeneMark. Glimmer calls the start at 63,983. GeneMark calls the start at 63,980. /note=Phamerator: Pham 1588 as of 04/23/2021. The gene is conserved in phages Battuta, Bordeaux, and Karimac, which are all in the same subcluster as PumpkinSpice. On Phamerator, there is no function called for this gene. /note=Starterator: Start site 1 in Starterator was manually annotated in 33/33 non-draft genes in this pham. Start 1 is 63980 in PumpkinSpice. This evidence agrees with the site predicted by GeneMark, not the start site predicted by Glimmer. /note=Coding Potential: The ORF has reasonable coding potential in the self-trained GeneMark, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.495. It is the second-best final score on PECAAN. The Z-score of 2.815 is the best. 
This is high enough to suggest the presence of a credible ribosome binding site. /note=Gap/overlap: The 23 bp overlap with the upstream gene is reasonable. This overlap was seen in many other phages including Starbow, StarPlatinum, and TomSawyer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 63,980. GeneMark and Starterator are in agreement. /note=Function call: NKF. The strong PhagesDB BLAST hits all have unknown function. The top three PhagesDB hits with no known function have high query coverage (100%), high percent identity (97%+), and low e-values (<8e-35). The top two NCBI BLAST hits are hypothetical proteins with high query coverage (93%+), high percent identity (97%+), and low e-values (<2e-42). No hits were returned from CDD. The best hit on HHpred is a bacterial protein classified as “structural genomic, unknown function” with a probability of 83.09%, 64% coverage, and an e-value of 9. This e-value is far above the cutoff of <10e-3 needed to count as strong evidence. However, this HHpred hit, despite its high probability and coverage, also has no known function, which is in line with the evidence from PhagesDB and NCBI BLAST. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs for this gene, therefore it is not a membrane protein. /note=Secondary Annotator Name: Delgado, Yennifer /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
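The gap and overlap figures for genes 83 to 86 follow directly from the CDS coordinates. A minimal Python sketch that recomputes them is given here, assuming forward-strand genes with 1-based, inclusive coordinates and a sign convention in which a negative gap is an overlap (which appears to match the GAP field above). The stop coordinate of the gene upstream of gene 83 is not given in these notes; it is back-calculated from the reported 191 bp gap, so treat it as an inferred value.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand ORF (1-based, inclusive coordinates)."""
    return stop - start + 1


def gap(upstream_stop: int, start: int) -> int:
    """Bases between the upstream gene's last base and this gene's first base.

    Positive values are gaps; negative values are overlaps.
    """
    return start - upstream_stop - 1


# Inferred, not stated in the notes: back-calculated from the 191 bp gap at 62488.
UPSTREAM_OF_83_STOP = 62488 - 191 - 1  # 62296

# Candidate starts for gene 83 (stop at 63492) discussed in the notes.
for candidate in (62488, 62461, 63085):
    print(f"start {candidate}: ORF {orf_length(candidate, 63492)} bp, "
          f"gap {gap(UPSTREAM_OF_83_STOP, candidate)} bp")

# Consecutive forward genes as (gene, start, stop), copied from the CDS lines.
genes = [("83", 62488, 63492), ("84", 63485, 63682),
         ("85", 63682, 64002), ("86", 63980, 64192)]
for (_, _, prev_stop), (name, start, _) in zip(genes, genes[1:]):
    print(f"gene {name}: gap {gap(prev_stop, start)} bp")
```

With these coordinates the sketch reproduces the values in the notes: a 1032 bp ORF and 164 bp gap for start 62461, a 788 bp gap for start 63085, and overlaps of 8, 1, and 23 bp for genes 84, 85, and 86.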
CDS 64167 - 64349 /gene="87" /product="gp87" /function="hypothetical protein" /locus tag="PumpkinSpice_87" /note=Original Glimmer call @bp 64167 has strength 7.56; Genemark calls start at 64167 /note=SSC: 64167-64349 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp192 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 5.04118E-34 GAP: -26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.79, -3.5466635877785, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp192 [Streptomyces phage Karimac] ],,YP_009840260,100.0,5.04118E-34 SIF-HHPRED: SIF-Syn: The genes upstream and downstream have no known functions just like in phage Karimac. /note=Primary Annotator Name: Bhatnagar, Keshav /note=Auto-annotation start source: GeneMark and Glimmer both call the start at 64,167. /note=Phamerator: As of April 22nd, the gene is in pham 1093. The gene is conserved among members of the same subcluster as my phage, such as Bordeaux_87, Enygma_93, Genie2_89, and IchabodCrane_86. Neither Phamerator nor the phams database lists a function for this gene. /note=Starterator: There is a reasonable start site conserved among members of the same and different subclusters. The conserved start is start site 2 in the pham, and in my phage start 2 is at position 64,167. There are 20 members in this pham; all 15 final genes call this start site, while none of the draft genomes do. /note=Coding Potential: Strong coding potential on the forward strand in self-trained GeneMark. The start site covers all coding potential. /note=SD (Final) Score: The final score is -3.547, the best score. /note=Gap/overlap: 26 bp overlap, which is reasonable as it is conserved in other phages and is below the 30 bp threshold in the bioinformatics guidelines. Alternative candidates don't give a longer ORF. /note=Location call: This is likely a real gene with a start at 64,167. 
It is conserved in Phamerator and Starterator, has strong RBS and Z-scores and an insignificant overlap, and its start covers all the coding potential. /note=Function call: There were no CDD hits, the NCBI and PhagesDB BLAST hits list no functions, and HHpred had one very weak hit with a probability of 29%, coverage of 18%, and e-value of 120. Based on this information, it is safe to conclude that the ORF has no known function. /note=Transmembrane domains: No TMDs were predicted. CDD, NCBI, and PhagesDB BLAST give no known function for this ORF, and HHpred's elongation factor hit had too low a probability to be significant. /note=Secondary Annotator Name: Kelly, Samuel /note=Secondary Annotator QC: Agree with primary annotator's location call. CDS 64339 - 64680 /gene="88" /product="gp88" /function="hypothetical protein" /locus tag="PumpkinSpice_88" /note=Original Glimmer call @bp 64339 has strength 15.17; Genemark calls start at 64339 /note=SSC: 64339-64680 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.48876E-77 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.382, -4.40029072999159, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675391,100.0,1.48876E-77 SIF-HHPRED: SIF-Syn: NKF pham 15528, upstream gene is pham 1093, downstream gene is Cas4 family exonuclease, just like in phage BoomerJR. /note=Primary Annotator Name: Billings, Sophie /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 64339. /note=Phamerator: Pham 15528. Date 4/20/21. It is conserved; found in Battuta (BE), Birchlyn (BE), Bmoc (BE), Boomer (BE) and Braelyn (BE). /note=Starterator: As of 2/210/22: Start site 19 in Starterator was manually annotated in 12/54 non-draft genes in this pham. Start 19 is 64339, which agrees with the auto-annotated Glimmer and GeneMark start site. Called 50.0% of the time when present. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is not found in GeneMark Host, but is found in GeneMark Self. /note=SD (Final) Score: -4.400. It is the best final score, and the z-score of 2.382 is also the best. /note=Gap/overlap: The upstream overlap (-11 bp) is small and reasonable, and it is conserved in other phages (Bordeaux, Yaboi). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 64339. /note=Function call: No known function. All of the PhagesDB BLAST hits have the function listed as "unknown function" with E-values less than 1e-7, and all NCBI BLAST hits also have the function as "unknown function" with E-values less than 1e-7. CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bruns, James Alan /note=Secondary Annotator QC: Good; I would just make the reports complete sentences. 
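The gene 87 note judges its 26 bp overlap against a 30 bp threshold from the bioinformatics guidelines. A minimal Python sketch of that check for genes 87 and 88 follows. The cutoff is taken from that note rather than from the guidelines themselves, and the extra whole-codon check (ORF length divisible by 3) is an added sanity test of the coordinates, not something the annotators report.

```python
from typing import NamedTuple

# Cutoff quoted in the gene 87 note; treat it as a guideline, not a hard rule.
OVERLAP_LIMIT_BP = 30


class Junction(NamedTuple):
    gene: str
    gap_bp: int         # negative values are overlaps
    overlap_ok: bool    # True for any gap, or for an overlap within the cutoff
    whole_codons: bool  # ORF length (start codon through stop codon) divisible by 3


def check(gene: str, upstream_stop: int, start: int, stop: int) -> Junction:
    """Summarise a forward-strand gene's junction with its upstream neighbour."""
    gap_bp = start - upstream_stop - 1
    overlap_ok = -gap_bp <= OVERLAP_LIMIT_BP
    whole_codons = (stop - start + 1) % 3 == 0
    return Junction(gene, gap_bp, overlap_ok, whole_codons)


# (upstream stop, start, stop) taken from the CDS lines for genes 86, 87, and 88.
results = [check("87", 64192, 64167, 64349), check("88", 64349, 64339, 64680)]
for r in results:
    print(r)
```

Both junctions pass: the overlaps are 26 and 11 bp, and both ORFs (183 and 342 bp) are whole numbers of codons.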
CDS 64713 - 65486 /gene="89" /product="gp89" /function="Cas4 family exonuclease" /locus tag="PumpkinSpice_89" /note=Original Glimmer call @bp 64704 has strength 10.54; Genemark calls start at 64713 /note=SSC: 64713-65486 CP: yes SCS: both-gm ST: SS BLAST-Start: [Cas4 family exonuclease [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 0.0 GAP: 32 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.576, -4.2951664350100875, no F: Cas4 family exonuclease SIF-BLAST: ,,[Cas4 family exonuclease [Streptomyces phage Starbow] ],,AXH66597,100.0,0.0 SIF-HHPRED: CRISPR-associated exonuclease, Cas4 family; MCSG, STRUCTURAL GENOMICS, PSI-BIOLOGY, EXONUCLEASE, HYDROLASE, Midwest Center for Structural Genomics; 2.65A {Pyrobaculum calidifontis JCM 11548},,,4R5Q_A,78.5992,99.8 SIF-Syn: [Cas4 family exonuclease, upstream gene belongs to pham 15528 (NKF – PumpkinSpice; 05/14/2021); downstream gene belongs to pham 1646 (NKF – PumpkinSpice; 05/14/2021), matching phages Starbow/IchabodCrane.] /note=AF: added additional evidence specific to Cas4 /note=Primary Annotator Name: Bruns, James /note=Auto-annotation start source: Glimmer and GeneMark do not concur. Glimmer predicts the start at 64704; GeneMark predicts it at 64713. /note=Phamerator: As of 04/22/2021, the gene belongs to pham 14867 and is present in non-draft phages Wipeout and TomSawyer, both of which also belong to cluster BE. /note=Starterator: Start 8, called in 17 of 54 non-draft genes, is at 64713 bp in PumpkinSpice. GeneMark and the scores listed above concur with this call, suggesting it is correct. /note=Coding Potential: Both typical and atypical coding potential are present in the Self-Trained GeneMark analysis. Coding potential is absent in the Host-Trained GeneMark analysis. The coding region predicted by the Self-Trained GeneMark analysis spans the whole gene, including the predicted start site. The gene is in the forward direction. 
/note=SD (Final) Score: The final score is -4.295, the lowest negative score present on PECAAN. /note=Gap/overlap: There is a 32 bp gap, and this is not the longest reasonable ORF for this gene call; however, given synteny with the final draft phages Bordeaux and IchabodCrane, the final score, and the Z-score, this is most likely the correct start. /note=Location call: Based on the data listed above, this is highly likely to be a true gene with a start site of 64713. /note=Function call: Cas4 family exonuclease. NCBI BLASTp hits included two matches with 100% coverage and an identity of 100% for hit AXH66597 and 99.6109% for QGH74334. CDD and HHpred concurred with hits with E-values of 93%), and low E-values (between 6e-42 and 4e-49). There is insufficient information from the CDD and HHpred hits to suggest the ORF’s function. There are no relevant hits from CDD while HHpred has a best hit with 22.7% probability, 21.9512% coverage, and an e-value of 180. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. Therefore, it cannot be identified as a transmembrane protein based on this information. /note=Secondary Annotator Name: Hugo, Cristelle /note=Secondary Annotator QC: Good clear notes! CDS 65755 - 66324 /gene="91" /product="gp91" /function="RuvC-like resolvase" /locus tag="PumpkinSpice_91" /note=Original Glimmer call @bp 65755 has strength 7.83; Genemark calls start at 65749 /note=SSC: 65755-66324 CP: yes SCS: both-gl ST: SS BLAST-Start: [RuvC-like resolvase [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 3.84105E-137 GAP: 6 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.987, -5.350537590564816, no F: RuvC-like resolvase SIF-BLAST: ,,[RuvC-like resolvase [Streptomyces phage Karimac] ],,YP_009840264,100.0,3.84105E-137 SIF-HHPRED: c.55.3.6 (A:) RuvC resolvase {Escherichia coli [TaxId: 562]},,,d1hjra_,92.5926,99.9 SIF-Syn: As of 5/14/21: RuvC-like resolvase, upstream pham is 1646, downstream pham 4359, just like in phage Battuta. 
/note=Primary Annotator Name: Castillo, Salvador /note=Auto-annotation start source: Glimmer calls 65755; GeneMark calls 65749. /note=Phamerator: 04/23/21, Pham 2479 with 68 pham members. Compared to Battuta and BoomerJR. The most common function given for this gene is "RuvC-like Resolvase". /note=Starterator: The most annotated start site is number 3, which in this phage is at position 65755 bp, with 40 manual annotations; 40 of the 54 members call site 3. /note=Coding Potential: Coding potential is present only in the forward direction in Self-Trained GeneMark. The suggested start is somewhat outside of the coding potential but also gives the second-longest reading frame. /note=SD (Final) Score: The final score of the suggested start, -5.351, is only the 5th best, but it doesn't differ much from the scores below it. /note=Gap/overlap: The gap is 6 bp and could be filled by the alternative start site, 65749. However, that start is further away from the coding potential. /note=Location call: The coding potential and conservation through the pham suggest that this is a real gene. The suggested start site is likely the real one because it covers all the coding potential. The most favorable evidence for this start site is that it is conserved in Starterator with the most manual annotations, so 65755 is the most likely start site. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp suggested the function RuvC-like resolvase, with hits having >99% query match, >99% identity, and e-values below (better than) 1e-103 and 6e-137, respectively. HHpred showed hits with the same function with >99% probability, >92% coverage, and e-values below 1e-22. /note=Transmembrane domains: Neither TMHMM nor TOPCONS shows hits for TMDs. Therefore this protein is likely neither part of the membrane nor interacting with it, which is reasonable for the suggested function. 
/note=Secondary Annotator Name: Namaganda, Samali /note=Secondary Annotator QC: I agree with the location call for the gene and the explanation. CDS 66326 - 66790 /gene="92" /product="gp92" /function="DprA-like DNA processing chain A" /locus tag="PumpkinSpice_92" /note=Original Glimmer call @bp 66326 has strength 5.11; Genemark calls start at 66326 /note=SSC: 66326-66790 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.99335E-108 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.066, -2.523003374675015, yes F: DprA-like DNA processing chain A SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675393,100.0,3.99335E-108 SIF-HHPRED: DNA processing protein DprA; SAM and Rossmann Fold, DNA processing protein A, DNA BINDING PROTEIN; HET: SO4, MSE; 2.7A {Streptococcus pneumoniae},,,3UQZ_C,95.4545,99.9 SIF-Syn: This is a DprA-like DNA processing chain A; the upstream gene is a RuvC-like resolvase and the downstream gene is a helix-turn-helix DNA binding protein, just like in phage BoomerJR /note=Primary Annotator Name: Cervantes, Richard /note=Auto-annotation start source: The auto-generated start site is 66326. /note=Phamerator: Pham 4359 (date 4/24/2021). Conserved in the following phages: BoomerJR (BE) and Battuta (BE) /note=Starterator: Start site 8 was called in 33/49 non-draft genes in this pham. /note=Coding Potential: There is coding potential in self-trained GeneMark but not in host-trained GeneMark. This coding potential is on the forward strand. /note=SD (Final) Score: The start site that gives the longest ORF has a final score of -5.368, which is not the score closest to zero. The start site at 66326 has a final score of -2.523, much better than that of the longest-ORF start. Based on scoring, 66326 is the correct start site. /note=Gap/overlap: There is a 1 bp gap, essentially no gap. 
/note=Location call: Unknown function, as there were no CDD hits, and the HHpred hits for PDB and Pfam had rather large e-values. Pfam had a probability of 57.9% with an e-value of 60. PDB had a better probability of 81.9% and an e-value of 12. The PDB hit had a function of “structural genomics”, which is not on the approved SEA-PHAGES function list. However, the HHpred results include several similar proteins labeled as DNA processing or, again, structural genomics. /note=Function call: Based solely on NCBI GenBank, the protein appears to be a DprA-like DNA processing chain A. /note=Transmembrane domains: Neither TMHMM nor TOPCONS called any TMDs. /note=Secondary Annotator Name: Jakupova, Malika /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. Please check the box under "Pham starterator" to support that this start site is correct. Also, please explain what start site 8 corresponds to in PumpkinSpice. You can write that "Start 8 is 66326 in PumpkinSpice" and that it also agrees with Glimmer and GeneMark. In your coding potential notes, please indicate whether the coding potential is only in the F or the R direction. Also, the gap is 1, which might be an error; please update your notes. Lastly, check the boxes for Starterator and coding potential. 
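For gene 91, the notes say the 6 bp gap could be closed by GeneMark's start at 65749; for gene 92, the start at 66326 is preferred because its final score is much better than that of the longest-ORF start. A short Python sketch of both pieces of arithmetic follows. Gene 90's stop coordinate is back-calculated from the reported 6 bp gap, and gene 92's longest-ORF start appears only as a label because its coordinate is not recorded in these notes.

```python
def gap(upstream_stop: int, start: int) -> int:
    """Positive = gap, negative = overlap (forward strand, 1-based inclusive)."""
    return start - upstream_stop - 1


# Inferred from the 6 bp gap reported for start 65755; not stated directly.
GENE_90_STOP = 65755 - 6 - 1  # 65748
GENE_91_STOP = 66324

# Glimmer (65755) and GeneMark (65749) starts for gene 91.
for start in (65755, 65749):
    print(f"gene 91 start {start}: gap {gap(GENE_90_STOP, start)} bp, "
          f"length {GENE_91_STOP - start + 1} bp")

# Gene 92 begins at 66326, right after gene 91's stop.
print(f"gene 92 gap: {gap(GENE_91_STOP, 66326)} bp")

# Final (SD) scores are negative; the score closest to zero is the strongest.
final_scores = {"66326": -2.523, "longest-ORF start": -5.368}
best = max(final_scores, key=final_scores.get)
print(f"best final score: {best}")
```

The alternative start 65749 closes the gap to 0 bp and lengthens the ORF by 6 bp (two codons), and the coordinates as listed give gene 92 a 1 bp gap, matching its GAP field.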
CDS 66783 - 66932 /gene="93" /product="gp93" /function="helix-turn-helix DNA binding domain" /locus tag="PumpkinSpice_93" /note=Original Glimmer call @bp 66783 has strength 2.04; Genemark calls start at 66783 /note=SSC: 66783-66932 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Streptomyces phage LilMartin] ],,NCBI, q1:s1 100.0% 3.11604E-14 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.479, -3.7329837339109084, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Streptomyces phage LilMartin] ],,QNN98339,79.5918,3.11604E-14 SIF-HHPRED: Transcriptional regulator ComR; RNPP family TPR domain HTH domain bacterial signaling peptide binding, TRANSCRIPTION; 1.894A {Streptococcus vestibularis F0396},,,6HU8_A,89.7959,97.1 SIF-Syn: Pham 17505 is also present in other BE2 phages like Wipeout and TomSawyer, and in all of these phages pham 17505 is also adjacent to pham 4359 upstream and pham 56079 downstream. Pham designations as of 05/28/2021. /note=Primary Annotator Name: Chang, Loren /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 66783 bp. /note=Phamerator: Pham 17505 (04/23/2021). It is conserved; found in BoomerJR (BE2), Bordeaux (BE2) and Battuta (BE2). /note=Starterator: Start 13 in Starterator is called in 27 of the 67 non-draft genes in this pham. Start 13 is position 66783 in PumpkinSpice. This agrees with the auto-annotated start site predicted by Glimmer and GeneMark. /note=Coding Potential: At this ORF, GeneMark Host shows no coding potential, while GeneMark Self shows good coding potential (in a forward reading frame). All of the coding potential is covered. /note=SD (Final) Score: The final score is -3.733. This is the second-highest final score on PECAAN. /note=Gap/overlap: There is an overlap of 8 bp. This overlap is not large enough to be problematic. 
Additionally, this overlap is conserved in phages like TomSawyer and Wofford. /note=Location call: Given the above evidence, this appears to be a real gene, with the most probable start site at 66783 bp. /note=Function call: Helix-turn-helix DNA binding domain. The two top PhagesDB BLAST hits (E-values 1e-22 and 3e-12) are associated with helix-turn-helix DNA binding domains/proteins. The two top NCBI BLAST hits (93.8%+ coverage, 66%+ identity, E-values < 3.68e-13) are also associated with helix-turn-helix DNA binding domains/proteins. CDD gives one significant hit (E-value 9.9e-4) associated with chromosome-anchoring protein RacA, which according to previous research contains a helix-turn-helix DNA-binding domain (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914108/). Finally, the top HHpred hit is associated with transcriptional regulator ComR (probability 97.82%, E-value 2.6e-4, coverage 90%), which contains a helix-turn-helix DNA binding motif (https://www.uniprot.org/uniprot/P75952). Ultimately, all the available evidence suggests that the function should be helix-turn-helix DNA binding domain. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs for this protein, suggesting that it is not a membrane protein. This makes sense, as a DNA-binding domain should interact with DNA in the host cytoplasm rather than with the membrane. /note=Secondary Annotator Name: Lapurga, Kaira /note=Secondary Annotator QC: All information seems to be in order and well explained. 
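Several function calls weigh HHpred hits against probability, coverage, and e-value cutoffs. A minimal Python filter is sketched below, assuming cutoffs of probability above 80%, coverage above 40%, and an e-value below 10^-3; these values are an assumption based on the conventions used elsewhere in these annotations (the gene 86 note writes the e-value cutoff as "10e-3", read here as 10^-3). A numeric filter cannot judge whether the hit's function is reasonable, which the annotators also consider.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    label: str
    probability: float  # percent
    coverage: float     # percent of the query covered
    evalue: float


# Assumed cutoffs (see lead-in): probability > 80, coverage > 40 %, e-value < 1e-3.
MIN_PROBABILITY = 80.0
MIN_COVERAGE = 40.0
MAX_EVALUE = 1e-3


def is_significant(hit: HHpredHit) -> bool:
    """True only if the hit clears all three numeric cutoffs."""
    return (hit.probability > MIN_PROBABILITY
            and hit.coverage > MIN_COVERAGE
            and hit.evalue < MAX_EVALUE)


# Values copied from the function-call notes for genes 93, 86, and 87.
hits = [
    HHpredHit("gene 93: ComR", 97.82, 90.0, 2.6e-4),
    HHpredHit("gene 86: structural genomics", 83.09, 64.0, 9.0),
    HHpredHit("gene 87: elongation factor", 29.0, 18.0, 120.0),
]
for hit in hits:
    verdict = "significant" if is_significant(hit) else "not significant"
    print(f"{hit.label}: {verdict}")
```

Only the gene 93 ComR hit clears all three cutoffs; the gene 86 hit fails on e-value alone, despite its 83.09% probability and 64% coverage.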
CDS 66929 - 67720 /gene="94" /product="gp94" /function="methyltransferase" /locus tag="PumpkinSpice_94" /note=Original Glimmer call @bp 66929 has strength 7.71; Genemark calls start at 66929 /note=SSC: 66929-67720 CP: yes SCS: both ST: SS BLAST-Start: [methyltransferase [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.235, -2.0895162839948163, yes F: methyltransferase SIF-BLAST: ,,[methyltransferase [Streptomyces phage Birchlyn] ],,QDF17265,100.0,0.0 SIF-HHPRED: V_cholerae_RfbT ; Vibrio cholerae RfbT protein,,,PF05575.12,72.2433,99.9 SIF-Syn: This gene is a methyltransferase in pham 56079. Upstream gene is in pham 17505 and is a helix-turn-helix binding domain; Downstream gene is in pham 27275 (no function yet for this gene, but other genes in the pham have thymidylate kinase as their function) just like in MindFlayer and Bordeaux /note=Primary Annotator Name: Garcia Vedrenne, Ana /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree that the start site is 66929. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is only found in Self-Trained GeneMark. /note=SD (Final) Score: The final score is the best option at -2.090, and the z-score is the best at 3.235. /note=Gap/overlap: Overlap of 4 bp, suggesting it is part of an operon. /note=Phamerator: Pham number 56079 on 9/18/2021. Conserved in other phages in cluster BE such as Starbow and Mindflayer. Other genes have methyltransferase listed as their function. /note=Starterator: Pham number 56079 has 36 members, 8 of which are drafts. The start number called most often in the published annotations is 5; it was called in 27 of the 28 non-draft genes in the pham, including PumpkinSpice. Start 5 @66929 has 27 MAs. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 66929 bp. /note=Function call: methyltransferase. 
PhagesDB has many hits with e-values of 0 that call this function; NCBI BLAST also returns many hits with high identity, high coverage, and e-values of 0. HHpred and CDD also have good hits for this function. /note=Transmembrane domains: No transmembrane domains were predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 67717 - 68340 /gene="95" /product="gp95" /function="thymidylate kinase" /locus tag="PumpkinSpice_95" /note=Original Glimmer call @bp 67717 has strength 10.17; Genemark calls start at 67717 /note=SSC: 67717-68340 CP: yes SCS: both ST: SS BLAST-Start: [thymidylate kinase [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 1.69674E-151 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.392, -3.993395564091666, no F: thymidylate kinase SIF-BLAST: ,,[thymidylate kinase [Streptomyces phage Karimac] ],,YP_009840268,99.5169,1.69674E-151 SIF-HHPRED: THYMIDYLATE KINASE; NUCLEOTIDE BIOSYNTHESIS, ATP-BINDING, NUCLEOTIDE-BINDING, KINASE, POXVIRUS, TMP KINASE, TRANSFERASE; HET: POP, TYD; 2.4A {VACCINIA VIRUS COPENHAGEN} SCOP: c.37.1.0,,,2V54_B,92.7536,99.7 SIF-Syn: This gene is a thymidylate kinase in pham 27275. Upstream gene is in pham 56079 and is a methyltransferase; Downstream gene is in pham 62554 (NKF) just like in MindFlayer and Bordeaux /note=Primary Annotator Name: Garcia Vedrenne, Ana /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree that the start site is 67717. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is only found in Self-Trained GeneMark. /note=SD (Final) Score: The final score is the second-best option at -3.993, and the z-score is the best at 2.392. /note=Gap/overlap: Overlap of 4 bp, suggesting it is part of an operon. /note=Phamerator: Pham number 27275 on 9/18/2021. Conserved in other phages in cluster BE such as Starbow and Mindflayer. 
Other genes have thymidylate kinase listed as their function. /note=Starterator: Pham number 27275 has 42 members, 8 of which are drafts. The start number called most often in the published annotations is 9; it was called in 12 of the 34 non-draft genes in the pham, including PumpkinSpice. Start 9 @67717 has 12 MAs. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 67717 bp. /note=Function call: thymidylate kinase. PhagesDB has many hits that call this function; NCBI BLAST also returns many hits with high identity, high coverage, and low e-values. HHpred also has good hits for this function. /note=Transmembrane domains: No transmembrane domains were predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 68362 - 68682 /gene="96" /product="gp96" /function="hypothetical protein" /locus tag="PumpkinSpice_96" /note=Original Glimmer call @bp 68362 has strength 10.83; Genemark calls start at 68362 /note=SSC: 68362-68682 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_96 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 5.61728E-73 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.912, -3.8154491356968365, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_96 [Streptomyces phage Starbow] ],,AXH66604,100.0,5.61728E-73 SIF-HHPRED: SIF-Syn: Gene in pham 62554 is upstream of gene in pham 8003 (DNA binding protein) and downstream of gene in pham 27275 just like in phages Battuta, Birchlyn, Bordeaux, and BoomerJR. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 68362. /note=Phamerator: pham: 55743, Date: 04/23/2021. It is conserved; found in Battuta_96, Bmoc_93, and BoomerJR_98. /note=Starterator: Start site 22 in Starterator was manually annotated in 33/303 non-draft genes in this pham. Start 22 is 68362 in PumpkinSpice. 
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: -3.815. It is the second-best final score on PECAAN, which is very reasonable given that the start site with the best SD score yields a gene that is only 63 bp long (possibly not a real gene). /note=Gap/overlap: 21 bp. The gap is too small to contain another gene. In addition, the gap is conserved in phages Battuta and BoomerJR. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 68362. /note=Function call: NKF. Neither CDD nor HHpred showed any significant hits for this gene. The top two PhagesDB BLAST hits have unknown function (E-value = 2e-59), and the top 3 NCBI BLAST hits also have unknown function (hypothetical protein) with 100% coverage, 90%+ identity, and E-value <8e-66. Thus, the function of this gene is unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Do, Vivian Tuongvi /note=Secondary Annotator QC: You need to specify which reading frame you are referring to. Add information about how the SD score is not the best but is reasonable, considering that the best score is given to a start site that yields a gene only 63 bp long (not possible). Add that it was also the LORF. Discuss whether or not the gap is conserved in other phages, and add notes on how the most manually annotated start site is not available for this phage. That said, I do agree with your location call! 
CDS 68675 - 69199 /gene="97" /product="gp97" /function="hypothetical protein" /locus tag="PumpkinSpice_97" /note=Original Glimmer call @bp 68675 has strength 12.66; Genemark calls start at 68675 /note=SSC: 68675-69199 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp182 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.51042E-123 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp182 [Streptomyces phage Karimac] ],,YP_009840270,100.0,2.51042E-123 SIF-HHPRED: RNA polymerase sigma-H factor; PSI, MCSG, Structural Genomics, Midwest Center for Structural Genomics, Protein Structure Initiative, RNA BINDING PROTEIN; 2.5A {Fusobacterium nucleatum subsp. nucleatum},,,3MZY_A,21.8391,96.5 SIF-Syn: NKF; the upstream gene is in pham 8003; the downstream gene is a terminase in pham 15068, just like in Bordeaux and Battuta. /note=Primary Annotator Name: Dines, Lily /note=Auto-annotation start source: Both Glimmer and GeneMark called the start at 68675. /note=Phamerator: Pham 8003 on 04/21/21. Most phages in this pham are in cluster BE, so the gene is conserved. The start site was compared with Starbow and Starplatinum. DNA binding protein is the function called. /note=Starterator: The start site is reasonable and conserved. Start coordinate: (5, 68675). 48/54 genes call site #5. /note=Coding Potential: There is coding potential in the self-trained GeneMark spanning the entire ORF, and the start site covers all of this coding potential. /note=SD (Final) Score: -2.523; the best final score. /note=Gap/overlap: -8, i.e., an 8 bp overlap, which is reasonable. /note=Location call: The evidence suggests that the original start site is real and that this gene is real. The most likely start site candidate is 68675; GeneMark and Glimmer both predicted this start site. It has a reasonable gene overlap and the best z-score and final score. This start site covers all of the coding potential and is conserved in Starterator. The gap is conserved in Pham Maps. 
/note=Function call: CDD predicted no functions. The top 4 HHpred hits, sorted by probability, suggest that the function is DNA binding protein, with high probabilities (>95.79) and e-values that are relatively small but do not reach the threshold. Further investigation of the PhagesDB BLAST hits allows us to conclude confidently that this is a DNA binding protein. /note=Transmembrane domains: None /note=Secondary Annotator Name: Linares Cardona, Ninette /note=Secondary Annotator QC: I agree with this location call. In the Starterator section, the start site should be 68675 instead of 51522. In the coding potential section, you should state whether the ORF has reasonable coding potential. In the SD (Final) score section, it may be useful to include and comment on the z-score. In the gap/overlap section, you should mention whether this overlap is conserved in other phages. Also, the Pham Starterator menu has not been filled out. CDS 69261 - 71081 /gene="98" /product="gp98" /function="terminase, large subunit" /locus tag="PumpkinSpice_98" /note=Original Glimmer call @bp 69261 has strength 6.78; Genemark calls start at 69261 /note=SSC: 69261-71081 CP: yes SCS: both ST: SS BLAST-Start: [terminase [Streptomyces phage Wipeout] ],,NCBI, q1:s2 100.0% 0.0 GAP: 61 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.999, -3.8876811084318192, no F: terminase, large subunit SIF-BLAST: ,,[terminase [Streptomyces phage Wipeout] ],,QGH74343,99.8353,0.0 SIF-HHPRED: Terminase_1 ; Phage Terminase,,,PF03354.17,82.8383,100.0 SIF-Syn: Gene 97 is part of Pham 15068, with a terminase, large subunit function; the upstream gene is from Pham 8003 with a DNA binding protein function, and the downstream gene belongs to Pham 2336 with an HNH endonuclease function. This pattern of synteny can also be seen in Starbow and Starplatinum, except that the functions were listed as terminase (large subunit not mentioned) and the genes belonging to Pham 8003 had no listed functions. 
/note=Primary Annotator Name: Do, Vivian /note=Auto-annotation start source: Both Glimmer and GeneMark called a start site of 69261. /note=Phamerator: As of 04.21.21, this gene belongs to Pham 15068. The gene is highly conserved, with a length of 1821 bp, and has a noted function of terminase. This gene is also seen in phages such as Battuta and Birchlyn. /note=Starterator: As of 04.21.21, start site 10 was the most manually annotated start site, with 35 MAs out of 54 non-draft genes. This corresponds to a start site of 69,261 bp, which was the auto-annotated start site. /note=Coding Potential: There is no coding potential in the Host-trained GeneMark. There is high coding potential throughout the ORF in frame 3 of the forward strand in the Self-trained GeneMark. /note=SD (Final) Score: -3.888, which is a fairly good score. Although it is not the best, it is reasonable because this start covers all of the coding potential and has an ATG start codon. The z-score is also fairly high, at over 2. /note=Gap/overlap: The 61 bp gap is well conserved and can also be seen in LukeCage and Starbow. /note=Location call: We have kept the start site at 69261, as it encompasses all of the coding potential while having a good z-score and RBS final score. It also maintains a reasonably sized gap of 61 bp. This is further supported by the fact that it was the most manually annotated start site and that a length of 1821 bp is well conserved among other genes in this pham. /note=Function call: Terminase, because the e-values are significantly below 10e-7, the lengths match, and the top hits are from phages of the same host and cluster (Wipeout and TomSawyer). CDD showed a top hit of terminase with an e-value meeting the significance threshold of 10e-3. HHpred provides good support, with Pfam and PDB hits noting terminase function with coverage greater than 50%, probabilities of 100, and e-values around 10e-30, which are very low. 
CDD and HHpred indicate that this is most likely the large subunit of the terminase. /note=Transmembrane domains: No transmembrane domains were detected by either TMHMM or TOPCONS, so it does not meet the criteria for a membrane protein. This is consistent with the hypothesized function: terminases are integral to DNA packaging, so the gene product would be found predominantly in the cytoplasm rather than in the membrane. /note=Secondary Annotator Name: Linares Cardona, Ninette /note=Secondary Annotator QC: I agree with this location call. In the Phamerator section, you should include a couple of the phages that also contain this gene. In the Starterator section, I suggest that you also include that this start site was called in 35 out of the 54 non-draft genes. In the SD (Final) score section, it may be useful to include and comment on the z-score. CDS 71095 - 71622 /gene="99" /product="gp99" /function="HNH endonuclease" /locus tag="PumpkinSpice_99" /note=Original Glimmer call @bp 71095 has strength 2.8; Genemark calls start at 71230 /note=SSC: 71095-71622 CP: yes SCS: both-gl ST: SS BLAST-Start: [HNH endonuclease [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.65829E-126 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.214, -2.484606983760709, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Streptomyces sp. JV178] ],,WP_099970955,100.0,2.65829E-126 SIF-HHPRED: d.4.1.8 (A:775-907) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Streptococcus pyogenes [TaxId: 301447]},,,d4oo8a2,38.8571,98.6 SIF-Syn: /note=Primary Annotator Name: Haeri, Alliya /note=Auto-annotation start source: Glimmer calls this gene with a start site of 71,095. GeneMark uses a different start site (71230) that does not fully encompass all of the coding potential. /note=Phamerator: pham: 2336. Date 4/24/2021. It is conserved, found in phages Bordeaux and IchabodCrane. 
/note=Starterator: Start site 17 was manually annotated in 34/54 non-draft genes and in all 34 non-draft genes that have this start site as an option. Start site 17 is 71,095 in PumpkinSpice. This evidence agrees with the start site called by Glimmer. /note=Coding Potential: Coding potential is only found in the forward direction for this gene, suggesting it is a forward gene. The start site covers all coding potential, which was only found in the Self-Trained GeneMark. /note=SD (Final) Score: The final score is -2.485, which is the best score; all of the lower-scoring start sites would create much larger gaps between genes. /note=Gap/overlap: The gap is 13 base pairs, which is relatively small but conserved in other phages within subcluster BE2, such as Bordeaux and IchabodCrane. /note=Location call: Based on the above evidence, this gene is likely real and has a start site of 71,095. /note=Function call: HNH endonuclease. The top three PhagesDB BLAST hits (e-value = e-107) with listed functions have the function of HNH endonuclease. The top five NCBI BLAST hits (100% coverage, e-value < 7e-123, identity 159+/175) also have HNH endonuclease as their listed function. The CDD hit has HNH endonuclease as the listed classification, with the top domain hit of HNH endonuclease having an e-value of 7.63e-6. HHpred also has two hits with HNH endonuclease as the listed classification, one with a probability of 98.65%, coverage of 38.85%, and e-value of 7.7e-8, and the other with a probability of 98.61%, coverage of 39.42%, and e-value of 7.1e-8. /note=Transmembrane domains: There were no TMD hits from either TMHMM or TOPCONS, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Kim, James Joon /note=Secondary Annotator QC: All the annotations look very good and detailed. I agree with the above annotations and believe this gene is real. 
CDS 71632 - 72252 /gene="100" /product="gp100" /function="hypothetical protein" /locus tag="PumpkinSpice_100" /note=Original Glimmer call @bp 71632 has strength 8.24; Genemark calls start at 71632 /note=SSC: 71632-72252 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp179 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 8.36004E-149 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -3.4105598537121566, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp179 [Streptomyces phage Karimac] ],,YP_009840273,100.0,8.36004E-149 SIF-HHPRED: SIF-Syn: NKF (Pham 9114); the upstream gene is an HNH endonuclease, and the downstream gene is in Pham 19890, just like in phage Genie2. /note=Primary Annotator Name: Howe, Kathryn /note=Auto-annotation start source: Both Glimmer and GeneMark identify 71632 as the start site. /note=Phamerator: As of April 22, 2021, this gene was part of Pham 9114. This gene is conserved among other members of cluster BE, such as Battuta and IchabodCrane. Phamerator listed helix-turn-helix DNA-binding protein as the function of this gene; however, in PhagesDB, the notes for many similar phages list the function as DNA ligase. Both of these functions are on the approved function list. /note=Starterator: The most conserved start site for this gene is start site number 7, which was called in 44 of 61 non-draft phage annotations; however, start site 7 has no corresponding start site in PumpkinSpice. When present, start site 6 is called 100% of the time and is conserved in subcluster BE2, to which PumpkinSpice belongs. Start site 6 corresponds to bp coordinate 71632 in PumpkinSpice. Start site #6 was called in 14 of 61 non-draft phage annotations. /note=Coding Potential: There is strong coding potential at the chosen start site in the Self-trained GeneMark. The chosen start site at 71632 covers all of the coding potential. 
/note=SD (Final) Score: The start site at 71632 has the best SD Final score, with a value of -3.411. This start site also has the best z-score, with a value of 2.999. /note=Gap/overlap: There is a gap of 9 bp with the upstream gene, which is reasonable. /note=Location call: Using the information gathered so far, it is safe to call this a real gene. The auto-annotated start site is the most probable start site. The start site 71632 covers all of the coding potential and has the best RBS final score and z-score. It also forms the longest ORF. Starterator also agrees, with start site 6 being well conserved among similar phages. /note=Function call: There is not yet enough evidence to hypothesize a function. Many of the strong hits do not have a listed function. Some hits call DNA ligase as the function; these hits are weaker, although they are still within our acceptable ranges for e-value and query coverage. Some CDD and HHpred hits indicate a function such as cell adhesion; however, this is not an approved function on the SEA-PHAGES list. /note=Transmembrane domains: Based on the TMHMM and TOPCONS data, this gene does not code for a transmembrane protein. /note=Secondary Annotator Name: Beaudin, Catherine /note=Secondary Annotator QC: I agree with the location call for this gene at start site 71,632. I noticed that you forgot to complete the GM Coding Capacity box. Also, I would briefly mention the z-score in the “SD (Final) Score” section of the PECAAN notes. 
tRNA 72361 - 72434 /gene="101" /product="tRNA-Gly(tcc)" /locus tag="PUMPKINSPICE_101" /note=tRNA-Gly(tcc) tRNA 72564 - 72636 /gene="102" /product="tRNA-Gln(ctg)" /locus tag="PUMPKINSPICE_102" /note=tRNA-Gln(ctg) tRNA 72741 - 72812 /gene="103" /product="tRNA-Trp(cca)" /locus tag="PUMPKINSPICE_103" /note=tRNA-Trp(cca) CDS 72868 - 73314 /gene="104" /product="gp104" /function="hypothetical protein" /locus tag="PumpkinSpice_104" /note=Original Glimmer call @bp 72868 has strength 12.7; Genemark calls start at 72868 /note=SSC: 72868-73314 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 8.0721E-101 GAP: 615 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -5.2484617369160205, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304078,100.0,8.0721E-101 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hugo, Cristelle /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 72868. /note=Phamerator: 4/24 Pham: 19890. It is conserved in 20/21 of the BE2 phages. No function is listed. /note=Starterator: Start site 9 was manually annotated in 33/36 non-draft genes in this pham. It is 72868 in PumpkinSpice. It agrees with the site predicted by Glimmer and GeneMark. This was the only site that included all the coding potential. /note=Coding Potential: Good coding potential on the forward strand, with the start site covering all of the potential. /note=SD (Final) Score: -5.248. This is not the best score, but it is the only predicted site that includes all the coding potential. /note=Gap/overlap: 615 bp. This large gap is due to the presence of tRNA genes upstream of this gene. /note=Location call: This is a real gene, and the likely start site is at 72868. /note=Function call: NKF. Although we found 2 good matches, both are proteins with no known function. So while we cannot be sure which protein this gene matches best, we can say that it most likely does not have a known function. 
Phage genes with sequences similar to this ORF also had no function listed. /note=Transmembrane domains: 0; no TMDs were predicted. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 73316 - 73483 /gene="105" /product="gp105" /function="hypothetical protein" /locus tag="PumpkinSpice_105" /note=Original Glimmer call @bp 73316 has strength 2.15 /note=SSC: 73316-73483 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein HWB80_gp177 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 3.44528E-33 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.01, -3.0866669750886713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp177 [Streptomyces phage Karimac] ],,YP_009840275,100.0,3.44528E-33 SIF-HHPRED: SIF-Syn: Gene 101 (pham 29588) has no known function. It shows synteny with phage Bordeaux, where its upstream gene (pham 19890) and downstream nucleotidyltransferase gene are also syntenic. However, between gene 101 (pham 29588) and the nucleotidyltransferase, the PumpkinSpice genome contains an additional gene (pham 63117). This gene (pham 63117) cannot be found in the Bordeaux genome or in other phage genomes. /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer calls the start site at 73316. GeneMark does not call a start site. /note=Phamerator: pham 29588; the analysis was run 04/16/21; it is conserved, found in Bmoc_33 (BE) and EGole_34 (BE). /note=Starterator: The start number called most often in the published annotations is 9; it was called in 22 of the 52 non-draft genes in the pham. Start site 9 is 73316 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. 
/note=SD (Final) Score: -3.087. It is the best final score on PECAAN. /note=Gap/overlap: Gap: 1 bp. It is a small gap, suggesting that this gene is part of an operon. This gene is conserved in several other phages, and the 1 bp gap was seen in the other phages as well, such as phages LukeCage and MindFlayer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 73316. /note=Function call: NKF. CDD does not have any information about this gene. HHpred does not provide any informative evidence about the function of this gene: all of its hits were mammalian hormones or proteins with high e-values that do not meet the threshold of e-value <10e-3. PhagesDB BLAST hits show unknown functions for this gene, with small e-values around 5e-29. NCBI BLAST returns only hypothetical proteins for this gene, though with good e-values (<10^-33), 100% coverage, and 99%+ identity. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Rafael, Adriana Nicole /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
tRNA 73538 - 73611 /gene="106" /product="tRNA-Pro(tgg)" /locus tag="PUMPKINSPICE_106" /note=tRNA-Pro(tgg) CDS 73788 - 74195 /gene="107" /product="gp107" /function="HNH endonuclease" /locus tag="PumpkinSpice_107" /note=Original Glimmer call @bp 73788 has strength 7.72; Genemark calls start at 73788 /note=SSC: 73788-74195 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Gordonia phage Daredevil] ],,NCBI, q1:s1 94.0741% 3.83548E-21 GAP: 304 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.79, -5.938832282031755, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage Daredevil] ],,YP_009807165,55.3957,3.83548E-21 SIF-HHPRED: restriction endonuclease PacI; HNH restriction endonuclease, beta-beta-alpha-metal active site, 8 base-pair rare cutter, HYDROLASE-DNA complex; HET: SO4; 1.92A {Pseudomonas alcaligenes},,,3M7K_A,91.8519,99.0 SIF-Syn: This gene is not present in the observed phages Battuta, MindFlayer, and Starbow, although upstream and downstream synteny is observed in these phages. This suggests that this gene may be an HNH endonuclease that has recently homed into this phage and is not a conserved gene. /note=AF: Diverse pham, but start site 73788 is the best option and aligns roughly with the location of the other chosen starts in the pham. /note=AGV notes: Starterator info is incorrect, but otherwise agree. /note=Primary Annotator Name: Kelly, Samuel /note=Auto-annotation start source: Glimmer and GeneMark both have the start point marked at 73788. /note=Phamerator: (4/23/21) Pham 59446. This pham contains genes mostly from clusters BE and BK, but also from a variety of others (L, C, DN, and even a singleton). HNH endonuclease is listed as the function for most of the genes, and this is a conserved gene among all non-draft phages within the BE cluster. /note=Starterator: Start coordinate is (26, 73788). Approx. 76% (28/37) of manually annotated non-draft phages call start site #26, suggesting it is a good start site. 
/note=Coding Potential: There is no coding potential in the Host-trained GeneMark, but there is fairly strong coding potential in the Self-trained GeneMark, including a decent amount of atypical coding potential; the HMM prediction extends a little past the coding potential. Coding potential is in the forward orientation for this gene. /note=SD (Final) Score: -5.939, which is not a great value compared to the other candidates. /note=Gap/overlap: The gap before the gene (304 bp) is the smallest among all candidates. /note=Location call: Considering the evidence, it seems reasonable to place the start site at 73788. Starterator supports this, but the Phamerator data are somewhat inconclusive as to which clusters the genes belong to. /note=Function call: There is some evidence for HNH endonuclease in both NCBI and PhagesDB (more so in the latter), but these calls are not supported by strong e-values, coverage, or identities. The highest-scoring hits point to an unknown function. CDD yielded no results, but HHpred yielded many strong hits suggesting that this is an HNH endonuclease. Considering that CDD found no conserved domain and that HHpred returned strong hits, the function is likely HNH endonuclease. This is also consistent with the "homing" nature of HNH endonucleases, which may allow them to spread to certain phages via horizontal gene transfer (HGT) while being absent from others. /note=Transmembrane domains: Zero hits in both TMHMM and TOPCONS. /note=Secondary Annotator Name: Howe, Kathryn /note=Secondary Annotator QC: Based on the present evidence, I agree with the start site; however, more information from Phamerator, including whether this is a conserved gene among similar phages, should be included. 
tRNA 74202 - 74274 /gene="108" /product="tRNA-Pro(tgg)" /locus tag="PUMPKINSPICE_108" /note=tRNA-Pro(tgg) CDS 74437 - 74958 /gene="109" /product="gp109" /function="Nucleotidyl transferase" /locus tag="PumpkinSpice_109" /note=Original Glimmer call @bp 74437 has strength 9.75; Genemark calls start at 74437 /note=SSC: 74437-74958 CP: yes SCS: both ST: SS BLAST-Start: [nucleotidyltransferase domain-containing protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.14566E-124 GAP: 241 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.818, -3.0425830684175623, yes F: Nucleotidyl transferase SIF-BLAST: ,,[nucleotidyltransferase domain-containing protein [Streptomyces sp. JV178] ],,WP_107489961,100.0,2.14566E-124 SIF-HHPRED: Aminoglycoside NucleotidylTransferase; antibiotic resistance, neomycin, TRANSFERASE-ANTIBIOTIC complex; HET: NMY; 1.65A {Staphylococcus aureus} SCOP: a.24.16.1, d.218.1.1,,,6UN8_B,52.6012,98.0 SIF-Syn: /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 74437. /note=Phamerator: Pham 19102 as of 4/21/21. The gene was compared with phages Battuta and Birchlyn. /note=Starterator: Starterator indicates that start number 12 (start site 74437) is the best-suited start site for this gene, because start site 74437 has 33 manual annotations, which is very convincing evidence of a good start site. /note=Coding Potential: Coding potential is on the forward strand, which indicates that this gene is indeed a forward gene; this was visualized in the Host-Trained GeneMark data. /note=SD (Final) Score: The RBS final score is -3.043 and the z-score is 2.818. Given these two strong scores, I think this LORF start is the best option. /note=Gap/overlap: The gap is 241 bp, which is acceptable compared with other phages; given this gap, a new gene should not be added. It is also the smallest gap among the candidate start sites. 
/note=Location call: With the given evidence, this appears to be a real gene with its start site at 74437. /note=Function call: nucleotidyltransferase /note=Transmembrane domains: N/A. No TMHs were predicted, so the gene product, a tRNA nucleotidyltransferase, is not a membrane protein. /note=Secondary Annotator Name: Billings, Sophie /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: the Phamerator notes should state whether or not the gene is conserved. The Starterator notes need to state whether this site is present in this gene and, if so, its location, as well as whether the Starterator evidence agrees with Glimmer and GeneMark. Also, for the gap annotation, mention which phages the gap is conserved in, for reference. CDS 75085 - 76896 /gene="110" /product="gp110" /function="FtsK-like DNA translocase" /locus tag="PumpkinSpice_110" /note=Original Glimmer call @bp 75220 has strength 8.01; Genemark calls start at 75220 /note=SSC: 75085-76896 CP: no SCS: both-cs ST: SS BLAST-Start: [FtsK-like DNA translocase [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 0.0 GAP: 126 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.983, -5.058605177999316, no F: FtsK-like DNA translocase SIF-BLAST: ,,[FtsK-like DNA translocase [Streptomyces phage Starbow] ],,AXH66612,100.0,0.0 SIF-HHPRED: DNA TRANSLOCASE FTSK; NUCLEOTIDE-BINDING, CHROMOSOME PARTITION, ATP-BINDING, DNA- BINDING, CELL DIVISION, DNA TRANSLOCATION, KOPS, MEMBRANE, DIVISOME, CELL CYCLE, MEMBRANE; HET: AGS; 2.25A {PSEUDOMONAS AERUGINOSA},,,2IUT_A,69.1542,100.0 SIF-Syn: Pham 4368 (as of 5/26/2021); upstream is a tRNA nucleotidyltransferase (Pham 19102 as of 5/25/2021) and downstream is a membrane protein (Pham 12894 as of 5/26/2021), just like in phages BoomerJR, MindFlayer, and Starbow. 
/note=AF: changed start 2/10/22 to match Starterator; added TMDs /note=Primary Annotator Name: Lapurga, Kaira /note=Auto-annotation start source: Glimmer & GeneMark both call the start at 75220. Possible suggested start sites include /note=Phamerator: Pham 4368. Date 5/25/2021. It is conserved; found in phages Karimac, MindFlayer, and Starbow. /note=Starterator: Start site 4 in Starterator was manually annotated in 31/33 non-draft genes in this pham. This evidence agrees with the start site predicted by GeneMark as well. /note=Coding Potential: Coding potential in the ORF is only on the forward strand, indicating a forward gene. Coding potential was found in Self-Trained GeneMark. /note=SD (Final) Score: The final score is -4.601. /note=Gap/overlap: The gap is with the upstream gene 103 (stop site: 74958). /note=Location call: This is a real gene, and the start site can remain at 75220 because it covers all of the coding potential seen in GeneMark. /note=Function call: BLASTp, HHpred, and Pfam return common, fairly strong hits for FtsK-like DNA translocase. /note=Transmembrane domains: One TMH was found by both TMHMM and TOPCONS. /note=Secondary Annotator Name: Cardona, Ninette /note=Secondary Annotator QC: CDS 76896 - 77111 /gene="111" /product="gp111" /function="membrane protein" /locus tag="PumpkinSpice_111" /note=Original Glimmer call @bp 76896 has strength 8.17; Genemark calls start at 76896 /note=SSC: 76896-77111 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 5.59608E-42 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.801, -3.077027835973012, no F: membrane protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675276,100.0,5.59608E-42 SIF-HHPRED: SIF-Syn: membrane protein. The upstream gene is pham 4368 and the downstream gene is pham 17543, just like in phages Battuta and BoomerJR. 
/note=Primary Annotator Name: Linares Cardona, Ninette /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, and they agree on the start site at 76896 bp. /note=Phamerator: The pham number as of 04/21/2021 is 12894. The gene is conserved in phages Braelyn, MulchMansion, and Karimac, all in the same cluster as PumpkinSpice. /note=Starterator: Start site 6 in Starterator was manually annotated in 20/33 non-draft genes in this pham. Start 6 is 76896 in PumpkinSpice. This evidence agrees with the site predicted by GeneMark and Glimmer. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in Self-Trained GeneMark. The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: The final score is not the highest, but it is reasonable at -3.041, and the z-score is 2.801. /note=Gap/overlap: This gene overlaps the upstream gene by 1 bp, suggesting that it might be part of an operon. This gene is conserved in several other phages, and the overlap was seen in the other phages as well, such as phages Genie2 and MindFlayer. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 76896. Starterator agrees with Glimmer and GeneMark. /note=Function call: membrane protein. Multiple PhagesDB BLAST and NCBI BLAST hits have no known function (e-values < e-36). CDD and HHpred did not have any significant hits for a function. TMHMM predicts one TMD and TOPCONS predicts two TMDs, suggesting that this gene is a membrane protein. /note=Transmembrane domains: TMHMM predicts just one TMD. TOPCONS predicts two TMDs. Based on this evidence, this gene can be assumed to have a real TMD and is therefore a membrane protein. 
/note=Secondary Annotator Name: Liu, Lily /note=Secondary Annotator QC: All of the evidence categories have been considered, and I agree with this annotation. tRNA 77160 - 77234 /gene="112" /product="tRNA-Ile(tat)" /locus tag="PUMPKINSPICE_112" /note=tRNA-Ile(tat) CDS 77260 - 77433 /gene="113" /product="gp113" /function="hypothetical protein" /locus tag="PumpkinSpice_113" /note=Original Glimmer call @bp 77260 has strength 3.85; Genemark calls start at 77260 /note=SSC: 77260-77433 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp173 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.74696E-33 GAP: 148 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.556, -3.588526034455651, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp173 [Streptomyces phage Karimac] ],,YP_009840279,100.0,2.74696E-33 SIF-HHPRED: SIF-Syn: NKF; the upstream gene belongs to pham 12894 and the downstream gene belongs to pham 5721, just like in phages Battuta and Wipeout. /note=Primary Annotator Name: Liu, Lily /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 77260 bp. /note=Phamerator: pham 17543, date 04/44/2021. It is conserved in phages such as Battuta, Birchlyn, and Bordeaux. /note=Starterator: Start site 3 was manually annotated in 33/33 non-draft genes in this pham. Start 3 is at 77260 bp, which agrees with the site called by Glimmer and GeneMark. /note=Coding Potential: The self-trained GeneMark shows both typical and alternative coding potential, but the host-trained GeneMark does not show any coding potential at all. Both the self-trained and the host-trained GeneMarks correspond to the first reading frame. /note=SD (Final) Score: This start has the best final score (-3.589) and the best z-score (2.556). /note=Gap/overlap: There is a 148 bp gap, which is quite large, but this is reasonable because the gap is conserved in other phages (Battuta and BoomerJR), and there is no coding potential present in the gap. 
/note=Location call: Based on the above evidence, this is a real gene and most likely has a start site at 77260 bp. /note=Function call: There is no known function for this gene. All PhagesDB BLAST hits had an unknown function, with the top two e-values both being 6e-27. NCBI BLAST hits all listed this gene as a hypothetical protein, with the top two e-values being 3e-33 and 1e-28. CDD did not have any hits for this gene. HHpred returned many hits with good coverage, probabilities, and e-values, but all of them corresponded to human or fungal proteins, so I did not include them as evidence. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Chang, Loren /note=Secondary Annotator QC: After reviewing the above evidence, I agree with the primary annotator's location call. tRNA 77442 - 77516 /gene="114" /product="tRNA-Leu(taa)" /locus tag="PUMPKINSPICE_114" /note=tRNA-Leu(taa) CDS 77520 - 77831 /gene="115" /product="gp115" /function="hypothetical protein" /locus tag="PumpkinSpice_115" /note=Original Glimmer call @bp 77571 has strength 8.51; Genemark calls start at 77517 /note=SSC: 77520-77831 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178]],,NCBI, q1:s2 100.0% 6.09276E-71 GAP: 86 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.291, -4.124821728793217, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178]],,WP_143675277,99.0385,6.09276E-71 SIF-HHPRED: SIF-Syn: NKF, upstream NKF, downstream NKF, just like phage IchabodCrane /note=Primary Annotator Name: Merlos, Andres /note=Auto-annotation start source: Glimmer calls start site 77571, while GeneMark calls start site 77517. /note=Phamerator: Phamerator was run on 23 April 2021; the gene is in pham 5721. This gene is highly conserved within its pham (e.g., Battuta_114 and Birchlyn_113). This gene does not have an assigned function. There are 4 draft genes in the pham.
/note=Starterator: The auto-annotated start is (5, 77571). Start site number 2 was the most called, in 14 of 14 non-draft genes. A better option would, thus, be (2, 77520). /note=Coding Potential: There is coding potential in the self-trained GeneMark but not in the host-trained GeneMark. /note=SD (Final) Score: -5.333 is a good Final Score, and a Z-score of 2.08 is also sufficient. For the revised start site of 77520, the Final Score is -4.125 and the Z-score is 2.291. The revised start site of 77520 does not cover all of the coding potential, but it is very close. /note=Gap/overlap: The gap is 137 bp for the auto-annotated start (77571) and 86 bp for the revised start (77520). This is large, but the gap appears to be highly conserved, as no genes seem to fill it in other phages. /note=Location call: The auto-annotated start site is 77571 F, and the stop site is 77831. I considered changing the start site to GeneMark's call of 77517, as its start codon is more common, it gives the LORF, it minimizes the gap, and its Final score and Z-score are sufficient. However, Starterator shows that start 2 (77520) is conserved and is the most annotated start site number in this pham, so I chose 77520 as my final start site. /note=Function call: The function appears to be unknown. Neither PhagesDB nor NCBI BLAST returned a match with a known function, so I would call the function NKF. There are no TMDs in TMHMM, so TOPCONS is not needed, and this gene cannot be called a transmembrane protein. /note=Transmembrane domains: No TMDs in TMHMM. Because there are no TMDs in TMHMM, TOPCONS is not needed. /note=Secondary Annotator Name: Taheri, Armin /note=Secondary Annotator QC: From your notes, I wasn't able to tell which start site you ended up deciding on. I would reword the "location call" section to clarify which of the three start sites you mention is your final call.
Also, for the "gap" and "SD Score" sections, I would mention the scores and gap not just for the auto-annotated start but also for the one you chose, especially since the auto-annotated start doesn't have the best score or the longest ORF. Again, I couldn't tell which site you chose, but, based on the data, I believe the correct start site is 77,520. CDS 77831 - 78010 /gene="116" /product="gp116" /function="hypothetical protein" /locus tag="PumpkinSpice_116" /note=Original Glimmer call @bp 77831 has strength 18.84; Genemark calls start at 77831 /note=SSC: 77831-78010 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_114 [Streptomyces phage IchabodCrane]],,NCBI, q1:s2 100.0% 4.68616E-31 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.567, -3.5660483139923818, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_114 [Streptomyces phage IchabodCrane]],,QFP97423,98.3333,4.68616E-31 SIF-HHPRED: SIF-Syn: NKF, the gene upstream (Pham 5721) and the gene downstream (Pham 1620) both have unknown function just like in Karimac and LukeCage. /note=Primary Annotator Name: Namaganda, Samali /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 77831. /note=Phamerator: As of 4/23/21, the gene belongs to pham 29451. The pham has 40 members, 8 of which are drafts. The gene is conserved in other BE members Ivy, LilMartin, Battuta, and LukeCage. /note=Starterator: Analysis was run on 4/21/21. Start (8, 77831) was called most often in the published annotations, in 17 of the 32 non-draft genes in the pham. /note=Coding Potential: The ORF only has coding potential in the forward direction, indicating that it is a forward gene. The ORF has both typical and atypical coding potential, in the self-trained GeneMark only. /note=SD (Final) Score: -3.566, the best RBS score among all options with an overlap/gap of less than 7 bp.
/note=Gap/overlap: -1 bp (a 1 bp overlap), conserved in Battuta and Karimac. /note=Location call: Start site 77831 is shared by genes in other pham members. /note=Function call: As of 5/7/21, there is not yet enough data to hypothesize the gene function. None of the significant hits (E-values < 1e-7) in PhagesDB BLASTp or NCBI BLASTp had a gene product with a function. /note=Transmembrane domains: Analysis run on 3/24/21 shows that there are no TMH calls in TMHMM. /note=Secondary Annotator Name: Rafael, Adriana Nicole /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 78135 - 78359 /gene="117" /product="gp117" /function="hypothetical protein" /locus tag="PumpkinSpice_117" /note=Original Glimmer call @bp 78135 has strength 8.86; Genemark calls start at 78135 /note=SSC: 78135-78359 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_116 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 1.09898E-46 GAP: 124 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.567, -3.5660483139923818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_116 [Streptomyces phage Birchlyn] ],,QDF17280,100.0,1.09898E-46 SIF-HHPRED: SIF-Syn: NKF, upstream gene is from pham 29451, downstream gene is from pham 62177, just like in phage Birchlyn. /note=Primary Annotator Name: Quijada, Britney /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 78135. /note=Phamerator: The pham number as of April 22, 2021 is 1620. The gene is conserved in phages Battuta, Birchlyn, and Starbow, all in the same cluster as PumpkinSpice. Function of this pham is not listed. /note=Starterator: Start site 6 in Starterator was manually annotated in only 9/33 non-draft genes in this pham. However, it is the start site manually annotated most often in subcluster BE2, and it corresponds to a start site of 78135 bp in PumpkinSpice.
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in the Self-Trained GeneMark. The chosen start site includes all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.566, with a Z-score of 2.567. These are the best scores, and this start site also gives the LORF. /note=Gap/overlap: The 124 bp gap upstream of this gene is relatively large. However, this gap is conserved in several other phages, such as Starbow and Karimac. No coding potential is observed within this gap in GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene, and the most likely start site is at 78135 bp. Starterator's graphical output and summary report are in accordance with GeneMark, Glimmer, and the manual annotations. /note=Function call: Multiple PhagesDB BLASTp hits list "function unknown", with the two lowest e-values being 8e-38 and 6e-37. Although one HHpred hit has a decent query coverage (91%), all e-values were > 4.3, which suggests the hit is likely random. CDD had no apparent hits. Multiple NCBI BLASTp hits are also listed as "hypothetical protein", with e-values such as 1e-46 and 2e-45, in alignments with Streptomyces phage Birchlyn and Streptomyces phage Karimac (98%+ coverage, 98%+ identity). Therefore, the function is unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Rivera, Bryanna /note=Secondary Annotator QC: 1) Phamerator: State whether the gene has a function or not. Other than that, looking at the evidence/data gathered, I agree with the primary annotator that this is a real gene and the auto-annotated start call is accurate. Good Job!
CDS 78501 - 78656 /gene="118" /product="gp118" /function="hypothetical protein" /locus tag="PumpkinSpice_118" /note=Original Glimmer call @bp 78501 has strength 4.06 /note=SSC: 78501-78656 CP: no SCS: glimmer ST: NI BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_116 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s35 100.0% 9.62814E-30 GAP: 141 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.399, -4.36385515561809, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_116 [Streptomyces phage IchabodCrane] ],,QFP97425,60.0,9.62814E-30 SIF-HHPRED: SIF-Syn: /note=AGV notes: Gap is 141, not 8 /note=Primary Annotator Name: RAFAEL, ADRIANA NICOLE /note=Auto-annotation start source: Glimmer calls the start site at 78501; however, GeneMark does not call this gene. /note=Phamerator: Pham 57525. It is mostly conserved: 17/26 phages have a gene length of 156 bp (e.g., Birchlyn_118 and Bordeaux_119). /note=Starterator: Start site 12 (site 78501 in PumpkinSpice) is the most annotated start and is called in 12/19 non-draft genes in Pham 57525. This agrees with the start site Glimmer called (78501). /note=SD (Final) Score: -4.364. This is the best Final score in PECAAN. This start site also gives the LORF. /note=Coding Potential: There is coding potential in the Self-trained GeneMark but not in the Host-Trained GeneMark. The coding potential is on the forward strand, indicating that this is a forward gene, and it covers all of the ORF. /note=Gap/overlap: 141 bp gap. /note=Location call: Evidence suggests that this is the correct start site for this gene (site 78501). The coding potential covers all of the ORF with this start site, and the Final score and Z-score are the best in PECAAN. It is also the LORF. According to Starterator, this is also the most annotated start site. /note=Function call: There is no evidence that points towards a possible function.
NCBI and PhagesDB BLASTp results all considered it to be a “hypothetical protein”. There were no CDD hits, indicating that there are no known conserved domains that match this gene. Although HHpred hits had high probabilities (> 80) and decent e-values (< 1e-2), their %Conserved was low (< 40%). Therefore, this gene should be labeled “no known function”. /note=Transmembrane domains: TMHMM predicts no transmembrane domains, so this gene does not qualify as a membrane protein. /note=Secondary Annotator Name: TAHERI, ARMIN /note=Secondary Annotator QC: I agree with this location call. Just a small typo: You say that the coding potential "covers all of the start site" but I think it should be "covers all of the ORF" since the start site is one position and you can't really cover it. CDS 78649 - 78837 /gene="119" /product="gp119" /function="hypothetical protein" /locus tag="PumpkinSpice_119" /note=Original Glimmer call @bp 78649 has strength 13.83; Genemark calls start at 78649 /note=SSC: 78649-78837 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_117 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 7.17406E-35 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.405, -4.653873557426368, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_117 [Streptomyces phage IchabodCrane] ],,QFP97426,100.0,7.17406E-35 SIF-HHPRED: SIF-Syn: NKF, upstream gene is from Pham 10563, downstream gene is from Pham 63310, and they both have unknown functions just like in phage Bordeaux. /note=Primary Annotator Name: Rivera, Bryanna /note=Auto-annotation start source: Glimmer and GeneMark, which both call the start site at 78649. /note=Phamerator: As of 04/21/21, the pham number is 17826. This gene is conserved in subcluster BE2; phages “Bordeaux” and “Birchlyn” were used for comparison. No function called.
/note=Starterator: Start site 8 in Starterator was manually annotated in 12 of 12 non-draft genes in this pham, and this correlates with the start site 78649 bp that was called by Glimmer and GeneMark. Site 8 was also the most annotated start site; it was found in 17 of 17 genes in the pham. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. There is strong coding potential in the first forward frame and very minimal atypical coding potential in the reverse frames. Coding potential is displayed only in the Self-Trained GeneMark. /note=SD (Final) Score: The final score is -4.654, which is not the best score on PECAAN; it is the second-best final score, with the second-highest z-score. It has a good start codon of ATG. /note=Gap/overlap: There is an 8 bp overlap, which is small, acceptable, and conserved throughout other genes. This is not the longest reasonable ORF for this gene call. /note=Location call: Based on the evidence gathered, this is a real gene with a start site of 78649. This gene has good coding potential and good final and z-scores. Although these scores were not the best, the overlap and the start codon, along with all the other data collected, help solidify that this is the correct start site call. As of 04/23, using the data from Phamerator, Starterator, and all the evidence gathered, I am firmly convinced this is the accurate start site. /note=Function call: No known function. There were no CDD hits, and the hits in HHpred did not have values that met the criteria; the lowest e-value was 0.039. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Zuelch, Avery /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator due to all of the above evidence.
CDS 78834 - 79226 /gene="120" /product="gp120" /function="membrane protein" /locus tag="PumpkinSpice_120" /note=Original Glimmer call @bp 78834 has strength 8.23; Genemark calls start at 78834 /note=SSC: 78834-79226 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Streptomyces phage MulchMansion]],,NCBI, q1:s1 100.0% 3.28532E-75 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.214, -2.9617282384803714, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Streptomyces phage MulchMansion]],,QNO12531,93.0769,3.28532E-75 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is Pham 17826, downstream gene is Pham 63732, just like in phage Mindflayer. /note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: GeneMark and Glimmer called the gene start site at 78834. /note=Phamerator: Pham 10563. Conserved in related phages, including TomSawyer, MindFlayer, and IchabodCrane. No function called. Date: 4/23/21. /note=Starterator: Site number 18, position 78834 in PumpkinSpice, is the most annotated start (called in 34 of 50 non-draft genes in the pham). /note=Coding Potential: High typical and atypical forward coding potential on GeneMarkS that covers the entire ORF. The atypical coding potential extends slightly further upstream than the start site. /note=SD (Final) Score: For the auto-annotated start site, the final score is -2.962. The z-score is 3.214. These are the best possible scores. /note=Gap/overlap: The 4 bp overlap (-4 bp gap) with the upstream gene could indicate an operon with the previous gene. The overlap is conserved in other final genomes, such as phages Karimac, Starbow, and MindFlayer. /note=Location call: The gene is real, with a start site of 78,834. Starterator agrees with GeneMark and Glimmer. /note=Function call: Membrane protein. All strong PhagesDB BLAST hits have unknown function. All significant CDD and HHpred hits have unknown function. There are two strong NCBI BLAST hits with the function “membrane protein” and e-values as low as 3.28532e-75.
TMHMM and TOPCONS agree on one transmembrane domain. /note=Transmembrane domains: TMHMM predicts one transmembrane domain. TOPCONS predicts the same domain as well as three others. /note=Secondary Annotator Name: Rivera, Bryanna /note=Secondary Annotator QC: 1) Starterator: State whether the 34/49 genes that called the start site were drafts or non-drafts. 2) Coding Potential: Add a little more information about the typical coding potential. You stated that there was coding potential, but was it good coding potential? Did the start and stop sites cover a lot of the coding potential, part of it, or very little? Other than that, based on the evidence/data gathered, I agree with the primary annotator that this is a real gene and the auto-annotated start site is the correct call. CDS 79331 - 79474 /gene="121" /product="gp121" /function="hypothetical protein" /locus tag="PumpkinSpice_121" /note=SSC: 79331-79474 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_BIRCHLYN_121 [Streptomyces phage Birchlyn]],,NCBI, q1:s1 100.0% 5.26779E-24 GAP: 104 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.994, -4.746946755792842, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_121 [Streptomyces phage Birchlyn]],,QDF17402,95.7447,5.26779E-24 SIF-HHPRED: SIF-Syn: /note=Coding potential for this gene is very low and is present only in self-trained GeneMark. The gene was added based on the gap size and because several other phages in BE2 also call it.
CDS 79562 - 79816 /gene="122" /product="gp122" /function="hypothetical protein" /locus tag="PumpkinSpice_122" /note=Original Glimmer call @bp 79562 has strength 5.81; Genemark calls start at 79562 /note=SSC: 79562-79816 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp166 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 3.15184E-55 GAP: 87 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.099, -4.52735728736575, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp166 [Streptomyces phage Karimac] ],,YP_009840286,100.0,3.15184E-55 SIF-HHPRED: SIF-Syn: This gene is NKF in pham number 78387, the upstream gene is NKF with pham number 10563, and the downstream gene is NKF with pham number 12078, just like in phages Starbow and MindFlayer. /note=Primary Annotator Name: Garcia Vedrenne, Ana /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree that the start site is 79562. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is only found in Self-Trained GeneMark. /note=SD (Final) Score: The final score is the best option at -4.527, and the z-score is the second highest at 2.099. /note=Gap/overlap: The gap to the upstream gene (gene 121) is 87 bp. Measured from gene 120, the gap is 335 bp, which is conserved in IchabodCrane and Bordeaux. No coding potential in the gap. /note=Phamerator: Pham number 78387 on 9/17/2021. Conserved in other phages in cluster BE, such as Starbow and MindFlayer. /note=Starterator: Pham number 78387 has 63 members, 14 of which are drafts. The start number called most often in the published annotations is 12; it was called in 48 of the 49 non-draft genes in the pham, including PumpkinSpice.
Start: 12 @79562 has 48 MAs /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 79562 bp. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains were predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 79825 - 79989 /gene="123" /product="gp123" /function="hypothetical protein" /locus tag="PumpkinSpice_123" /note=Original Glimmer call @bp 79825 has strength 10.35; Genemark calls start at 79825 /note=SSC: 79825-79989 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp165 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 3.87663E-31 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.829, -3.4668782168002776, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp165 [Streptomyces phage Karimac] ],,YP_009840287,100.0,3.87663E-31 SIF-HHPRED: SIF-Syn: This gene has no known function and is part of Pham 12078. The gene upstream of it is in Pham 63762, like in Birchlyn. The gene downstream of it is in Pham 60197, like in Starbow. /note=Primary Annotator Name: Zuelch, Avery /note=Auto-annotation start source: Both Glimmer and GeneMark call a start site of 79825. /note=Phamerator: Date 4/23/21, Pham 12078; 20 phages are in the pham, 5 of them drafts. The gene is conserved in most phages of the BE cluster, such as LukeCage, Genie2, and BoomerJR. /note=Starterator: Start 4 @ 79825 is the most annotated start site as well as the auto-annotated start site. It is called in 15 of the 15 non-draft phages and has 15 MAs. /note=Coding Potential: No coding potential on host-trained GeneMark. High typical and atypical coding potential on self-trained GeneMark in the second reading frame. All coding potential is covered by the start site. /note=SD (Final) Score: -3.467, which is the best of the options. The Z-score is 2.829, which is also the best of the given options.
/note=Gap/overlap: Gap of 8 bp, which is acceptable. /note=Location call: Start 4 @79825 bp, based on all of the available evidence mentioned above. This start site covers all of the coding potential, has a good Z-score and Final Score, is both the most annotated and the auto-annotated start site, and has 15 MAs. /note=Function call: Function unknown, NKF. All of the significant BLAST hits with the strongest e-values and pairwise alignment scores showed an unknown function. There were no matches in the CDD search. On HHpred, all hits were very poor, with high e-values or matches to proteins of unknown function. /note=Transmembrane domains: 0 TMDs. Both TMHMM and TOPCONS showed 0 TMDs. This is consistent with the NKF function call. /note=Secondary Annotator Name: Ali Pour, Paria /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator. tRNA 80055 - 80132 /gene="124" /product="tRNA-Asp(gtc)" /locus tag="PUMPKINSPICE_124" /note=tRNA-Asp(gtc) CDS 80163 - 80354 /gene="125" /product="gp125" /function="hypothetical protein" /locus tag="PumpkinSpice_125" /note=Original Glimmer call @bp 80163 has strength 9.9; Genemark calls start at 80163 /note=SSC: 80163-80354 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.14403E-36 GAP: 173 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.155, -4.697746622201849, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970965,100.0,1.14403E-36 SIF-HHPRED: SIF-Syn: This gene is NKF with pham number 60197, the upstream gene is NKF with pham number 12078, and the downstream gene is NKF with pham number 56377, just like in phages Starbow and MindFlayer. /note=Primary Annotator Name: Beaudin, Catherine /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 80,163. /note=Phamerator: Pham 60197 as of 04/23/2021.
The gene is conserved in phages Battuta, Bordeaux, and MindFlayer, which are all in the same subcluster as PumpkinSpice. On Phamerator, there is no function called for this gene. /note=Starterator: Start site 8 in Starterator was manually annotated in 12/16 non-draft genes in this pham. Start 8 is 80163 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential in the self-trained GeneMark. The chosen start site may appear to cut off some of the coding potential, but on further review it is likely the best choice for including as much of the coding potential as possible. /note=SD (Final) Score: -4.698. It is the second-best final score on PECAAN. The Z-score of 2.155 is the second highest. This is high enough to suggest the presence of a credible ribosome binding site. /note=Gap/overlap: The 173 bp gap is somewhat large but ultimately reasonable because the gap is conserved in other phages (Battuta, StarPlatinum), and there is no coding potential in the gap that might be a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 80,163. Glimmer, GeneMark, and Starterator all agree. /note=Function call: NKF. All PhagesDB BLAST hits with reasonable e-values (11 which is significantly higher than the range of <10e-3 required for a hit to serve as strong evidence that supports a function call. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Dines, Lily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator.
CDS 80354 - 80554 /gene="126" /product="gp126" /function="hypothetical protein" /locus tag="PumpkinSpice_126" /note=Original Glimmer call @bp 80354 has strength 14.21; Genemark calls start at 80354 /note=SSC: 80354-80554 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.07305E-38 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.241, -4.2474293116543835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675450,100.0,1.07305E-38 SIF-HHPRED: RPLX; RIBOSOME, PROTEIN SYNTHESIS; 6.6A {METHANOTHERMOBACTER THERMAUTOTROPHICUS},,,4ADX_G,95.4545,94.1 SIF-Syn: The upstream and downstream genes have no known function, like in phage Battuta. /note=Primary Annotator Name: Bhatnagar, Keshav /note=Auto-annotation start source: GeneMark and Glimmer both agree on start site 80354. /note=Phamerator: As of April 22nd, the gene is in pham 56377. The gene is conserved among members of the same subcluster as my phage, such as Enygma_131, IchabodCrane_123, JimJam_132, and Karimac_127. Neither Phamerator nor the phams database listed a function for this gene. /note=Starterator: There is a reasonable start site conserved among members of the same subcluster. The conserved start is start site number 1 in the pham, which is position 80,354 in my phage. There are 20 members in this pham; 15/15 of the final genes called this start site, while none of the draft genomes called it. /note=Coding Potential: Strong coding potential on the forward strand in self-trained GeneMark. The start site covers all coding potential. /note=SD (Final) Score: The final score is -4.247, which is the best score. /note=Gap/overlap: There is a 1 bp overlap, which is reasonable and suggests that the gene is likely part of an operon. The alternate candidate has a gene length that is too small to be considered. /note=Location call: The gene is likely real, with a start site at 80,354.
Glimmer and GeneMark agree on the start site. It is conserved in Phamerator and Starterator, has strong RBS and Z-scores and a 1 bp overlap, and covers all the coding potential. /note=Function call: No known function. There was no hit for CDD. NCBI BLAST listed a hypothetical protein with an identity >98%, coverage of 100%, and an e-value of <2.6e-38. PhagesDB BLAST listed a hypothetical protein with an e-value of 4e-32. HHpred had one hit with a probability of 94% and coverage of 95%, which listed the function as a ribosomal protein. /note=Transmembrane domains: No hits on TMHMM or TOPCONS. These results are consistent with CDD, NCBI BLAST, and PhagesDB BLAST, which predicted an unknown function for this ORF. HHpred listed it as a ribosomal protein with decent evidence, but no other source supported this. /note=Secondary Annotator Name: Castillo, Salvador /note=Secondary Annotator QC: I agree with the call. tRNA 80575 - 80646 /gene="127" /product="tRNA-Asp(gtc)" /locus tag="PUMPKINSPICE_127" /note=tRNA-Asp(gtc) CDS 80865 - 81200 /gene="128" /product="gp128" /function="hypothetical protein" /locus tag="PumpkinSpice_128" /note=Original Glimmer call @bp 80865 has strength 15.09; Genemark calls start at 80865 /note=SSC: 80865-81200 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 8.70442E-72 GAP: 310 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -2.5823297389851954, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970964,100.0,8.70442E-72 SIF-HHPRED: SIF-Syn: NKF pham 14521, upstream gene is pham 56377, downstream gene is Ro-like RNA binding protein, just like in phage Bordeaux. /note=Primary Annotator Name: Billings, Sophie /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 80865. /note=Phamerator: Pham 14521. Date 4/22/21. It is conserved; all other phages in this pham are in the BE cluster (Battuta, Karimac, Peebs, StarPlatinum).
/note=Starterator: Start site 9 in Starterator was manually annotated in 33/33 non-draft genes in this pham. Start 9 is 80865 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is not found in GeneMark Host, but is found in GeneMark Self. This start covers all of the coding potential. /note=SD (Final) Score: -2.582. It is the best final score, and it has the best z-score of 2.999. /note=Gap/overlap: Gap: 310 bp. Large, but ultimately reasonable because the gap is conserved in other phages (Karimac, Yaboi), there is no coding potential in the gap that might be a new gene, and a tRNA resides in the gap. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 80865. /note=Function call: All of the PhagesDB BLAST hits have the function listed as "unknown function" with E-values less than 1e-7, and all NCBI BLAST hits also have the function as "unknown function" with E-values less than 1e-7. CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Beaudin, Catherine /note=Secondary Annotator QC: I agree with the location for this gene at start site 80,865. I know this was already addressed by selecting “Yes” in the GM Coding Capacity menu, but I would still make a brief note in the “Coding Potential” section of the PECAAN notes on whether the suggested start site includes all of the coding potential.
CDS 81237 - 82865 /gene="129" /product="gp129" /function="Ro-like RNA binding protein" /locus tag="PumpkinSpice_129" /note=Original Glimmer call @bp 81237 has strength 13.31; Genemark calls start at 81237 /note=SSC: 81237-82865 CP: yes SCS: both ST: SS BLAST-Start: [Ro-like RNA binding protein [Streptomyces phage Starbow]],,NCBI, q1:s1 100.0% 0.0 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.203, -2.6835611257758942, yes F: Ro-like RNA binding protein SIF-BLAST: ,,[Ro-like RNA binding protein [Streptomyces phage Starbow]],,AXH66625,99.8155,0.0 SIF-HHPRED: Ro sixty-related protein, RSR; alpha helical repeats, von Willebrand Factor A domain, beta-sheet, RNA BINDING PROTEIN; 1.89A {Deinococcus radiodurans} SCOP: c.62.1.0,,,2NVO_A,97.417,100.0 SIF-Syn: [Ro-like RNA binding protein, upstream genes belong to pham 14521 (NKF – PumpkinSpice; 05/14/2021); downstream genes belong to phams 17217 (NKF – PumpkinSpice; 05/14/2021), matching phages MindFlayer/Wipeout.] /note=Primary Annotator Name: Bruns, James /note=Auto-annotation start source: Both Glimmer and GeneMark call a start at 81237. /note=Phamerator: As of 04/22/2021, the gene belongs to pham 8900 and is present in non-draft phages Wipeout and TomSawyer, which also belong to cluster BE. /note=Starterator: Start 16, which is called in 29 of 191 non-draft genes, corresponds to 81237 bp in PumpkinSpice. This is not the most annotated start; however, the most annotated start is not present in the PumpkinSpice gene (stop 82865, F). Start 16 is called in 97.3% of the genes in which it is present, and it is present in the PumpkinSpice gene (stop 82865, F). /note=Coding Potential: Both typical and atypical coding potential are present in the self-trained GeneMark analysis. The gene length predicted by the self-trained GeneMark analysis covers the whole gene, including the predicted start site. The gene is in the forward direction. /note=SD (Final) Score: Score of -2.684, which is the best score present on PECAAN.
/note=Gap/overlap: There is a 36 bp gap. The chosen start gives the longest reasonable ORF for this gene call and covers all coding potential. Comparisons with phages Bordeaux and IchabodCrane show that the gap is conserved. /note=Location call: Based on the data listed above, this is highly likely to be a true gene with a start site of 81237. /note=Function call: Ro-like RNA binding protein. The top two PhagesDB BLAST hits have the function listed as Ro-like RNA binding protein (E-value 0; Score 1071). Top NCBI BLAST hits also list the function as Ro-like RNA binding protein, with >98% identity and alignment. CDD and HHpred Pfam hits recorded TROVE domain hits with 62.5461% coverage, backed up by a PDB hit with an E-value of 0 and 97.417% coverage. /note=Transmembrane domains: Only the TOPCONS SPOCTOPUS analysis shows one TMD; however, this is not sufficient on its own to call a gene a membrane protein without any TMHMM hits. Therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Taheri, Armin /note=Secondary Annotator QC: I agree with this location call. CDS 82909 - 83049 /gene="130" /product="gp130" /function="membrane protein" /locus tag="PumpkinSpice_130" /note=Genemark calls start at 82909 /note=SSC: 82909-83049 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_130 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.50303E-22 GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.356, -7.068150495905546, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_130 [Streptomyces phage Starbow] ],,AXH66626,100.0,1.50303E-22 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is Ro-like RNA binding protein, downstream is in pham 7474, just like in phages Battuta and MindFlayer. /note=Primary Annotator Name: Canio, Noah /note=Auto-annotation start source: Glimmer does not call this gene, but GeneMark calls the start site at 82909 bp. /note=Phamerator: Pham 17217. Date: 04/21/2021.
This Pham is conserved in other BE cluster phages (Battuta and TomSawyer used for comparison). There is no function defined for this Pham across BE cluster phages in the phams database. /note=Starterator: Start site 2 is conserved in 28/29 non-draft phage annotations. The site corresponds to 82909 bp in PumpkinSpice. /note=Coding Potential: Coding potential in the ORF is on the forward strand, and it is all covered by the chosen start site. Coding potential is good on Self-trained GeneMark but not sufficient on Host-trained GeneMark. /note=SD (Final) Score: -7.068. This is not the best SD score. However, it is reasonable because it is similar to the scores of other called start sites. /note=Gap/overlap: 43 bp gap. This is a reasonable size for a gap in the genome. The gap is conserved in different phages, including Genie2 and Karimac. /note=Location call: Based on the information above, this is a real gene. The predicted start site is 82909. Starterator agrees with GeneMark. /note=Function call: No Known Function/Unknown Function. The top 4 PhagesDB BLAST hits have the function "function unknown" (E-value = 2e-17). The top 4 NCBI BLASTp hits suggest the function is unknown/hypothetical protein, with high query coverage (100%), high percent identity (>82%), and low E-values (between 2e-22 and 2e-18). There are no relevant hits from CDD, while HHpred's best hit is insufficient, with 89.7% probability, 34.7826% coverage, and an e-value of 0.95. Despite the good probability value, the other parameter values are insufficient evidence, and the hit's function is described for eukaryotes in the primary literature. So, there is no known function based on this information. /note=Transmembrane domains: Both TMHMM and TOPCONS predict one TMD. Therefore, this gene can be identified as a "transmembrane protein." /note=Secondary Annotator Name: Ali Pour, Paria /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator.
CDS 83081 - 83266 /gene="131" /product="gp131" /function="hypothetical protein" /locus tag="PumpkinSpice_131" /note=SSC: 83081-83266 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_BIRCHLYN_131 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 3.14762E-37 GAP: 31 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.93, -7.560645402102623, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_131 [Streptomyces phage Birchlyn] ],,QDF17403,100.0,3.14762E-37 SIF-HHPRED: SIF-Syn: /note=CP present in GM-self, mostly toward the first half of the gene. Found in several other phages. There would be a large gap if this gene did not exist. Most other genes in the related pham call a start that produces a gene of length ~186 bp. CDS 83263 - 83475 /gene="132" /product="gp132" /function="hypothetical protein" /locus tag="PumpkinSpice_132" /note=Original Glimmer call @bp 83263 has strength 9.15; Genemark calls start at 83263 /note=SSC: 83263-83475 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_131 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 3.23885E-43 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.388, -3.9401302979756174, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_131 [Streptomyces phage Starbow] ],,AXH66627,100.0,3.23885E-43 SIF-HHPRED: SIF-Syn: NKF, upstream gene needs to be added for PumpkinSpice, downstream is NKF in pham 57098. /note=AGV notes: hydrolase function had no support; notes have wrong data, but it is a real gene. /note=Primary Annotator Name: Cervantes, Richard /note=Auto-annotation start source: Glimmer and GeneMark both called the start site at 83263. /note=Phamerator: Pham 3538 (date 4/24/2021). Conserved in the following phages: BoomerJR (BE) and Karimac (BE). /note=Starterator: Start Site 2 (start site 83263) was called /note=Coding Potential: There is coding potential in the self-trained GeneMark but not in the host-trained GeneMark.
This coding potential covers the entire gene in the forward direction. /note=SD (Final) Score: The final score is -3.940 (closest to zero) and the Z-score is 2.388 (above 2); thus, I call the start site of 83263 correct. /note=Gap/overlap: There is a 683 bp gap with the previous gene, which has been brought to the professor's attention. There appears to be a gene between 120 and 119, as there was coding potential and synteny with other bacteriophages. /note=Location call: The auto-generated start site of 4424 appears correct and real, based on all of our previous notes. This gene is real. /note=Function call: There were no CDD hits. The HHpred Pfam and PDB hits were weak. Pfam had a probability of 39.4% and an e-value of 160, with a role of hydrolase. Pfam had a probability of 35.6% and an e-value of 210, with a role defined as “ribosome regulation under stress conditions”. Thus, it is best to put hydrolase as its function due to a better e-value and probability. /note=Transmembrane domains: There were no TMDs called by TMHMM or TOPCONS. /note=Secondary Annotator Name: Howe, Kathryn /note=Secondary Annotator QC: There is not enough evidence here for me to agree or disagree with the start site. Starterator notes need to include which start coordinate corresponds to start site 1. Coding potential notes need to include whether the chosen start site covers the entire coding potential. The gap section needs to include whether a gene should be added or if there is any indication that the start site could be moved to fill the large gap. Overall, more information needs to be added before I can agree or disagree with the chosen start site. CDS 83472 - 83843 /gene="133" /product="gp133" /function="hypothetical protein" /locus tag="PumpkinSpice_133" /note=Original Glimmer call @bp 83472 has strength 10.88; Genemark calls start at 83472 /note=SSC: 83472-83843 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp.
JV178] ],,NCBI, q1:s1 100.0% 1.59279E-87 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.594, -3.430952136556091, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675312,100.0,1.59279E-87 SIF-HHPRED: SIF-Syn: Pham 57098 is also found in other BE2 phages, like Birchlyn and Bordeaux. For all of these phages, pham 57098 is also adjacent to pham 7474 upstream and pham 3911 downstream. Pham designations as of 05/28/2021. /note=Primary Annotator Name: Chang, Loren /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 83472 bp. /note=Phamerator: Pham 57098 (04/23/2021). It is conserved; found in Karimac (BE2) and Star Platinum (BE2). /note=Starterator: Start site 6 in Starterator was called for 15/25 non-draft genes in this pham. Start site 6 is position 83472 in PumpkinSpice. This agrees with the auto-annotated start site predicted by Glimmer and GeneMark. /note=Coding Potential: At this ORF, GeneMark Host does not show coding potential, while GeneMark Self shows good coding potential (in a forward reading frame). All of the coding potential is covered. /note=SD (Final) Score: The final score is -3.431. This is the second highest final score on PECAAN. /note=Gap/overlap: There is an overlap of -4 bp. This overlap is not large enough to be problematic. Additionally, this overlap is conserved in phages like IchabodCrane and Karimac. This may indicate that the gene is part of an operon. /note=Location call: Given the above evidence, this appears to be a real gene, with the most probable start site at 83472 bp. /note=Function call: No known function. The top PhagesDB BLAST hits (with E-values of 1e-69) are all associated with proteins of no known function. The top NCBI BLAST hits (100% coverage, 99.2%+ identity, E-values < 1e-86) are also associated with proteins of unknown function.
There were no significant hits from CDD, and the only significant hit from HHpred (100% probability, coverage unknown because it was not found in PECAAN, E-value 6.9e-29) was Uncharacterized protein 058R, a protein of unknown function. Ultimately, the above evidence suggests that this ORF is associated with a protein of no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted the presence of any TMDs for this protein. This suggests that it is likely not a membrane protein. /note=Secondary Annotator Name: Haeri, Alliya /note=Secondary Annotator QC: I agree that the evidence available supports the start site location call of 83,472 for this gene. CDS 83818 - 84135 /gene="134" /product="gp134" /function="helix-turn-helix DNA binding domain" /locus tag="PumpkinSpice_134" /note=Original Glimmer call @bp 83926 has strength 6.31; Genemark calls start at 83926 /note=SSC: 83818-84135 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein HWB80_gp157 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 1.36059E-72 GAP: -26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.588861267572955, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[hypothetical protein HWB80_gp157 [Streptomyces phage Karimac] ],,YP_009840295,100.0,1.36059E-72 SIF-HHPRED: HNH homing endonuclease; HNH catalytic motif, Helix-turn-helix DNA binding domain, protein-DNA complex, DNA binding protein-DNA COMPLEX; HET: EDO; 2.92A {Bacillus phage SPO1} SCOP: d.4.1.3, d.285.1.1,,,1U3E_M,97.1429,98.2 SIF-Syn: /note=AF 2/10/22: /note=All genes in the pham call the first start as the start site. /note=The top HHpred hit does include an HTH domain.
CDS 84137 - 84358 /gene="135" /product="gp135" /function="hypothetical protein" /locus tag="PumpkinSpice_135" /note=Original Glimmer call @bp 84137 has strength 1.55; Genemark calls start at 84137 /note=SSC: 84137-84358 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp156 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 5.08955E-46 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.422, -5.942640154802251, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp156 [Streptomyces phage Karimac] ],,YP_009840296,100.0,5.08955E-46 SIF-HHPRED: SIF-Syn: /note=AF 2/10/22 /note=Start 2 (84137) /note=• Found in 21 of 21 (100.0%) of genes in pham /note=• Manual Annotations of this start: 14 of 15 /note=• Called 76.2% of the time when present CDS 84358 - 84750 /gene="136" /product="gp136" /function="hypothetical protein" /locus tag="PumpkinSpice_136" /note=Original Glimmer call @bp 84358 has strength 9.34; Genemark calls start at 84358 /note=SSC: 84358-84750 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 5.6594E-92 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.567, -3.5660483139923818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304096,100.0,5.6594E-92 SIF-HHPRED: SIF-Syn: Gene in pham 15833 is upstream of gene in pham 61625 (NKF) and downstream of gene in pham 8216, just like in phages Battuta, Birchlyn, Bordeaux, and BoomerJR. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 84358. /note=Phamerator: pham: 15833. Date: 04/23/2021. It is conserved; found in Battuta_136, Birchlyn_136, BoomerJR_137. /note=Starterator: Start site 8 in Starterator was manually annotated in 19/37 non-draft genes in this pham. Start 8 is 84358 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark.
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: -3.566. It is the best final score on PECAAN. /note=Gap/overlap: -1 bp. The overlap is very small, leaving no room to add a gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 84358. /note=Function call: NKF. Neither CDD nor HHpred showed any significant hits for this gene. The top two PhagesDB BLAST hits have unknown function (E-value = 7e-76), and the top 3 NCBI BLAST hits also have unknown function (hypothetical protein) with 100% coverage, 98%+ identity, and E-value <1e-90. Thus, the function of this gene is unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Taheri, Armin /note=Secondary Annotator QC: I agree with this location call. CDS 84747 - 85286 /gene="137" /product="gp137" /function="hypothetical protein" /locus tag="PumpkinSpice_137" /note=Original Glimmer call @bp 84747 has strength 11.3; Genemark calls start at 84747 /note=SSC: 84747-85286 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.77998E-130 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.01, -2.6395089437464523, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675313,100.0,3.77998E-130 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dines, Lily /note=Auto-annotation start source: Glimmer and GeneMark called the start site at 84,747. /note=Phamerator: Pham 19445 on 04/21/21. The gene is conserved among other BE phages; Battuta, Birchlyn, and Bmoc were used for comparison. No function was called.
/note=Starterator: The called start site is not the most annotated start site, but it is still well conserved (29% of start sites in the pham), so it is a reasonable start site conserved among the phamily. Start site coordinate: (13, 84747). 16 of 37 non-draft phages call start #15 (the most annotated); the called start site #13 was called in 14 of 49 phages (draft and non-draft) and was called 100% of the time when present. /note=Coding Potential: Coding potential in the self-trained GeneMark spans the ORF. The start site covers all of the coding potential. /note=SD (Final) Score: -2.640; this is the best score, and it is also reasonable because the start site with the next best score is not the LORF and has a lower z-score, and the chosen candidate's gap length is common among operons. /note=Gap/overlap: -4 bp. This is a common overlap length among genes within operons, meaning this gene is likely part of an operon. This overlap is conserved in the pham maps and Starterator. /note=Location call: The evidence suggests this is the real start site. This gene is real. The most likely start site candidate is 84747. This start site has the best final score and z-score, a small gene overlap, and synteny, and it covers all coding potential. /note=The start site is fairly well conserved in Starterator, although it is not the most called start site. A -4 bp overlap suggests this gene is part of an operon. /note=Function call: No program had a significant function call. Therefore, there is no known function. /note=Transmembrane domains: None. /note=Secondary Annotator Name: Quijada, Britney /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I would suggest mentioning that this is the true LORF, and the Z-score, as strength of evidence (you don't have to mention that the next best score is not the LORF). Also, select an option for the Starterator box! You must also mention which phages (at least 2) have the gap conserved!
CDS 85295 - 85480 /gene="138" /product="gp138" /function="hypothetical protein" /locus tag="PumpkinSpice_138" /note=Original Glimmer call @bp 85295 has strength 12.86; Genemark calls start at 85295 /note=SSC: 85295-85480 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB78_gp150 [Streptomyces phage Wofford] ],,NCBI, q1:s1 100.0% 3.152E-34 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.157, -2.7806870294376242, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB78_gp150 [Streptomyces phage Wofford] ],,YP_009839812,100.0,3.152E-34 SIF-HHPRED: SIF-Syn: Gene 126 belongs to Pham 61625, the upstream gene belongs to Pham 15833, and the downstream gene belongs to Pham 12392. All three genes have no known function. This pattern of synteny is also observed in other BE2 phages such as TomSawyer and Genie2. /note=Primary Annotator Name: Do, Vivian /note=Auto-annotation start source: Both Glimmer and GeneMark called a start site of 85295. /note=Phamerator: As of 04.21.21, this gene belongs to Pham 12392 and is highly conserved among other genes with a length of 186 bp. /note=Starterator: As of 04.21.21, the most manually annotated start site is start site 4, with 33/33 non-draft genes calling it. This also corresponds with the auto-annotated start site of 85295 bp. /note=Coding Potential: No coding potential on Host-trained GeneMark. Good coding potential on the 2nd ORF of the forward strand on the Self-trained GeneMark. /note=SD (Final) Score: -2.781; this is the best SD score out of all the potential start sites. /note=Gap/overlap: The 8 bp gap is well conserved and can also be seen in LukeCage and Starbow; it helps maintain a densely packed genome. /note=Location call: We have maintained a start site of 85295 because it fully encompasses the coding potential, has an ATG start codon, which has high probability, and has the best z-score and RBS final score values.
It is the only called start site of decent length according to the Guiding Principles, and this start site is confirmed by Starterator. /note=Function call: Function unknown. The top hits on both PhagesDB and NCBI BLAST had decent but not great alignment, and all were designated function unknown. Neither HHpred nor CDD produced any significant hits, which helps confirm that at this time the function of this gene is unknown. /note=Transmembrane domains: There were no transmembrane domains predicted by either TMHMM or TOPCONS, meaning it does not meet the criteria to be a membrane protein. This gives no further insight into this gene of no known function. /note=Secondary Annotator Name: Dines, Lily /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 85535 - 85807 /gene="139" /product="gp139" /function="hypothetical protein" /locus tag="PumpkinSpice_139" /note=Original Glimmer call @bp 85535 has strength 11.79; Genemark calls start at 85535 /note=SSC: 85535-85807 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp152 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.55368E-57 GAP: 54 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.713, -3.708850054194189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp152 [Streptomyces phage Karimac] ],,YP_009840300,100.0,2.55368E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Haeri, Alliya /note=Auto-annotation start source: Both Glimmer and GeneMark call this gene with the start site of 85,535. /note=Phamerator: pham: 16408. 4/23/2021. This gene is conserved; found in phages IchabodCrane and Karimac. /note=Starterator: Start site 4 was manually annotated in 40/49 non-draft genes within the pham. Start site 4 is 85535 in PumpkinSpice. This evidence agrees with the start site called by both Glimmer and GeneMark.
/note=Coding Potential: Coding potential in the open reading frame of this gene is only in the forward direction, indicating that this is a forward gene. The start site covers all coding potential. Coding potential is found only by the Self-Trained GeneMark. /note=SD (Final) Score: The final score is -3.709, which is the best score. The accompanying z-score is also the best. /note=Gap/overlap: The gap between genes is only 8 base pairs, which is very small. This gap is conserved in other phages within subcluster BE2, such as IchabodCrane and Karimac. /note=Location call: Based on the above evidence, it is likely that this is a real gene with the likely start site of 85,535. /note=Function call: Function unknown. All the top PhagesDB BLAST hits (E-value = 3e-48) have function unknown listed as their function. All the top NCBI BLAST hits (100% coverage, E-value < 1e-52, 92%+ identity) also have hypothetical protein listed as their function. Additionally, CDD had no significant hits, and all HHpred hits had e-values greater than 4, which are not significant. /note=Transmembrane domains: This gene had no TMD hits with either TMHMM or TOPCONS, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Do, Vivian /note=Secondary Annotator QC: I agree with your location call; I would just add a little more detail to it. I would clarify that the -3.709 RBS score is the best one and that it is accompanied by the best z-score, which is higher than 2. I would add which open reading frame you are referring to when talking about coding potential (1st, 2nd, 3rd). I would add those details into the location call, as well as that the ATG start codon has a high probability and that other start sites leave a large gap.
tRNA 85882 - 85958 /gene="140" /product="tRNA-Glu(ctc)" /locus tag="PUMPKINSPICE_140" /note=tRNA-Glu(ctc) CDS 85959 - 86207 /gene="141" /product="gp141" /function="membrane protein" /locus tag="PumpkinSpice_141" /note=Original Glimmer call @bp 86052 has strength 1.5 /note=SSC: 85959-86207 CP: yes SCS: glimmer-cs ST: SS BLAST-Start: [hypothetical protein HWB80_gp151 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 5.58543E-52 GAP: 151 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.445, -6.659727073547115, no F: membrane protein SIF-BLAST: ,,[hypothetical protein HWB80_gp151 [Streptomyces phage Karimac] ],,YP_009840301,100.0,5.58543E-52 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is Pham 16408, downstream gene is HNH endonuclease, just like in phage Wipeout. /note=Primary Annotator Name: Howe, Kathryn /note=Auto-annotation start source: Only Glimmer called this gene, with a start site at 86052. GeneMark did not call the gene. /note=Phamerator: As of April 22, 2021, this gene was part of Pham 7633. This gene is conserved in other members of cluster BE, such as Battuta and IchabodCrane. Neither Phamerator nor other pham databases listed a function for this gene. /note=Starterator: The most conserved start site for this gene was start site number 3, which corresponds to start site 85959 in the PumpkinSpice phage. Start site #3 was called in 14 of 27 non-draft phage annotations. /note=Coding Potential: There is a little alternate coding potential in the self-trained GeneMark starting at the chosen start site. The start site of 85959 covers all of the coding potential. /note=SD (Final) Score: The start site of 85959 has an SD final score of -6.660, which is not the best; however, it is still reasonable. The z-score of 1.445 is also not the best but is still reasonable.
/note=Gap/overlap: Using the start site of 85959, the gap with the upstream gene is 151 bp; however, synteny does not indicate that a gene should be added, and the start site cannot be changed to fill this gap, so the gap is reasonable. This gap is also conserved in other BE2 phages such as IchabodCrane and MindFlayer. /note=Location call: Using the information gathered so far, it is safe to call this a real gene. The auto-annotated start site does not cover all of the coding potential. Start site 85959 covers all of the coding potential and is shown in Starterator to be conserved among similar phages. This start site also gives the longest open reading frame. Using this information, the most probable start site is at bp coordinate 85959. /note=Function call: Based on the evidence, this gene encodes a transmembrane protein. Although PhagesDB, NCBI, CDD, and HHpred had insignificant hits or hits of unknown function, TMHMM and TOPCONS each predicted one TMD, which strongly suggests that this is a transmembrane protein. /note=Transmembrane domains: TMHMM and TOPCONS each predicted one TMD, which strongly suggests that this gene encodes a transmembrane protein. /note=Secondary Annotator Name: Beaudin, Catherine /note=Secondary Annotator QC: I agree with the location call for this gene at the start site 85,959. I noticed that you forgot to select “no” in the All GM Coding Capacity menu for the auto-annotated start site 86,052, as well as to select “yes” for the manually called start site 85,959 including all of the coding potential in the GM Coding Capacity menu. Also, you should select the start site 85,959 under the “Gene Candidates” of PECAAN. Finally, I would include the Z-score for start site 85,959 in the “SD (Final) Score” section of your PECAAN notes.
tRNA 86212 - 86287 /gene="142" /product="tRNA-Glu(ttc)" /locus tag="PUMPKINSPICE_142" /note=tRNA-Glu(ttc) tRNA 86355 - 86428 /gene="143" /product="tRNA-Val(tac)" /locus tag="PUMPKINSPICE_143" /note=tRNA-Val(tac) tRNA 86587 - 86661 /gene="144" /product="tRNA-Leu(tag)" /locus tag="PUMPKINSPICE_144" /note=tRNA-Leu(tag) tRNA 86673 - 86754 /gene="145" /product="tRNA-Leu(tag)" /locus tag="PUMPKINSPICE_145" /note=tRNA-Leu(tag) tRNA 87068 - 87140 /gene="146" /product="tRNA-Phe(gaa)" /locus tag="PUMPKINSPICE_146" /note=tRNA-Phe(gaa) CDS 87194 - 87697 /gene="147" /product="gp147" /function="HNH endonuclease" /locus tag="PumpkinSpice_147" /note=Original Glimmer call @bp 87194 has strength 5.83; Genemark calls start at 87449 /note=SSC: 87194-87697 CP: yes SCS: both-gl ST: SS BLAST-Start: [HNH endonuclease [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 3.62141E-122 GAP: 986 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.213, -6.968240655779347, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Streptomyces phage Wipeout] ],,QGH74374,100.0,3.62141E-122 SIF-HHPRED: restriction endonuclease PacI; HNH restriction endonuclease, beta-beta-alpha-metal active site, 8 base-pair rare cutter, HYDROLASE-DNA complex; HET: SO4; 1.92A {Pseudomonas alcaligenes},,,3M7K_A,61.6766,97.9 SIF-Syn: /note=Primary Annotator Name: Hugo, Cristelle /note=Auto-annotation start source: Glimmer 87194; GeneMark 87449 /note=Phamerator: 4/24 Pham: 2205. It is conserved in 3/21 of the BE2 phages, based on comparison with TomSawyer and Wipeout. It is an HNH endonuclease. /note=Starterator: Start site 1 was manually annotated in 2/3 non-draft genes in this pham. It is 87194 in PumpkinSpice. It agrees with the site predicted by Glimmer. This was the only site that included the entire coding potential. The gene length with this start site also matched that of TomSawyer and Wipeout. /note=Coding Potential: Variable; on the forward strand, but not consistent throughout the ORF. /note=SD (Final) Score: -6.968.
Although this is not the best final score, this start site is the only one that captures all the coding potential. /note=Gap/overlap: 986 bp. This gap is large because the gene is flanked by tRNAs on both sides. There is low alternate coding potential within a break between tRNAs. Theoretically, a gene of ~314 bp could fit. Wipeout does not have a gene prior to this pham. TomSawyer does, but that gene shows synteny to a PumpkinSpice gene that is in a completely different location. /note=Location call: This is likely a real gene, and the most likely start site is at 87194. /note=Function call: HNH endonuclease. The previous module suggested the function is an HNH endonuclease. Multiple hits on PhagesDB were genes coding for HNH endonucleases. Many of the top matches on NCBI also showed this function. In HHpred, 1/3 of the top hits had this function listed. One of the hits with lower coverage (but near 40%) also had this function. The fact that numerous similar phage genes have this function is significant evidence. /note=Transmembrane domains: 0; no predictions. This is consistent with an HNH endonuclease, which does not interact with the cellular membrane. /note=Secondary Annotator Name: Quijada, Britney /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. However, I would suggest mentioning that this is the true LORF and the Z-score. The date of analysis should also include the year. For the gap information, was coding potential found? If not, you should mention which phages (at least 2) have this gap conserved.
tRNA 87683 - 87754 /gene="148" /product="tRNA-Gly(gcc)" /locus tag="PUMPKINSPICE_148" /note=tRNA-Gly(gcc) tRNA 87759 - 87832 /gene="149" /product="tRNA-Arg(tct)" /locus tag="PUMPKINSPICE_149" /note=tRNA-Arg(tct) CDS 87852 - 88037 /gene="150" /product="gp150" /function="hypothetical protein" /locus tag="PumpkinSpice_150" /note=Original Glimmer call @bp 87852 has strength 11.45; Genemark calls start at 87852 /note=SSC: 87852-88037 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_145 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 2.39949E-36 GAP: 154 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -4.801303705573801, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_145 [Streptomyces phage IchabodCrane] ],,QFP97444,100.0,2.39949E-36 SIF-HHPRED: SIF-Syn: NKF (pham 18877) has well-conserved synteny with phage Genie2's gene 149 (pham 18877). Both are located at the same place in the genome. The downstream gene 131 (pham 11924) also shows synteny with Genie2's genome. There is an upstream gene (pham 2205) in PumpkinSpice that doesn't appear in any final phage genomes (Genie2, MindFlayer, etc.). However, the gap upstream of that gene is conserved in most of those phage genomes. /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 87852. /note=Phamerator: pham number 18877; this analysis was run 04/16/21. It is conserved, found in Battuta_149 (BE) and Karimac_149 (BE). /note=Starterator: The start number called most often in the published annotations is 4; it was called in 15 of the 33 non-draft genes in the pham. Start site 4 is 87852 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene.
The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential, which appears on GeneMark Self only. GeneMark Host doesn't show any coding potential. /note=SD (Final) Score: -4.801. It is the best final score on PECAAN. /note=Gap/overlap: The gap with the upstream gene is a little large at 154 bp. However, this gene is conserved in several other phages, and the gap is seen in those phages as well, such as Genie2 and MindFlayer. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 87852 bp. Starterator agrees with the start site that was predicted by Glimmer and GeneMark. /note=Function call: NKF. PhagesDB hits show that this gene has no known function, with e-value <10e-30. NCBI hits suggest only hypothetical functions for this gene (90%+ identity, 100% coverage, e < 10e-35). CDD doesn't provide any information about this protein. HHpred hits provide information mostly about mammalian proteins; there are some bacterial hits for this gene, but most of them have very high e-values, greater than 10e-3. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 88034 - 88186 /gene="151" /product="gp151" /function="hypothetical protein" /locus tag="PumpkinSpice_151" /note=Original Glimmer call @bp 88034 has strength 6.56; Genemark calls start at 88034 /note=SSC: 88034-88186 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 9.85221E-26 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.877, -3.888626254576132, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304097,100.0,9.85221E-26 SIF-HHPRED: SIF-Syn: Good pham synteny up- and downstream from this gene, and there is also a conserved peptidase two genes downstream.
This is true in all checked phages: MindFlayer, Starbow, and TomSawyer. /note=Primary Annotator Name: Kelly, Samuel /note=Auto-annotation start source: Glimmer and GeneMark both have the start point marked at 88034. /note=Phamerator: (4/23/21) Pham 11924. This pham contains only genes from clusters BE and BK. No function was listed for any phage. /note=Starterator: Start coordinate is (3, 88034). 100% (49/49) of manually annotated non-draft phages call start site #3, and this site is also highly conserved, suggesting it is likely correct. /note=Coding Potential: No coding potential at all in the Host-trained GeneMark, but fairly strong coding potential in the Self-trained GeneMark. Coding potential is in the forward orientation for this gene. The start site covers all coding potential. /note=SD (Final) Score: -3.889, which is the best of the candidates. /note=Gap/overlap: Small overlap (-4 bp) with the upstream gene, which suggests that this gene is part of an operon. /note=Location call: This is a real gene. Considering the evidence, it seems reasonable to place the start site at 88034. Phamerator and Starterator both strongly support this start site. /note=Function call: All of the top hits on both NCBI and PhagesDB suggest that this is a hypothetical protein/unknown function, with strong e-values of < 2e-43 and coverage of 98-100%. CDD yielded no results, and HHpred returned strong hits only for functions that are not on the SEA-PHAGES approved list. NKF. /note=Transmembrane domains: Zero hits in both TMHMM and TOPCONS. 
/note=Secondary Annotator Name: Rafael, Adriana /note=Secondary Annotator QC: at 2021-04-24 04:29:12.0 CDS 88176 - 88346 /gene="152" /product="gp152" /function="hypothetical protein" /locus tag="PumpkinSpice_152" /note=Original Glimmer call @bp 88206 has strength 5.96; Genemark calls start at 88176 /note=SSC: 88176-88346 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_147 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 1.5151E-31 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.412, -3.8723521859418497, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_147 [Streptomyces phage IchabodCrane] ],,QFP97446,100.0,1.5151E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Glimmer and GeneMark; Glimmer calls a start site at 88206 and GeneMark calls a start site at 88176. /note=Phamerator: Pham 12128 (run on 4/21/21). The two phages compared with this gene are Battuta and Birchlyn. /note=Starterator: Starterator suggests that start number 6 (start site 88206) is the best start site for our gene; however, on further analysis, start number 3 (start site 88176) seems to be the better choice because it has 17 manual annotations, which is strong evidence for this start site. /note=Coding Potential: The Host-Trained GeneMark shows coding potential on the forward strand, indicating that this gene is indeed a forward gene. /note=SD (Final) Score: The RBS final score is -3.872 and the Z-score is 2.412; given these two strong scores, I think it`s the best option for the LORF. /note=Gap/overlap: The overlap is -11 bp, which is acceptable compared to other phages, and given this overlap, a new gene should not be added. 
/note=Location call: With the given evidence, this appears to be a real gene with its start site at 88176. /note=Function call: Unknown function /note=Transmembrane domains: N/A. Although the TOPCONS graph shows some evidence of a TMD, TMHMM predicts no TMH. Under the guidelines, at least one TMHMM prediction and at least one TOPCONS prediction must be present to call a membrane protein, so this gene is not considered a membrane protein. We therefore conclude that the gene has no known function and is not a membrane protein. /note=Secondary Annotator Name: Billings, Sophie /note=Secondary Annotator QC: I do not agree with this start site. Much of the evidence, including the Final Score and Z-score, supports another start site over this one, and Starterator strongly supports start 88176, which has been manually annotated 17 times and is the most annotated start. The Phamerator notes need to include where the gene is or is not conserved. The Starterator notes need to state whether this start site is present in this gene and, if so, its location, as well as whether the Starterator evidence agrees with Glimmer and GeneMark. The overlap data is incorrect for the chosen start site. 
tRNA 88362 - 88434 /gene="153" /product="tRNA-Ala(tgc)" /locus tag="PUMPKINSPICE_153" /note=tRNA-Ala(tgc) tRNA 88600 - 88673 /gene="154" /product="tRNA-Lys(ctt)" /locus tag="PUMPKINSPICE_154" /note=tRNA-Lys(ctt) CDS 88729 - 89280 /gene="155" /product="gp155" /function="peptidase" /locus tag="PumpkinSpice_155" /note=Original Glimmer call @bp 88729 has strength 6.04; Genemark calls start at 88693 /note=SSC: 88729-89280 CP: yes SCS: both-gl ST: SS BLAST-Start: [peptidase [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.08288E-131 GAP: 382 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.143, -5.200466576807147, yes F: peptidase SIF-BLAST: ,,[peptidase [Streptomyces phage Starbow] ],,AXH66639,100.0,2.08288E-131 SIF-HHPRED: Cysteine peptidase; Cysteine peptidase, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-2, HYDROLASE; HET: LYS, MSE; 2.5A {Bacillus cereus},,,3KW0_C,95.082,99.9 SIF-Syn: peptidase, the upstream gene is pham 12128, the downstream gene is pham 1884, just like in phages Genie2 and Battuta. /note=Primary Annotator Name: Linares Cardona, Ninette /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, but they do not agree on the start site. The start site for Glimmer is at 88729 bp. The start site for GeneMark is at 88693 bp. /note=Phamerator: The pham number as of 04/21/2021 is 8172. The gene is conserved in phages Genie2, Birchlyn, and Braelyn, all in the same cluster as PumpkinSpice. /note=Starterator: Start site 18 in Starterator was manually annotated in 28/41 non-draft genes in this pham. Start 18 is 88729 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in the Self-Trained GeneMark. The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. 
/note=SD (Final) Score: The final score is the best option at -5.200 and the z-score is the highest at 2.143. /note=Gap/overlap: The gap with the upstream gene is somewhat large at 382 bp. However, this gene is conserved in several other phages, and the gap is also seen in other phages such as BoomerJR and MindFlayer. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 88729 bp. Starterator agrees with Glimmer. /note=Function call: Peptidase. Multiple PhagesDB BLAST and NCBI BLAST hits have the suggested function peptidase, with small e-values ranging from e-103 to 9e-131. HHpred had a hit for peptidase with 99% probability, 92% coverage, and an e-value of 5.4e-19. CDD provided only non-specific hits; these were associated with peptidase function but don`t show up as peptidase in the CDD box. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kim, James Joon /note=Secondary Annotator QC: All the annotations look very good and detailed. I agree with the above annotations and believe this gene is real. CDS 89258 - 89461 /gene="156" /product="gp156" /function="hypothetical protein" /locus tag="PumpkinSpice_156" /note=Original Glimmer call @bp 89258 has strength 2.86; Genemark calls start at 89258 /note=SSC: 89258-89461 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_154 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.01526E-39 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.394, -3.909460882755061, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_154 [Streptomyces phage Starbow] ],,AXH66640,100.0,1.01526E-39 SIF-HHPRED: SIF-Syn: NKF, upstream gene belongs to pham 8172 (peptidase), downstream gene belongs to pham 14758, just like in phages Bordeaux and Wipeout. 
/note=Primary Annotator Name: Liu, Lily /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 89258bp. /note=Phamerator: pham 1884, date 04/22/2021. It is conserved in phages such as Battuta, Birchlyn, and Bordeaux. /note=Starterator: start site 11 was manually annotated in 15/32 non-draft genes in this pham. Start site 11 is 89258bp, which agrees with the site called by GeneMark and Glimmer. /note=Coding Potential: The self-trained GeneMark shows both typical and alternative coding potential, but the host-trained GeneMark does not show any coding potential at all. Both the self-trained and the host-trained GeneMarks correspond to the second reading frame. All coding potential is covered. /note=SD (Final) Score: The best final score is -3.909 and the best z-score is 2.394. /note=Gap/overlap: There is a 23bp overlap, but this overlap is conserved among other phages (like Battuta and Birchlyn). /note=Location call: Based on the above evidence, this is a real gene and most likely has a start site at 89258bp. /note=Function call: There is no known function for this gene. All PhagesDB BLAST hits had an unknown function, with the top two e-values both being 4e-35. NCBI BLAST hits all listed this gene as a hypothetical protein, with the top two e-values being 1e-39 and 8e-31. CDD did not have any hits for this gene. HHpred did have some acceptable hits, but those hits corresponded to yeast proteins, so I did not include them as evidence. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Billings, Sophie /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
tRNA 89484 - 89561 /gene="157" /product="tRNA-Asn(gtt)" /locus tag="PUMPKINSPICE_157" /note=tRNA-Asn(gtt) CDS 89728 - 89901 /gene="158" /product="gp158" /function="hypothetical protein" /locus tag="PumpkinSpice_158" /note=Original Glimmer call @bp 89728 has strength 10.25; Genemark calls start at 89728 /note=SSC: 89728-89901 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.16874E-33 GAP: 266 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.546, -4.580066930373532, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304098,100.0,3.16874E-33 SIF-HHPRED: SIF-Syn: NKF, upstream NKF, downstream NKF, just like IchabodCrane. /note=Primary Annotator Name: Merlos, Andres /note=Auto-annotation start source: Glimmer and GeneMark both call start site 89728. /note=Phamerator: Pham number 14758 was run on 23 April 2021. This gene is highly conserved among other genes in its pham, such as Battuta_158 and Birchlyn_158. There is no assigned function for this gene. /note=Starterator: Start site number 1 was the most called start in the published annotations; it was called in 27 of 28 non-draft genes, so (1, 89728) would be a good start site. The pham also had 8 draft genes. /note=Coding Potential: There is coding potential in the self-trained GeneMark but not in the host-trained GeneMark. The start site covers all coding potential. /note=SD (Final) Score: -4.580 is a good Final Score, while 2.546 is a good Z-Score. /note=Gap/overlap: 266 bp is large, but this start site gives the smallest gap. A gene can be added in the gap, as seen in Birchlyn. Thus, we should consider adding a new gene. /note=Location call: The start site is called at 89728 (forward), and the stop site at 89901. I would keep the original start site, as it is the LORF, the Final score and the Z-score are sufficient, and the gap is minimized. The auto-annotated start site is the best start site to use. 
It is the most annotated start site number, and it is highly conserved in other genes. This gene is real, with a start site of 89728 and a stop site of 89901. /note=Function call: NKF. Although some PhagesDB hits match this gene, their e-values are too high and their scores too low to support a function call. CDD and HHpred returned no hits significant enough to indicate a function, and there were no conclusive hits on PhagesDB or NCBI. Because TMHMM predicts no TMDs (TOPCONS is irrelevant in this situation), the gene cannot be called a transmembrane protein, so the function call remains NKF. /note=Transmembrane domains: No TMDs from TMHMM; TOPCONS irrelevant. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. However, I believe a gene could be inserted upstream of this gene. 
The 266 bp gap is large enough to fit a gene and, when looking through Pham Maps, other phages in the same cluster (such as BoomerJR, IchabodCrane, and LukeCage) CDS 89902 - 90123 /gene="159" /product="gp159" /function="hypothetical protein" /locus tag="PumpkinSpice_159" /note=Original Glimmer call @bp 89902 has strength 11.25; Genemark calls start at 89902 /note=SSC: 89902-90123 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_155 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 7.09475E-48 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.912, -2.7653702713535186, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_155 [Streptomyces phage IchabodCrane] ],,QFP97451,100.0,7.09475E-48 SIF-HHPRED: SIF-Syn: NKF, the gene upstream (pham 14758) and the gene downstream (Pham 1435) have no known function, just like in TomSawyer and Karimac. /note=Primary Annotator Name: Namaganda, Samali /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 89902. /note=Phamerator: As of 4/23/21, the gene belongs to pham 29451. The pham has 40 members, 8 of which are drafts. The gene is conserved in other BE members Ivy, LilMartin, Battuta, and LukeCage. /note=Starterator: Start (8, 89902) was called most often in the published annotations, in 17 of the 32 non-draft genes in the pham. It is also the most manually annotated start site among subcluster BE2 phages in the same pham (13 times, compared with 2 times for start 7). /note=Coding Potential: The ORF has coding potential only in the forward direction, indicating that it is a forward gene. The ORF has both atypical and typical coding potential, in the self-trained GeneMark only. /note=SD (Final) Score: -3.566, the best RBS score among all options with an overlap/gap of less than 7 bp. /note=Gap/overlap: The gaps and overlaps are conserved in the other BE2 members. 
/note=Location call: Start site 89902 is shared by other pham members and covers most of the coding potential. /note=Function call: There is not yet enough data to hypothesize a function for this gene. None of the significant hits (e-values < 1e-7) in PhagesDB BLASTp or the NCBI hit gene products had a function. As of 5/14/21, there were no significant hits in the HHpred and CDD databases. /note=Transmembrane domains: Analysis run on 5/24/21 shows no TMHs called by TMHMM. /note=Secondary Annotator Name: Rafael, Adriana Nicole /note=Secondary Annotator QC: CDS 90192 - 90332 /gene="160" /product="gp160" /function="hypothetical protein" /locus tag="PumpkinSpice_160" /note=Original Glimmer call @bp 90192 has strength 2.36; Genemark calls start at 90192 /note=SSC: 90192-90332 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_158 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 9.84758E-24 GAP: 68 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.425, -5.874376955301545, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_158 [Streptomyces phage Starbow] ],,AXH66643,100.0,9.84758E-24 SIF-HHPRED: SIF-Syn: NKF, upstream gene is from pham 12168, downstream gene is from pham 59636, just like in phage Karimac. /note=Primary Annotator Name: Quijada, Britney /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 90192. /note=Phamerator: The pham number as of April 22, 2021 is 1435. The gene is conserved in phages Karimac, LukeCage, and Genie2, all in the same cluster as PumpkinSpice. The function of this pham is not listed. /note=Starterator: Start site 3 in Starterator was manually annotated in 14/14 non-draft genes in this pham; it corresponds to a start site of 90192 bp in PumpkinSpice. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in the Self-Trained GeneMark. 
The chosen start site includes all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -5.874 with a Z-score of 1.425. These were the only results, and they are reasonable for the LORF. /note=Gap/overlap: 68 bp gap. This upstream gap does not suggest a new gene, since it is < 120 bp. This gap is also well conserved in several other phages, such as IchabodCrane and StarPlatinum. No coding potential is observed within this gap in GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene and the most likely start site is at 90192 bp. Starterator`s graphical output and summary report are in accordance with GeneMark, Glimmer, and the manual annotations. /note=Function call: There are multiple PhagesDB BLASTp hits with the suggested function "function unknown"; the two smallest e-values are 2e-19 and 3e-17. HHpred hits do not have strong query coverage or e-values for the protein functions listed. CDD had no apparent hits. Multiple NCBI BLASTp hits are also listed as "hypothetical protein", with e-values such as 1e-23 and 3e-21, aligning with Streptomyces phage Starbow and Streptomyces phage Wofford (100% coverage, 95%+ identity). The function is therefore unknown (NKF). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Castillo, Salvador Alexander Jr. /note=Secondary Annotator QC: I do not have any issues with your notes overall. All necessary evidence required at this time is provided. I agree with the call. 
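The GAP fields in the auto-annotation lines appear to follow a simple convention: start of this CDS minus stop of the upstream CDS minus 1, with tRNA features skipped, so a negative value is an overlap. That convention is inferred from the coordinates in this file, not taken from PECAAN documentation. The minimal Python sketch below assumes it, handles forward-strand features only, and reproduces the GAP values for gp155 through gp160 (382, -23, 266, 0, and 68 bp). It also flags gaps of at least 120 bp, the threshold quoted in the gp160 notes, for a manual check for a missed gene. The Feature type, cds_gaps function, and NEW_GENE_CHECK_BP constant are hypothetical helpers.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str   # gene number as used in the /gene qualifier
    kind: str   # "CDS" or "tRNA"
    start: int  # start coordinate of a forward-strand feature
    stop: int   # stop coordinate of a forward-strand feature


# Threshold quoted in the gp160 notes: a gap under 120 bp does not suggest a new gene.
NEW_GENE_CHECK_BP = 120


def cds_gaps(features):
    """Gap (positive) or overlap (negative) between each CDS and the previous CDS.

    Assumed convention (inferred from this file): gap = start - upstream_stop - 1,
    tRNA features are skipped, and all features are on the forward strand.
    """
    gaps = {}
    prev = None
    for f in features:
        if f.kind != "CDS":
            continue  # tRNAs are not used as the upstream neighbour
        if prev is not None:
            gaps[f.name] = f.start - prev.stop - 1
        prev = f
    return gaps


# Coordinates copied from the feature lines above.
features = [
    Feature("152", "CDS", 88176, 88346),
    Feature("153", "tRNA", 88362, 88434),
    Feature("154", "tRNA", 88600, 88673),
    Feature("155", "CDS", 88729, 89280),
    Feature("156", "CDS", 89258, 89461),
    Feature("157", "tRNA", 89484, 89561),
    Feature("158", "CDS", 89728, 89901),
    Feature("159", "CDS", 89902, 90123),
    Feature("160", "CDS", 90192, 90332),
]

for name, gap in cds_gaps(features).items():
    flag = "  (check gap for a missed gene)" if gap >= NEW_GENE_CHECK_BP else ""
    print(f"gp{name}: {gap} bp{flag}")
```

The 120 bp cut-off only marks a gap for inspection; as the gp158 notes show, whether a gene belongs in a large gap still depends on coding potential and on what related phages carry there.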
tRNA 90323 - 90396 /gene="161" /product="tRNA-Asn(gtt)" /locus tag="PUMPKINSPICE_161" /note=tRNA-Asn(gtt) CDS 90425 - 91291 /gene="162" /product="gp162" /function="membrane protein, Band-7 -like" /locus tag="PumpkinSpice_162" /note=Original Glimmer call @bp 90425 has strength 14.45; Genemark calls start at 90425 /note=SSC: 90425-91291 CP: no SCS: both ST: NI BLAST-Start: [band-7-like membrane protein [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 0.0 GAP: 92 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -3.109529858048176, yes F: membrane protein, Band-7 -like SIF-BLAST: ,,[band-7-like membrane protein [Streptomyces phage Starbow] ],,AXH66644,100.0,0.0 SIF-HHPRED: Band_7 ; SPFH domain / Band 7 family,,,PF01145.26,64.2361,99.9 SIF-Syn: /note=Primary Annotator Name: RAFAEL, ADRIANA NICOLE /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 90425. /note=Phamerator: Pham 59636 as of 4/23/21. The gene length is not conserved at all; no other phages have a gene length of 867 bp. /note=Starterator: Start site 33, the most called start site, is not present in PumpkinSpice, even though it is manually annotated in 116/201 non-draft genes in Pham 59636. The start site that Starterator calls is Site (25, 90425), which agrees with the start site Glimmer and GeneMark called (90425). Site 25 is present in only 20.7% of genes in this pham. Although it is present in only a low percentage of these genes, it is called 93.5% of the time when present. /note=Coding Potential: There is coding potential in the Self-trained GeneMark but not in the Host-Trained GeneMark. The coding potential is on the forward strand, indicating that this is a forward gene, and the start site covers all of it. /note=SD (Final) Score: -3.110; this is the best final score in PECAAN. /note=Gap/overlap: 92 bp; this gap is somewhat large, but a gene could not be added to fill the gap. Note: this start site (90425) is the LORF. 
/note=Location call: This start site should be kept. It covers all of the coding potential in the GeneMark map and has the best Final Score and Z-score. It is also the LORF. However, there is a 92 bp gap, which is concerning, and according to Starterator, the most annotated start site (site 33) is not present in PumpkinSpice (see Starterator section). /note=Function call: Band-7-like membrane protein. NCBI and PhagesDB BLASTp hits all show that this gene is similar to other Band-7 membrane proteins, specifically from the SPFH domain. The CDD and HHpred hits also confirm this, with good e-values (< 10e-3), probabilities (> 80), and %Conserved values (< 80%). The presence of 2 transmembrane domains predicted by TMHMM and TOPCONS also supports that this gene codes for a membrane protein. /note=Transmembrane domains: Both TMHMM and TOPCONS predict transmembrane domains (TMHMM predicts 2 TMDs). Therefore, this gene most likely codes for a membrane protein. /note=Secondary Annotator Name: Kelly, Samuel /note=Secondary Annotator QC: I agree with the primary annotator`s location call. CDS 91473 - 91640 /gene="163" /product="gp163" /function="hypothetical protein" /locus tag="PumpkinSpice_163" /note=Original Glimmer call @bp 91473 has strength 6.18; Genemark calls start at 91473 /note=SSC: 91473-91640 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_161 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.22502E-31 GAP: 181 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.3, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_161 [Streptomyces phage Starbow] ],,AXH66645,100.0,1.22502E-31 SIF-HHPRED: SIF-Syn: NKF; the upstream gene (pham 10445) and the downstream gene (pham 63472) have no known function, just like in phage Genie2. /note=Primary Annotator Name: Rivera, Bryanna /note=Auto-annotation start source: Glimmer and GeneMark, which both called the start site at 91473. 
/note=Phamerator: As of 04/21/21 the pham number is 767. This gene is conserved in subcluster BE2, and phages TomSawyer and Wipeout were used for comparison. No function called. /note=Starterator: Start site 8 in Starterator was manually annotated in 12 of 33 non-draft genes in this pham, and it corresponds to the start site at 91473 bp called by Glimmer and GeneMark. Although this was not the most annotated site in the pham, it was the auto-annotated and most conserved site on track 10. It was found in 16 of 41 (39.0%) genes in the pham and was called 100% of the time when present. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. There was strong coding potential in the third forward frame, very minimal atypical potential in the top frames, and no coding potential in the reverse frames. Coding potential was only displayed in the Self-Trained GeneMark. /note=SD (Final) Score: Score of -2.034, the best (least negative) score present on PECAAN, with a good start codon: ATG. /note=Gap/overlap: There is a gap of 181 bp, and this is the longest reasonable ORF. This gap is conserved in other phages, as called by Glimmer and GeneMark. /note=Location call: Based on all the data listed above, this is a real gene with a start site of 91473: good coding potential, the best RBS final score and Z-score, a good start codon, and a gap that is conserved in other phages. As of 04/23, using the data collected from Phamerator and Starterator alongside all the other evidence, I am firmly convinced this is the accurate start site. /note=Function call: NKF. There were no CDD hits, and the HHpred hits did not have values that met the criteria; the lowest e-value was 8.3. The PhagesDB and NCBI hits displayed unknown function and hypothetical proteins. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. 
/note=Secondary Annotator Name: Canio, Noah /note=Secondary Annotator QC: Comments: I agree with this annotation. All of the evidence categories have been considered. CDS 91644 - 92030 /gene="164" /product="gp164" /function="hypothetical protein" /locus tag="PumpkinSpice_164" /note=Original Glimmer call @bp 91644 has strength 18.02; Genemark calls start at 91644 /note=SSC: 91644-92030 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.84815E-85 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -2.6623718267059564, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675316,100.0,1.84815E-85 SIF-HHPRED: SIF-Syn: NKF, upstream gene is Pham 767, downstream gene is Pham 1660, just like in phages Bordeaux, Starbow, and LukeCage. /note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: GeneMark and Glimmer called the start site at 91644. /note=Phamerator: Pham 10445. Conserved in related phages, including TomSawyer, MindFlayer, and IchabodCrane. No function. Date: 4/23/21. /note=Starterator: Site number 6, position 91644 in PumpkinSpice. This is the most annotated start site (manually annotated in 15 of 32 non-draft genes in the pham). /note=Coding Potential: Good typical and atypical forward coding potential on GeneMarkS, contained by the auto-annotated start site. /note=SD (Final) Score: The final score is -2.662 and the z-score is 2.999. These are the best possible scores. /note=Gap/overlap: 3 bp. This is a reasonable gap and is conserved in other phages, such as Bordeaux and MindFlayer. /note=Location call: This gene is real, with a start site of 91,644. /note=Function call: Unknown function. All strong PhagesDB and NCBI BLAST hits have unknown function. There are no significant HHpred hits and no CDD hits. /note=Transmembrane domains: No transmembrane domains predicted by TMHMM or TOPCONS. 
/note=Secondary Annotator Name: Merlos, Andres /note=Secondary Annotator QC: I agree with the location call, but include the Z-Score in the PECAAN notes. Both GeneMark and Glimmer call a start site at 91644. The gap is minimal and the Final Score is sufficient. Coding potential is present and the most annotated start site number, 6, belongs to the start site of 91644. tRNA 92050 - 92122 /gene="165" /product="tRNA-Arg(cct)" /locus tag="PUMPKINSPICE_165" /note=tRNA-Arg(cct) tRNA 92167 - 92240 /gene="166" /product="tRNA-Ala(ggc)" /locus tag="PUMPKINSPICE_166" /note=tRNA-Ala(ggc) tRNA 92489 - 92562 /gene="167" /product="tRNA-Lys(ctt)" /locus tag="PUMPKINSPICE_167" /note=tRNA-Lys(ctt) tRNA 92687 - 92761 /gene="168" /product="tRNA-Lys(ctt)" /locus tag="PUMPKINSPICE_168" /note=tRNA-Lys(ctt) CDS 92903 - 93115 /gene="169" /product="gp169" /function="hypothetical protein" /locus tag="PumpkinSpice_169" /note=Original Glimmer call @bp 92903 has strength 10.64; Genemark calls start at 92903 /note=SSC: 92903-93115 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_169 [Streptomyces phage Birchlyn]],,NCBI, q1:s1 100.0% 4.75732E-43 GAP: 872 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.055, -6.646470679458422, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_169 [Streptomyces phage Birchlyn]],,QDF17311,100.0,4.75732E-43 SIF-HHPRED: SIF-Syn: /note=AF 2/10/22 /note=Start 4 (92903): /note=• Found in 14 of 22 ( 63.6% ) of genes in pham /note=• Manual Annotations of this start: 9 of 16 /note=• Called 92.9% of time when present tRNA 93124 - 93199 /gene="170" /product="tRNA-Lys(ttt)" /locus tag="PUMPKINSPICE_170" /note=tRNA-Lys(ttt) CDS 93200 - 93346 /gene="171" /product="gp171" /function="hypothetical protein" /locus tag="PumpkinSpice_171" /note=Original Glimmer call @bp 93200 has strength 2.27; Genemark calls start at 93200 /note=SSC: 93200-93346 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp137 [Streptomyces phage 
Karimac] ],,NCBI, q1:s1 100.0% 1.94635E-25 GAP: 84 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -3.005970110175159, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp137 [Streptomyces phage Karimac] ],,YP_009840315,100.0,1.94635E-25 SIF-HHPRED: SIF-Syn: This gene is part of Pham 17289. The gene upstream of it is from Pham 1660, like in Starbow. The gene downstream of it is in Pham 56919, like in TomSawyer. /note=Primary Annotator Name: Zuelch, Avery /note=Auto-annotation start source: Both Glimmer and GeneMark call a start site of 93200. /note=Phamerator: Date: 4/23/21. Pham 17289 has 22 members, 5 of them drafts. The gene is conserved in many phages of the BE cluster, such as TomSawyer, Wipeout, and Yaboi. /note=Starterator: Start number 4 @ 93200 is the most annotated start site for phages in this pham, as well as the auto-annotated start site for this gene. It is called in 14 of the 17 non-draft phages and has 14 MAs. /note=Coding Potential: No coding potential in the Host-Trained GeneMark. High typical/atypical coding potential in the Self-trained GeneMark in the forward direction in the second reading frame. All coding potential is covered with the selected start site. /note=SD (Final) Score: -3.006, which is the best and the only one present. /note=Gap/overlap: Gap of 84 bp, which is acceptable as it is conserved in the pham. /note=Location call: Start number 4 @ 93200 bp, based on the evidence above: it covers all of the coding potential, is the most annotated start site, has 14 MAs, and was also the auto-annotated start site. /note=Function call: NKF. This correlated with all of the data collected. All of the BLAST hits with strong e-values and pairwise alignments showed a protein of unknown function. There were no CDD hits. The HHpred hits all had very good e-values, probabilities, and percent coverage; however, they were all ribosomal proteins, and phages do not have ribosomes. Thus, the decision was made for NKF. 
/note=Transmembrane domains: 0 TMDs. Both TMHMM and TOPCONS showed 0 TMDs. This is consistent with the NKF function call. /note=Secondary Annotator Name: Ali Pour, Paria /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator. tRNA 93384 - 93457 /gene="172" /product="tRNA-Arg(tcg)" /locus tag="PUMPKINSPICE_172" /note=tRNA-Arg(tcg) CDS 93660 - 94454 /gene="173" /product="gp173" /function="hypothetical protein" /locus tag="PumpkinSpice_173" /note=Original Glimmer call @bp 93660 has strength 11.69; Genemark calls start at 93660 /note=SSC: 93660-94454 CP: yes SCS: both ST: SS BLAST-Start: [VWA domain-containing protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 0.0 GAP: 313 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.912, -3.2925703904164987, yes F: hypothetical protein SIF-BLAST: ,,[VWA domain-containing protein [Streptomyces sp. JV178] ],,WP_099970913,99.6212,0.0 SIF-HHPRED: CobT_C ; Cobalamin biosynthesis protein CobT VWA domain,,,PF11775.9,50.0,98.5 SIF-Syn: Pham 56919 (NKF), upstream gene is Pham 17289 (NKF), downstream is Pham 11643 (NKF), just like in phage TomSawyer. /note=AF: may be a CobT-like protein, but not a lot of evidence. Leaving as NKF for now. /note=Primary Annotator Name: Garza, Dominic /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 93,660. /note=Phamerator: The gene is a member of Pham 56919 (recorded on April 12th, 2021). The gene appears to be relatively conserved among other BE2 subcluster phages; however, the gene seems to be conserved across several other clusters as well. I compared PumpkinSpice`s genome to the genomes of TomSawyer (BE2) and Starbow (BE2), which showed that Draft Gene 143 (stop 94454 F) was conserved among these BE2 phages. 
Phamerator indicated that the gene's function had been listed as 'CobT-like cobalamin biosynthesis protein' in a couple of other genomes previously annotated by other investigators. Although this function is on the approved SEA-PHAGES function list, it did not appear consistently across the other phages listed on PhagesDB under Pham 56919 (recorded on April 12th, 2021). /note=Starterator: The Starterator report (run on April 12th, 2021) indicated a well-conserved start site across other phages in Pham 56919: start site 24, the "Most Annotated" start site, was called in 27 out of 55 non-draft genes in Pham 56919. Draft Gene 143 (stop 94454 F) contains the "Most Annotated" start site, because start site 24 corresponds to the start site of 93,660 in PumpkinSpice's genome. /note=Coding Potential: Only the Self-Trained GeneMark predicted reasonable typical coding potential in the putative ORF defined by the start site, 93,660, and the stop site, 94,454. The chosen start site of 93,660 covers all coding potential in the Self-Trained GeneMark. /note=SD (Final) Score: The original start site's SD score is the best. The SD score is -3.293. /note=Gap/overlap: The gap with the upstream gene is considerably large for the start site of 93,660. However, the only other potential start site candidate, 93,561, which would shorten this upstream gap, does not appear to be a better option. The start site of 93,561 has a poorer SD score, implying a weaker RBS than the start site of 93,660. The RBS score carries extra weight here because the start site of 93,561 does not create an operon and still leaves a relatively large upstream gap. /note=Location call: The evidence suggests the gene is 'real' because good coding potential is present. The start candidate of 93,660 appears to be the most likely candidate.
/note=Function call: For PhagesDB BLASTp, only 2 hits were associated with a potential function call of 'CobT-like cobalamin biosynthesis protein'. These hits had relatively strong e-values (>2e-27); however, these function calls were found only in phages outside cluster BE. For NCBI BLASTp, most hits were proteins of no known function; however, there was a single hit for a protein with a 'VWA domain'. This hit did not clearly imply a function. /note=For CDD, the program returned a similar hit associated with 'VWA domain', but with relatively poor identity (>26.63%). HHpred was also uninformative, because the hits were associated only with 'VWA domain'. These hits had relatively strong e-values (>9.9e-12), fair coverage (73.48%), and high probability (99%); however, 'VWA domain' alone does not point to a clear function. Therefore, the evidence suggests that the function is 'no known function (NKF)'; this could change as more is learned about VWA domains. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicted any TMDs, so the protein is not predicted to be a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real and start site 93660 is the most likely choice based on Starterator and gap conservation in other phages.
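Several function calls in this file follow the same two-step logic. First, check whether an HHpred hit clears numeric cutoffs. Second, judge whether the hit's description actually implies a function.

The sketch below is illustrative only. Its assumptions are:
- The cutoffs are those quoted in the gene 1 note of this file: probability above 80, coverage above 40%, and e-value below "10e-3". Reading "10e-3" as 1e-3 is an assumption.
- The hit values are the representative ones quoted for genes 173, 177, and 182.
- `HHpredHit` and `passes_numeric_cutoffs` are hypothetical names.

Gene 173's VWA-domain hit clears every numeric cutoff yet was still called NKF, because "VWA domain" alone does not indicate a function.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    """Hypothetical record for one HHpred hit as summarized in a note."""
    gene: str
    probability: float  # percent
    coverage: float     # percent of the query covered
    e_value: float
    description: str


# Cutoffs quoted in the gene 1 note; "10e-3" is read as 1e-3 (assumption).
MIN_PROBABILITY = 80.0
MIN_COVERAGE = 40.0
MAX_E_VALUE = 1e-3


def passes_numeric_cutoffs(hit: HHpredHit) -> bool:
    """True if the hit clears all three numeric cutoffs. A passing hit still
    needs an informative description before it can support a function call."""
    return (hit.probability > MIN_PROBABILITY
            and hit.coverage > MIN_COVERAGE
            and hit.e_value < MAX_E_VALUE)


# Representative values copied from the function-call notes.
hits = [
    HHpredHit("173", 99.0, 73.48, 9.9e-12, "VWA domain"),
    HHpredHit("177", 53.1, 12.28, 18.0, "uninformative"),
    HHpredHit("182", 44.9, 21.4, 43.0, "uninformative"),
]

for hit in hits:
    print(hit.gene, passes_numeric_cutoffs(hit), hit.description)
```

Keeping the numeric filter separate from the description check mirrors the notes: the numbers can only rule a hit out, while the final NKF or function call remains a judgment about what the hit describes.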
tRNA 94513 - 94586 /gene="174" /product="tRNA-Met(cat)" /locus tag="PUMPKINSPICE_174" /note=tRNA-Met(cat) tRNA 94587 - 94658 /gene="175" /product="tRNA-Met(cat)" /locus tag="PUMPKINSPICE_175" /note=tRNA-Met(cat) CDS 94738 - 94857 /gene="176" /product="gp176" /function="hypothetical protein" /locus tag="PumpkinSpice_176" /note=Original Glimmer call @bp 94738 has strength 8.82; Genemark calls start at 94738 /note=SSC: 94738-94857 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein HWB80_gp134 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 9.77147E-18 GAP: 283 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.066, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp134 [Streptomyces phage Karimac] ],,YP_009840318,100.0,9.77147E-18 SIF-HHPRED: SIF-Syn: Pham 11643 (NKF), upstream gene is Pham 56919 (NKF), downstream is Pham 55598 (NKF), just like in phage TomSawyer. /note=Primary Annotator Name: Garza, Dominic /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 94,738. /note=Phamerator: The gene is a member of Pham 11643 (recorded on April 12th, 2021). The gene appears to be well conserved among other BE and BK cluster phages. I compared PumpkinSpice's genome to the genomes of TomSawyer (BE2) and Starbow (BE2), which showed that Draft Gene 144 (stop 94857 F) was conserved among these BE2 phages. Phamerator did not indicate a function for the gene, because no other member of Pham 11643 (recorded on April 12th, 2021) had a function listed. /note=Starterator: The Starterator report (run on April 12th, 2021) indicated a well-conserved start site across other phages in Pham 11643: start site 11, the "Most Annotated" start site, was called in 21 out of 40 non-draft genes in Pham 11643.
However, Draft Gene 144 (stop 94857 F) does not contain the "Most Annotated" start site, because start site 11 is absent from PumpkinSpice's genome. The Starterator report indicated that start site 10 corresponds to the start site of 94738 in PumpkinSpice's genome, and this start appears to have been annotated in other BE2 cluster phages. /note=Coding Potential: Only the Self-Trained GeneMark predicted reasonable typical coding potential in the putative ORF defined by the start site, 94,738, and the stop site, 94,857. The chosen start site of 94,738 covers all coding potential in the Self-Trained GeneMark. /note=SD (Final) Score: The original start site's SD score is the best. The SD score is -2.443. /note=Gap/overlap: The gap with the upstream gene is considerably large. However, there does not appear to be enough evidence to warrant changing the start site to 94732, which has a poorer SD score and would not reduce the gap meaningfully. /note=Location call: The evidence suggests the gene is 'real' because good coding potential is present. The start candidate of 94,738 appears to be the most likely candidate. /note=Function call: NCBI BLASTp and PhagesDB BLASTp both returned hits exclusively associated with 'no known function'. CDD did not return any conserved domain hits. HHpred was also uninformative: the hits had relatively poor e-values (>12), considerably poor probability (>35.4%), and no consensus on a single potential function (the hits were associated with several unrelated functions). Overall, the evidence suggests that the function is 'no known function (NKF)'. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicted any TMDs, so the protein is not predicted to be a membrane protein.
/note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Dom's calls that this gene is real, and I also agree with his start site. I think the evidence Dom provided is pretty compelling. CDS 94860 - 95033 /gene="177" /product="gp177" /function="hypothetical protein" /locus tag="PumpkinSpice_177" /note=Original Glimmer call @bp 94860 has strength 7.57; Genemark calls start at 94860 /note=SSC: 94860-95033 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp133 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 1.05728E-31 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.713, -4.00988004985817, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp133 [Streptomyces phage Karimac] ],,YP_009840319,100.0,1.05728E-31 SIF-HHPRED: SIF-Syn: Pham 55598 (NKF), upstream gene is Pham 11643 (NKF), downstream is Pham 5232 (NKF), just like in phage TomSawyer. /note=Primary Annotator Name: Garza, Dominic /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 94,860. /note=Phamerator: The gene is a member of Pham 55598 (recorded on April 12th, 2021). The gene appears to be well conserved, but only among other BE2 subcluster phages. I compared PumpkinSpice's genome to the genomes of TomSawyer (BE2) and Starbow (BE2), which showed that Draft Gene 145 (stop 95033 F) was conserved among these BE2 phages. Phamerator did not indicate a function for the gene, because no other member of Pham 55598 (recorded on April 12th, 2021) had a function listed. /note=Starterator: The Starterator report (run on April 12th, 2021) indicated a well-conserved start site across other phages in Pham 55598: start site 3, the "Most Annotated" start site, was called in 16 out of 19 non-draft genes in Pham 55598.
Draft Gene 145 (stop 95033 F) contains the "Most Annotated" start site, because start site 3 corresponds to the start site of 94860 in PumpkinSpice's genome. /note=Coding Potential: Only the Self-Trained GeneMark predicted reasonable typical coding potential in the putative ORF defined by the start site, 94,860, and the stop site, 95,033. The chosen start site of 94,860 covers all coding potential in the Self-Trained GeneMark. /note=SD (Final) Score: The original start site's SD score is the best. The SD score is -4.010. /note=Gap/overlap: The spacing relative to the upstream gene is reasonable: a gap of 2 bp. The alternative start candidate of 94,854 would instead create a 4 bp overlap (-4 bp), which might be better. However, a start site of 94,860 has been annotated across many other members of the pham. /note=Location call: The evidence suggests the gene is 'real' because good coding potential is present. The start candidate of 94,860 appears reasonable. The start site of 94,854 might be more reasonable, though, because it would create a potential operon with the upstream gene, as seen across other cluster BE2 phages. /note=Function call: NCBI BLASTp and PhagesDB BLASTp both returned hits exclusively associated with 'no known function'. CDD did not return any conserved domain hits. HHpred was also uninformative: the hits had relatively poor e-values (>18), relatively poor percent coverage (>12.28%), considerably poor probability (>53.1%), and uninformative function calls. Overall, the evidence suggests that the function is 'no known function (NKF)'. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicted any TMDs, so the protein is not predicted to be a membrane protein.
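The GAP values in these notes can be reproduced with simple coordinate arithmetic. The sketch below is a minimal illustration under these assumptions:
- Coordinates are 1-based and inclusive, and both features are forward-strand CDSs.
- `cds_gap` is a hypothetical helper, not part of PECAAN.

It reproduces three reported values:
- the 2 bp gap for the 94,860 start,
- the 4 bp overlap (-4) for the alternative 94,854 start,
- the -17 bp value reported for gene 188.

The reported values also appear to be measured from the previous CDS, skipping intervening tRNAs. For example, gene 176 gives 94,738 - 94,454 - 1 = 283 bp.

```python
def cds_gap(upstream_end: int, downstream_start: int) -> int:
    """Return the gap (positive) or overlap (negative), in bp, between two
    adjacent forward-strand CDS features with 1-based inclusive coordinates.

    Hypothetical helper for illustration; not a PECAAN function.
    """
    return downstream_start - upstream_end - 1


# Gene 176 ends at 94,857; compare the two candidate starts for gene 177.
for candidate_start in (94860, 94854):
    print(candidate_start, cds_gap(94857, candidate_start))

# Gene 188 (start 97,187) against gene 187 (end 97,203).
print(97187, cds_gap(97203, 97187))
```

A negative result means the downstream start lies inside the upstream gene. The -4 case is the familiar ATGA overlap, in which the start codon shares bases with the upstream stop codon.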
/note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: Dominic did a great job with this annotation, and it is very thoroughly explained. I agree that this gene is real and of no known function. tRNA 95054 - 95137 /gene="178" /product="tRNA-Tyr(gta)" /locus tag="PUMPKINSPICE_178" /note=tRNA-Tyr(gta) tRNA 95400 - 95473 /gene="179" /product="tRNA-His(gtg)" /locus tag="PUMPKINSPICE_179" /note=tRNA-His(gtg) CDS 95505 - 95882 /gene="180" /product="gp180" /function="hypothetical protein" /locus tag="PumpkinSpice_180" /note=Original Glimmer call @bp 95505 has strength 16.53; Genemark calls start at 95505 /note=SSC: 95505-95882 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MINDFLAYER_175 [Streptomyces phage MindFlayer]],,NCBI, q1:s27 100.0% 1.17036E-84 GAP: 471 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.879, -5.0037239033518315, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MINDFLAYER_175 [Streptomyces phage MindFlayer]],,QPL13784,82.7815,1.17036E-84 SIF-HHPRED: SIF-Syn: Pham 5232 (NKF), upstream gene is Pham 55598 (NKF), downstream is Pham 48781 (NKF), just like in phage TomSawyer. /note=Primary Annotator Name: Garza, Dominic /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 95,505. /note=Phamerator: The gene is a member of Pham 5232 (recorded on April 12th, 2021). The gene appears to be well conserved, but only among other BE2 subcluster phages. I compared PumpkinSpice's genome to the genomes of TomSawyer (BE2) and Starbow (BE2), which showed that Draft Gene 146 (stop 95882 F) was conserved among these BE2 phages. Phamerator did not indicate a function for the gene, because no other member of Pham 5232 (recorded on April 12th, 2021) had a function listed.
/note=Starterator: The Starterator report (run on April 12th, 2021) indicated a well-conserved start site across other phages in Pham 5232: start site 6, the "Most Annotated" start site, was called in 10 out of 11 non-draft genes in Pham 5232. Start site 6 corresponds to the start site of 95505 in PumpkinSpice's genome, and this start appears to have been annotated in other BE2 cluster phages. /note=Coding Potential: Only the Self-Trained GeneMark predicted reasonable typical coding potential in the putative ORF defined by the start site, 95,505, and the stop site, 95,882. The chosen start site of 95,505 covers all the coding potential predicted in the Self-Trained GeneMark. /note=SD (Final) Score: The original start site's SD score is not the best. The SD score is -5.004. The start site of 95,433 has the best SD score, -4.314. /note=Gap/overlap: The gap with the upstream gene is considerably large, which suggests considering a different start site to reduce it. The start site of 95,433 would reduce this gap, but not by much. However, Starterator indicates that the original start site call is the most conserved across other cluster BE2 phages. /note=Location call: The evidence suggests the gene is 'real' because good coding potential is present. The start candidate of 95,505 seems to be an appropriate start site that is well conserved across other cluster BE2 phages. /note=Function call: NCBI BLASTp and PhagesDB BLASTp both returned hits exclusively associated with 'no known function'.
For CDD, the only conserved domain hit was associated with 'V-type ATP synthase subunit I'; however, no published works on PubMed elaborated on the meaning of this function. HHpred was also uninformative: the hits had relatively poor e-values (>9.5), relatively poor percent coverage (>23.2%), and uninformative function calls. Overall, the evidence suggests that the function is 'no known function (NKF)'. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicted any TMDs, so the protein is not predicted to be a membrane protein. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 95893 - 96189 /gene="181" /product="gp181" /function="hypothetical protein" /locus tag="PumpkinSpice_181" /note=Original Glimmer call @bp 95893 has strength 8.96; Genemark calls start at 95893 /note=SSC: 95893-96189 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_STARBOW_180 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.47221E-65 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.166, -4.975913734906325, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_180 [Streptomyces phage Starbow] ],,AXH66653,100.0,1.47221E-65 SIF-HHPRED: SIF-Syn: Pham 48781 (NKF), upstream gene is Pham 5232 (NKF), downstream is Pham 13502 (NKF), just like in phage TomSawyer. /note=Primary Annotator Name: Garza, Dominic /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 95,893. /note=Phamerator: The gene is a member of Pham 48781 (recorded on April 12th, 2021). The gene appears to be relatively conserved among other BE2 subcluster phages.
I compared PumpkinSpice's genome to the genomes of TomSawyer (BE2) and Starbow (BE2), which showed that Draft Gene 147 (stop 96189 F) was conserved among these BE2 phages. Phamerator did not indicate a function for the gene, because no other member of Pham 48781 (recorded on April 12th, 2021) had a function listed. /note=Starterator: The Starterator report (run on April 12th, 2021) indicated a well-conserved start site across other phages in Pham 48781: start site 7, the "Most Annotated" start site, was called in 17 out of 33 non-draft genes in Pham 48781. However, Draft Gene 147 (stop 96189 F) does not contain the "Most Annotated" start site, because start site 7 is absent from PumpkinSpice's genome. Start site 12 corresponds to the start site of 95893 in PumpkinSpice's genome, and this start appears to have been annotated in other BE2 cluster phages. /note=Coding Potential: Only the Self-Trained GeneMark predicted reasonable typical coding potential in the putative ORF defined by the start site, 95,893, and the stop site, 96,189. The chosen start site of 95,893 contains all coding potential predicted in the Self-Trained GeneMark. /note=SD (Final) Score: The original start site's SD score is not the best, but it is the second best. The SD score is -4.976. The SD score should not be a major deciding factor, because the start with the best SD score produces an excessive overlap of 26 bp. /note=Gap/overlap: The 10 bp gap with the upstream gene is reasonable, since the other potential start sites create either too much upstream overlap or too large a gap. /note=Location call: The evidence suggests the gene is 'real' because good coding potential is present.
The start candidate of 95,893 appears to be the most likely candidate. /note=Function call: NCBI BLASTp and PhagesDB BLASTp both returned hits exclusively associated with 'no known function'. For CDD, the only conserved domain hit returned was associated with a protein of 'no known function'. HHpred was also uninformative: the hits had relatively poor e-values (>0.81) and produced a wide variety of function calls that were either uninformative or failed to provide a consensus on the protein's potential function. Overall, the evidence suggests that the function is 'no known function (NKF)'. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicted any TMDs, so the protein is not predicted to be a membrane protein. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with all the evidence provided by Dom; I believe the evidence is sufficient. CDS 96191 - 96445 /gene="182" /product="gp182" /function="hypothetical protein" /locus tag="PumpkinSpice_182" /note=Original Glimmer call @bp 96191 has strength 10.06; Genemark calls start at 96191 /note=SSC: 96191-96445 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178]],,NCBI, q1:s1 100.0% 2.52799E-53 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.567, -3.486006226271621, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178]],,WP_143675318,100.0,2.52799E-53 SIF-HHPRED: SIF-Syn: Pham 13502 (NKF), upstream gene is Pham 48781 (NKF), downstream is Pham 8778 (NKF), just like in phage TomSawyer. /note=Primary Annotator Name: Garza, Dominic /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 96,191. /note=Phamerator: The gene is a member of Pham 13502 (recorded on April 12th, 2021).
The gene appears to be well conserved among other BE2 subcluster phages. I compared PumpkinSpice's genome to the genomes of TomSawyer (BE2) and Starbow (BE2), which showed that Draft Gene 148 (stop 96445 F) was conserved among these BE2 phages. Phamerator did not indicate a function for the gene, because no other member of Pham 13502 (recorded on April 12th, 2021) had a function listed. /note=Starterator: The Starterator report (run on April 12th, 2021) indicated a well-conserved start site across other phages in Pham 13502: start site 7, the "Most Annotated" start site, was called in 18 out of 33 non-draft genes in Pham 13502. Draft Gene 148 (stop 96445 F) contains the "Most Annotated" start site, because start site 7 corresponds to the start site of 96191 in PumpkinSpice's genome. /note=Coding Potential: Only the Self-Trained GeneMark predicted reasonable typical coding potential in the putative ORF defined by the start site, 96,191, and the stop site, 96,445. The chosen start site of 96,191 contains all coding potential predicted in the Self-Trained GeneMark. /note=SD (Final) Score: The original start site's SD score is the best. The SD score is -3.486. /note=Gap/overlap: The 1 bp gap with the upstream gene is reasonable, since it avoids the larger overlaps or gaps created by other potential start sites. It also suggests the gene may be part of an operon. /note=Location call: The evidence suggests the gene is 'real' because good coding potential is present. The start candidate of 96,191 appears to be the most likely candidate. /note=Function call: NCBI BLASTp and PhagesDB BLASTp both returned hits exclusively associated with 'no known function'. CDD returned no conserved domain hits, so it was uninformative about the protein's potential function.
HHpred was also uninformative: the hits had relatively poor e-values (>43), relatively low coverage (21.4%), relatively low probability (44.9%), and uninformative potential function calls. Overall, the evidence suggests that the function is 'no known function (NKF)'. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicted any TMDs, so the protein is not predicted to be a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real, and 96191 is the most likely start site based on Starterator, the Z-score, and the Final score. tRNA 96544 - 96619 /gene="183" /product="tRNA-Thr(tgt)" /locus tag="PUMPKINSPICE_183" /note=tRNA-Thr(tgt) tRNA 96710 - 96782 /gene="184" /product="tRNA-Thr(ggt)" /locus tag="PUMPKINSPICE_184" /note=tRNA-Thr(ggt) CDS 96801 - 96905 /gene="185" /product="gp185" /function="hypothetical protein" /locus tag="PumpkinSpice_185" /note=Genemark calls start at 96801 /note=SSC: 96801-96905 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein HWB80_gp129 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 9.47975E-13 GAP: 355 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -2.723328252647382, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp129 [Streptomyces phage Karimac] ],,YP_009840323,100.0,9.47975E-13 SIF-HHPRED: SIF-Syn: NKF, upstream is pham 13502, downstream is pham 17151, just like in phages Karimac and Starbow. /note=Primary Annotator Name: Bahena, Kenninsy /note=Auto-annotation start source: Only GeneMark called a start site, at 96801 bp; Glimmer did not call a start site. /note=Phamerator: Pham 8778 as of June 14, 2021. The gene is conserved in cluster BE and subcluster BE2; LukeCage, Karimac, and Starbow were used for comparison. Possible function: unknown. /note=Starterator: Reasonably conserved start site.
Site #3 is the "Most Annotated" start, called in 24/29 non-draft genes. For this phage, Starterator site 3 (96801) is found in 53/53 (100%) of genes in the pham and is called 96.6% of the time when present. /note=Coding Potential: From the start site (96,801) to the stop site (96,905), there is strong typical and atypical coding potential across the entire ORF, in frame 3 of the GeneMark map. /note=SD (Final) Score: Both the RBS final score (-2.723) and the Z-score (2.999) are the best for this gene. /note=Gap/overlap: 355 bp. This is a reasonable LORF even though the gap is large. This start has the best Z-score and RBS score, and it is corroborated by the GeneMark start site. /note=Location call: Real gene. Potential start 96,801, conserved in Phamerator cluster BE and subcluster BE2. Found in 29/34 of genes and called 96.6% when present. Strong coding potential covers the entire ORF. PhagesDB BLAST returned strong hits with E-values below 10^-7. Start codon ATG. /note=Function call: NKF. Based on all the data collected, I was not able to find a function for this gene (NKF: no known function). In NCBI BLASTp, there were hits with 100% query coverage, 100% identity, and e-values lower than e-12, but they were all hypothetical proteins. In PhagesDB BLAST, the hits had bit scores of 73 and were all of unknown function. CDD had no hits, suggesting the protein may lack characterized conserved domains. The HHpred hit list had acceptable probability, but not acceptable percent coverage or e-values.
/note=Transmembrane domains: No TMD, Number of predicted TMHs: 0 /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: tRNA 96915 - 96988 /gene="186" /product="tRNA-Arg(acg)" /locus tag="PUMPKINSPICE_186" /note=tRNA-Arg(acg) CDS 97045 - 97203 /gene="187" /product="gp187" /function="hypothetical protein" /locus tag="PumpkinSpice_187" /note=Original Glimmer call @bp 97045 has strength 1.62 /note=SSC: 97045-97203 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_187 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 3.7554E-28 GAP: 139 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.987, -5.800577329217627, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_187 [Streptomyces phage Starbow] ],,AXH66656,100.0,3.7554E-28 SIF-HHPRED: SIF-Syn: NKF, upstream is pham 8778, downstream is pham 4753, just like in phages Karimac and Starbow. /note=Primary Annotator Name: Bahena, Kenninsy /note=Auto-annotation start source: Only Glimmer called a start site, at 97045 bp; GeneMark did not call a start site. /note=Phamerator: Pham 17151 as of June 14, 2021. The gene is conserved in cluster BE and subcluster BE2; LukeCage, Karimac, and Starbow were used for comparison. Possible function: unknown. /note=Starterator: Reasonably conserved start site. Site #3 is the "Most Annotated" start, called in 9/9 non-draft genes. For this phage, Starterator site 3 (97045) is found in 11/12 (91.7%) of genes in the pham and is called 100% of the time when present. /note=Coding Potential: From the start site (97,045) to the stop site (97,203), there is strong typical and atypical coding potential across the entire ORF, in frame 1 of the GeneMark map. /note=SD (Final) Score: Both the RBS final score (-5.801) and the Z-score (1.987) are the best for this gene. /note=Gap/overlap: 139 bp. This is a reasonable LORF even though the gap is large. This start has the best Z-score and RBS score, is corroborated by the Glimmer start site, and is the most selected start in Starterator. /note=Location call: Real gene.
Potential start 97,045, conserved in Phamerator cluster BE and subcluster BE2. Found in 11/12 of genes and called 100% when present. /note=Function call: NKF. Based on all the data collected, I was not able to find a function for this gene (NKF: no known function). In NCBI BLASTp, there were hits with 100% query coverage, 100% identity, and e-values lower than e-28, but they were all hypothetical proteins. In PhagesDB BLAST, the hits had bit scores of 140 and were all of unknown function. CDD had no hits, suggesting the protein may lack characterized conserved domains. The HHpred hit list did not have acceptable probability, percent coverage, or e-values. /note=Transmembrane domains: No TMD, Number of predicted TMHs: 0 /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: No PECAAN note progress. Coding potential is observed only in the self-trained GeneMark, not in the host-trained GeneMark. Glimmer calls 97,045 as the start site; GeneMark does not. Starterator for Pham 17151 calls start site 3 for this gene/phage, which is consistent with the most manually annotated start site. Therefore, this gene is real. According to BLAST analysis of both PhagesDB and NCBI, this gene is of no known function. This gene is conserved in Bordeaux and BoomerJr phages within the same cluster, BE. CDS 97187 - 97744 /gene="188" /product="gp188" /function="DnaE-like DNA polymerase III (alpha)" /locus tag="PumpkinSpice_188" /note=Original Glimmer call @bp 97187 has strength 10.21; Genemark calls start at 97187 /note=SSC: 97187-97744 CP: yes SCS: both ST: SS BLAST-Start: [3'-5' exoribonuclease [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.30919E-134 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.06, -2.5359491958734357, yes F: DnaE-like DNA polymerase III (alpha) SIF-BLAST: ,,[3'-5' exoribonuclease [Streptomyces sp.
JV178] ],,WP_099970915,100.0,7.30919E-134 SIF-HHPRED: DNA polymerase III subunit epsilon; DNA editing Proofreading Exonuclease Polymerase, DNA Binding protein; 6.7A {Escherichia coli K12},,,5M1S_D,98.9189,99.4 SIF-Syn: exonuclease, upstream is pham 17151, downstream is pham 7764, just like in phages Karimac and Yaboi. /note=Primary Annotator Name: Bahena, Kenninsy /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site at 97,187 bp /note=Phamerator: Pham 4753 as of June 14, 2021. The gene is conserved in cluster BE and subcluster BE2; LukeCage, Karimac, and Starbow were used for comparison. Possible function: unknown. /note=Starterator: Reasonably conserved start site. Site #43 is the "Most Annotated" start, called in 35/88 non-draft genes. For this phage, Starterator site 43 (97187) is found in 47/105 (44.8%) of genes in the pham and is called 91.5% of the time when present. /note=Coding Potential: From the start site (97,187) to the stop site (97,744), there is strong typical and atypical coding potential across the entire ORF, in frame 2 of the GeneMark map. /note=SD (Final) Score: Both the RBS final score (-2.536) and the Z-score (3.06) are the best for this gene. /note=Gap/overlap: -17 bp (a 17 bp overlap), reasonable LORF. This start has the best Z-score and RBS score, is corroborated by the GeneMark and Glimmer start sites, and is the most selected start in Starterator. /note=Location call: Real gene. Potential start 97,187, conserved in Phamerator cluster BE and subcluster BE2. Found in 47/105 of genes and called 91.5% when present. /note=Function call: Based on all the data collected, I assigned the function exonuclease to this gene. This gene is real. Both NCBI and PhagesDB BLASTp returned good hits with E-values lower than e^-111 and percent identities higher than 90%. PDB hits were significant, and the HHpred hit list had acceptable probabilities, percent coverage, and E-values. CDD had 2 hits, indicating the gene may have characterized conserved domains.
The accession number for one HHpred hit was NF033638 and for the second was pfam1473; both hits were annotated as exonucleases. This function appears to have also been called in Battuta, another member of this subcluster. /note=Transmembrane domains: No TMD, Number of predicted TMHs: 0 /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 97748 - 98143 /gene="189" /product="gp189" /function="hypothetical protein" /locus tag="PumpkinSpice_189" /note=Original Glimmer call @bp 97748 has strength 13.1; Genemark calls start at 97748 /note=SSC: 97748-98143 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.72247E-92 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.588861267572955, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675319,100.0,3.72247E-92 SIF-HHPRED: SIF-Syn: NKF, upstream is pham 4753, downstream is pham 20140, just like in phages Karimac and Yaboi. /note=Primary Annotator Name: Bahena, Kenninsy /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 97,748 bp. /note=Phamerator: Pham 7764 as of June 14, 2021. The gene is conserved within cluster BE and subcluster BE2; LukeCage, Karimac, and Starbow were used for comparison. Possible functions are unknown. /note=Starterator: The start site is reasonably conserved: the "Most Annotated" start, site #1, was called in 16/16 non-draft genes. For this phage, Starterator start 1 (97748) is found in 21/21 (100%) of genes in the pham and is called 100% of the time when present. /note=Coding Potential: Strong coding potential, including atypical coding potential, spans the entire ORF, from the start site (97,748) to the stop (98,143), in frame 2 of the GeneMark map. /note=SD (Final) Score: The final RBS score (-3.589) and the Z-score (2.77) are both the best among the candidate starts for this gene. /note=Gap/overlap: The 3 bp gap is reasonable, and this start gives the longest ORF (LORF).
This start has the best Z-score and RBS score, is corroborated by both Glimmer and GeneMark, and is the most frequently selected start in Starterator. /note=Location call: Real gene. Potential start 97,748, conserved in Phamerator cluster BE and subcluster BE2. Found in 21/21 genes and called 100% of the time when present. /note=Function call: NKF (no known function). Based on all the data I collected, I was not able to find a function for this gene. In NCBI BLASTp there were hits with 100% query coverage, 100% identity, and e-values as low as 3.7e-92; however, all were hypothetical proteins. PhagesDB BLAST returned hits, but all were of unknown function. CDD had no hits, which suggests that the gene may not have conserved domains. The HHpred hits did not have acceptable probabilities or e-values, although their percent coverage was acceptable. /note=Transmembrane domains: No TMD, Number of predicted TMHs: 0 /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: tmRNA 98164 - 98442 /gene="190" /locus tag="PUMPKINSPICE_190" /note= CDS 98455 - 98769 /gene="191" /product="gp191" /function="hypothetical protein" /locus tag="PumpkinSpice_191" /note=Original Glimmer call @bp 98455 has strength 9.86; Genemark calls start at 98455 /note=SSC: 98455-98769 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_191 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.81726E-71 GAP: 311 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.01, -3.0866669750886713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_191 [Streptomyces phage Starbow] ],,AXH66659,100.0,2.81726E-71 SIF-HHPRED: SIF-Syn: NKF, upstream is pham 7764, downstream is pham 50588, just like in phages Karimac and Yaboi. /note=Primary Annotator Name: Bahena, Kenninsy /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 98,455 bp. /note=Phamerator: Pham 20140 as of June 14, 2021.
The gene is conserved within cluster BE and subcluster BE2; LukeCage, Karimac, and Starbow were used for comparison. Possible functions are unknown. /note=Starterator: The start site is reasonably conserved: the "Most Annotated" start, site #3, was called in 15/15 non-draft genes. For this phage, Starterator start 3 (98,455) is found in 20/20 (100%) of genes in the pham and is called 100% of the time when present. /note=Coding Potential: Strong coding potential, including atypical coding potential, spans the entire ORF, from the start site (98,455) to the stop (98,769), in frame 1 of the GeneMark map. /note=SD (Final) Score: The final RBS score (-3.087) and the Z-score (3.01) are both the best among the candidate starts for this gene. /note=Gap/overlap: There is a 311 bp gap; this start gives a reasonable longest ORF (LORF). Even though the gap is large, this start has the best Z-score and RBS score, is corroborated by both Glimmer and GeneMark, and is the most frequently selected start in Starterator. /note=Location call: Real gene. Potential start 98,455, conserved in Phamerator cluster BE and subcluster BE2. Found in 20/20 genes and called 100% of the time when present. /note=Function call: NKF (no known function). Based on all the data I collected, I was not able to find a function for this gene. In NCBI BLASTp there were hits with 100% query coverage, 100% identity, and e-values as low as 2.8e-71; however, all were hypothetical proteins. PhagesDB BLAST returned hits, but all were proteins of unknown function. CDD had no hits, which suggests that the gene may not have conserved domains. The HHpred hits had acceptable probability and percent coverage but not acceptable e-values.
/note=Transmembrane domains: No TMD, Number of predicted TMHs: 0 /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 98779 - 99003 /gene="192" /product="gp192" /function="membrane protein" /locus tag="PumpkinSpice_192" /note=Original Glimmer call @bp 98779 has strength 6.05; Genemark calls start at 98779 /note=SSC: 98779-99003 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_192 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.22867E-45 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.865, -4.954064444624791, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_192 [Streptomyces phage Starbow] ],,AXH66660,100.0,2.22867E-45 SIF-HHPRED: SIF-Syn: NKF, upstream is pham 20140, downstream is pham 10486, just like in phages MindFlayer and Karimac. /note=AF: membrane protein. /note=Primary Annotator Name: Bahena, Kenninsy /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 98,779 bp. /note=Phamerator: Pham 50588 as of June 14, 2021. The gene is conserved within cluster BE and subcluster BE2; LukeCage, Karimac, and Starbow were used for comparison. Possible functions are unknown. /note=Starterator: The start site is reasonably conserved: the "Most Annotated" start, site #10, was called in 23/24 non-draft genes. For this phage, Starterator start 10 (98,779) is found in 30/31 (96.8%) of genes in the pham and is called 96.7% of the time when present. /note=Coding Potential: Strong coding potential, including atypical coding potential, spans the entire ORF, from the start site (98,779) to the stop (99,003), in frame 1 of the GeneMark map. /note=SD (Final) Score: The final RBS score (-4.954) and the Z-score (1.865) are both the best among the candidate starts for this gene. /note=Gap/overlap: The 9 bp gap is reasonable, and this start gives the longest ORF (LORF). This start has the best Z-score and RBS score, is corroborated by both Glimmer and GeneMark, and is the most frequently selected start in Starterator. /note=Location call: Real gene. Potential start 98,779, conserved in Phamerator cluster BE and subcluster BE2.
Found in 30/31 genes and called 96.7% of the time when present. /note=Function call: NKF (no known function). Based on all the data I collected, I was not able to find a function for this gene. In NCBI BLASTp there were hits with 100% query coverage, 100% identity, and e-values as low as 2.2e-45; however, all were hypothetical proteins. PhagesDB BLAST returned hits, but all were proteins of unknown function. CDD had no hits, which suggests that the gene may not have conserved domains. The HHpred hits had acceptable probability and percent coverage but not acceptable e-values. /note=Transmembrane domains: No TMD. Number of predicted TMHs: 1; however, its predicted probability (0.57) was below 75%, so it was considered insignificant. TOPCONS predicted no transmembrane domains for this gene. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: CDS 99055 - 99459 /gene="193" /product="gp193" /function="acetyltransferase" /locus tag="PumpkinSpice_193" /note=Original Glimmer call @bp 99055 has strength 8.15; Genemark calls start at 99055 /note=SSC: 99055-99459 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB86_gp127 [Streptomyces phage Yaboi] ],,NCBI, q1:s1 100.0% 5.84985E-89 GAP: 51 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.829, -3.4668782168002776, yes F: acetyltransferase SIF-BLAST: ,,[hypothetical protein HWB86_gp127 [Streptomyces phage Yaboi] ],,YP_009841291,94.7761,5.84985E-89 SIF-HHPRED: L-2,4-diaminobutyric acid acetyltransferase; L-2, 4-diaminobutyrate acetyltransferase, acetyl coenzyme A, acetylation, stress response, chemical chaperone, TRANSFERASE; HET: COA, DAB; 1.2A {Paenibacillus lautus},,,6SLL_A,92.5373,99.8 SIF-Syn: Acetyltransferase (Pham 10486), the upstream gene belongs to Pham 50588 with no function, and the downstream gene belongs to Pham 12868, whose function is also unknown. The same order of genes with the same corresponding Phams is observed in phage Karimac.
/note=Primary Annotator Name: Andari, Maya /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 99055. /note=Phamerator: As of 5/13/21, this gene is in Pham 10486. It is conserved in some other members of Cluster BE, with similar lengths. I compared my phage to two other phages, IchabodCrane and Karimac. Phamerator lists no function for this gene; however, the genes in this pham have a 94% function frequency for acetyltransferase. /note=Starterator: The conserved start site among members of the pham is site 7, which is 99055 in PumpkinSpice. This pham has 41 members, 8 of which are drafts (41/41 call site #7). /note=Coding Potential: This gene has moderate coding potential within the putative ORF, and the chosen start site covers all of the coding potential present. /note=SD (Final) Score: This start site has the best SD score, -3.467. /note=Gap/overlap: The 51 bp gap is reasonable since no other candidate start gives a better gap or a higher SD score and Z-value. /note=Location call: This is a real gene, and this appears to be the best start site listed. /note=Function call: The top PhagesDB hit, with the lowest e-value and highest percent identity, calls the function acetyltransferase. This is supported by the top two NCBI hits, which are acetyltransferases with almost 100% query coverage, 100% identity, and e-values of zero. Based on CDD and HHpred, the protein is classified as an acetyltransferase, an enzyme that transfers an acetyl group. /note=Transmembrane domains: The function of this gene (gp193) is hypothesized to be acetyltransferase, and this enzyme does not contain any predicted transmembrane domains.
/note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: The information is thorough enough to support the start site call and the legitimacy of this gene. I agree with Maya that this gene is real and starts at the suggested start site. CDS 99470 - 99685 /gene="194" /product="gp194" /function="hypothetical protein" /locus tag="PumpkinSpice_194" /note=Original Glimmer call @bp 99470 has strength 12.26; Genemark calls start at 99470 /note=SSC: 99470-99685 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.66985E-43 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.804, -5.608116066865693, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675321,100.0,2.66985E-43 SIF-HHPRED: DUF2992 ; Protein of unknown function (DUF2992),,,PF11208.9,73.2394,60.9 SIF-Syn: NKF (Pham 12868), the upstream gene belongs to Pham 10486, the downstream gene belongs to Pham 14655. The same order of genes with corresponding Phams is observed in phage LukeCage. /note=Primary Annotator Name: Andari, Maya /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 99470. /note=Phamerator: As of 5/13/21, this gene is in Pham 12868. It is conserved in some other members of Clusters BE and BK, with similar lengths. I compared my phage to two other phages, LukeCage and Starbow. Phamerator lists no function for this gene, and no function frequency is reported for the genes in this pham. /note=Starterator: The conserved start site among members of the pham is site 12, which is 99470 in PumpkinSpice. This pham has 63 members, 14 of which are drafts (62/63 call site #12). /note=Coding Potential: This gene has good coding potential within the putative ORF, and the chosen start site covers all of the coding potential.
/note=SD (Final) Score: This start site does not have the best SD score (-5.608); the best SD score among the candidates is -5.397. /note=Gap/overlap: The 10 bp gap is reasonable, and no other candidate start gives a better gap. /note=Location call: This is a real gene, and this appears to be the best start site. Although it does not have the best SD score or Z-value, it is the Most Annotated start site in Starterator and has good coding potential. /note=Function call: The top two PhagesDB BLAST hits, both with low e-values, have no known function. None of the significant NCBI BLAST hits suggests a protein function with high query coverage, high identity, and an e-value of zero. There are no hits in CDD and no significant HHpred hits with a function; therefore, I believe the function of this protein remains unknown. /note=Transmembrane domains: Because this gene has no known function, I am not sure whether the absence of TMDs is expected. However, because no TMDs were predicted, this gene cannot be called a membrane protein either. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 99696 - 99941 /gene="195" /product="gp195" /function="hypothetical protein" /locus tag="PumpkinSpice_195" /note=Original Glimmer call @bp 99696 has strength 4.77; Genemark calls start at 99696 /note=SSC: 99696-99941 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp121 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 3.0847E-53 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.616, -3.911270251972219, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp121 [Streptomyces phage Karimac] ],,YP_009840331,100.0,3.0847E-53 SIF-HHPRED: SIF-Syn: NKF (Pham 14655), the upstream gene belongs to Pham 12868, the downstream gene belongs to Pham 56053.
The same order of genes with corresponding Phams is observed in phage Bordeaux. /note=Primary Annotator Name: Andari, Maya /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 99696. /note=Phamerator: As of 5/13/21, this gene is in Pham 14655. It is conserved in some other members of Cluster BE, with the same length. I compared my phage to two other phages, WipeOut and Bordeaux. Phamerator lists no function for this gene, and no function frequency is reported for the genes in this pham. /note=Starterator: The conserved start site among members of the pham is site 14, which is 99696 in PumpkinSpice. This pham has 21 members, 6 of which are drafts (21/21 call site #14). /note=Coding Potential: This gene has good coding potential within the putative ORF, and the chosen start site covers all of the coding potential. /note=SD (Final) Score: This start site has the best SD score, -3.911. /note=Gap/overlap: The 10 bp gap is reasonable since no other candidate start gives a better gap or a higher SD score and Z-value. /note=Location call: This is a real gene, and this appears to be the best start site. /note=Function call: The top two PhagesDB BLAST hits, both with low e-values, have no known function. None of the significant NCBI BLAST hits suggests a protein function with high query coverage, high identity, and an e-value of zero. There are no hits in CDD, and the only significant HHpred hit with a function is a glycosyl hydrolase. However, the evidence is not strong enough to assign this function, so I believe the function of this protein remains unknown. /note=Transmembrane domains: Because this gene has no known function, I am not sure whether the absence of TMDs is expected.
However, because no TMDs were predicted, this gene cannot be called a membrane protein either. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with the proposed realness and start site for this gene. The evidence is sufficient and clearly presented. CDS 100100 - 100540 /gene="196" /product="gp196" /function="hypothetical protein" /locus tag="PumpkinSpice_196" /note=Original Glimmer call @bp 100100 has strength 13.36; Genemark calls start at 100100 /note=SSC: 100100-100540 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.10981E-95 GAP: 158 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -3.2535385006449706, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675322,100.0,7.10981E-95 SIF-HHPRED: SIF-Syn: NKF (Pham 56053), the upstream gene belongs to Pham 14655, the downstream gene belongs to Pham 9113. The same order of genes with corresponding Phams is observed in phage MindFlayer. /note=Primary Annotator Name: Andari, Maya /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 100100. /note=Phamerator: As of 5/13/21, this gene is in Pham 56053. It is conserved in some other members of Clusters BE and BK, with similar lengths. I compared my phage to two other phages, Bordeaux and MindFlayer. Phamerator lists no function for this gene, and no function frequency is reported for the genes in this pham. /note=Starterator: The conserved start site among members of the pham is site 14, which is 100100 in PumpkinSpice. This pham has 54 members, 12 of which are drafts (35/54 call site #14). /note=Coding Potential: This gene has strong coding potential within the putative ORF, and the chosen start site covers all of the coding potential.
/note=SD (Final) Score: This start site has the best SD score, -3.254. /note=Gap/overlap: The 158 bp gap is somewhat large, but no other candidate start gives a better gap or a higher SD score and Z-value. /note=Location call: This is a real gene, and this appears to be the best start site. /note=Function call: The top two PhagesDB BLAST hits, both with low e-values, have no known function. None of the significant NCBI BLAST hits suggests a protein function with high query coverage, high identity, and an e-value of zero. There are no hits in CDD, and the only significant HHpred hit with a function is a glycosyltransferase. However, the evidence is not strong enough to assign this function, so I believe the function of this protein remains unknown. /note=Transmembrane domains: Because this gene has no known function, I am not sure whether the absence of TMDs is expected. However, because no TMDs were predicted, this gene cannot be called a membrane protein either. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real, and start site 100100 is the most likely choice based on Starterator, the final score, and the Z-score. CDS 100797 - 101081 /gene="197" /product="gp197" /function="hypothetical protein" /locus tag="PumpkinSpice_197" /note=Original Glimmer call @bp 100797 has strength 11.79; Genemark calls start at 100797 /note=SSC: 100797-101081 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 6.70517E-62 GAP: 256 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.865, -5.016377765385121, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675323,100.0,6.70517E-62 SIF-HHPRED: SIF-Syn: NKF (Pham 9113), the upstream gene belongs to Pham 56053, the downstream gene belongs to Pham 12335.
The same order of genes with corresponding Phams is observed in phage Battuta. /note=Primary Annotator Name: Andari, Maya /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 100797. /note=Phamerator: As of 5/13/21, this gene is in Pham 9113. It is conserved in some other members of Clusters BE and BK, with similar lengths. I compared my phage to two other phages, Battuta and Karimac. Phamerator lists no function for this gene, and no function frequency is reported for the genes in this pham. /note=Starterator: /note=Coding Potential: This gene has good coding potential within the putative ORF, and the chosen start site covers all of the coding potential. /note=SD (Final) Score: This start site does not have the best SD score (-5.016); the best SD score among the candidates is -4.669. /note=Gap/overlap: The 256 bp gap is large. No other candidate start gives a better gap, but one start site has a higher SD score and Z-value. /note=Location call: This gene appears to be real based on its coding potential, but I am hesitant to declare it a real gene yet. /note=Function call: The top two PhagesDB BLAST hits, both with low e-values, have no known function. None of the significant NCBI BLAST hits suggests a protein function with high query coverage, high identity, and an e-value of zero. There are no hits in CDD, and the only significant HHpred hit with a function is a DNA-directed RNA polymerase. However, the evidence is not strong enough to assign this function, so I believe the function of this protein remains unknown. /note=Transmembrane domains: Because this gene has no known function, I am not sure whether the absence of TMDs is expected. However, because no TMDs were predicted, this gene cannot be called a membrane protein either.
/note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Maya's call on the start site. Taking the evidence together, I think this gene is indeed real. It is conserved among members of its own cluster and of a different cluster, and it has the best SD score. It is also called by both Glimmer and GeneMark and has good coding potential. CDS 101081 - 101266 /gene="198" /product="gp198" /function="hypothetical protein" /locus tag="PumpkinSpice_198" /note=Genemark calls start at 101081 /note=SSC: 101081-101266 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_192 [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 1.07776E-38 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.057, -5.652760272842956, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_192 [Streptomyces phage Wipeout] ],,QGH74404,100.0,1.07776E-38 SIF-HHPRED: SIF-Syn: NKF (Pham 12335), the upstream gene belongs to Pham 9113, the downstream gene belongs to Pham 9125. The same order of genes with corresponding Phams is observed in phage TomSawyer. /note=Primary Annotator Name: Andari, Maya /note=Auto-annotation start source: Glimmer made no call; only GeneMark calls this gene, with a start site at 101081. /note=Phamerator: As of 5/13/21, this gene is in Pham 12335. It is conserved in some other members of Clusters BE and BK, with similar lengths. I compared my phage to two other phages, TomSawyer and Starbow. Phamerator lists no function for this gene, and no function frequency is reported for the genes in this pham. /note=Starterator: The conserved start site among members of the pham is site 9, which is 101081 in PumpkinSpice. This pham has 50 members, 12 of which are drafts.
(40/50 call site #9) /note=Coding Potential: This gene has reasonable coding potential within the putative ORF, and the chosen start site covers all of the coding potential. /note=SD (Final) Score: This start site has the best SD score, -5.653. /note=Gap/overlap: The -1 bp overlap is reasonable since no other candidate start gives a better gap, SD score, or Z-value. /note=Location call: This is a real gene, and this appears to be the best start site. /note=Function call: The top two PhagesDB BLAST hits, both with low e-values, have no known function. None of the significant NCBI BLAST hits suggests a protein function with high query coverage, high identity, and an e-value of zero. There are no hits in CDD, and the only significant HHpred hit with a function is an acetylglutamate kinase. However, the evidence is not strong enough to assign this function, so I believe the function of this protein remains unknown. /note=Transmembrane domains: Because this gene has no known function, I am not sure whether the absence of TMDs is expected. However, because no TMDs were predicted, this gene cannot be called a membrane protein either. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: I agree with Maya's call that this is a real gene and that the selected start site is the best one. The primary annotator still needs to confirm the function call!
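The GAP values recorded in the auto-annotation fields can be rechecked from the CDS coordinates alone. The sketch below is a hypothetical Python helper (not part of PECAAN or DNA Master) that assumes 1-based, inclusive, forward-strand coordinates: the gap is the next start minus the previous stop minus 1, so a negative value is an overlap. The tmRNA (gene 190) is left out, which reproduces the 311 bp gap listed for gp191. It also confirms that each ORF length is a whole number of codons.

```python
# Hypothetical check, not part of PECAAN: recompute GAP values from the
# CDS coordinates listed above (1-based, inclusive, forward strand).


def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases between an upstream stop and the next start (negative = overlap)."""
    return next_start - prev_stop - 1


# (start, stop) for consecutive CDS features; the tmRNA (gene 190) is omitted.
cds = {
    "gp188": (97187, 97744),
    "gp189": (97748, 98143),
    "gp191": (98455, 98769),
    "gp192": (98779, 99003),
    "gp193": (99055, 99459),
    "gp194": (99470, 99685),
    "gp195": (99696, 99941),
    "gp196": (100100, 100540),
    "gp197": (100797, 101081),
    "gp198": (101081, 101266),
}


def gaps(features: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Gap (or overlap) between each pair of adjacent features, in order."""
    names = list(features)
    return {
        f"{up}->{down}": gap_bp(features[up][1], features[down][0])
        for up, down in zip(names, names[1:])
    }


def orf_lengths(features: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Length of each ORF in bp, stop codon included."""
    return {name: stop - start + 1 for name, (start, stop) in features.items()}


if __name__ == "__main__":
    for pair, g in gaps(cds).items():
        print(f"{pair}: {g} bp")
    for name, length in orf_lengths(cds).items():
        # A complete ORF should be a whole number of codons.
        print(f"{name}: {length} bp, multiple of 3: {length % 3 == 0}")
```

The printed gaps (3, 311, 9, 51, 10, 10, 158, 256, and -1 bp) match the GAP fields recorded for gp189 and gp191 through gp198.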
CDS 101269 - 101532 /gene="199" /product="gp199" /function="hypothetical protein" /locus tag="PumpkinSpice_199" /note=Genemark calls start at 101269 /note=SSC: 101269-101532 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_200 [Streptomyces phage Birchlyn]],,NCBI, q1:s1 100.0% 1.80787E-56 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.878, -3.7437894285470263, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_200 [Streptomyces phage Birchlyn]],,QDF17329,98.8506,1.80787E-56 SIF-HHPRED: SIF-Syn: Pham 9125 (as of 5/26/2021), upstream is Pham 12335 with NKF (as of 5/26/2021), downstream is Pham 17831 with NKF (as of 5/26/2021), just like in phages Battuta, Karimac, and Starbow. /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Only GeneMark calls this gene, with a start site at 101269 bp. /note=Phamerator: Pham 9125 as of 4/21/2021. The gene is conserved in phages Battuta, Bordeaux, and Starbow, all belonging to the same subcluster (BE2) as PumpkinSpice. /note=Starterator: Start site 13 in Starterator was manually annotated in 19/53 non-draft genes in this pham. Start 13 is 101269 in PumpkinSpice. This evidence agrees with the site predicted by GeneMark. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.744. This is the best final score on PECAAN. /note=Gap/overlap: There is a 2 bp gap, which is reasonable because it is less than 30 bp and is the smallest gap among the candidate starts. This gene is also conserved in several other phages, such as Battuta, Bordeaux, and Starbow, and the same gap is seen in them. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 101269. Starterator data agrees with GeneMark. /note=Function call: The function is unknown.
The top PhagesDB BLAST hits have unknown functions (E-value < 5e-48), and the top NCBI BLAST hits are also hypothetical proteins or proteins of unknown function (100% coverage, E-value < 2e-56, and 98.85%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, so the evidence requirements to call it a membrane protein are not met. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 101529 - 101708 /gene="200" /product="gp200" /function="hypothetical protein" /locus tag="PumpkinSpice_200" /note=Original Glimmer call @bp 101529 has strength 3.28 /note=SSC: 101529-101708 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_200 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.58862E-36 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.466, -5.866826365354437, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_200 [Streptomyces phage Starbow] ],,AXH66668,100.0,1.58862E-36 SIF-HHPRED: SIF-Syn: Pham 17831 (as of 5/26/2021), upstream is Pham 9125 with NKF (as of 5/26/2021), downstream is an RNA ligase (Pham 4204 as of 5/26/2021), just like in phages Bordeaux, Mindflayer, and Starbow. /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Only Glimmer calls this gene, with a start site at 101529 bp. /note=Phamerator: Pham 17831 as of 5/1/2021. The gene is conserved in phages Mindflayer, Starbow, and TomSawyer, all belonging to the same subcluster (BE2) as PumpkinSpice. /note=Starterator: Start site 3 in Starterator was manually annotated in 12/16 non-draft genes in this pham. Start 3 is 101529 in PumpkinSpice. This evidence agrees with the site predicted only by Glimmer. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential.
/note=SD (Final) Score: The RBS score is less informative here because the -4 bp overlap suggests that the gene is part of an operon and is transcribed on a polycistronic mRNA. Therefore, the start site consistent with the operon was chosen over starts with better RBS scores. Note that this start site has a final score of -5.867, the third-best of the four candidates on PECAAN. /note=Gap/overlap: There is a -4 bp overlap, suggesting that the gene is part of an operon. This gene is also conserved in several other phages, and the same overlap is seen in phages such as Mindflayer, Starbow, and TomSawyer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 101529. Starterator data agrees with Glimmer. /note=Function call: The function is unknown. The top PhagesDB BLAST hits have unknown functions (E-value < 4e-32), and the top NCBI BLAST hits are also hypothetical proteins or proteins of unknown function (100% coverage, E-value < 8e-36, and 98%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, so the evidence requirements to call it a membrane protein are not met. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with the evidence presented here regarding gene realness and start site. CDS 101721 - 102797 /gene="201" /product="gp201" /function="RNA ligase" /locus tag="PumpkinSpice_201" /note=Original Glimmer call @bp 101721 has strength 13.46; Genemark calls start at 101721 /note=SSC: 101721-102797 CP: no SCS: both ST: SS BLAST-Start: [2'-5' RNA ligase [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.6814417326944517, no F: RNA ligase SIF-BLAST: ,,[2'-5' RNA ligase [Streptomyces sp.
JV178] ],,WP_099970917,99.7207,0.0 SIF-HHPRED: T4 RNA ligase 1; metal catalysis, covalent nucleotidyltransferase, lysyl-AMP, LIGASE; HET: ATP; 2.187A {Enterobacteria phage T4},,,5TT6_A,94.4134,100.0 SIF-Syn: RNA ligase, upstream gene belongs to pham 17831, downstream gene belongs to pham 29897, just like in phages Battuta and Wipeout. /note=Primary Annotator Name: Liu, Lily /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 101721 bp. /note=Phamerator: Pham 4204, date 5/16/2021. It is conserved in phages such as Battuta, Birchlyn, and BoomerJR. /note=Starterator: Start site 16 was manually annotated in 44/81 non-draft genes in this pham. Start site 16 is 101721 bp, which agrees with the site called by GeneMark and Glimmer. /note=Coding Potential: The self-trained GeneMark shows both typical and atypical coding potential, but the host-trained GeneMark does not show any coding potential. In both the self-trained and host-trained GeneMark plots, the ORF is in the third reading frame. (The chosen start site does not cover all of the typical coding potential in the self-trained GeneMark: that coding potential begins within the upstream gene and continues past its end through this gene. Nevertheless, this start site gives the longest ORF for this gene and covers the most coding potential.) /note=SD (Final) Score: The best final score is -2.681 and the best Z-score is 2.99; both correspond to the chosen start site. /note=Gap/overlap: There is a 12 bp gap, which is not too large, and this gap is also conserved among other phages (such as Battuta and Birchlyn). /note=Location call: Based on the above evidence, this is a real gene and most likely has a start site at 101721 bp. /note=Function call: This gene encodes an RNA ligase. The majority of PhagesDB hits list this protein as an RNA ligase, with the top two e-values both being 0.
The majority of NCBI BLAST hits also list this protein as an RNA ligase, with the top two e-values both being 0. All CDD hits identify this protein as an RNA ligase, with the two best e-values being 2.41e-22 and 7.84e-19. The majority of HHpred hits also list this protein as an RNA ligase, with the top two e-values being 1e-48 and 7.9e-25. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Lily's calls that this gene is real and with her chosen start site. I think that she provides compelling evidence for both of these calls! CDS 102787 - 103080 /gene="202" /product="gp202" /function="hypothetical protein" /locus tag="PumpkinSpice_202" /note=Original Glimmer call @bp 102874 has strength 4.29; Genemark calls start at 102874 /note=SSC: 102787-103080 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_203 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 5.18628E-65 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.663, -4.59030133454269, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_203 [Streptomyces phage Birchlyn] ],,QDF17332,100.0,5.18628E-65 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 102874. /note=Phamerator: Pham 29897. The gene is also conserved in the two other phages used for comparison, Battuta and BoomerJR. The pham report was run on 5/21/21. /note=Starterator: The Glimmer/GeneMark start 102874 corresponds to Starterator start site 4, but this site has only 1 manual annotation and is called only 20% of the time when present. This suggests that 102787 is the actual start site, because it has 14 manual annotations and is called 76.2% of the time when present.
/note=Coding Potential: The coding potential is on the forward strand. The self-trained GeneMark shows clear coding potential (the red and black lines) near the start site, and the host-trained GeneMark also shows coding potential in the start-site region. /note=SD (Final) Score: The chosen start site has a final score of -4.590 and a Z-score of 2.663. Although another start site has a better final score, the Starterator analysis suggests that 102787 is the better choice because it has more manual annotations. /note=Gap/overlap: There is an 11 bp overlap, but related phages such as Battuta and BoomerJR show the same overlap, so it is not a concern. /note=Location call: From the evidence above, I believe that the start site of this gene is 102787. /note=Function call: The top three BLAST hits are hypothetical proteins with no known function. Because of this, I believe that this gene also has no known function. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs, so this protein cannot be called a transmembrane protein. /note=Secondary Annotator Name: Lopez, Erick Alberto /note=Secondary Annotator QC: I agree with James's call that this is a real gene with the correct start site. Additionally, the evidence presented shows that this gene has no known function, and the lack of TMDs indicates that this protein is not a transmembrane protein.
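The GAP values in the records above (for example the -4 bp operon overlap before gp200, the 12 bp gap before gp201, and the 11 bp overlap before gp202) follow directly from the CDS coordinates. A minimal sketch in Python, assuming both genes are on the forward strand (as every CDS in this stretch is) and that coordinates are 1-based and inclusive as written in the CDS lines; the function name gap_between is hypothetical and not part of PECAAN or DNA Master:

```python
# Hypothetical helper: gap (positive) or overlap (negative) between two
# adjacent forward-strand genes, with 1-based inclusive coordinates.

def gap_between(upstream_end: int, downstream_start: int) -> int:
    """Number of bases strictly between the genes; negative means overlap."""
    return downstream_start - upstream_end - 1


# (start, end) copied from the CDS lines for gp200-gp202 above.
genes = {
    "200": (101529, 101708),
    "201": (101721, 102797),
    "202": (102787, 103080),
}

if __name__ == "__main__":
    order = ["200", "201", "202"]
    for up, down in zip(order, order[1:]):
        g = gap_between(genes[up][1], genes[down][0])
        kind = "gap" if g >= 0 else "overlap"
        print(f"gp{up} -> gp{down}: {g} bp ({kind})")
```

Run as a script, it reports a 12 bp gap between gp200 and gp201 and -11 bp (an 11 bp overlap) between gp201 and gp202, matching the GAP fields. Reverse-strand (complement) genes would need their coordinates handled in the opposite order.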
CDS 103148 - 103336 /gene="203" /product="gp203" /function="hypothetical protein" /locus tag="PumpkinSpice_203" /note=Original Glimmer call @bp 103148 has strength 3.72 /note=SSC: 103148-103336 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein HWB80_gp112 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 1.65671E-36 GAP: 67 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.466, -6.694195638408262, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp112 [Streptomyces phage Karimac] ],,YP_009840340,100.0,1.65671E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, James Joon /note=Auto-annotation start source: Glimmer calls the start site at 103148, whereas GeneMark does not call a start site for this gene. /note=Phamerator: The gene is in pham 1122 (run on 5/21/21). The two phages used for comparison were Battuta and Birchlyn, and synteny was observed in the conserved region shared by these phages. /note=Starterator: Start 5 is the most common start site in published annotations and corresponds to 103148 in PumpkinSpice. It is called 100% of the time when present and has 14 manual annotations, which is strong support. /note=Coding Potential: This gene is on the forward strand, and the self-trained GeneMark shows good evidence of coding (the red and black lines). However, the host-trained GeneMark does not show sufficient coding potential. /note=SD (Final) Score: The final score for this start site is -6.694, with a Z-score of 1.466. Although another start site has a better Z-score, the Starterator analysis gives us confidence that the start site is 103148.
/note=Gap/overlap: There is a 67 bp gap, which is unusually large. However, a similar gap is seen near the corresponding genes in Battuta and Birchlyn, so this gap is expected and should not be a concern. /note=Location call: With the evidence above, the best start site for this gene is 103148. /note=Function call: The top 3 BLAST hits are hypothetical proteins, which suggests that this gene does not yet have a known function. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs, so this protein cannot be called a transmembrane protein. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 103260 - 103421 /gene="204" /product="gp204" /function="hypothetical protein" /locus tag="PumpkinSpice_204" /note=Genemark calls start at 103260 /note=SSC: 103260-103421 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_MINDFLAYER_199 [Streptomyces phage MindFlayer] ],,NCBI, q1:s1 100.0% 1.2004E-30 GAP: -77 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.829, -3.4668782168002776, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MINDFLAYER_199 [Streptomyces phage MindFlayer] ],,QPL13802,100.0,1.2004E-30 SIF-HHPRED: SIF-Syn: Unknown function, upstream gene is in pham 1122, downstream is in pham 7023, just like in phage Battuta. /note=AF: Weird gene, huge overlap, but CP clear for both genes & many other phages also call this one. /note=Auto-annotation start source: Glimmer does not call this gene, but GeneMark calls the start site at 103260 bp. /note=Phamerator: Pham 56617. Date: 05/23/2021. This pham is conserved in other BE cluster phages (Battuta and MindFlayer used for comparison). Only one phage has a function called for this pham: DNA helicase.
However, no function is defined for this pham in most BE cluster phages in the phams database. /note=Starterator: Start site 3 is conserved in 8/14 non-draft phage annotations. The site corresponds to 103260 bp in PumpkinSpice. /note=Coding Potential: Coding potential in the ORF is on the forward strand, and it is all covered by the chosen start site. Coding potential is good in GeneMark Self but not sufficient in GeneMark Host. /note=SD (Final) Score: -3.467. This is the best final score on PECAAN, and its Z-score (2.829) is also the best of all the start site candidates. /note=Gap/overlap: 77 bp overlap. This is a very large overlap, but it is conserved in one other phage, Battuta. The gene itself is conserved in other phages such as MindFlayer and Daubensuki, but this large overlap is seen only in PumpkinSpice and Battuta. The overlap appears to coincide with the coding potential of the upstream gene. /note=Location call: Based on the information above, this is a real gene. The predicted start site is 103260. Starterator agrees with GeneMark. /note=Function call: No known function. The top 3 PhagesDB BLAST hits are of unknown function (E-value < 1e-8). The top 4 NCBI BLASTp hits are hypothetical proteins of unknown function, with high query coverage (90-100%), high percent identity (minimum 43%, above the 35% cutoff), and low E-values (between 1e-30 and 9e-08). There are no relevant hits from CDD. HHpred has two best hits with >95% probability, but only 18-35% coverage and e-values of about 0.01. Despite the high probability, the coverage and e-values are insufficient evidence, and primary literature describes the function of these hits in eukaryotes. So, there is no known function based on this information. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs.
Therefore, it cannot be identified as a transmembrane protein based on this information. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with the evidence presented for gene realness and start site. Though the overlap is considerably large, there are no better start site options. CDS 103451 - 103606 /gene="205" /product="gp205" /function="membrane protein" /locus tag="PumpkinSpice_205" /note=Original Glimmer call @bp 103448 has strength 7.06; Genemark calls start at 103421 /note=SSC: 103451-103606 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_204 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 4.96956E-28 GAP: 29 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.069, -6.134477077463115, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_204 [Streptomyces phage Starbow] ],,AXH66672,100.0,4.96956E-28 SIF-HHPRED: SIF-Syn: Unknown function, upstream gene is in pham 56617, downstream is polynucleotide kinase, just like in phage Battuta. /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, but at different start sites: Glimmer calls the start at 103448 bp, and GeneMark calls it at 103421 bp. /note=Phamerator: Pham 7023. Date: 05/23/2021. This pham is conserved in other BE cluster phages (Battuta and TomSawyer used for comparison). No function is defined for this pham across BE cluster phages in the phams database. /note=Starterator: Start site 5 is conserved in 2/10 non-draft phage annotations and corresponds to 103448 bp in PumpkinSpice. Although start site 6 is more conserved (called in 6/10 non-draft genes), PumpkinSpice is represented on the Starterator track with confirmed start site 5 rather than start site 6. /note=Coding Potential: Coding potential in the ORF is on the forward strand, and it is all covered by the chosen start site. Coding potential is good in GeneMark Self but not sufficient in GeneMark Host.
/note=SD (Final) Score: -5.436. This is not the best final score, but it is reasonable because it is close to the scores of some of the other called start sites. /note=Gap/overlap: 26 bp gap. This is a reasonable gap size. The start site does not give the longest ORF, but this gap is conserved in phage Battuta. /note=Location call: Based on the information above, this is a real gene. The predicted start site is 103448. Although it is not the most manually annotated start site, Starterator's start site 5 agrees with Glimmer. (Note: the call was later edited to 103451, which is the most manually annotated start site and shares synteny with phage Battuta.) /note=Function call: No known function. The top 4 PhagesDB BLAST hits have the function "function unknown" (E-value = 1e-24). The top 4 NCBI BLASTp hits are hypothetical proteins of unknown function, with high query coverage (100%), high percent identity (>96%), and low E-values (between 5e-28 and 5e-27). There are no relevant hits from CDD, and the best HHpred hit is insufficient, with 89.4% probability, 72.549% coverage, and an e-value of 2.3. Despite the good probability value, the e-value is far too high to support a function call. So, there is no known function based on this information. /note=Transmembrane domains: TMHMM and TOPCONS predict one TMD, with 4/6 TOPCONS programs calling it. Therefore, this gene can be identified as a "transmembrane protein." /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: All the evidence has been considered carefully, but I would choose start 6 instead, which corresponds to 103451 in PumpkinSpice. The gap is conserved in Battuta, which is the only phage showing synteny with PumpkinSpice in this genome location. Start site 6 has 6/10 manual annotations (MAs), including Battuta. The final score is not the best, but the gene forms an operon with the downstream gene, so this start site is acceptable.
The upstream gene belongs to pham 56617, and although this pham is found in other BE phages, it is found upstream of this gene only in Battuta and MindFlayer. In Battuta the gap is 30 bp (as for start site 6 in PumpkinSpice), and in MindFlayer it is 0 bp (as for start site 3 in PumpkinSpice). Since start site 3 is rarely called but start site 6 is frequently called (Starterator), the most likely start site for this gene in PumpkinSpice is 6, which corresponds to 103451. This start site was not called by Glimmer or GeneMark, but it does cover all the coding potential. CDS 103606 - 104496 /gene="206" /product="gp206" /function="polynucleotide kinase" /locus tag="PumpkinSpice_206" /note=Original Glimmer call @bp 103606 has strength 12.38; Genemark calls start at 103606 /note=SSC: 103606-104496 CP: yes SCS: both ST: SS BLAST-Start: [polynucleotide kinase [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.969, -4.25176284984289, yes F: polynucleotide kinase SIF-BLAST: ,,[polynucleotide kinase [Streptomyces phage Starbow] ],,AXH66673,99.6622,0.0 SIF-HHPRED: POLYNUCLEOTIDE KINASE; KINASE, PHOSPHATASE, ALPHA/BETA, P-LOOP, TRANSFERASE; HET: ADP, MSE; 2.33A {Enterobacteria phage T4} SCOP: c.108.1.9, c.37.1.1,,,1LTQ_A,99.6622,100.0 SIF-Syn: Polynucleotide kinase; the upstream gene is in pham 7023 and the downstream gene is in pham 64703, just like in phages Battuta, Birchlyn, Bordeaux, and IchabodCrane. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 103606. /note=Phamerator: Pham 60896. Date: 05/21/2021. It is conserved; found in Battuta_206, Birchlyn_207, BoomerJR_204. /note=Starterator: Start site 49 in Starterator was manually annotated in 40/275 non-draft genes in this pham. Start 49 is 103606 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark.
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. All the coding potential is covered by the chosen start site. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: -4.252. It is the second-best final score on PECAAN. This score is not very meaningful here, because the overlap leaves no room for a reasonable ribosome binding site. /note=Gap/overlap: 1 bp overlap. The overlap is very small, and no gene could fit in that region. This overlap is also conserved in Karimac, Battuta, and Bordeaux. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 103606. /note=Function call: Polynucleotide kinase. The top 3 PhagesDB BLAST hits have the function polynucleotide kinase (E-value < 1e-172), and the top 3 NCBI BLAST hits have the same function, with 100% coverage, 96%+ identity, and E-values of 0. Both CDD and HHpred identify this gene as a polynucleotide kinase: CDD with e-values < 4.23e-10, and HHpred with 99.5% probability, more than 45% coverage, and e-values < 1.2e-11. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: BANG, CLARA EUHEUN /note=Secondary Annotator QC: I agree with Yennifer's calls. I think this gene is real, and I also agree with her start site of 103606. She provides ample evidence that makes her calls convincing.
Please remember to complete the Starterator menu drop-down. CDS 104537 - 104770 /gene="207" /product="gp207" /function="hypothetical protein" /locus tag="PumpkinSpice_207" /note=Original Glimmer call @bp 104537 has strength 9.83; Genemark calls start at 104537 /note=SSC: 104537-104770 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_208 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 4.87306E-51 GAP: 40 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.987, -5.225598853956516, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_208 [Streptomyces phage Birchlyn] ],,QDF17336,100.0,4.87306E-51 SIF-HHPRED: SIF-Syn: Gene in pham 60505; the upstream gene is in pham 63888 (polynucleotide kinase) and the downstream gene is in pham 15216, just like in phages Battuta, Birchlyn, Bordeaux, and Karimac. /note=Primary Annotator Name: Delgado, Yennifer /note=Auto-annotation start source: Glimmer and GeneMark. Both called the start at 104537. /note=Phamerator: Pham 60505. Date: 05/23/2021. It is conserved; found in Battuta_207, Birchlyn_208, and Bordeaux_207. /note=Starterator: Start site 5 in Starterator was manually annotated in 13/21 non-draft genes in this pham. Start 5 is 104537 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. All the coding potential is covered by the chosen start site. Coding potential is found in GeneMark Self but not in GeneMark Host. /note=SD (Final) Score: -5.226. It is the second-best final score on PECAAN. /note=Gap/overlap: 40 bp gap. The gap is somewhat large but reasonable, because it is conserved in other phages (Battuta, IchabodCrane, and Birchlyn) and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 104537.
/note=Function call: NKF. Neither CDD nor HHpred shows any significant hits for this gene. The top two PhagesDB BLAST hits have unknown function (E-value = 2e-41), and the top 3 NCBI BLAST hits are also of unknown function (hypothetical proteins), with 100% coverage, 98%+ identity, and E-values < 5.0e-50. Thus, the function of this gene is unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Torres, Canela /note=Secondary Annotator QC: Based on the above information (Glimmer and GeneMark agreement, good coding potential, previous annotations in Starterator, and synteny with final phage genomes), I agree with the primary annotator's location call. CDS 104812 - 105186 /gene="208" /product="gp208" /function="hypothetical protein" /locus tag="PumpkinSpice_208" /note=Original Glimmer call @bp 104812 has strength 17.5; Genemark calls start at 104812 /note=SSC: 104812-105186 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_207 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 4.34982E-80 GAP: 41 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.353, -4.461652798070151, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_207 [Streptomyces phage Starbow] ],,AXH66675,99.1936,4.34982E-80 SIF-HHPRED: SIF-Syn: [NKF; the upstream and downstream genes immediately surrounding this gene are currently of unknown function (phams 64703 upstream and 17655 downstream as of 05/21/2021), matching the gene order in phages TomSawyer and Karimac] /note=Primary Annotator Name: Bruns, James /note=Auto-annotation start source: Both Glimmer and GeneMark call the start at 104812. /note=Phamerator: As of 05/16/2021, this gene belongs to pham 63472 and is present in non-draft phages Karimac and LukeCage, which also belong to cluster BE.
/note=Starterator: Start 6, confirmed in 19 of 32 non-draft genes, corresponds to 104812 bp in PumpkinSpice. Both GeneMark and Glimmer call this start, in agreement with the most annotated start in Starterator. /note=Coding Potential: Coding potential is present in both the host-trained and self-trained GeneMark analyses, with atypical coding potential also present in the self-trained GeneMark. The gene predicted by the self-trained GeneMark spans the whole gene, including the predicted start site. The gene is in the forward orientation. /note=SD (Final) Score: -4.462, the best (least negative) final score on PECAAN. /note=Gap/overlap: 41 bp gap; this start gives the longest reasonable ORF for this gene. Comparison with syntenic phages (Wipeout and MindFlayer) supports this gap. /note=Location call: Based on the data above, this is very likely a real gene with a start at 104812 bp. /note=Function call: PhagesDB BLASTp shows no functions in over ten final draft hits with E-values of 120 bp). The gaps found in other phages with a similar gene, including Battuta (BE2) and BoomerJR (BE2), are noticeably smaller. The size of this gap thus does not appear to be reasonable. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 105985. This proposed start site agrees with the evidence from GeneMark and Starterator, gives a higher final score than the original call, and results in a smaller gap than the original call. /note=Function call: SprT-like protease. The top PhagesDB BLAST hits (e-value of 1e-86) give a function of SprT-like protease. The top NCBI BLAST hits (100% coverage, ~90%+ identity, E-value < 1e-104) also give a function of SprT-like protease. HHpred has a significant hit (99.8% probability, 96.6% coverage, E-value 2.4e-19) for a SprT-like domain-containing protein.
Finally, CDD gives two significant hits (E-values 8.2e-9 and 4e-04) for proteins of the SprT-like family/SprT homologs. /note=Transmembrane domains: No transmembrane domains are predicted by either TMHMM or TOPCONS, suggesting that this is not a membrane protein. This is unsurprising, as similar proteins have been shown in previous literature to be involved in DNA repair (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5127644/). It is thus reasonable to expect that this protein would not interact with the membrane. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: Based on the evidence presented above, I agree that this gene is real and with its start site. Additionally, this function call is supported by multiple BLAST hits, so it is the correct function call. Great job Loren! CDS 106415 - 106603 /gene="215" /product="gp215" /function="hypothetical protein" /locus tag="PumpkinSpice_215" /note=Original Glimmer call @bp 106415 has strength 7.48; Genemark calls start at 106415 /note=SSC: 106415-106603 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp103 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 9.62654E-38 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.676, -5.429692635624113, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp103 [Streptomyces phage Karimac] ],,YP_009840349,100.0,9.62654E-38 SIF-HHPRED: SIF-Syn: Pham 1138 is also found in other BE2 phages such as MindFlayer and Starbow. In all of these phages, pham 1138 is adjacent to pham 65280 upstream and pham 12626 downstream. Pham designations as of 05/28/2021. /note=Primary Annotator Name: Chang, Loren /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site at 106415. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating that this is a forward gene.
Coding potential is found only in GeneMarkS, not in GeneMark Host. All of the coding potential is covered by this ORF. /note=SD (Final) Score: The final score for the initial call is -5.430, the second-highest final score on PECAAN. /note=Gap/overlap: 8 bp overlap. This should be a legitimate overlap: it is less than 30 bp and is conserved in other BE2 phages such as Birchlyn and Battuta. /note=Phamerator: Pham 1138 (05/23/2021). It is conserved in other BE2 phages such as Birchlyn and Battuta. /note=Starterator: Start site 11 was manually annotated in 17 of 24 non-draft genes in this pham. Start site 11 is position 106415. This agrees with the predictions made by both Glimmer and GeneMark. /note=Location call: Given the above evidence, this is a real gene, and the most probable start site is 106415. /note=Function call: No known function. The top PhagesDB hits (E-value 2e-31) have unknown functions. The top NCBI BLAST hits (100% coverage, 50%+ identity, E-values < 1.54e-18) also all have unknown functions. Neither HHpred nor CDD gave any significant hits. Thus, there is no known function at this time. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, suggesting that this protein is not a membrane protein. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 106605 - 106805 /gene="216" /product="gp216" /function="membrane protein" /locus tag="PumpkinSpice_216" /note=Original Glimmer call @bp 106605 has strength 10.66; Genemark calls start at 106605 /note=SSC: 106605-106805 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.55013E-39 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.388, -4.2111970702621555, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp.
JV178] ],,WP_143675348,100.0,7.55013E-39 SIF-HHPRED: SIF-Syn: There is synteny between PumpkinSpice and Karimac: this gene (pham 12626) is syntenic with a gene in the Karimac genome, as are the upstream (pham 1138) and downstream (pham 10788) genes. All of them are in the same location and order. /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 106605. /note=Phamerator: Pham 12626. This analysis was run on 05/07/21. It is conserved; found in Battuta (BE) and Bmoc (BE). /note=Starterator: Start site 14 in Starterator was manually annotated in 30 of the 32 non-draft genes in this pham. Start 14 is 106605 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential in the self-trained output only, and the chosen start site includes all of the coding potential. The host-trained output does not show any coding potential. /note=SD (Final) Score: -4.211. It is the best final score on PECAAN, and the start site gives the longest ORF (LORF). /note=Gap/overlap: 1 bp gap. This very small gap suggests that this gene may be part of an operon; the gap is conserved in phages Bordeaux and MindFlayer, and there is no coding potential in the gap that might indicate a new gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 106605. /note=Function call: NKF. PhagesDB BLAST hits show no known function for this gene, with good e-values (< 10^-32). NCBI BLAST provides only hypothetical protein hits (identity 99%+, coverage 100%, e-value < 10^-38). All of the HHpred hits have high, non-significant e-values, and some of those hits have functions described in mammals. CDD does not provide any information about this gene. /note=Transmembrane domains: TMHMM predicts just one TMD. TOPCONS also predicts one TMD.
Based on this evidence, this gene can be assumed to have a real TMD and is therefore a “membrane protein”, since no consistent function has been assigned by PhagesDB, NCBI, or HHpred. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: The evidence here is very strong for gene presence and start site. I agree with Malika. CDS 106828 - 107067 /gene="217" /product="gp217" /function="hypothetical protein" /locus tag="PumpkinSpice_217" /note=Original Glimmer call @bp 106828 has strength 10.08; Genemark calls start at 106828 /note=SSC: 106828-107067 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_216 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 5.24305E-52 GAP: 22 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.314, -4.543412470436616, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_216 [Streptomyces phage Starbow] ],,AXH66682,100.0,5.24305E-52 SIF-HHPRED: SIF-Syn: NKF; pham 10788 is found at the same location and in the same order in PumpkinSpice, Karimac, and Bordeaux. The upstream (pham 12626) and downstream (pham 12989) genes also show synteny with phages Karimac and Bordeaux. /note=Primary Annotator Name: Jakupova, Malika /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 106828. /note=Phamerator: Pham 10788; this analysis was run on 05/07/21. It is found in BE cluster phages Battuta and Birchlyn. /note=Starterator: Start site 17 was found in 21 of 34 (61.8%) genes in the pham and was manually annotated in 10 of 28 genes; start site 17 corresponds to 106828 in PumpkinSpice. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found only in GeneMark Self; GeneMark Host does not show any coding potential. /note=SD (Final) Score: -4.543. It is the best final score on PECAAN.
/note=Gap/overlap: The gap is 22 bp, which is typical, and it is conserved in several other phages such as Karimac and MindFlayer. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 106828 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. PhagesDB BLAST provides evidence that this gene has no known function (e-values < 10^-42). NCBI BLAST hits are only hypothetical proteins (97%+ identity, 100% coverage, e-values < 10^-52). HHpred does not provide any informative hits about this gene`s function: most hits have either high e-values or describe functions of mammalian proteins. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real, and start site 106828 is the most likely choice based on Starterator, Z-score, and final score. tRNA 107088 - 107162 /gene="218" /product="tRNA-Leu(caa)" /locus tag="PUMPKINSPICE_218" /note=tRNA-Leu(caa) CDS 107163 - 107390 /gene="219" /product="gp219" /function="hypothetical protein" /locus tag="PumpkinSpice_219" /note=Original Glimmer call @bp 107163 has strength 7.61; Genemark calls start at 107163 /note=SSC: 107163-107390 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.34063E-47 GAP: 95 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.34, -4.789449032486301, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675351,100.0,1.34063E-47 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is pham 10788 and the downstream gene is pham 62958, just like in phages MindFlayer, Bordeaux, and Karimac.
/note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: Glimmer and GeneMark call the start site at 107163. /note=Phamerator: Pham 12989. No function. Conserved in LukeCage, Karimac, and MindFlayer. Date: 5/27/2021. /note=Starterator: Start site 9, position 107,163 in PumpkinSpice. This is the most annotated start site (manually annotated in 32 of 32 non-draft genes in the pham). /note=Coding Potential: Very good typical and atypical forward coding potential in GeneMarkS, all of which is covered by the auto-annotated start site. /note=SD (Final) Score: For the auto-annotated start site, the RBS is -4.789 and the z-score is 2.34. These are not the best possible scores. Site 107238 has the best RBS (-4.235) and z-score (2.643); however, this start site results in an unreasonably large gap (170 bp). /note=Gap/overlap: 41 bp. This is the smallest possible gap, and it is conserved in other final genomes, such as phages LukeCage, Karimac, and MindFlayer. /note=Location call: This gene is real with a start site of 107,163. This start site is called by Glimmer and GeneMark, it is the most annotated site on Starterator, and it contains all of the coding potential. /note=Function call: NKF. All strong PhagesDB and NCBI BLAST hits have unknown functions, as do all strong CDD hits. There are no strong HHpred hits. /note=Transmembrane domains: No transmembrane domains are predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Armin`s calls that the gene is real and that the start site is 107163. Overall, Armin provides compelling evidence above and nothing else needs to be added!
tRNA 107397 - 107488 /gene="220" /product="tRNA-Ser(gct)" /locus tag="PUMPKINSPICE_220" /note=tRNA-Ser(gct) CDS 107568 - 107708 /gene="221" /product="gp221" /function="hypothetical protein" /locus tag="PumpkinSpice_221" /note=Original Glimmer call @bp 107568 has strength 5.35; Genemark calls start at 107568 /note=SSC: 107568-107708 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp099 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.8295E-25 GAP: 177 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.165, -4.4057176022952405, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp099 [Streptomyces phage Karimac] ],,YP_009840353,100.0,2.8295E-25 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is pham 12989 and the downstream gene is pham 18569, just like in phages Karimac, MindFlayer, and Birchlyn. /note=Primary Annotator Name: Taheri, Armin /note=Auto-annotation start source: Glimmer and GeneMark call the start site at 107,568. /note=Phamerator: Pham 59901. Conserved in phages Bordeaux, Karimac, and LukeCage. No function. Date: 5/27/2021. This pham is different from the one listed on PECAAN (Pham 62958) and does not have a Starterator report yet. /note=Starterator: (Based on the report for pham 62958) Start site 15, position 107,568 in PumpkinSpice. This is the most annotated start site (manually annotated in 17 of 31 non-draft genes in the pham). /note=Coding Potential: Good typical and atypical coding potential that covers the entire ORF. The typical coding potential is contained by the auto-annotated start, but the atypical coding potential extends to the stop site of the upstream gene. /note=SD (Final) Score: For the auto-annotated start site, the RBS is -4.406 and the z-score is 2.165. These are not the best possible scores. Site 107,505 has the best RBS (-3.422) and z-score (2.636). /note=Gap/overlap: 177 bp. Start site 107,505 would result in a longer ORF (204 bp) and a smaller gap (114 bp), but the auto-annotated gap is conserved in phages Starbow, Karimac, and Birchlyn.
/note=Location call: This gene is real, with a start site of 107,568. This start site is called by Glimmer and GeneMark, and it is the most manually annotated site on Starterator. /note=Function call: NKF. All strong PhagesDB and NCBI BLAST hits have unknown function. There are no significant HHpred hits and no CDD hits. /note=Transmembrane domains: No transmembrane domains are predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: Really great notes. I agree with the legitimacy and start site of this gene. After reviewing the BLAST evidence, I also agree with the function call for this gene. CDS 107725 - 107892 /gene="222" /product="gp222" /function="hypothetical protein" /locus tag="PumpkinSpice_222" /note=Original Glimmer call @bp 107683 has strength 10.61; Genemark calls start at 107725 /note=SSC: 107725-107892 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein HWB77_gp100 [Streptomyces phage StarPlatinum] ],,NCBI, q1:s1 100.0% 4.12472E-30 GAP: 16 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.257, -4.27608723048325, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB77_gp100 [Streptomyces phage StarPlatinum] ],,YP_009839626,100.0,4.12472E-30 SIF-HHPRED: SIF-Syn: NKF gene in pham 18569; the upstream gene is in pham 62958 and the downstream gene is in pham 16100, just like in Karimac and Bordeaux. /note=Primary Annotator Name: Namaganda, Samali /note=Auto-annotation start source: Glimmer and GeneMark call different start sites: Glimmer calls 107683 and GeneMark calls 107725. Starterator and Pham maps indicate that the GeneMark call is correct, because its gap is conserved in other BE2 phages and it is the most annotated start site in other BE2 phages. /note=Phamerator: As of 5/23/21, the gene belongs to pham 18569. It has 63 members, 14 of which are drafts. This pham is conserved in other BE cluster phages such as TomSawyer, Wipeout, and Yaboi. /note=Starterator: Analysis was run on 05/07/21.
Starterator points to 107725 as the start site. It is the most annotated start site in the pham (manually annotated in 17 of 49 non-draft genes) and also the most annotated start site in subcluster BE2. /note=Coding Potential: The gene has both atypical and typical coding potential in the forward direction in the self-trained GeneMark output. It does not have any coding potential in the host-trained GeneMark output. /note=SD (Final) Score: Even though the Glimmer start site call has the best RBS score (-3.418), it results in a 26 bp overlap that is not conserved across the other BE2 phages. The GeneMark call has only the fourth-best RBS score (-4.276) and the fourth-best Z-score (2.257), but it results in a gap that is conserved in other BE2 phages and is the most annotated start site in Starterator. /note=Gap/overlap: There is a 16 bp gap that is conserved in other BE2 phages and does not leave out any area of coding potential. /note=Location call: Based on the above evidence, this is a real gene with GeneMark start site 107725. Start site 107725 was also the most manually annotated in subcluster BE2. /note=Function call: There is not yet enough data to hypothesize a function for this gene. None of the significant hits (e-values < 1e-20) from PhagesDB BLASTp or the NCBI hit gene products had a function. As of 5/16/21, there were no significant hits in the HHpred and CDD databases. /note=Transmembrane domains: As of 5/13/21, TMHMM calls no TMHs. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I disagree with this annotation and would argue that start site 107683, predicted by Glimmer, should be selected. The Starterator report shows a comparable number of MAs for the two sites (13 vs 17), and on the track where PumpkinSpice lies (Track 12), start site 11 is significantly farther up the lane than start 16.
Also, looking at the self-trained coding potential, start site 16 (107725) would not actually cover the entire coding potential region. /note=Note: There was not enough compelling evidence for either start site. After consulting with the professor, we chose start site 16 because it was the most manually annotated in BE2 and its gap is conserved in other BE2 phages. CDS 107889 - 108272 /gene="223" /product="gp223" /function="hypothetical protein" /locus tag="PumpkinSpice_223" /note=Original Glimmer call @bp 107889 has strength 13.51; Genemark calls start at 107889 /note=SSC: 107889-108272 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.4731E-88 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -2.6623718267059564, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970925,100.0,3.4731E-88 SIF-HHPRED: SIF-Syn: NKF gene in pham 16100; the upstream gene is in pham 18569 and the downstream gene is in pham 14729, just like in Karimac and Bordeaux. /note=Primary Annotator Name: Namaganda, Samali /note=Auto-annotation start source: Glimmer and GeneMark both call 107889 as the start site. /note=Phamerator: As of 5/13/21, the gene is part of pham 16100. This pham is conserved and found in other BE2 phages such as Battuta, TomSawyer, and Yaboi. The pham has 48 members, 8 of which are drafts. /note=Starterator: The analysis was run on 05/07/20. The most annotated start site is 107,889 (start 2), manually annotated in 28/33 genes. /note=Coding Potential: The gene has both typical and atypical coding potential in the forward direction in the self-trained GeneMark output. There is no coding potential in the host-trained GeneMark output. /note=SD (Final) Score: The current start site has the best RBS score (-2.662) and the highest Z-score (2.999). /note=Gap/overlap: There is a 4 bp overlap that is conserved in other BE2 phages, which suggests that the gene might be part of an operon.
/note=Location call: Based on the data above, the gene is real and the start site is 107,889. Starterator agrees with Glimmer and GeneMark. /note=Function call: There is not yet enough data to hypothesize a function for this gene. None of the significant hits (e-values < 1e-20) from PhagesDB BLASTp or the NCBI hit gene products had a function. As of 5/14/21, there were no significant hits in the HHpred and CDD databases. /note=Transmembrane domains: As of 3/24/21, TMHMM calls no TMHs. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree. The evidence for gene realness and start site is sufficient and clearly presented. CDS 108275 - 108538 /gene="224" /product="gp224" /function="hypothetical protein" /locus tag="PumpkinSpice_224" /note=Original Glimmer call @bp 108275 has strength 11.03; Genemark calls start at 108275 /note=SSC: 108275-108538 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_220 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s3 100.0% 4.60934E-57 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.056, -5.383507109175471, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_220 [Streptomyces phage IchabodCrane] ],,QFP97498,97.7528,4.60934E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ali Pour, Paria /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, and they agree on the start site at 108,275 bp. /note=Phamerator: Pham 14729. Date 04/26/2021. The gene is conserved in phages Battuta, Birchlyn, BoomerJR, Bordeaux, Enygma, Genie2, IchabodCrane, JimJam, Karimac, LukeCage, MindFlayer, Quaran19, SaltySpitoon, Starbow, StarPlatinum, Tomas, TomSawyer, Wipeout, Wofford, and Yaboi, which are all in the same cluster (BE) as PumpkinSpice. /note=Starterator: Start site 4 in Starterator was manually annotated in 13/15 non-draft genes in this pham. Start 4 is 108275 in PumpkinSpice.
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential for GeneMark Self but not host in the second frame of the forward strand. /note=SD (Final) Score: -5.384. It is the best final score on PECAAN. /note=Gap/overlap: 2 bp. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 108,275 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 108538 - 108792 /gene="225" /product="gp225" /function="hypothetical protein" /locus tag="PumpkinSpice_225" /note=Original Glimmer call @bp 108538 has strength 11.65; Genemark calls start at 108538 /note=SSC: 108538-108792 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 4.78457E-52 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.479, -3.8116689268127657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970926,100.0,4.78457E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ali Pour, Paria /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, and they agree on the start site at 108,538 bp. /note=Phamerator: Pham 16589. Date 04/26/2021. The gene is conserved in phages Batholomune, Battuta, Birchlyn, Bmoc, BoomerJR, Bordeaux, Braelyn, Cross, Daubenski, EGole, Enygma, Evy, Genie2, IchabodCrane, Jay2Jay, JimJam, Karimac, LilMartin, LukeCage, Mildred21, MindFlayer, MulchMansion, NootNoot, Paradiddles, Peebs, Quaran19, SaltySpitoon, Samisti12, Starbow, Sushi23, Targaryen, Teutsch, Tomas, TomSawyer, Tribute, Warpy, Wipeout, Wofford, and Yaboi, which are all in the same cluster (BE) as PumpkinSpice. /note=Starterator: Start 4 is the start called most often in the published annotations; it was called in 31 of the 32 non-draft genes in the pham.
Start 4 corresponds to 108,694 bp in PumpkinSpice; however, this start site is not the best for PumpkinSpice because it leads to a 155 bp gap and the worst z-score and final score of the six possible start sites. Start site 2 was called for this gene in another non-draft phage from the same subcluster as PumpkinSpice (Birchlyn_226). Glimmer, GeneMark, and the rest of the evidence support start site 2 at 108,538 bp as the start site for this gene. /note=Coding Potential: Coding potential for GeneMark Self but not host in the first frame of the forward strand. /note=SD (Final) Score: -3.812. It is the best final score on PECAAN. /note=Gap/overlap: 1 bp overlap (-1 bp). /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 108,538 bp. /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 108891 - 109019 /gene="226" /product="gp226" /function="membrane protein" /locus tag="PumpkinSpice_226" /note=Genemark calls start at 108840 /note=SSC: 108891-109019 CP: no SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_225 [Streptomyces phage Starbow] ],,NCBI, q1:s18 100.0% 3.76338E-19 GAP: 98 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.825, -5.03807843799195, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_225 [Streptomyces phage Starbow] ],,AXH66689,71.1864,3.76338E-19 SIF-HHPRED: SIF-Syn: /note=Membrane protein - both TMHMM and SOSUI detect 1 TMD /note=AF /note=Start 4: 108891 /note=• Found in 17 of 17 ( 100.0% ) of genes in pham /note=• Manual Annotations of this start: 7 of 11 /note=• Called 41.2% of time when present CDS 109006 - 109170 /gene="227" /product="gp227" /function="hypothetical protein" /locus tag="PumpkinSpice_227" /note=Original Glimmer call @bp 109006 has strength 3.97; Genemark calls start at 109006 /note=SSC: 109006-109170 CP: no SCS: both ST: SS BLAST-Start:
[hypothetical protein HWB80_gp093 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 5.7293E-32 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.214, -2.66069824281639, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp093 [Streptomyces phage Karimac] ],,YP_009840359,100.0,5.7293E-32 SIF-HHPRED: SIF-Syn: /note=Start 3 (109006) /note=• Found in 44 of 44 ( 100.0% ) of genes in pham /note=• Manual Annotations of this start: 35 of 36 /note=• Called 93.2% of time when present /note=Coding Potential: Coding potential for GeneMark Self but not host in the first frame of the forward strand. CDS 109170 - 109562 /gene="228" /product="gp228" /function="hypothetical protein" /locus tag="PumpkinSpice_228" /note=Original Glimmer call @bp 109170 has strength 13.35; Genemark calls start at 109170 /note=SSC: 109170-109562 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_227 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.36451E-90 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.367, -4.255178910020749, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_227 [Streptomyces phage Starbow] ],,AXH66691,100.0,2.36451E-90 SIF-HHPRED: SIF-Syn: /note=AF /note=Start 4: 109170 /note=• Found in 35 of 36 ( 97.2% ) of genes in pham /note=• Manual Annotations of this start: 27 of 29 /note=• Called 97.1% of time when present CDS 109573 - 109941 /gene="229" /product="gp229" /function="deoxycytidylate deaminase" /locus tag="PumpkinSpice_229" /note=Original Glimmer call @bp 109573 has strength 8.66; Genemark calls start at 109573 /note=SSC: 109573-109941 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 9.93291E-85 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.588861267572955, yes F: deoxycytidylate deaminase SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. 
JV178] ],,WP_099970927,100.0,9.93291E-85 SIF-HHPRED: c.97.1.2 (A:) Deoxycytidylate deaminase {Bacteriophage T4 [TaxId: 10665]},,,d1vq2a_,90.1639,99.8 SIF-Syn: /note=Start 84 (109573) /note=• Found in 19 of 594 ( 3.2% ) of genes in pham /note=• Manual Annotations of this start: 13 of 558 /note=• Called 100.0% of time when present /note=All databases gave evidence for deoxycytidylate deaminase. CDS 109996 - 110766 /gene="230" /product="gp230" /function="metallophosphoesterase" /locus tag="PumpkinSpice_230" /note=Original Glimmer call @bp 109996 has strength 11.85; Genemark calls start at 109996 /note=SSC: 109996-110766 CP: yes SCS: both ST: SS BLAST-Start: [phosphoesterase [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 0.0 GAP: 54 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.203, -2.6835611257758942, yes F: metallophosphoesterase SIF-BLAST: ,,[phosphoesterase [Streptomyces phage Birchlyn] ],,QDF17354,100.0,0.0 SIF-HHPRED: hypothetical protein TT1561; Metallo-dependent phosphatases, Thermus thermophilus, structural genomics, RIKEN Structural Genomics/Proteomics Initiative, RSGI, UNKNOWN FUNCTION; HET: MSE, CA; 2.1A {Thermus thermophilus} SCOP: d.159.1.6,,,1UF3_C,79.6875,99.9 SIF-Syn: /note=First start site; manually annotated 57 times. CDS 110853 - 111206 /gene="231" /product="gp231" /function="hypothetical protein" /locus tag="PumpkinSpice_231" /note=Original Glimmer call @bp 110853 has strength 16.71; Genemark calls start at 110853 /note=SSC: 110853-111206 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.3877E-77 GAP: 86 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -3.748312656400878, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675354,100.0,2.3877E-77 SIF-HHPRED: SIF-Syn: /note=Only 1 SS choice.
CDS 111298 - 111537 /gene="232" /product="gp232" /function="hypothetical protein" /locus tag="PumpkinSpice_232" /note=Original Glimmer call @bp 111298 has strength 8.24; Genemark calls start at 111298 /note=SSC: 111298-111537 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 4.85856E-49 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.214, -2.66069824281639, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304112,100.0,4.85856E-49 SIF-HHPRED: SIF-Syn: /note=start called in 64 of the 69 non-draft genes in pham CDS 111521 - 111832 /gene="233" /product="gp233" /function="WhiB family transcription factor" /locus tag="PumpkinSpice_233" /note=Original Glimmer call @bp 111494 has strength 4.22; Genemark calls start at 111521 /note=SSC: 111521-111832 CP: no SCS: both-gm ST: SS BLAST-Start: [WhiB family transcriptional regulator [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.58171E-71 GAP: -17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.723, -4.7673377397426835, no F: WhiB family transcription factor SIF-BLAST: ,,[WhiB family transcriptional regulator [Streptomyces sp. JV178] ],,WP_180304113,100.0,1.58171E-71 SIF-HHPRED: Transcriptional regulator WhiB1; nitric oxide, sigmaA, iron-sulfur, tuberculosis, Wbl protein, SIGNALING PROTEIN; HET: SF4; NMR {Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)},,,5OAY_A,75.7282,99.8 SIF-Syn: /note=(Start: 26 @111494 has 9 MA`s), (Start: 29 @111521 has 55 MA`s) CDS 111829 - 112137 /gene="234" /product="gp234" /function="hypothetical protein" /locus tag="PumpkinSpice_234" /note=Original Glimmer call @bp 111829 has strength 3.8; Genemark calls start at 111835 /note=SSC: 111829-112137 CP: no SCS: both-gl ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. 
JV178] ],,NCBI, q1:s1 100.0% 9.28135E-71 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.066, -3.3503726477288405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675357,100.0,9.28135E-71 SIF-HHPRED: SIF-Syn: /note=(Start: 11 @111829 has 41 MA`s) CDS 112118 - 112276 /gene="235" /product="gp235" /function="hypothetical protein" /locus tag="PumpkinSpice_235" /note=Original Glimmer call @bp 112118 has strength 7.71; Genemark calls start at 112175 /note=SSC: 112118-112276 CP: yes SCS: both-gl ST: NI BLAST-Start: [hypothetical protein SEA_STARBOW_234 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.22298E-26 GAP: -20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.162, -4.394817490717228, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_234 [Streptomyces phage Starbow] ],,AXH66698,100.0,1.22298E-26 SIF-HHPRED: SIF-Syn: /note=(Start: 4 @112100 has 28 MA`s) - this site includes all atypical coding potential but makes a bigger overlap with the previous gene. /note=(Start: 5 @112118 has 10 MA`s) - less overlap, but leaves off some coding potential. /note=Choosing 112118 since it has better stats. CDS 112273 - 112503 /gene="236" /product="gp236" /function="hypothetical protein" /locus tag="PumpkinSpice_236" /note=Original Glimmer call @bp 112273 has strength 3.05 /note=SSC: 112273-112503 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_235 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.82132E-49 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.012, -5.6962627279776274, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_235 [Streptomyces phage Starbow] ],,AXH66699,100.0,2.82132E-49 SIF-HHPRED: SIF-Syn: /note=(Start: 1 @112273 has 49 MA`s)
/note=Possibly part of an operon. CDS 112593 - 112841 /gene="237" /product="gp237" /function="hypothetical protein" /locus tag="PumpkinSpice_237" /note=Original Glimmer call @bp 112593 has strength 6.66; Genemark calls start at 112593 /note=SSC: 112593-112841 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_238 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 3.01398E-52 GAP: 89 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.611, -4.699110570850454, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_238 [Streptomyces phage Birchlyn] ],,QDF17359,100.0,3.01398E-52 SIF-HHPRED: SIF-Syn: /note=Start: 4 @112593 has 20 MA`s; called 100% of the time when present CDS 112873 - 113274 /gene="238" /product="gp238" /function="hypothetical protein" /locus tag="PumpkinSpice_238" /note=Original Glimmer call @bp 112873 has strength 4.01; Genemark calls start at 112873 /note=SSC: 112873-113274 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 8.55289E-87 GAP: 31 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.621, -6.56527855705493, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675360,99.2481,8.55289E-87 SIF-HHPRED: SIF-Syn: /note=(Start: 2 @112873 has 37 MA`s), called 87% of the time when present CDS 113274 - 113573 /gene="239" /product="gp239" /function="membrane protein" /locus tag="PumpkinSpice_239" /note=Original Glimmer call @bp 113271 has strength 5.43; Genemark calls start at 113271 /note=SSC: 113274-113573 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s2 100.0% 1.64585E-66 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.577, -3.8158758578300507, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp.
JV178] ],,WP_143675361,99.0,1.64585E-66 SIF-HHPRED: SIF-Syn: Membrane protein; the upstream gene is NKF (pham 64486) and the downstream gene is NKF (pham 7714), just like in Karimac (BE2) and Genie2 (BE2). /note=(Start: 2 @113271 has 17 MA`s), (Start: 3 @113274 has 48 MA`s) /note=Switching to start 3 (113274), because a -1 bp overlap is thought to be more likely for a gene in an operon. -AF /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 113271. /note=Phamerator: This gene is in pham 53237 as of 4/23/21. It is conserved and also found in Annadreamy (BK) and Battuta (BE). /note=Coding Potential: In the forward direction, host-trained GeneMark shows no coding potential in this region; however, self-trained GeneMark shows high coding potential between the start and stop sites. /note=SD (Final) Score: The final score for this gene is -3.465 and the z-score is 2.577. This is the best score. /note=Gap/overlap: There is a 4 bp overlap (-4 bp), which is reasonable and may indicate that this gene is part of an operon. This overlap is conserved in IchabodCrane and Wipeout. /note=Location call: Based on the evidence above, this gene is real and starts at 113271, which is called by both Glimmer and GeneMark and has good coding potential. /note=Function call: No known function. The top three PhagesDB BLAST hits from non-draft phages have no known function, with e-values less than 2e-31. The top three NCBI BLAST hits also have no known function; the best hits have 98% coverage, 99% identity, and an e-value of 1e-59. CDD had one hit, placing this gene in the Smc superfamily, but its e-value (9.30e-3) is too high to be considered relevant. HHpred had no relevant hits for this gene. /note=Transmembrane domains: TMHMM predicts just one TMD. TOPCONS also predicts one TMD. Based on this evidence, this gene can be assumed to have a real TMD and is therefore a “membrane protein.”
/note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real, and start site 113271 is most likely. CDS 113605 - 113796 /gene="240" /product="gp240" /function="hypothetical protein" /locus tag="PumpkinSpice_240" /note=Original Glimmer call @bp 113605 has strength 9.86; Genemark calls start at 113605 /note=SSC: 113605-113796 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.03361E-39 GAP: 31 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.311, -2.45827804503836, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970930,100.0,3.03361E-39 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both Glimmer and GeneMark call 113605. /note=Phamerator: Pham 62843 as of 5/7/2021. The gene is conserved in phage Bordeaux, and both the upstream and downstream genes are conserved. /note=Starterator: Start site 11 in Starterator was manually annotated in 22/24 non-draft genes. This start site was the most often called site. This evidence agrees with Glimmer and GeneMark. /note=Coding Potential: No coding potential is found in the host-trained output; there is, however, self-trained coding potential. The lack of host-trained coding potential appears to result from a formula issue in how host-trained coding potential is identified. /note=SD (Final) Score: -2.458. This was the best (and only) final score on PECAAN. /note=Gap/overlap: 31 bp. This gap is not large enough for another gene, and it is conserved in phage Battuta. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 113,605. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation.
All of the evidence categories have been considered. For the coding potential, maybe clarify that the self-trained coding potential is reasonable and covered by the chosen start site. Also, the Starterator drop-down menu has not been checked yet! CDS 113800 - 114015 /gene="241" /product="gp241" /function="hypothetical protein" /locus tag="PumpkinSpice_241" /note=Original Glimmer call @bp 113800 has strength 5.71; Genemark calls start at 113800 /note=SSC: 113800-114015 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_240 [Streptomyces phage Starbow]],,NCBI, q1:s1 100.0% 2.23021E-44 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.546, -5.699688397548125, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_240 [Streptomyces phage Starbow]],,AXH66704,98.5916,2.23021E-44 SIF-HHPRED: SIF-Syn: Pham 12879 (as of 5/26/2021), upstream is Pham 15314 with NKF (as of 5/26/2021), downstream is Pham 64486 with NKF (as of 5/26/2021), just like in phages Battuta, Karimac, and Starbow /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, and they agree on the start site at 113800 bp. /note=Phamerator: Pham number 12879 as of 5/1/2021. The gene is conserved in phages Starbow, TomSawyer, and Wipeout, all belonging to the same subcluster BE2 as PumpkinSpice. /note=Starterator: Start site 1 in Starterator was manually annotated in 15/15 non-draft genes in this pham. Start 1 is 113800 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -5.700. This is the worst final score of the three candidates on PECAAN, but this auto-annotated site gives the longest ORF and a significantly smaller gap.
/note=Gap/overlap: There is a 3 bp gap, which is reasonable as this gap is less than 30 bp and it is the smallest gap amongst the gene candidates. This gene is also conserved in several other phages and the gap was seen in the other phages as well, such as phages Starbow, TomSawyer, and Wipeout. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 113800. Starterator data agrees with both Glimmer and GeneMark. /note=Function call: The function is unknown. The top hits for PhagesDB BLAST have an unknown function (E-value < 2e-37) and the top hits for NCBI BLAST also have an unknown function or hypothetical protein (100% coverage, E-value < 2e-55, and 98.5%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: Based on all the evidence Jennifer has listed in her annotation notebook, I agree with her location call and that this gene is real. Although other evidence, such as coding potential and the auto-annotated start calls, is also present, the Starterator information along with the final score provides concrete evidence for this call. The evidence is solid and every piece of evidence is explained in detail. 
CDS 114026 - 114469 /gene="242" /product="gp242" /function="hypothetical protein" /locus tag="PumpkinSpice_242" /note=Original Glimmer call @bp 114026 has strength 7.93; Genemark calls start at 114026 /note=SSC: 114026-114469 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_238 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 9.59048E-103 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -5.2484617369160205, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_238 [Streptomyces phage IchabodCrane] ],,QFP97516,99.3197,9.59048E-103 SIF-HHPRED: SIF-Syn: NKF (pham 15314), upstream is pham 12879 and downstream is pham 871, just like in Bordeaux (BE2) and Battuta (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call at 114026. /note=Phamerator: pham: 15314. Date 04/21/2021. It is conserved; found in Bordeaux (BE2) and Battuta (BE2). /note=Starterator: Start site 4 in Starterator was manually annotated in 12 out of 15 non-draft genes in this pham. Start 4 is 114026 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: High coding potential in self-trained GeneMark, but not in host-trained GeneMark, in the second frame of the forward strand. /note=SD (Final) Score: -5.248. This is not the best final score, but this start site is the LORF and provides a gene length and gap that are consistent with genes of the same pham in other phages. /note=Gap/overlap: The gap is 10 bp, which is a reasonable size. /note=Location call: Based on the above evidence, this most likely is a real gene with start site 114026. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other genes with NKF. No good hits (i.e. e = 0.7) with functions were found. CDD and HHpred had no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. 
/note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. For the gap, I would mention other phages in which this gap is observed and note that it is the smallest gap. tRNA 114491 - 114571 /gene="243" /product="tRNA-Gln(ttg)" /locus tag="PUMPKINSPICE_243" /note=tRNA-Gln(ttg) CDS 114590 - 114766 /gene="244" /product="gp244" /function="hypothetical protein" /locus tag="PumpkinSpice_244" /note=Original Glimmer call @bp 114590 has strength 10.99; Genemark calls start at 114590 /note=SSC: 114590-114766 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_STARBOW_243 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 3.40853E-32 GAP: 120 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.987, -4.7784408226142965, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_243 [Streptomyces phage Starbow] ],,AXH66706,100.0,3.40853E-32 SIF-HHPRED: SIF-Syn: NKF (Pham 871); the upstream gene has NKF and belongs to pham 15314, the downstream gene has NKF and belongs to pham 16784. The same order of genes with the corresponding phams is observed in phage IchabodCrane (BE). /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: The gene was called by Glimmer and GeneMark, which agreed on the start site 114590. /note=Phamerator: The gene belongs to Pham 871 (4/23/21) and is conserved in other BE2 phages, such as Battuta and Bordeaux. /note=Starterator: Start site 1, which corresponds to 114590 in PumpkinSpice, was manually annotated in 10/10 non-draft genomes. It is the only start site choice and was predicted by Glimmer and GeneMark. /note=Coding Potential: Reasonable coding potential within the putative ORF in the forward direction that is covered by the chosen start site. /note=SD (Final) Score: -4.778, which is the only SD score available. /note=Gap/overlap: The 120 bp gap is reasonable and results in the longest ORF. 
It is conserved in other phages, such as Bordeaux. /note=Location call: In consideration of the above evidence, this is a real gene and the start site is 114590, which is the only, but reasonable, option. /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-value 3e-27). The top hits for NCBI also have hypothetical functions (e-value below 9.47e-32, identity > 98.28%, 100% coverage). There are no informative CDD or HHpred hits that support function calls. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 114763 - 115008 /gene="245" /product="gp245" /function="hypothetical protein" /locus tag="PumpkinSpice_245" /note=Original Glimmer call @bp 114763 has strength 10.79; Genemark calls start at 114763 /note=SSC: 114763-115008 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 6.8022E-53 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -3.30700010583914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675364,100.0,6.8022E-53 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 1779), downstream gene is NKF (pham 871), just like in phages Karimac (BE2) and MindFlayer (BE2). /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site as 114763. /note=Phamerator: This gene is in pham 16784 as of 4/23/21. This gene is conserved in Cross (BE) and Evy (BE). /note=Starterator: Start site 3 was manually annotated in 29/29 non-draft genes in this pham. Start site 3 is 114763, which agrees with Glimmer and GeneMark. 
/note=Coding Potential: For Host-Trained GeneMark, there was no coding potential in the forward direction within this region; however, for Self-Trained GeneMark, there is high coding potential within this region. /note=SD (Final) Score: -3.307. This is the best SD score. /note=Gap/overlap: -4 bp. This is a reasonable overlap because this gene may be part of an operon. This overlap is also conserved in phages that also have this gene (MindFlayer and Starbow). There is also no coding potential between genes. /note=Location call: This gene is real and most likely starts at 114763 bp, which includes all coding potential for this gene. /note=Function call: No known function. The top three non-draft phage hits from PhagesDB BLAST have no known function, with e-values of 1e-42. The top three NCBI BLAST hits also have no known function, with the top hits having 100% coverage, 100% identity, and an e-value of 6e-53. CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: The evidence for the realness of this gene and its start site is strong. The Starterator box should be checked. CDS 115010 - 115213 /gene="246" /product="gp246" /function="hypothetical protein" /locus tag="PumpkinSpice_246" /note=Original Glimmer call @bp 115010 has strength 4.69; Genemark calls start at 115010 /note=SSC: 115010-115213 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_238 [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 2.53266E-41 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.663, -3.364992052816827, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_238 [Streptomyces phage Wipeout] ],,QGH74445,100.0,2.53266E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both Glimmer and GeneMark called 115010. 
/note=Phamerator: Belongs to Pham 1779 as of 5/7/2021. This gene is conserved in phage MindFlayer (BE). /note=Starterator: Start site 7 was annotated 100% of the time. /note=Coding Potential: Coding potential was found only in self-trained GeneMark. Host-trained GeneMark showed no coding potential. /note=SD (Final) Score: -3.365. This was the best final score. /note=Gap/overlap: 1 bp. This gap is too small to contain another gene. /note=Location call: Based on the evidence above, this is a real gene and the start site is indeed 115,010. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with Erick; the evidence is sufficiently strong for this gene and its start site. The Starterator box must be checked. CDS 115246 - 115455 /gene="247" /product="gp247" /function="hypothetical protein" /locus tag="PumpkinSpice_247" /note=Original Glimmer call @bp 115246 has strength 7.66; Genemark calls start at 115246 /note=SSC: 115246-115455 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB77_gp076 [Streptomyces phage StarPlatinum] ],,NCBI, q1:s1 100.0% 5.48667E-42 GAP: 32 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.203, -2.9845911214398755, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB77_gp076 [Streptomyces phage StarPlatinum] ],,YP_009839650,98.5507,5.48667E-42 SIF-HHPRED: SIF-Syn: Pham 5178 (as of 5/26/2021), upstream is Pham 8220 with NKF (as of 5/26/2021), downstream is Pham 1779 with NKF (as of 5/26/2021), just like in phages Battuta, Genie2, and Yaboi /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 115246 bp. /note=Phamerator: Pham number 5178 as of 5/1/2021. The gene is conserved in phages Genie2, LukeCage, and Starbow, all belonging to the same subcluster BE2 as PumpkinSpice. 
/note=Starterator: Start site 3 in Starterator was manually annotated in 21/41 non-draft genes in this pham (100% of time when present). Start 3 is 115246 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -2.985. This is the best final score on PECAAN. /note=Gap/overlap: There is a 32 bp gap, which is reasonable as this is the smallest gap amongst the gene candidates. This gene is also conserved in several other phages and the gap was seen in the other phages as well, such as phages Genie2, LukeCage, and Starbow. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 115246. Starterator data agrees with both Glimmer and Genemark. /note=Function call: The function is unknown. The top hits for PhagesDB BLAST have an unknown function (E-value < 1e-33) and the top hits for NCBI BLAST also have an unknown function or hypothetical protein (100% coverage, E-value < 6e-42, and 98.55%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with the evidence provided by Jennifer on both the realness of the gene and its start site. 
CDS 115452 - 115721 /gene="248" /product="gp248" /function="hypothetical protein" /locus tag="PumpkinSpice_248" /note=Original Glimmer call @bp 115452 has strength 10.02; Genemark calls start at 115452 /note=SSC: 115452-115721 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_249 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 2.68019E-57 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.21, -4.760068129112225, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_249 [Streptomyces phage Birchlyn] ],,QDF17368,100.0,2.68019E-57 SIF-HHPRED: SIF-Syn: NKF (pham 8220), upstream is pham 5178 and downstream is pham 57798, just as in Bordeaux (BE2) and Battuta (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call at 115452. /note=Phamerator: pham: 8220. Date 04/21/2021. It is conserved; found in Bordeaux (BE2) and Battuta (BE2). /note=Starterator: Start site 4 (115449) in Starterator was manually annotated in 5 out of 49 non-draft genes in this pham. It is called 43.8% of the time when present. This evidence agrees with the site predicted by Glimmer and GeneMark and has a good overlap of 4 bp. Start site 5 (115452) is called in 20/65 genomes and called 68.3% of the time when present, but conflicts with other evidence (i.e. Glimmer, GeneMark, overlap). /note=Coding Potential: High coding potential in self-trained GeneMark in the third frame of the forward strand, but no potential in host-trained GeneMark. /note=SD (Final) Score: -5.538 with Z-score of 2.21. /note=Gap/overlap: Overlap of -7. Although start site 115452 has a more typical overlap of -4, other phages in BE1 and BE2 do not necessarily have overlaps of -4. Therefore, this gene is most likely not part of an operon, and it is reasonable to choose the start site with overlap -7. 
/note=Location call: The above evidence suggests that the gene is real with start site 115459. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other genes with NKF. No good hits (i.e. e > 1) with functions were found. HHpred and CDD had no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: Nathan's evidence is extremely well detailed. All evidence points to this gene being real and this specific start site being correct for PumpkinSpice_248 (Stop: 115,721). The critical piece of evidence that required interpretation was that presented by Starterator regarding the observed overlap, and I agree with Nathan's decision that, although the majority of genes in the pham have a 4 bp overlap, the 7 bp overlap seems appropriate. CDS 115743 - 116162 /gene="249" /product="gp249" /function="hypothetical protein" /locus tag="PumpkinSpice_249" /note=Original Glimmer call @bp 115743 has strength 12.91; Genemark calls start at 115743 /note=SSC: 115743-116162 CP: yes SCS: both ST: SS BLAST-Start: [KTSC domain-containing protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 1.69558E-98 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.969, -3.1725816037952645, yes F: hypothetical protein SIF-BLAST: ,,[KTSC domain-containing protein [Streptomyces sp. JV178] ],,WP_099970931,100.0,1.69558E-98 SIF-HHPRED: SIF-Syn: NKF (Pham 57798); the upstream gene has NKF and belongs to pham 8220, the downstream gene has NKF and belongs to pham 54516. The same order of genes with the corresponding phams is observed in phage LukeCage (BE), for which the gene belonging to pham 54516 encodes thioredoxin. /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: The gene was called by Glimmer and GeneMark, which agreed on the start site 115743. 
/note=Phamerator: The gene belongs to Pham 59991 (4/23/21) and is conserved in other BE2 phages, such as Karimac and Starbow. /note=Starterator: Start site 2, which corresponds to 115743 in PumpkinSpice, was manually annotated in 21/43 non-draft genomes (100% of the time when present). It was predicted by Glimmer and GeneMark. /note=Coding Potential: Reasonable coding potential within the putative ORF in the forward direction that is covered by the chosen start site. /note=SD (Final) Score: -3.173, which is the best SD score. /note=Gap/overlap: The 21 bp gap is reasonable and results in the longest ORF. /note=Location call: In consideration of the above evidence, this is a real gene and the most likely start site is 115743. /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-value 1e-77). The top hits for NCBI also have hypothetical functions (e-value below 1.7e-98, identity > 99.28%, 100% coverage). There are no informative CDD or HHpred hits that support function calls. The top NCBI hit is annotated as a KTSC domain-containing protein, so the protein may contain a KTSC domain. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with the evidence for gene realness and start site. The evidence is sufficiently strong and well presented. 
CDS 116169 - 116468 /gene="250" /product="gp250" /function="thioredoxin" /locus tag="PumpkinSpice_250" /note=Original Glimmer call @bp 116169 has strength 5.3; Genemark calls start at 116169 /note=SSC: 116169-116468 CP: yes SCS: both ST: SS BLAST-Start: [thioredoxin [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.09123E-64 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.157, -2.905625766045924, yes F: thioredoxin SIF-BLAST: ,,[thioredoxin [Streptomyces phage Starbow] ],,AXH66712,100.0,1.09123E-64 SIF-HHPRED: Putative oxidoreductase; APC23140, meticillin-resistant Staphylococcus aureus, oxidoreductase, thioredoxin fold, Structural Genomics, PSI-2, Protein Structure Initiative, Midwest Center for; HET: MSE; 1.5A {Staphylococcus aureus subsp. aureus},,,3IV4_A,96.9697,99.9 SIF-Syn: Thioredoxin, upstream gene is NKF (pham 6397), downstream gene is NKF (pham 64020), just like in MindFlayer (BE2) and StarPlatinum (BE2). /note=*Very diverse pham. Most annotated SS is only 18/70 times. Start 26 (116169) called 100% time when present (like several other SSs). -AF /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark call the start site as 116169. /note=Phamerator: This gene is in pham 59541 as of 4/23/21. This pham is conserved in Belfort (BK) and BillNye (BK). /note=Starterator: Start site 24 is manually annotated in 6/49 non-draft genes in this pham. Start site 24 is 116169 in PumpkinSpice. This start site agrees with Glimmer and GeneMark. /note=Coding Potential: In the forward direction, in Self-Trained GeneMark there is high coding potential within this region, but in Host-Trained GeneMark, there is no coding potential within this region. /note=SD (Final) Score: -2.906. This is the best SD score. /note=Gap/overlap: 6 bp. This gap is reasonable and is conserved with other phages with this gene (IchabodCrane and Mindflayer). There is also no coding potential between the two genes. 
/note=Location call: This gene is real and most likely starts at 116169 bp, which includes all coding potential for this gene. /note=Function call: Thioredoxin. The top three non-draft phage hits from PhagesDB BLAST have the function thioredoxin, with e-values of 8e-51. The top three NCBI BLAST hits are also thioredoxin, with the top hits having 100% coverage, 100% identity, and an e-value of 1e-64. Both CDD and HHpred identified this gene as thioredoxin: CDD with an e-value of 2.01e-14, and HHpred with 99.87% probability, 96.97% coverage, and an e-value of 5.7e-19. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real and 116169 is the most likely start site. CDS 116639 - 116770 /gene="251" /product="gp251" /function="hypothetical protein" /locus tag="PumpkinSpice_251" /note=Original Glimmer call @bp 116639 has strength 3.16 /note=SSC: 116639-116770 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_253 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 1.68319E-22 GAP: 170 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.653, -3.4470622626190464, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_253 [Streptomyces phage Birchlyn] ],,QDF17371,100.0,1.68319E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Glimmer calls the start at 116639; GeneMark makes no start call. /note=Phamerator: Pham 6397 as of 5/7/2021; synteny is observed, as the gene is conserved in phage Wipeout. /note=Starterator: Start site 2 was called in 9/10 non-draft genes for pham 6397. /note=Coding Potential: Coding potential was observed only in self-trained GeneMark. /note=SD (Final) Score: -3.447. This is the best final score. 
/note=Gap/overlap: 170 bp. This gap is large, but it is conserved in other phages within this cluster. /note=Location call: Based on the previous evidence, this is a real gene. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Erick's start site call, since this start site is called 9/10 times and includes all coding potential. I would just add that this start site also has the best Z-score, but other than that I think that Erick's call is the correct call. I also agree that this is a real gene. Please fill out the Starterator drop-down menu. /note= /note=Edit: Starterator drop-down has been filled! Thanks Clara! 5/23/21 CDS 116864 - 117346 /gene="252" /product="gp252" /function="glycosylase" /locus tag="PumpkinSpice_252" /note=Original Glimmer call @bp 116864 has strength 11.99; Genemark calls start at 116858 /note=SSC: 116864-117346 CP: yes SCS: both-gl ST: SS BLAST-Start: [glycosylase [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 7.38439E-115 GAP: 93 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.132, -4.4747668481608205, no F: glycosylase SIF-BLAST: ,,[glycosylase [Streptomyces phage Karimac] ],,YP_009840383,100.0,7.38439E-115 SIF-HHPRED: SIF-Syn: Gene is a glycosylase in PumpkinSpice and NKF in comparison phages (Pham 2976 as of 5/26/2021), upstream is Pham 2204 with NKF (as of 5/26/2021), downstream is Pham 6397 with NKF (as of 5/26/2021), just like in phages Bordeaux, IchabodCrane, and Mindflayer /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene, but Glimmer calls at 116864 bp while GeneMark calls at 116858 bp. /note=Phamerator: Pham number 2976 as of 5/17/2021. The gene is conserved in phages StarPlatinum, Wipeout, and Yaboi, all belonging to the same subcluster BE2 as PumpkinSpice. 
/note=Starterator: Start site 19 in Starterator was manually annotated in 24/56 non-draft genes in this pham. Start 19 is 116864 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -4.475. This is not the best final score on PECAAN, but compared to the other viable start site candidate, this is the better one of the two. /note=Gap/overlap: There is a 93 bp gap that is somewhat large, but this is ultimately reasonable. Compared to the candidate with the smallest gap, this site has a comparable gap and a better final score. This gene is also conserved in several other phages and the gap was seen in the other phages as well, such as phages StarPlatinum, Wipeout, and Yaboi. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 116864. Starterator data agrees with Glimmer. /note=Function call: It is a glycosylase. The top hits for PhagesDB BLAST have the function of glycosylase (E-value < 7e-91) and the top hits for NCBI BLAST also have the function of glycosylase (100% coverage, E-value < 2e-107, and 93.12%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with the calls that the gene is real and I also agree with the start site Jennifer called. I think she provides compelling evidence for both of these calls! 
CDS 117356 - 117847 /gene="253" /product="gp253" /function="hypothetical protein" /locus tag="PumpkinSpice_253" /note=Original Glimmer call @bp 117365 has strength 11.31; Genemark calls start at 117356 /note=SSC: 117356-117847 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein SEA_STARBOW_251 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.80105E-115 GAP: 9 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.713, -4.00988004985817, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_251 [Streptomyces phage Starbow] ],,AXH66714,100.0,2.80105E-115 SIF-HHPRED: SIF-Syn: NKF (pham 2976), upstream is pham 2204 and downstream is pham 8214, just as in Bordeaux (BE2) and Battuta (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer calls at 117365 while GeneMark calls at 117356. /note=Phamerator: pham: 2204. Date 04/21/2021. It is conserved; found in Bordeaux (BE2) and Battuta (BE2). /note=Starterator: Start site 5 in Starterator was manually annotated in 10 out of 30 non-draft genes in this pham and called 87.5% of the time when present. Start 5 is 117356 in PumpkinSpice. This evidence agrees with GeneMark but not Glimmer. Start site 6, which agrees with Glimmer, is only manually annotated once and called 12.5% of the time when present. /note=Coding Potential: High coding potential in self-trained GeneMark in the second frame of the forward strand, but no potential in host-trained GeneMark. /note=SD (Final) Score: -4.010, which is the lowest possible. /note=Gap/overlap: 9 bp, which is the second smallest gap possible. /note=Location call: Given the above evidence, the gene is real with start site 117356. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other genes with NKF. No good hits (i.e. e > 1) with functions were found. CDD and HHpred gave no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. 
/note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. Start site 117356 is most likely based on Starterator and the Z-score and final score. CDS 117825 - 118247 /gene="254" /product="gp254" /function="hypothetical protein" /locus tag="PumpkinSpice_254" /note=Original Glimmer call @bp 117825 has strength 6.85; Genemark calls start at 117825 /note=SSC: 117825-118247 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp067 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.16803E-100 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.611, -4.04589805707511, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp067 [Streptomyces phage Karimac] ],,YP_009840385,100.0,2.16803E-100 SIF-HHPRED: SIF-Syn: NKF (Pham 8214); the upstream gene has NKF and belongs to pham 2204, the downstream gene has NKF and belongs to pham 6197. The same order of genes with the corresponding phams is observed in phage Birchlyn (BE). /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: The gene was called by Glimmer and GeneMark, which agreed on the start site 117825. /note=Phamerator: The gene belongs to Pham 8214 (4/23/21) and is conserved in other BE phages, such as Battuta and Birchlyn. /note=Starterator: Start site 14, which corresponds to 117825 in PumpkinSpice, was manually annotated in 29/33 non-draft genomes in Starterator and was predicted by Glimmer and GeneMark. /note=Coding Potential: Reasonable coding potential within the putative ORF in the forward direction that is covered by the chosen start site. /note=SD (Final) Score: -4.046, which is not the best but still a reasonable SD score. The gene is part of an operon. /note=Gap/overlap: The 23 bp overlap is reasonable, as the gene is part of an operon, and it is conserved in other phages like Battuta. 
/note=Location call: In consideration of the above evidence, this is a real gene and the most likely start site is 117825. /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-value 3e-83). The top hits for NCBI also have hypothetical functions (e-values below 2.89e-99, identity > 97.8%, 100% coverage). There are no informative CDD or HHpred hits that support function calls. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: In accordance with the evidence presented above, such as strong e-values, coding potential, synteny, and Starterator, I agree with Michelle's decision that this is a real gene and it is placed at the correct start site. Overall solid analysis, Michelle did a great job! CDS 118283 - 118519 /gene="255" /product="gp255" /function="hypothetical protein" /locus tag="PumpkinSpice_255" /note=Original Glimmer call @bp 118328 has strength 7.78; Genemark calls start at 118325 /note=SSC: 118283-118519 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_253 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.60391E-48 GAP: 35 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.684, -6.160605165723925, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_253 [Streptomyces phage Starbow] ],,AXH66716,100.0,1.60391E-48 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 8019), downstream gene is NKF (pham 8214), just like in phages TomSawyer (BE2) and Bordeaux (BE2). /note=Start: 1 @118283 has 28 MA's; start 4 @ 118328 has 15 MAs. Chose first site for longer gene and to agree with Starterator. /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Glimmer and GeneMark called different start sites (Glimmer: 118328; GeneMark: 118325 - 3 bp difference). /note=Phamerator: This gene is in pham 6197 as of 4/23/21. 
This pham is conserved in Yaboi (BE) and Genie2 (BE). /note=Coding Potential: For both start sites, there is no coding potential within this region for the forward direction in Host-Trained GeneMark. However, both start sites in the forward direction in Self-Trained GeneMark include high coding potential within this region. /note=SD (Final) Score: The final score is -6.161 and the Z-score is 1.684. This score is not the best, but it is still reasonable because the gene is part of an operon. /note=Gap/overlap: The gap is 35 bp. While this is a relatively large gap, it is conserved in other phages (Bordeaux and Birchlyn). While this gap is not common in this cluster, it is conserved in a few phages and there is no additional coding potential before or after this gene. /note=Location call: This gene is real and starts at 118283, which is supported by Starterator but was not called by Glimmer or GeneMark. This start site includes all coding potential. /note=Function call: No known function. The top three non-draft phage hits from PhagesDB BLAST have no known function, with e-values of 1e-30. The top three NCBI BLAST hits also have no known function, with the top hits having 100% coverage, 100% identity, and an e-value of 4e-37. Both CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real and start site 118283 is most likely based on Starterator. The Z-score and final score are not the best but are reasonable given that the gene is part of an operon. 
CDS 118520 - 118798 /gene="256" /product="gp256" /function="hypothetical protein" /locus tag="PumpkinSpice_256" /note=Genemark calls start at 118520 /note=SSC: 118520-118798 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_BIRCHLYN_258 [Streptomyces phage Birchlyn] ],,NCBI, q1:s1 100.0% 8.08582E-61 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.653, -3.3861058366776207, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_258 [Streptomyces phage Birchlyn] ],,QDF17376,100.0,8.08582E-61 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: GeneMark calls 118520; there is no Glimmer call. /note=Phamerator: Pham 8019 as of 5/7/2021. Synteny is observed for this gene, as it is also conserved in phage Wipeout. /note=Starterator: According to Starterator, the start site called for this gene is site 1, which is also the most frequently manually annotated start site in this pham. /note=Coding Potential: Coding potential is observed only in Self-Trained GeneMark; there is no coding potential in Host-Trained GeneMark. /note=SD (Final) Score: -3.386. This is the best final score. /note=Gap/overlap: 0 bp. There is no gap or overlap for this gene, so no additional genes need to be added. /note=Location call: Based on the evidence listed above, this gene is real. The correct start site is 118,520. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree that this gene is real, and I also agree with Erick's called start site. Again, I think it's worth mentioning that this start site has the best Z-score. The evidence for this start site is overall very compelling! Please fill out the Starterator drop-down menu /note= /note=Edit: Starterator drop-down has been filled!
5/23/21 CDS 118770 - 118883 /gene="257" /product="gp257" /function="hypothetical protein" /locus tag="PumpkinSpice_257" /note=Original Glimmer call @bp 118770 has strength 5.85 /note=SSC: 118770-118883 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_256 [Streptomyces phage Starbow] ],,NCBI, q1:s3 100.0% 7.44301E-17 GAP: -29 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.704, -5.369733147763767, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_256 [Streptomyces phage Starbow] ],,AXH66719,94.8718,7.44301E-17 SIF-HHPRED: SIF-Syn: Pham 62440 (as of 5/26/2021), upstream is an HNH endonuclease (Pham 57225 as of 5/26/2021), downstream is Pham 8019 with NKF (as of 5/26/2021), just like in phages TomSawyer, Wipeout, and Starbow /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Only Glimmer called this gene, with its start site at 118770 bp. /note=Phamerator: Pham number 61055 as of 5/1/2021. The gene is conserved in phages Birchlyn, Genie2, and Mindflayer, all belonging to the same subcluster BE2 as PumpkinSpice. /note=Starterator: Start site 13 (118770) in Starterator was manually annotated in 10/22 non-draft genes in this pham (44% of the time when present). /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -5.370. This is not the best final score on PECAAN, but compared to the other candidates with better final scores, this start site candidate has the smallest gap. /note=Gap/overlap: There is a 29 bp overlap (a -29 bp gap), which is reasonable as it is the smallest gap among the candidate start sites. This gene is also conserved in several other phages that show the same overlap, such as Birchlyn, Genie2, and Mindflayer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 118770. Starterator data agrees with Glimmer.
/note=Function call: The function is unknown. The top hits for PhagesDB BLAST have an unknown function (E-value < 4e-18), and the top hits for NCBI BLAST are also hypothetical proteins or proteins of unknown function (100% coverage, E-value < 8e-17, and 100%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with Jennifer on the gene's realness and start site. I would change the Starterator drop-down to "suggested start site". 5/26/2021 - Addressed! CDS complement (118899 - 119441) /gene="258" /product="gp258" /function="HNH endonuclease" /locus tag="PumpkinSpice_258" /note=Original Glimmer call @bp 119441 has strength 9.62; Genemark calls start at 119441 /note=SSC: 119441-118899 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Streptomyces phage Wipeout]],,NCBI, q1:s1 100.0% 1.14242E-130 GAP: 79 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.478, -6.670495803018011, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Streptomyces phage Wipeout]],,QGH74459,100.0,1.14242E-130 SIF-HHPRED: Restriction endonuclease Hpy99I; ENDONUCLEASE-DNA COMPLEX, RESTRICTION ENZYME, HPY99I, PSEUDOPALINDROME, HYDROLASE-DNA COMPLEX; HET: 1PE; 1.5A {Helicobacter pylori},,,3GOX_B,68.8889,99.8 SIF-Syn: Most other phages in the subcluster do not show synteny for this gene, but some synteny is still observed in a few phages. For example: HNH endonuclease (pham 57225), upstream is pham 183 and downstream is pham 58076, just as in JimJam (BE2). Starbow (BE2) has the same upstream gene but does not contain the downstream pham 58076. /note=Starterator: Start site 21 in Starterator was manually annotated in only 13/61 non-draft genes in this pham; it is not present in PumpkinSpice. Start 11 (119441) was annotated 5/61 times and called 100% of the time when present; this evidence agrees with GeneMark and Glimmer. /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call the start at 119441. /note=Phamerator: Pham 57225 as of 04/21/2021. It is conserved; found in JimJam (BE2) and Starbow (BE2). /note=Coding Potential: High coding potential in Self-Trained GeneMark in the second frame of the reverse strand, but no coding potential in Host-Trained GeneMark. /note=SD (Final) Score: -6.670 with a Z-score of 1.478. This score is low, but so are all the other scores with reasonable gap lengths. /note=Gap/overlap: 79 bp. This gap gives the longest ORF (LORF), and all other PECAAN start sites give gaps of 200+ bp. The gap is conserved in other phages, and it makes sense because the preceding gene is in the forward direction. /note=Location call: Given the above evidence, the gene is real and the start site is 119441. Starterator agrees with this evidence. /note=Function call: HNH endonuclease. The top three PhagesDB BLAST hits have the function HNH endonuclease (E-value < 2e-99), and the top three NCBI BLAST hits also have the function HNH endonuclease (100% coverage, 91%+ identity, and E-value < 1e-120); one of these hits specifies endonuclease VII rather than HNH endonuclease. HHpred gives hits for a restriction endonuclease (e = 2.9e-19, 99.82% probability, 68.89% coverage); CDD gives a hit for endonuclease VII (43.90% identity, 41% coverage, e = 6.8e-20). Still, given the other hits, it is most likely an HNH endonuclease. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. Start site 119441 is the most likely choice and results in a gap that is conserved in other phages.
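The HHpred reasoning in the gp258 function call can be written as a simple filter. This is a minimal sketch, not part of PECAAN or HHpred: `HHpredHit` and `passes_cutoffs` are hypothetical names, and the cutoffs (probability above 80%, coverage above 40%, e-value below 1e-3) are assumptions modeled on the thresholds these notes use for "significant" hits. The values are the Hpy99I hit reported above. Passing the filter only flags a hit for review; the annotator still decides whether the function is reasonable, as was done here when HNH endonuclease was preferred over restriction endonuclease.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    """One HHpred hit as summarised in these notes (hypothetical container)."""
    description: str
    probability: float  # percent
    coverage: float     # percent of the query covered by the alignment
    evalue: float


def passes_cutoffs(hit: HHpredHit,
                   min_probability: float = 80.0,
                   min_coverage: float = 40.0,
                   max_evalue: float = 1e-3) -> bool:
    """Return True if the hit clears all three assumed significance cutoffs."""
    return (hit.probability > min_probability
            and hit.coverage > min_coverage
            and hit.evalue < max_evalue)


hpy99i = HHpredHit("Restriction endonuclease Hpy99I (3GOX_B)", 99.82, 68.89, 2.9e-19)
print(passes_cutoffs(hpy99i))  # True
```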
CDS complement (119521 - 119646) /gene="259" /product="gp259" /function="hypothetical protein" /locus tag="PumpkinSpice_259" /note=Original Glimmer call @bp 119646 has strength 1.45 /note=SSC: 119646-119521 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_253 [Streptomyces phage Wipeout]],,NCBI, q1:s1 100.0% 1.64356E-19 GAP: 1152 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.045, -7.972784296525115, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_253 [Streptomyces phage Wipeout]],,QGH74460,100.0,1.64356E-19 SIF-HHPRED: SIF-Syn: NKF (Pham 183); the upstream gene is an HNH endonuclease and belongs to pham 57225, and the downstream gene has NKF and belongs to pham 7359. The same order of genes with the corresponding phams is observed in phage Wipeout (BE). /note=Start 5 (119646) is called by all three other members of the pham. The other members all call the gene with a length of 126 bp, which includes nearly all of the coding potential. /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: Glimmer called start site 119646. GeneMark did not make a call. /note=Phamerator: The gene belongs to Pham 183 (4/23/21) and is conserved in only one other non-draft BE2 phage (Wipeout). /note=Coding Potential: A very small peak of coding potential in the reverse direction lies within the putative ORF and is covered by the chosen start site. /note=SD (Final) Score: -3.907, which is the best SD score. /note=Gap/overlap: The 1005 bp gap is very large, but there is no evidence of a missing gene. The gap results in the longest ORF. A gap of 1152 bp is found in phage Wipeout. /note=Location call: In consideration of the above evidence, this is a real gene, and start site 119793 seems to be more likely than the start site that was manually annotated in phage Wipeout (119646 in PumpkinSpice). /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-values below 6e-17).
The top hits for NCBI are also hypothetical proteins (e-values below 3.6e-19, identity > 97.7%, 100% coverage for the strongest hit and 45.5% for the second hit). There are no informative CDD or HHpred hits that support a function call. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Michelle's conclusion that this gene is real and agree with her called start site. It has the best final score and the best Z-score. The gap is concerning, but she notes that large gaps are typically seen around this gene. Overall, I agree with her evidence. CDS complement (120799 - 121002) /gene="260" /product="gp260" /function="hypothetical protein" /locus tag="PumpkinSpice_260" /note=Original Glimmer call @bp 121002 has strength 8.87; Genemark calls start at 121002 /note=SSC: 121002-120799 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB79_gp001 [Streptomyces phage LukeCage] ],,NCBI, q1:s1 100.0% 6.4004E-40 GAP: 100 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.556, -3.50848394673489, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB79_gp001 [Streptomyces phage LukeCage] ],,YP_009839926,100.0,6.4004E-40 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (just like in phage Wipeout (BE2)) (pham 183), downstream gene is NKF (just like in phage Starbow (BE2)) (pham 19906). This phage is the only phage in BE2 (so far) that has this particular order of genes. /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site as 121002. /note=Phamerator: This gene is in pham 7359 as of 4/23/21. This pham is conserved in Battuta (BE) and Birchlyn (BE). /note=Starterator: Start site 2 is manually annotated in 38/38 non-draft genes in this pham. Start site 2 is 121002 in PumpkinSpice.
Glimmer and GeneMark agree with this start site. /note=Coding Potential: In the reverse direction, Host-Trained GeneMark shows a small amount of coding potential, but not enough to be considered significant. Self-Trained GeneMark shows good coding potential within this region. /note=SD (Final) Score: -3.508. This is the best SD score. /note=Gap/overlap: 100 bp. This is relatively large for a gap, but it is conserved in other phages with this gene (Starbow and Karimac). There is also no coding potential between the two genes. /note=Location call: This gene is real and most likely starts at 121002 bp, which also includes all coding potential for this gene. /note=Function call: No known function. The top three non-draft phages from PhagesDB BLAST hits have no known function, with e-values less than 1e-33. The top three NCBI BLAST hits are also of no known function, with the highest hits having 100% coverage, 100% identity, and an e-value of 6e-40. CDD and HHpred did not return any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: I agree with Clara's decisions in all aspects of this gene. All of the interpretations of the various pieces of evidence are spot on and are exactly how I would interpret them. Further, all pieces of evidence are explained in great detail. Excellent work.
CDS complement (121103 - 121405) /gene="261" /product="gp261" /function="hypothetical protein" /locus tag="PumpkinSpice_261" /note=Original Glimmer call @bp 121405 has strength 5.8; Genemark calls start at 121405 /note=SSC: 121405-121103 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_2 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.66491E-68 GAP: 97 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.066, -2.970161406017234, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_2 [Streptomyces phage Starbow] ],,AXH66513,100.0,1.66491E-68 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Glimmer and GeneMark both call 121405. /note=Phamerator: Pham 19906 as of 5/7/2021; the gene is conserved in phage BoomerJR. /note=Starterator: Starterator calls start site 3 at 121405, which is also the most frequently manually annotated start site (18 of 28 non-draft genes). /note=Coding Potential: Coding potential was observed in Self-Trained GeneMark only. /note=SD (Final) Score: -2.970. This is the best final score. /note=Gap/overlap: 97 bp. There is a large gap before this gene, but the gap is conserved in phages MindFlayer and Battuta. /note=Location call: Based on the evidence above, the gene is real and this is the correct start site. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. For the coding potential, maybe clarify that the self-trained coding potential is reasonable and that the entire coding region is covered by the chosen start site. Also, the Starterator drop-down menu has not been checked yet!
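The SSC field writes the start coordinate first, so for a reverse-strand CDS such as complement (121103 - 121405) the start codon sits at the higher coordinate (SSC: 121405-121103). Below is a minimal sketch of that convention, assuming the spaced `complement (a - b)` form used in these notes rather than the standard GenBank `complement(a..b)` syntax; `parse_cds` and `length_bp` are hypothetical helper names. It also checks that a CDS length is a whole number of codons, as with the 126 bp length noted for gp259.

```python
import re
from typing import Tuple

# Matches the location format used in these notes, e.g. "CDS complement (121103 - 121405)"
# or "CDS 118283 - 118519". This is NOT the standard GenBank "a..b" location syntax.
LOCATION = re.compile(r"CDS\s+(complement\s*\()?\s*(\d+)\s*-\s*(\d+)\s*\)?")


def parse_cds(text: str) -> Tuple[int, int, str]:
    """Return (start, stop, strand), where start is the start-codon coordinate."""
    match = LOCATION.search(text)
    if match is None:
        raise ValueError(f"no CDS location in {text!r}")
    low, high = int(match.group(2)), int(match.group(3))
    if match.group(1):  # reverse strand: translation begins at the higher coordinate
        return high, low, "-"
    return low, high, "+"


def length_bp(start: int, stop: int) -> int:
    """Inclusive length; a complete ORF (start through stop codon) is a multiple of 3."""
    return abs(stop - start) + 1


start, stop, strand = parse_cds("CDS complement (119521 - 119646)")
print(start, stop, strand, length_bp(start, stop))  # 119646 119521 - 126
```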
CDS complement (121503 - 122543) /gene="262" /product="gp262" /function="hypothetical protein" /locus tag="PumpkinSpice_262" /note=Original Glimmer call @bp 122543 has strength 15.33; Genemark calls start at 122543 /note=SSC: 122543-121503 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BORDEAUX_3 [Streptomyces phage Bordeaux] ],,NCBI, q1:s1 100.0% 0.0 GAP: 195 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.922, -4.9157003279346805, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BORDEAUX_3 [Streptomyces phage Bordeaux] ],,QGH79776,100.0,0.0 SIF-HHPRED: SIF-Syn: Pham 63243 (as of 5/26/2021), upstream is Pham 64002 with NKF (as of 5/26/2021), downstream is Pham 19906 with NKF (as of 5/26/2021), just like in phages Battuta, Genie2, and Starbow /note=AF: See notes for PumpkinSpice_3: AF: Start #65 (122543) is chosen by GeneMark and Glimmer, closes the gap, and is annotated 55.2% of the time when present (mostly by BE2 phages, 14 times). Many more genes in this pham call start #74 (122498), which is annotated 75% of the time when present, by a mix of BE1 and BE2 phages. I am choosing 122543 for the above reasons. /note= (Start: 65 @122543 has 14 MAs), (Start: 74 @122498 has 63 MAs) /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 122543 bp. /note=Phamerator: Pham number 63243 as of 5/17/2021. The gene is conserved in phages Battuta, BoomerJR, and Wofford, all belonging to the same subcluster BE2 as PumpkinSpice. /note=Starterator: Start site 65 in Starterator was manually annotated in 14/286 non-draft genes in this pham and is called 55% of the time when present. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -4.916.
This is not the best final score on PECAAN, but compared to the other sites, this start site candidate has the longest ORF and smallest gap. /note=Gap/overlap: There is a 195 bp gap that is relatively large, but this is ultimately reasonable as it is the smallest gap among the candidate start sites. This gene is also conserved in several other phages that show the same gap, such as Battuta, BoomerJR, and Wofford. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 122543. Starterator data agrees with both Glimmer and GeneMark. /note=Function call: The function is unknown. The top hits for PhagesDB BLAST have an unknown function (E-value = 0.0), and the top hits for NCBI BLAST are also hypothetical proteins or proteins of unknown function (99-100% coverage, E-value = 0.0, and 99.42%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with the call that this gene is real. It is concerning that only 6/308 manual annotations called this start site, but it seems that the genomes that contain this start site tend to call it. It is called 50% of the time when present and it includes all coding potential. I think it would be useful to ask an instructor about this, but overall, I think this start site seems logical. 5/26/2021 - Addressed! CDS complement (122739 - 123131) /gene="263" /product="gp263" /function="hypothetical protein" /locus tag="PumpkinSpice_263" /note=Original Glimmer call @bp 123131 has strength 5.27; Genemark calls start at 123131 /note=SSC: 123131-122739 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp.
JV178] ],,NCBI, q1:s1 100.0% 2.92894E-92 GAP: 111 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.815, -4.0178693334748665, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970936,100.0,2.92894E-92 SIF-HHPRED: SIF-Syn: NKF (pham 58111), upstream is pham 10814 and downstream is pham 57618, just as in Bordeaux (BE2) and Battuta (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call the start at 123131. /note=Phamerator: Pham 59330 as of 04/21/2021. It is conserved; found in Bordeaux (BE2) and Battuta (BE2). /note=Starterator: Start site 15 in Starterator was manually annotated in 28 out of 55 non-draft genes in this pham and called 97.6% of the time when present. Start 15 is 123131 in PumpkinSpice. This evidence agrees with GeneMark and Glimmer. /note=Coding Potential: High coding potential in both Self-Trained and Host-Trained GeneMark in the second frame of the reverse strand. /note=SD (Final) Score: -4.018 with a Z-score of 2.815. These are the best values provided by PECAAN. /note=Gap/overlap: Gap of 111 bp; all other candidate start sites in PECAAN give gaps of 200+ bp. The gap is conserved in other phages such as Bordeaux. /note=Location call: Given the above evidence, the gene is real with start site 123131. Starterator agrees. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other genes with NKF. No good hits with an assigned function were found (any such hits had e > 1). CDD and HHpred gave no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. NOTE: The Starterator drop-down menu has not been checked yet!
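The Gap/overlap values follow directly from the CDS coordinates. With 1-based inclusive coordinates, the gap between two neighbouring features is the lower coordinate of the right-hand feature minus the higher coordinate of the left-hand feature, minus one; a negative value is an overlap. PECAAN appears to report the gap on the start-codon side, which for these reverse-strand genes is the neighbour at higher coordinates. The sketch below (`gap_bp` is a hypothetical name) reproduces the 195, 111, and 42 bp gaps recorded for gp262, gp263, and gp264, and the -29 bp overlap recorded for gp257.

```python
def gap_bp(left_high: int, right_low: int) -> int:
    """Gap between two neighbouring features on 1-based, inclusive coordinates.

    left_high: highest coordinate of the left-hand feature.
    right_low: lowest coordinate of the right-hand feature.
    A negative result means the features overlap by that many bases.
    """
    return right_low - left_high - 1


# (low, high) coordinates copied from the CDS lines in these notes.
cds = {
    "256": (118520, 118798),
    "257": (118770, 118883),
    "262": (121503, 122543),
    "263": (122739, 123131),
    "264": (123243, 123590),
    "265": (123633, 123794),
}

# Reverse-strand genes: the start codon faces the neighbour on the right.
print(gap_bp(cds["262"][1], cds["263"][0]))  # gp262: 195
print(gap_bp(cds["263"][1], cds["264"][0]))  # gp263: 111
print(gap_bp(cds["264"][1], cds["265"][0]))  # gp264: 42
# Forward-strand gp257: the start codon faces gp256 on the left.
print(gap_bp(cds["256"][1], cds["257"][0]))  # gp257: -29
```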
CDS complement (123243 - 123590) /gene="264" /product="gp264" /function="hypothetical protein" /locus tag="PumpkinSpice_264" /note=Original Glimmer call @bp 123590 has strength 14.44; Genemark calls start at 123590 /note=SSC: 123590-123243 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 6.48102E-79 GAP: 42 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -4.801303705573801, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675371,100.0,6.48102E-79 SIF-HHPRED: SIF-Syn: NKF (Pham 10814); the upstream gene has NKF and belongs to pham 58111, the downstream gene has NKF and belongs to pham 17523. The same order of genes with the corresponding phams is observed in phage BoomerJR (BE), for which the gene belonging to pham 17523 encodes an Lsr2-like DNA bridging protein. /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: The gene was called by Glimmer and GeneMark, which agreed on the start site 123590. /note=Phamerator: The gene belongs to Pham 10814 (4/23/21) and is conserved in other BE phages, such as Battuta and Birchlyn. /note=Starterator: Start site 15, which corresponds to 123590 in PumpkinSpice, was manually annotated in 30/66 non-draft genomes in Starterator and was predicted by Glimmer and GeneMark. /note=Coding Potential: Reasonable coding potential in the reverse direction lies within the putative ORF and is covered by the chosen start site. /note=SD (Final) Score: -4.801, which is the best SD score. /note=Gap/overlap: The 42 bp gap is reasonable and results in the longest ORF. /note=Location call: In consideration of the above evidence, this is a real gene, and start site 123590 seems to be most likely. /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-value 3e-59). The top hits for NCBI are also hypothetical proteins (e-values below 2.79e-78, identity > 99.13%, 100% coverage).
There are no informative CDD or HHpred hits that support a function call. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: The evidence presented by Michelle is extremely thorough and I agree with her interpretation of it. This gene is indeed real and currently has the most likely start site. CDS complement (123633 - 123794) /gene="265" /product="gp265" /function="hypothetical protein" /locus tag="PumpkinSpice_265" /note=Original Glimmer call @bp 123794 has strength 5.4; Genemark calls start at 123794 /note=SSC: 123794-123633 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp006 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 2.05667E-30 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.922, -5.186767100221219, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp006 [Streptomyces phage Karimac] ],,YP_009840179,100.0,2.05667E-30 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 18014), downstream gene is an Lsr2-like DNA bridging protein (pham 3538), just like in phages StarPlatinum (BE2) and Wofford (BE2). /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site as 123794. /note=Phamerator: This gene is in pham 17523 as of 4/23/21. This pham is conserved in Bordeaux (BE) and BoomerJR (BE). /note=Starterator: Start site 1 is manually annotated in 66/66 non-draft genes in this pham. Start 1 is 123794 in PumpkinSpice. Glimmer and GeneMark agree with this start site. /note=Coding Potential: In the reverse direction, there is no coding potential within this region in Host-Trained GeneMark, but there is good coding potential within this region in Self-Trained GeneMark. /note=SD (Final) Score: -5.187. This is not the best SD score, but this start site gives the most reasonable gap.
/note=Gap/overlap: 9 bp. This is a reasonable gap size and is conserved in other phages with this gene (LukeCage and StarPlatinum). There is also no coding potential between the two genes. /note=Location call: This gene is real and most likely starts at 123794 bp, where Glimmer and GeneMark also agree; this start site includes all of the coding potential. /note=Function call: No known function. The top three non-draft phages from PhagesDB BLAST hits have no known function, with e-values of 7e-25. The top three NCBI BLAST hits are also of no known function, with the highest hits having 100% coverage, 100% identity, and an e-value of 2e-30. CDD and HHpred did not return any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. Start site 123794 is the most likely choice. CDS complement (123804 - 124316) /gene="266" /product="gp266" /function="Lsr2-like DNA bridging protein" /locus tag="PumpkinSpice_266" /note=Original Glimmer call @bp 124316 has strength 8.12; Genemark calls start at 124316 /note=SSC: 124316-123804 CP: yes SCS: both ST: SS BLAST-Start: [Lsr2-like DNA bridging protein [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 1.07085E-116 GAP: 115 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.969, -3.1725816037952645, yes F: Lsr2-like DNA bridging protein SIF-BLAST: ,,[Lsr2-like DNA bridging protein [Streptomyces phage Starbow] ],,AXH66518,99.4118,1.07085E-116 SIF-HHPRED: Lsr2 ; Lsr2,,,PF11774.9,52.9412,99.6 SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both Glimmer and GeneMark call 124316. /note=Phamerator: Pham 3538 as of 5/7/2021. The gene displays synteny, as it is conserved in phage Bordeaux. /note=Starterator: Starterator calls start site 9 for this gene at 124316, which is also the most frequently
manually annotated start site for this gene. /note=Coding Potential: Strong coding potential was observed in Self-Trained GeneMark; only very weak coding potential signals were observed in Host-Trained GeneMark. /note=SD (Final) Score: -3.173. This is the best final score on PECAAN. /note=Gap/overlap: 115 bp. This is a large gap, but it is conserved in other phages within this pham. /note=Location call: Based on the evidence presented above, I believe that this gene is real and has the correct start site of 124,316 in the reverse direction. /note=Function call: Lsr2-like DNA bridging protein /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. For the gap, I would mention some example phages that show this gap. I would also give the specific number of non-draft genes that were manually annotated with this start site. Also, the Starterator drop-down menu has not been checked yet!
CDS complement (124432 - 125154) /gene="267" /product="gp267" /function="hypothetical protein" /locus tag="PumpkinSpice_267" /note=Original Glimmer call @bp 125154 has strength 5.3; Genemark calls start at 125154 /note=SSC: 125154-124432 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_7 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s1 100.0% 1.75218E-178 GAP: 103 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.865, -4.954064444624791, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_7 [Streptomyces phage IchabodCrane] ],,QFP97324,100.0,1.75218E-178 SIF-HHPRED: SIF-Syn: Pham 30103 (as of 5/26/2021), upstream is Pham 6734 with NKF (as of 5/26/2021), downstream is an Lsr2-like DNA bridging protein (Pham 3538 as of 5/26/2021), just like in phages Wipeout, IchabodCrane, and Battuta /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 125154 bp. /note=Phamerator: Pham number 30103 as of 5/17/2021. The gene is conserved in phages Birchlyn, BoomerJR, and Genie2, all belonging to the same subcluster BE2 as PumpkinSpice. /note=Starterator: Start site 26 in Starterator was manually annotated in 142/247 non-draft genes in this pham. Start 26 is 125154 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -4.954. This is the third best final score on PECAAN, and compared to the candidates with the better scores, this start site has the longest ORF and smallest gap. /note=Gap/overlap: There is a 103 bp gap that is relatively large, but this is ultimately reasonable as this is the smallest gap amongst the gene candidates. 
This gene is also conserved in several other phages that show the same gap, such as Birchlyn, BoomerJR, and Genie2. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 125154. Starterator data agrees with both Glimmer and GeneMark. /note=Function call: The function is unknown. The top hits for PhagesDB BLAST have an unknown function (E-value < 1e-140), and the top hits for NCBI BLAST are also hypothetical proteins or proteins of unknown function (100% coverage, E-value < 2e-178, and 99.17%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: All the evidence presented seems accurate. Jennifer was very thorough in interpreting the outputs for the various pieces of evidence that were collected. I agree with Jennifer's decision that this gene is real and has the correct start site. Note: Please fill out the PECAAN notes with all the evidence in your annoNB 5/26/2021 - Addressed! CDS complement (125258 - 125422) /gene="268" /product="gp268" /function="hypothetical protein" /locus tag="PumpkinSpice_268" /note=Original Glimmer call @bp 125422 has strength 6.56; Genemark calls start at 125422 /note=SSC: 125422-125258 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 9.22393E-29 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.986, -4.780746793811868, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp.
JV178] ],,WP_180304116,100.0,9.22393E-29 SIF-HHPRED: SIF-Syn: NKF (pham 6734), upstream is pham 54470 and downstream is pham 30103, just as in Bordeaux (BE2) and IchabodCrane (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call the start at 125422. /note=Phamerator: Pham 6734 as of 04/21/2021. It is conserved; found in Bordeaux (BE2) and IchabodCrane (BE2). /note=Starterator: Only one start site; it was manually annotated in 30 out of 30 non-draft genes in this pham and called 100% of the time when present. Start 11 is 125422 in PumpkinSpice. This evidence agrees with GeneMark and Glimmer. /note=Coding Potential: High coding potential in both Self-Trained and Host-Trained GeneMark in the first frame of the reverse strand. /note=SD (Final) Score: -4.781 with a Z-score of 1.986. These are the best options given by PECAAN. /note=Gap/overlap: Gap of 55 bp, which is the smallest gap provided by PECAAN; this start gives the longest ORF (LORF). The gap is conserved in other phages such as Bordeaux. /note=Location call: Given the above evidence, the gene is real and the start site is 125422. Starterator agrees. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other genes with NKF. No hits with a function were found; notably, few similar sequences exist at all. HHpred and CDD gave no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS complement (125478 - 125618) /gene="269" /product="gp269" /function="hypothetical protein" /locus tag="PumpkinSpice_269" /note=Original Glimmer call @bp 125618 has strength 11.78; Genemark calls start at 125618 /note=SSC: 125618-125478 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp.
JV178] ],,NCBI, q1:s1 100.0% 3.35426E-24 GAP: 132 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.713, -3.1816499351312086, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304117,100.0,3.35426E-24 SIF-HHPRED: SIF-Syn: NKF (Pham 54470); the upstream gene has NKF and belongs to pham 6734, the downstream gene has NKF and belongs to pham 45824. The same order of genes with the corresponding phams is observed in phage Battuta (BE). /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: The gene was called by Glimmer and GeneMark, which agreed on the start site 125618. /note=Phamerator: The gene belongs to Pham 54470 (4/23/21) and is conserved in other BE phages, such as Bordeaux and IchabodCrane. /note=Starterator: Start site 9, which corresponds to 125618 in PumpkinSpice, was manually annotated in 28/28 non-draft genomes in Starterator and was predicted by Glimmer and GeneMark. /note=Coding Potential: Reasonable coding potential within the putative ORF in the reverse direction, covered by the chosen start site. /note=SD (Final) Score: -3.182, which is the best SD score. /note=Gap/overlap: The 132 bp gap is reasonable and conserved in other phages like MindFlayer. /note=Location call: In consideration of the above evidence, this is a real gene and the most likely start site is 125618. /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-value 1e-21). The top hits for NCBI are also hypothetical proteins (e-values below 2e-23, identity > 97.8%, 100% coverage). There are no informative CDD or HHpred hits that support a function call. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS complement (125751 - 126254) /gene="270" /product="gp270" /function="hypothetical protein" /locus tag="PumpkinSpice_270" /note=Original Glimmer call @bp 126254 has strength 12.39; Genemark calls start at 126254 /note=SSC: 126254-125751 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 7.87835E-118 GAP: 130 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -3.30700010583914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970938,100.0,7.87835E-118 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 54470), downstream gene is NKF (pham 18063), just like in phages TomSawyer (BE2) and Bordeaux (BE2). /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site as 126254. /note=Phamerator: This gene is in pham 45824 as of 4/23/21. This pham is conserved in LilMartin (BE) and IchabodCrane (BE). /note=Starterator: Start site 3 is manually annotated in 56/62 non-draft genes in this pham. Start site 3 is 126254. Glimmer and GeneMark agree with this start site. /note=Coding Potential: In the reverse direction, both Host-Trained GeneMark and Self-Trained GeneMark show good coding potential within this region. /note=SD (Final) Score: -3.307. This is the best SD score, and this start site includes all coding potential. /note=Gap/overlap: 130 bp. This is a large gap, but it is conserved in phages with this gene (Karimac and LukeCage). There is also no coding potential between these genes. /note=Location call: This gene is real and most likely starts at 126254, which includes all coding potential and is called by both Glimmer and GeneMark. /note=Function call: No known function. The top three non-draft PhagesDB BLAST hits have no known function, with e-values of 6e-93.
The top three NCBI BLAST hits also have no known function, with the top hits having 99% coverage, 100% identity, and an e-value of 8e-118. CDD and HHpred did not call any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with Clara. The evidence for gene realness and for the start site is very strong. Please check the Starterator box. CDS complement (126385 - 126573) /gene="271" /product="gp271" /function="hypothetical protein" /locus tag="PumpkinSpice_271" /note=Original Glimmer call @bp 126573 has strength 10.51; Genemark calls start at 126573 /note=SSC: 126573-126385 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BATTUTA_12 [Streptomyces phage Battuta] ],,NCBI, q1:s1 100.0% 5.73321E-36 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.577, -3.4645021069762962, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BATTUTA_12 [Streptomyces phage Battuta] ],,QRI45702,100.0,5.73321E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both Glimmer and GeneMark call start site 126,573 /note=Phamerator: Pham 18063 as of 5/7/2021; Gene is conserved in phage Battuta /note=Starterator: Starterator calls start site 2 @126573 for this gene. This is also the most frequently manually annotated start site in this pham, at 20/22 non-draft genes. /note=Coding Potential: Strong coding potential is observed in both Host- and Self-trained GeneMark. /note=SD (Final) Score: -3.465; this is the best final score for this gene. /note=Gap/overlap: 106 bp; a large gap that is conserved in other phages. No additional gene is needed. /note=Location call: Based on the evidence above, this does appear to be a real gene. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found.
/note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: I agree with the evidence presented by Erick for gene realness and start site. Please check the Starterator box. CDS complement (126680 - 127033) /gene="272" /product="gp272" /function="ParB-like nuclease domain" /locus tag="PumpkinSpice_272" /note=Original Glimmer call @bp 127033 has strength 14.73; Genemark calls start at 127033 /note=SSC: 127033-126680 CP: yes SCS: both ST: SS BLAST-Start: [ParB-like nuclease domain protein [Streptomyces phage Starbow] ],,NCBI, q1:s3 100.0% 8.13866E-82 GAP: 126 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.567, -3.5660483139923818, no F: ParB-like nuclease domain SIF-BLAST: ,,[ParB-like nuclease domain protein [Streptomyces phage Starbow] ],,AXH66524,98.3193,8.13866E-82 SIF-HHPRED: ParB domain protein nuclease; ParB-N, pnob8, partition, HYDROLASE; HET: MSE, CIT; 2.45A {Sulfolobus solfataricus},,,5K5D_C,62.3932,99.1 SIF-Syn: Gene is a ParB-like nuclease domain protein (Pham 64554 as of 5/26/2021), upstream is Pham 17191 with NKF (as of 5/26/2021), downstream is Pham 18063 with NKF (as of 5/26/2021), just like in phages MindFlayer, StarPlatinum, and Wipeout /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 127033 bp. /note=Phamerator: Pham number 63945 as of 5/17/2021. The gene is conserved in phages BoomerJR, Genie2, and Yaboi, all belonging to the same subcluster BE2 as PumpkinSpice. /note=Starterator: Start site 18 in Starterator was manually annotated in 45/102 non-draft genes in this pham. Start 18 is 127033 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.566. This is the second-best final score on PECAAN.
The start site with the best final score has the shortest ORF and the largest gap. /note=Gap/overlap: There is a 126 bp gap, which is relatively large but ultimately reasonable because it is the second-smallest gap among the gene candidates. This gene is also conserved in several other phages, such as BoomerJR, Genie2, and Yaboi, and the gap is seen in those phages as well. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 127033. Starterator data agrees with both Glimmer and GeneMark. /note=Function call: It is a ParB-like nuclease domain protein. The top hits for PhagesDB BLAST have the function of a ParB-like nuclease domain protein (E-value < 6e-66), and the top hits for NCBI BLAST also have the function of a ParB-like nuclease domain protein (100% coverage, E-value < 9e-82, and 100% identity). HHpred had multiple hits for ParB-like nuclease domain protein, with the best hit having 99.15% probability, 62.3932% coverage, and an E-value of 4.3e-10. CDD had a specific hit to the ParB-like nuclease domain with an e-value of 1.86e-04, a non-specific hit to the ParB-like N-terminal domain with an E-value of 1.50e-07, and a superfamily hit to the ParB N-terminal domain and sulfiredoxin protein-related families. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Wobig, Nathan /note=Secondary Annotator QC: For gap/overlap, you listed start site 127039, but for everything else you list 127033. If this is supposed to be 127033 as well, then I agree with all the evidence! 5/26/2021 - Addressed!
CDS complement (127160 - 127303) /gene="273" /product="gp273" /function="membrane protein" /locus tag="PumpkinSpice_273" /note=Original Glimmer call @bp 127303 has strength 8.8; Genemark calls start at 127303 /note=SSC: 127303-127160 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.46289E-22 GAP: 117 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.165, -4.85287563363746, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_180304118,100.0,3.46289E-22 SIF-HHPRED: SIF-Syn: NKF (pham 17191), upstream is pham 18168 and downstream is pham 58194, just as in Bordeaux (BE2) and Battuta (BE2); however, in PumpkinSpice, pham 21034 lies between pham 17191 and pham 18168 and does not appear in other phages. /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call at 127303 /note=Phamerator: pham: 17191. Date 04/21/2021. It is conserved; found in Bordeaux (BE2) and Battuta (BE2) /note=Starterator: Only one start site (Start site 6). Manually annotated in 30 out of 30 non-draft genes in this pham and called 100% of the time when present. Start 15 is 127303 in PumpkinSpice. This evidence agrees with GeneMark and Glimmer. /note=Coding Potential: High coding potential for GeneMark Self and Host in the first frame of the reverse strand. /note=SD (Final) Score: The final score is -4.853 with a z score of 2.165. These are the best pair of values provided by PECAAN. /note=Gap/overlap: The gap is 117 bp, which is considerably smaller than the other two suggested gaps. It is the LORF. /note=Location call: Given the above evidence, the gene is real with start site 127303. Starterator agrees. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other NKF genes. No good hits with functions were found (hits with functions had e-values > 1). CDD and HHpred gave no useful hits. /note=Transmembrane domains: TMHMM and TOPCONS both predict 1 TMD.
/note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been thoroughly considered and properly interpreted. CDS complement (127421 - 127582) /gene="274" /product="gp274" /function="hypothetical protein" /locus tag="PumpkinSpice_274" /note=Original Glimmer call @bp 127642 has strength 11.39; Genemark calls start at 127639 /note=SSC: 127582-127421 CP: no SCS: both-cs ST: NA BLAST-Start: [hypothetical protein SEA_BIRCHLYN_272 [Streptomyces phage Birchlyn]],,NCBI, q1:s21 100.0% 1.94907E-29 GAP: 114 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.556, -3.859592806742189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_272 [Streptomyces phage Birchlyn]],,QDF17390,72.6027,1.94907E-29 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 17191), downstream gene is NKF (pham 18618), just like in phage Wipeout (this sequence of genes is seen in only 1 other phage in BE2). /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Glimmer and GeneMark called different start sites (Glimmer: 127642; GeneMark: 127639; a 3 bp difference). Both start sites include all coding potential. /note=Phamerator: This gene is in pham 18618 as of 4/23/21. This pham is conserved in JimJam (BE2) and Karimac (BE2). /note=Starterator: Start site 1 is manually annotated in 1/28 phages, while start site 3 is found in PumpkinSpice and is called in 27/28 manually annotated phages. Start site 3 is the most likely start site according to Starterator and is 127582 in PumpkinSpice. /note=Coding Potential: In the reverse direction, both Host-Trained GeneMark and Self-Trained GeneMark show high coding potential within this region for both start sites. /note=SD (Final) Score: The SD score for the Starterator start site (3) is -3.860 and the Z-score is 2.556. These are the best SD score and Z-score (better than those of the auto-annotated start sites).
/note=Gap/overlap: The gap is 114 bp, which is relatively large. However, this gap is conserved in IchabodCrane and LukeCage. /note=Location call: Based on the evidence above, this gene is real. However, the start site is not one of the auto-annotated start sites (Glimmer and GeneMark) but 127582. This start site does not include all of the coding potential, but it includes most of it. /note=Function call: No known function. The top three non-draft PhagesDB BLAST hits have no known function, with e-values of 5e-26. The top three NCBI BLAST hits also have no known function, with the top hits having 72% coverage, 100% identity, and an e-value of 2e-29. Neither CDD nor HHpred called any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. Even though the start site was not called by Glimmer and GeneMark, it does appear to be conserved in BE2 phages based on Starterator.
CDS complement (127697 - 128035) /gene="275" /product="gp275" /function="hypothetical protein" /locus tag="PumpkinSpice_275" /note=Original Glimmer call @bp 128035 has strength 5.23; Genemark calls start at 128035 /note=SSC: 128035-127697 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_15 [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 3.59287E-77 GAP: 113 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -2.5588120788329394, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_15 [Streptomyces phage Wipeout] ],,QGH74262,100.0,3.59287E-77 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both Glimmer and GeneMark call start site 128,035 /note=Phamerator: Pham 5540 as of 5/7/2021; Gene is conserved in phage Bordeaux /note=Starterator: Starterator calls start site 9 @128035; this is also the most frequently manually annotated start site for this gene, called in 49/54 non-draft genes. /note=Coding Potential: Strong coding potential was observed in both host- and self-trained GeneMark. /note=SD (Final) Score: -2.559; this was the best final score on PECAAN. /note=Gap/overlap: 113 bp; although this is a large gap, no additional gene is needed because the gap is conserved in other phages of the pham. /note=Location call: According to the evidence presented above, this gene does appear to be real, with a start site @128035. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains found. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Erick`s conclusion that the gene is real, and I also agree with his called start site. I think the evidence is quite compelling for this particular start site. Please fill out the Starterator drop-down menu.
CDS complement (128149 - 128415) /gene="276" /product="gp276" /function="hypothetical protein" /locus tag="PumpkinSpice_276" /note=Original Glimmer call @bp 128415 has strength 8.77; Genemark calls start at 128415 /note=SSC: 128415-128149 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STARBOW_17 [Streptomyces phage Starbow] ],,NCBI, q1:s1 100.0% 2.32491E-55 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.19, -4.354778061539587, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STARBOW_17 [Streptomyces phage Starbow] ],,AXH66528,100.0,2.32491E-55 SIF-HHPRED: SIF-Syn: Pham 8995 (as of 5/26/2021), upstream is Pham 47622 with NKF (as of 5/26/2021), downstream is Pham 5540 with NKF (as of 5/26/2021), just like in phages TomSawyer, Birchlyn, and Genie2 /note=Primary Annotator Name: Nguyen, Jennifer /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 128415 bp. /note=Phamerator: Pham number 8995 as of 5/17/2021. The gene is conserved in phages BoomerJR, Battuta, and MindFlayer, all belonging to the same subcluster BE2 as PumpkinSpice. /note=Starterator: Start site 11 in Starterator was manually annotated in 30/30 non-draft genes in this pham. Start 11 is 128415 in PumpkinSpice. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.355. This is the best final score on PECAAN. /note=Gap/overlap: There is a 34 bp gap, which is reasonable as this is the smallest gap among the gene candidates. This gene is also conserved in several other phages, such as BoomerJR, Battuta, and MindFlayer, and the gap is seen in those phages as well. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 128415.
Starterator data agrees with both Glimmer and GeneMark. /note=Function call: The function is unknown. The top hits for PhagesDB BLAST have an unknown function (E-value < 2e-43), and the top hits for NCBI BLAST also have an unknown function or are hypothetical proteins (100% coverage, E-value < 3e-55, and 95.45%+ identity). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. This does not meet the evidence requirements to call it a membrane protein. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with the call that this gene is real, and I agree with the start site Jennifer called. I think the evidence provided is compelling for both of these calls! CDS complement (128450 - 128734) /gene="277" /product="gp277" /function="hypothetical protein" /locus tag="PumpkinSpice_277" /note=Genemark calls start at 128728 /note=SSC: 128734-128450 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 3.02723E-63 GAP: 114 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.653, -3.368377069717189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_099970943,100.0,3.02723E-63 SIF-HHPRED: SIF-Syn: NKF (Pham 47622); the upstream gene has NKF and belongs to pham 8995, the downstream gene has NKF and belongs to pham 13649. The same order of genes with the corresponding phams is observed in phage IchabodCrane (BE). /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: GeneMark called start site 128728. Glimmer did not make a call. /note=Phamerator: The gene belongs to Pham 47622 (5/1/21) and is conserved in other BE2 phages, such as Bordeaux and Battuta. /note=Starterator: Start site 4, which corresponds to 128734 in PumpkinSpice, was manually annotated in 18/26 non-draft genomes in Starterator. It was not called by Glimmer or GeneMark.
/note=Coding Potential: Reasonable coding potential within the putative ORF in the reverse direction, covered by the chosen start site. /note=SD (Final) Score: -3.368, which is the best SD score. /note=Gap/overlap: The 114 bp gap is reasonable and results in the longest ORF. It is conserved in other phages like Bordeaux. /note=Location call: In consideration of the above evidence, this is a real gene and the most likely start site is 128734. /note=Function call: NKF. The top two hits in PhagesDB have unknown functions (e-value 1e-49). The top hits for NCBI are also hypothetical proteins (e-values below 5.47e-63, identity > 98.93%, 100% coverage). There are no informative CDD or HHpred hits that support a function call. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: I agree with this annotation! Please remember to fill out the PECAAN notes with the evidence from your notebook! CDS complement (128849 - 129478) /gene="278" /product="gp278" /function="hypothetical protein" /locus tag="PumpkinSpice_278" /note=Original Glimmer call @bp 129478 has strength 9.93; Genemark calls start at 129478 /note=SSC: 129478-128849 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB80_gp019 [Streptomyces phage Karimac] ],,NCBI, q1:s1 100.0% 1.89074E-152 GAP: 146 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.556, -3.50848394673489, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB80_gp019 [Streptomyces phage Karimac] ],,YP_009840192,100.0,1.89074E-152 SIF-HHPRED: SIF-Syn: NKF, upstream is NKF (pham 47622), downstream is NKF (pham 9696), just like in phages Battuta (BE2) and Genie2 (BE2). /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site as 129478. /note=Phamerator: This gene is in pham 13649 as of 4/23/21.
This pham is conserved in Birchlyn (BE) and Battuta (BE). /note=Starterator: Start site 7 is manually annotated in 30/30 non-draft genes in this pham. Start site 7 is 129478. Glimmer and GeneMark agree with this start site as well. /note=Coding Potential: Both Self-Trained and Host-Trained GeneMark showed good coding potential in the reverse direction within this region. /note=SD (Final) Score: -3.508. This is the best SD score. /note=Gap/overlap: 146 bp. This is a relatively large gap between genes, but it is conserved in other phages that possess these genes (Karimac and MindFlayer). There is also no coding potential in the gap. /note=Location call: Based on the evidence above, this gene is real and starts at 129478. /note=Function call: No known function. The top three non-draft PhagesDB BLAST hits have no known function, with e-values of 1e-120. The top three NCBI BLAST hits also have no known function, with the top hits having 100% coverage, 100% identity, and an e-value of 2e-152. CDD and HHpred did not call any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call.
The gene is real and start site 129478 is the most likely choice. CDS complement (129625 - 129921) /gene="279" /product="gp279" /function="hypothetical protein" /locus tag="PumpkinSpice_279" /note=Original Glimmer call @bp 129921 has strength 7.28; Genemark calls start at 129921 /note=SSC: 129921-129625 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MINDFLAYER_271 [Streptomyces phage MindFlayer]],,NCBI, q1:s15 100.0% 1.35791E-64 GAP: 78 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.94, -7.028301864955391, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MINDFLAYER_271 [Streptomyces phage MindFlayer]],,QPL13869,87.5,1.35791E-64 SIF-HHPRED: SIF-Syn: /note=AF: Start #5 most called but not in PumpkinSpice. Start 6 (129921) called 36/86 times (90% when present) /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both Glimmer and GeneMark call start site 129,921 /note=Phamerator: Pham 9696 as of 5/7/2021; Gene conserved in phage Bordeaux /note=Coding Potential: Strong coding potential observed in both Host- and Self-trained GeneMark. /note=SD (Final) Score: -7.028. This was not the best final score on PECAAN. /note=Gap/overlap: 78 bp; a sizeable gap, but not large enough to require adding a gene. /note=Location call: Although some of the evidence points to this not being the correct start site, this gene is definitely real and does appear to be at the correct start site, in agreement with Glimmer and GeneMark. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains were found. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Erick`s call that the gene is real. Because PumpkinSpice lacks the most frequently annotated start site and uses the second most frequently annotated one, I also agree with Erick`s start site call. Please fill out the Starterator drop-down menu.
CDS complement (130000 - 130155) /gene="280" /product="gp280" /function="hypothetical protein" /locus tag="PumpkinSpice_280" /note=Original Glimmer call @bp 130155 has strength 9.83; Genemark calls start at 130155 /note=SSC: 130155-130000 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_ICHABODCRANE_20 [Streptomyces phage IchabodCrane] ],,NCBI, q1:s2 100.0% 1.32324E-28 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.017, -4.717114992838803, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICHABODCRANE_20 [Streptomyces phage IchabodCrane] ],,QFP97337,98.0769,1.32324E-28 SIF-HHPRED: SIF-Syn: NKF (pham 25254), upstream is pham 16711 and downstream is pham 9696, just as in Bordeaux (BE2) and Battuta (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: Glimmer and GeneMark call at 130155 /note=Phamerator: pham: 25254. Date 04/21/2021. It is conserved; found in Bordeaux (BE2) and Battuta (BE2) /note=Starterator: Start site 14 in Starterator was manually annotated in 40 out of 48 non-draft genes in this pham and called 89.7% of the time when present. Start 14 is 130155 in PumpkinSpice. This evidence agrees with GeneMark and Glimmer. /note=Coding Potential: High coding potential for GeneMark Self in the third frame of the reverse strand. No coding potential for GeneMark Host. /note=SD (Final) Score: -4.717 with a z score of 2.017. These are not the best values, but the gene is most likely part of an operon, so these values are less important. /note=Gap/overlap: -1, which is consistent with patterns commonly observed in operons. /note=Location call: Given the above evidence, the gene is real with start site 130155. Starterator agrees. /note=Function call: Both NCBI and PhagesDB BLAST gave strong hits only to other NKF genes. No hits with functions were found. CDD and HHpred gave no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs.
/note=Secondary Annotator Name: Zorawik, Michelle /note=Secondary Annotator QC: I have reviewed all the evidence and agree with the location call. The gene is real and start site 130155 is the most likely choice. CDS complement (130155 - 130403) /gene="281" /product="gp281" /function="hypothetical protein" /locus tag="PumpkinSpice_281" /note=Genemark calls start at 130403 /note=SSC: 130403-130155 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 6.52352E-54 GAP: 199 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.94, -6.9496166720535335, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675279,100.0,6.52352E-54 SIF-HHPRED: SIF-Syn: NKF (Pham 16711); the upstream gene has NKF and belongs to pham 25254, the downstream gene has NKF and belongs to pham 16871. The same order of genes with the corresponding phams is observed in phage Wipeout (BE). /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: GeneMark called start site 130403. Glimmer did not make a call. /note=Phamerator: The gene belongs to Pham 16711 (5/1/21) and is conserved in other BE2 phages, such as Birchlyn. /note=Starterator: Start site 5, which corresponds to 130403 in PumpkinSpice, was manually annotated in 22/28 non-draft genomes in Starterator. It was also called by GeneMark. /note=Coding Potential: Reasonable coding potential within the putative ORF in the reverse direction, covered by the chosen start site. /note=SD (Final) Score: -6.950, which is not the best SD score but is reasonable given that the gene is part of an operon. /note=Gap/overlap: The 199 bp gap is large but results in the longest ORF and is conserved in other phages like Birchlyn and Battuta. /note=Location call: In consideration of the above evidence, this is a real gene and the most likely start site is 130403. /note=Function call: NKF.
The top two hits in PhagesDB have unknown functions (e-value 5e-45). The top hits for NCBI are also hypothetical proteins (e-values below 5.1e-52, identity > 96.34%, 100% coverage). There are no informative CDD or HHpred hits that support a function call. /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Bang, Clara /note=Secondary Annotator QC: I agree with Michelle`s conclusion that this gene is real, and I also agree with her start site call. I think the evidence above is compelling enough to support these conclusions. CDS complement (130603 - 130827) /gene="282" /product="gp282" /function="hypothetical protein" /locus tag="PumpkinSpice_282" /note=Original Glimmer call @bp 130827 has strength 4.47; Genemark calls start at 130827 /note=SSC: 130827-130603 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWB79_gp023 [Streptomyces phage LukeCage] ],,NCBI, q1:s1 100.0% 7.13974E-45 GAP: 4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.879, -5.064680329293257, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWB79_gp023 [Streptomyces phage LukeCage] ],,YP_009839948,95.9459,7.13974E-45 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 16711), downstream gene is NKF (pham 12245), just like in phages Wipeout (BE2) and TomSawyer (BE2). /note=Primary Annotator Name: Bang, Clara /note=Auto-annotation start source: Both Glimmer and GeneMark called the start site as 130827. /note=Phamerator: This gene is in pham 16871 as of 4/23/21. This pham is conserved in Starbow (BE) and StarPlatinum (BE). /note=Starterator: Start site 11 is manually annotated in 22/30 non-draft genes in this pham. Start site 11 is 130827. Glimmer and GeneMark agree with this start site.
/note=Coding Potential: In the reverse direction, Host-Trained GeneMark showed no coding potential within this region; however, Self-Trained GeneMark showed good coding potential. /note=SD (Final) Score: -5.065. It is not the best final score on PECAAN, but this start site captures all coding potential with a reasonable gap. /note=Gap/overlap: 4 bp. This is a reasonable gap between two genes and is conserved in other phages that have this gene (MindFlayer and TomSawyer). There is also no coding potential between the two genes, so no additional gene needs to be added. /note=Location call: Based on the evidence above, this gene is real and starts at 130827. /note=Function call: No known function. The top three non-draft PhagesDB BLAST hits have no known function, with e-values of 9e-36. The top three NCBI BLAST hits also have no known function, with the top hits having 93% coverage, 100% identity, and an e-value of 7e-45. CDD and HHpred did not call any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: I agree with this annotation. All evidence has been extensively reviewed and considered.
CDS complement (130832 - 131008) /gene="283" /product="gp283" /function="hypothetical protein" /locus tag="PumpkinSpice_283" /note=Original Glimmer call @bp 131008 has strength 8.22; Genemark calls start at 131008 /note=SSC: 131008-130832 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_WIPEOUT_277 [Streptomyces phage Wipeout] ],,NCBI, q1:s1 100.0% 1.10618E-33 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.587, -3.5077455217481304, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_WIPEOUT_277 [Streptomyces phage Wipeout] ],,QGH74484,100.0,1.10618E-33 SIF-HHPRED: SIF-Syn: /note=Start 6 @ 131008 called 34 MAs, 86.4% of time when present /note=Primary Annotator Name: Lopez, Erick /note=Auto-annotation start source: Both GeneMark and Glimmer call the start site at 131,008. /note=Phamerator: Pham 12245 as of 5/7/2021; the gene is conserved in phage MindFlayer. /note=Coding Potential: Coding potential was observed only in Self-Trained GeneMark. /note=SD (Final) Score: -3.508. This was the best final score. /note=Gap/overlap: -11 bp. This is a reasonable overlap, and it appears to be conserved in other phages such as BoomerJR and MindFlayer. /note=Location call: According to the evidence gathered above, this appears to be a real gene. /note=Function call: NKF /note=Transmembrane domains: No transmembrane domains were found. /note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. NOTE: the Starterator drop-down menu has not been checked yet! 
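The gap/overlap values in these records follow directly from the CDS coordinates: for two neighboring features sorted by genome position, the gap is the lower coordinate of the right-hand feature minus the upper coordinate of the left-hand feature, minus one, and a negative result is an overlap. A minimal sketch, assuming 1-based inclusive coordinates as written in the CDS lines; `gap_between` is a hypothetical helper, not a PECAAN or Phamerator function.

```python
def gap_between(left_cds, right_cds):
    """Gap (positive) or overlap (negative), in bp, between two CDS features.

    Each feature is a (lower, upper) tuple of 1-based, inclusive genome
    coordinates, and right_cds lies at higher coordinates than left_cds.
    Strand does not matter for this calculation.
    """
    _, left_upper = left_cds
    right_lower, _ = right_cds
    # Count the bases strictly between the two features; a negative value
    # means the two features share that many bases.
    return right_lower - left_upper - 1


# (lower, upper) coordinates copied from the CDS lines in this section.
gp282 = (130603, 130827)
gp283 = (130832, 131008)
gp284 = (130998, 131210)

print(gap_between(gp282, gp283))  # 4
print(gap_between(gp283, gp284))  # -11
```

These results reproduce the "GAP: 4 bp gap" recorded for gp282 and the "GAP: -11 bp gap" recorded for gp283.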
CDS complement (130998 - 131210) /gene="284" /product="gp284" /function="hypothetical protein" /locus tag="PumpkinSpice_284" /note=Genemark calls start at 131210 /note=SSC: 131210-130998 CP: yes SCS: genemark ST: NI BLAST-Start: [hypothetical protein SEA_MINDFLAYER_24 [Streptomyces phage MindFlayer] ],,NCBI, q1:s1 100.0% 2.44443E-42 GAP: 57 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.79, -3.099505556436281, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MINDFLAYER_24 [Streptomyces phage MindFlayer] ],,QPL13665,100.0,2.44443E-42 SIF-HHPRED: SIF-Syn: NKF (pham 1854); the upstream gene is pham 55565, but more discrepancies consistently arise downstream, just as in Bordeaux (BE2) and Battuta (BE2) /note=Primary Annotator Name: Wobig, Nathan /note=Auto-annotation start source: GeneMark calls the start at 131210, but Glimmer does not call this gene. /note=Phamerator: Pham 1854 as of 04/21/2021. It is conserved and found in Bordeaux (BE2) and Battuta (BE2). /note=Starterator: Start site 15 in Starterator was manually annotated in 69 out of 64 non-draft genes in this pham and called 91.0% of the time when present. Start 15 is 131210 in PumpkinSpice. This evidence agrees with GeneMark (Glimmer made no call). /note=Coding Potential: Medium coding potential in Self-Trained GeneMark in the second frame of the reverse strand, especially when alternative coding potential is considered. No coding potential in Host-Trained GeneMark. /note=SD (Final) Score: -3.100, with a Z-score of 2.79. These are the best values offered by PECAAN. /note=Gap/overlap: A gap of 620 bp, which is very large. /note=Location call: Given the above evidence, the gene is real, with start site 131210. Starterator agrees. /note=Function call: Both NCBI and PhagesDB BLAST returned strong hits only to other NKF genes. No hits with functions were found. CDD and HHpred gave no useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. 
/note=Secondary Annotator Name: Nguyen, Jennifer /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS complement (131268 - 131453) /gene="285" /product="gp285" /function="hypothetical protein" /locus tag="PumpkinSpice_285" /note= /note=SSC: 131453-131268 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_BIRCHLYN_23 [Streptomyces phage Birchlyn] ],,NCBI, q1:s7 100.0% 5.09524E-34 GAP: 377 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.46, -6.452925062527533, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BIRCHLYN_23 [Streptomyces phage Birchlyn] ],,QDF17406,91.0448,5.09524E-34 SIF-HHPRED: SIF-Syn: /note=Most other genes in the related pham are 186 bp long. This call includes all coding potential. CDS 131831 - 132292 /gene="286" /product="gp286" /function="HNH endonuclease" /locus tag="PumpkinSpice_286" /note=Original Glimmer call @bp 131990 has strength 4.54; Genemark calls start at 131831 /note=SSC: 131831-132292 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein [Streptomyces sp. JV178] ],,NCBI, q1:s1 100.0% 2.43095E-109 GAP: 377 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.556, -3.649482460397077, no F: HNH endonuclease SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. JV178] ],,WP_143675281,100.0,2.43095E-109 SIF-HHPRED: restriction endonuclease PacI; HNH restriction endonuclease, beta-beta-alpha-metal active site, 8 base-pair rare cutter, HYDROLASE-DNA complex; HET: SO4; 1.92A {Pseudomonas alcaligenes},,,3M7K_A,65.3595,99.0 SIF-Syn: NKF (Pham 55565); the upstream gene has NKF and belongs to pham 1854, and there is no downstream gene. The same order of genes with the corresponding phams is observed in phage MindFlayer (BE), in which pham 55565 functions as an HNH endonuclease. /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: Glimmer called start site 131990. GeneMark called start site 131831. 
/note=Phamerator: The gene belongs to Pham 55565 (5/1/21) and is conserved in other BE phages, such as Karimac. /note=Starterator: Start site 14, which corresponds to 131831 in PumpkinSpice, was manually annotated in 62/66 non-draft genomes in Starterator. It was also called by GeneMark. /note=Coding Potential: There is reasonable coding potential in the forward direction within the putative ORF, and it is covered by the start site predicted by GeneMark but not by the one predicted by Glimmer. /note=SD (Final) Score: -3.649, which is the best SD score. /note=Gap/overlap: The 620 bp gap is large but reasonable, since there is a switch from a reverse to a forward gene. This start results in the longest ORF and is conserved in other phages such as LukeCage. /note=Location call: In consideration of the above evidence, this is a real gene and the most likely start site is 131831. /note=Function call: HNH endonuclease. The top PhagesDB hits are HNH endonucleases (e-value 7e-90). The top NCBI hits are also HNH endonucleases (e-values below 3.86e-102, identity > 95.3%, 100% coverage). There are no informative CDD hits, but the top HHpred hits also suggest restriction endonuclease functions (e-values below 3.3e-8, probability > 98.7, coverage > 55.5%). /note=Transmembrane domains: There is no evidence to suggest that this is a membrane protein. Neither TMHMM nor TOPCONS predicted any TMDs. /note=Secondary Annotator Name: Lopez, Erick /note=Secondary Annotator QC: I agree with this annotation. All evidence has been properly considered. Note: please fill out the PECAAN notes with the information from your annoNB.
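The CDS lines give each gene as (lower - upper) on both strands, while the SSC field lists coordinates in start-stop order, so a complement gene such as gp285 starts at its higher coordinate (SSC: 131453-131268). A minimal sketch of deriving start, stop, length and protein length from a CDS line, assuming 1-based inclusive coordinates and that the stop codon is included in the CDS (the usual GenBank convention); `describe_cds` is a hypothetical helper.

```python
def describe_cds(lower, upper, complement):
    """Summarize a CDS given 1-based, inclusive genome coordinates.

    Returns (start, stop, length_bp, protein_length_aa). For a complement
    (reverse-strand) gene, the start codon sits at the higher coordinate.
    """
    length_bp = upper - lower + 1
    if length_bp % 3 != 0:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    start, stop = (upper, lower) if complement else (lower, upper)
    # Subtract one codon for the stop codon, which is not translated.
    protein_aa = length_bp // 3 - 1
    return start, stop, length_bp, protein_aa


print(describe_cds(131268, 131453, complement=True))   # gp285
print(describe_cds(131831, 132292, complement=False))  # gp286
```

For gp285 this gives a 186 bp ORF starting at 131453, consistent with the note that most genes in the related pham are 186 bp long; gp286 comes out at 462 bp and 153 amino acids.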