CDS 757 - 1125 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="Brunswick_1" /note=Original Glimmer call @bp 757 has strength 7.9; Genemark calls start at 757 /note=SSC: 757-1125 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_1 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.05248E-82 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.319, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_1 [Arthrobacter phage Niktson] ],,ASD52226,100.0,1.05248E-82 SIF-HHPRED: SIF-Syn: NKF protein in pham 59721 is flanked by an HNH endonuclease, just like in phage DevitoJr and Breylor17. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site at position 757. The start codon is ATG. /note=Coding Potential: There is reasonable typical and atypical coding potential within the putative ORF predicted by GeneMarkS. GeneMark Host also predicts some typical coding potential. The chosen start site at position 757 covers all of this coding potential. /note=SD (Final) Score: The SD score is -1.993, which is the best score and predicts the best sequence match. The Z-score is 3.319 and is the only one >2. /note=Gap/overlap: There is no gap or overlap since this is the first predicted gene. The length of this predicted gene is acceptable (369 bp) using the predicted start site. Alternative predicted start sites make the gene too short (<120 bp). /note=Phamerator: This gene is in Pham 59721 as of 01/06/22. Our phage is in subcluster AU1, and there are 21 non-draft genomes in this subcluster that also have this pham. Phages Breylor17, DevitoJr, ElephantMan, Gordon, Nightmare, Tatanka, and Truckee were used for comparison. Phamerator did not have a function called for this gene, but the function for this same gene in other members of this subcluster was either HNH endonuclease or one of its domain proteins. 
/note=Starterator: Start site 19 is conserved among other members of the pham to which this gene belongs. 36/109 non draft genes in this pham call this site. Brunswick does not have this start site, but calls start site 9 at position 757. It has 28 manual annotations and is called 100% of the time when present. 28/109 non draft genes in this pham call this site and most of them are in the AU cluster. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 9 at basepair position 757 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggested function is unknown, with high query coverage (100%), high % identity (100%), and low e-values (<6e-62). CDD had no hits. One HHpred hit had high probability of 85.64%, okay coverage of 54.0984%, but the e-value of 13 did not meet the <10e-3 threshold. There is no suggested function. /note=Transmembrane domains: Since TMHMM and TOPCONS didn’t call at least 1 TMD, we can conclude that this protein doesn’t have any TMDs. /note=Secondary Annotator Name: Montoya Serpas, Cinthya /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 1118 - 1525 /gene="2" /product="gp2" /function="HNH endonuclease" /locus tag="Brunswick_2" /note=Original Glimmer call @bp 1118 has strength 5.08; Genemark calls start at 1118 /note=SSC: 1118-1525 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease domain protein [Arthrobacter phage Gordon] ],,NCBI, q1:s1 100.0% 1.05808E-95 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.232, -2.7645180226256008, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease domain protein [Arthrobacter phage Gordon] ],,YP_009603463,100.0,1.05808E-95 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,57.037,98.7 SIF-Syn: /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site at 1118 bp. Glimmer assigns a score of 5.08. The start codon is GTG. /note=Coding Potential: Coding potential was found in both GeneMark Self and Host, though GeneMark Self has a higher coding potential than Host. Regardless, both show reasonable coding potential and the chosen start site captures all coding potential on a forward ORF (ORF 2). /note=SD (Final) Score: The final score of -2.765 is the best final score listed in PECAAN. Its corresponding Z-score is 3.232. /note=Gap/overlap: It has an overlap of 8 bp with the previous gene and a gap of 20 bp with the next gene. Both overlap and gap are of reasonable size. Furthermore, an overlap with the previous gene and gap with the next gene are seen in other phages that have synteny with this gene and phage. /note=Phamerator: As of 01/06/2022, it is part of pham 56995. It is conserved in phages Breylor17, CapnMurica, DevitoJr, and CastorTray, which are all also part of subcluster AU1 with Brunswick. Many genes in this pham encode HNH endonuclease and HNH endonuclease domain protein. 
/note=Starterator: There are 108 non-draft genes in pham 56995, and 46 call start number 28, which is the most often called start site. This corresponds to a start site at 1118 bp for Brunswick. /note=Location call: Based on the evidence above, including coding potential, SD (final) score, gap/overlap size, and phamerator and starterator analysis, this gene is a real gene and has a start site at 1118 bp. We can see that this start site is called in Starterator, GeneMark, and Glimmer. /note=Function call: Many PhagesDB BLAST hits suggest this gene’s function is an HNH endonuclease domain protein with small e-values of 8e-77. HNH endonuclease is also a hit with very small e-values (4e-75). HHPRED had a hit for an HNH endonuclease with a 98.7% probability, 57.037% coverage, and e-value of 7.2e-8. NCBI BLAST also has multiple hits for HNH endonuclease and HNH endonuclease domain protein. HNH endonuclease’s top hits have 98.5185% identity, 99.2593% alignment, 100% coverage, and e-value of 6.19551e-95. There are also several other hits for HNH endonuclease with very small e-values. HNH endonuclease domain protein’s top hits have 99.2593% identity, 100% alignment, 100% coverage, and e-value of 1.05808e-95. CDD had no relevant hits. Specifically, HNH endonuclease is an accepted SEA-PHAGES function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. As a result, this is not a membrane protein. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: All the evidence supports the location and function call of the gene. There seem to be no issues with any of this work. Great job on the detailed notes. 
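The GAP values recorded in the SSC notes can be cross-checked from the CDS coordinates alone. The following is a minimal Python sketch, assuming 1-based inclusive coordinates on the forward strand; the function names are illustrative and are not part of PECAAN, DNA Master, or any other tool named in these notes.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - prev_stop - 1


def gene_length_bp(start: int, stop: int) -> int:
    """Length in bp of a feature with 1-based inclusive coordinates."""
    return stop - start + 1


# (start, stop) pairs copied from the CDS lines for genes 1-3.
genes = [(757, 1125), (1118, 1525), (1546, 2670)]

# Walk consecutive gene pairs and report start, gap to upstream gene, and length.
for (_, prev_stop), (start, stop) in zip(genes, genes[1:]):
    print(start, gap_bp(prev_stop, start), gene_length_bp(start, stop))
```

Under these assumptions gene 2 comes out at -8 (an 8 bp overlap with gene 1) and gene 3 at 20, which agrees with the GAP fields recorded for those genes.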
CDS 1546 - 2670 /gene="3" /product="gp3" /function="endolysin" /locus tag="Brunswick_3" /note=Original Glimmer call @bp 1546 has strength 6.94; Genemark calls start at 1546 /note=SSC: 1546-2670 CP: yes SCS: both ST: SS BLAST-Start: [lysin A [Arthrobacter phage StarLord]],,NCBI, q1:s1 99.7326% 1.23806E-171 GAP: 20 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.308, -2.033982896655645, yes F: endolysin SIF-BLAST: ,,[lysin A [Arthrobacter phage StarLord]],,QFG14375,77.9841,1.23806E-171 SIF-HHPRED: d.118.1.1 (A:1-157) N-acetylmuramoyl-L-alanine amidase PlyG {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]},,,d1yb0a1,41.7112,99.8 SIF-Syn: Lysin A, N-acetylmuramoyl-L-alanine amidase domain, upstream gene is HNH endonuclease and downstream gene is a NKF, just like in phage Breylor17. /note=(calling just as "endolysin" because no other lysin genes are present, and therefore we should not specify domain -AF) /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 1546. /note=Coding Potential: On GeneMark Host, There is typical coding potential starting around 1660 bp and going until about 1930 bp. It then picks up again around 2100 bp and goes until 2300 bp. It then has coding potential on the reverse frame from 2300 to about 2670. On GeneMark Self, there is atypical coding potential from about 1700 bp to 2000 bp and then again from 2075 bp to 2670 bp. /note=SD (Final) Score: -2.034 /note=Gap/overlap: 20 bp gap /note=Phamerator: This is listed in pham 84661. Date 1/6/22. /note=Starterator: Start site 14 in Starterator was manually annotated in 23 out of 39 non-draft genes. Start 14 is 1546 in Brunswick. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 1546 /note=Function call: Lysin A, N-acetylmuramoyl-L-alanine amidase domain. Many phagesdb BLAST hits have the function of lysin A with low e-values and high scores. Many NCBI BLAST hits also have this result. 
The best hit in CDD lists the function as an N-acetylmuramoyl-L-alanine amidase. That CDD hit had the lowest e-value at 3.33499e-16 and one of the better coverages at 33.4225%. The best HHpred hit also lists the function as an N-acetylmuramoyl-L-alanine amidase. That hit has the lowest e-value at 4.5e-18, the highest probability at 99.8%, and one of the better coverages at 41.7112%. Therefore, the function that makes the most sense is Lysin A, N-acetylmuramoyl-L-alanine amidase domain. /note=Transmembrane domains: TMHMM and TOPCONS don't call any transmembrane domains. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Agree with location, function calls. Very comprehensive notes. 
CDS 2667 - 2834 /gene="4" /product="gp4" /function="membrane protein" /locus tag="Brunswick_4" /note=Genemark calls start at 2691 /note=SSC: 2667-2834 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein SEA_BREYLOR17_4 [Arthrobacter phage Breylor17] ],,NCBI, q3:s4 96.3636% 1.73645E-27 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.516, -7.77805343054184, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_BREYLOR17_4 [Arthrobacter phage Breylor17] ],,AXH43749,94.6429,1.73645E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: No Glimmer start site is called. The GeneMark start site that is called is 2691. The start codon is ATG. /note=Coding Potential: The third bar of the Host-Trained GeneMark has reasonable coding potential within this range. The chosen start site does cover all of this coding potential. In GeneMarkS, the atypical coding potential (red dotted line) appears to have reasonable coding potential, while the typical coding potential (black line) has about 20% of its value (y-axis) and begins after the auto-annotated start site. Note that the chosen (changed) start site also covers all of the coding potential. 
/note=SD (Final) Score: The SD score is the second smallest (best) at -5.474, after one call with a SD score of -4.934. However, this call has a gap of 116 compared with our auto-annotated start site’s gap of 20, making our call more credible. The z-score of our call is highest at 2.017, making the auto-annotated call credible. The chosen start call has a z-score of 0.516 and a final score of -7.778: see Location Call for notes. /note=Gap/overlap: The gap of this gene with its auto-annotated start site is 20, which is relatively reasonable. There is one call with a greater (better) SD score, although its gap of 116 makes it less credible. The length of the gene is 143 bp, which is greater than 120 bp and reasonable. The chosen start site has a start of 2667 which contains the LORF and has a compelling gap of -4, but it has a very low z-score at 0.516 and a lower (worse) SD score of -7.778. The auto-annotated call has the second-longest ORF. /note=Phamerator: As of Jan 7th, 2022, the pham called is 11900. Our phage is of cluster AU, and 14 other non-draft phages of cluster AU (and only cluster AU) contain similar length genes with this pham in this position (gene 4/5), including phages Breylor17, CastorTray, Caterpillar, DevitoJr, and Giantsbane. None of these genes have called functions. /note=Starterator: Starterator was run on 1/1/22. Start 1 was called 66.7% of the time of genes in this pham (12/18), including in 12/14 non-draft genes, and is conserved, while start 3 was called 42.9% of the time present, for 6/18 phages including Brunswick. Start 1 is at position 2667. /note=Location call: Taken together, the gathered evidence suggests that this is a real gene. While the suggested start site was 2691, I am changing it to 2667 because this start site is highly conserved in Starterator and has a gap of -4. This is more compelling than the higher z-score of the auto-annotated start site. There do not appear to be any missed genes in the Pham Maps. 
/note=Function call: PhagesDB BLASTp contains no results of significant e-value with a known function. NCBI BLASTp contains no results of a significant e-value with a known function. CDD returned no results. HHpred also does not return results. However, there is evidence of a transmembrane domain, so this gene is called with the function of membrane protein. /note=Transmembrane domains: TMHMM calls 1 TMH and TOPCONS calls 1 TMD. /note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: 1. remember to check boxes for phagesDB and NCBI BLASTp; 2. I think this should be a membrane protein since one TMD is called by each of TMHMM and TOPCONS; 3. I'm not sure if you should change the start site because, although start site 1 is more conserved, start 3 is also called. At the same time, the Z-score for the auto-annotated one is a lot higher, and a gap of 20 bp shouldn't be a problem. **changed 1, 2 
CDS 2831 - 3223 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Brunswick_5" /note=Original Glimmer call @bp 2768 has strength 5.86; Genemark calls start at 2831 /note=SSC: 2831-3223 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_5 [Arthrobacter phage Teacup]],,NCBI, q1:s35 99.2308% 1.27678E-83 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.179, -4.371954607757725, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_5 [Arthrobacter phage Teacup]],,ASR84011,74.2515,1.27678E-83 SIF-HHPRED: SIF-Syn: NKF; upstream gene is NKF, downstream gene is NKF, as in phage Breylor17 and CastorTray. /note=Primary Annotator Name: Di Blasi, Daria /note=Auto-annotation: Both Glimmer and GeneMark call the gene but they do not agree on the start site. Glimmer calls the start site at 2768 and GeneMark calls the start site at 2831. The start codon is GTG for both of these starts. /note=Coding Potential: Coding potential on the forward strand only, indicating that this is a forward gene. 
Both GeneMark Self and GeneMark Host show coding potential. GeneMark Self shows coding potential starting from start site 2768. GeneMark Host shows coding potential starting from start site 2831. /note=SD (Final) Score: -6.549 is the SD Score for the 2768 start site, which is less favorable than the -4.372 SD Score of the 2831 start site, suggesting that start site 2831 is the real start site of the gene. /note=Gap/overlap: Start site 2768 has an overlap of 67 bp with the upstream gene and start site 2831 has an overlap of 4 bp with the upstream gene, which provides evidence that start site 2831 is the real start site of the gene. Start site 2831 produces a gene of length 393 bp, which is reasonable. /note=Phamerator: The gene is part of pham 3203 as of January 6th, 2022. The pham has 116 members, 5 of which are draft genomes (including Brunswick). Pham 3203 is present in 26 phages in the AU cluster such as in Breylor17 (AU) and Caterpillar (AU). /note=Starterator: The highly conserved start site (start site 16) is not present in the Brunswick genome. The most annotated site is called in 60 of the 109 non-draft genes in the pham. No manual annotations of start site 2768 exist to date but there are 4 manual annotations of start site 2831. Additionally, start site 2831 is called 31.2% of the time when it is present and the other phages that call this start site all belong to subcluster AU1 like Brunswick. All of this evidence suggests that start site 2831 is the real start site for the gene. /note=Location call: Based on all the evidence, the start site of the gene is likely start site 2831 since this start is called by GeneMark, produces the smallest overlap with the upstream gene (4 bp), has a favorable SD score (-4.372), has a Z-score of 2.179, which is greater than 2 (while start site 2768 has a Z-score of 1.135, which is less than 2), has been manually annotated 4 times, and the 5 phages that call this start site all belong to subcluster AU1 like Brunswick. 
/note=Function call: NKF; All phagesdb BLASTp hits were of unknown function, all NCBI BLASTp hits were hypothetical proteins, and there were no relevant CDD hits or HHpred hits (all HHpred hits had E-values > 18), therefore there is evidence to support that the gene product has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: I agree with your call that this is a real gene that starts at 2831 bp and your rationale is sound. Good work! 
CDS 3229 - 3687 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="Brunswick_6" /note=Original Glimmer call @bp 3229 has strength 6.36; Genemark calls start at 3229 /note=SSC: 3229-3687 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_GIANTSBANE_6 [Arthrobacter phage Giantsbane]],,NCBI, q1:s1 100.0% 1.88462E-83 GAP: 5 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.798, -3.1591032187135273, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GIANTSBANE_6 [Arthrobacter phage Giantsbane]],,QGZ17210,88.1579,1.88462E-83 SIF-HHPRED: SIF-Syn: NKF, the upstream gene is NKF in pham 3203 and downstream gene is a terminase, just like in phages Niktson and Breylor17. /note=Glania-- Checked evidence boxes for phagesDB blast /note= /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark call this gene's start site at 3229 (methionine). /note=Coding Potential: There is some reasonable coding potential on the Host-Trained GeneMark, and there is coding potential on the Self-Trained GeneMark. However, the suggested start site does not cover all of the coding potential. /note=SD (Final) Score: The suggested start site has the second-best Final Score (-3.159). This is still a reasonable Final Score, especially because it minimizes the gap when compared to all other potential start sites. 
/note=Gap/overlap: The gap of 5 bp between this gene and the upstream gene is reasonable. This is the smallest gap out of all of the potential start sites. /note=Phamerator: As of 01/17/2022, this gene is in pham 96486. This pham is conserved amongst the AU cluster. Phages and genes used for comparison were: Breylor17_6, CapnMurica_6, and CastorTray_7. No function was called in Phamerator or the phages database. /note=Starterator: The start site called most frequently in non-draft genes is site 11, which corresponds to bp 3229 in Brunswick. It was called in 39/58 non-draft genes. /note=Location call: Given the evidence, I would say this is a real gene. With the evidence I can gather, I would say the auto-annotated start site is correct because all of the other potential start sites create a very large gap or overlap and are not as well conserved. The auto-annotated start site also has a reasonable Final Score. /note=Function call: The top 5 hits, sorted by E-value, from NCBI Blast with low E-values (1.66e-79), high coverage (100%), and high identity (69%) all suggest this gene has an unknown function. HHpred and CDD were uninformative. /note=Transmembrane domains: Neither TMHMM nor TOPCONS calls any TMDs for this gene. /note=Secondary Annotator Name: Teoh, Bryan /note=Secondary Annotator QC: 
CDS 3704 - 5443 /gene="7" /product="gp7" /function="terminase" /locus tag="Brunswick_7" /note=Original Glimmer call @bp 3704 has strength 6.35; Genemark calls start at 3704 /note=SSC: 3704-5443 CP: yes SCS: both ST: SS BLAST-Start: [terminase [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 0.0 GAP: 16 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.308, -2.0949393225970705, yes F: terminase SIF-BLAST: ,,[terminase [Arthrobacter phage Niktson] ],,ASD52231,100.0,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,90.1554,100.0 SIF-Syn: Terminase. 
Highly conserved Pham regions across all phages within the same subcluster. /note=Glania-- Unchecked draft phages for phages DB blast /note= /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer & GeneMark both call the gene at 3704 F. The start codon is ATG (methionine). /note=Coding Potential: High coding potential as shown on GenemarkS via PhagesDB.org. The Host trained Genemark has coding potential though the peaks are not as apparent. Still, both are enough to suggest that the gene is real. The synteny maps of Brunswick and the finalized phages Darby and ElephantMan are identical, further indicating that this gene is real. /note=SD (Final) Score: Has a high Z score of 3.308 as well as a final score of -2.09, which is the least negative (best) final score listed on PECAAN. /note=Gap/overlap: 16 bp gap in ORF area. Host trained Genemark suggests there is a slight overlap in the ending of Gene 6 and the beginning of Gene 7. /note=Phamerator: Belongs to Pham 14601. Shares same pham organization as other final draft phages in the same subcluster. Synteny suggests that common start site is conserved in all genes within this pham. Drop down of the Phamerator location will be switched to SS in light of this evidence. /note=Starterator: Start site is conserved in 99.1% of the phages in this subcluster. Brunswick shares the common start site at bp #8, suggesting that this is the suggested start site for Gene 7. Selected gene does have all GM coding capacity as checked off in PECAAN. /note=Location call: Gene 7 has a start site at 3704 and a stop site at 5443 in the forward direction. Starterator and Phamerator suggest highly conserved start sites and synteny compared with finalized phage genes of the same subcluster. 16 bp gap is due to slight overlap in the ending of Gene 6, not enough to warrant any concern. /note=Function call: All BLASTp hits had 0 e value for terminase functionality. All phages within the subcluster had this reading. 
Part of terminase superfamily COG4626. HHpred also verifies high probability of terminase functional call based on high coverage amounts in hits 6Z6D_A and 6M5V_A. /note=Transmembrane domains: No recorded TMHs. Suggests that this gene does not have any interaction with host cell membrane features. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: Done. Very detailed! 
CDS 5440 - 6291 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="Brunswick_8" /note=Original Glimmer call @bp 5440 has strength 12.57; Genemark calls start at 5440 /note=SSC: 5440-6291 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_7 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.073, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_7 [Arthrobacter phage Niktson] ],,ASD52232,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: The gene was called by both Genemark and Glimmer with the same start site of 5440. The start codon is ATG at this location. /note=Coding Potential: There is good coding potential observed in the putative ORF and the start site covers all the coding potential. /note=SD (Final) Score: The final score is -2.584, which is the best available (least negative). The Z-score seems to be irrelevant because the gene appears to be in an operon. /note=Gap/overlap: There is an overlap of 4 bp between this gene and the gene upstream. This start site creates the longest ORF. The length of the gene is 852 bp, which is acceptable (greater than 120 bp). /note=Phamerator: The gene is in pham 93867 as of 1/1/22. There are many phages in which the gene is conserved, many of which are in Cluster AU with Brunswick. These include Breylor17, CapnMurica, CastorTray, and many more as well as many phages in Cluster DJ, and a few other clusters as well. 
While highly conserved, it seems the function is unknown for this gene. /note=Starterator: Start site 25 (corresponding to base pair 5440) was called in 46/68 genes in the pham, and 41 of 62 manual annotations are of this start. It is called 100% of the time when present, by phages in clusters DJ and AU (AU1 and AU2). It appears to be an agreeable start site at bp 5440. /note=Location call: It appears this is a real gene with a start site of 5440. In addition to covering all coding potential, this start site is supported by Starterator as it is called 100% of the time when present and has been manually annotated many times. It also results in the largest ORF and has the best Z-score and Final Score. /note=Function call: The function of this gene is unknown. While there were strong hits among the NCBI BLASTp, HHpred, PhagesDB, and CDD results, none provided evidence of function. /note=Transmembrane domains: There are no TMDs based on TMHMM and TOPCONS evidence. /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: All evidence presented strongly supports function and location call of this gene. 
CDS 6288 - 6869 /gene="9" /product="gp9" /function="membrane protein" /locus tag="Brunswick_9" /note=Original Glimmer call @bp 6288 has strength 14.51; Genemark calls start at 6288 /note=SSC: 6288-6869 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_9 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 1.64855E-129 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.139, -4.473132615753581, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_9 [Arthrobacter phage Teacup]],,ASR84015,98.4456,1.64855E-129 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both GeneMark and Glimmer agree on 6288 as the suggested start site. The site number that was called is #9. The predicted start codon is ATG. 
/note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 3), but not GeneMark Host (none of the Tracks show coding potential at this region). There is reasonable coding potential predicted within the ORF and over the start site, 6288. /note=SD (Final) Score: The SD score is -4.473, this is not the best SD score; there are SD scores of -6, but with bad Z-scores of less than 2. In this specific case, the predicted SD and Z score combination (-4.473 and 2.139 respectively) may not be relevant for a ribosome binding site, as the predicted gap is actually a 4bp overlap, which can indicate this gene is organized in an operon. /note=Gap/overlap: The overlap of 4bp is reasonable for this gene if it is part of an operon. This gene length prediction is also the longest ORF, so it is acceptable given the auto annotated start site. /note=Phamerator: This gene is found in Pham 21467 as of 1/07/22. This pham is also found in other members of the same cluster, AU, that Brunswick is in. Some of the phages that I’ve compared to Brunswick are Breylor17, CapnMurica, and CastorTray. As of now, I have not seen a function call for this gene. /note=Starterator: The start number called the most often in the published annotations is 1, it was called in 23 of the 23 non-draft genes in the pham. The start site number for this gene corresponds to the 6288 base pair position. /note=Location call: Based on all the evidence collected so far from Glimmer, Genemark and Starterator, the agreed upon start site is 6288bp. Start number 1 was also the most annotated start number for this gene and other phage genes that are in the AU cluster. /note=Function call: The top 4 NCBI BLASTp hits, with E-values lower than 1e-120, suggest that this gene’s function is a hypothetical protein, with high query coverage (>99%), high % identity (>90.10%). /note=The top 10 hits from PhagesDB BlastP yielded E-values all lower than e-100, and suggest the gene’s function is unknown. 
/note=Neither CDD nor HHpred led to informative results. /note=Transmembrane domains: There was only 1 TMH hit on TMHMM, and zero hits on TOPCONS. This means that the lack of data cannot serve as evidence for the gene. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: Great notes! I agree with all of the evidence presented. 
CDS 6872 - 7312 /gene="10" /product="gp10" /function="membrane protein" /locus tag="Brunswick_10" /note=Original Glimmer call @bp 6872 has strength 10.95; Genemark calls start at 6872 /note=SSC: 6872-7312 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_10 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 9.51906E-101 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.927, -4.9157003279346805, no F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_10 [Arthrobacter phage Niktson] ],,ASD52235,100.0,9.51906E-101 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is NKF, downstream is NKF, just like in phage DevitoJr (as of 1/7/22). /note=Primary Annotator Name: Montoya, Cinthya /note=Auto-annotation: The start site is 6872 per Glimmer and GeneMark. /note=Coding Potential: There is coding potential present that spans over the putative ORF of this gene in both the self-trained and host-trained GeneMark reports. /note=SD (Final) Score: The SD score for start site 6872 is -4.916. Although this is the 3rd least negative value, it is the most reasonable score that results in the LORF. Its corresponding Z-score of 1.927 is close enough to the desirable threshold of >2. /note=Gap/overlap: There is a 2 bp gap between the putative gene and the gene upstream. This is a very small gap and appears to be conserved in other members of the same cluster (DevitoJr, CapnMurica, Synepsis). /note=Phamerator: pham: 56375 as of 1/7/22. This gene is conserved in members of cluster AU1 (DevitoJr, CapnMurica, Synepsis). 
There is no function call for the corresponding genes in the previously mentioned phages. /note=Starterator: Start site 13 is found in 17/28 (60.7%) members of pham 56375. Start site 13 was manually annotated 11 times for cluster AU1 and called 100% of the time when present. Start site 13 corresponds to basepair coordinate 6872, which is consistent with the auto-annotated start site by both Glimmer and GeneMark. /note=Location call: Based on the evidence presented, this is a real gene with the most reasonable start site being 6872. /note=Function call: None of the phage hits with a reasonable e-value (10e-6 or better) in PhagesDB BLASTp or NCBI BLASTp has a function call. The first two hits have an identity of 100% when compared to this gene. The presence of strong e-values in both programs further confirms the claim that this is a real gene. Additionally, there are no significant hits in HHPRED due to their extremely high e-values (90-1400) and there are no significant hits in the Conserved Domain Database. /note=Transmembrane domains: TMHMM predicts one TMH with a probability of >80%. Likewise, there is one TMH predicted by TOPCONS with a probability of 90-100%. Thus, there is sufficient evidence to call the function of this protein membrane protein. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Everything looks good! I agree with the location and function calls. Great job! 
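The SSC summary notes pack the start, stop, gap, Z-score and final score into one string, so a small parser makes the values easier to compare across genes. The following is a minimal Python sketch. The field layout is inferred from the SSC notes in this file and is an assumption, not a documented PECAAN format; `parse_ssc` and the dictionary keys are hypothetical names. The order of the two RBS numbers (Z-score, then final score) follows the gene 10 notes, which report a final score of -4.916 and a Z-score of 1.927.

```python
import re

# Layout inferred from the SSC notes in this file (assumption, not a spec):
# "SSC: start-stop ... GAP: n bp gap ... RBS: <setting>, <setting>, <Z>, <final>, ..."
SSC_PATTERN = re.compile(
    r"SSC:\s*(?P<start>\d+)-(?P<stop>\d+)"
    r".*?GAP:\s*(?P<gap>-?\d+)\s*bp gap"
    r".*?RBS:\s*(?P<settings>[^,]+,\s*[^,]+),"
    r"\s*(?P<z_score>-?[\d.]+),\s*(?P<final_score>-?[\d.]+)"
)


def parse_ssc(note: str) -> dict:
    """Extract coordinates, gap and RBS scores from one SSC summary note."""
    match = SSC_PATTERN.search(note)
    if match is None:
        raise ValueError("no SSC summary found in note")
    return {
        "start": int(match["start"]),
        "stop": int(match["stop"]),
        "gap": int(match["gap"]),
        "settings": match["settings"],
        "z_score": float(match["z_score"]),
        "final_score": float(match["final_score"]),
    }


# SSC note copied from the gene 10 record.
note = (
    "SSC: 6872-7312 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein "
    "NIKTSON_10 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 9.51906E-101 "
    "GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.927, "
    "-4.9157003279346805, no F: membrane protein"
)

result = parse_ssc(note)
print(result["start"], result["gap"], result["z_score"], round(result["final_score"], 3))
```

For gene 10 the parsed Z-score is 1.927, just under the >2 threshold discussed in that gene's SD (Final) Score note.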
CDS 7315 - 7851 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="Brunswick_11" /note=Original Glimmer call @bp 7315 has strength 12.56; Genemark calls start at 7315 /note=SSC: 7315-7851 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_11 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 4.94964E-126 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.388, -4.701320725655571, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_11 [Arthrobacter phage Niktson] ],,ASD52236,100.0,4.94964E-126 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site @ 7315. /note=Coding Potential: Gene has reasonable coding potential predicted within the putative ORF. The chosen start site does cover the coding potential shown in Host-Trained GeneMark. Typical and atypical overlap according to Self-Trained GeneMark. /note=SD (Final) Score: -4.701; two other suggested start sites had a slightly better final score but either their Z-score or ORF length were not as favorable as the suggested start site with a final score of -4.701. At this start site, the Z-score is 2.388. /note=Gap/overlap: Gap with preceding gene is reasonable, only 2 base pairs. Start site that produces this gap also produces the longest ORF. /note=Phamerator: Gene found in Pham 19179 as of 01/06/22. 15 other non-draft genes are present in this pham that belong to cluster AU, including phage Breylor17 that also shares the same length as this gene. /note=Starterator: The suggested start site for this gene was the start-site called most often, in 37 of 77 non-draft genes, in the pham. The gene of phage Breylor17 that shares the same cluster as this gene also calls this start site, start site number 7. For cluster AU1, start site 7 was annotated 11 times while another start site was only annotated once. 
/note=Location call: Gathered evidence suggests this is a real gene that has a start site @ 7315bp: covers all coding potential, small gap with preceding gene, longest ORF, aligns with start site of gene in the same cluster (Breylor17). /note=Function call: Function unknown. The top three hits from PhagesDB with identities of 100% and 99%, and e-values of 2e-98, 2e-98, and 8e-97, respectively, have no known functions. These three hits also come from phages within the same cluster. The NCBI database hits only consisted of hypothetical proteins. No hits in the CDD database. There was only 1 hit from the HHpred database but it was associated with Homo sapiens, so it is not relevant to this phage. /note=Transmembrane domains: No transmembrane domains. The absence of TMDs is not abnormal for a gene without a known function. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: Looks good! Notes: You do not need to note BLAST results with no known function. Also, it looks like you have not yet run Topcons (click run and check back a while later!). As well, for synteny, consider adding if the upstream/downstream/corresponding Breylor17 and CapnMurica genes are of the same pham. CDS 7890 - 9143 /gene="12" /product="gp12" /function="portal protein" /locus tag="Brunswick_12" /note=Original Glimmer call @bp 7890 has strength 11.03; Genemark calls start at 7911 /note=SSC: 7890-9143 CP: yes SCS: both-gl ST: SS BLAST-Start: [portal protein [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 0.0 GAP: 38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.153, -2.338663114094478, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Teacup]],,ASR84018,98.3213,0.0 SIF-HHPRED: Phage portal protein, HK97 family; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_A,82.0144,100.0 SIF-Syn: The synteny checks out, as the gene is in the same position, with the same function as in Breylor and CapnMurica. 
The two flanking genes are in similar positions, and the following gene has the same function across all three phages. /note=Glania-- In Starterator, start site 1 @7890 has 117 MAs, which is better evidence than for start site 2 @7911, which has only 1 MA. Coding potential for start site 7890 is sufficient in the forward direction. Also checked evidence boxes for function call. /note= /note=Primary Annotator Name: Verpukhovskiy, Philipp /note=Auto-annotation: Start is called by Glimmer at 7890, and by GeneMark at 7911. Length 1254. Start codon ATG. /note=Coding Potential: Coding potential is good, with self-trained GeneMark having sufficient coding potential in the forward direction. Real gene. /note=SD (Final) Score: -2.339, best final score on PECAAN. /note=Gap/overlap: 38 bp gap, which is not big enough to cause concern about the gene. /note=Phamerator: Pham 3084 as of 1/25/22; conserved in Breylor17, a final phage on Pham Maps. /note=Starterator: Start site 1, found in 105/109 phages. Brunswick calls this start site, which is at 7890. /note=Location call: Based on the evidence given from Starterator and Glimmer, this is a real gene with a start site of 7890. /note=Function call: Portal Protein. Blast called function with e value of zero and around 400 matches as a portal protein. NCBI called it the same with an e value of zero, and around 95% coverage. HHpred called the function as well, with e value<-30 and 100% probability. CDD calls the function as well, with e value<-30 and coverage of 76-78%. 
/note=Transmembrane domains: None /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: Glimmer and GeneMark do not agree on the start site; please clarify which start site you are using for the coding potential, SD score, phamerator, etc.; also add whether or not the SD score/overlap or gap is the best of all potential starts; add which is the most annotated start on starterator; add CDD, NCBI and phagesdb function information CDS 9165 - 11324 /gene="13" /product="gp13" /function="major capsid and protease fusion protein" /locus tag="Brunswick_13" /note=Original Glimmer call @bp 9165 has strength 8.06; Genemark calls start at 9165 /note=SSC: 9165-11324 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 0.0 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -3.461712376159538, yes F: major capsid and protease fusion protein SIF-BLAST: ,,[capsid maturation protease [Arthrobacter phage Niktson] ],,YP_010749843,99.3046,0.0 SIF-HHPRED: SIF-Syn: this gene is capsid maturation protease, upstream gene is portal protein, downstream is NKF, just like in phage Breylor17 with upstream as portal protein and downstream unknown. /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Glimmer and GeneMark both call the start at 9165, which corresponds to a start codon of methionine. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The whole coding potential is covered between the start and stop site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.462. It’s the third best Final score. The Z score is 3.211, which is the highest. Although the Final Score is not the best, this start gives the longest ORF and has the best Z score. The difference between the best Final score and this one is only 0.8. /note=Gap/overlap: The gap with the upstream gene is 21bp. 
The gap is conserved in other phages from cluster AU1. /note=Phamerator: Pham number is 14189. Recorded on 01/06/2022. This gene is in the same phamily as 116 other genes such as Aflac_22 (DJ) and Arcadia_12 (AM). /note=Starterator: Start Site 2 is the most annotated start site, with 98 out of 116 genes in this pham, and is called 99% of the time when this start site is present. Start 2 is 9165 in Brunswick. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 9165. /note=Function call: Multiple PhagesDB BLAST hits have the suggested function capsid maturation protease with an e value of 0. CDD has one relevant hit, proheadase_HK97, which is a phage prohead protease, with an E value of 6.49e-07. HHpred has multiple hits to major capsid protein with a probability of >99.5%, coverage >40%, E-score < 2.8e-11. Thus, it should be a capsid maturation protease. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: I added that the start codon is methionine. Maybe add a note as to why you still picked this start site despite it not having the best Final Score under Location Call. Other than that, it looks good! 
CDS 11334 - 11669 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="Brunswick_14" /note=Original Glimmer call @bp 11394 has strength 6.6; Genemark calls start at 11334 /note=SSC: 11334-11669 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein NIKTSON_14 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 4.52965E-73 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.909, -5.222575804379144, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_14 [Arthrobacter phage Niktson] ],,ASD52239,100.0,4.52965E-73 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Glimmer predicts a start site of 11394, a GTG codon, and GeneMark predicts a start of 11334, an ATG codon. /note=Coding Potential: Host-trained GeneMark shows some weak coding potential, though it does not reach this stop site. The self-trained coding potential shows strong coding potential completely bound by this predicted ORF, which is evidence that this is a real gene. This coding potential also extends to contain the upstream gene, which could indicate the presence of an operon. /note=SD (Final) Score: The Glimmer-predicted start site of 11394 has a final score of -4.751, which is better (less negative) than the final score of -5.223 at the GeneMark-predicted site of 11334. This is relatively strong evidence in favor of start site 11394. /note=Gap/overlap: The auto-annotated start site of 11394 has a large gap of 69bp, though the z-score is 2.495 and the final score is -4.751. An alternate predicted start site at 11334, however, has a more reasonable gap of 9bp. This start site maximizes the ORF length (336bp), though the z-score is 1.909 and the final score is -5.223, which is lower than the other start site candidate. /note=Phamerator: This gene is found in pham 14054, as of 1/6/2022. All members of subcluster AU1 are also found in this phamily, including Breylor17 and CapnMurica. 
These conserved genes all have ORF lengths of 336, which supports the predicted start site of 11334, giving an ORF length of 336 bp in Brunswick. /note=Starterator: The start site of 11334 is the most annotated start site, which is called 97.8% of the time it is present. Brunswick does not call this start site, though it is present; rather, it calls start site 6, at 11394bp, which is called 15.8% of the time when present. This is strong evidence in support of start site 1, at 11334bp. /note=Location call: The gathered evidence indicates that this is a real gene with a start site of 11334bp, which is called 97.8% of the time it is present in all phages within pham 14054. This start site also minimizes the gap to be only 9bp and maximizes the ORF to a length of 336bp, which is the conserved length among the genes within subcluster AU1. This start site allows for all coding potential to be contained within this ORF. Although this start site does not have the highest z-score or SD score, it maintains synteny, so this start site is most likely real. /note=Function call: Based on BLAST results for this sequence, all of the phages with strong alignment, indicated by very low e-values, have no known function at this time. CDD showed no significant hits, while the hits on HHpred all had high e-values and low coverage, so the similarity of these gene structures is not high enough to relate their functions. This gene has no known function at this time. /note=Transmembrane domains: Topcons and TmHmm both found no evidence of any transmembrane domains, so no membrane-protein function can be assigned and the function remains unknown. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Great job Megan! I liked your description of the start site variances. 
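The gap and ORF-length figures quoted for gene 14 can be cross-checked from the coordinates in the record headers. The sketch below is illustrative only. It assumes forward-strand genes with 1-based inclusive coordinates, and a gap convention of `start - upstream_stop - 1` (negative values meaning overlap), which matches the GAP values shown in the headers here. The function names are hypothetical and not part of PECAAN or DNA Master.

```python
def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - upstream_stop - 1


def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward ORF with 1-based inclusive coordinates."""
    return stop - start + 1


UPSTREAM_STOP = 11324  # stop coordinate of gene 13 (CDS 9165 - 11324)
STOP = 11669           # stop coordinate of gene 14

# Compare the GeneMark (11334) and Glimmer (11394) start candidates.
for start in (11334, 11394):
    print(start, gap_to_upstream(start, UPSTREAM_STOP), orf_length(start, STOP))
```

With these inputs the 11334 candidate yields a 9 bp gap and a 336 bp ORF, and the 11394 candidate yields a 69 bp gap, in line with the notes above.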
CDS 11644 - 11943 /gene="15" /product="gp15" /function="tail terminator" /locus tag="Brunswick_15" /note=Original Glimmer call @bp 11644 has strength 1.93; Genemark calls start at 11644 /note=SSC: 11644-11943 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_15 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 6.74275E-67 GAP: -26 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -3.670670413150579, no F: tail terminator SIF-BLAST: ,,[hypothetical protein NIKTSON_15 [Arthrobacter phage Niktson] ],,ASD52240,100.0,6.74275E-67 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,98.9899,98.6 SIF-Syn: Just upstream of major tail protein, so could possibly have a tail function e.g. tail terminator. Other phages have a tail terminator in the same location. /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Both Genemark and Glimmer agree on the same start site at 11644 F. Both have an ATG start site. /note=Coding Potential: No coding potential was detected in all 6 ORFs of host-trained GeneMark, however, self-trained gene mark showed strong coding potential in the putative ORF. /note=SD (Final) Score: -3.671; Start site 11644 has the best score among all other start sites /note=Gap/overlap: Overlap of -26bp denotes the smallest overlap while maintaining the coding potential with the longest ORF. /note=Phamerator: Pham 57922 at 1/1/2022. It is conserved in 114 other phages from various clusters. it is the most called gene (14) in 86 non-draft phage genomes. /note=Starterator: Ran on 1/1/2022; Start number 14 is the most called gene number in 86 non-draft phage genomes; This is manually annotated in 75 non-draft genes in this Pham. The evidence agrees with the site predicted by Glimmer and GeneMark. Pham Maps called this gene as a tail terminator gene. 
/note=Location call: Based on the evidence, this is a real gene and the appropriate start site should be 11644. /note=Function call : Tail Terminator as evident by phages DB function frequency, positive phages DB Blast hits, HHPred and NCBI Blast. They depicted high probability and coverage for this gene being a tail terminator. /note=Transmembrane domains: None Listed /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: I agree with the start site and function call, I would add a comparison to another phage in the synteny box. CDS 11955 - 12827 /gene="16" /product="gp16" /function="major tail protein" /locus tag="Brunswick_16" /note=Original Glimmer call @bp 11955 has strength 9.41; Genemark calls start at 11955 /note=SSC: 11955-12827 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 0.0 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.2364030944336752, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Niktson] ],,ASD52241,97.9381,0.0 SIF-HHPRED: Phage_TTP_1 ; Phage tail tube protein,,,PF04630.15,64.1379,99.9 SIF-Syn: right after tail terminator /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Glimmer and GeneMark agree /note=Coding Potential: Strong Coding Potential in both Host and Self-Trained GeneMark /note=SD (Final) Score: Highest Possible. Z-score is over 2. /note=Gap/overlap: 11bp. Acceptable, but on the edge of being too long /note=Phamerator: Pham 8877 as of 1/6/22. Many non-draft genes from phages in the same cluster (AU) are in it. Many with the same index gene (16). /note=Starterator: MA site 9 (11955) is manually annotated 195 times out of 234 total (206 have it). Conserved in 88% of genes in pham. /note=Location call: Gene is real by synteny, Coding Potential and Start site. Best start site is autoannotated 11955. 
/note=Function call: Top 5 non-draft calls in PhagesDB BLASTp by e-value suggest function is Major Tail Protein. Highest e-value among them is e-161. All have over 95% identities /note=Top 5 NCBI calls sorted by score suggest function is Major Tail Protein. E-values are all 0.0 with >95.52% identities and 100% coverage. /note=HHPred shows two hits with e-values below e-3. Both have functions relating to phage tail tube proteins. /note=Transmembrane domains: TmHmm calls no transmembrane domains. TOPCONS shows nothing. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: I agree with all the evidence. CDS 12853 - 13410 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="Brunswick_17" /note=Original Glimmer call @bp 12853 has strength 8.06; Genemark calls start at 12853 /note=SSC: 12853-13410 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_17 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 2.6775E-130 GAP: 25 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.308, -2.0949393225970705, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_17 [Arthrobacter phage Teacup]],,ASR84023,98.9189,2.6775E-130 SIF-HHPRED: SIF-Syn: NKF protein in pham 8098 is flanked by major tail proteins on each side, just like in phage Niktson and Breylor17. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site at position 12853. The start codon is ATG. /note=Coding Potential: There is reasonable typical and atypical coding potential within the putative ORF predicted by GeneMarkS and GeneMark Host. The chosen start site at position 12853 covers all of this coding potential. /note=SD (Final) Score: The SD score is -2.095 and predicts the best sequence match. The Z-score is 3.308 and the only one much >2. /note=Gap/overlap: The gap is 25 bp. Out of all suggested start sites, this one creates the smallest gap. 
The gap shown on PhamMaps does not appear to fit another gene and there is no coding potential present in the gap on GeneMark. /note=Phamerator: This gene is in Pham 8098 as of 01/10/22. Our phage is in subcluster AU1, and there are 21 non draft genomes in this subcluster that also have this pham. Phages Breylor17 and Truckee were used for comparison. Phamerator did not have a function called for this gene. /note=Starterator: Start site 25 is conserved among other members of the pham to which this gene belongs. 109/109 non draft genes in this pham call this site. Brunswick calls start site at position 12853. It has 109 manual annotations and is called 100% of the time when present. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 25 at basepair position 12853 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggested function is unknown, with high query coverage (~100%), high % identity (>97%), and low e-values (96%), high % identity (>52.62%), and low E-values (2e-87). These top five hits also came up as top PhagesDB BLASTp hits with low E-values (1e-125) and high coverage (>90%). CDD and HHpred were uninformative because none of the hits met the cutoff values. /note=Transmembrane domains: There is not enough evidence given by TMHMM or Topcons to indicate that this gene contains any transmembrane domains. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: I agree with your call! I think that there was a membrane protein detected by TOPCONS - it won't change your call but maybe give it a second look/include it in the PECAAN notes. Great job! 
CDS 18769 - 19596 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="Brunswick_23" /note=Original Glimmer call @bp 18769 has strength 8.14; Genemark calls start at 18769 /note=SSC: 18769-19596 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein QCN35_gp22 [Arthrobacter phage Synepsis] ],,NCBI, q1:s1 100.0% 0.0 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.319, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein QCN35_gp22 [Arthrobacter phage Synepsis] ],,YP_010750121,100.0,0.0 SIF-HHPRED: b.21.1.3 (A:162-264) receptor binding protein, rbp, C-terminal domain {Lactococcus lactis phage p2 [TaxId: 100641]} | CLASS: All beta proteins, FOLD: Virus attachment protein globular domain, SUPFAM: Virus attachment protein globular domain, FAM: Lactophage receptor-binding protein head domain,,,SCOP_d1zrua1,32.0,97.9 SIF-Syn: in minor tail cassette /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer & GeneMark both call the gene at 18769 F. The start codon is ATG (methionine). /note=Coding Potential: Has high coding potential on Self Trained Genemark, but no coding potential at all on host trained genemark. Evidence based on Self Trained Genemark points to gene being real. /note=SD (Final) Score: Highest Z score in potential candidates listed at 3.319, Final Score most optimal at -1.993. /note=Gap/overlap: 11 base pair gap. Does not appear to be in place for an operon. /note=Phamerator: Part of Pham 64569. Has been Phamerated according to PhagesDB. According to Pham map readings on PECAAN, shares conserved traits compared to phage Darby. With some phages like Breylor17, the pham is associated with the gene directly upstream. /note=Starterator: Indicates a start site at 30bp. Aligns with 65.8% of other genes within the same pham as this being the correct position. Selected gene does have all GM coding capacity as checked off in PECAAN. 
/note=Location call: Gene 23 has a start site at 18769 and ends at 19596 in the Forward direction. Evidence suggests this is a real gene based on high coding potential and consistency in pham maps when compared to phages Breylor17 and Darby. /note=Function call: Unknown gene function as confirmed by BLASTp. 572 and 447 being the top 2 scores with an extremely low e value for both hits (Synepsis having e-163 AXH46684 and CastorTray having e-125 QYC55012). BLASTp indicating hypothetical protein output. Showing low coverage on conserved domains in NCBI BLAST, but just above the threshold so I marked it as evidence. HHpred showing high possibility of receptor binding protein but low coverage. /note=Transmembrane domains: None. Suggests that this gene, though having no known function, does not have any interaction with host cell membrane features. /note=Secondary Annotator Name: Teoh, Bryan (Joey) /note=Secondary Annotator QC: Agree with annotation call after reviewing extensive evidence collection by first annotator. CDS 19606 - 19743 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="Brunswick_24" /note=Original Glimmer call @bp 19606 has strength 8.48; Genemark calls start at 19606 /note=SSC: 19606-19743 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_24 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 5.67223E-24 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.319, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_24 [Arthrobacter phage Niktson] ],,ASD52249,100.0,5.67223E-24 SIF-HHPRED: SIF-Syn: No known function of gene in pham #11614, upstream gene is unknown, downstream is unknown. While gene and downstream gene are conserved in phage Breylor17, upstream gene on Breylor17 appears to contain the two genes upstream of Brunswick's #24. 
/note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: The gene was called by Glimmer and GeneMark with the same start site of 19606, with codon ATG. /note=Coding Potential: The gene has reasonable coding potential in the putative ORF observed on Self trained GeneMark. There is coding potential before the start site, so it appears the gene is in an operon. /note=SD (Final) Score: The Final score is -2.011. It is the only possible start site, so it is good that the final score is not very negative and the Z-score (3.319) is positive. /note=Gap/overlap: There is a gap of 9 basepairs, which is reasonable. There are no other start site candidates. The length of the gene is 138 bp which is acceptable. /note=Phamerator: The pham for this gene is #11614 as of 1/6/21. There are 24 members of this pham, many of which are in the same cluster as Brunswick (AU). Phages in AU in this pham include Breylor17, CapnMurica, DevitoJr, ElephantMan. There were no known functions called for this gene. /note=Starterator: Start site #8 is found in 21/24 genes in the pham, and is called 100% of the time when present. The start site is agreeable based on this evidence. /note=Location call: All evidence supports that this is a real gene; it is conserved in Phamerator and has good coding potential. GeneMark and Glimmer both called a start site of 19606, which is the only start candidate. /note=Function call: No hits on PhagesDB BLASTp, NCBI BLASTp, HHPred, or CDD provide any evidence of a known function. /note=Transmembrane domains: No transmembrane domains predicted by either TMHMM or Topcons. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: I agree. Very good detail! 
CDS 19766 - 20149 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="Brunswick_25" /note=Original Glimmer call @bp 19766 has strength 3.93; Genemark calls start at 19766 /note=SSC: 19766-20149 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Synepsis] ],,NCBI, q1:s1 100.0% 8.63394E-86 GAP: 22 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.373, -3.9663833707737797, yes F: hypothetical protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Synepsis] ],,YP_010750123,100.0,8.63394E-86 SIF-HHPRED: SIF-Syn: No known function protein, upstream gene is NKF, downstream is a tape measure protein, just like in phage Synepsis. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both GeneMark and Glimmer agree on 19766 as the suggested start site. The site number that was called is #24. The predicted start codon is ATG. /note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 2), but not GeneMark Host (Track 1 shows vague upward hash, other tracks have little info). There is reasonable coding potential predicted within the ORF and over the start site, 19766. /note=SD (Final) Score: The SD score is -3.966, this is the best SD score; there are SD scores of -6, but with bad Z-scores of less than 2. In this specific case, the predicted SD and Z score combination (-3.966 and 2.373 respectively) seems to be relevant for a ribosome binding site, as the predicted gap is 22 bp. /note=Gap/overlap: The gap of 22 bp is reasonable for this gene. This gene length prediction is not the longest ORF, but it is acceptable given the auto annotated start site and scores. /note=Phamerator: This gene is found in Pham 19232 as of 1/07/22. This pham is also found in other members of a different cluster, DJ, that Brunswick is not in. Some of the phages that I’ve compared to Brunswick are Aflac, AlainaMarie, Arcadia and Breylor17. 
As of now, some of the comparisons have been noting the function as ‘minor tail protein.’ /note=Starterator: The start number called the most often in the published annotations is 40, it was called in 93 of the 109 non-draft genes in the pham. The start site number for this gene corresponds to the 19766 base pair position. /note=Location call: Based on all the evidence collected so far from Glimmer, Genemark and Starterator, the agreed upon start site is 19766bp. Start number 40 was also the most annotated start number for this gene and other phage genes that are in the AU cluster. /note=Function call: The top 5 NCBI BLASTp hits, with E-values lower than 5e-85, suggest that this gene’s function may be a minor tail protein, with high query coverage (100%), high % identity (>98.43%). /note=The top 9 hits from PhagesDB BlastP yielded E-values all lower than 1e-67, and suggest the gene’s function could be a minor tail protein. /note=Both the successful hits yield one high hit result, which suggests that this gene is a minor tail protein. /note=Both CDD and HHpred did not lead to informative/significant results. HHpred had high e-values. /note=Transmembrane domains: There was no TMH hit on TMHMM, and zero hits on TOPCONS. This means that the lack of data cannot serve as evidence for the gene. /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: All evidence presented strongly supports function and location call of this gene. 
CDS 20146 - 24735 /gene="26" /product="gp26" /function="tape measure protein" /locus tag="Brunswick_26" /note=Original Glimmer call @bp 20146 has strength 8.15; Genemark calls start at 20146 /note=SSC: 20146-24735 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.639, -4.255672789525901, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage Niktson] ],,ASD52251,99.7384,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,11.3146,99.9 SIF-Syn: Synteny Box: tape measure protein like in phage DevitoJr, upstream gene is minor tail protein, downstream is minor tail protein like in phage DevitoJr (as of 1/17/22). /note=Primary Annotator Name: MONTOYA, CINTHYA /note=Auto-annotation: The start site is 20146 per Glimmer and GeneMark. /note=Coding Potential: There is coding potential present that spans the putative ORF of this gene in both the self-trained and host-trained GeneMark reports. /note=SD (Final) Score: -4.256. This is the best SD score among all the options since it is the least negative number, and it is associated with a Z-score of 2.639, which is greater than the desired threshold of 2 and is the highest value among all the options. /note=Gap/overlap: There is an overlap of 4 bp with the gene upstream. A 4 bp overlap indicates that this gene is part of an operon, which is highly favorable. This overlap is also conserved in all phages from the same cluster. The length of the gene is 4590 bp, which is reasonably long considering the chosen start site. /note=Phamerator: Pham: 3932 as of 1/10/22. This gene is conserved in 26/28 members of cluster AU. All 116 finalized phage annotations within this pham called this gene a tape measure protein. 
Thus, this gene appears to be highly conserved among members of the same pham and members of the same cluster AU. /note=Starterator: The most reasonable start site conserved among the members of pham 3932 is start site 14 which is found in 86/116 (74.1%) of genes in this pham. There are 75/109 manual annotations of non-draft genes on this start site and it was called 95% of the time when present. In Brunswick’s genome this is start site @20146 which is consistent with Glimmer and Genemark’s auto-annotated start sites. /note=Location call: Based on the evidence presented, this is a real gene with the most reasonable start site being 20146. /note=Function call: Tape measure protein. The top 5 BLASTp hits corresponding to phages Niktson, ElephantMan, DevitoJr, Nightmare, and Tatanka, suggest that the function of this gene is tape measure protein due to the high identity values (99%, 99%, 98%, 98%, and 95% respectively), high query coverages (99%, 99%, 99%, 99%, and 98% respectively) and extremely low e-values of 0.0. HHPRED was also used to analyze protein function resulting in consistent values with BLASTp. The identity values organized for the top five hits range from 99.48% to 93.59%, all query coverages are 100%, and e-values are 0.0 for all the top hits. CDD has a relevant hit for a phage-related tail protein [mobilome: prophage, transposons] with an e-value of 5.25e-4, accession number cl34972, which is part of superfamily COG5283. HHpred’s top five hits are also consistent with a function of tape measure protein, with high probability values ranging from 99.9-99.72 and extremely low e-values (3.5e-13, 3.5e-13, 9.7e-13, 9.7e-13, and 7.3e-9 respectively). /note=Transmembrane domains: TMHMM and TOPCONS predict 16 TMHs, therefore this could be a membrane protein. However, tape measure protein seems more specific and appears to be highly conserved. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: Evidence that this is a gene looks great! 
Why was this gene not included? CDS 24725 - 25543 /gene="27" /product="gp27" /function="minor tail protein" /locus tag="Brunswick_27" /note=Original Glimmer call @bp 24725 has strength 7.64; Genemark calls start at 24725 /note=SSC: 24725-25543 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Nightmare] ],,NCBI, q1:s1 100.0% 0.0 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.563, -4.813835316181514, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Nightmare] ],,ASM62304,100.0,0.0 SIF-HHPRED: Sipho_tail ; Phage tail protein,,,PF05709.14,88.9706,99.8 SIF-Syn: minor tail protein, upstream gene is in pham 3932 with tape measure protein, downstream gene is in pham 8861 with minor tail protein function, just like in phage Breylor17, Niktson, ElephantMan, and CapnMurica. Phams also aligned. /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site @ 24725. /note=Coding Potential: Gene has low coding potential predicted within the putative ORF. The chosen start site does cover the coding potential shown in Host-Trained GeneMark. Typical and atypical coding potential overlap present according to Self-Trained GeneMark. /note=SD (Final) Score: -4.814; there were 2 other suggested start sites with slightly better final scores but neither their Z-score nor ORF length were as favorable as the suggested start site with a final score of -4.814. At this start site, the Z-score is 2.563. /note=Gap/overlap: There is an 11bp overlap with the preceding gene. While this is uncommon for a gene, the same overlap was observed in three other phages present in the cluster. This suggests that this overlap may be an exception to the general rule of overlaps not exceeding a few base pairs. /note=Phamerator: Gene found in Pham 14719 as of 01/06/22. 21 other non-draft genes are present in this pham that belong to cluster AU. 
/note=Starterator: The suggested start site for this gene was the start-site called most often, in 84 of 85 non-draft genes, in the pham. The gene of phages Breylor17 and CapnMurica that shares the same cluster as this gene also calls this start site, start site number 1. For cluster AU1, start site 1 was annotated 13 times. /note=Location call: Gathered evidence suggests this is a real gene that has a start site @ 24725bp: covers all coding potential, longest ORF, aligns with start site of gene in phages of the same cluster (Breylor17 and CapnMurica). The overlap of 11bp is still important to note but all other evidence points to the starting point being @ 24725. /note=Function call: Minor tail protein; The top hits from PhagesDB with identities of 100% and 99%, and extremely small e-values of <1e-153, have two main functions: tail protein and minor tail protein. These hits also come from numerous phages within the same cluster. The NCBI database hits also showed significant hits for function of tail protein and minor tail protein with e-values of 0 and identities <95%. There is one hit from the CDD database that had 83% coverage and an e-value of 1.6e-17 that suggests the protein is a phage tail protein. There were 2 hits from the HHpred database that suggested that the gene is a distal tail protein, and both coverage and e-values were sufficient. /note=Transmembrane domains: There are no hits in TMHMM or TOPCONS, suggesting that this gene does not have a transmembrane domain. This makes sense for the function of the gene, as tail proteins would not be expected to be found in the membrane. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Everything looks good! I agree with the location and function calls. Just don`t forget to fill in the synteny box. 
CDS 25543 - 26703 /gene="28" /product="gp28" /function="minor tail protein" /locus tag="Brunswick_28" /note=Original Glimmer call @bp 25543 has strength 8.5; Genemark calls start at 25543 /note=SSC: 25543-26703 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Niktson] ],,NCBI, q1:s2 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.618, -5.542591178556752, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Niktson] ],,ASD52253,99.2248,0.0 SIF-HHPRED: Protein gp18; NP_465809.1, prophage tail protein gp18, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative; HET: MSE, MLY; 1.7A {Listeria monocytogenes EGD-e},,,3GS9_A,93.7824,99.8 SIF-Syn: The synteny checks out, as the gene is in the same position, with the same function as in Breylor and CapnMurica. The two flanking genes are in similar positions, and the preceding gene has the same function across all three phages. /note=Glania-- No coding potential seen on the Host-Trained GeneMark since the first gene candidate has a TTG start codon. Can not check starterator as of 11/18/22 due to pham number changing and the website has not been updated yet, but due to primary annotator`s notes, start site 25543 likely has the most MA`s. Checked evidence boxes. /note= /note=Primary Annotator Name: Verpukhovskiy, Philipp /note=Auto-annotation: Suggested start is 25543 to 26703. The top final score has a different start, but the start codon is TTG and the overlap is -4. The start chosen has a ATG start codon and an overlap of -1. Coding Potential: Good coding potential, shown by high marks on the self trained genemark, with the coding potential being in the forward direction. /note=SD (Final) Score: -5.543 /note=Gap/overlap: -1 /note=Phamerator: Pham 8861 1/26/22. Agrees with Breylor17 and adjacent genes have similar phams. 
/note=Starterator: Starterator agrees with the called start site of 25543, with 82.2% of genes found in the pham using this start. /note=Location call: 25543, with Starterator, Glimmer, and GeneMark all agreeing. /note=Function call: Minor tail protein. PhagesDB BLAST has 100% probability with other phages from the cluster agreeing; all have e-values of 0. Similar with HHpred, with probabilities >99% and e-values <1e-12. NCBI BLAST with basically 100% coverage, alignment, and probability, with e-values of 0. CDD says function is unknown but I believe one can choose to ignore this information compared to the e-values of 0 of several phages in the same cluster. /note=Transmembrane domains: None present, even with TOPCONS graph saying no homologous transmembrane proteins found. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: All sections of these notes can use elaboration. You should include that Glimmer and GeneMark both call the start site 25543 and that there is no coding potential seen on the Host-Trained GeneMark, maybe because the first gene (LORF) candidate has a TTG start codon. You can also mention the z-score as well as the final score in the SD (Final) Score section. You should elaborate on why you chose this start site, especially when there is a candidate that creates the LORF with a gap of -4 and an identical z-score. As well, make sure to add other phages as evidence for function and update your synteny notes. Finally, run TOPCONS and check for TMDs! 
CDS 26725 - 26964 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="Brunswick_29" /note=Original Glimmer call @bp 26725 has strength 9.95; Genemark calls start at 26725 /note=SSC: 26725-26964 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_29 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.58052E-49 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.319, -2.0720764396375664, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_29 [Arthrobacter phage Niktson] ],,ASD52254,100.0,1.58052E-49 SIF-HHPRED: SIF-Syn: The upstream this gene is minor tail protein. The downstream this gene is minor tail protein. The upstream gene of Niktson is minor tail protein. The downstream gene of Niktson is tail protein. /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 26725. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The whole coding potential is covered between the start and stop site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.072. It’s the best Final score. The Z score is 3.319, which is the highest. /note=Gap/overlap: The gap with the upstream gene is 21bp. The gap is conserved in other phages from cluster AU1. /note=Phamerator: Pham number is 2665. Recorded on 01/07/2022. This gene is in the same phamily with other 116 genes such as Bilo_22 (BI) and Breylor17_30 (AU). /note=Starterator: Start 3 is not the most annotated start site. The most annotated start site is start 4, but Brunswick doesn’t have this one. This start site is found in 23 of 74 genes in this pham and is called 100% when present. Start 3 also agrees with Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26725. 
/note=Function call: Multiple phagesDB BLAST has hits suggesting unknown functions. CDD didn’t come back with hits. HHpred has multiple hits with unreasonably high e values (>51). Thus, the function of this gene is currently unknown. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: I agree with the primary annotator based on all the evidence. CDS 26964 - 27752 /gene="30" /product="gp30" /function="minor tail protein" /locus tag="Brunswick_30" /note=Original Glimmer call @bp 26964 has strength 9.09; Genemark calls start at 26964 /note=SSC: 26964-27752 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage DevitoJr]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.885, -2.916420155493201, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage DevitoJr]],,QXO13190,99.2366,0.0 SIF-HHPRED: Receptor-type tyrosine-protein phosphatase delta; trans-synaptic complex, synapse organizer, IMMUNE SYSTEM-HYDROLASE complex; HET: NAG; 3.253A {Mus musculus},,,4YFD_A,48.0916,97.9 SIF-Syn: Minor tail protein, upstream gene is in pham 2665, downstream gene is in pham 21404, just like DevitoJr. /note=Glania-- As of 11/18/22, Starterator calls the most annotated site for pham 54250 @ start site 2. Start site 2 for this gene in Brunswick is 26962, which is the suggested start. /note= /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both Glimmer and GeneMark agree on a start site of 26964bp, an ATG codon. /note=Coding Potential: Self-trained GeneMark shows strong coding potential spanning the entire ORF, starting at the predicted start site mentioned above. Though, there is a small amount of coding potential that is visible before this start site, which could indicate an operon. 
/note=SD (Final) Score: This predicted start site has a final score of -2.916, the largest (least negative) of all candidates, and an associated z-score of 2.885; together these are strong evidence in favor of this start. /note=Gap/overlap: There is an overlap of 1bp, which could indicate an operon and is highly favorable. This allows for the longest ORF of 789bp. /note=Phamerator: As of 1/8/2022, this gene is in pham 95727. This pham is highly conserved in cluster AU, including phages Breylor17 and ElephantMan which have a conserved ORF length of 789bp. All of the genes in pham 95727 that are assigned a function, including almost all within cluster AU, are of minor tail protein or tail protein functions. /note=Starterator: As of 1/8/22 there are 116 members in pham 95727. All of them call site 8 (at 26964bp) when it is present, including all phages belonging to the AU cluster, though this is not the “most annotated” start site. Brunswick calls site 8, since it does not have the most annotated start site. /note=Location call: Given the strong coding potential and the agreement by Glimmer and GeneMark, this gene is real and its start site is 26964bp. This site allows for the longest ORF that contains all coding potential and creates an overlap of 1bp, which is favorable. This start site is also highly conserved within members of AU and called 100% of the time it is present. /note=Function call: Minor tail protein. According to NCBI BLAST results, 5 of 6 of the top results with e-values of 0, 100% coverage, and % identity above 94 are minor tail proteins. PhagesDB BLAST shows the top two results to have the function “tail protein” (e-values < e-155). CDD shows two hits, both with fibronectin function, with relatively low e-values (e-value < e-0.6); however, the coverage is very poor. HHpred showed no significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS detected any transmembrane domains. 
/note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: I think your Staterator analysis is wrong. Starterator shows the most common start site being #1, and Brunswick DOES call this start site. It also correlates to the auto-annotated base pair. Your phamerator info also appears to be incorrect, but it is possible the pham has changed since you annotated it. I would update that as well. A final score of I think -2 or less negative is pretty good! I think maybe you could change this part of your notes. CDS 27749 - 28192 /gene="31" /product="gp31" /function="membrane protein" /locus tag="Brunswick_31" /note=Original Glimmer call @bp 27749 has strength 2.23; Genemark calls start at 27749 /note=SSC: 27749-28192 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_31 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 4.66541E-95 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.992, -5.049507594900835, no F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_31 [Arthrobacter phage Niktson] ],,ASD52256,100.0,4.66541E-95 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Both Genemark and Glimmer agree on the same start site at 27749 F. Both have an ATG start site. /note=Coding Potential: No coding potential was detected in all 6 ORFs of host-trained GeneMark, however, self-trained gene mark showed strong coding potential in the putative ORF. /note=SD (Final) Score: -4.616 ; Start site 27734 has the best score (-4.5) among all other start sites /note=Gap/overlap: Overlap of -4bp denotes a ribosomal binding operon. /note=Phamerator: Pham 21404 at 1/7/2022. It is conserved in 11 other phages from various clusters. it is the most called gene (5) in 46 non-draft phage genomes. /note=Starterator: Ran on 1/7/2022; Start number 5 is the most called gene number in 46 non-draft phage genomes; This is manually annotated in 46 non-draft genes in this Pham. 
The evidence agrees with the site predicted by Glimmer and GeneMark. Pham Maps called the gene as a valid gene. /note=Location call: Based on the evidence, this is not a gene but likely a ribosome binding complex operon and the appropriate start site should be 27749 /note=Function call : NKF /note=Transmembrane domains: None Listed /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Good job! Even though the gene is NKF, maybe specify the significance of what no transmembrane domain signifies. CDS 28192 - 28497 /gene="32" /product="gp32" /function="membrane protein" /locus tag="Brunswick_32" /note=Original Glimmer call @bp 28195 has strength 9.59; Genemark calls start at 28195 /note=SSC: 28192-28497 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein NIKTSON_32 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 2.08981E-65 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.894, -3.867122135280807, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_32 [Arthrobacter phage Niktson] ],,ASD52257,100.0,2.08981E-65 SIF-HHPRED: SIF-Syn: Synteny is very well maintained with other genes in the same Pham (56949, as of 1/20/22). The sizes and spacing of the genes in all other non-draft phages in the same pham correspond to the one for Brunswick, at least locally around gene 31. The next gene in Brunswick has its function called as a holin, which only phage Synepsis out of the others in the Pham calls for the gene downstream of the corresponding homolog. /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Both Autoannotations agree on start site 28195. Not the largest ORF /note=Coding Potential: Pretty much all start sites would cover coding potential. It is not very strong and only shows up toward the end of the gene in both GeneMarks. /note=SD (Final) Score: AutoAnnotated Start has best Final Score and acceptable Z-score. /note=Gap/overlap: 2 /note=Phamerator: Pham 56949 as of 1/11/22. 
90/100 genes in it are non-draft. /note=Starterator: This gene does not have the MA start, but it does have a well annotated start (26). It`s the most called out of all the start sites it presents. It is 28,195. /note=Location call: I agree with the autoannotated start site. /note=Function call: PhagesDB has no hits with a known function. Neither does NCBI BLAST. CDD has no hits. No hit in HHpred meets the requirement of e = 10^-3. Lowest e-value is 0.078. /note=Transmembrane domains: TMHMM shows one domain. TOPCONS does not load. /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: I agree with the location and function call. I would make sure to fill out the synteny box, especially because the downstream gene has a known function. CDS 28500 - 28892 /gene="33" /product="gp33" /function="membrane protein" /locus tag="Brunswick_33" /note=Original Glimmer call @bp 28500 has strength 5.81; Genemark calls start at 28500 /note=SSC: 28500-28892 CP: no SCS: both ST: SS BLAST-Start: [holin [Arthrobacter phage Synepsis]],,NCBI, q1:s1 100.0% 2.51717E-86 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.74, -3.2181355793933353, yes F: membrane protein SIF-BLAST: ,,[holin [Arthrobacter phage Synepsis]],,AXH46694,98.4615,2.51717E-86 SIF-HHPRED: Phage_holin_Dp1 ; Putative phage holin Dp-1,,,PF16938.8,46.9231,99.9 SIF-Syn: Holin protein is flanked by a pham 56949 and pham 52323, just like in phage Synepsis. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site at position 28500. The start codon is ATG. /note=Coding Potential: There is reasonable typical and atypical coding potential within the putative ORF predicted by GeneMarkS and GeneMark Host. The chosen start site at position 28500 covers all of this coding potential. /note=SD (Final) Score: The SD score is -3.218 and predicts the best sequence match. The Z-score is 2.74 and is the only one >2. /note=Gap/overlap: The gap is 2 bp. 
Out of all suggested start sites, this one creates the smallest gap, which is preferred by the ribosome. /note=Phamerator: This gene is in Pham 94525 as of 01/11/22. Our phage is in subcluster AU1, and there are 21 non draft genomes in this subcluster that also have this pham. Phages Breylor17, Gordon, Synepsis, and Truckee were used for comparison. Phamerator did not have a function called for this gene, but Synepsis_32 is in the same pham as this gene and its called function is holin. /note=Starterator: Start site 16 is conserved among other members of the pham to which this gene belongs. 31/84 non draft genes in this pham call this site. Brunswick does not have this start site, but calls start site 17 at position 28500. 28/109 non draft genes in this pham call this site 100% of the time when present and most are in the AU cluster. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 17 at basepair position 28500 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits with functions, sorted by e-value, suggested function is holin, with high query coverage (>98%), medium to high % identity (63-96%), and low e-values (<4e-40). CDD and HHpred hits had high probability of >99%, medium coverage of >46%, and low e-values that met the <10e-3 threshold. The suggested function is also holin. /note=Transmembrane domains: Since TMHMM and TOPCONS called at least 2 TMDs, we can conclude that this protein does have TMDs. The visual data supports this and also predicts 2 transmembrane domains. This makes sense since the predicted function is holin, which inserts itself into the bacterial membrane in order to lyse. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: I agree with the evidence. 
CDS 28951 - 29247 /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="Brunswick_34" /note=Original Glimmer call @bp 28951 has strength 6.17; Genemark calls start at 28951 /note=SSC: 28951-29247 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_34 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 3.40657E-62 GAP: 58 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.308, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_34 [Arthrobacter phage Niktson] ],,ASD52259,100.0,3.40657E-62 SIF-HHPRED: SIF-Syn: Brunswick_34 is flanked by Brunswick_33 and Brunswick_35. CapnMurica and DevitoJr are also found in cluster AU1 with Brunswick. CapnMurica_34 and DevitoJr_34 are part of Pham 52323 with Brunswick_34 and are annotated to be of No Known Function (NKF). The upstream genes, Brunswick_33, CapnMurica_33, and DevitoJr_33 are part of Pham 94525 and currently are of NKF. Finally, the downstream gene Brunswick_35 is annotated to be of NKF. This is similar to CapnMurica_35, which is part of Pham 2801 and is also of NKF. This synteny is still seen further downstream many genes. The downstream gene in DevitoJr is DevitoJr_35, which is part of Pham 16715, and was not seen in either Brunswick nor CapnMurica. However, DevitoJr_36 has synteny with Brunswick_35 and CapnMurica_35 and is part of Pham 2801. Regardless, in both CapnMurica and DevitoJr, similarly sized gaps are seen planking their respective genes sharing synteny with Brunswick_34. /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the start site at 13424 bp. Glimmer assigns a score of 10.71. The start codon is ATG. /note=Coding Potential: Coding potential was found in both GeneMark Host and GeneMark Self on the second open reading frame. However, the coding potential predicted by GeneMark Host isn’t very strong and starts at about 14080 bp. 
On the other hand, GeneMark Self has strong coding potential starting at about 13400 bp. The chosen start site captures all coding potential. /note=SD (Final) Score: The final score of -1.954 is the best one listed in PECAAN. This correlates to a Z-score of 3.308. /note=Gap/overlap: With the upstream gene, there is a gap of 13 bp. With the downstream gene, there is a gap of 12 bp. Both gaps are of reasonable size. /note=Phamerator: As of 01/07/2022, it is part of pham 8877. It is conserved in phages Breylor17, CapnMurica, DevitoJr, and CastorTray, which are all also part of subcluster AU1 with Brunswick. Many genes in this pham encode major tail protein. /note=Starterator:There are 220 non-draft genes in pham 8877, and 23 with the start site called, start number 8, which corresponds to a start at 13424 bp for Brunswick. While it is not the most often called start site, this particular start has 19 manual annotations. /note=Location call: Based on the evidence above, including coding potential, SD (final) score, gap/overlap size, and phamerator and starterator analysis, this gene is a real gene and has a start site at 13424 bp. We can see that this start site is called in Starterator, GeneMark, and Glimmer. /note=Function call: Many PhagesDB BLAST hits suggest this gene’s function is a major tail protein with small e-values of 1e-156 to 1e-151. HHPRED had two hits for phage tail tube protein with a 99.9% probability, 64.6617% and 70.3008% coverage, and e-values of 2.5e-23 and 3.8e-20. NCBI BLAST also has multiple hits for major tail protein, with top hits having 99.6241% identity, 100% alignment, 100% coverage, and e-value of 0. There are also several other hits for HNH endonuclease with very small e-values. CDD also had a hit for phage major tail protein with 15.2632% identity, 22.1053 % alignment, 52.2556% identity, and an e-value of 0.000108461. Specifically, major tail protein and tail tube protein are accepted SEA-PHAGES functions. 
/note=Transmembrane domains: Both TMHMM and TOPCONS did not predict any TMDs. As a result, this is not a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 29300 - 29407 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="Brunswick_35" /note= /note=SSC: 29300-29407 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_GIANTSBANE_36 [Arthrobacter phage Giantsbane]],,NCBI, q1:s1 97.1429% 4.13302E-7 GAP: 52 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.292, -6.510288234989851, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GIANTSBANE_36 [Arthrobacter phage Giantsbane]],,QGZ17240,73.6842,4.13302E-7 SIF-HHPRED: SIF-Syn: /note=added gene - atypical CP on GM self. Small, but fills gap. Very similar sequence to: Giantsbane_36, Ingrid_35, LilHuddy_36, Loretta_35. /note=No TMDs detected. This gene does interrupt several other membrane proteins in a row. CDS 29420 - 29710 /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="Brunswick_36" /note=Original Glimmer call @bp 29420 has strength 9.13; Genemark calls start at 29420 /note=SSC: 29420-29710 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_35 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 6.85683E-62 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.007, -2.5823297389851954, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_35 [Arthrobacter phage Niktson] ],,ASD52260,98.9583,6.85683E-62 SIF-HHPRED: SIF-Syn: Membrane protein, the upstream gene is called as NKF and the downstream gene is NKF, just like in phage ElephantMan. /note=Glania-- Final score of -2.582 is strong evidence for start site 29420. /note= /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 29420. /note=Coding Potential: There is good typical coding potential on GeneMark Host on a forward frame. 
There is good atypical coding potential on GeneMark Self as well. /note=SD (Final) Score: -2.582 /note=Gap/overlap: 172 bp gap /note=Phamerator: This is listed in pham 2801. Date 1/10/22. /note=Starterator: Start site 8 in Starterator was manually annotated in 13 out of 21 non-draft genes. Start 8 is 29420 in Brunswick. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 29420 /note=Function call: Membrane protein. NCBI BLAST lists the function as a hypothetical protein with high percent identities and low e-values. PhagesDB BLAST also lists the function as function unknown with low e-values and high scores. No results come up on CDD. On HHPred, the best hit lists the function as a YtxH-like protein. This result, however, has a very high e-value at 8.8 and a probability of 85.52%. The 2nd best hit on HHPred lists the function as a Copper resistance protein ScsC N-terminal domain, but has an e-value of 12 and probability of 68.46%. /note=Transmembrane domains: TmHmm and Topcons both call 1 transmembrane domain. SOSUI, DeepTMHMM predict none. /note=Secondary Annotator Name: Montoya Serpas, Cinthya /note=Secondary Annotator QC: SD score, Gap overlap, phamerator, starterator, and location call, and transmembrane domain sections need improvement. CDS 29802 - 29990 /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="Brunswick_37" /note=Original Glimmer call @bp 29802 has strength 8.4; Genemark calls start at 29802 /note=SSC: 29802-29990 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_37 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 7.42653E-34 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_37 [Arthrobacter phage Teacup]],,ASR84042,98.3871,7.42653E-34 SIF-HHPRED: SIF-Syn: Gene 36 (stop @29990) in Brunswick has NKF (with evidence of being a membrane protein) and is of Pham 57867. 
Corresponding Gene 36 in Niktson also has NKF and is of pham 57867. Brunswick`s upstream gene has NKF and is of pham 2801, like the upstream gene of Niktson. Brunswick`s downstream gene has NKF and is of pham 94536, like the downstream gene of Niktson. /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: GeneMark and Glimmer both call a start site of 29802. This start site has a start codon of ATG. /note=Coding Potential: There is reasonable (but low) coding potential predicted within the putative ORF, covered by the chosen start in the Host-Trained GeneMark. There is high coding potential predicted within the putative ORF, covered by the chosen start site in the Self-Trained GeneMark. /note=SD (Final) Score: This start site call has the highest final score (-2.156), the highest Z-score (3.211), and is the LORF. /note=Gap/overlap: While the gap with the upstream gene is fairly high at 91, it is the smallest option. The length of the gene is reasonable at 189 bp. /note=Phamerator: As of 1/10/22, the gene is in pham 57867. All other phages containing genes in this pham are of cluster AU and none have assigned functions. Therefore, this pham is conserved and found in phages including Breylor17, CapnMurica, and CastorTray. /note=Starterator: Start 1 is found in 23 of 23 genes in this pham and called 100% of the time when present and is therefore conserved. Start 1 corresponds to the base pair position 29802 with the start codon ATG. /note=Location call: Taken together, the gathered evidence suggests that this is a real gene with a start site of 29802. /note=Function call: PhagesDB BLASTp returns 16 non-draft phages with genes with e-values smaller than 1e-06, but none contain known functions. NCBI BLASTp returns 8 phage genes with e-values less than 1e-06, 100% query cover, and 100% to 67.74% identity, with no known functions. CDD does not return any conserved domains. HHPRED does not return any results with significantly low e-values. 
While there is no known specific function, the presence of a transmembrane domain suggests that this gene is a membrane protein. /note=Transmembrane domains: TMHMM predicts 1 TMH. TOPCONS calls for 1 TMH with a TM-helix domain. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: All the evidence seems to support the calls made. Double-check the evidence needed to identify a protein as a membrane protein; 1 hit from TMHMM and 1 hit from TOPCONS should be sufficient evidence to call the function of this gene a membrane protein. Everything else looks good! **addressed CDS 30104 - 30346 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="Brunswick_38" /note=Genemark calls start at 30068 /note=SSC: 30104-30346 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein SEA_TATANKA_37 [Arthrobacter phage Tatanka]],,NCBI, q1:s1 100.0% 2.37802E-47 GAP: 113 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.073, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TATANKA_37 [Arthrobacter phage Tatanka]],,AXC38663,97.5,2.37802E-47 SIF-HHPRED: SIF-Syn: NKF; upstream gene is NKF, as in phage Breylor17 and CastorTray; downstream gene is a membrane protein in Brunswick but has NKF in Breylor17 and CastorTray; Breylor17 has an extra gene inserted with NKF before the corresponding gene. /note=Glania-- Checked evidence for unknown function proteins in phagesDB and NCBI Blasts. Though they don`t provide evidence of any function, it is still evidence that it is a real gene! /note= /note=Primary Annotator Name: Di Blasi, Daria /note=Auto-annotation: Only GeneMark calls the start site (30068). The start codon is ATG for this start site. The start site is the LORF. /note=Coding Potential: Coding potential on the forward strand only, indicating that this is a forward gene. Both GeneMark Self and GeneMark Host show coding potential. 
GeneMark Host shows coding potential starting at about 30150 while GeneMark Self shows coding potential starting at about 30100 (coding potential starts after the called start site (30068) for both host- and self-trained GeneMark). /note=SD (Final) Score: The 30068 start site has the second most favorable SD score (-3.056) and Z-score (2.818) of all the potential starts. The 30104 start site has the most favorable SD score (-2.584) and Z score (3.073) of all potential start sites, suggesting that the 30104 start site is the real start site of the gene. /note=Gap/overlap: The 30068 start site has a 77 bp gap with the upstream gene. This is the smallest gap of all the potential start sites. Start site 30068 produces a gene of length 279 bp which is reasonable. Start site 30104 produces a 113 bp gap with the upstream gene and produces a gene of length 243 bp. Additionally, the 113 bp gap is conserved with other phages in the cluster such as in Breylor17(AU). /note=Phamerator: The gene is part of pham 94536 as of January 7th, 2022. The pham has 49 members, 5 of which are draft genomes (including Brunswick). 25 of the 49 phages with pham 94536 belong to the AU cluster such as Breylor17(AU) and CastorTray(AU). /note=Starterator: The highly conserved start site (start site 7) is present in the Brunswick genome but is not the start site called (start 30068). The most annotated site is called in 42 of the 44 non-draft genes in the pham and is called 97.9% of the time when present. 16 of the 42 phages that call start site 7 belong to subcluster AU1 like Brunswick. There are no manual annotations of start site 30068 and Brunswick is the only phage that calls this start site. This evidence suggests that start site 30068 is not the real start site for the gene in Brunswick and start site 30104 (the most annotated start site (1)) is the real start site of the gene. 
/note=Location call: Based on all of the evidence, the start site of the gene is likely start site 30104 since it includes all of the coding potential within the ORF in GeneMark Self, has the most favorable SD score (-2.584) and Z-score (3.073), is the most annotated start site (called 97.9% of the time when it is present), and 16 of the 42 phages that call start site 30104 belong to subcluster AU1 like Brunswick. /note=Function call: NKF; All phagesdb BLASTp hits were of unknown function, all NCBI BLASTp hits were to hypothetical proteins, and there were no relevant CDD hits or HHpred hits (all HHpred hits had E-values > 4.6); therefore the evidence supports that the gene product has no known function. /note=Transmembrane domains: TMHMM predicts 1 TMD. TMHMM shows the 1 TMD with a probability just under 0.6, which is below the 0.75 probability threshold. TOPCONS does not predict any TMDs. Therefore, this data is not significant enough to conclude that the gene has a TMD. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Agree with location and function calls, very comprehensive notes. CDS 30404 - 30556 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Brunswick_39" /note= /note=SSC: 30404-30556 CP: no SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_TENNO_39 [Arthrobacter phage Tenno]],,NCBI, q1:s1 100.0% 1.41851E-28 GAP: 57 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.9, -4.410201207862326, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TENNO_39 [Arthrobacter phage Tenno]],,AYD87293,100.0,1.41851E-28 SIF-HHPRED: SIF-Syn: /note=Added gene. Good CP on GM-self only. Fills otherwise large gap. Phage Tenno has an orpham gene in this location. 
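The gap and gene-length figures quoted in the gene 38 notes can be re-derived from the coordinates. The following is a minimal sketch, not part of the original annotation: the function names are hypothetical, and the upstream stop coordinate (29990) is an assumption back-calculated from the reported 113 bp gap at start 30104 rather than a value stated in the notes. It assumes forward-strand genes with inclusive coordinates, which matches the GAP values reported for the neighbouring genes.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand gene, counting both end coordinates."""
    return stop - start + 1


def upstream_gap(start: int, upstream_stop: int) -> int:
    """Bases strictly between the upstream stop and this start.

    A negative value means the two genes overlap.
    """
    return start - upstream_stop - 1


# Assumption: last base of the upstream gene, inferred from the
# reported 113 bp gap at start 30104 (30104 - 113 - 1).
UPSTREAM_STOP = 29990
STOP = 30346  # stop coordinate of gene 38 from the CDS line

for candidate_start in (30068, 30104):
    print(
        candidate_start,
        gene_length(candidate_start, STOP),
        upstream_gap(candidate_start, UPSTREAM_STOP),
    )
```

Under these assumptions the two candidate starts give 279 bp with a 77 bp gap and 243 bp with a 113 bp gap, which agrees with the Gap/overlap note. The same convention reproduces the 57 bp gap reported for gene 39 (30404 following a stop at 30346).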
CDS 30737 - 31495 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="Brunswick_40" /note=Original Glimmer call @bp 30737 has strength 10.0; Genemark calls start at 30797 /note=SSC: 30737-31495 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein QCN31_gp40 [Arthrobacter phage Teacup] ],,NCBI, q1:s1 100.0% 0.0 GAP: 180 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.308, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein QCN31_gp40 [Arthrobacter phage Teacup] ],,YP_010749782,100.0,0.0 SIF-HHPRED: SIF-Syn: membrane protein, upstream gene is in pham 94536 and is NKF and downstream gene is in pham 61939 and is NKF, just like in phage CapnMurica. The upstream gap is also conserved in CapnMurica. /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark call this gene. However, Glimmer calls the start site at 30737 (valine) and GeneMark calls the start site at 30797 (methionine). /note=Coding Potential: Both auto-annotated start sites encompass all of the reasonable coding potential within the putative ORF. /note=SD (Final) Score: The Glimmer auto-annotated start site at 30737 has the best Final Score at -1.954. The GeneMark called start site is not even in the top 3 best Final Scores at -6.510. /note=Gap/overlap: The Glimmer start site minimizes the gap to the best extent possible with a gap of 390 bp. Looking at the pham map, this gap appears to be reasonable. This start site also creates the LORF, and the gene length is a reasonable 759 bp. The GeneMark called start site does not contain the longest ORF and increases the gap between genes. /note=Phamerator: As of January 7th, 2022, this gene is in pham 56194. This pham does seem to be conserved in the cluster that Brunswick belongs to. Phages and genes that were used for comparison include: CastorTray_55, CapnMurica_58, and Nightmare_41. The phams database and Phamerator did not have a function called for this gene. 
/note=Starterator: The most conserved start site among the members of the pham is start site #29 which corresponds to bp 30737 in Brunswick. This start site is called in 104/218 non-draft genes and is called 97.4% of the time when present. /note=Location call: The gathered evidence suggests that the auto-annotated start site called by Glimmer at bp 30737 is the correct start site for this gene. This is the most conserved start site, creates the best Final Score, LORF, and minimizes the gap between genes. The presence of reasonable coding potential within this ORF also suggests that this is a real gene. /note=Function call: The top 5 NCBI BLASTp hits, sorted by E-value, suggested this is a hypothetical protein, with high query coverage (100%), high % identity (>88.93%), and low E-values (2e-155). The top 5 PhagesDB hits also suggest that the function is unknown, with high identity coverage (>91%) and low E-values (0). CDD and HHpred were uninformative. /note=Transmembrane domains: TMHMM predicted the presence of 2 transmembrane domains which does meet the threshold for calling the function of the protein as a membrane protein. Topcons did also call two TMDs. 
/note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: Looks good CDS 31519 - 31977 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Brunswick_41" /note=Original Glimmer call @bp 31519 has strength 6.24; Genemark calls start at 31621 /note=SSC: 31519-31977 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein QCN36_gp42 [Arthrobacter phage CastorTray] ],,NCBI, q1:s1 100.0% 7.58484E-109 GAP: 23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.002, -5.585253183906189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein QCN36_gp42 [Arthrobacter phage CastorTray] ],,YP_010750225,100.0,7.58484E-109 SIF-HHPRED: dGTP_diPhyd_N ; dATP/dGTP diphosphohydrolase, N-terminal,,,PF18909.4,66.4474,100.0 SIF-Syn: /note=Glania-- Checked evidence for diphosphohydrolase in HHpred because of ideal values for probability and e-value, and because the primary annotator used it as evidence in notes. /note= /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer calls the gene at 31519 F. Genemark calls the gene at 31621. Coding potential suggests that the Glimmer call is the right start site. The start codon is ATG (methionine). /note=Coding Potential: High coding potential on self-trained Genemark and moderate coding potential on host-trained Genemark. Coding potential has a significant peak at position 31519, which indicates that this is a better start site than the start site that Genemark auto-annotated. Evidence suggests that the gene is real. /note=SD (Final) Score: Z-score is 2.002 and the final score is -5.585. Compared to other projected scores this is still the most accurate based on Genemark data. /note=Gap/overlap: 23 base pair gap, no significant observations to be made. /note=Phamerator: Part of Pham 61939. The pham map is highly conserved when compared to phages Synepsis and Breylor. These phages in the same cluster describe the gene as having no known function. 
Coincides with the evidence presented thus far for phage Brunswick. /note=Starterator: Part of Pham 61939. Calls start at bp 10 just like 27/34 of the other phages in this pham, indicating this is a highly conserved starting site for this pham. /note=Location call: Gene 39 starts at 10@31519 and ends at 31977. High coding potential data suggests gene is real, though function may be unknown at this time. /note=Function call: BLASTp hits all point to phages in the same cluster designating the gene as having no known function. NCBI BLAST also showing hypothetical protein with 100% coverage for phage CastorTray Gene 42. Suggests the gene is conserved among phages within the same subcluster. HHpred however shows high probability for diphosphohydrolase function with moderate coverage amount. Therefore official function call will be: hydrolase. /note=Transmembrane domains: No predicted transmembrane domains. Suggests that this gene, though having no known function, does not have any interaction with host cell membrane features. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: I agree with your location call, that this is a real gene and that start site candidate 31519 is likely true. As far as the function, HHpred has a pretty strong hit and actually the coverage is pretty high (along with ideal values for e-value and probability). I think this might be worth checking as evidence/changing the function call. Great job! 
CDS 32039 - 32248 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="Brunswick_42" /note=Original Glimmer call @bp 32039 has strength 11.51; Genemark calls start at 32039 /note=SSC: 32039-32248 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_40 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.07829E-39 GAP: 61 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.211, -2.6835611257758942, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_40 [Arthrobacter phage Niktson] ],,ASD52265,100.0,1.07829E-39 SIF-HHPRED: SIF-Syn: Function unknown for gene in pham #1581; this is conserved, as are the upstream and downstream genes (both of unknown function as well), in phage Breylor17. /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: The gene was called by Glimmer and GeneMark with the same start site of 32039, with codon ATG. /note=Coding Potential: The gene has good coding potential in the putative ORF observed on both Host and Self trained GeneMark. The start site includes all coding potential for this gene. /note=SD (Final) Score: The final score of -2.684 is the best final score among the candidate start sites. The Z-score is 3.211, which is the highest among the candidate start sites. /note=Gap/overlap: The gap is 61 bp long and the length of the ORF is 210 bp. This is not the longest possible ORF but it is the start site closest to the coding potential, with the best Z-score and Final Score. Given this start site, the gene length of 210 bp is acceptable. /note=Phamerator: There are 25 genes in the pham #1581 as of 1/20/2022. All of the phages were in cluster AU with Brunswick (subcluster AU1), including Breylor17 and CastorTray. No known function was provided. /note=Starterator: Start site #8 (bp 32039) was manually annotated on 8/11 genes. All 8 phages which called start site #8 are in cluster AU. Overall, start site #8 is called 92% of the time, in 19/21 manually annotated genes in the pham. 
Starterator supports the start site at 32039. /note=Location call: All evidence supports that this is a real gene which starts at start site 32039, given it was called by both Glimmer and Genemark. The Z-score and Final Score are the best possible, and the start site has been selected manually 19/21 times for genes in the pham per Starterator. All data support start site 32039. /note=Function call: Neither phagesDB BLASTp, NCBI BLASTp, HHPred, nor CDD provides any evidence of a known function. /note=Transmembrane domains: No transmembrane domains predicted by either TMHMM or Topcons. /note=Secondary Annotator Name: Teoh, Bryan /note=Secondary Annotator QC: There is sufficient evidence to conclude the annotation of this gene, and I am in agreement with the primary annotator. CDS 32324 - 32779 /gene="43" /product="gp43" /function="membrane protein" /locus tag="Brunswick_43" /note=Genemark calls start at 32324 /note=SSC: 32324-32779 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein NIKTSON_41 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.47882E-106 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.319, -1.9310779259753799, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_41 [Arthrobacter phage Niktson] ],,ASD52266,98.0132,1.47882E-106 SIF-HHPRED: SIF-Syn: No known function protein, upstream gene is NKF, downstream is NKF, just like in phage Breylor17. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Only GeneMark calls a start site. 32324 was the suggested start site. The site number that was called is #40. The predicted start codon is ATG. /note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 2), and by GeneMark Host (Track 2 shows an upward hash at the start site). There is reasonable coding potential predicted within the ORF and over the start site, 32324. 
/note=SD (Final) Score: The SD score is -1.931, which is the best SD score; other candidates have SD scores of about -6 with poor Z-scores of less than 2. In this specific case, the predicted SD and Z-score combination (-1.931 and 3.319 respectively) is consistent with a ribosome binding site. /note=Gap/overlap: The gap of 75 bp is reasonable for this gene. This start site gives the longest ORF, and the gene length is acceptable given the auto-annotated start site and scores. /note=Phamerator: This gene is found in Pham 39125 as of 1/07/22. This pham is only found in other members of the same cluster, AU, that Brunswick is in. Some of the phages that I've compared to Brunswick are Breylor17, CapnMurica, and CastorTray. As of now, I have not seen a function call for this gene. /note=Starterator: The start number called the most often in the published annotations is 1; it was called in 12 of the 21 non-draft genes in the pham. The start site number for this gene corresponds to the 32324 base pair position. /note=Location call: Based on all the evidence collected so far from GeneMark and Starterator, the agreed upon start site is 32324 bp. Start number 1 was also the most annotated start number for this gene and other phage genes that are in the AU cluster. /note=Function call: The top 5 NCBI BLASTp hits, with E-values lower than 9e-75, suggest that this gene either has no known function (NKF) or is a membrane protein, with high query coverage (100%), but relatively low % identity (>68.63%). /note=The top 3 hits from PhagesDB BLASTp yielded E-values all lower than 1e-80, and suggest the gene's function is NKF. /note=Transmembrane domains: There was one predicted TMH on both TMHMM and TOPCONS. /note= /note=Secondary Annotator Name: TURON FONT, GUILLEM /note=Secondary Annotator QC: I agree with the findings. 
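The transmembrane notes in these records apply a recurring rule of thumb: a gene with no other known function is labeled a membrane protein when both TMHMM and TOPCONS predict at least one TMD, and a TMHMM prediction below the 0.75 probability threshold (as cited in the gene 38 note) is not counted. The sketch below restates that rule as a hypothetical helper; the function name and signature are illustrative assumptions, not part of PECAAN or of either prediction tool, and the rule itself is reconstructed from the annotators' notes rather than from an official protocol.

```python
from typing import Optional


def supports_membrane_call(
    tmhmm_tmds: int,
    topcons_tmds: int,
    tmhmm_probability: Optional[float] = None,
    min_probability: float = 0.75,  # threshold quoted in the gene 38 note
) -> bool:
    """Return True when both tools predict at least one TMD.

    If a TMHMM probability is supplied and falls below the threshold,
    the TMHMM prediction is treated as not significant.
    """
    if tmhmm_tmds < 1 or topcons_tmds < 1:
        return False
    if tmhmm_probability is not None and tmhmm_probability < min_probability:
        return False
    return True


# Gene 43: one TMH predicted by both TMHMM and TOPCONS.
print(supports_membrane_call(1, 1))
# Gene 38: TMHMM predicts 1 TMD just under 0.6; TOPCONS predicts none.
print(supports_membrane_call(1, 0, tmhmm_probability=0.6))
```

Keeping the probability optional reflects that most notes report only TMD counts; only the gene 38 note quotes a probability.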
CDS 32781 - 33044 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="Brunswick_44" /note=Original Glimmer call @bp 32781 has strength 3.54; Genemark calls start at 32781 /note=SSC: 32781-33044 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_42 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 2.59049E-56 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.992, -4.698398734893535, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_42 [Arthrobacter phage Niktson] ],,ASD52267,100.0,2.59049E-56 SIF-HHPRED: SIF-Syn: synteny box: NKF, both genes upstream and downstream are NKF just like in phage Nightmare. /note=Primary Annotator Name: Montoya, Cinthya /note=Auto-annotation: 32781 per Glimmer and GeneMark. /note=Coding Potential: There is very minimal coding potential for this gene in the host-trained GeneMark coding potential map. There is more coding potential in the host-trained GeneMark report. /note=SD (Final) Score: -4.698. This is the best final score as it is the least negative. This score is also associated with the best Z-score of 1.992, which is the closest to the acceptable threshold of 2. /note=Gap/overlap: There is a gap of 1 bp with the gene upstream. This start site results in a reasonably long ORF, however not the longest. A similar gap is conserved in some phages from the same subcluster AU1 (CastorTray, ElephantMan, Nightmare, Niktson, Synepsis, Tatanka, Teacup). /note=Phamerator: Pham: 22410 as of 1/10/22. This pham is made up of 9 members, 2 draft genomes and 7 final genomes. This pham is conserved in 7/19 members of subcluster AU1 and overall in 7/28 members of cluster AU. Phages CastorTray, ElephantMan, Nightmare, Niktson, Synepsis, Tatanka, and Teacup were used for comparison. There is no function called for this gene at the moment. /note=Starterator: The most reasonable start site is start site 5, which is found in 9/9 genes in pham 22410. 
There are 7/7 manual annotations for this start site and it was called 100% of the time when present. This start site corresponds to bp 32781, which is consistent with the auto-annotated call by Glimmer and GeneMark. /note=Location call: Based on the evidence presented, this is a real gene with the most reasonable start site being 32781. /note=Function call: NKF. The top five hits on PhagesDB BLASTp have no function listed for this gene but have extremely small e-values ranging from 4e-46 to 2e-31, which is strong evidence suggesting that this is a real gene. This is consistent with the information presented by NCBI BLASTp, as the top five hits list the function of this gene as unknown. These top five hits have high query coverage ranging from 100% to 81%, identity values ranging from 100% to 71.23%, and e-values ranging from 3e-56 to 2e-27. There are no CDD hits available for this gene and there are no significant hits predicted by HHpred because the e-values are too high, the lowest being 4.9. /note=Transmembrane domains: There is no predicted TMH by TOPCONS or TMHMM. Therefore, the function of this gene is not membrane protein. /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: All evidence presented strongly supports function and location call of this gene. CDS 33047 - 33175 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Brunswick_45" /note= /note=SSC: 33047-33175 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein NIKTSON_43 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.18676E-21 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.754, -3.188585239877588, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_43 [Arthrobacter phage Niktson] ],,ASD52318,100.0,1.18676E-21 SIF-HHPRED: SIF-Syn: /note=Added gene. VERY low CP on GM-self only. CastorTray, ElephantMan, Niktson call gene in this location. Pham of their gene has 18 members, many from other clusters. 
CDS 33172 - 33831 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="Brunswick_46" /note=Original Glimmer call @bp 33172 has strength 14.08; Genemark calls start at 33172 /note=SSC: 33172-33831 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_44 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 4.03689E-150 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.308, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_44 [Arthrobacter phage Niktson] ],,ASD52268,98.1735,4.03689E-150 SIF-HHPRED: SIF-Syn: The synteny checks out, as the gene is in the same position, with the same function as in Breylor and CapnMurica. The following gene is in the same position but the preceding genes are different, yet the gap stays the same. /note=Primary Annotator Name: Verpukhovskiy, Philipp /note=Auto-annotation: Glimmer and GeneMark both agree on a start at 33172. ATG start codon. /note=Coding Potential: Good coding potential on self-trained GeneMark, read in the forward direction. /note=SD (Final) Score: -2.034, best score on PECAAN. /note=Gap/overlap: 127 bp, fairly big, but when checked for synteny other phages have a gap as large. /note=Phamerator: Pham 55385. Checked 1/26/22. Breylor17 agrees with pham denomination. /note=Starterator: Agrees with start site, with 76% of calls being for this start site, 79/127 called. This is start site 20. /note=Location call: 33172 start due to Glimmer, GeneMark, other finalized phages, and Starterator all agreeing. /note=Function call: Unknown function in phagesdb BLAST, with e-values below 1e-120 and 100% probability. HHpred has hits with low probability and coverage, and high e-values. NCBI BLAST has the same results as phagesDB. CDD has no hits for the conserved functions. /note=Transmembrane domains: None. 
/note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: I would go into a bit more detail with the starterator and function calls. Additionally was there coding potential in genemark host? Everything else looks good though! I agree with the location and function calls. CDS 33924 - 34226 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Brunswick_47" /note=Original Glimmer call @bp 33924 has strength 6.54; Genemark calls start at 33924 /note=SSC: 33924-34226 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_45 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.92518E-63 GAP: 92 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.777, -3.2026596621721617, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_45 [Arthrobacter phage Niktson] ],,ASD52269,100.0,1.92518E-63 SIF-HHPRED: SIF-Syn: The upstream gene of this gene is NKF, and the downstream gene of this gene is membrane protein Functions for other phages such as phage Breylor17 are not shown. Pham 84471 is the corresponding gene. The upstream is pham 55385. The downstream is pham 12638. /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 33924. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The whole coding potential is covered between the start and stop site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.203. It’s the best Final score. The Z score is 2.777, which is the highest. /note=Gap/overlap: The gap with the upstream gene is 92bp. The gap is conserved in other phages from cluster AU1. The gap is a little bigger than desired, but not big enough to have another gene in between. /note=Phamerator: Pham number is 84471. Recorded on 01/07/2022. This gene is in the same phamily with other 116 genes such as CapnMurica_43 (AU) and Breylor17_46 (AU). 
/note=Starterator: Start 1 is the most annotated start site. This start site is found in 25 of 27 genes in this pham and is called 100% when present. Start 1 also agrees with Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 33924. /note=Function call: Multiple phagesDB BLAST has hits suggesting unknown functions. CDD didn’t come back with hits. HHpred has multiple hits with unreasonably high e values (>110). Thus, the function of this gene is currently unknown. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: Make sure to fill out the drop-down box for Pham Starterator. Also, you are missing the SD (Final) Score note section. Also, you do not need to include BLAST results with unknown functions. Make sure to run and investigate Topcons. Finally, for synteny notes, add which phages you compared to and consider adding the phams of the upstream/downstream/corresponding genes. CDS 34300 - 34509 /gene="48" /product="gp48" /function="membrane protein" /locus tag="Brunswick_48" /note=Original Glimmer call @bp 34300 has strength 9.85; Genemark calls start at 34300 /note=SSC: 34300-34509 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH69_gp44 [Arthrobacter phage Gordon] ],,NCBI, q1:s1 100.0% 2.38759E-38 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.405, -4.187763896562409, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein FDH69_gp44 [Arthrobacter phage Gordon] ],,YP_009603505,98.5507,2.38759E-38 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is in pham 84471, downstream gene is in pham 14878, like CapnMurica /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both Glimmer and GeneMark predict a start site of 34300, an ATG codon. 
/note=Coding Potential: Both host-trained and self-trained GeneMark show strong coding potential that is captured in this ORF with the above start site. /note=SD (Final) Score: The final score is -4.188, which is not ideal; however, it is the highest of all listed start site candidates. /note=Gap/overlap: The start site candidate at 34300 bp allows for the smallest gap of 73 bp and also maximizes the ORF length (210 bp). This gap is not ideal; however, it is conserved within other AU phages' genomes, such as Breylor (67 bp gap) and CapnMurica (75 bp gap). /note=Phamerator: As of 1/8/2022, this gene is in pham 12638. This pham contains almost all phages within the AU cluster, including Breylor and CapnMurica, which both match Brunswick's gene length of 210 bp. No functions are called for the other genes in this pham at this time. /note=Starterator: Start 1, at 34300 bp, was the most annotated start site, called 100% of the time it was present. This start site was found in all 23 members of this pham. This start site is highly conserved. /note=Location call: Given the evidence, this gene is a real gene with a start site at 34300 bp. This start site is highly conserved amongst all members of its pham, most of which are from the same subcluster AU. There is strong coding potential that is fully captured by this start site, and it maximizes the ORF while minimizing the upstream gap. This gap is relatively large; however, it is conserved among other members of the AU subcluster. /note=Function call: NKF. All strong NCBI and phagesDB BLAST hits (e-value < e-32, 100% coverage, >98% alignment) have no known function. CDD showed no hits and HHpred provided no significant hits, because all e-values were high. /note=Transmembrane domains: Both TMHMM and Topcons predicted 1 TMD, and given that there is no known function at this time, this gene can be labeled as a membrane protein. 
/note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: I agree with the primary annotator based on all the evidence. CDS 34511 - 34714 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="Brunswick_49" /note= /note=SSC: 34511-34714 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_DEVITOJR_47 [Arthrobacter phage DevitoJr]],,NCBI, q1:s1 100.0% 1.60702E-33 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.727, -5.603046763998909, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_DEVITOJR_47 [Arthrobacter phage DevitoJr]],,QXO13207,89.5522,1.60702E-33 SIF-HHPRED: SIF-Syn: /note=Added gene. Good atypical CP on GM-self only. Found in phages ScienceWizSam, CapnMurica, DevitoJr CDS 34716 - 34841 /gene="50" /product="gp50" /function="membrane protein" /locus tag="Brunswick_50" /note=Genemark calls start at 34716 /note=SSC: 34716-34841 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein NIKTSON_48 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 5.5206E-20 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.976, -2.6453814847322845, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_48 [Arthrobacter phage Niktson] ],,ASD52272,100.0,5.5206E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Glimmer does not have any readings on the start site while GeneMark at 34716 F. GeneMark has an ATG start site. /note=Coding Potential: No coding potential was detected in all 6 ORFs of host-trained GeneMark, however, self-trained gene mark showed strong coding potential in the putative ORF. /note=SD (Final) Score: -2.645 ; Start site 34716 has the best score (-4.5) among all other start sites and denotes a very good score based on the annotation criteria. /note=Gap/overlap: Overlap of 10bp offers the longest putative ORF while maintaining the longest coding potential. /note=Phamerator: Pham 14878 at 1/7/2022. 
It is conserved in 11 other phages from various clusters. Start number 5 is the most called start in 44 non-draft phage genomes. /note=Starterator: Ran on 1/7/2022; Start number 5 is the most called start number in 44 non-draft phage genomes; This is manually annotated in 44 non-draft genes in this Pham. The evidence agrees with the site predicted by GeneMark. Pham Maps called the gene as a valid gene. /note=Location call: Based on the evidence, the identity of this gene is unknown but it is likely a valid coding sequence. /note=Transmembrane domains: 1 detected by TMHMM and DeepTMHMM /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: For Phamerator, make sure to list some of the phages you used for comparison. For Starterator, you need to write which bp this start site corresponds to. Pham maps doesn't call the gene as real or not. It may display synteny but that is evidence for the synteny box, not Starterator. You are missing all of your function call data as well (should be above TMDs). You need to fill out all of the drop-down menus and check some of the BLAST calls as evidence, even if it is NKF. I agree with your call, but your notes are incomplete. Still need to fill out synteny too. CDS 34867 - 35505 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="Brunswick_51" /note=Original Glimmer call @bp 34867 has strength 9.31; Genemark calls start at 34867 /note=SSC: 34867-35505 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_49 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 3.60534E-152 GAP: 25 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.135, -3.4160307515230497, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_49 [Arthrobacter phage Niktson] ],,ASD52273,100.0,3.60534E-152 SIF-HHPRED: SIF-Syn: Synteny is locally very well maintained with other phages in the same pham. The sizes and spacing of surrounding genes are similar to other structures in Pham 55571 (as of 1/20/22). 
Gene Brunswick_49 seems to not correspond to the intuitive homolog in other genomes in the same Pham, but this should not affect the classification of 46. /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Glimmer and GeneMark agree on 34867. /note=Coding Potential: The autoannotated start covers all CP for this gene. /note=SD (Final) Score: Z-score is the highest, above 2. Final Score is the third highest at -3.4. /note=Gap/overlap: 25. This is a very large gap, but it's the smallest one after -38, which is much more outlandish. /note=Phamerator: In pham 55571 as of 1/11/22. 111 of 116 genes in that pham are non-draft. /note=Starterator: The autoannotated start is the only one that is manually annotated in pham 55571. Start 19 has 16 MAs. /note=Location call: I agree with the autoannotated start. /note=Function call: PhagesDB BLAST has no top matches with a function. All top NCBI BLAST matches are hypothetical proteins. NCBI CDD has no matches. The lowest E-value in HHPred is 2.7, far above the 1e-3 threshold. /note=Transmembrane domains: TMHMM calls no transmembrane domains. TOPCONS does not load. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Good job Guillem. Maybe add the significance of no transmembrane domain hits (what this gene for sure can't be)? CDS 35512 - 35682 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Brunswick_52" /note=Original Glimmer call @bp 35512 has strength 2.85 /note=SSC: 35512-35682 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein FDH69_gp48 [Arthrobacter phage Gordon] ],,NCBI, q1:s1 100.0% 5.45395E-31 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.172, -4.323774879235046, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH69_gp48 [Arthrobacter phage Gordon] ],,YP_009603509,100.0,5.45395E-31 SIF-HHPRED: SIF-Syn: Brunswick_49 is flanked by Brunswick_46 and Brunswick_50. 
CapnMurica and DevitoJr are also found in cluster AU1 with Brunswick. CapnMurica_48 and DevitoJr_50 are part of Pham 11496 with Brunswick_49 and are annotated to be of No Known Function (NKF). The upstream genes, Brunswick_46, CapnMurica_47, and DevitoJr_49 are part of Pham 55571 and currently are of NKF. Finally, the downstream gene Brunswick_50 is annotated to be of NKF and is part of Pham 4854. CapnMurica_49 and DevitoJr_51, which are also part of Pham 4854 and is also of NKF. This synteny is still seen further downstream many genes. /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Only Glimmer calls the gene and predicts the start site to be at 35512 bp. GeneMark does not call the gene. Glimmer assigns a score of 2.85. The start codon is ATG. /note=Coding Potential: In both GeneMark Host and Self, we see very weak coding potential. However, there was some atypical / alternate coding potential noted in GeneMark Self in the first forward ORF. The chosen start site captures all coding potential. /note=SD (Final) Score: The final score of -4.324 is the only listed final score in PECAAN. It corresponds to a Z-score of 2.172. /note=Gap/overlap: There is a 6 bp gap between the gene upstream, Brunswick_47, and a 9 bp gap downstream. Both gaps are of reasonable length. /note=Phamerator: As of 01/12/2022, it is part of pham 11496. It is conserved in phages Breylor17, CapnMurica, DevitoJr, and CastorTray, which are all also part of subcluster AU1 with Brunswick. Many genes in this pham are of No Known Function. /note=Starterator: There are 26 non-draft genes in pham 11496, and 19 with the start site called, start number 5, which corresponds to a start at 35512 bp for Brunswick. This is the most annotated start. 
/note=Location call: Based on the evidence above, including atypical/alternate coding potential, SD (final) score, gap/overlap size with the upstream forward gene, synteny with other phages in cluster AU, and phamerator and starterator analysis, this gene is a real gene and has a start site at 35512 bp. We can see that this start site is called in Starterator and Glimmer. /note=Function call: While there were many PhagesDB BLAST hits with small e-values of 9e-26, the function is unknown for those hits. HHPRED had no relevant hits. NCBI BLAST also has multiple hits for hypothetical proteins with e-values ranging from 5.45395e-31 to 4.25624e-30, signifying that the function is currently unknown. CDD also had no relevant hits. As a result, this gene is currently of No Known Function (NKF) /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs. As a result, we cannot label it a membrane protein. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: Evidence and notes look good. CDS 35692 - 35826 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Brunswick_53" /note=Original Glimmer call @bp 35692 has strength 8.7; Genemark calls start at 35692 /note=SSC: 35692-35826 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH68_gp49 [Arthrobacter phage CaptnMurica] ],,NCBI, q1:s1 100.0% 3.90758E-24 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.319, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH68_gp49 [Arthrobacter phage CaptnMurica] ],,YP_009603422,100.0,3.90758E-24 SIF-HHPRED: SIF-Syn: NKF, the upstream gene was called as NKF and the downstream gene was called as NKF as well, just like in phage DevitoJr. /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 35692. /note=Coding Potential: There is no typical coding potential on GeneMark Host. 
There is, however, good atypical coding potential on GeneMark Self. /note=SD (Final) Score: -2.011 /note=Gap/overlap: 9 bp gap /note=Phamerator: This is listed in pham 4854. Date 1/13/22. /note=Starterator: Start site 11 in Starterator was manually annotated in 32 out of 80 non-draft genes. Start 11 is 35692 in Brunswick. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 35692. /note=Function call: No known function. NCBI BLAST lists the function as a hypothetical protein with high percent identities and low e-values. PhagesDB BLAST also lists the function as function unknown with low e-values and high scores. No results come up on CDD. On HHPred, the best hit lists the function as a Methyladenine glycosylase. This result, however, has a very high e-value at 2.2 and a probability of 77.83%. /note=Transmembrane domains: TmHmm and Topcons both do not call any transmembrane domains. /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: All evidence presented strongly supports function and location call of this gene. Don't forget to fill out the synteny box! CDS 35885 - 36799 /gene="54" /product="gp54" /function="membrane protein" /locus tag="Brunswick_54" /note=Original Glimmer call @bp 35885 has strength 8.77; Genemark calls start at 35885 /note=SSC: 35885-36799 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage CastorTray]],,NCBI, q1:s1 100.0% 0.0 GAP: 58 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.073, -2.794070146961553, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage CastorTray]],,QYC55042,99.0132,0.0 SIF-HHPRED: SIF-Syn: The pham map alignments are strange for this gene. Gene 50 corresponds to gene 38 of Gordon, both of which have no known function (although gene 50 in Brunswick has evidence of being a membrane protein) and are of pham 56194. Gene 50 is not said to correspond to gene 50 of Gordon, although both are of the same pham. 
The upstream gene in Brunswick, gene 49, corresponds to Gordon gene 49, and both have no known function and are of pham 4854. Brunswick gene 51 does not correspond to any gene in Gordon, although Brunswick gene 52 corresponds to Gordon gene 52, and both have no known function and are of pham 96399. /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: GeneMark and Glimmer both call the start site 35885, with a start codon of ATG. /note=Coding Potential: The GeneMark Host map contains this gene in the second track, and the start site covers all of this coding potential. Coding potential is reasonable, with some dips. GeneMark Self shows reasonable coding potential along this interval capped by the start site. /note=SD (Final) Score: The final score is the highest at -2.794. The z-score is also highest at 3.073. These are both strong and reasonable scores. /note=Gap/overlap: The LORF start site produces a gap of 40, although its z-score is 0.841 and its final score is -7.752, so it was ignored. The auto-annotated and chosen start creates the second-longest ORF with a gap of 58. Although this is a relatively large gap, for this situation, it appears reasonable. The length is reasonable at 915 bp. /note=Phamerator: As of 1/12/22, this gene is found in pham 56194. This pham appears in 21 other phages of cluster AU, including CapnMurica, CastorTray, and Caterpillar -- in all of the AU phages, this pham is found in two genes. It is also found in genes of phages of other clusters (very interesting), including clusters DJ, AM, BI, EL, and AW. Despite this pham appearing in many phages, it does not have a called function. /note=Starterator: As of 1/7/22, this pham has 232 members (wow). Start site 23 is found in 22.4% of genes in the pham and called 94.2% of the time when present. Most of the genes with this start site are of cluster AU, and many are of subcluster AU1, suggesting that this start site is reasonable (not conserved). 
Start 29 is the most called site, found in 114 of 232 genes in this pham and called 97.4% of the time when present. Interestingly, start 23 is called in many AU phages around the 50th gene, and start 29 is called in many AU phages around the 40th gene (including Brunswick_38). Start 23 corresponds with the auto-annotated start site of 35885. /note=Location call: Taken together, this evidence suggests that gene 51 (stop 36799) is a real gene with a start site of 35885. /note=Function call: Within PECAAN, there appears to be no known function for this gene, although with evidence of transmembrane domains, this gene does appear to be a membrane protein. PhagesDB BLAST calls 100 genes with significantly low e-values but none with a known function. HHPRED returns no results with significantly low e-values with a known function. NCBI BLAST likewise returns dozens of significant results with no known functions, although two list the function as membrane protein. CDD also returns no functions. /note=Transmembrane domains: TMHMM predicts 2 TMHs and TOPCONS predicts 1 TMH. /note=Secondary Annotator Name: Montoya Serpas, Cinthya /note=Secondary Annotator QC: Function should be switched to "membrane protein" given the TMHMM and TOPCONS evidence. Don't forget to check 2-3 boxes as evidence for each database. Everything else looks great. 
**addressed CDS 36988 - 37140 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Brunswick_55" /note=Original Glimmer call @bp 36988 has strength 11.22; Genemark calls start at 36988 /note=SSC: 36988-37140 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_53 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 9.88255E-27 GAP: 188 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.171, -4.325675514574479, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_53 [Arthrobacter phage Niktson] ],,ASD52275,100.0,9.88255E-27 SIF-HHPRED: SIF-Syn: NKF; upstream gene is NKF, downstream gene is NKF, as in phage Breylor17 and CastorTray. /note=Primary Annotator Name: Di Blasi, Daria /note=Auto-annotation: Both Glimmer and GeneMark call start site 36988. The start codon is ATG. /note=Coding Potential: Coding potential on the forward strand only, indicating that this is a forward gene. Both GeneMark Self and GeneMark Host show coding potential. The start site covers all of the coding potential within the ORF. The start site is the LORF. /note=SD (Final) Score: The SD Score of start site 36988 is -4.326 with a Z Score of 2.171, both of which are slightly more favorable than the SD Score (-4.690) and Z Score (2.064) of the only other potential start site. /note=Gap/overlap: Start site 36988 produces a 188 bp gap with the upstream gene, but this gap is smaller than the 197 bp gap produced by the only other potential start site. Additionally, the 188 bp gap is conserved in other phages in the cluster, such as Breylor17 (AU). /note=Phamerator: The gene is part of pham 575 as of January 11th, 2022. The pham has 29 members, 4 of which are draft genomes (including Brunswick). 22 of the 29 phages with pham 575 belong to the AU cluster, such as Breylor17 (AU) and CastorTray (AU). /note=Starterator: The highly conserved start site (start site 4) is present in the Brunswick genome and is called (start site 36988). 
The most annotated site is called in 17 of the 25 non-draft genes in the pham and is called 91.3% of the time when present. All phages that call start site 4 belong to cluster AU like Brunswick, and 13 of the phages belong to subcluster AU1 like Brunswick. /note=Location call: Based on all the evidence, the start site of the gene is likely start site 36988 since it includes all of the coding potential within the ORF in GeneMark Self and GeneMark Host, has the most favorable SD score (-4.326) and Z score (2.171), is the most annotated start site called 91.3% of the time when it is present, and all of the phages that call start site 36988 belong to cluster AU like Brunswick. /note=Function call: NKF; All phagesdb BLASTp hits were of unknown function, all NCBI BLASTp hits called gene products a hypothetical protein, and there were no relevant CDD hits or HHpred hits (all HHpred hits had E-values > 63); therefore there is evidence to support that the gene product has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: All the evidence agrees with the location and function call of the gene. Overall looks great and good job with the detailed notes! 
CDS 37137 - 37901 /gene="56" /product="gp56" /function="membrane protein" /locus tag="Brunswick_56" /note=Original Glimmer call @bp 37137 has strength 11.56; Genemark calls start at 37137 /note=SSC: 37137-37901 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_54 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.268, -4.649592931416472, no F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_54 [Arthrobacter phage Niktson] ],,ASD52276,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF, the upstream gene is in pham 575 and is NKF and the downstream gene is in pham 8171 and is NKF, just like in phage CastorTray. /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark call this gene’s start site at bp 37137 (methionine). /note=Coding Potential: This gene does have reasonable coding potential within the putative ORF, and the auto-annotated start site encompasses all of this coding potential. /note=SD (Final) Score: This start site has a Final Score of -4.650 which is not the best Final Score. However, this Final Score appears to be irrelevant for the start call because of the -4 bp overlap which suggests the presence of an operon. Due to the presence of an operon, the auto-annotated start site is still supported despite the lower quality Final Score. /note=Gap/overlap: The overlap with the upstream gene is a reasonable -4 bp which also suggests the presence of an operon. The gene’s length given the auto-annotated start site is 765 bp which is also reasonable. /note=Phamerator: As of January 10th, 2022, this gene is in pham 94717. This pham does appear to be conserved in other members of the AU1 cluster. Phages/genes used for comparison were: CapnMurica_52, Breylor17_54, and CastorTray_57. Neither the phages database nor Phamerator had a function called for this gene. 
/note=Starterator: The start site choice that is conserved across the pham is start site #4, which corresponds to bp 37137 in Brunswick. 21/47 non-draft phage genes in the pham also call this start site, and it is called 100% of the time when present. /note=Location call: Altogether, the evidence suggests that this is a real gene given the presence of coding potential within the putative ORF. The auto-annotated start site at 37137 appears to be the most reasonable start site call because it creates the LORF, an overlap of -4 bp which indicates an operon, and is highly conserved across members of this gene’s pham. /note=Function call: The top 5 NCBI BLASTp hits, sorted by E-value, suggest that the function is unknown, with high query coverage (>99%), high % identity (>90.16%), and low E-values (4e-153). The top 5 hits from the PhagesDB blast also suggest that the function is unknown, with low E-values (e-138) and high identity coverage (>95%). CDD and HHpred were uninformative. /note=Transmembrane domains: TMHMM predicted one TMD, but Topcons did not call any TMDs. One TMD called by TmHmm does not meet the threshold to call this protein a membrane protein. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Agree with location and function calls, very comprehensive notes. CDS 37906 - 38463 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Brunswick_57" /note=Original Glimmer call @bp 37906 has strength 4.23; Genemark calls start at 37906 /note=SSC: 37906-38463 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_56 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 2.5422E-134 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.487, -5.833946290930162, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_56 [Arthrobacter phage Teacup]],,ASR84056,100.0,2.5422E-134 SIF-HHPRED: SIF-Syn: NKF -- high synteny compared to finalized genes within same subcluster. 
/note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer & GeneMark both call the gene at 37906 F. The start codon is ATG (methionine). /note=Coding Potential: High coding potential found on both Host and Self Trained Genemark. /note=SD (Final) Score: Z score is fairly low at 1.487; the RBS final score is -5.834. These values could be due to the fact that an operon exists in the proposed start site area. /note=Gap/overlap: 4 base pair gap in gene suggests that there is an operon preceding the rest of the gene sequence. /note=Phamerator: Part of Pham 8171, which is shared among 116 other phage members. Pham has an average bp length of 560-590. Shares synteny with other phages in the same subcluster such as Breylor17 and CastorTray. It is therefore safe to say that the pham is highly conserved within the phage cluster. /note=Starterator: Starterator puts Brunswick at start 6, which 44% of other phages within the pham share. This is the most annotated start region. The second-most common start region is at start 3. Hard to deduce whether other phages were categorized as start 3 due to scientific evidence or “mob mentality” as 44% is not a very strong number and neither is 22%, which is the percentage for start 3. /note=Location call: Until further notice, Gene 55 starts at bp 37906 and ends at 38463. High conservation of pham indicated through synteny maps on PECAAN. Coding region of gene starts at 6. /note=Function call: BLASTp, HHpred, and CDD all show no known function or hypothetical protein. High score and low e value on BLASTp confirm this. Related accession numbers are ASR84056 and ASD52277, both sharing high alignment and identity, with said identity being “hypothetical protein.” /note=Transmembrane domains: No transmembrane domains found. Suggests that this gene, though having no known function, does not have any interaction with host cell membrane features. 
/note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: 1. For starterator comments, double check if you meant start 3 when you wrote start 6 at some places. When I checked it's start 6 that has 44%; 2. I think a gene can be part of an operon when there is 4bp overlap, not 4bp gap; 3. Synteny box needs to be a bit more specific and compare with another specific phage CDS 38487 - 42425 /gene="58" /product="gp58" /function="DNA primase/polymerase" /locus tag="Brunswick_58" /note=Original Glimmer call @bp 38487 has strength 7.0; Genemark calls start at 38487 /note=SSC: 38487-42425 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase [Arthrobacter phage CaptnMurica] ],,NCBI, q1:s1 100.0% 0.0 GAP: 23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.408, -3.8308929311341546, no F: DNA primase/polymerase SIF-BLAST: ,,[DNA polymerase [Arthrobacter phage CaptnMurica] ],,YP_009603427,99.314,0.0 SIF-HHPRED: DNA polymerase; DNA polymerase, primase, DNA replication, DNA BINDING PROTEIN; 3.56A {Saccharomyces cerevisiae},,,8FOK_1,49.3902,100.0 SIF-Syn: /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: Both Glimmer and Genemark called 38487 as the autoannotation start site. The start codon called is ATG. /note=Coding Potential: There is good coding potential in the putative ORF, and the start site does cover all the coding potential. Because the ORF is so long (3939 bp) there are lots of peaks and valleys in the coding potential. /note=SD (Final) Score: The final score is the second best, with a value of -3.831; one other start site has a better final score of -3.452. The Z-score is 2.408, which is good. /note=Gap/overlap: The gap is 23 bp, which is very reasonable. No other start site has a gap less than 100 bp. The gene is very long, at 3939 bp. /note=Phamerator: As of 1/13/22 the gene is in pham #8810. Other members of cluster AU contained this gene, including Breylor17 and CapnMurica. 
There are 116 members in the pham, all of which call the function DNA primase/polymerase. This gene is on the approved function list as DNA primase/polymerase. /note=Starterator: The start site that is most conserved by members of the pham is start site #9 at 38487 bp. It is called in 61/109 genes and 88.3% of the time when present. It seems likely that this start site is correct. It is manually annotated 11/13 times for cluster AU1. /note=Location call: With all the evidence provided, it is very likely this is a real gene, with the most likely start site being the one at 38487 bp in Brunswick, given that it has the smallest gap and a good Z-score and final score, and is called 88.3% of the time when present. /note=Function call: Phagesdb BLAST had many strong hits, more than 15 of which have an e-value of 0 with a predicted function of DNA primase/polymerase. HHpred also showed more than 5 strong hits with an E-value of 0 and DNA polymerase in the description; they all have a probability of 100%. NCBI BLAST showed many strong hits as well, with more than 5 hits for DNA primase/polymerase with an E-value of 0, 100% coverage and 99% aligned. CDD showed 3 significant hits with low E-values (e-13 to e-8) and 13-25% coverage for DNA primase and DNA polymerase. /note=Transmembrane domains: No evidence of transmembrane domains on TmHmm or Topcons. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: I agree with your location and function call! Nice job! 
CDS 42498 - 42722 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Brunswick_59" /note=Original Glimmer call @bp 42498 has strength 7.76; Genemark calls start at 42498 /note=SSC: 42498-42722 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_58 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 3.48682E-45 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.017, -3.0866669750886713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_58 [Arthrobacter phage Teacup]],,ASR84058,100.0,3.48682E-45 SIF-HHPRED: SIF-Syn: Upstream gene is a DNA primase/polymerase and the downstream gene is NKF just like in phage CapnMurica. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both GeneMark and Glimmer agree on 42498 as the suggested start site. The site number that was called is #56. The predicted start codon is GTG. /note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 2), and by GeneMark Host (Track 3 shows an upward hash at the start site). There is reasonable coding potential predicted within the ORF and over the start site, 42498. /note=SD (Final) Score: The SD score is -3.087, which is the best SD score; there are SD scores of -6, but with bad Z-scores of less than 2. In this specific case, the predicted SD and Z score combination (-3.087 and 3.017 respectively) seems to be relevant for a ribosome binding site. /note=Gap/overlap: The gap of 72 bp is reasonable for this gene. This gene length prediction is the longest ORF, and it is acceptable given the auto annotated start site and scores. /note=Phamerator: This gene is found in Pham 13054 as of 1/07/22. This pham is found in other members of the same cluster, AU, that Brunswick is in. Some of the phages that I’ve compared to Brunswick are Breylor17, CapnMurica, and CastorTray. As of now, I have not seen a function call for this gene. 
/note=Starterator: The start number called the most often in the published annotations is 1; it was called in 21 of the 28 non-draft genes in the pham. The start site number for this gene corresponds to the 42498 base pair position. /note=Location call: Based on all the evidence collected so far from Genemark and Starterator, the agreed upon start site is 42498 bp. Start number 1 was also the most annotated start number for this gene and other phage genes that are in the AU cluster. /note=Function call: The top 5 NCBI BLASTp hits, with E-values lower than 9e-75, suggest that this gene could have no known function (NKF), with high query coverage (100%) and relatively high % identity (>89.19%). /note=The top 3 hits from PhagesDB BlastP yielded E-values all lower than 1e-34, and suggest the gene’s function is NKF. /note=Both CDD and HHpred did not lead to informative/significant results. HHpred had high e-values. /note=Transmembrane domains: There were no predicted TMH hits on TMHMM, and zero hits on TOPCONS. This means that the lack of data cannot serve as evidence for the gene. /note=Secondary Annotator Name: TEOH, BRYAN (Joey) /note=Secondary Annotator QC: Agree with annotation call, notes are comprehensive and concise. CDS 42729 - 43055 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Brunswick_60" /note=Original Glimmer call @bp 42729 has strength 8.92; Genemark calls start at 42729 /note=SSC: 42729-43055 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_58 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 9.54506E-69 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.073, -2.794070146961553, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_58 [Arthrobacter phage Niktson] ],,ASD52280,99.0741,9.54506E-69 SIF-HHPRED: SIF-Syn: NKF, both genes upstream and downstream are NKF just like in phage Nightmare. 
/note=Primary Annotator Name: Montoya, Cinthya /note=Auto-annotation: 42729 per Glimmer and GeneMark. /note=Coding Potential: There is coding potential present that spans the putative ORF of this gene in the self-trained report and very minimal coding potential shown in the host-trained GeneMark report. /note=SD (Final) Score: -2.794. This is the best final score as it is the least negative. This score is also associated with the best Z-score of 3.073, which is above the desired threshold of 2. /note=Gap/overlap: There is a gap of 6 with the gene upstream. This is the smallest gap, resulting in the LORF with a gene length of 327 bp, which is reasonably long. A similar gap is conserved among all members of subcluster AU1. /note=Phamerator: This gene is found within pham 4048 as of 1/13/22, which consists of 32 members. This gene is conserved in 25/28 members of cluster AU and it is found in 19/19 members of subcluster AU1, thus being highly conserved. There is no function called for this gene at the moment. /note=Starterator: The most reasonable start site is start site 5, which is found in 15/32 of genes in pham 4048. There are 15/32 manual annotations for this start site and it was called 100% of the time when present. Within subcluster AU1, start number 5 was manually annotated 12/15 times. This start site corresponds to bp 42729, which is consistent with the auto-annotated call by Glimmer and GeneMark. /note=Location call: Based on the evidence presented, this is a real gene with the most reasonable start site being 42729. /note=Function call: NKF. The top five hits on PhagesDB BLASTp have no function listed for this gene but have extremely small e-values ranging from 2e-56 to 2e-49, which is strong evidence suggesting that this is a real gene. This is consistent with the information presented by NCBI BLASTp, as the top five hits list the function of this gene as unknown. 
However, the top five hits resulted in a high query coverage of 100%, identity values ranging from 95% to 81%, and e-values ranging from 9.5e-69 to 2.16e-61. There are no CDD hits available for this gene and there are no significant hits predicted by HHpred due to the e-values being too high, the lowest being 5.2. /note=Transmembrane domains: There is no predicted TMH by TOPCONS and TmHmm. Therefore, the function of this gene is not membrane protein. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: As far as I can tell, Host-trained GeneMark does not have CP spanning further than the AA start and the end. CDS 43048 - 43242 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="Brunswick_61" /note=Original Glimmer call @bp 43048 has strength 10.13; Genemark calls start at 43048 /note=SSC: 43048-43242 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_59 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 5.62945E-36 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.772, -3.5985503360675457, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_59 [Arthrobacter phage Niktson] ],,ASD52281,100.0,5.62945E-36 SIF-HHPRED: SIF-Syn: No known function, upstream gene is in pham 4048 with no known function, downstream gene is in pham 8381 with no known function, just like in phage Breylor17, Niktson, ElephantMan, and CapnMurica. Phams also aligned. /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site @ 43048. /note=Coding Potential: Gene has low coding potential within the putative ORF in the Host-Trained GeneMark. There is evidence of sufficient coding potential within the ORF with typical and atypical overlap in Self-Trained GeneMark. /note=SD (Final) Score: -3.599; At the suggested start site @ 43048, the final score and the z-score, 2.772, are the most favorable in comparison to the other suggested start sites. 
The length of the ORF is also favorable at 195 base pairs. /note=Gap/overlap: There is an overlap of -8 with the preceding gene. This can suggest that this gene and the preceding gene are part of an operon. With this overlap, the longest ORF is also obtained. /note=Phamerator: Gene found in Pham 6936 as of 01/06/22. All 20 non-draft genes found in this pham belong to phages in cluster AU. /note=Starterator: The suggested start site for this gene was the most annotated start site, 19. It was called in 16 out of the 20 non-draft genes in the pham. Only one gene with start site 19 did not call it as the manually annotated start site. /note=Location call: Gathered evidence suggests that this is a real gene that has a start site @ 43048: covers all coding potential, longest ORF, aligns with start site of gene in phages of the same cluster. /note=Function call: Function Unknown; The top two hits from PhagesDB, which had identities of 100% and sufficient e-values of 2e-28, were from phage Niktson and phage ElephantMan. These genes had no known function. The hits from the NCBI database consisted only of hypothetical proteins, therefore not providing sufficient evidence of a function. There were no significant hits in HHpred as all the hits had an e-value greater than 1. There were no hits in the CDD database. /note=Transmembrane domains: There were no hits in TOPCONS or TMHMM, suggesting that this gene does not have a transmembrane domain. /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: All evidence presented strongly supports function and location call of this gene. 
CDS 43248 - 43823 /gene="62" /product="gp62" /function="SSB protein" /locus tag="Brunswick_62" /note=Original Glimmer call @bp 43248 has strength 10.12; Genemark calls start at 43248 /note=SSC: 43248-43823 CP: yes SCS: both ST: SS BLAST-Start: [ssDNA binding protein [Arthrobacter phage DevitoJr]],,NCBI, q1:s1 100.0% 9.12018E-135 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.401, -4.196087099541657, no F: SSB protein SIF-BLAST: ,,[ssDNA binding protein [Arthrobacter phage DevitoJr]],,QXO13220,99.4764,9.12018E-135 SIF-HHPRED: b.40.4.7 (A:) gp2.5 {Bacteriophage T7 [TaxId: 10760]} | CLASS: All beta proteins, FOLD: OB-fold, SUPFAM: Nucleic acid-binding proteins, FAM: Phage ssDNA-binding proteins,,,SCOP_d1je5a_,74.3456,99.6 SIF-Syn: The synteny checks out, as the gene is in the same position as in Breylor and CapnMurica. The two flanking genes are also in similar positions. /note=Primary Annotator Name: Verpukhovskiy, Philipp /note=Auto-annotation: Genemark and Glimmer call start site at 43248, good gap, start codon ATG. /note=Coding Potential: Good coding potential on self trained genemark in the forward direction. /note=SD (Final) Score: -4.196, second best score on PECAAN. /note=Gap/overlap: 5, with comparison to finalized phages agreeing. /note=Phamerator: Pham 8381 agrees with Breylor17, 1/26/22. /note=Starterator: (Start: 17 @43248 has 22 MAs) /note=Location call: 43248. Starterator agrees, ATG codon, small gap; the final score is low, but it is still the second best of all the ones on PECAAN. /note=Function call: PhagesDB blast calls the function unknown with e-values below e-108 and 100% probability. HHpred has calls with low probability and low e-values for DNA binding proteins. NCBI blast has calls with 100% probability and low e-values below e-110 for unknown function. CDD has no hits. 
/note=Transmembrane domains: None /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: While I agree with your call and evidence, please include more details in your notes, such as Z-score, how the Final score compares, pham number, start number listed on starterator, and gap/overlap information. Also check evidence, even if function unknown or hypothetical protein, because it is evidence that it is real. CDS 43890 - 44189 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="Brunswick_63" /note=Original Glimmer call @bp 43890 has strength 4.62; Genemark calls start at 43890 /note=SSC: 43890-44189 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_61 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 5.43139E-66 GAP: 66 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.416, -3.8137921535956054, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_61 [Arthrobacter phage Niktson] ],,ASD52283,100.0,5.43139E-66 SIF-HHPRED: SIF-Syn: The upstream is NKF, and the downstream is NKF. Interestingly, this gene and gene 66 are both in synteny with gene 65 from CastorTray. CastorTray's upstream gene is ssDNA binding protein, the downstream is unknown, pham 96852. /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 43890. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The whole coding potential is covered between the start and stop site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.814. It’s the best Final score. The Z score is 2.416, which is the second highest. The highest Z score start site has a Final score of -3.961 with a gene length of 66bp, which is too short to be a gene. /note=Gap/overlap: The gap with the upstream gene is 66bp. The gap is conserved in other phages from cluster AU1. 
The gap is a little bigger than desired, but not big enough to have another gene in between. There is no extra coding potential that needs to be covered by another start site. /note=Phamerator: Pham number is 95568. Recorded on 01/10/2022. This gene is in the same phamily with 116 other genes such as Breylor17_61 (AU) and Breylor17_65 (AU). /note=Starterator: Start 22 is not the most annotated start site. Start 23 is the MA. Start site 22 is found in 29 of 66 genes in this pham and is called 100% when present. Start 23 also agrees with Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 43890. /note=Function call: PhagesDB BLAST has multiple hits suggesting unknown function. CDD didn’t come back with hits. HHpred has multiple hits with unreasonably high e values (>170). For those with acceptable e-values, the coverage is around 30% which is too low. Thus, the function of this gene is currently unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Everything looks good! I agree with the location and function calls. Great job! CDS 44167 - 44457 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="Brunswick_64" /note=Original Glimmer call @bp 44167 has strength 4.57; Genemark calls start at 44167 /note=SSC: 44167-44457 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TENNO_63 [Arthrobacter phage Tenno]],,NCBI, q1:s1 98.9583% 1.33095E-56 GAP: -23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.2364030944336752, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TENNO_63 [Arthrobacter phage Tenno]],,AYD87272,94.7917,1.33095E-56 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in pham 95568, downstream gene is in pham 17495, like phage ElephantMan. 
/note=PECAAN Notes /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both Glimmer and GeneMark call a start site of 44167bp, a GTG codon. /note=Coding Potential: There is strong coding potential spanning over the entire ORF shown in self-trained GeneMark. No coding potential is observed on the host-trained GeneMark coding map. /note=SD (Final) Score: The final score is -2.236, which is the largest of all start site candidates. The z-score associated with this site is 3.211, which is also the strongest of all candidates. /note=Gap/overlap: A start site of 44167 has an overlap of 23bp, which is fairly large but indicates an operon which is highly favored. This large gap is conserved in other members of cluster AU1, like Breylor17 and CapnMurica. This start site also maximizes the ORF length (291bp). /note=Phamerator: As of 1/11/2022, this gene resides in pham 20768, with 24 members all from cluster AU like Breylor17 and CapnMurica. The majority of these genes have an ORF length of 291bp, with a few varying by only a few base pairs. None of these genes have a known function at this time. /note=Starterator: Start 6, at 44167bp, is the most annotated start site with 20/20 manual annotations in non-draft phages. This start site is conserved in all genes within this pham and called in all of them. /note=Location call: Given the substantial coding potential and synteny among many members in cluster AU, including CapnMurica and Breylor17, this is a real gene with a start site of 44167bp. This start site maximizes the ORF (291bp) and maintains synteny in this length. There is a large upstream overlap of 23bp which could indicate a highly favorable operon, but this is also conserved by phages in the AU cluster. This start site was also called 100% of the time it was present in phages of pham 20768, which was 100% of the time. 
/note=Function call: PhagesDB BLAST shows significant hits (e-values = 5e-48) with phages Teacup and Niktson which all have no known function. NCBI BLAST also shows strong hits with low e-values and high coverage and identities (around e-56, 98%, and 90%, respectively), all with no known function. Both CDD and HHpred found no significant hits. /note=Transmembrane domains: TmHmm and TOPCONS both failed to detect transmembrane domains, so the function remains unknown. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: Looks great! My only comment is that you do not need to include BLAST unknown function genes. CDS 44454 - 44615 /gene="65" /product="gp65" /function="membrane protein" /locus tag="Brunswick_65" /note=Original Glimmer call @bp 44454 has strength 6.51; Genemark calls start at 44454 /note=SSC: 44454-44615 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_63 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 3.47145E-28 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.927, -6.141009609660544, no F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_63 [Arthrobacter phage Niktson] ],,ASD52285,100.0,3.47145E-28 SIF-HHPRED: SIF-Syn: Upstream gene is 61 with an unknown function call ; Downstream gene is 63 with a function call of membrane protein. /note=AF: DeepTMHMM shows 1 TMD /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Both Genemark and Glimmer agree on the same start site at 44454 F. Both have an ATG start site. /note=Coding Potential: No coding potential was detected in all 6 ORFs of host-trained GeneMark, however, self-trained gene mark showed strong coding potential in the putative ORF of 1/6 sequences. /note=SD (Final) Score: -6.141; Start site 44454 does not have the best score but has the best Z-score of 1.927. /note=Gap/overlap: Overlap of -4bp denotes the presence of a ribosomal binding site (operon). /note=Phamerator: Pham 17495 at 1/7/2022. 
It is conserved in 4 other phages from various clusters. It is the most called gene (1) in 3/3 non-draft phage genomes. /note=Starterator: Ran on 1/7/2022; Start number 1 is the most called gene number in 4 phage genomes of the same Pham; The corresponding cluster represented in this Pham belongs to AU1. This is manually annotated in all 3 non-draft genes in this Pham. The evidence agrees with the site predicted by Glimmer and GeneMark. Pham Maps called this gene as a valid gene but function is unknown. /note=Location call: Based on the evidence, this is an operon or a valid gene and the appropriate start site should be 44454. /note=Function call: NKF as evident by PhagesDB function frequency, positive PhagesDB BLAST hits, HHpred and NCBI BLAST. They predicted coding potential and presence of conserved sequences. However, function is unknown but likely an operon due to a -4bp gap in the putative ORF. /note=Transmembrane domains: 1 listed as evidence of a transmembrane protein. /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: Y: I agree with the primary annotator based on all the evidence. However, make sure to add TOPCONS as evidence and write that both TMHMM and TOPCONS show evidence of TMDs; the synteny note should compare against other phages. CDS 44599 - 44910 /gene="66" /product="gp66" /function="membrane protein" /locus tag="Brunswick_66" /note=Original Glimmer call @bp 44599 has strength 5.44; Genemark calls start at 44599 /note=SSC: 44599-44910 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage DevitoJr]],,NCBI, q1:s2 100.0% 7.3093E-50 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.63, -5.454358101769641, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage DevitoJr]],,QXO13223,90.3846,7.3093E-50 SIF-HHPRED: SIF-Syn: Synteny is difficult to assess for this gene. 
Brunswick_61 is well conserved in other phages of the same Pham in terms of size and spacing, and the genes downstream are also well conserved. Brunswick_60, however, isn't called very often in other finalized Pham maps. Instead, the homolog of Brunswick_59 is longer and sometimes overlaps with Brunswick_61 in the same way as 60 does. It might be worth checking the previous gene. According to Pham maps, the "membrane protein" function isn't called for any of Brunswick_61's homologs. /note=Glania-- Checked phagesDB Blast evidence. Also agree with primary and secondary annotator that gene 61 is suspect as the gene is not conserved in other phages in its cluster. /note= /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Both Glimmer and GeneMark agree on 44599 (methionine) /note=Coding Potential: Both self and Host-trained GeneMark show strong Coding potential in the frame. The autoannotated start site is the only one that covers all of it. /note=SD (Final) Score: The Z-score is not the best, at 1.63. The final score is relatively low, but it is the best among all the probable ones at -5.454. /note=Gap/overlap: The autoannotated start site has a gap of -17, which is a bit too large. I believe the previous gene is a little suspect here (not many of the other phages in Pham maps show it, and when it is there, there's always an overlap with the start of this gene). /note=Phamerator: In Pham 74678 as of 1/13/22. The Pham has 30 members, 26 of which are non-drafts. /note=Starterator: This gene has the Most Annotated site (4@44,599). It is called 90% of the time when present, and it is present in 11 of the 30 genes in this Pham. None of its other start sites have MA's. /note=Location call: I agree with the autoannotation, although the overlap is of concern. It is maintained within the pham. /note=Function call: PhagesDB BLAST: Many hits with good coverage and e-values, but none call a function. 
NCBI BLAST has three hits for membrane protein, all with coverage above 88% and with e-values ranging from e-57 to e-8. NCBI CDD has no hits. HHPred has one hit with e-value lower than 10^-3 calling a membrane protein. /note=Transmembrane domains: One in TmHmm, one in TOPCONS, evidence enough to suggest it is a transmembrane protein. /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: I would be careful using words like "highest" and "low" when discussing the Final Score and Z-Score. You want a final score that is the least negative which may confuse some people so I would say "best". Also, just because a gene overlaps doesn't mean the Z-score is irrelevant. You are thinking of when there is a -1 or -4 overlap, which may be an operon, and that negates the Final Score. So I think I would remove this part of your notes. I agree that the gene before is a little suspect too. Hopefully, QC of that gene will catch something. Don't forget to fill out synteny notes and check boxes as evidence! I checked the TOPCONS and TMHMM boxes as evidence for you, but I think you still need to check some PhagesDB phages. I agree with your call. CDS 44911 - 45648 /gene="67" /product="gp67" /function="membrane protein" /locus tag="Brunswick_67" /note=Original Glimmer call @bp 44911 has strength 8.56; Genemark calls start at 44911 /note=SSC: 44911-45648 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_65 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.68617E-166 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.852, -2.90473872338446, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_65 [Arthrobacter phage Niktson] ],,ASD52287,95.102,1.68617E-166 SIF-HHPRED: SIF-Syn: Membrane protein is flanked by a pham 74678 and pham 95568, just like in phage CastorTray. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site at position 44911. The start codon is ATG. 
/note=Coding Potential: There is reasonable typical and atypical coding potential within the putative ORF predicted by GeneMarkS and GeneMark Host. The chosen start site at position 44911 does not cover all of this coding potential. /note=SD (Final) Score: The SD score is -2.905 and predicts the best sequence match. The Z-score is 2.852 and is the best one predicted. /note=Gap/overlap: There is no gap or overlap, which may indicate that this gene is part of an operon. /note=Phamerator: This gene is in Pham 94529 as of 01/11/22. Our phage is in subcluster AU1, and there are 18 non draft genomes in this subcluster that also have this pham. Phages CapnMurica, DevitoJr, Gordon, Nightmare, Niktson, Synopsis, and Tatanka were used for comparison. Phamerator did not have a function called for this gene. /note=Starterator: Start site 1 is conserved among other members of the pham to which this gene belongs. 12/17 non draft genes in this pham call this site 100% of the time when present. Brunswick calls this start site at position 44911 and has 12 manual annotations. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 1 at basepair position 44911 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggested function is unknown, with high query coverage (100%), high % identity (>86%), and low e-values (<1e-148). CDD had no hits. There were no informative HHpred hits. There is no suggested function, but we can assign this a membrane protein since it does have transmembrane domains. /note=Transmembrane domains: Since TMHMM and TOPCONS called at least 3 TMDs, we can conclude that this protein does have TMDs. The visual data supports this and also predicts 3 transmembrane domains. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Great job! Loved the attention to detail especially in regards to TOPCONS/TMD. 
CDS 45626 - 45928 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="Brunswick_68" /note=Original Glimmer call @bp 45626 has strength 4.15; Genemark calls start at 45626 /note=SSC: 45626-45928 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_66 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 4.14469E-67 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.246, -4.310555060056183, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_66 [Arthrobacter phage Niktson] ],,ASD52288,100.0,4.14469E-67 SIF-HHPRED: SIF-Syn: Brunswick_64 is flanked by Brunswick_63 and Brunswick_65. CapnMurica and DevitoJr are also found in cluster AU1 with Brunswick. CapnMurica_63 and DevitoJr_65 are part of Pham 56995 with Brunswick_64 and are annotated to be of No Known Function. The upstream genes, Brunswick_63, CapnMurica_58, and DevitoJr_64 are part of Pham 94529 and currently are of no known function. Downstream of Brunswick_64, we see a very large gap, in which we see three additional genes in CapnMurica and DevitoJr. /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site at 45626 bp. Glimmer assigns a score of 4.15. The start codon is ATG. /note=Coding Potential: Strong coding potential was found in both GeneMark Self and Host. The chosen start site captures all coding potential on the first forward ORF. /note=SD (Final) Score: The final score of -4.311 is the best final score listed in PECAAN. Its corresponding Z-score is 2.246. /note=Gap/overlap: There is a 23 bp overlap with the upstream gene, and a gap of 713 bp with the downstream gene. The overlap upstream is of reasonable size, but the gap downstream is very large. GeneMark Host doesn’t show any strong coding potential that might be a gene that should be included, and GeneMark Self only shows some atypical coding potential in the gap. 
However, comparison of pham maps with other phages in AU1 suggests that there should actually be three additional genes in the gap. /note=Phamerator: As of 01/13/2022, it is part of pham 95568. It is conserved in phages Breylor17, CapnMurica, DevitoJr, and CastorTray, which are all also part of subcluster AU1 with Brunswick. Many genes in this Pham are of no known function. /note=Starterator: There are 57 non-draft genes in Pham 95568, and 32 call start number 23, which is the most often called start site. This correlates to a start site at 45626 bp for Brunswick. /note=Location call: Based on the evidence above, including coding potential, SD (final) score, gap/overlap size, and phamerator and starterator analysis, this gene is a real gene and has a start site at 45626 bp. We can see that this start site is called in Starterator, GeneMark, and Glimmer. /note=Function call: PhagesDB Function Frequency has two hits for DNA primase/helicase, but considering it is from pham 20193, this will be disregarded. Many PhagesDB BLAST hits suggest this gene’s function is of no known function. HHPRED had two hits for PGDYG protein with 99.4% & 98.6% probability, 30% and 31% coverage, and e-values of 7.6e-12 and 3.5e-7. While probability and e-values are favorable, coverage doesn’t quite meet the desired threshold. NCBI BLAST has hits for hypothetical protein. CDD had no relevant hits. As a result, this gene is labeled as No Known Function. /note=Transmembrane domains: Both TMHMM and TOPCONS did not predict any TMDs. As a result, this is not a membrane protein. /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: I agree with the location call and function call, but I would check the boxes for BLAST and NCBI BLASTp as evidence as they are significant hits. 
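The gap and overlap figures quoted in these notes follow directly from the CDS coordinates. The sketch below is illustrative only and is not part of the PECAAN output: the function names are hypothetical, and the arithmetic assumes forward-strand genes with 1-based, inclusive coordinates, which is consistent with the GAP fields in the records for genes 62 through 68.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene, counting both end coordinates."""
    return stop - start + 1


def gap_to_upstream(upstream_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start.

    A negative value is an overlap of that many bp; zero means the
    genes abut with neither gap nor overlap.
    """
    return start - upstream_stop - 1


# (gene number, start, stop) copied from the CDS lines in this listing.
cds = [
    (62, 43248, 43823),
    (63, 43890, 44189),
    (64, 44167, 44457),
    (65, 44454, 44615),
    (66, 44599, 44910),
    (67, 44911, 45648),
    (68, 45626, 45928),
]

for (_, _, prev_stop), (gene, start, stop) in zip(cds, cds[1:]):
    gap = gap_to_upstream(prev_stop, start)
    kind = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: length {gene_length(start, stop)} bp, {abs(gap)} bp {kind}")
```

The same arithmetic gives the 713 bp gap noted for gene 68 when its stop (45928) is compared with the auto-annotated start of the HNH endonuclease gene (46642), that is, before the three added genes were placed in that interval.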
CDS 45935 - 46180 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="Brunswick_69" /note= /note=SSC: 45935-46180 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein NIKTSON_67 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.40857E-50 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.388, -4.2241994709359085, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_67 [Arthrobacter phage Niktson] ],,ASD52311,98.7654,1.40857E-50 SIF-HHPRED: SIF-Syn: /note=Added gene. Typical and atypical CP on GM-self only. Found in phages ElephantMan, Niktson, Teacup, Tenno, Nightmare, and others CDS 46217 - 46417 /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="Brunswick_70" /note= /note=SSC: 46217-46417 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein NIKTSON_68 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 3.07309E-40 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.5074698667202133, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_68 [Arthrobacter phage Niktson] ],,ASD52289,100.0,3.07309E-40 SIF-HHPRED: SIF-Syn: /note=Added gene. Some atypical CP on GM-self only. Found in phages ElephantMan, Niktson, Teacup, DevitoJr, and others CDS 46423 - 46563 /gene="71" /product="gp71" /function="hypothetical protein" /locus tag="Brunswick_71" /note= /note=SSC: 46423-46563 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein NIKTSON_69 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 7.01441E-25 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.073, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_69 [Arthrobacter phage Niktson] ],,ASD52312,100.0,7.01441E-25 SIF-HHPRED: SIF-Syn: /note=Added gene. Some atypical CP on GM-self only. 
Found in phages ElephantMan, Niktson, CapnMurica, DevitoJr, and others CDS 46642 - 47145 /gene="72" /product="gp72" /function="HNH endonuclease" /locus tag="Brunswick_72" /note=Original Glimmer call @bp 46642 has strength 4.02; Genemark calls start at 46642 /note=SSC: 46642-47145 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 8.3183E-122 GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.894, -3.7244546317120757, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Niktson] ],,ASD52290,100.0,8.3183E-122 SIF-HHPRED: HNH homing endonuclease; HNH catalytic motif, Helix-turn-helix DNA binding domain, protein-DNA complex, DNA binding protein-DNA COMPLEX; HET: EDO; 2.92A {Bacillus phage SPO1} SCOP: d.4.1.3, d.285.1.1,,,1U3E_M,96.4072,100.0 SIF-Syn: /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 46642. /note=Coding Potential: There is a bit of typical coding potential on GeneMark Host on a forward frame from 46875 to 47050. There is good atypical coding potential on GeneMark Self. /note=SD (Final) Score: -3.724 /note=Gap/overlap: 713 bp gap /note=Phamerator: This is listed in pham 95508. Date 1/13/22. /note=Starterator: Start site 43 in Starterator was manually annotated in 21 out of 167 non-draft genes. Start 43 is 46642 in Brunswick. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 46642. /note=Function call: HNH endonuclease. NCBI BLAST lists the function as a HNH endonuclease with high percent identities and low e-values. PhagesDB BLAST also lists the function as HNH endonuclease with low e-values and high scores. CDD lists it as NUMOD4 which is a putative DNA-binding motif found in homing endonucleases and related proteins. On HHPred, the best hit lists the function as a HNH homing endonuclease. This hit has a 100% probability and an e-value of 3e-31. 
/note=Transmembrane domains: TmHmm and Topcons both do not call any transmembrane domains. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: I agree with the evidence. Remember to do synteny. CDS 47263 - 47580 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="Brunswick_73" /note=Original Glimmer call @bp 47263 has strength 0.07; Genemark calls start at 47263 /note=SSC: 47263-47580 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MEDIUMFRY_70 [Arthrobacter phage MediumFry]],,NCBI, q2:s10 97.1429% 3.83378E-40 GAP: 117 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.161, -4.34663776219455, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MEDIUMFRY_70 [Arthrobacter phage MediumFry]],,AXH44614,69.5652,3.83378E-40 SIF-HHPRED: SIF-Syn: This gene, 65, is of pham 80112 in Brunswick with NKF, corresponding with gene 68 of CapnMurica, also no known function. The upstream genes are of pham 96468 with functions HNH endonuclease and HNH endonuclease domain protein, respectively. The downstream genes are of pham 9218 with functions DNA helicase and ATP-dependent helicase, respectively. /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: GeneMark and Glimmer both call a start site of 47263 with a start codon of ATG. /note=Coding Potential: There is no coding potential in this interval on GeneMark Host. GeneMark Self shows high typical coding potential in this interval contained by the start site, and also shows lower atypical coding potential in this interval. /note=SD (Final) Score: The final score of this start site is lowest at -4.347. The z-score of this start site is highest at 2.161 and appears reasonable. /note=Gap/overlap: This start site has a gap of 117, although it creates the LORF. This seems possibly unusual. Its length is reasonable at 318 base pairs. /note=Phamerator: As of 1/12/22, this gene is of pham 80112. 
This pham appears in many AU phages and a few phages of different clusters and appears to be moderately conserved around this location in AU phages. Phamerator calls no function for this gene. /note=Starterator: Start 6 is the most annotated start site and is found in 22 of 28 of genes in the pham and is called 100% of the time when present, including in Brunswick_69. Our gene, Brunswick_67, is not on Starterator, possibly due to changing gene position (Phamerator was run 1/7/22), as Brunswick_69 calls the start site 47263, which corresponds to our auto-annotated start site of 47263. /note=Location call: Taken together, these results suggest that this gene is real and has a start site of 47263. It is notable though, that this gene does not appear on the GeneMark Host map. /note=Function call: Per PECAAN-based analysis, this gene appears to have no known function. PhagesDB BLAST calls about 20 genes with significantly low e-values, none of which have a known function. HHPRED returns no significant results. NCBI BLAST returns no results with known functions. CDD returns no data. /note=Transmembrane domains: TMHMM predicts no TMHs, neither does TOPCONS. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: Great notes! I find no mistakes. 
CDS 47755 - 48990 /gene="74" /product="gp74" /function="DNA helicase" /locus tag="Brunswick_74" /note=Original Glimmer call @bp 47755 has strength 3.65; Genemark calls start at 47803 /note=SSC: 47755-48990 CP: yes SCS: both-gl ST: SS BLAST-Start: [helicase [Arthrobacter phage CastorTray]],,NCBI, q1:s1 100.0% 0.0 GAP: 174 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.874, -2.9215542714922735, no F: DNA helicase SIF-BLAST: ,,[helicase [Arthrobacter phage CastorTray]],,QYC55058,96.1538,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark did call this gene, but Glimmer called the start site at 47755 (methionine) and GeneMark calls the start site at 47803 (methionine). /note=Coding Potential: The gene does have reasonable coding potential within the putative ORF. The auto-annotated Glimmer start site does encompass all of this coding potential. The auto-annotated GeneMark start site leaves a bit of the coding potential out of the putative ORF. /note=SD (Final) Score: The auto-annotated start site for Glimmer does have the best Final Score with a score of -2.922. The auto-annotated start site for GeneMark does not fall in the top three best Final Scores. /note=Gap/overlap: The Glimmer auto-annotated start site produces a fairly large gap of 174 bp, but there is no coding potential between this gene and the upstream gene and this gap is present in many other phages in cluster AU1. Given this information, this gap appears to be reasonable. The gene length created by this start site is 1236 bp. This is not the LORF, but the start site that produces the LORF has a much worse Final Score (-4.165) than the Glimmer start site and both start sites have the same Z-score (2.874). For these reasons, the Glimmer start site still appears to be the most attractive start site. /note=Phamerator: As of January 11th, 2022, this gene was in pham 9218. 
This pham does appear to be conserved in other members of Brunswick’s cluster. The phages and genes used for comparison were CapnMurica_69, CastorTray_75, and Breylor17_68. Neither Phamerator nor the phages database called a function for this gene, but many genes in the same pham were called as DNA helicases. /note=Starterator: The most conserved start site amongst this pham is start site #11 which phage Brunswick does not have. Brunswick calls start site #10 which is found in 28/166 non-draft genes and is called 75% of the time when present. Start site #10 corresponds to bp 47755 in Brunswick. While Brunswick does not contain the most conserved start site in the pham, it does contain a relatively conserved start site that is called frequently when present. /note=Location call: Altogether, this gene does appear to be real because there is reasonable coding potential within the putative ORF. The most likely start site appears to be the auto-annotated Glimmer site of 47755. This start site had the best Final Score and covered all of the coding potential. It is also relatively conserved across the pham and has a good Z-score as well. /note=Function call: The top 5 NCBI Blast hits, sorted by E-value, suggest the function to be an ATP-dependent helicase with high percent identity (>94.2%), high query coverage (100%), and low E-values (0). The top hits on PhagesDB with low E-values (0) also support that this function is a helicase. The top two hits on HHpred, by E-values, also suggest that this gene codes for some DNA protein with high probabilities (100%), high coverage (>98%), and low E-values (7.1e-38). While the top two HHpred suggestions are different, both could be encompassed by a generic DNA helicase. CDD had one hit with high enough coverage (>41%) and a low enough E-value (0.0000111033) to meet the threshold for evidence. 
This hit suggests that this gene codes for a part of the DEAD-like helicases superfamily which supports the theory that this protein is some kind of DNA helicase. /note=Transmembrane domains: There were no transmembrane domains predicted by either TMHMM or Topcons. This supports the theory that this gene codes for some sort of DNA protein. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: The calls made for this gene are backed by sufficient evidence. Looks good overall and great job on detailed notes. CDS 48980 - 49753 /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="Brunswick_75" /note=Original Glimmer call @bp 48980 has strength 8.83; Genemark calls start at 48980 /note=SSC: 48980-49753 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_73 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 2.99922E-143 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.308, -2.606079664606164, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_73 [Arthrobacter phage Niktson] ],,ASD52293,90.2439,2.99922E-143 SIF-HHPRED: SIF-Syn: Moderate synteny when comparing some phages like Elephantman, but low synteny in other phages like Breylor and CapnMurica. /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer & GeneMark both call the gene at 48980 F. The start codon is ATG or Methionine. /note=Coding Potential: High coding potential on Self Trained Genemark and moderate coding potential on Host Trained Genemark. Sufficient evidence points to this being a real gene. Varying rates of synteny are observed, with some phages in the same subcluster lacking gene 71 completely, others being located at a different spot on the DNA strand, and others being identically placed between the same upstream and downstream genes. /note=SD (Final) Score: Along with coding potential visuals, the high Z score (3.308) and good final SD score (-2.606) indicate that 48980 is a good start site. 
/note=Gap/overlap: -11 overlap suggests that the gene starts while still being in the coding region of the previous gene. Not a significant base pair amount to warrant concern, as RBS score is still low. /note=Phamerator: Part of Pham 93208 which consists of only 5 phages, 2 of which are draft including phage Brunswick. Average bp size for genes in this pham is 750 bp. Pham maps are showing synteny in some compared phages and none in others. /note=Starterator: Start 3 found in 80% (or 4/5) of genes in this pham, with Brunswick included in the most annotated start site. Not a big candidate pool to choose from in terms of other potential start sites. /note=Location call: This gene starts at bp 48980 and stops at 49753 F. Starterator has a low candidate pool for this pham but Brunswick is a part of the most annotated start site at 3. /note=Function call: This gene is of no known function. BLASTp shows high score and low e values for unknown function, using phages Elephantman and Niktson as comparison values (Score 409, e-114 for both). HHpred readouts showing high probability for endonuclease protein but very low coverage, not enough to deem as evidence. /note=Transmembrane domains: None. Suggests that this gene, though having no known function, does not have any interaction with host cell membrane features. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Agree with location and function calls, very comprehensive notes. 
CDS 49753 - 49887 /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="Brunswick_76" /note=Original Glimmer call @bp 49753 has strength 7.74; Genemark calls start at 49753 /note=SSC: 49753-49887 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_75 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 9.78133E-23 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.429, -6.400894757677589, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_75 [Arthrobacter phage Teacup]],,ASR84071,100.0,9.78133E-23 SIF-HHPRED: SIF-Syn: Gene in Pham # 296 is conserved in phage Teacup, with the same upstream and downstream genes, both NKF. /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: Both Glimmer and Genemark called the startsite 49753 with start codon ATG. /note=Coding Potential: There is reasonable coding potential in the putative ORF but it appears to start before the autoannotation start site. /note=SD (Final) Score: The Z-score and Final score are actually better for the start site that was not called. Autoannotation Z score is 1.429 and the final score is -6.401 whereas the other start site has a Z-score of 2.166 and a final score of -5.37. /note=Gap/overlap: the autoannotated start site has a 1 bp overlap whereas the only other potential start site has a 17 bp gap. The length of the autoannotated startsite gene is 135 bp whereas the alternative start site corresponds to a 117 bp ORF. /note=Phamerator: As of 1/13/22 this gene is in pham #296. There are only 10 members of this pham, all of which are in cluster AU including ElephantMan, Nightmare, and Teacup. None called a function for the gene. /note=Starterator: Startsite #1 (corresponding to 49753 bp in Brunswick) is called 100% of the time when present and is present in all phages in pham 296. This means it is very likely to be the startsite for Brunswick. 
/note=Location call: Given all evidence, especially the manual annotations observed on Starterator, it is clear that startsite 49753 is the startsite for this gene and that it is in an operon. /note=Function call: There is no evidence of a known function for this gene in BLAST, NCBI BLASTp, HHpred, or CDD. /note=Transmembrane domains: There are no transmembrane domains for the gene according to TmHmm and Topcons. /note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: Looks good CDS 49891 - 50172 /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="Brunswick_77" /note=Original Glimmer call @bp 49891 has strength 4.34; Genemark calls start at 49924 /note=SSC: 49891-50172 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_TENNO_74 [Arthrobacter phage Tenno]],,NCBI, q1:s1 100.0% 5.54779E-60 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TENNO_74 [Arthrobacter phage Tenno]],,AYD87280,97.8495,5.54779E-60 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: GeneMark predicted 49924, while Glimmer predicted 49891 as the suggested start site. The site number that was called is #71. The predicted start codon is ATG. /note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 1), and by GeneMark Host (Track 1 shows an upward hash at the start site and some peaks in the ORF region). There is reasonable coding potential predicted within the ORF and over the start site, 49891. /note=SD (Final) Score: The SD score is -2.156, which is the best SD score; there are SD scores of -6, but with bad Z-scores of less than 2. In this specific case, the predicted SD and Z score combination (-2.156 and 3.211 respectively) seems to be relevant for a ribosome binding site. /note=Gap/overlap: The gap of 3 bp is reasonable for this gene. 
This gene length prediction is the longest ORF, and it is acceptable given the auto annotated start site and scores. /note=Phamerator: This gene is found in Pham 4370 as of 1/07/22. This pham is found in other members of the same cluster, AU, that Brunswick is in. Some of the phages that I’ve compared to Brunswick are Breylor17, CapnMurica, and CastorTray. As of now, I have not seen a function call for this gene. /note=Starterator: The start number called the most often in the published annotations is 5; it was called in 12 of the 21 non-draft genes in the pham. The start site number for this gene corresponds to the 49891 base pair position. /note=Location call: Based on all the evidence collected so far from Glimmer and Starterator, the agreed upon start site is 49891bp. Start number 5 was also the most annotated start number for this gene and other phage genes that are in the AU cluster. /note=Function call: The top 4 NCBI BLASTp hits, with E-values lower than 5e-57, suggest that this gene could have no known function (NKF) with high query coverage (100%) and relatively high % identity (>93.55%). /note=The top 5 hits from PhagesDB BlastP yielded E-values all lower than 1e-45, and suggest the gene’s function is NKF. /note=Both CDD and HHpred did not lead to informative/significant results. HHpred had high e-values. /note=Transmembrane domains: There were no predicted TMH hits on TMHMM, and zero hits on TOPCONS. This means that the lack of data cannot serve as evidence for the gene. /note= /note=Secondary Annotator Name: TENNEY, MEGAN /note=Secondary Annotator QC: Actually an SD score of -2.156 is better than -6! The less negative (larger) the better. I agree with your function call and location call! Maybe also include that start site 5 was called 92.9% of the time when present also. Nice job! 
CDS 50186 - 50659 /gene="78" /product="gp78" /function="HNH endonuclease" /locus tag="Brunswick_78" /note=Original Glimmer call @bp 50186 has strength 8.81; Genemark calls start at 50186 /note=SSC: 50186-50659 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease domain protein [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 1.20235E-108 GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.401, -4.196087099541657, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease domain protein [Arthrobacter phage Teacup]],,ASR84073,98.7342,1.20235E-108 SIF-HHPRED: d.4.1.3 (M:1-105) Intron-encoded homing endonuclease I-HmuI {Bacteriophage SPO1 [TaxId: 10685]},,,d1u3em1,60.5096,99.8 SIF-Syn: /note=Primary Annotator Name: MONTOYA, CINTHYA /note=Auto-annotation: The start site is 50186 per Glimmer and GeneMark. /note=Coding Potential: There is coding potential present that spans over the putative ORF of this gene in both the self-trained and host-trained GeneMark reports. /note=SD (Final) Score: -4.196. This is the best SD score among all the other options since it is the least negative number and it is associated with a Z-score of 2.401 which is greater than the desired threshold of 2 and it is the highest value among all the other options. /note=Gap/overlap: There is a gap of 13 bp with the gene upstream. This gap is also conserved in all phages from the same subcluster. The length of the gene product is 474 bp which is reasonably long considering the chosen start site. /note=Phamerator: Pham: 83709 as of 1/10/22. This gene is conserved 18/28 in members of cluster AU. 37/62 finalized phage annotations within this pham called this gene an HNH endonuclease protein. Thus, this gene appears to be highly conserved among members of the same pham and members of the same cluster AU. /note=Starterator: The most reasonable start site conserved among the members of pham 83709 is start site 16 which is found in 40/62 (64.5%) of genes in this pham. 
There are 40/62 manual annotations of non-draft genes on this site and it was called 100% of the time when present. In Brunswick’s genome this is the start site @50186 which is consistent with Glimmer and Genemark’s auto-annotated start sites. /note=Location call: Based on the evidence presented, this is a real gene with the most reasonable start site being 50186. /note=Function call: HNH endonuclease protein. The top 5 BLASTp hits corresponding to phages Nightmare, Teacup, ElephantMan, Niktson, and CapnMurica, suggest that the function of this gene is HNH endonuclease protein due to the high identity values (98% to 90%), high query coverages (99% to 98%), and extremely low e-values ranging from 5e-89 to 5e-84. NCBI BLAST was also used to analyze protein function resulting in consistent values with BLASTp. The identity values organized for the top five hits range from 98.10% to 87.97%, query coverages range from 100% to 98%, and e-values range from 1.2e-108 to 6.1823e-101. There are no hits available on the CDD. HHpred’s top five hits are also consistent with a function of HNH endonuclease protein, resulting in high probability values ranging from 99.8 to 98.5 and extremely low e-values ranging from 7.3e-19 to 6.9e-7. /note=Transmembrane domains: There is no predicted TMH by TOPCONS and TmHmm. Therefore, the function of this gene is not membrane protein. /note=Secondary Annotator Name: Teoh, Bryan /note=Secondary Annotator QC: Agree with annotation call, there is sufficient evidence to conclude the function of this gene. 
CDS 50656 - 50868 /gene="79" /product="gp79" /function="hypothetical protein" /locus tag="Brunswick_79" /note=Original Glimmer call @bp 50656 has strength 3.96; Genemark calls start at 50656 /note=SSC: 50656-50868 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease domain protein [Arthrobacter phage CaptnMurica] ],,NCBI, q3:s2 97.1429% 1.36204E-35 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.754, -5.337515309673574, no F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease domain protein [Arthrobacter phage CaptnMurica] ],,YP_009603446,92.7536,1.36204E-35 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site @ 50656. /note=Coding Potential: Gene has low coding potential within the putative ORF in the Host-Trained GeneMark. There is evidence of sufficient coding potential within the ORF with typical and atypical overlap in Self-Trained GeneMark. /note=SD (Final) Score: -5.338; There was only one other gene with a better final score than the final score for start site 50656, but this gene length is smaller than the gap if we would choose the start site with the better final score. The Z-score for the start site 50656 is also not the most favorable, only being 1.754, rather than above 2, but the other start sites that did have a Z-score higher than 2 did not have favorable gaps and a long ORF. Therefore, the best choice is the start site at 50656. /note=Gap/overlap: There is an overlap of 4bp with the preceding gene. This suggests the presence of an operon which is favorable. Start site that produced this overlap also results in the largest ORF. /note=Phamerator: Gene found in Pham 1363 as of 01/11/22. All 17 non-draft genes found in this pham belong to phages in cluster AU /note=Starterator: The suggested start site for this gene was the most annotated start site, 3. It was called in 14 out of the 17 non-draft genes in the pham. 
All the genes with this start site present called it as the annotated start site, therefore it is called 100% of the time when it is present. /note=Location call: Gathered evidence suggests that this is a real gene that has a start site @ 50656: covers all coding potential, longest ORF, presence of an operon, aligns with start site of gene in phages of the same cluster. /note=Function call: HNH Endonuclease; Only one hit from PhagesDB provided significant evidence and that was phage Tokki which called the function as HNH endonuclease and had a sufficient e-value. HHpred resulted in 2 significant hits that called the HNH endonuclease function, which had coverage above 92% and e-values below 1e-14. NCBI blast also had 2 significant hits that called HNH endonuclease function with % identity above 80%, coverage above 97%, and e-values to the -34 power which is favorable. CDD database had no hits. /note=Transmembrane domains: Neither TMHMM nor Topcons had hits that suggest a transmembrane domain present within this protein. This makes sense since the gene is an HNH endonuclease. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: I agree with everything here :)) CDS 50868 - 51053 /gene="80" /product="gp80" /function="membrane protein" /locus tag="Brunswick_80" /note=Original Glimmer call @bp 50868 has strength 6.84; Genemark calls start at 50868 /note=SSC: 50868-51053 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_78 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 1.73313E-32 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.153, -2.356391881054909, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein NIKTSON_78 [Arthrobacter phage Niktson] ],,ASD52297,98.3607,1.73313E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Verpukhovskiy, Philipp /note=Auto-annotation: Genemark and Glimmer agree on start site of 50868. Start codon ATG. 
/note=Coding Potential: Good coding potential seen on self-trained genemark seen in forward direction. /note=SD (Final) Score: -2.356, this is the best score on pecaan. /note=Gap/overlap: -1 /note=Phamerator: Pham 7031, with Breylor17 agreeing with the protein family. /note=Starterator: Agrees with start site, 12/13 phages, (93%) have start site 2. /note=Location call: 50868. Good final score, ATG start codon, genemark, glimmer, and starterator all agree with the start site. /note=Function call: Unknown function. Phagesdb Blast has unknown function calls with e values<-14. HHpred has hits but the e values are large, >.25. NCBI blast has hits for hypothetical proteins with e values<-22. CDD has no hits. /note=Transmembrane domains: Present, 2 in TmHmm /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: All evidence presented strongly supports function and location call of this gene. CDS 51050 - 51316 /gene="81" /product="gp81" /function="hypothetical protein" /locus tag="Brunswick_81" /note=Original Glimmer call @bp 51050 has strength 5.99; Genemark calls start at 51050 /note=SSC: 51050-51316 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASTORTRAY_81 [Arthrobacter phage CastorTray]],,NCBI, q1:s1 100.0% 4.01662E-31 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.639, -3.4892599424135016, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASTORTRAY_81 [Arthrobacter phage CastorTray]],,QYC55064,73.1183,4.01662E-31 SIF-HHPRED: SIF-Syn: The upstream of this gene is NKF, and the downstream of this gene is NKF Other phages that have this gene don`t have unknown functions for the upstream and downstream genes. /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 51050. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
In the host-trained GeneMark, the whole coding potential is covered between the start and stop site. However, in the self-trained GeneMark, there is coding potential in front of the start site (and there is another start site in the front). /note=SD (Final) Score: -3.489. It’s the best Final score. The Z score is 2.639, which is the highest. /note=Gap/overlap: The gap with the upstream gene is -4bp, indicating a 4bp overlap with the upstream gene. The gap is conserved in some phages from cluster AU1 such as Breylor17, but not CapnMurica. This gap indicates the presence of an operon. /note=Phamerator: Pham number is 18807. Recorded on 01/07/2022. This gene is in the same phamily with 116 other genes such as Arcadia_38 (AM) and Breylor17_75 (AU). /note=Starterator: Start 21 called is not the most annotated start site. The most annotated start site is start 6, but Brunswick doesn’t have this one. Start site 21 is found in 9 of 33 in this pham and is called 88.9% when present. Start 21 also agrees with Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 51050. /note=Function call: Multiple phagesDB BLAST hits suggest unknown functions. CDD didn’t come back with hits. HHpred has multiple hits with unreasonably high e values (>110). Thus, the function of this gene is currently unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: Great notes! I agree with your call and evidence. 
CDS 51309 - 51539 /gene="82" /product="gp82" /function="hypothetical protein" /locus tag="Brunswick_82" /note=Original Glimmer call @bp 51309 has strength 6.61; Genemark calls start at 51309 /note=SSC: 51309-51539 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SYNEPSIS_74 [Arthrobacter phage Synepsis]],,NCBI, q1:s1 100.0% 2.15256E-37 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.808, -4.302337117698874, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SYNEPSIS_74 [Arthrobacter phage Synepsis]],,AXH46735,87.013,2.15256E-37 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in pham 7031, downstream gene is in pham 21788, like phage Tatanka. /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both Glimmer and GeneMark call a start site of 51309bp, which is an ATG codon. /note=Coding Potential: Host-trained GeneMark does not detect any coding potential while GeneMarkS shows strong coding potential that is contained by this ORF. /note=SD (Final) Score: The final score is -4.302 which is the largest of all candidates and the z-score is 2.808, which is strong evidence in favor of this start site. The final-score, while the largest, is still poor but not as important since this gene is most likely part of an operon. /note=Gap/overlap: There is an overlap of -8bp, which is reasonable and indicates the presence of an operon. This overlap is conserved in DevitoJr and CastorTray phages. This start site allows for an ORF length of 231bp which is not the largest, however the candidate with the largest ORF (324bp) has an unreasonable overlap (-101bp) and the final and z-scores are suboptimal (-7.277 and 1.192, respectively). /note=Phamerator: As of 1/11/2022, this gene is in pham 23057 that has 11 members, all of which are cluster AU phages, and four of these are drafts. An ORF length of 231bp or 234bp is conserved in all of these phages, which supports start site 51309. None of these genes have known functions at this time. 
/note=Starterator: Start 3 (51309bp) has 7/7 manual annotations in non-draft phages and is called 100% of the time it is present, which is 100% of the time. This start site is highly conserved within pham 23057. /note=Location call: Given the strong coding potential, detected by self-trained GeneMark, and synteny with other AU phages like CastorTray and DevitoJr, this is a real gene that starts at 51309bp. This start site is associated with the largest final score of all candidates and a strong z-score, that also allows for a reasonable and favorable overlap and ORF length. This start site is also conserved in all members of the pham. /note=Function call: NKF. PhagesDB BLAST shows strong hits with low e-values (6e-41) with Niktson and ElephantMan, though neither have a known function. NCBI BLAST also shows strong hits with CastorTray and Synepsis (around e-37, 100% coverage, about 78% identity), but these have no known functions. CDD and HHpred detected no significant hits. /note=Transmembrane domains: Both TmHmm and TOPCONS do not detect any transmembrane domains. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Everything looks good! I agree with the location and function calls. Great job! CDS 51523 - 51726 /gene="83" /product="gp83" /function="hypothetical protein" /locus tag="Brunswick_83" /note= /note=SSC: 51523-51726 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_BREYLOR17_76 [Arthrobacter phage Breylor17]],,NCBI, q11:s7 82.0896% 4.44268E-28 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.851, -4.994096598233357, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BREYLOR17_76 [Arthrobacter phage Breylor17]],,AXH43821,83.871,4.44268E-28 SIF-HHPRED: SIF-Syn: /note=good atypical CP on GM-self. other genomes have a gene here; those that don`t do not have a gap present in this region. 
CDS 51723 - 52004 /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="Brunswick_84" /note=Original Glimmer call @bp 51723 has strength 6.38; Genemark calls start at 51723 /note=SSC: 51723-52004 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_80 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 6.39906E-60 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.398, -3.852397050429479, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_80 [Arthrobacter phage Niktson] ],,ASD52299,97.8495,6.39906E-60 SIF-HHPRED: SIF-Syn: Upstream gene 76 has an unknown function but a valid gene. Downstream gene 78 has an unknown function but a valid gene. /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Both Genemark and Glimmer agree on the same start site at 51723 F. Both have an ATG start site. /note=Coding Potential: Coding potential was detected in 1/6 ORFs of host-trained GeneMark, however, self-trained gene mark showed strong coding potential in the putative ORF of 1/6 forward sequences. /note=SD (Final) Score: -3.852; Start site 51723 does have the best score but does not have the best Z-score of 2.398. /note=Gap/overlap: There is a large overlap of 183bp which likely involves non-coding regions in between genes. /note=Phamerator: Pham 21788 at 1/7/2022. It is conserved in 34 other phages from various clusters. it is the most called gene (17) in 12/30 non-draft phage genomes. /note=Starterator: Ran on 1/7/2022; Start number 17 is the most called gene number in 34 phage genomes of the same Pham; The corresponding cluster represented in this Pham belongs to AU1. This is manually annotated in 12/30 non-draft genes in this Pham. The evidence agrees with the site predicted by Glimmer and GeneMark. Pham Maps called this gene as a valid gene but function is unknown. /note=Location call: Based on the evidence, this is a valid gene and the appropriate start site should be 51723. 
/note=Function call: NKF, as evidenced by PhagesDB function frequency, positive PhagesDB BLAST hits, HHpred, and NCBI BLAST. They predicted coding potential and presence of conserved sequences. However, function is unknown. /note=Transmembrane domains: 0 listed as evidence /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: Good job! You can mention that although there is another start site with a higher z-score that creates the LORF, when looking at Starterator, start site 17 was called 100% of the time when present. Also make sure to run and investigate Topcons. As well, in your synteny box, you should mention the phages you compare to and can add the phams of the upstream/downstream/corresponding genes. CDS 52075 - 52527 /gene="85" /product="gp85" /function="hypothetical protein" /locus tag="Brunswick_85" /note=Original Glimmer call @bp 52075 has strength 6.23; Genemark calls start at 52105 /note=SSC: 52075-52527 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein NIKTSON_81 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 99.3333% 2.30907E-100 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.765, -3.1657223569180837, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_81 [Arthrobacter phage Niktson] ],,ASD52300,95.3947,2.30907E-100 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Glimmer and GeneMark do not agree. Glimmer calls 52075 and GeneMark 52105. /note=Coding Potential: Both autoannotated starts cover all of the coding potential, which exists in the forward direction. /note=SD (Final) Score: 52075: High Z-score (2.7). Has the highest Final Score as well (-3.166). 52105: Z-Score of 1.2 and Final score of -7. Definitely not as good. /note=Gap/overlap: 52075: 70. 52105: 100. The gap is large, but mostly conserved between other genes in the same Pham, as per Pham Maps. /note=Phamerator: In pham 10460 as of 1/13/22. 
This pham has 23 genes, 18 of which are not drafts. /note=Starterator: Out of the autoannotated starts, Start 4@52075 is the most conserved and the most annotated. Start 52105 has no MA`s, while 52075 has 7. /note=Location call: I believe start 52075 is correct, given the data. The gap is a small issue, but it seems conserved. The location has the best scores and it has the most MA`s. /note=Function call: PhagesDB BLAST has no relevant hits. A lot of hits call integrase with an e-value above 2. NCBI BLAST has no hits that aren`t hypothetical proteins. NCBI CDD has no hits. HHPred has no hits with an acceptable e-value (lowest hit value is 1.9). /note=Transmembrane domains:TmHmm has no hits. TOPCONS results are not available. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 52520 - 52675 /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="Brunswick_86" /note=Original Glimmer call @bp 52520 has strength 0.83; Genemark calls start at 52520 /note=SSC: 52520-52675 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_83 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 7.68745E-29 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.255, -4.151202928977958, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_83 [Arthrobacter phage Niktson] ],,ASD52302,98.0392,7.68745E-29 SIF-HHPRED: SIF-Syn: Pham 17509 protein is flanked by a pham 10460 and pham 12393, just like in phage ElephantMan. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site at position 52520. The start codon is ATG. /note=Coding Potential: There is reasonable typical and atypical coding potential within the putative ORF predicted by GeneMarkS, but not by GeneMark Host. The chosen start site at position 52520 covers all of this coding potential. /note=SD (Final) Score: The SD score is -4.151 and predicts the best sequence match. 
However, the Z-score is 2.255 and is only the third highest. There are two other Z-scores of 2.681 and 2.364 that would lead to a larger overlap with another gene. /note=Gap/overlap: This start site has an 8 bp overlap, which is favorable to the ribosome. /note=Phamerator: This gene is in Pham 17509 as of 01/12/22. Our phage is in subcluster AU1, and there are 7 non draft genomes in this subcluster that also have this pham. Phages DevitoJr, Niktson, Tenno, and Tatanka were used for comparison. Phamerator did not have a function called for this gene. /note=Starterator: Start site 9 is conserved among other members of the pham to which this gene belongs. 3/7 non draft genes in this pham call this site. Brunswick does not have this start site, but calls start site 10 at position 52520. 4/10 non draft genes in this pham call this site 100% of the time when present and all are in the AU cluster. This start site has 3/7 manual annotations. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 10 at basepair position 52520 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggested function is unknown, with high query coverage (100%), high % identity (>84%), and low e-values (<4e-27). CDD had no hits. There were no informative HHpred hits. There is no suggested function. /note=Transmembrane domains: Since TMHMM and TOPCONS didn’t call at least 1 TMD, we can conclude that this protein doesn’t have any TMDs. /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: I agree with your call and evidence, but you need to fill out the drop-down menus, function (NKF), and synteny notes. Also, the start site you chose does cover all of the typical coding potential which is usually good enough to say that it DOES cover all coding potential so you probably want to change this in your notes. 
CDS 52675 - 52860 /gene="87" /product="gp87" /function="hypothetical protein" /locus tag="Brunswick_87" /note=Original Glimmer call @bp 52675 has strength 10.94; Genemark calls start at 52675 /note=SSC: 52675-52860 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein NIKTSON_84 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 2.38336E-32 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.408, -4.483031786805435, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_84 [Arthrobacter phage Niktson] ],,ASD52303,95.082,2.38336E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site at 52675 bp. Glimmer assigns a score of 10.94. The start codon is ATG. /note=Coding Potential: Some coding potential was found in GeneMark Host, but strong coding potential was found in GeneMark Self. The chosen start site captures all coding potential on the first forward ORF. /note=SD (Final) Score: The final score of -4.483 is the best final score listed in PECAAN. Its corresponding Z-score is 2.408. /note=Gap/overlap: There is a 1 bp overlap with the upstream gene, and an 8 bp overlap with the downstream gene. The overlaps are of reasonable size, and the 1 bp overlap upstream might be indicative of an operon. /note=Phamerator: As of 01/13/2022, it is part of pham 12393. It is conserved in DevitoJr, which is also part of subcluster AU1 with Brunswick. Many genes in this pham are of no known function. /note=Starterator: There are 4 non-draft genes in pham 12393, and all 4 call and were manually annotated to start number 3, which is the most often called start site. This correlates to a start site at 52675 bp for Brunswick. /note=Location call: Based on the evidence above, including coding potential, SD (final) score, gap/overlap size, and phamerator and starterator analysis, this gene is a real gene and has a start site at 52675 bp. 
We can see that this start site is called in Starterator, GeneMark, and Glimmer. /note=Function call: Many PhagesDB BLAST hits suggest this gene’s function is of no known function. HHPRED had no relevant hits. NCBI BLAST has hits for hypothetical proteins. CDD had no relevant hits. As a result, this gene is labeled as No Known Function. /note=Transmembrane domains: Both TMHMM and TOPCONS did not predict any TMDs. As a result, this is not a membrane protein. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Great job Celine, loved your concise yet informative explanations. CDS 52853 - 53098 /gene="88" /product="gp88" /function="hypothetical protein" /locus tag="Brunswick_88" /note=Original Glimmer call @bp 52853 has strength 3.35; Genemark calls start at 52853 /note=SSC: 52853-53098 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TEACUP_83 [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 2.35573E-51 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.808, -3.648739441427296, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEACUP_83 [Arthrobacter phage Teacup]],,ASR84077,98.7654,2.35573E-51 SIF-HHPRED: SIF-Syn: NKF, upstream gene and downstream gene are also NKF, just like in phage DevitoJr. /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 52853. /note=Coding Potential: There is no coding potential on GeneMark Host, however, there is good atypical coding potential on GeneMark Self. /note=SD (Final) Score: -3.649 /note=Gap/overlap: -8 bp overlap. /note=Phamerator: This is listed in pham 98603. Date 1/22/22. /note=Starterator: Start site 5 in Starterator was manually annotated in 18 out of 18 non-draft genes. Start 5 is 52853 in Brunswick. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 52853. /note=Function call: NKF. 
NCBI BLAST lists the function as a hypothetical protein with high percent identities and low e-values. PhagesDB BLAST lists the function as function unknown with low e-values and high scores. No hits come up on CDD. On HHPred, the best hit lists the function as unknown function. That hit has a probability of 90.84% and an e-value of 0.37. /note=Transmembrane domains: TmHmm and Topcons both do not call any transmembrane domains. /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: I agree with the calls so far, but I would make sure to answer all the questions for each section. So weird starterator not working for this gene specifically, I just tried on one of my genes and it worked. CDS 53193 - 53279 /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="Brunswick_89" /note= /note=SSC: 53193-53279 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein NIKTSON_86 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 6.13468E-9 GAP: 94 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.618, -4.699110570850454, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_86 [Arthrobacter phage Niktson] ],,ASD52314,89.2857,6.13468E-9 SIF-HHPRED: SIF-Syn: /note=not sure if this is real. Very small, some genomes do NOT add a gene here. /note=ElephantMan, Niktson have a gene of the same size, pham 28967 /note=CapnMurica, DevitoJr, Gordon have a gene in this gap, but it`s longer. 
CDS 53281 - 53610 /gene="90" /product="gp90" /function="membrane protein" /locus tag="Brunswick_90" /note=Genemark calls start at 53281 /note=SSC: 53281-53610 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein FDH68_gp82 [Arthrobacter phage CaptnMurica] ],,NCBI, q1:s1 100.0% 1.31322E-62 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.808, -3.1379842619144376, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein FDH68_gp82 [Arthrobacter phage CaptnMurica] ],,YP_009603455,90.8257,1.31322E-62 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: GeneMark calls a start site of 53281, while Glimmer calls no start site. The auto-annotated start codon is ATG. /note=Coding Potential: This gene does not really appear on GeneMark Host. There is only a small peak, which seems to run from about 53350 to 53430. However, this gene does appear on GeneMark Self with reasonable coding potential contained by the auto-annotated start site along the gene interval. /note=SD (Final) Score: The final score is highest at -3.138. The z-score is highest at 2.808. /note=Gap/overlap: This start site has a gap of 182 (length of 330), whereas two longer genes are made with start sites of 119 and 173. The LORF start site has a z-score of 2.257 and a final score of -4.227, whereas the second-longest ORF start site has a z-score of 1.625 and a final score of -6.769. /note=Phamerator: As of 1/12/22, this gene is present in pham 56029. This pham is present around the same gene location in many phages of cluster AU and some phages of the cluster AO. /note=Starterator: Brunswick calls the most called start site. Start 10 was found in 22 of 32 genes in the pham and called 95.5% of the time when present. The clusters where this start site is called is mostly composed of AU1 genes, making this start site reasonable. /note=Location call: Taken together, this evidence suggests that this is likely a real gene with a start site of 53281. 
/note=Function call: There is no known function for this gene (although, with evidence of a transmembrane domain, this gene is a membrane protein). Per PECAAN, PhagesDB BLAST returns no significant results with known functions. HHPRED also returns no significant results. NCBI BLAST returns mostly significant results with no known functions, although 2 results list the function of this gene as a membrane protein. CDD returns no results. /note=Transmembrane domains: TMHMM predicts one TMH, and TOPCONS also appears to predict one TM helix. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: I agree with the evidence, remember to do synteny. **addressed
CDS 53603 - 53848 /gene="91" /product="gp91" /function="hypothetical protein" /locus tag="Brunswick_91" /note=Original Glimmer call @bp 53603 has strength 1.7; Genemark calls start at 53603 /note=SSC: 53603-53848 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_UZUMAKI_88 [Arthrobacter phage Uzumaki]],,NCBI, q16:s7 80.2469% 0.00179756 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.398, -5.157748419876103, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_UZUMAKI_88 [Arthrobacter phage Uzumaki]],,UVK62910,50.0,0.00179756 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark call this gene’s start site at 53603 (methionine). /note=Coding Potential: There is reasonable coding potential in the self-trained GeneMark map, and the proposed start site does encompass all of this coding potential. An important note is that the other proposed start site on PECAAN has a better Z-score and Final Score, but it does not encompass all of the coding potential present. /note=SD (Final) Score: The auto-annotated start site does not have the best Final Score (-5.158), but neither suggested start site has a great Final Score (all < -4). 
Given this, the auto-annotated start site still does appear reasonable, especially because it has a good Z-score (2.398). /note=Gap/overlap: There is a reasonable overlap between this gene and the upstream gene (-8 bp), which is just slightly over the -7 bp overlap threshold. There is a large downstream gap, and coding potential on the self-trained map is present, so there may be missing genes downstream. /note=Phamerator: As of January 12th, 2022, this gene was in pham 5025. This gene appears to be an orpham as it is not conserved in other clusters. Neither the phages database nor Phamerator had a function called for this gene. /note=Starterator: Starterator is uninformative for this gene because it is an orpham. /note=Location call: Altogether, the evidence suggests that this is a real gene because there was reasonable coding potential within the putative ORF. The auto-annotated start site does appear to be the best start site despite it not having the best scores because the other suggested start site does not encompass all of the coding potential present. /note=Function call: PhagesDB Blast, NCBI Blast, HHpred, and CDD were uninformative. There were not any significant hits across any of these databases, and this suggests that the function of this gene is unknown. /note=Transmembrane domains: Neither TMHMM nor Topcons identified any TMDs. /note=Secondary Annotator Name: Montoya Serpas, Cinthya /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 53863 - 54000 /gene="92" /product="gp92" /function="hypothetical protein" /locus tag="Brunswick_92" /note= /note=SSC: 53863-54000 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein PBI_INGRID_90 [Arthrobacter phage Ingrid] ],,NCBI, q1:s1 100.0% 4.47959E-14 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.129, -6.49991804247903, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_INGRID_90 [Arthrobacter phage Ingrid] ],,QFG08764,47.3684,4.47959E-14 SIF-HHPRED: SIF-Syn:
CDS 54027 - 54197 /gene="93" /product="gp93" /function="hypothetical protein" /locus tag="Brunswick_93" /note= /note=SSC: 54027-54197 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein NIKTSON_89 [Arthrobacter phage Niktson] ],,NCBI, q1:s1 100.0% 6.52979E-32 GAP: 26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.809, -5.432024807810012, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein NIKTSON_89 [Arthrobacter phage Niktson] ],,ASD52315,98.2143,6.52979E-32 SIF-HHPRED: SIF-Syn:
CDS 54283 - 54621 /gene="94" /product="gp94" /function="VRR-Nuc domain protein" /locus tag="Brunswick_94" /note= /note=SSC: 54283-54621 CP: no SCS: neither ST: NI BLAST-Start: [endonuclease [Arthrobacter phage CaptnMurica] ],,NCBI, q1:s1 100.0% 1.14251E-77 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.247, -5.21746608827694, yes F: VRR-Nuc domain protein SIF-BLAST: ,,[endonuclease [Arthrobacter phage CaptnMurica] ],,YP_009603458,100.0,1.14251E-77 SIF-HHPRED: c.52.1.35 (A:) stNUC {Salmonella phage SETP3 [TaxId: 424944]} | CLASS: Alpha and beta proteins (a/b), FOLD: Restriction endonuclease-like, SUPFAM: Restriction endonuclease-like, FAM: Virus-type replication-repair nuclease (VRR-Nuc),,,SCOP_d4qbna_,83.9286,99.8 SIF-Syn:
CDS 54566 - 55126 /gene="95" /product="gp95" /function="RecB-like exonuclease/helicase" /locus tag="Brunswick_95" /note=Original Glimmer call @bp 54566 has strength 10.84; Genemark calls start at 54566 /note=SSC: 54566-55126 CP: no SCS: both 
ST: SS BLAST-Start: [exonuclease [Arthrobacter phage CaptnMurica] ],,NCBI, q1:s1 100.0% 5.92651E-137 GAP: -56 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.041, -4.65945827897842, no F: RecB-like exonuclease/helicase SIF-BLAST: ,,[exonuclease [Arthrobacter phage CaptnMurica] ],,YP_009603459,100.0,5.92651E-137 SIF-HHPRED: DNA replication ATP-dependent helicase/nuclease DNA2; DNA binding protein, Hydrolase-DNA complex; HET: ADP; 2.36A {Mus musculus},,,5EAN_A,90.3226,99.4 SIF-Syn: See notes: synteny mapping is incomplete because genes may need to be inserted before this gene. /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer and GeneMark both call the gene at 54566. The start codon is ATG (methionine). /note=Coding Potential: There is a large gap before the start site of the gene that has major coding potential. Until a professor addresses whether any gene adjustments should be made, I will focus on what lies between the start and end of this gene. /note=SD (Final) Score: A Z-score of 2.041 and an RBS score of -4.659 (the lowest among the candidates) indicate that start site 54566 is favored (to be reassessed). /note=Gap/overlap: There is a 717 base pair gap preceding the gene that still needs to be addressed. Based on synteny map coverage with finalized genes in the same cluster, there may be two missing genes that need to be accounted for. /note=Phamerator: As of 1/15/2021, pham 8204 has this gene under the most annotated start site at bp 11. 70 out of 119 genes share this same trait, both finalized and draft phages alike. Synteny maps show conservation of the current reading frame of the gene, but there may need to be genes added that precede the start site. /note=Starterator: The most annotated start site is at 11bp, as mentioned in the Phamerator section. This start site is highly conserved for phages in this cluster. 
/note=Location call: Until further notice, Gene 87 starts at 54566 bp with start site 11 being the beginning of the open reading frame, and the gene ends at 55126. /note=Function call: BLASTp shows no function call; however, HHpred has high probability and coverage for an exonuclease gene, so the function will be labeled as exonuclease. /note=Transmembrane domains: No transmembrane domains were predicted and there were no TOPCONS hits, suggesting the protein does not interact with cell membranes. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: Thorough notes, agree with all the evidence provided!
CDS 55123 - 57879 /gene="96" /product="gp96" /function="helix-turn-helix DNA binding domain" /locus tag="Brunswick_96" /note=Original Glimmer call @bp 55123 has strength 11.91; Genemark calls start at 55123 /note=SSC: 55123-57879 CP: no SCS: both ST: NI BLAST-Start: [HTH DNA binding domain protein [Arthrobacter phage Teacup]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.919, -2.9063687850157054, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[HTH DNA binding domain protein [Arthrobacter phage Teacup]],,ASR84080,99.0185,0.0 SIF-HHPRED: SIF-Syn: The upstream gene is Cas4 Family Exonuclease and the downstream gene is NKF, just like in phage Teacup. /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: GeneMark and Glimmer both called the start site of 55123 and the start codon is ATG. /note=Coding Potential: There is good coding potential in the putative ORF, and the start site covers all of it. /note=SD (Final) Score: The Final Score is the best possible, -2.906, and the Z-score is also the best possible, 2.919. /note=Gap/overlap: The auto-annotated start site gives the longest possible ORF, with a reasonable length of 2757 bp. There is a 4 bp overlap with the upstream gene, so it is likely in an operon. /note=Phamerator: As of 1/14/22 the gene is in pham #20330, with 119 members. 
There are other members of the AU cluster in the pham, including Breylor17 and ElephantMan. Helix-turn-helix DNA binding domain protein was called for the function, which is on the SEA-PHAGES approved function list. /note=Starterator: Starterator showed that Brunswick does not have the most annotated start. Start site #15 was called, which corresponds to 55123 bp. This start site is called 92.6% of the time when present, with 20/109 manual annotations. /note=Location call: Start site #15 is the best because it gives the longest ORF, has the best Z-score and Final Score, and leaves the smallest gap. It also shows an overlap of 4 bp, meaning the gene is probably in an operon, so the RBS score matters less. /note=Function call: BLAST had more than 5 hits with an E-value of 0 that called the function as helix-turn-helix DNA binding domain protein, from phages in cluster AU such as Teacup. HHPred showed no informative results. NCBI BLASTp had more than 5 hits with an E-value of 0 that had an HTH DNA binding domain with more than 95% coverage. CDD results supported the HTH DNA binding domain calls from BLAST and NCBI BLASTp with a hit of E-value 5x10^-5. /note=Transmembrane domains: There are no transmembrane domains according to TMHMM and TOPCONS. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Agree with function call, very comprehensive notes. 
CDS 57876 - 58151 /gene="97" /product="gp97" /function="hypothetical protein" /locus tag="Brunswick_97" /note=Original Glimmer call @bp 57876 has strength 6.22; Genemark calls start at 57876 /note=SSC: 57876-58151 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_NIGHTMARE_92 [Arthrobacter phage Nightmare] ],,NCBI, q1:s1 100.0% 1.71469E-59 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.772, -3.1513923047253267, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NIGHTMARE_92 [Arthrobacter phage Nightmare] ],,ASM62364,98.9011,1.71469E-59 SIF-HHPRED: SIF-Syn: Upstream is a helix-turn-helix DNA binding domain, same as phage Breylor17. /note=PECAAN Notes /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both Glimmer and Genemark predicted 57876 as the suggested start site. The site number that was called is #86. The predicted start codon is ATG. /note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 3, has good coding potential in the ORF region), and by GeneMark Host (Track 2, upward hash). There is reasonable coding potential predicted within the ORF and over the start site, 57876. /note=SD (Final) Score: The SD score is -3.151, which is the best SD score, and both the Glimmer and GeneMark predictions were also 57876. In this specific case, the SD and Z-score combination (-3.151 and 2.772, respectively) doesn’t seem relevant for a ribosomal binding site (RBS), as the predicted overlap is -4 bp. /note=Gap/overlap: The overlap of 4 bp is reasonable for this gene if it is part of an operon. This gene length prediction is also the longest ORF, so it is acceptable given the auto-annotated start site. /note=Phamerator: This gene is found in Pham 2667 as of 1/07/22. This pham is found in other members of the same cluster, AU, that Brunswick is in. Some of the phages that I’ve compared to Brunswick are Breylor17, CaptnMurica, and CastorTray. 
As of now, I have not seen a function call for this gene. /note=Starterator: The start number called for this gene is Start 9; it was called in 23 of the 116 non-draft genes in the pham. The start site number for this gene corresponds to the 57876 base pair position. /note=Location call: Based on all the evidence collected so far from Genemark and Starterator, the agreed-upon start site is 57876 bp. Start number 9 was not the most annotated start number for this gene and other phage genes that are in the AU cluster. /note=Function call: The top 5 NCBI BLASTp hits, with E-values lower than 1e-45, suggest that this gene has no known function (NKF), with high query coverage (96%) and relatively high % identity (>82.95%). /note=The top 10 hits from PhagesDB BlastP yielded E-values all lower than 1e-47, and suggest the gene’s function is NKF. /note=Neither CDD nor HHpred led to informative/significant results. HHpred had high e-values. /note=Transmembrane domains: There were no predicted TMH hits on TMHMM, and zero hits on TOPCONS, so there is no evidence of transmembrane domains for this gene. /note= /note=Secondary Annotator Name: LIAO, SHIQING /note=Secondary Annotator QC: Looks good, remember to update synteny box