CDS 83 - 535 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="Berrie_1" /note=Original Glimmer call @bp 83 has strength 11.28; Genemark calls start at 83 /note=SSC: 83-535 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 1.33393E-98 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.255, -1.953940808934884, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Iter] ],,URQ04989,98.0,1.33393E-98 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,52.0,98.7 SIF-Syn: Terminase small subunit, no upstream gene, downstream gene appears to be terminase large subunit, just like in final phage Adolin. /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site of 83. /note=Coding Potential: This gene has good coding potential within the putative ORF, and the start covers all of this coding potential. /note=SD (Final) Score: The RBS Final Score of -1.954 for the original start at 83 is the highest of the suggested starts, and therefore the best. The same is true of the Z-score for this start, at 3.255. /note=Gap/overlap: This gene has an overlap of 4 bp with its downstream gene because its stop is at 535, whereas the start of the downstream gene is 532. /note=Phamerator: This gene was found in pham 97631, which has 145 members. Additionally, 33 phages in this pham belong to cluster AZ. /note=Starterator: Using information from the Starterator analysis run most recently on 1/21/22, it was found that the most conserved start site number is 35. This call was made for 31 of the 112 non-draft genes. The auto-annotated start is called at start number 35 (83), which matches the most conserved start. Phage Berrie’s track marks start site 35 with a yellow line, which denotes it as an auto-annotated start.
This start site in Berrie’s track corresponds to that of other phages in the cluster, such as phages DrManhattan and Tweety. Start site 35 has been determined to be the Final Human Annotated start, as represented by a green line on the track representing these phages. The analogous start site between Berrie and other phages in this cluster is therefore promising, indicating that the auto-annotated start site 35 at 83 is indeed correct. /note=Location call: The evidence gathered indicates that the suggested start site of 83 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Terminase small subunit. Both PhagesDB BLASTp and NCBI BLASTp have several hits with low e-values, high identity percentages, and reasonable scores, all of which serve as strong evidence for the gene’s function. The top non-draft hit on PhagesDB BLASTp was for a gene in Amyev, a phage within the same cluster as Berrie (AZ). This hit has a very low E-value of 2e-81 and a reasonable score. Furthermore, the top non-draft hit on NCBI BLASTp was also for a gene from a phage (named Phives) in the same cluster as Berrie. This hit has an E-value of 3.9348e-97, even lower than that of the top PhagesDB hit, a reasonable score, and a very high identity percentage of 94.67%. Both of these top hits have the function terminase small subunit. This is also the listed function for several other strong hits on both PhagesDB BLASTp and NCBI BLASTp whose data can reasonably serve as further evidence. Top hits for HHPRED alignments further point to terminase small subunit as the gene function. CDD, however, provides no data regarding this gene. Given the above data, there is enough evidence to conclude that this is indeed the function of this gene. /note=Transmembrane domains: No TMDs called by TMHMM or TOPCONS. The protein is not a membrane protein.
/note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC`ed this location call and agree with the primary annotator. CDS 532 - 2238 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="Berrie_2" /note=Original Glimmer call @bp 532 has strength 12.66; Genemark calls start at 532 /note=SSC: 532-2238 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.122, -4.394706008439538, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage London] ],,QOP64305,98.7676,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,92.7817,100.0 SIF-Syn: terminase, large unit, upstream gene is terminase, small unit, downstream is portal protein, just like in Asa16, London and other phages from cluster AZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 532. /note=Coding Potential: There is good coding potential in the ORF of both Host-trained and Self-trained GeneMark. The start covers all coding potential. /note=SD (Final) Score: The final SD score is -4.395 with a z-score of 2.122. Even though this is not the best score, this is the best option and corresponds to the LORF. /note=Gap/overlap: There is a 4bp overlap upstream which indicates that the gene belongs to an operon. /note=Phamerator: This gene belongs to pham 98020 as of 01/22/2022. This pham has 1037 members. More than 10 members belong to the same cluster AZ. /note=Starterator: Starterator calls start number 122 at 532. This start has 21 manual annotations but is not the most annotated start (start 101). This start 122 is shared by a number of other phages in the same cluster AZ. 
/note=Location call: Based on the evidence above, this is a real gene with the start site at 532. /note=Function call: Terminase, large subunit. Phagesdb function frequency shows high frequency of terminase, large subunit being called as the function. Top phagesdb and NCBI BLASTp hits indicate terminase large subunit as the function with e-values of 0 and 100% coverage, >96% identity. Two top HHpred hits also called terminase, large subunit with e-values <2e-33 and 100% probability. No reliable CDD hits found. /note=Transmembrane domains: This is not a membrane protein. Neither TMHMM nor TOPCONS predicted TMDs. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC`ed the location and functional calls and agree with the first annotator. Perhaps mention in the Starterator section that Berrie does not have the "Most Annotated" start site - it makes your call stronger! CDS 2262 - 3638 /gene="3" /product="gp3" /function="portal protein" /locus tag="Berrie_3" /note=Original Glimmer call @bp 2262 has strength 12.75; Genemark calls start at 2262 /note=SSC: 2262-3638 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: 23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.255, -2.0162541296952132, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage London] ],,QOP64306,98.4716,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_Q,91.048,100.0 SIF-Syn: Portal protein. When compared to phages Asa16 and Adumb2043, the gene is conserved as the same pham, function, and is within a few base pairs of the start site. Additionally, the genes up and down stream seemed to be conserved, although Berrie`s gene functions are not finalized yet. This strongly suggests that portal protein is the correct function call. 
/note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Both Glimmer and GeneMark call 2262 as their start site. The Glimmer score is 12.75 and the start site is ATG. /note=Coding Potential: There is coding potential for both start sites and the stop site in Host trained and self trained GeneMark. /note=SD (Final) Score: -2.016. This is the best Final score out of all the called start sites. /note=Gap/overlap: The gap is 23 bp, which is the most reasonable gap size of the called start sites. All the other gaps and overlaps are over 100bp, which is not a reasonable call. /note=Phamerator: The pham for this gene is 95678. The date is 1/19/22. There are 1467 members in this pham. There are various clusters represented in this pham. /note=Starterator: Berrie does not have the most annotated start site. The called start site is site 94, which is 2262. This agrees with Glimmer and GeneMark. /note=Location call: This is most likely a real gene. Based on coding potential, GeneMark, Glimmer, and Starterator, 2262 is most likely the start site. /note=Function call: Phages DB Function Frequency’s top five calls are all portal proteins. PhagesDB BLASTp and NCBI BLASTp both called multiple portal protein genes with e-values of 0. HHPRED also calls multiple portal proteins with e-values of 3.1e-34, 1.1e-33, and 5.6e-16. CDD calls one portal protein domain with an e value of 1.49939e-43. The most likely function is therefore portal protein. /note=Transmembrane domains: There were no TMD`s called and no other evidence to suggest a TMD function within the other databases. 
/note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC`ed this location call and agree with the primary annotator CDS 3657 - 5711 /gene="4" /product="gp4" /function="capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin" /locus tag="Berrie_4" /note=Original Glimmer call @bp 3657 has strength 10.54; Genemark calls start at 3693 /note=SSC: 3657-5711 CP: yes SCS: both-gl ST: SS BLAST-Start: [capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Tuck]],,NCBI, q4:s3 99.5614% 0.0 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.866, -2.9063687850157054, yes F: capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin SIF-BLAST: ,,[capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Tuck]],,WAB10780,97.0717,0.0 SIF-HHPRED: d.166.1.3 (A:) Eukaryotic mono-ADP-ribosyltransferase ART2.2 {Norway rat (Rattus norvegicus) [TaxId: 10116]},,,d1gxya_,38.0117,99.9 SIF-Syn: VIP2-like ADP-ribosyltransferase toxin (pham 55022), upstream gene is portal protein (pham 97881), downstream is NKF (pham 98231), just like in phage Adolin and others. /note=AF: function called according to recent AZ phage harmonization /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: There is a disagreement in calls. Glimmer calls the start at 3657 (site 1) with a GTG start codon while GeneMark at 3693 (site 3) with ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. The Glimmer start call (site 1) covers all the coding potential. The GeneMark start call (site 3) does not cover some of the initial coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: Start site 1 has a final score of -2.906 and a good Z-score of 2.866. This start site has the best (highest) values on PECAAN for this gene. 
Start site 3 has poor values of -8.464 and 0.167, respectively. /note=Gap/overlap: Start site 1 has a gap of 18 bp which is acceptable. This start site creates the LORF and has a gene length of 2055 bp which is good. Start site 3 has a gap of 54. /note=Phamerator: The pham number as of 1/23/2022 is 55022. The gene is conserved in phages Adolin (AZ), Adumb2043 (AZ), Amyev (AZ), as well as other non draft phages. Any of these genomes can be used for comparison since they are all non-draft, but those of the AZ cluster are better since Berrie is from AZ. Based on PhagesDB the function call for the gene is VIP2-like ADP-ribosyltransferase toxin. /note=Starterator: Based on the 1/21/2022 run the most annotated start number 3 is a reasonable choice that is conserved among members of pham 55022. There are 33 members total with 25 being non-draft. 22/25 of non-draft members call start number 3, which correlates to 3657 (site 1) for Berrie. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 3657 (site 1). Starterator agrees with Glimmer but not Genemark. /note=Function call: VIP2-like ADP-ribosyltransferase toxin. The top non-draft hits on the first page of PhagesDB BLASTp all have the function of VIP2-like ADP-ribosyltransferase toxin and e-values of 0, for example in phages Eraser, Amyev, and Crewmate. All these phages are in cluster AZ, same as Berrie. For the AZ cluster, Phagesdb Function Frequency shows VIP2-like ADP-ribosyltransferase toxin with the highest total frequency (21%). NCBI BLASTp shows strong hits with zero e-values that are VIP2-like ADP-ribosyltransferase toxin in Eraser (e: 0, id: 90.776%, cov: 99.5614%) and DrSierra (e: 0, id: 85.798%, cov: 99.5614%). There was also evidence for an ADP-ribosyltransferase domain.
Two strong CDD hits are ADP-ribosyltransferase exoenzyme (e: 4.73981e-23, id: 31.6583%, cov: 26.9006%) and VIP2 (e: 2.93529e-21, id: 25.8706%, cov: 27.924%). Since both exhibit such low e-values, the VIP2-like ADP-ribosyltransferase toxin is a likely function as it combines these two CDD results. The three top HHpred hits were all closely related to the structure and function of VIP2-like ADP-ribosyltransferase toxin. The top hit was d1gxya_ (prob: 99.9%, cov: 38.0117%, e: 2.3e-23), a eukaryotic mono-ADP-ribosyltransferase. The second hit was 1GXY_B (prob: 99.8%, cov: 38.1579%, e: 3.6e-19), a T-cell ecto-ADP-ribosyltransferase. The third hit was 4DV8_A (prob: 99.8%, cov: 38.7427%, e: 3.8e-18), a toxic anthrax lethal factor. The function call was very close between VIP2-like ADP-ribosyltransferase toxin and ADP-ribosyltransferase, but due to its prevalence in Phagesdb Function Frequency and Phagesdb BLAST, VIP2-like ADP-ribosyltransferase toxin was chosen. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: I would reevaluate your coding potential analysis. The coding potential is only located within the predicted ORF (for a start site at approximately 3657-3693 bp) in a forward strand according to Genemark Host and Self, which is acceptable. Additionally, I believe we`ve only used Genemark Host and Self to observe coding potential. For your Phamerator description, I think you meant Berrie instead of Tallboi. I would also note the "ADP-ribosyltransferase domain and MuF-like fusion protein" function call for the pham observed from PhagesDB for your Phamerator evidence.
Since there is evidence for both "ADP-ribosyltransferase" and "VIP2-like ADP-ribosyltransferase toxin" function calls, I would check with the professors to make sure one function call is more reasonable than the other, since these are separate function calls according to the functional assignments document. Both function calls seem to be prevalent in cluster AZ for this pham: Kaylissa`s (cluster AZ) function call for this pham was ADP-ribosyltransferase domain and MuF-like fusion protein. CDS 5770 - 6129 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Berrie_5" /note=Original Glimmer call @bp 5770 has strength 10.99; Genemark calls start at 5770 /note=SSC: 5770-6129 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_6 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 98.3193% 3.25259E-77 GAP: 58 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -2.759308040702341, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_6 [Arthrobacter phage Janeemi]],,UVK63526,98.3193,3.25259E-77 SIF-HHPRED: SIF-Syn: Gene has no known function, upstream gene is a VIP2-like ADP ribosyltransferase toxin, and downstream gene is scaffolding protein like several members of cluster AZ such as Adolin, Amyev, Asa16 and DrManhattan. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark both call start site 5770 with a Glimmer score of 10.99. Its start codon ATG is common and likely to be used. /note=Coding Potential: The Host-Trained GeneMark and Self-Trained GeneMark both have coding potential predicted within the putative ORF. Coding potential is found on the forward strand, indicating that it is a forward gene. The putative start site at 5770 includes both coding potentials. /note=SD (Final) Score: Putative start site 5770 has a RBS final score of -2.759 and a Z-score of 3.266. These are the best scores out of the potential start sites. This start site also has the longest ORF. 
/note=Gap/overlap: There is a gap of 58 bp between the gene and the gene upstream. The putative start site minimizes the gap and creates the longest ORF. /note=Phamerator: As of 1/23/2022 the gene belongs to pham 98231. Of the 30 members in the pham, 22 are non-draft genes. The pham is composed of members of the AZ cluster. Phamerator does not call a function for this gene. /note=Starterator: The most conserved start site among the members of the pham is at number 4. It is called in 20 of the 22 non-draft genes, including Berrie. The position corresponds to base pair coordinate 5770 in Berrie. /note=Location call: Based on the coding potential, conservation of genome architecture with other non-draft genes within cluster AZ, good SD statistics for the putative start site, and statistically significant phagesDB BLAST hits, we can determine that this gene is “real.” Coding potential and starterator confirm that the best fit start site is 5770. /note=Function call: NKF. Top hits in PhagesDB BLASTp with e-values lower than 1e-57 (identity >82%) had no known function. The hit suggesting a head-to-tail stopper function in PhagesDB function frequency box had a high e-value of 3.8. Top NCBI BLASTp hits with e-values lower than 1e-65 (query coverage at 100% and identity percentages >81%) were hypothetical proteins. No output was returned from CDD and HHpred failed to return any statistically significant hits (lowest e-value was 24). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains. The gene is not a membrane protein.
/note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 6247 - 6786 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="Berrie_6" /note=Original Glimmer call @bp 6247 has strength 15.95; Genemark calls start at 6247 /note=SSC: 6247-6786 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 100.0% 8.5824E-108 GAP: 117 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.02, -3.3503726477288405, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Powerpuff] ],,QGZ17304,94.9721,8.5824E-108 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_f,56.9832,97.6 SIF-Syn: Scaffolding protein (pham 98009), upstream gene is pham 98231, downstream gene is major capsid protein (pham 57253), just like in phage Kaylissa. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 6247, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 6247 covers all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.350, and the Z-score is 3.02. These are the best out of all the listed possible start sites. /note=Gap/Overlap: There is a 117 base pair gap with the upstream gene. However, there is no coding potential in this gap that suggests a gene needs to be added. The auto-annotated start site creates the longest ORF, and the length of the gene is acceptable (540 bp). /note=Phamerator: As of 01/23/2022, the gene is found in Pham 98009. The pham is conserved in other members of the cluster - comparison was done between Berrie and a few other non-draft genomes, including Asa16 and Adolin. Both Phamerator and PhagesDB called the function of this gene as “scaffolding protein,” which is on the approved function list. 
/note=Starterator: The “Most Annotated” start site (12) is present in 29 of 32 non-draft genes in this pham, and it is present in Berrie. This start site corresponds to base pair position 6247, which is the auto-annotated start site. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 6247. /note=Function Call: The top NCBI BLASTp hits’ suggested function is scaffolding protein with high query coverage (100%), high % identity (>90.50%), and low e-values (<6e-108). The top PhagesDB BLASTp hits’ suggested function is scaffolding protein, with high % identity (>92%) and a low e-value (<1e-91). While there were no hits in CDD, one of the top two hits in HHpred was informative - with high probability (97.56%), high coverage (56.9832%), and a low e-value (0.0051) - that listed the function as scaffolding protein. /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: The e-value of the HHpred hit is > e-7, but other factors look good -- maybe ask about this, otherwise looks good! CDS 6813 - 7757 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="Berrie_7" /note=Original Glimmer call @bp 6813 has strength 14.44; Genemark calls start at 6813 /note=SSC: 6813-7757 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 0.0 GAP: 26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.158, -2.5074698667202133, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage Janeemi]],,UVK63528,97.1338,0.0 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_C,92.9936,99.9 SIF-Syn: major capsid protein, upstream gene is scaffolding protein, downstream gene is a head to tail adaptor, just like in final phage Phives and Kaylissa.
/note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and GeneMark at the same start site, 6813, with the same start codon, ATG. This is a very common start codon so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -2.507 which is the best score listed. The z-score of 3.158 is greater than 2 and is the best z-score listed. The next best z-score has an unreasonable gap of 77. This auto annotated start site has the best gap listed. /note=Gap/overlap: There is a reasonable gap of 26bp which is the smallest gap listed. It is unlikely for a new gene to be found in this gap. /note=Phamerator: The pham number as of 1/24/2022 is 57253. This pham was also found to contain a wide range of clusters, with a total of 230 members. /note=Starterator: The start number is 7 which corresponds to a 6813 start site. Berrie does have the most annotated start site, which agrees with Glimmer and GeneMark. /note=Location call: Based on the data it appears that this is a real gene with a 7757 stop site and 6813 start site. /note=Function call: Based on the phages Phives and Kaylissa, it is safe to conclude that the function of this gene is major capsid protein. The top two BLAST hits on PhagesDB have E-values of 1e-160 and 1e-159 and high scores of 563 and 558, respectively, and both list major capsid protein as the function. The top hits also have high probability (99.9% for both), high coverage (92.9936% and 89.4904%), high identity (91.772% and 90.7936%), and e-values of 0. CDD contains one hit with very low identity (~12%), 68% coverage, and an e-value of 2.68e-14. CDD calls a P22 coat protein. This could indicate that it is slightly conserved.
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: Based on the evidence you have collected, I agree with your location and function calls. I would make sure to add HHPRED evidence to your function call notes. Otherwise looks great! CDS 7831 - 8238 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="Berrie_8" /note=Original Glimmer call @bp 7888 has strength 10.19; Genemark calls start at 7888 /note=SSC: 7831-8238 CP: yes SCS: both-cs ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Janeemi]],,NCBI, q1:s1 98.5185% 3.50137E-82 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.255, -2.0162541296952132, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Janeemi]],,UVK63529,90.9091,3.50137E-82 SIF-HHPRED: 15 PROTEIN; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_C,80.7407,99.1 SIF-Syn: Head-to-tail adaptor. Phage Berrie shows synteny with phage Elezi, and both phages are members of cluster AZ. Berrie gene 7 (pham 57253) is upstream of Berrie gene 8, which matches with gene 7 (major capsid protein) of phage Elezi. Berrie gene 8 (pham 76440) corresponds with Elezi gene 8 (head-to-tail adaptor). Downstream is Berrie gene 9 (pham 98359), which matches with Elezi gene 9 (pham 98359). /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer and Genemark both call the start site at 7888 bp. /note=Coding Potential: Coding potential covers the expected ORF only in a forward reading frame in Self-trained Genemark and Host-trained Genemark. This gene is likely a forward gene. /note=SD (Final) Score: For the start site at 7888, the final score is -4.349, and the z-score is 2.105.
For the start site at 7831, the final score is -2.016, and the z-score is 3.255. /note=Gap/overlap: The gap size associated with the start site at 7888 is 130 bp. The gap size corresponding to the start site at 7831 is 73 bp, which is the smallest gap size. There were no notable coding potentials found within these possible gap sizes, and the gap is conserved in AZ cluster phages Elezi and DrManhattan. /note=Phamerator: As of January 23, 2022, the pham number is 76440. This pham is conserved in other AZ cluster phages, including phage Adolin, Amyev, and Crewmate. Genes in this pham either have the function of head-to-tail adaptor or no function noted. /note=Starterator: As of January 23, 2022, pham 76440 has 32 non-draft genes. The auto-annotated start site at 7888 has no previous manual annotations. Start number 3 has 30 manual annotations and is also a candidate start for this gene. Start number 3 corresponds to a start site at 7831 bp. /note=Location call: This gene is likely a real gene with a start site at 7831 bp. /note=Function call: Head-to-tail adaptor. PhagesDB yielded multiple hits corresponding to e-values less than 7e-07. Multiple hits observed on NCBI BLASTp corresponding to head-to-tail adaptor function calls with e-values less than 1e-39. No hits yielded by CDD. HHpred yielded a HK97 Gp6 hit corresponding to a probability of 97.8, e-value of 0.001, and coverage of 78.52%. HHpred also yielded a SPP1 15 hit with a probability of 99.1, coverage of 80.74%, and e-value of 1.2e-8. Both HHpred alignments are required for a head-to-tail adaptor function call. /note=Transmembrane domains: No TMDs were predicted by TMHMM or TOPCONS. This gene does not encode a membrane protein. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: Great detailed work! 
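The gap and overlap values quoted in the records above follow directly from the gene coordinates. The sketch below is only an illustration, assuming the convention that appears to be used in these notes (gap = start minus upstream stop minus 1, with a negative value meaning an overlap); the function name gap_bp is hypothetical and is not part of PECAAN or any other tool named here.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Signed distance in bp between an upstream stop and a start.

    Positive values are gaps; negative values are overlaps.
    Assumed convention: start - upstream_stop - 1.
    """
    return start - upstream_stop - 1


# (start, stop) coordinates for Berrie genes 1-8, copied from the CDS lines above.
genes = [
    (83, 535), (532, 2238), (2262, 3638), (3657, 5711),
    (5770, 6129), (6247, 6786), (6813, 7757), (7831, 8238),
]

# Gap between each gene and its upstream neighbour.
gaps = [gap_bp(prev[1], cur[0]) for prev, cur in zip(genes, genes[1:])]

# The rejected auto-annotated start for gene 8 (7888) leaves a larger gap.
gap_alternative_start = gap_bp(7757, 7888)
```

Under this convention the computed values agree with the GAP fields in the records (for example -4 bp for gene 2 and 73 bp for gene 8), and the alternative start at 7888 gives the 130 bp gap mentioned in the gene 8 notes.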
CDS 8249 - 8356 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="Berrie_9" /note=Genemark calls start at 8249 /note=SSC: 8249-8356 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQD80_gp09 [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 100.0% 2.38954E-14 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.116, -4.387988835334809, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD80_gp09 [Arthrobacter phage Lizalica] ],,YP_010677574,100.0,2.38954E-14 SIF-HHPRED: SIF-Syn: NKF, upstream gene is head-to-tail adaptor, downstream gene is head-to-tail stopper, just like in phage Adumb2043 /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Genemark. It calls the start site at 8249. Start codon is ATG which is a common codon. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both Genemark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -4.388. This is the best and only final score on PECAAN. /note=Gap/overlap: 10 bp. This is a reasonable sized gap as it is quite small. /note=Phamerator: Pham 98359. Date 1/24/22. It is conserved and found in Lego (AZ). /note=Starterator: Start site 1 in Starterator was manually annotated in 18/18 non-draft genes in this pham. Start 1 is 8249 in Berrie. This evidence agrees with the start site predicted by GeneMark. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 8249. /note=Function call: NKF. phagesDB and NCBI BLAST show no phage hits with known function. HHPRED only shows phage hits with large e-values. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with the location and function call based on the evidence provided.
It would help if you specified what frame shows the coding potential you mention; what does Glimmer having no result say about the start site; include the Z-score in the final score category as well; what phages have this pham as well; I do not think you have to include evidence that does not point to a specific function. CDS 8356 - 8703 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="Berrie_10" /note=Original Glimmer call @bp 8356 has strength 13.77; Genemark calls start at 8356 /note=SSC: 8356-8703 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 3.32016E-69 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.866, -2.9063687850157054, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Janeemi]],,UVK63531,95.6522,3.32016E-69 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,96.5217,99.7 SIF-Syn: Synteny is present in the pham with phage DrSierra. Both the upstream and downstream gene were conserved. upstream gene is in pham 98136 with NKF and downstream is in pham 98359 with NKF /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 8356. /note=Coding Potential: The start site covers the entire coding potential for this gene in the forward direction; therefore, the gene is a forward gene. /note=SD (Final) Score: -2.906. The best final score for this gene on PECAAN with the best Z-score of 2.866. /note=Gap/overlap: Overlap of 1 bp (gap of -1), which is indicative of an operon. This is the best and most reasonable gap/overlap for this gene. /note=Phamerator: As of 1/24/2022, this gene belongs to pham 97801, which has 244 members, 220 of which are non draft genes, and 35 of which belong to cluster AZ.
/note=Starterator: The most annotated start for this pham is 33, which is called by 111 non-draft phages. This gene does not call this start; the start called for this gene is 30, at 8356. /note=Location call: Based on the gathered evidence, this gene's start site is at 8356. /note=Function call: Head-to-tail stopper. PhagesDB has multiple hits for head-to-tail stopper, with the top two being >9e-55, with similar results in NCBI BLAST, having greater than 80% identity and lower than e-66 e-values. No results on CDD. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with all calls. Please include the start codon in Auto-Annotation. HHPred was not mentioned in function call notes or checked as evidence. Please fill out the Synteny box! CDS 8715 - 9020 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="Berrie_11" /note=Original Glimmer call @bp 8715 has strength 10.23; Genemark calls start at 8715 /note=SSC: 8715-9020 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TUCK_12 [Arthrobacter phage Tuck]],,NCBI, q1:s1 100.0% 4.75308E-60 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.255, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TUCK_12 [Arthrobacter phage Tuck]],,WAB10787,99.0099,4.75308E-60 SIF-HHPRED: SIF-Syn: NKF, upstream gene is head-to-tail stopper, downstream gene is major tail protein, just like in phage Adolin. /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used and both agreed on the same start site. The called start site is 8715. /note=Coding Potential: The gene contains reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. 
/note=SD (Final) Score: The SD score of -2.034 is the best option and the z-score is the highest at 3.255. /note=Gap/overlap: The gap with the upstream gene is very reasonable at a 11 bp gap. The length of the gene (306 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of January 20, 2022, the gene is found in pham 96954. The gene is conserved in Phage Adolin, Adumb2043, Amyev, and Asa16 which all belong to the same cluster (AZ) as Phage Berrie. The phages used for comparison were Phage Adolin, Adumb2043, Amyev, and Asa16. There was no function call for this gene. /note=Starterator: Start site 26 was manually annotated in 26/56 non-draft genes in this pham. Start 26 is 8715 in Berrie. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The gathered evidence suggests that the original start site call at 8715 by Glimmer and Genemark is reasonable and it is most likely the potential start site. In addition, it also suggests that the gene is a real gene. /note=Function call: PhagesDB BLAST and NCBI BLASTp have multiple hits with small e-values and no known function. PhagesDB BLAST gave hits with e-values of e–43, while NCBI BLASTp gave e-values of e-52 and e-53. The top NCBI BLASTp and PhagesDB BLAST hits sorted by e-values show high identity values (>86%) and >99% query coverage. HHpred and CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
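The gap and length figures quoted in these notes can be reproduced from the CDS coordinates alone. The sketch below is a minimal illustration, not PECAAN's own code: the helper names `gap_bp` and `gene_length_bp` are hypothetical, and the gap formula (start minus previous stop minus one) is an assumption inferred from the coordinates and GAP values in the records above.

```python
# Minimal sketch: reproduce the "GAP" and gene-length values from CDS coordinates.
# Assumption: gap = start - previous_stop - 1, inferred from the records above
# (e.g. gene 11 starts at 8715 after gene 10 stops at 8703, reported as an 11 bp gap).

def gap_bp(prev_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - prev_stop - 1


def gene_length_bp(start: int, stop: int) -> int:
    """Inclusive length of a forward gene in base pairs."""
    return stop - start + 1


# (gene number, start, stop) copied from the CDS lines for genes 10 to 13.
genes = [(10, 8356, 8703), (11, 8715, 9020), (12, 9020, 9430), (13, 9450, 10001)]

for (_, _, prev_stop), (number, start, stop) in zip(genes, genes[1:]):
    print(f"gene {number}: gap {gap_bp(prev_stop, start)} bp, "
          f"length {gene_length_bp(start, stop)} bp")
```

Under this assumption, gene 11 comes out at an 11 bp gap and 306 bp length, and gene 12 at a gap of -1, matching the values recorded in the notes.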
CDS 9020 - 9430 /gene="12" /product="gp12" /function="tail terminator" /locus tag="Berrie_12" /note=Original Glimmer call @bp 9020 has strength 6.75; Genemark calls start at 9020 /note=SSC: 9020-9430 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 2.33421E-89 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.701, -3.1085431521568267, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Iter] ],,URQ05000,98.5294,2.33421E-89 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,94.1176,99.3 SIF-Syn: tail terminator, upstream pham is 98136, downstream is major tail protein, just like in phage Elezi and DrSierra /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer and GeneMark agree on the 9020 start site. /note=Coding Potential: Good coding potential in the 2nd frame; start 9020 encompasses all the coding potential. The start codon is GTG and this start gives the LORF. /note=SD (Final) Score: Final score of -3.109 and Z-score of 2.701, both of which are the best available options. /note=Gap/overlap: -1 bp gap, indicative of an operon. /note=Phamerator: Pham 2023 as of 1/20/22. Gene is conserved in many other phages in cluster AZ, such as Adolin, as well as EH. /note=Starterator: The most annotated start site is number 2, found in 33/40 genes in the pham and called 100% of the time when present. Berrie also has this start site, corresponding to start 9020. /note=Location call: Likely a real gene with start site 9020, as evidenced by the above information. /note=Function call: PhagesDB BLAST had several significant hits with low e-values for tail terminator, HHPRED had a significant hit for tail terminator (99.3% probability, 94.1% coverage, e-value of 6.4e-10), and NCBI BLAST had a few significant hits for tail terminator (94.9% identity, 94.9% aligned, 100% coverage, 2.25e-87). No hits on CDD. 
Thus, going with tail terminator for the gene function. /note=Transmembrane domains: No hits on TmHmm nor Topcons. /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: i agree with all the calls. I would probably just talk a little more about phage comparison in this cluster inside the synteny box. Good job. CDS 9450 - 10001 /gene="13" /product="gp13" /function="major tail protein" /locus tag="Berrie_13" /note=Original Glimmer call @bp 9450 has strength 15.24; Genemark calls start at 9450 /note=SSC: 9450-10001 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 6.79017E-125 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.179, -2.7645180226256008, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage London] ],,QOP64316,98.3607,6.79017E-125 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_M,92.3497,98.6 SIF-Syn: This is a major tail protein. Upstream is a tail terminator and downstream is a tail assembly chaperone, just like phage Amyev from cluster AZ. /note=Primary Annotator Name: Lee, Adrienne /note=Auto-annotation: Start site at 9450 is called by both Glimmer and GeneMark. /note=Coding Potential: There is coding potential throughout the whole ORF in both GeneMark host and self. /note=SD (Final) Score: The final score is -2.765 which is the lowest score out of all the gene candidates. /note=Gap/overlap: There is a 19 base pair gap which is reasonable since it is small. There is also no coding potential in this gap. /note=Phamerator: This gene is part of Pham 88338 as of January 20, 2022. This pham has 137 members in total including Adolin and Amyev. /note=Starterator: The most conserved start site is 18 at 9450 for Berrie. This start site is called for 75/114 non-draft genes in this pham. 
/note=Location call: Based on the evidence, this is a real gene and the start site is 9450, which was called by GeneMark. This start site was also determined by Starterator and conserved in majority of non-draft phages in the pham. /note=Function call: Major tail protein: According to PhagesDB Blast, this gene is a major tail protein in many other phages within cluster AZ. For example, this gene is a major tail protein in Amyev with an e-value of 1e-96. HHPRED also has evidence that this is a major tail protein based on hit 6XGR_M with a coverage of 92.3497 and e-value of 5.3e-6. NCBI Blast also gives many phages with the same functional call such as Asa16 with an identity of 95.6284, alignment of 98.3607, coverage of 100 and e-value of 4.95337e-125. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: Agree with this call! CDS 10094 - 10366 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="Berrie_14" /note=Original Glimmer call @bp 10094 has strength 16.75; Genemark calls start at 10094 /note=SSC: 10094-10366 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Tuck]],,NCBI, q1:s1 100.0% 1.13117E-53 GAP: 92 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -2.0720764396375664, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Tuck]],,WAB10790,96.6667,1.13117E-53 SIF-HHPRED: SIF-Syn: Tail assembly chaperone, upstream gene is major tail protein, downstream is tape measure protein, overlapping is another tail assembly chaperone, just like in phage Lego /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: Glimmer and GeneMark agree on start 10094 with start codon ATG /note=Coding Potential: There is one forward ORF with good coding potential shown on host and self trained genemark and the start site covers all coding 
potential /note=SD (Final) Score: Final score is -2.072 which is less negative than the other start. Z score is 3.266 which is good. /note=Gap/overlap: There is a 605 bp overlap which is concerning, but also shows synteny with Kaylissa and Lego. This gene is most likely a tail assembly chaperone which is known to be a fusion of a short gene and a long gene so this overlap makes sense. Length of the gene is 273 which is acceptable. /note=Phamerator: 01/22/22 pham 98425. 33/41 genes in the pham are also in cluster AZ such as Lego and Kaylissa. All non draft genes in pham called for tail assembly chaperone which is on the approved functions list. /note=Starterator: The most annotated start site was start 5 which was called in 31/32 non draft genes in the pham. Berrie also calls this start 5 at 10094. /note=Location call: the evidence above suggests that this is a real gene with good coding potential and is conserved in phamerator and starterator. /note=Function call: Tail assembly chaperone. There are no good HHpred hits and there were no CDD hits. NCBI blast had good hits for tail assembly chaperone with low e value (1.7e-53), high % identity (95.6%), and high query coverage (100%). /note=Transmembrane domains: there are no TmHmm hits so cannot check topcons. this is not a membrane protein. /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: Great work! I agree with your location and function call. please include evidence of strong hits in phagesdb blast. 
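The fusion described above is encoded in the gene 15 record as `join(10094..10360,10360..10698)`, where base 10360 appears in both segments. The sketch below is a minimal illustration that parses such a location string and checks that the joined length is a whole number of codons. The function names are hypothetical, and reading the repeated base as a -1 shift is an interpretation of the coordinates, not something the record states.

```python
import re
from typing import List, Tuple

# Minimal sketch: parse a GenBank-style join(...) location and check its reading frame.
# Function names are illustrative, not part of any annotation tool.


def parse_join(location: str) -> List[Tuple[int, int]]:
    """Return (start, stop) pairs for each segment in a join(...) string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def joined_length(segments: List[Tuple[int, int]]) -> int:
    """Total translated length in bases; each segment is counted inclusively."""
    return sum(stop - start + 1 for start, stop in segments)


segments = parse_join("join(10094..10360,10360..10698)")
total = joined_length(segments)

# Second segment restarts on the last base of the first: a shift of -1.
shift = segments[1][0] - segments[0][1] - 1

print(segments)              # [(10094, 10360), (10360, 10698)]
print(total, total % 3)      # 606 0
print(shift)                 # -1
```

Because base 10360 is counted twice, the joined length is 606 bases (202 codons including the stop), one more than the 605 bp span from 10094 to 10698.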
CDS join(10094..10360,10360..10698) /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="Berrie_15" /note= /note=SSC: 10094-10698 CP: no SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage Tuck]],,NCBI, q1:s1 99.5025% 3.53994E-136 GAP: -273 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -2.0720764396375664, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Tuck]],,WAB10844,97.5124,3.53994E-136 SIF-HHPRED: SIF-Syn: /note=manually annotated using other AZ phages as example - AF CDS 10711 - 13188 /gene="16" /product="gp16" /function="tape measure protein" /locus tag="Berrie_16" /note=Original Glimmer call @bp 10711 has strength 11.68; Genemark calls start at 10711 /note=SSC: 10711-13188 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage London] ],,NCBI, q1:s1 99.8788% 0.0 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -2.0720764396375664, yes F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage London] ],,QOP64319,87.0801,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,91.2727,99.7 SIF-Syn: Berrie displays synteny with Amyev. The downstream gene is in pham 98425 and is a tail assembly chaperone, while the upstream gene is in pham 95993 and is a minor tail protein. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 10711. /note=Coding Potential: Coding Potential was found on the forward strand in both Host- and Self- Trained GeneMark, indicating that this is a forward gene. The auto-annotated start site and stop site perfectly correlates to the coding potential. /note=SD (Final) Score: -2.072. This is the best final score out of the potential start sites. /note=Gap/overlap: 344 bp gap. 
While this is large, it is by far the best gap out of the potential start sites. /note=Phamerator: As of 1/23/22, this gene is in pham 95151, which has 87 members, 8 of which are cluster AZ phage genes. /note=Starterator: The start number called the most often in the published annotations is 3, it was called in 72 of the 79 non-draft genes in the pham. However, this gene does not have that start site. Start Site 9 was called for this gene, and is found in 8 of 87 ( 9.2% ) of genes in pham, with manual Annotations of this start being 3/79 and called 100.0% of time when present. Start Site 9 corresponds to 10711 bp, the site called by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with start site at 10711. This start site was called by both Glimmer and GeneMark, encompasses all the coding potential, has the best final score and Z-score, the most reasonable gap, and was validated by Starterator. /note=Function call: Tape Measure Protein. Elezi, Asa16, Eraser were all checked as evidence under PhagesDB BLAST because of their low e-values. The above phages, as well as Kaylissa and Amyev and other AZ phages that fell under the same accession number were checked as evidence under NCBI BLASTp for their low e-values. CDD had hits that came up as tape measure proteins but they were not strong enough to be checked as evidence. HHpred also predicted that it was a tape measure protein, but the hits given, although they had high probability and low e-values, had low coverage and so were not checked as evidence. /note=Transmembrane domains: 12 TMDs were predicted by TMHMM, but TOPCONS did not return any results. /note=Secondary Annotator Name: Erfanian, Kiana /note=Secondary Annotator QC: Everything looks good, nice notes! 
CDS 13181 - 14056 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="Berrie_17" /note=Original Glimmer call @bp 13181 has strength 11.69; Genemark calls start at 13181 /note=SSC: 13181-14056 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.866, -2.7653702713535186, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London] ],,QOP64320,95.5326,0.0 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,98.9691,100.0 SIF-Syn: Berrie shares synteny with other genes in cluster AZ. For instance, Berrie and, Amyev, Asa16, Cassia, and Adumb2043 share genes 17, and downstream genes 18, 19, and 20, which all have a minor tail protein function. Berrie and Elezi also share upstream genes 3 to 15, genes 18 to 24, and genes 25 to 34. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and GeneMark. Both start at 13181. It has a Glimmer score of 11.69. /note=Coding Potential: Coding potential in this ORF is found in the direct/forward sequence, thus this is a forward gene. Coding potential is found in GeneMark Host and Self. /note=SD (Final) Score: -2.765. It is the best/least negative final score on PECAAN for this gene. This start site (@13181) minimizes the gap size (-8), unlike the other start sites with larger gaps. /note=Gap/overlap: -8. Very small overlap compared to the other start sites, meaning there is minimal space for another gene in between the previous gene and this one. /note=Phamerator: As of 01/21/22, this gene is found in Pham 95993 and is conserved in other members of cluster AZ, including Adumb2043 and Amyev. /note=Starterator: The “Most Annotated” start site (3) is present in 25 of 28 non-draft genes in this pham, and is present in Berrie. 
This start site corresponds to base pair position 13181, the auto-annotated start site. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 13181. /note=Function call: The top hits for Phagesdb BLAST were all minor tail function genes with e-values of <1e-149. HHPRED shows hits for distal tail protein genes with high % coverage (>96.9072), high probability (100), and low e-values (<2.2e-25). The top results for NCBI BLAST were also minor tail proteins, the highest hit having a % identity of 90.0344, a % alignment of 95.5326, and 100% coverage with an e-value of 0. Another NCBI BLAST hit for minor tail protein had a % identity of 81.0997, a % alignment of 88.6598, and 100% coverage with a very low e-value (4.05829e-170). These results highly suggest that the function of this gene is a minor tail protein. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I agree with the primary annotation that the start is at 13181 and the function is minor tail protein. 
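The machine-generated summary note in each record packs the start, stop, gap, and RBS scores into one string, and the annotators' notes quote the same values rounded to three decimals. The sketch below is a minimal illustration of pulling those fields out with regular expressions, using the gene 17 summary as input. The function name `parse_summary` and the dictionary keys are hypothetical, and reading the third and fourth RBS fields as Z-score and final score is an assumption based on how the notes above quote them.

```python
import re

# Minimal sketch: extract coordinates, gap and RBS scores from a summary note.
# Assumption: in "RBS: a, b, c, d, yes" the third field is the Z-score and the
# fourth is the final score, as the annotators' notes suggest.

SUMMARY = (
    "SSC: 13181-14056 CP: yes SCS: both ST: SS "
    "BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, "
    "q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes "
    "RBS: Kibler 6, Karlin Medium, 2.866, -2.7653702713535186, yes "
    "F: minor tail protein"
)


def parse_summary(note: str) -> dict:
    """Return start, stop, gap, Z-score and final score found in a summary note."""
    ssc = re.search(r"SSC:\s*(\d+)-(\d+)", note)
    gap = re.search(r"GAP:\s*(-?\d+)\s*bp gap", note)
    rbs = re.search(r"RBS:\s*[^,]+,\s*[^,]+,\s*(-?[\d.]+),\s*(-?[\d.]+),", note)
    if not (ssc and gap and rbs):
        raise ValueError("summary note is missing an SSC, GAP or RBS field")
    return {
        "start": int(ssc.group(1)),
        "stop": int(ssc.group(2)),
        "gap": int(gap.group(1)),
        "z_score": float(rbs.group(1)),
        "final_score": float(rbs.group(2)),
    }


fields = parse_summary(SUMMARY)
print(fields["start"], fields["stop"], fields["gap"])
print(fields["z_score"], round(fields["final_score"], 3))
```

Rounding the parsed final score to three decimals gives the -2.765 quoted in the SD (Final) Score note for this gene.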
CDS 14069 - 15064 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="Berrie_18" /note=Original Glimmer call @bp 14069 has strength 7.84; Genemark calls start at 14069 /note=SSC: 14069-15064 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Tuck]],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -2.0720764396375664, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Tuck]],,WAB10793,98.4894,0.0 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,54.9849,99.4 SIF-Syn: Minor tail protein, upstream gene is minor tail protein, downstream gene is minor tail protein, just like in phage Phives. /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 14069. The codon for this start site is ATG. /note=Coding Potential: There is reasonable coding potential covered by both the Host-Trained and the Self-Trained GM. This start site covers all of this coding potential. /note=SD (Final) Score: The Final Score for the start site is -2.072, which is the best final score among all of the possible candidates. The Z-score for the start site is 3.266, which is the highest among all of the possible start site candidates. /note=Gap/overlap: There is a reasonable gap of 12 bp between the start of this gene and the end of the upstream gene. All other start site candidates increase the gap. /note=Phamerator: As of January 24, 2022, this gene is found in pham 15182. This pham is conserved in other phages in the AZ Cluster, such as phage Phives and phage Amyev. The most common function called for genes in this pham is minor tail protein. /note=Starterator: A start site is conserved among members of this pham. 
The most-conserved start site is start number 4 and it corresponds to position 14069, which is the auto-annotated start site in the phage. 19 of 19 non-draft genes call this start. /note=Location call: This is a real gene with a start site of 14069. Glimmer and GeneMark agree on the auto-annotated start site. The start site has the best Final Score and Z-score among all other candidates. Starterator also shows that this start is conserved among genes in this pham. Also, there is reasonable coding potential in the open reading frame for both the Host-Trained GeneMark and the Self-Trained GeneMark. /note=Function call: The function of this gene is minor tail protein. The top hit in NCBI BLAST names the function as minor tail protein and it has a 92.4242% identity, 96.9697% alignment, and 100% coverage. It also has an e-value of 0. The top hit in PhagesDB BLAST has an e-value of 1e-178. This function was also conserved in other genes in the same pham, according to Phamerator. CDD did not have any hits. The top hit in HHpred has a probability of 99.4%, 54.9849% coverage, and e-value of 3.1e-11, but it lists the function as a Receptor Binding Protein, which is not on the approved functions list. Also, this conflicts with the evidence seen in NCBI BLAST, PhagesDB BLAST, and Phamerator. /note=Transmembrane domains: There are no transmembrane domains called by either TOPCONS or TmHmm. This is not a transmembrane protein. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks great! I agree with your function call as well. I do not think the HHPRED hit was strong enough to change the function since all other evidence points to minor tail protein. 
CDS 15065 - 16186 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Berrie_19" /note=Original Glimmer call @bp 15065 has strength 11.24; Genemark calls start at 15065 /note=SSC: 15065-16186 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.866, -3.1164791313608173, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London] ],,QOP64322,95.4424,0.0 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,93.0295,99.7 SIF-Syn: This gene shows perfect synteny in this region with AZ phages Asa16 and Eraser. As of 1/24/22, pham 15182 (minor tail protein) is upstream, pham 96077 (minor tail protein) downstream. This makes sense because all of these genes share a function. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: GeneMark and Glimmer call start site @ 15065, codon ATG. /note=Coding Potential: There is coding potential present in the second ORF (Host and Self-Trained GeneMark), and the selected start codon does include all the coding potential. There is also some coding potential in the fourth ORF, but it is too short to be a gene, and thus should be ignored. /note=SD (Final) Score: This Start Site does have the best Final score (-3.116) as well as the best Z-score (2.866). /note=Gap/overlap: There is no gap (0 bp). The gene is extremely long at 1122 bp, but it appears to be well-supported. /note=Phamerator: Pham 57436 on 1/24/22. Common in cluster AZ, present in some singletons, minor tail protein. /note=Starterator: Most annotated start for this pham (4) is called in this phage. 18/20 non-draft phages call site 4, which corresponds to 15065 in Berrie. 
/note=Location call: Based on the good coding potential, synteny with other AZ phages, and the number of hits in Phagesdb Blast, this is a real gene with start site 15065. /note=Function call: Top three hits in NCBI Blast are for minor tail proteins (ID>90%, Coverage=100%, low e-value). Top three hits for HHPred are minor tail proteins (prob>99%, Coverage>92%). Other phages in Cluster AZ (Asa16, Eraser, London) with this gene call as minor tail protein. /note=Transmembrane domains: No transmembrane domains present according to TmHmm. /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 16192 - 19419 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="Berrie_20" /note=Original Glimmer call @bp 16192 has strength 8.18; Genemark calls start at 16192 /note=SSC: 16192-19419 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Asa16]],,NCBI, q334:s212 69.0233% 0.0 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.841, -3.4678233629445905, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Asa16]],,UAJ15381,70.3704,0.0 SIF-HHPRED: Endo-1,4-beta-xylanase C; Binding Site, Carbohydrates, Enzyme Stability, Substrate Specificity, Endo-1, 4-beta-xylanase, Xylan-binding domain, Thermophilic enzymes, Thermostabilizing Domains, sugar; HET: CA, GOL; 2.43A {Paenibacillus barcinonensis},,,4XUP_A,44.9302,97.5 SIF-Syn: Minor tail protein, the gene is of pham 96077. Upstream is a gene of pham 57436 and downstream is a gene of pham 55453, just like in phage Asa16 of the AZ cluster. /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: 16,192 is selected as the start site by both Glimmer and GeneMark. /note=Coding Potential: There is reasonable coding potential within the putative ORF. /note=SD (Final) Score: -3.468. 
This is the best final score of all the potential start sites, and the z-score is the best as well (2.841). /note=Gap/overlap: 5, a reasonable gap. /note=Phamerator: As of 1/23/2022, this gene is in pham 96077. There are 25 other phages that have this gene, all of which are from the AZ cluster. The function of this pham has been called to be a minor tail protein. /note=Starterator: Start site 1 is called 100% of the time when present, in 18 of 18 non-draft genes. This start site corresponds to 16,192. /note=Location call: Based upon the evidence above, this is a real gene with the start site of 16,192. This is the LORF. /note=Function call: Minor tail protein. BLASTp returned many hits with low e-values indicating a strong match to minor tail proteins of other phages within the AZ cluster. For example, from the NCBI BLASTp results, there is a 64% identity match with 70% coverage to the minor tail protein gene from phage Asa16, with the returned e-value 0. CDD and HHPRED did not return informative hits. /note=Transmembrane domains: There are no predicted TMDs in TOPCONS or TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC`ed this location call and agree with the primary annotator. 
CDS 19430 - 19765 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="Berrie_21" /note=Original Glimmer call @bp 19430 has strength 9.86; Genemark calls start at 19430 /note=SSC: 19430-19765 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TUCK_22 [Arthrobacter phage Tuck]],,NCBI, q1:s1 100.0% 9.70338E-71 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.266, -1.9310779259753799, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TUCK_22 [Arthrobacter phage Tuck]],,WAB10796,100.0,9.70338E-71 SIF-HHPRED: SIF-Syn: NKF (from pham 55453), upstream gene is from pham 96077, downstream is from pham 55522, just like in phage Asa16 /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 19430, with start codon ATG. /note=Coding Potential: The gene has reasonable coding potential, and the chosen start site covers all of it. /note=SD (Final) Score: -1.931. It is the best final score. /note=Gap/overlap: 10 bp. It is the shortest gap; other start sites have gaps that are much larger. /note=Phamerator: Pham 55453 as of 1/24/2022. The gene is conserved in other phages in the AZ cluster; Asa16 was used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 11, and it corresponds to 19430 in Berrie. 16/23 of the final genes called site 11. /note=Location call: Real gene, with the start at 19430. /note=Function call: NKF. No BLAST search returned hits with known function, but the gene matched proteins of unknown function in other AZ cluster phages. /note=Transmembrane domains: No hits for transmembrane domains. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC'ed the location and functional calls and agree with the first annotator. 
CDS 19835 - 20302 /gene="22" /product="gp22" /function="membrane protein" /locus tag="Berrie_22" /note=Original Glimmer call @bp 19835 has strength 7.63; Genemark calls start at 19835 /note=SSC: 19835-20302 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Phives] ],,NCBI, q1:s1 100.0% 7.27499E-80 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.744, -3.6716023243868, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Phives] ],,YP_010677657,95.4839,7.27499E-80 SIF-HHPRED: SIF-Syn: The upstream gene is in pham 55453, this gene is in pham 55522, and the downstream gene is in pham 21631, just like in phage Asa16. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and GeneMark call this gene with a start site at 19835 bp. /note=Coding Potential: There is coding potential on GeneMark and GeneMarkS between 19850 and 20300 bp. The start site covers the entire coding potential. /note=SD (Final) Score: The final score is -3.672; this is the best final score. /note=Gap/overlap: There is a gap of 69 bp in front of this gene. This is a reasonable gap and this same gap exists in other AZ phages such as Asa16. /note=Phamerator: This gene is in pham 55522 as of 1/22/2022; there were 30 members of this pham, including many AZ phages. /note=Starterator: This gene has start 8, the most annotated start site, which is found in 17 out of 29 genes. The most annotated start is 19835 bp in Berrie. Starterator agrees with Glimmer and GeneMark. /note=Location call: Based on the collected evidence, this is a real gene with a start site at 19835 bp. /note=Function call: NKF/membrane protein. There are multiple hits on PhagesDB BLAST with e-values of <6e-78, suggesting that this is a real gene but of unknown function. NCBI BLAST had hits with e-values <5.3e-80, % identities >88, and coverages of 100% with a function of membrane protein. HHPRED and CDD were uninformative. /note=Transmembrane domains: Yes. 
TmHmm predicts 4 transmembrane domains. TOPCONS also predicts 4 transmembrane domains. This gene likely encodes a membrane protein. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and Gene mark at the same start site 6813 along with the same start codon at ATG. This is a very common start codon so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -2.507 which is the best score listed. The z-score of 3.158 is greater than 2 and is the best z-score listed. The next best z-score has an unreasonable gap of 77. This auto annotated start site has the best gap listed. /note=Gap/overlap: There is a reasonable gap of 26bp which is the smallest gap listed. This is a reasonable gap and it is unlikely for a new gene to be found in this gap. /note=Phamerator: The pham number as of 1/24/2022 is 57253, this pham was also found to contain a wide range of clusters with a total of 230 members /note=Starterator: The start number is 7 which corresponds to a 6813 start site. It does have the most annotated start site which agrees with glimmer and genemark /note=Location call: Based on the data it appears that this is a real gene with a 7757 stop site and 6813 start site. /note=Function call: Based on the phages Phives and Kaylissa, it is safe to conclude that the function of this gene is major capsid protein. The top two BLAST hits on PhagesDB both have E-values of 1e-160 and 1e-159; additionally, they both have high scores of 563 and 558 respectively, both of which have a major capsid protein listed (both have high probability, 99.9% for both, both have high coverage, 92.9936% and 89.4904% for both, both have high identity (91.772% and 90.7936%) and e-values of 0. 
CDD contains one hit with very low identity (~12%), 68% coverage, and an e-value of 2.68e-14. CDD calls a P22 coat protein. This could indicate that it is slightly conserved. /note=Transmembrane domains: Neither TMHMM or TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: I have QC`d the following gene and agree with the primary annotator CDS 20305 - 20736 /gene="23" /product="gp23" /function="membrane protein" /locus tag="Berrie_23" /note=Original Glimmer call @bp 20293 has strength 12.25; Genemark calls start at 20305 /note=SSC: 20305-20736 CP: yes SCS: both-gm ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Phives] ],,NCBI, q1:s1 100.0% 2.32746E-92 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.431, -4.021779273157878, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Phives] ],,YP_010677658,97.2028,2.32746E-92 SIF-HHPRED: SIF-Syn: /note=1 TMD in deep TMHMM -AF CDS 20720 - 20965 /gene="24" /product="gp24" /function="membrane protein" /locus tag="Berrie_24" /note=Original Glimmer call @bp 20720 has strength 12.23; Genemark calls start at 20720 /note=SSC: 20720-20965 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Phives] ],,NCBI, q1:s1 98.7654% 1.39696E-37 GAP: -17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.158, -2.297359520375101, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Phives] ],,YP_010677659,92.5,1.39696E-37 SIF-HHPRED: SIF-Syn: Membrane Protein. Upstream is a gene of pham 21631 and downstream is a deoxynucleoside monophosphate kinase of pham 54294 which is the same in phage DrSierra. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 20720 with a start codon of ATG. 
/note=Coding Potential: There is high coding potential based on the middle frame going in the forward direction within the gene range for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The Final Score is -2.297, and the Z-score is 3.158. The Z-score is the best score out of the options, however, the Z-score is not the best among the options, but it is still appropriate. /note=Gap/overlap: There is a 17 bp overlap which is reasonable because it is all going on the forward strand. This start site produces the longest ORF of 246 bp which is acceptable because it is consistent with the idea that the genes must be densely packed. /note=Phamerator: Pham: 10993. Date Analyzed: 01/21/2022. The gene is conserved in cluster AZ and found in phages Powerpuff, Tallboy, and Cassia. /note=Starterator: Start site 6 is called in 26 out of 32 of the non-draft genes in this pham. 19 of the 24 members of pham 10993 were manually annotated for this start site. Start site 6 is the most annotated site and correlates to 20720 bp in phage Berrie. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 20720. /note=Function call: Membrane Protein. 2 out of 3 top NCBI BLAST hits also have the membrane protein function. (> 78% coverage, 68%+ identity, and E-value <10^-31). /note=Transmembrane domains: TmHmm predicted 2 TMHs, suggesting that this gene encodes for membrane proteins. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. There is a typo under final score category. You mentioned Z-score twice, but also should note that both your Z-score and Final scores are the best options. 
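The ORF lengths quoted in these notes (for example the 246 bp ORF for the 20720 start) follow directly from the feature coordinates. The sketch below is only an illustration, assuming inclusive 1-based coordinates; the helper name `orf_length_bp` is hypothetical and is not part of PECAAN or DNA Master.

```python
def orf_length_bp(start: int, stop: int) -> int:
    """Length in bp of a feature given inclusive 1-based start and stop coordinates."""
    return stop - start + 1


def is_whole_codons(start: int, stop: int) -> bool:
    """True when the feature length is a multiple of 3, as expected for a CDS."""
    return orf_length_bp(start, stop) % 3 == 0


# Coordinates taken from the CDS lines in this file.
print(orf_length_bp(20720, 20965))    # gene 24 -> 246
print(orf_length_bp(22583, 23416))    # gene 27 -> 834
print(orf_length_bp(23413, 23502))    # gene 28 -> 90
print(is_whole_codons(20720, 20965))  # True
```

A length that is not a multiple of 3 would point to a miscopied start or stop coordinate.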
CDS 21080 - 21694 /gene="25" /product="gp25" /function="deoxynucleoside monophosphate kinase" /locus tag="Berrie_25" /note=Original Glimmer call @bp 21080 has strength 19.37; Genemark calls start at 21080 /note=SSC: 21080-21694 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 100.0% 1.23306E-136 GAP: 114 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -1.9310779259753799, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage Powerpuff] ],,QGZ17322,96.5686,1.23306E-136 SIF-HHPRED: c.37.1.1 (A:) Deoxynucleoside monophosphate kinase {Bacteriophage T4 [TaxId: 10665]},,,d1deka_,93.1373,99.9 SIF-Syn: deoxynucleoside monophosphate kinase; upstream is membrane protein, downstream is NKF. This matches phage Elezi, except that in Elezi the upstream gene is an endolysin. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer and GeneMark both call the start at 21080. The start codon is ATG. /note=Coding Potential: The coding potential in this ORF is only in the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The Final Score is -1.931 and the Z-score is 3.266. This is the start site with the best scores and the smallest gap. This start site allows for the longest possible ORF. /note=Gap/overlap: There is a gap of 114 bp. There is no coding potential in the gap. /note=Phamerator: Pham 54294 on 1/21/2022. It is conserved in phages Adolin (AZ), Adumb2043 (AZ), and Crewmate (AZ). /note=Starterator: Start site 36 in Starterator was found in 37/175 genes in this pham. It was manually annotated 28 times. Start 36 is 21080 in Berrie. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 21080. 
/note=Function call: The likely function is deoxynucleoside monophosphate kinase. PhagesDB’s two top hits predicted deoxynucleoside monophosphate kinase and adenylate kinase with e-values of 1e-110 and 1e-109, and identities of 96% and 94%, respectively. NCBI’s two top hits also predicted deoxynucleoside monophosphate kinase and adenylate kinase with e-values of 8e-140 and 5e-138 and identities of 96% and 95%, respectively. The CDD database did not have any significant hits. HHpred’s top hits were deoxynucleoside monophosphate kinase, with e-values of 4e-21 and 2e-17 and probabilities near 100%. /note=Transmembrane domains: No transmembrane domains were called in TMHMM or TOPCONS. It is not a membrane protein. /note=Secondary Annotator Name: Batteikh, Maysaa /note=Secondary Annotator QC: I agree with your call, but the final score and z-score that are chosen are the best for this gene. CDS 21783 - 22382 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="Berrie_26" /note=Original Glimmer call @bp 21783 has strength 11.69; Genemark calls start at 21783 /note=SSC: 21783-22382 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_LEGO_25 [Arthrobacter phage Lego]],,NCBI, q5:s1 95.9799% 5.86596E-104 GAP: 88 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.238, -5.377667145487419, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LEGO_25 [Arthrobacter phage Lego]],,QIN94425,86.9792,5.86596E-104 SIF-HHPRED: SIF-Syn: NKF, upstream gene is deoxynucleoside monophosphate kinase (Pham 54294), downstream is exonuclease (Pham 95523), similar to phage Lego from the same cluster, AZ. /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and Genemark both agree on start site 21783, start codon is ATG /note=Coding Potential: Reasonable coding potential in putative ORF, covered by chosen start site /note=SD (Final) Score: The original start site of 21783 has a final score of -5.378 and a Z-score of 2.238. 
This start site does not have the best Ribosome Binding Site score but it does have the best Z-score and the other starts are not better because the gap is much larger. /note=Gap/overlap: Gap of 88 with the upstream gene is unreasonable but gene length is reasonable /note=Phamerator: 2502 - 1/21/22. The pham my gene belongs to is present in other members of the cluster, AZ. The phage that I used for comparison is Lego. No function called. /note=Starterator: Non-conserved start site number 10, @21783, 0/41 other members of pham call same start site number /note=Location call: Not real gene, unconserved in starterator /note=Function call: None of the databases were able to predict a function /note=Transmembrane domains: No TMDs predicted /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with the location and functional call. You mentioned that the RBS score is not the best, but it looks like it is the best in the options given. I also think you should mention that the 88 bp gap seems reasonable since no coding potential was found in the gap and also no other genomes from phages Amyev, Adolin, and Adumb2043 in cluster AZ had a gene present in the gap. I think this is a real gene because all the evidence supports that it is. I would check with professors to make sure, but it is highly likely that it is a real gene. You should definitely check off the top hits for phagesdb BLAST and NCBI BLAST to support the unknown function. Additionally, you should mention the important values such as e-value, query coverage, and etc. in the function call notes. For transmembrane domain, mention that you used Topcons and Tmhmm. 
CDS 22583 - 23416 /gene="27" /product="gp27" /function="Cas4 exonuclease" /locus tag="Berrie_27" /note=Original Glimmer call @bp 22583 has strength 17.13; Genemark calls start at 22583 /note=SSC: 22583-23416 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 0.0 GAP: 200 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.1, -2.356391881054909, yes F: Cas4 exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Iter] ],,URQ05014,98.917,0.0 SIF-HHPRED: Cas4_I-A_I-B_I-C_I-D_II-B; CRISPR/Cas system-associated protein Cas4. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and associated Cas proteins comprise a system for heritable host defense by prokaryotic cells against phage and other foreign DNA.,,,cd09637,77.9783,99.0 SIF-Syn: /note=AF: calling as cas4 exonuclease. contains 4R5Q_A plus hits to PDDEXK_1 ; PD-(D/E)XK nuclease superfamily. /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Both Glimmer and GeneMark agree that it is start site 22,583 F. /note=Coding Potential: Coding potential encompasses the gene candidate's start and end sites. There appears to be gene coding potential here. It has a small dip a little past the halfway point of the coding potential, but this may not indicate much. /note=SD (Final) Score: The final score is -2.356 and the z-score is 3.1. They are both the best out of all the gene candidates. It is also the longest of them all at 834 bp. There is a gap of 200 bp, but the other candidates have larger gaps anyway. Will review in case there is a potential new gene before this one. /note=Phamerator: This was run as of Jan 24, 2022. Belongs to pham 95523. Shows relation between phages in this pham. While there does seem to be some level of conservation in this pham, the start site is not conserved, as it has shifted or made the gene change number. /note=Starterator: There are 102 members and 17 of them are draft. 
Is among the genes with the most annotated start. Has start site 19, which has 32 MA's. Quite a number of phages in the same pham are grouped with Berrie. /note=Location call: It does indeed seem like a real gene with start site @22583. /note=Function call: It appears to be a Cas4 family exonuclease. All evidence mentions either this or standard exonuclease, and when I checked phages in this pham I got phages that were either the former or the latter. The PhagesDB function frequency shows that both calls occur in the same and in different clusters. PhagesDB BLAST has many hits with significant e-values, and they point to either Cas or straight endonuclease. HHpred also detects a few significant hits stating it is Cas. CDD does not mention anything. NCBI also does not mention anything. /note=Transmembrane domains: TOPCONS and TMHMM do not predict anything in their hits. /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I agree with the annotation and location/function calls. Maybe NCBI BLAST was glitched when you were looking, but there are certainly several great hits now (e-value of zero, high coverage, good alignment, etc.); I would check some of them as evidence. The PhagesDB function list says to call Cas4 only if the protein includes the exonuclease domain but not a helicase domain. Not sure how to tell if it does. Exonuclease might be the safer call. Also, whichever function you end up going with, only check evidence that supports that function. 
CDS 23413 - 23502 /gene="28" /product="gp28" /function="membrane protein" /locus tag="Berrie_28" /note=Genemark calls start at 23413 /note=SSC: 23413-23502 CP: yes SCS: genemark ST: SS BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.003, -4.641378341780927, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: membrane protein, unlike phage Adolin where this gene is a recombination directionality factor; upstream is HNH endonuclease unlike phage Adolin where it is a Cas4 family endonuclease; downstream gene is NKF like phage Adolin /note=Sasha Semaan: Real gene because of evidence of coding potential and synteny, as well as reasonable gap, z-score, final score. Little evidence to suggest function, but both Topcons and TMHMM had 1 hit to suggest that this gene is a membrane protein. DeepTMHMM also detects one TMD. /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: GeneMark calls the start at 23413, no Glimmer result. However this results in a gene that is only 90bp, which may be too small. The other start site suggested results in an even smaller gene of length 30bp. /note=Coding Potential: This ORF has good coding potential on both GeneMark S and GeneMark Host, with the start site 23413 including all of the coding potential. /note=SD (Final) Score: The final score is -4.641, this is not the best final score, but the start sites with higher (more negative) final scores result in a gene that is very small (30bp) and do not cover all the coding potential. The z score is 2.003. /note=Gap/overlap: The gap is -4bp, suggesting this gene is part of an operon. /note=Phamerator: 96316. Date 01/24/2022. It is not conserved in any other phages in the same cluster or any other cluster. May be part of an orpham. /note=Starterator: No starterator report, again suggesting this gene codes for a protein that is an orpham. /note=Location call: As of now with the available evidence, the start site appears to be 23413. 
However it is still to be confirmed if this is a real gene. /note=Function call: No PhagesDB BLAST results, no NCBI BLAST results, no relevant CDD hits. HHpred provides two hits for membrane protein and export protein, with coverage>82%, probability>67%, but with very high E-values (E-value>7.6), making these function calls unreliable. It is likely this gene has NKF. The lack of hits in PhagesDB and NCBI BLAST may be because this gene codes for a protein that is likely part of an orpham. /note=Transmembrane domains: TMHMM provides only one hit, which is insufficient evidence for the function to be that of a transmembrane domain. /note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: This is a tricky call. Normal genes are at least 120 bp long and this one is only 90 bp. There are no other phages in this pham. The final score is not great, but the z-score is pretty good. You should check with the professors on this one since there is coding potential. I do agree with all the evidence you have collected. Don`t forget to select the starterator and gene candidate drop down menus. CDS 23499 - 23867 /gene="29" /product="gp29" /function="nucleoside deoxyribosyltransferase" /locus tag="Berrie_29" /note=Original Glimmer call @bp 23499 has strength 7.78; Genemark calls start at 23499 /note=SSC: 23499-23867 CP: yes SCS: both ST: NI BLAST-Start: [MazG-like pyrophosphatase [Arthrobacter phage Phives] ],,NCBI, q1:s1 100.0% 1.19121E-76 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.063, -6.517886497398833, no F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[MazG-like pyrophosphatase [Arthrobacter phage Phives] ],,YP_010677664,97.541,1.19121E-76 SIF-HHPRED: c.23.14.1 (A:9-160) Nucleoside 2-deoxyribosyltransferase {Trypanosome (Trypanosoma brucei) [TaxId: 5691]},,,d2f62a2,90.1639,99.7 SIF-Syn: /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: both glimmer and genemarks agree on the start site at 23499. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -6.518, which is the best score on PECAAN. The next start site, which has a slightly more negative final score, is 23523. /note=Gap/overlap: There is an overlap of 4 base pairs with its upstream gene, which is reasonable. This gene could be part of an operon. /note=Phamerator: pham: 67497. Date 01/21/22. It is conserved; found in Adolin (AZ) and Amyev (AZ). /note=Starterator: Start site 17 in Starterator was manually annotated in 19/34 non-draft genes in this pham. Start 35 (23523 in Berrie) is recommended, but it is not the most conserved start number. This evidence doesn’t agree with the site predicted by Glimmer and GeneMark. /note=Location call: The start site is most likely at 23499 because it completely covers all the coding potential, whereas 23523 doesn’t cover the coding potential completely. Both have very close SD scores. /note=Function call: Unknown. The top three PhagesDB BLAST hits are nucleoside deoxyribosyltransferase and they are all from cluster AZ (E-value <7e-60), and one of the top two NCBI BLAST hits also has nucleoside deoxyribosyltransferase function (100% coverage, >86% identity, and E-values of 9.12643e-69 and 9.96377e-69). One of the top three HHpred hits has the function nucleoside deoxyribosyltransferase and the other two have unknown function (coverage >86%, with <99.6% probability and E-values between 9.4e-18 and 5.4e-16). There is also one CDD hit that indicates this gene functions as a nucleoside deoxyribosyltransferase. 
/note=Transmembrane domains: there is no TMD in TmHmm and TOPCONS /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 23864 - 24268 /gene="30" /product="gp30" /function="LAGLIDADG endonuclease" /locus tag="Berrie_30" /note=Original Glimmer call @bp 23864 has strength 12.01; Genemark calls start at 23864 /note=SSC: 23864-24268 CP: yes SCS: both ST: SS BLAST-Start: [LAGLIDADG family homing endonuclease [Arthrobacter sp. EPSL27] ],,NCBI, q1:s1 100.0% 7.22773E-86 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.923, -2.707694805492614, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG family homing endonuclease [Arthrobacter sp. EPSL27] ],,WP_066431302,97.7612,7.22773E-86 SIF-HHPRED: I-CREI; ENDONUCLEASE, GROUP I MOBILE INTRON, INTRON HOMING, CHLOROPLAST DNA, LAGLIDADG MOTIF; 3.0A {Chlamydomonas reinhardtii} SCOP: d.95.2.1,,,1AF5_A,76.8657,99.5 SIF-Syn: LAGLIDADG Endonuclease. The upstream gene is LAGLIDADG endonuclease. The downstream gene is NKF. This is conserved in Amyev (AZ), Elezi (AZ). etc. However, the upstream gene presented in some other AZ phages also show function HNH endonuclease. /note=Sasha Semaan: Function call should be LAGLIDADG endonuclease. Plenty of good evidence in Phagesdb Blast, HHPRED, NCBI Blast, CDD to suggest that the function of this gene is an endonuclease, specifically LAGLIDADG endonuclease. /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer and GeneMark agree on the start site and start codon: 23864, ATG. /note=Coding Potential: There is coding potential within the putative ORF. The start site covers this coding potential. /note=SD (Final) Score: -2.708 is for the start site 23864, which is the best possible score. /note=Gap/overlap: -4 bp overlap for start site 23864 is highly favorable. Gap has no coding potential. The called start site is the LORF. This gene and the gap is syntenic with other non-draft phages. 
/note=Phamerator: Gene is found in Pham 67874 as of 01/24/2022. This pham is in many members of cluster AZ (conserved). Function is either an HNH or a LAGLIDADG endonuclease. /note=Starterator: Most annotated start is 13 (26/34 call it). This start site is found in Berrie as 23864. Hence, the auto-annotated start site is highly conserved among other non-draft phages. /note=Location call: Above evidence suggests this is a real gene and starts at 23864. /note=Function call: LAGLIDADG endonuclease. CDD says it has a LAGLIDADG domain (e-value < 10^-3, coverage 55%). HHPred says it is an endonuclease (two hits, both e-value < 10^-10, coverage > 70%). Phagesdb and NCBI BLAST both have HNH and LAGLIDADG endonuclease as good hits (scores > 200, e-value < 10^-50); however, LAGLIDADG was chosen due to CDD results. /note=Transmembrane domains: Neither TmHmm nor Topcons predicts any TMHs. No evidence to suggest this gene product is associated with the membrane. /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator. Just don't forget the synteny box! CDS 24418 - 25131 /gene="31" /product="gp31" /function="recombination directionality factor" /locus tag="Berrie_31" /note=Original Glimmer call @bp 24418 has strength 19.24; Genemark calls start at 24418 /note=SSC: 24418-25131 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 99.5781% 1.01017E-161 GAP: 149 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.238, -4.979727136815382, no F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Powerpuff] ],,QGZ17327,97.4684,1.01017E-161 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,88.1857,100.0 SIF-Syn: Function: recombination directionality factor, upstream gene is LAGLIDADG endonuclease, downstream is NKF. This is conserved in phages Crewmate (AZ), Asa16 (AZ), etc. 
However, for the upstream gene, there is also possibility of HNH endonuclease. /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer all agree on the same start site, which is 24418. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. There is coding potential predicted by Host-trained GeneMark and Self-trained GeneMark and the chosen start site includes all of the coding potential in both Host-trained GeneMark and Self-trained GeneMark. /note=SD (Final) Score: The Final Score is the best option of -4.980. The Z-score is 2.238, which is significant. /note=Gap/overlap: There is a 149 bp gap. Somewhat large, but ultimately reasonable because the gap is conserved in other phages of the same pham and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 4822. Date: 1/21/2022. Pham number 4822 has 102 members, 17 are drafts. /note=Starterator: Start 27 called the most often, it was called in 42 of the 85 non-draft genes in the pham. Start 27 was called in Berrie_30 (AZ). This evidence agrees with the site predicted by Glimmer and GeneMark. Start 27 also has 38 MA’s /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 24418. /note=Function call: Recombination directionality factor. 3 out of 5 top Phagesdb BLAST hits suggest that this is a recombination directionality factor (E-value < 1e-127, which is significant). The top 3 NCBI BLAST hits all support that there is a high possibility that this gene code for recombination directionality factor (% identity > 92.4%; %coverage > 99.5%; E-value < 3.9e-161). The only HHpred hit also suggests that this is a recombination directionality factor (Probability = 100; 88.2% coverage; E-value = 9.6e-35). CDD had no relevant hits. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: I agree with the function and location calls. Don't forget to select the Starterator drop down menu. CDS 25131 - 25277 /gene="32" /product="gp32" /function="membrane protein" /locus tag="Berrie_32" /note=Original Glimmer call @bp 25131 has strength 16.78; Genemark calls start at 25131 /note=SSC: 25131-25277 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage VResidence]],,NCBI, q1:s1 100.0% 6.37161E-13 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.122, -4.455662434380964, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VResidence]],,UYL87636,81.25,6.37161E-13 SIF-HHPRED: SIF-Syn: /note=AF: 1 TMD in deepTMHMM /note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Glimmer and Genemark both agree on a start site of 25131. The start codon ATG is called. /note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: The final score is -4.456 for start site 25131. It is not the best final score on PECAAN, but this start has the smallest gap. /note=Gap/overlap: Gap of -1 bp (1 bp overlap). This is the most reasonable overlap among the potential start sites. /note=Phamerator: Pham: 96177. Date 01/23/22. It is conserved in 24 other non-draft phages within cluster AZ, such as Adolin (AZ), Adumb2043 (AZ), and Amyev (AZ). There is no function called for the gene. /note=Starterator: Start site 4 in Starterator was manually annotated in 24/24 non-draft genes in this pham. Start 4 is 25131 in Berrie. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 25131 bp. 
Starterator agrees with Glimmer and Genemark. /note=Function call: Unknown function for the gene. The only acceptable Phagesdb BLAST top hit states “function unknown”. There is no hit for NCBI BLAST, no hit for CDD, and no good hit for HHpred, since all e-values are extremely high, so the call is NKF. /note=Transmembrane domains: No TMDs called by either TMHMM or TOPCONS, and no evidence suggesting TMD function within the other databases. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with this annotation. All of the evidence has been considered to make the location and function calls. The synteny box still needs to be updated. CDS 25350 - 25679 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="Berrie_33" /note=Original Glimmer call @bp 25350 has strength 9.72; Genemark calls start at 25350 /note=SSC: 25350-25679 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ASA16_33 [Arthrobacter phage Asa16]],,NCBI, q1:s1 100.0% 6.52196E-66 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.111, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ASA16_33 [Arthrobacter phage Asa16]],,UAJ15394,94.4954,6.52196E-66 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in pham 96177, downstream gene is a glutaredoxin, just like in phage Elezi. /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and Genemark both agree on a start site of 25350. The start codon GTG was called. /note=Coding Potential: There is strong coding potential within the ORF with a start site of 25350 for both the host and self-trained Genemark coding maps. /note=SD (Final) Score: -2.253 for the start site 25350. This negative SD score (<-2) is highly indicative that this start site is the correct start site, even if the other start sites listed have more negative SD scores. 
/note=Gap/overlap: There is a 72 bp gap with the upstream gene, which stops at position 25277, whereas this gene is projected to start at 25350. When comparing Berrie’s pham map with other phages in this cluster (AZ) like Crewmate and DrManhattan, I noticed that these phages had a gene inserted between this corresponding gene and the upstream gene. However, in phage Elezi this inserted gene was not present. Due to the small gap size and synteny with Elezi, I determine that there is no gene inserted in Berrie’s genome. /note=Phamerator: Information collected on 1/24/2022. The gene is found in pham 64637. All of the other phages that also had genes within this pham were within cluster AZ, the same as Berrie. No functions are listed on phamerator. /note=Starterator: Information collected on 1/21/2022. Start site 4 was the most annotated start site for the genes that are in this pham and was called 100% of the time when it was present. For this particular gene, start number 4 corresponds to position 25350. This is the same start site that was agreed upon by Glimmer and Genemark. /note=Location call: This gene seems like a real gene because start site 25350 covers all coding potential within the ORF and because Glimmer and Genemark agree on this start site. /note=Function call: Relevant PhagesDB BLAST hits match to genes with no known function. Analogous genes from phages within the same cluster as Berrie (AZ) are included within these hits, and all have no known function. NCBI BLAST hits also match to proteins that are hypothetical with no listed functions. HHpred hits do match to genes with functions, but e-values are very large (>10), meaning that these hits are not reliable. Thus, this gene most likely has NKF. /note=Transmembrane domains: No transmembrane domains indicated by TMHMM or TOPCONS. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I have reviewed this annotation and find that all evidence has been considered. 
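The GAP values in the SSC summary lines can be checked against the CDS coordinates. The sketch below is an illustration only, assuming inclusive 1-based coordinates on the forward strand and counting only the bases strictly between the upstream stop and this start; the helper name `gap_bp` is hypothetical and is not a PECAAN or DNA Master function.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases strictly between the upstream stop and this start.

    A negative result is an overlap of that many bp.
    """
    return start - upstream_stop - 1


# (upstream stop, this start) pairs taken from the CDS lines in this file.
print(gap_bp(25277, 25350))  # gene 32 -> gene 33: 72
print(gap_bp(21694, 21783))  # gene 25 -> gene 26: 88
print(gap_bp(23416, 23413))  # gene 27 -> gene 28: -4 (4 bp overlap)
print(gap_bp(25131, 25131))  # gene 31 -> gene 32: -1 (1 bp overlap)
```

Under this convention each result matches the corresponding GAP field recorded in the SSC notes.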
CDS 25682 - 25942 /gene="34" /product="gp34" /function="NrdH-like glutaredoxin" /locus tag="Berrie_34" /note=Original Glimmer call @bp 25682 has strength 15.47; Genemark calls start at 25682 /note=SSC: 25682-25942 CP: yes SCS: both ST: SS BLAST-Start: [NrdH-like glutaredoxin [Arthrobacter phage JohnDoe]],,NCBI, q4:s6 96.5116% 3.5333E-48 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.866, -3.4175091270247986, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[NrdH-like glutaredoxin [Arthrobacter phage JohnDoe]],,WGH20746,88.6364,3.5333E-48 SIF-HHPRED: THIOREDOXIN; OXIDOREDUCTASE, GLUTAREDOXIN, TRX, GRX; 1.18A {BACILLUS CEREUS} SCOP: c.47.1.0,,,3ZIT_A,87.2093,99.4 SIF-Syn: Gene 34 is a glutaredoxin and the upstream gene belongs to pham 64637, just like in phage Eraser. However, the downstream genes do not match. /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: Genemark and Glimmer agree on a start site of 25682. The start codon is ATG. /note=Coding Potential: The gene has good coding potential in the forward direction and the start site includes all of the coding potential. /note=SD (Final) Score: The final score is -3.418 which is the best final score on PECAAN. /note=Gap/overlap: This gene has a 2 bp gap. /note=Phamerator: Pham 96417 as of 1/22/2022. This pham is in 932 phages from various clusters as well as singletons. The function of this pham is glutaredoxin. /note=Starterator: This pham is tremendous, with 864 non-draft members. The most conserved start site is 106 and it is called in 167 of the 864 non-draft phages. Berrie does not have start site 106 and instead has start site 89 which corresponds to position 25682. Start site 89 is called in 2.4% of phages. /note=Location call: This is likely a real gene with start site 25682. All other potential start sites have very large overlaps. /note=Function call: This gene is very likely a glutaredoxin. HHpred, BLASTp, and CDD all have very strong hits for glutaredoxin. (e-value < 3x10^-10). 
Additionally, this gene belongs to pham 96417, whose function is glutaredoxin. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: I agree with the annotation above. CDS 25939 - 26160 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="Berrie_35" /note=Original Glimmer call @bp 25939 has strength 9.19; Genemark calls start at 25939 /note=SSC: 25939-26160 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_37 [Arthrobacter phage Janeemi]],,NCBI, q7:s4 91.7808% 3.47033E-37 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.599, -5.544323445864165, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_37 [Arthrobacter phage Janeemi]],,UVK63557,91.4286,3.47033E-37 SIF-HHPRED: SIF-Syn: NKF gene in pham 18209 is flanked by glutaredoxin and metallophosphoesterase, just like in phage DrManhattan. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site at position 25939. The start codon is ATG. /note=Coding Potential: There is reasonable typical and atypical coding potential within the putative ORF predicted by GeneMarkS and GeneMark Host. The chosen start site at position 25939 covers all of this coding potential. /note=SD (Final) Score: The SD score is -5.544 and does not predict the best sequence match. The Z-score is 1.599 and is only the second highest. There is another start site with SD score of -3.867 and Z-score of 2.866, but that would lead to a 50 bp gap with another gene. /note=Gap/overlap: This start site has a 4 bp overlap, which is preferred by the ribosome. /note=Phamerator: This gene is in Pham 18209 as of 01/24/22. Our phage is in cluster AZ, and there are 5 non-draft genomes in this cluster that also have this pham. 
Phages Adolin, DrManhattan, KeAlii, Phives, and Reedo were used for comparison. Phamerator did not have a function called for this gene. /note=Starterator: Start site 6 is conserved among other members of the pham to which this gene belongs. 3/5 non-draft genes in this pham call this site, including Berrie. This site is called 100% of the time when present and has 3/5 manual annotations. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 6 at basepair position 25939 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggest that the function is unknown, with high query coverage (>91%), high % identity (>60%), and low e-values (<7e-22). CDD had no hits. There were no informative HHpred hits. There is no suggested function. /note=Transmembrane domains: Since TMHMM and TOPCONS didn’t call at least 1 TMD, we can conclude that this protein doesn’t have any TMDs. /note=Secondary Annotator Name: Zhuang, Chuzhi (Louise) /note=Secondary Annotator QC: CDS 26157 - 26750 /gene="36" /product="gp36" /function="metallophosphoesterase" /locus tag="Berrie_36" /note=Original Glimmer call @bp 26157 has strength 9.62; Genemark calls start at 26157 /note=SSC: 26157-26750 CP: yes SCS: both ST: SS BLAST-Start: [metallophosphoesterase [Arthrobacter phage Adolin]],,NCBI, q1:s1 95.4315% 4.9897E-102 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.116, -4.387988835334809, no F: metallophosphoesterase SIF-BLAST: ,,[metallophosphoesterase [Arthrobacter phage Adolin]],,QHB36616,84.1837,4.9897E-102 SIF-HHPRED: MPP_AQ1575; Aquifex aeolicus AQ1575 and related proteins, metallophosphatase domain. This family includes bacterial and archeal proteins homologous to AQ1575, an uncharacterized Aquifex aeolicus protein.,,,cd07390,89.8477,99.8 SIF-Syn: Berrie_36 is flanked by Berrie_35 and Berrie_37. Adolin is also found in cluster AZ with Berrie. 
Like Berrie_36, Adolin_34 is annotated to be a metallophosphoesterase and is part of pham 96672. The upstream genes, Berrie_35 and Adolin_33, are part of Pham 18209 and currently are of no known function. Finally, the downstream genes Berrie_37 and Adolin_35 are both part of pham 98109 and encode Holliday junction resolvase. This synteny continues for many genes further downstream. /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site at 26157 bp. Glimmer assigns a score of 9.62. The start codon is ATG. /note=Coding Potential: Strong coding potential was found in both GeneMark Host and Self. The chosen start site captures all coding potential on the second forward reading frame. /note=SD (Final) Score: The final score of -4.388 is not the best final score listed in PECAAN, but it is still reasonable. Its corresponding Z-score is 2.408. More information in gap/overlap will further explain why this specific start site was chosen. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene, and a 4 bp overlap with the downstream gene. The overlaps are of reasonable size, and might be indicative of an operon. All other starting sites have gaps/overlaps that are far too big, at >100 bp. /note=Phamerator: As of 01/23/2022, it is part of pham 96672. It is conserved in Adolin and Adumb2043, which are also part of cluster AZ with Berrie. Many genes in this pham are annotated as metallophosphoesterase or phosphodiesterase. /note=Starterator: There are 97 non-draft genes in pham 96672, and 22 call the most annotated start site, start 51. This correlates to a start site at 26157 bp for Berrie. /note=Location call: Based on the evidence above, including coding potential, SD (final) score, gap/overlap size, and Phamerator and Starterator analysis, this gene is a real gene and has a start site at 26157 bp. We can see that this start site is called in Starterator, GeneMark, and Glimmer. 
/note=Function call: The PhagesDB Function Frequency table has hits for metallophosphoesterase and phosphodiesterase. Many PhagesDB BLAST hits also suggest this gene encodes metallophosphoesterase and phosphodiesterase. Specifically, it has a hit with Adumb2043 with an e-value of 4e-94 for phosphodiesterase, and two hits with Adolin and DrManhattan for metallophosphoesterase, with e-values of 1e-84 and 5e-85 respectively. HHPRED had hits for hypothetical protein, but also metallophosphoesterase and phosphodiesterase, with coverage of 89.8477% and 84.264%, and e-values of 4.4e-19 and 1.1e-17 respectively. NCBI BLAST has hits for metallophosphoesterase and phosphodiesterase, with 72.9592% and 80.7107% identity, 95.4315% and 95.9543% coverage, and e-values of 3.63994e-102 and 9.01078e-114 respectively. CDD also had hits for metallophosphatase domain and phosphoesterase, with 36.4706% and 33.3333% identity, 50.5882% and 43.5485% alignment, 90.8629% and 86.802% coverage, and e-values of 9.56739e-35 and 1.17987e-34 respectively. We’ve had hits for metallophosphoesterase, phosphoesterase, and phosphodiesterase. While they have had strong hits in NCBI BLAST, PhagesDB BLAST, HHPRED, and CDD, more genes in this pham of phages also in cluster AZ are labeled as metallophosphoesterase compared to phosphodiesterase or phosphoesterase. Additionally, metallophosphoesterase is accepted as a function by SEA-PHAGES, whereas phosphodiesterase is not. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. As a result, this is not a membrane protein. 
/note=Secondary Annotator Name: Fleming, Hanna CDS 26747 - 27190 /gene="37" /product="gp37" /function="Holliday junction resolvase" /locus tag="Berrie_37" /note=Original Glimmer call @bp 26747 has strength 10.23; Genemark calls start at 26747 /note=SSC: 26747-27190 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 3.25632E-93 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.105, -4.490107201936413, no F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage Janeemi]],,UVK63558,97.9592,3.25632E-93 SIF-HHPRED: HOLLIDAY-JUNCTION RESOLVASE; HYDROLASE, ENZYME, HOMOLOGOUS RECOMBINATION, HOLLIDAY JUNCTION RESOLVING ENZYME, NUCLEASE, ARCHAEA, THERMOPHILE; HET: EDO, SO4; 1.8A {SULFOLOBUS SOLFATARICUS} SCOP: c.52.1.18,,,1OB8_B,72.1088,99.2 SIF-Syn: Holliday junction resolvase, upstream gene is metallophosphoesterase and downstream gene is a NKF, just like in phage Adolin. /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 26747. /note=Coding Potential: There is good typical coding potential on GeneMark Host and on GeneMark Self. There is no atypical coding potential. /note=SD (Final) Score: -4.490. This is the best final score listed. /note=Gap/overlap: -4 bp overlap. This likely indicates the start of an operon. /note=Phamerator: This is listed in pham 98109. Date 1/23/22. There are other phages in cluster AZ listed in this pham such as phage Amyev and Adolin. Other phages in this pham had their function listed as holliday junction resolvase which is consistent with my other findings. /note=Starterator: Start site 78 in Starterator was manually annotated in 18 out of 283 non-draft genes. Start 78 is 26747 in Berrie. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The gathered evidence suggests that the start site is at 26747. This is a real gene. 
/note=Function call: Holliday junction resolvase. NCBI BLAST lists the function as a Holliday junction resolvase with high percent identities and low e-values. PhagesDB BLAST lists the function as a Holliday junction resolvase with low e-values and high scores. No hits come up on CDD. HHPred also supports Holliday junction resolvase, as the two best hits list that function and have high probabilities and low e-values. /note=Transmembrane domains: Neither TmHmm nor Topcons calls any transmembrane domains. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: Great work! CDS complement (27180 - 27347) /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="Berrie_38" /note=Genemark calls start at 27347 /note=SSC: 27347-27180 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQE18_gp35 [Arthrobacter phage DrSierra] ],,NCBI, q1:s1 100.0% 3.74749E-26 GAP: 161 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.02, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE18_gp35 [Arthrobacter phage DrSierra] ],,YP_010678360,94.5455,3.74749E-26 SIF-HHPRED: SIF-Syn: Although this gene does not appear to be conserved in all AZ phages, there is synteny with Niobe. This gene, 38, corresponds with gene 36 in Niobe, and both genes have no known function and are of pham 788. The upstream genes are both of pham 98109 with the function Holliday junction resolvase. The downstream genes are both of pham 83049 with function DNA primase/helicase. /note=DeepTMHMM detects a signal peptide -AF /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: GeneMark calls the gene to have a start site of 27347, Glimmer does not call a start site. The auto-annotated start site has an ATG codon. /note=Coding Potential: This gene appears in the fifth track of GeneMark Host and has moderate coding potential. 
It is situated between two forward genes, which is unusual, although its coding potential is visible and contained by the start site. On GeneMark Self, this gene also appears in the fifth track and has coding potential contained by the start site over the gene interval. /note=SD (Final) Score: The auto-annotated start site has the lowest final score of -2.523 and the highest z-score of 3.02. This start site creates the third-largest ORF, preceded by a start site candidate with a final score of -4.549 and a z-score of 2.636 and a TTG codon, which is preceded by the LORF with a final score of -8.908, a z-score of 0.542, and a GTG start codon. /note=Gap/overlap: The gap of the auto-annotated start site is 161, which seems possibly unreasonable. The second-largest ORF has a gap of 92 and the LORF has a gap of 20. /note=Phamerator: Phamerator calls this gene to be of pham 788 as of 1/21/22. This pham is present in 22 non-draft phages, all of the cluster AZ. These genes are around the same length and number as that of Berrie, and none have a known function. /note=Starterator: Start 21 is found in 88.9% (24/27) of genes in this pham and called 95.8% of the time when present, including in Berrie, where it corresponds to the start site 27347. This high amount of conservation suggests that this is likely the correct start site. /note=Location call: Although it is unusual for a reverse gene to be situated between two forward genes, this situation is conserved in other phages according to pham maps, and all other evidence suggests that this is a real gene. This gene appears to be real with a start site of 27347. /note=Function call: This gene has no known function. PhagesDB BLAST returns no significant matches with known functions, HHPred returns no significant results, NCBI BLAST returns no significant matches with known functions, and CDD returns no data. /note=Transmembrane domains: TMHMM returns no predicted TMHs. 
/note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with all the evidence provided and the conclusions made. CDS 27509 - 29992 /gene="39" /product="gp39" /function="DNA primase/helicase" /locus tag="Berrie_39" /note=Original Glimmer call @bp 27509 has strength 11.9; Genemark calls start at 27509 /note=SSC: 27509-29992 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage Lego]],,NCBI, q1:s1 100.0% 0.0 GAP: 161 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.02, -2.523003374675015, yes F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage Lego]],,QIN94438,92.8916,0.0 SIF-HHPRED: Primase; primase, helicase, ssDNA-binding protein, TRANSFERASE; HET: SO4; 2.406A {Nitratiruptor phage NrS-1},,,6K9C_A,40.3869,100.0 SIF-Syn: DNA primase/helicase; upstream gene is NKF, downstream gene is NKF, as in phage Amyev and Crewmate. /note=Primary Annotator Name: Di Blasi, Daria /note=Auto-annotation: Both Glimmer and GeneMark call the same start site 27509 with start codon CTG. The start site is not the LORF. /note=Coding Potential: Coding potential on the forward strand only, indicating that this is a forward gene. Strong coding potential on both host-trained and self-trained GeneMark. The start site covers all of the coding potential within the ORF for host- and self- trained GeneMark. /note=SD (Final) Score: The 27509 start site has the most favorable SD score (-2.523) and Z Score (3.02) of all potential start sites, suggesting that this start site is the real start site of the gene. /note=Gap/overlap: The gene has a gap of 161 bp with the upstream gene. This is the second smallest gap of all potential start sites (smaller gap is 92 bp). This gap seems to be conserved among other members of the AZ cluster like phage Elezi and phage Amyev. /note=Phamerator: The gene is part of pham 83049 as of January 24th, 2022. The pham has 99 members, 13 of which are draft genomes (including Berrie). 
33 phages with pham 83049 belong to the AZ cluster such as Adolin (AZ) and Elezi (AZ). /note=Starterator: The highly conserved start site (start site 28) is not present in the Berrie genome. The most annotated site is called in 37 of the 83 non-draft genes in the pham; it is called 100% of the time when present. The predicted start site for Berrie is start 34 which is called in 32 of the 83 non-draft genomes in the pham (which is not much lower than the most annotated start); it is called 100% of the time when present. The majority of phages that call this start site belong to cluster AZ like Berrie. This evidence suggests that start site 27509 is the real start site for the gene in Berrie. /note=Location call: Based on all the evidence, the start site of the gene is likely site 27509 since both Glimmer and GeneMark call the gene, there is strong coding potential in GeneMark self and host, the start site has the most favorable SD Score (-2.523) and Z Score (3.02) of all potential starts, the site produces the second smallest gap with the upstream gene (conserved gap), and the start site is called 100% of the time in manual annotations of other AZ phages. /note=Function call: DNA primase/helicase; The top Phagesdb BLAST hits all call the gene a DNA primase/helicase (E-value=0), the top HHPred hits call the gene a DNA primase and a helicase (E-Value=2.5e-27, 5.3e-21), the top NCBI BLAST hits all call the gene a DNA primase/helicase (E-value=0), and the top 2 CDD hits call the gene a phage DNA primase (E-value=0), and thus there is evidence to support that the gene is a DNA primase/helicase. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with all your calls. Make sure to check the PhagesDB Blast, HHpred, CDD, and NCBI Blast boxes for evidence. 
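The gap and overlap values quoted in these notes follow from the coordinates of adjacent features. The following is a minimal Python sketch, assuming 1-based inclusive coordinates and the sign convention seen in the GAP fields of this file (positive for a gap, negative for an overlap). The function name `gap_bp` is hypothetical and is not part of PECAAN or any other tool named here.

```python
def gap_bp(upstream_end: int, downstream_start: int) -> int:
    """Bases strictly between two features (assumed 1-based, inclusive).

    A negative result means the features overlap by that many bases.
    `upstream_end` is the rightmost coordinate of the upstream feature,
    regardless of strand.
    """
    return downstream_start - upstream_end - 1


# Coordinates taken from records in this file.
pairs = {
    "gene 35 -> gene 36": (26160, 26157),
    "gene 36 -> gene 37": (26750, 26747),
    "gene 38 -> gene 39": (27347, 27509),
}

for label, (end, start) in pairs.items():
    print(label, gap_bp(end, start))
```

Under these assumptions the sketch reproduces the -4 bp overlaps recorded for genes 36 and 37 and the 161 bp gap recorded for gene 39.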
CDS 30007 - 30228 /gene="40" /product="gp40" /function="membrane protein" /locus tag="Berrie_40" /note=Original Glimmer call @bp 30007 has strength 12.16; Genemark calls start at 30007 /note=SSC: 30007-30228 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 6.45298E-26 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.106, -4.34663776219455, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Janeemi]],,UVK63561,91.7808,6.45298E-26 SIF-HHPRED: SIF-Syn: NKF, the upstream gene’s function is a DNA primase/helicase, and the downstream gene has NKF but is in pham 54541, just like in phage Phives /note=Deep TMHMM found one TMD -AF /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark call this gene’s start site at bp 30007 (methionine). /note=Coding Potential: There is reasonable coding potential within the putative ORF. The auto-annotated start site does cover all of this potential. /note=SD (Final) Score: The auto-annotated start site does have the best Final Score (-4.347), the best Z-score (2.106), and creates the LORF. /note=Gap/overlap: The gap with the upstream gene (14 bp) is slightly larger than an ideal gap would be, but still does appear to be reasonable given pham map evaluation of the gap. The auto-annotated start site creates the most reasonable gap compared to the other potential start sites, and the gene length is a reasonable 222 bp long. The auto-annotated start site also creates the LORF. /note=Phamerator: As of January 22, 2022, this gene was in pham 89533. The only other members of this pham are genes in phages that are also in Cluster AZ, so the pham is conserved amongst the cluster. The only non-draft phage in this pham is Phives (Phives_41). Neither Phamerator nor the phages database had a function called for this gene. 
/note=Starterator: The most conserved start site in this pham is start site #1 which corresponds to bp 30007 (auto-annotated start site). This site is called in 1/1 of the non-draft phages and is called 100% of the time when present. /note=Location call: The evidence suggests that this is a real gene because there is reasonable coding potential within the putative ORF, and the auto-annotated start site at 30007 appears to be correct given that it has the best Final Score, Z-score, and creates the LORF. /note=Function call: The only hit from the NCBI Blast (identity ≈ 83.3%, coverage ≈ 97.3%, e-value ≈ 7.8e-25) suggests that this gene’s function is unknown. CDD and HHpred were uninformative. /note=Transmembrane domains: TMHMM predicted 1 TMD which is not enough to call this gene’s function as a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 30225 - 30341 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Berrie_41" /note=Genemark calls start at 30225 /note=SSC: 30225-30341 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQD82_gp42 [Arthrobacter phage Phives] ],,NCBI, q1:s1 100.0% 1.01765E-17 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.876, -4.905314844093283, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD82_gp42 [Arthrobacter phage Phives] ],,YP_010677676,97.3684,1.01765E-17 SIF-HHPRED: SIF-Syn: Pham maps show a shared synteny amongst phages within the subcluster. All have minor gaps between the end of the upstream gene and the beginning of this gene. Downstream are DNA helices and polymerase genes, so perhaps this gene has a function related to those that has not been uncovered yet. /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer does not call this gene; however, Genemark does at 30225. The start codon is ATG (methionine). 
/note=Coding Potential: Significant coding potential found on host and self trained Genemark. A -4 bp overlap exists at the start site, alluding to the presence of an operon. 117 bp length. Complete coding potential is encompassed in this read. /note=SD (Final) Score: Z score is not very strong, sits at 1.876, while the SD score is -4.905. These results are better than the previous candidate’s scores of 0.458 and -7.840. /note=Gap/overlap: There is a -4 bp overlap with the upstream gene that suggests a potential operon. The gene points towards being real. /note=Phamerator: Phamerator shows 33 members of Pham 54541 with the average base pair length in the 110-120 bp range. Most of the phages in this pham are in the AZ subcluster. Pham maps show synteny in comparison to other phages such as Asa16 and Adolin. /note=Starterator: Starterator calls a variety of potential start sites, which aligns with the operon hypothesis. Most annotated start site is start 6 @30225 with 10 MAs out of 25, just under the 50% range. Other potential sites include start site 9 or 3. /note=Location call: The gene shows evidence of being real and initializing with an operon. The location call is Gene 41 start@30225, stop@30341, with all coding potential being within the frame. /note=Function call: PhagesDB BLAST shows low scoring compared to phages Lizalica and Phives (77-81) and low e value (2e-14) under unknown function category. HHpred readings show similar results, listing hypothetical proteins under high probability and moderate coverage @ 55% for unknown hypothetical protein. /note=Transmembrane domains: None found, suggesting that this gene does not code for a protein that interacts with host cell membrane domains. No TOPCONS or CDD readouts either. /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: I agree with all the calls. The only thing I would add on is the synteny. 
Talk about functions upstream and downstream and check which phage in the cluster shares this to any degree. Otherwise good job. CDS 30512 - 32377 /gene="42" /product="gp42" /function="DNA polymerase I" /locus tag="Berrie_42" /note=Original Glimmer call @bp 30512 has strength 14.34; Genemark calls start at 30512 /note=SSC: 30512-32377 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 100.0% 0.0 GAP: 170 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.362, -3.954790667257792, no F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Powerpuff] ],,QGZ17341,97.9066,0.0 SIF-HHPRED: Prex DNA polymerase; DNA polymerase, TRANSFERASE; HET: SO4; 2.9A {Plasmodium falciparum},,,5DKT_A,96.7794,100.0 SIF-Syn: Upstream and downstream genes are conserved in phage Adumb2043, both NKF. DNA Polymerase I is conserved in Adumb2043 as well. /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: GeneMark and Glimmer both called the start site of 30512 and the start codon is TTG, which is the least likely start codon to be called. /note=Coding Potential: There is good coding potential in the putative ORF; however, the start site does not include all the coding potential. /note=SD (Final) Score: The final score is the best possible score of -3.955 and the Z-score is 2.362 which is good as well. /note=Gap/overlap: The auto-annotated start site gives the longest possible ORF, which is a reasonable length of 1866 bp. The gap is 170 bp long, which is long but reasonable. /note=Phamerator: As of 1/21/22 the gene is in pham #47481, with 870 members. There are other members of the AZ cluster in the pham, including Adolin and Adumb2043. DNA Polymerase I was called for the function which is on the SEA-PHAGES approved function list. /note=Starterator: Starterator showed that Berrie calls the most annotated start. 
Start site #56 which corresponds to 30512 bp in Berrie is called 98.1% of the time when present, with 775/810 phages. /note=Location call: Start site #56 is the best based on it being the longest ORF, having the best Final score and reasonable Z-score, and the smallest gap. Starterator also shows it is the most annotated start site. /note=Function call: BLAST had more than 5 hits with an E-value of 0 that called the function as DNA Polymerase I, from phages in the cluster AZ such as Adumb2043. HHPred showed more than 5 hits with an E-value of 0 for DNA Polymerase. NCBI BLASTp had more than 5 hits with an E-value of 0 that had a HTH DNA binding domain with more than 95% coverage. CDD results also showed more than 5 hits with an E-value of 0 for DNA Polymerase I. /note=Transmembrane domains: There are no transmembrane domains according to TmHmm and Topcons. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: Agree with this call! CDS 32374 - 32559 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="Berrie_43" /note=Original Glimmer call @bp 32389 has strength 6.73; Genemark calls start at 32389 /note=SSC: 32374-32559 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_VRESIDENCE_42 [Arthrobacter phage VResidence]],,NCBI, q1:s1 96.7213% 4.71154E-25 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.824, -5.075057988041409, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VRESIDENCE_42 [Arthrobacter phage VResidence]],,UYL87646,85.2459,4.71154E-25 SIF-HHPRED: SIF-Syn: Upstream gene is DNA polymerase I, this gene is a NKF just like phage Adumb2043. Downstream gene is a DNA ligase which is also like phage Adumb2043. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both Glimmer and Genemark predicted 32389 as the suggested start site. The site number that was called is #43. The predicted start codon is ATG. 
/note=Coding Potential: There is good coding potential predicted by GeneMark Self (Track 1, has good coding potential in the ORF region), and by GeneMark Host (Track 1, upward hash and coding potential in ORF region). There is reasonable coding potential predicted within the ORF and over the start site, 32389. /note=Function call: The top 3 NCBI BLASTp hits, with E-values lower than 1e-15, suggest that this gene could have no known function (NKF) with high query coverage (>94%), but relatively low % identity (61%-71.7%). /note=The top 5 hits from PhagesDB BlastP yielded E-values all lower than 6e-17, and suggest the gene’s function is NKF. /note=Neither CDD nor HHpred led to informative/significant results. HHpred had high e-values. /note=Transmembrane domains: There were no predicted TMH hits on TMHMM, and zero hits on TOPCONS. This means that the lack of data cannot serve as evidence for the gene. /note=Secondary Annotator Name: NIAZMANDI, KIANA /note=Secondary Annotator QC: Great work! I agree with the start site and the function! Please include the downstream gene in the synteny box, and also include more information about the upstream and downstream genes of Adumb2043 in the synteny box. CDS 32552 - 32857 /gene="44" /product="gp44" /function="DNA ligase" /locus tag="Berrie_44" /note=Original Glimmer call @bp 32552 has strength 14.53; Genemark calls start at 32552 /note=SSC: 32552-32857 CP: yes SCS: both ST: SS BLAST-Start: [DNA ligase [Arthrobacter phage London]],,NCBI, q1:s1 97.0297% 6.33262E-60 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.538, -3.587537338656976, yes F: DNA ligase SIF-BLAST: ,,[DNA ligase [Arthrobacter phage London]],,QOP64343,96.0,6.33262E-60 SIF-HHPRED: d.142.2.2 (A:1-314) Adenylation domain of NAD+-dependent DNA ligase {Thermus filiformis [TaxId: 276]},,,d1dgsa3,95.0495,99.6 SIF-Syn: DNA ligase protein as in phage Powerpuff. Gene upstream is NKF as in phage Powerpuff. Gene downstream is NKF as in phage Powerpuff. 
/note=Primary Annotator Name: Montoya, Cinthya /note=Auto-annotation start source: 32552 per Glimmer and GeneMark. /note=Coding Potential: There is coding potential present that spans over the putative ORF of this gene in both the self-trained and host-trained GeneMark reports. /note=SD (Final) Score: 2.538. This is the only entry available for this gene. However, a score of 2.538 is a reasonable score as it is greater than 2. /note=Gap/overlap: There is an overlap of 8 between the gene upstream and the putative gene. This overlap is reasonable given the length of this gene. This overlap is also conserved in other phages from the same cluster (Amyev, Crewmate, Powerpuff, etc.) /note=Phamerator: Pham: 9766 as of 1/24/22. There are 33 members in this pham, 7 being draft genomes. 25/33 members of this pham call this gene a DNA Ligase, thus, this gene appears to be highly conserved among members of the same pham and members of the same cluster, AZ. /note=Starterator: The most annotated start site is number 16 which is found in 14/33 (42.4%) genes in this pham. It has been manually annotated 9/26 times in non-draft genomes, thus being the most manually annotated start site. In Berrie’s genome, this is start site 6 @32552 which is consistent with Glimmer and Genemark’s auto-annotated start sites. /note=Location call: Based on the evidence presented, this is a real gene with the most reasonable start site being 32552. /note=Function call: DNA ligase. The top 5 hits on Phagesdb BLAST corresponding to phages Phives, Elezi, London, Warda, Lizalica, suggest that the function of this gene is DNA ligase with e-values ranging from 4e-49 to 3e-47. This information is consistent with NCBI BLAST which presents e-values ranging from 8.86e-62 to 9.14e-59, identity values ranging from 94% to 91%, and query coverages ranging from 100% to 92% for the top five hits. 
The top five hits on HHpred also call this gene a DNA ligase with probability values ranging from 99.6 to 99.5, e-values ranging from 2.5e-16 to 3.4e-15. Moreover, the top two hits on CDD call this gene a DNA ligase with e-values of 5.6e-12 and 2.06e-10. /note=Transmembrane domains: There is no predicted TMH by TOPCONS and TmHmm. Therefore, the function of this gene is not membrane protein. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: Everything looks good! Nice work. I agree with both location and function call. CDS 32854 - 33162 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Berrie_45" /note=Original Glimmer call @bp 32854 has strength 11.37; Genemark calls start at 32854 /note=SSC: 32854-33162 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_LONDON_41 [Arthrobacter phage London]],,NCBI, q1:s1 99.0196% 1.35957E-49 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.158, -2.5074698667202133, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_41 [Arthrobacter phage London]],,QOP64344,86.1386,1.35957E-49 SIF-HHPRED: SIF-Syn: Adumb2043 is in pham 16103 with no known function, just like phage Berrie. The upstream gene is in pham 9766 with a function of DNA ligase, like Berrie as well. The downstream gene of Berrie is in pham 97248 and the function is DNA binding protein. Phage Adumb2043 had a downstream gene of the same pham but a different function was called: RNA polymerase sigma factor. /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site @ 32854. /note=Coding Potential: Host-trained GeneMark shows reasonable coding potential between the suggested start site and the stop site. /note=SD (Final) Score: -2.507; the only other suggested start site had a final score of -8.418, which is not ideal compared to the final score of the start site @ 32854. 
Additionally, the Z-score at this start site is 3.158 while the other is 0.778, meaning that start site 32854 has both the best final score and the best Z-score. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene, suggesting that this gene is part of an operon. /note=Phamerator: Gene found in Pham 16103 as of 01/25/21. There are 86 members in this pham, and 27 non-draft members belong to Cluster AZ, the same cluster as Berrie. /note=Starterator: The suggested start site for this gene is the start site called most often in the pham, in 64 of 71 non-draft genes. Only 3 genes have the most annotated start site and do not call it. Overall, it is called 96.3% of the time when present; Adolin_44 (AZ) and Adumb2043_41 (AZ) are two genes from phages in the same cluster as Berrie that call this start site. /note=Location call: Gathered evidence suggests this is a real gene with a start site @ 32854. The start codon is ATG, which is the most common; the 4 bp overlap suggests an operon; coding potential is good; and this start has the best final score and Z-score. /note=Function call: Function unknown. PhagesDB had hits with no known function, and HHpred had no meaningful hits; some hits suggested a plant ATP synthase, which is not relevant to this gene. Finally, NCBI only had hits to hypothetical proteins, so overall this gene has no known function. /note=Transmembrane domains: No transmembrane domains. The absence of TMDs is not abnormal for a gene without a known function. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: I agree with your location call and function call. It would be better to add some evidence that supports the hits, such as E-value, probability percentage, identity percentage, etc. Don't forget to select the Starterator and GM Coding Capacity drop-down menus. Overall, good notes!
CDS 33402 - 34193 /gene="46" /product="gp46" /function="DNA binding protein" /locus tag="Berrie_46" /note=Original Glimmer call @bp 33402 has strength 16.78; Genemark calls start at 33402 /note=SSC: 33402-34193 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix domain-containing protein [Pseudarthrobacter siccitolerans] ],,NCBI, q8:s5 97.3384% 4.85811E-102 GAP: 239 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.923, -2.7863799983944713, yes F: DNA binding protein SIF-BLAST: ,,[helix-turn-helix domain-containing protein [Pseudarthrobacter siccitolerans] ],,WP_050053359,78.1609,4.85811E-102 SIF-HHPRED: RNA polymerase sigma factor SigA; DNA-dependent RNA polymerase, nucleotidyl transferase, transcription initiation complex, TRANSCRIPTION; HET: SO4, EDO; 2.76A {Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)},,,5VI8_F,92.7757,100.0 SIF-Syn: DrManhattan and Adolin (AZ) both have similar phams that are bordering gene 33402. /note=Primary Annotator Name: Verpukhovskiy, Philipp /note=Auto-annotation start source: GeneMark and Glimmer both call the start site at 33402. /note=Coding Potential: Coding potential in host-trained and self-trained GeneMark is only on the forward strand, meaning that the gene is transcribed in the forward direction. /note=SD (Final) Score: -2.786, which is the best score on PECAAN. /note=Gap/overlap: 239 bp. Fairly large gap, but conserved with other final phages on pham maps (Amyev and Adolin, both AZ). No coding potential in the gap. /note=Phamerator: Pham 97248. Date 1/24/22. Conserved, found in Amyev and Adolin (both AZ). /note=Starterator: Start site 22 is manually annotated in 21 of 33 phages, all of which are final phages. Start 22 is 33402 in Berrie. This agrees with Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene with a location call at 33402. /note=Function call: DNA binding protein. All except 1 of the 10 PhagesDB BLAST hits have the function called as DNA binding protein, with E-values < 1e-50.
HHpred has all of its hits classifying the function as sigma factor, which is a protein needed for transcription in bacteria. All the hits are 99.9 or 100% probability, with E-values < 1e-20, and coverage ranging from 90 to 97%. NCBI BLAST also has it called as either sigma factor or DNA binding protein, with 97+% coverage, 56-62% identity, 70-72% alignment, and E-values < 1e-99. CDD has relevant hits, with the first two having 85 and 91% coverage, low (20-30%) alignment and coverage, and E-values < 1e-4. /note=Transmembrane domains: No transmembrane domains are seen in TMHMM or TOPCONS. /note=Secondary Annotator Name: Wang, Yiyang (Jennifer) /note=Secondary Annotator QC: I agree with the start site and the function annotated by the primary annotator. Great job on putting in detailed information! I don't think you need to check all the boxes for function; just 2-3 pieces of evidence should be fine. For the synteny box, maybe state the function of the gene and add the information (pham number) for the upstream and downstream genes. CDS 34240 - 34536 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Berrie_47" /note=Original Glimmer call @bp 34240 has strength 15.28; Genemark calls start at 34240 /note=SSC: 34240-34536 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp45 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 1.43882E-46 GAP: 46 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.923, -2.7254235724530456, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp45 [Arthrobacter phage DrManhattan] ],,YP_009815388,84.6939,1.43882E-46 SIF-HHPRED: SIF-Syn: The upstream gene of this gene is DNA binding protein. The downstream gene of this gene is SprT-like protease. This gene is conserved in Adolin, with the upstream gene as DNA binding protein and the downstream gene as SprT-like protease. The function of the gene in synteny with this one is currently unknown.
/note=Need to fix format for synteny box and add start codon - Janelle /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 34240. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. In both self-trained and host-trained GeneMark, all of the coding potential lies between the start and stop sites. /note=SD (Final) Score: -2.725. It’s the best final score. The Z-score is 2.923, which is the highest. /note=Gap/overlap: The gap with the upstream gene is 46 bp, which is somewhat large, but not large enough for another gene. There is no extra coding potential between this gene and the upstream gene. The gap is conserved in some phages from cluster AU1 such as Breylor17. Although there is another start site that results in the longest ORF and a -8 bp gap, it’s not selected because its final score and Z-score are much lower than those of the current start site (the final score is -7.963, and the Z-score is 0.37). /note=Phamerator: Pham number is 11290. Recorded on 01/20/2022. This gene is in the same pham as 14 other genes, such as Adolin_46 (AZ) and BerrieBlu_45 (singleton). /note=Starterator: The most annotated start site is start 5, which is auto-called in Berrie at 34240. Start site 5 is found in 12 of 12 non-draft genes in this pham and is called 100% of the time when present. Start 5 also agrees with Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 34240. /note=Function call: Multiple PhagesDB BLAST and NCBI BLAST hits suggest unknown functions. CDD didn’t come back with hits. HHpred has multiple hits with unreasonably high e-values (>44). Thus, the function of this gene is currently unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein.
/note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with primary annotator. I would refer to the annotation manual for the correct way to format the synteny box. Example: "Portal protein, upstream gene is terminase, downstream is capsid maturation protease, just like in phage XXX". CDS 34666 - 35262 /gene="48" /product="gp48" /function="SprT-like protease" /locus tag="Berrie_48" /note=Original Glimmer call @bp 34666 has strength 17.47; Genemark calls start at 34666 /note=SSC: 34666-35262 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 4.64629E-138 GAP: 129 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.923, -2.707694805492614, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage Amyev] ],,YP_010677751,97.4747,4.64629E-138 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: ADP, MLZ, FLC; 1.5A {Homo sapiens},,,6MDW_A,51.0101,99.5 SIF-Syn: SprT-like protease, upstream gene in pham 11290, downstream gene in pham 11961, like Adumb2043. /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both GeneMark and Glimmer predict a start of 34666 with a start codon of ATG. /note=Coding Potential: Both host-trained and self-trained GeneMark show strong coding potential that’s contained within the predicted ORF, beginning at 34666bp. This is strong evidence that this is a real gene. /note=SD (Final) Score: The final score is -2.708 which is the largest of all candidates. This start site is also associated with a z-score of 2.923, the largest of all candidates and very strong support for this predicted start site. /note=Gap/overlap: The gap is 129 bp, which is slightly large, but this gap is conserved in other AZ phages like Adolin and Adumb2043 and this start site minimizes the gap size while maximizing ORF length (597bp). 
This ORF length is conserved in many members of pham 19450, which supports this gene length. /note=Phamerator: As of 1/23/2022 this gene was in pham 19450, which has 70 members with only 13 draft genomes. This pham contains 33 phages from cluster AZ, like Amyev and Adumb2043. Many of these genes have an assigned function of SprT-like protease or protease; 23 of these are from cluster AZ. /note=Starterator: As of 1/23/2022, start 38 (@34666bp in Berrie) was the most annotated start site. This site was called 94.7% of the time when present, and it was present in 54.3% of the genes in pham 19450. This is strong evidence that start site candidate 38 is the correct start site. /note=Location call: Given the strong coding potential in both GeneMark host and self and conservation amongst other AZ phages, this gene is a real gene with a start site of 34666bp. This start site was the most annotated for genes in pham 19450, being called 94.7% of the time when present. This start site also maximized the ORF and had strong z and final scores, while also minimizing the upstream gap. /note=Function call: SprT-like protease. Both NCBI and PhagesDB BLAST showed multiple significant hits to SprT-like protease genes with very low e-values and high percent coverages and identities (96%). HHpred showed significant hits to SprT-like proteins with strong values (99% probability, and >50% coverage). CDD also showed one significant hit: a SprT-like family with an e-value of 2.49e-5 and 48.48% coverage. /note=Transmembrane domains: Neither TOPCONS nor TmHmm detected transmembrane domains; therefore this protein does not associate with the membrane. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: I agree with the location call of start site 34666 and the function call of SprT-like protease.
CDS 35382 - 35540 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="Berrie_49" /note=Original Glimmer call @bp 35382 has strength 18.24; Genemark calls start at 35382 /note=SSC: 35382-35540 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE13_gp45 [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 2.61149E-25 GAP: 119 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.19, -2.0895162839948163, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp45 [Arthrobacter phage Elezi] ],,YP_010678023,100.0,2.61149E-25 SIF-HHPRED: SIF-Syn: Upstream gene 48 is SprT-like protease while downstream gene 50 is NKF. /note=state if start covers all coding potential. big overlap-- need to justify its ok- is there synteny?. need to state compared phages and functions in phamerator. add bp number and if most conserved is in berrie in starterator. add more detail in function call. need to check ncbi and blastp evidence. - janelle /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Both GeneMark and Glimmer agree on the same start site at 35382 F. Both call an ATG start codon. /note=Coding Potential: Coding potential was detected in 1 of 6 reading frames in host-trained GeneMark; self-trained GeneMark also showed strong coding potential in the putative ORF in 1 of 6 frames. /note=SD (Final) Score: The Z-score of 3.19 strongly suggests the gene is valid; the final score is -2.090. Start site 35382 has the best score of all suggested start sites. /note=Gap/overlap: Gap of 119 bp, a long non-coding region between this gene and the upstream gene. /note=Phamerator: Pham 11961 as of 1/21/2022. It is conserved in 22 other phages from various clusters. Start 10 is the most called start, in 17/17 non-draft phage genomes. /note=Starterator: Run on 1/21/2022; start number 10 is the most called start number in 22 phage genomes of the same pham. The cluster represented in this pham is AZ. This start is manually annotated in all 17 non-draft genes in this pham.
The evidence agrees with the site predicted by Glimmer and GeneMark. Pham Maps called this gene as a valid gene but function is unknown. /note=Location call: Based on the evidence, this is a valid gene and the appropriate start site should be 35382. /note=Function call : NKF as evident by phages DB function frequency, positive phages DB Blast hits, HHPred and NCBI Blast. They predicted coding potential and presence of conserved sequences. However, function is unknown /note=Transmembrane domains: TopCons did not find this gene as evidence of a transmembrane protein. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: I agree with the NKF call, however, you may want to add more details to your function call (ex: NCBI BLAST identity, alignment, coverage) and check a few boxes for evidence. CDS 35600 - 35944 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="Berrie_50" /note=Original Glimmer call @bp 35600 has strength 15.31; Genemark calls start at 35600 /note=SSC: 35600-35944 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ITER_48 [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 9.71875E-71 GAP: 59 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.719, -3.0713502170045657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_48 [Arthrobacter phage Iter] ],,URQ05036,95.614,9.71875E-71 SIF-HHPRED: SIF-Syn: Synteny is mostly maintained with other Phages in the same pham. Only phages Crewmate, Liebe, Maureen, Tweety19 and Yang in the Pham seem to not have a homolog to this gene. The surrounding gene length and structure is also maintained, including the gap between the previous gene and this one, and the also larger-than-average gap between this gene and the following. /note=is gene length acceptable? add functions called in phamerator and starterator. missing location call. fix synteny box format. 
- janelle /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Glimmer and GeneMark agree on start 35600. The start gives the longest ORF. ATG codon. /note=Coding Potential: The coding potential in Host-Trained GeneMark starts slightly before the auto-annotated start site. There is no start site before it. Self-Trained GeneMark shows the same. /note=SD (Final) Score: Final score is -3.071, the highest among possible start sites. Z-score for 35600 is 2.719, above the cutoff of 2 and the highest among all available options. /note=Gap/overlap: +59 bp. Longer than expected. The gap seems mostly conserved in other phages of the same cluster. /note=Phamerator: In Pham 77436 as of 1/24/22. This pham holds 37 members, 31 of which are non-drafts. A significant number of genes in the pham are in Berrie's cluster (AZ), and some are in cluster A. /note=Starterator: The auto-annotated start (35600) is start 14 in Starterator. It is the most common start in the pham (found in 17 of 37 genes), and is called 100% of the time when present. It is the only manually annotated start this gene offers, with 13 MAs. /note=Function call: PhagesDB BLAST calls no function in any of its top matches. The only matches that call functions have an e-value of at least 2.9. All NCBI BLAST results are hypothetical proteins. NCBI CDD has no matches. HHpred has no significant matches; the lowest HHpred e-value is 23. Protein function cannot be determined. /note=Transmembrane domains: TmHmm shows no TMDs. TOPCONS has no matches. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: I agree with all of the evidence presented that this gene is a real gene, its location call, and that it is of no known function.
CDS 36108 - 37529 /gene="51" /product="gp51" /function="serine integrase" /locus tag="Berrie_51" /note=Original Glimmer call @bp 36108 has strength 10.61; Genemark calls start at 36105 /note=SSC: 36108-37529 CP: yes SCS: both-gl ST: SS BLAST-Start: [serine integrase [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: 163 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.841, -2.8970853586582503, no F: serine integrase SIF-BLAST: ,,[serine integrase [Arthrobacter phage Adolin]],,QHB36635,92.161,0.0 SIF-HHPRED: INTEGRASE; HYDROLASE, SERINE RECOMBINASE, UNIDIRECTIONAL, SITE-SPECIFIC RECOMBINATION; 2.15A {STREPTOMYCES PHAGE PHIC31},,,4BQQ_A,64.2706,100.0 SIF-Syn: Serine integrase, upstream gene is of pham 77436, downstream gene is of pham 98233, just like in phage Adumb2043 /note=is gene length acceptable? - janelle /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and Genemark. Glimmer calls the start site at 36108 while Genemark calls the start site at 36105. Both calls are ATG codon which is a common start codon. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both Genemark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -2.897. This is the best final score on PECAAN and is the start site predicted by Glimmer. /note=Gap/overlap: 163 bp. This is quite a large gap, however the gap is conserved in other AZ cluster phages and the gap has no coding potential. /note=Phamerator: Pham 78437. Date 1/24/22. It is conserved and found in Adolin (AZ). /note=Starterator: Start site 95 in Starterator was manually annotated in 32/524 non-draft genes in this pham. Start 95 is 36108 in Berrie. This is not the most annotated start site on Starterator, however Berrie does not have the most annotated start site. This evidence agrees with the start site predicted by Glimmer. 
/note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 36108. /note=Function call: Serine integrase. The top 5 non-draft PhagesDB BLAST hits have the function of serine integrase (E-value = 0) and the top 5 NCBI BLAST hits also have the function of serine integrase (100% coverage, 86%+ identity, E-value = 0). According to CDD, it is part of the Ser_recombinase superfamily (E-value 6.8e-29), which includes integrase function. HHpred had a hit for integrase with 100% probability, 64% coverage, and E-value of 1.7e-38. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: I agree with the function and location calls. Great job! CDS 37779 - 38051 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Berrie_52" /note=Original Glimmer call @bp 37779 has strength 13.41; Genemark calls start at 37779 /note=SSC: 37779-38051 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_53 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 97.7778% 2.95851E-51 GAP: 249 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.02, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_53 [Arthrobacter phage Janeemi]],,UVK63573,95.5556,2.95851E-51 SIF-HHPRED: SIF-Syn: Synteny is present in the pham with phage DrSierra. Both the upstream and downstream genes were conserved. Upstream gene is in pham 55203 with NKF, and downstream is in pham 78437 with serine integrase function. /note=changed starterator box to NA bc only one site. add start codon. add if coding potential is good. is gene length acceptable? add more detail in phamerator (function and comparison phages).
add bp of start in starterator- janelle /note=Primary Annotator Name: BAtteikh, Maysaa /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site of 37779. /note=Coding Potential: The start site covers the entire coding potential for this gene in the forward direction in both Host-Trained and Self-Trained GeneMark; therefore, the gene is a forward gene. It is the LORF for this gene. /note=SD (Final) Score: -2.505. This is the only final score for this gene, with a z-score of 3.02; both are good scores. /note=Gap/overlap: There is a large gap of 249 bp, but it is conserved throughout the cluster. Phages in cluster AZ have similar gaps for this gene, indicating synteny of the gene with phages DrSierra and Adolin. /note=Phamerator: As of 1/24/2022, this gene belongs to pham 98233, which has 30 members, 22 of which are non-draft genes, and all belong to cluster AZ. /note=Starterator: The most annotated start for this pham is 5, which is called by 21 of the 22 non-draft phages, and called by this gene. /note=Location call: This is a real gene. Based on the evidence collected, the start site for this gene is 37779. /note=Function call: Function unknown. Multiple hits with very high e-values on PhagesDB BLAST and NCBI BLAST indicate no known function. HHPRED and CDD returned no informative hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: Check spelling errors. Mention start codon used and the coding potential in GeneMark Host AND GeneMark Self -- does the start site contain all of the coding potential? Mention if this start site is the LORF. Mention if genes on Phamerator have assigned functions. For Location Call, mention if you think this gene is real. Also make sure to check Topcons for evidence (it was not run). For synteny, it appears that you mixed up the upstream and downstream genes.
CDS 38048 - 38266 /gene="53" /product="gp53" /function="RNA binding protein" /locus tag="Berrie_53" /note=Original Glimmer call @bp 38048 has strength 13.56; Genemark calls start at 38048 /note=SSC: 38048-38266 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD82_gp55 [Arthrobacter phage Phives] ],,NCBI, q1:s1 100.0% 1.81304E-41 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.255, -2.0949393225970705, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein PQD82_gp55 [Arthrobacter phage Phives] ],,YP_010677689,97.2222,1.81304E-41 SIF-HHPRED: Host Factor for Q beta; Hfq, hexamer, RNA binding protein, translational regulator, Sm motif, TRANSLATION; HET: ACY; 1.55A {Staphylococcus aureus} SCOP: b.38.1.2,,,1KQ1_M,83.3333,96.2 SIF-Syn: /note=add start codon. is start 6 the most conserved in starterator?- janelle /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used and both agreed on the same start site. The called start site is 38048. /note=Coding Potential: The gene contains reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The SD score of -2.095 is the best option and the z-score is the highest at 3.255. /note=Gap/overlap: The 4 base pair overlap with the upstream gene is reasonable. The length of the gene (219 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of January 20, 2022, the gene is found in pham 55203. The gene is conserved in Phage Adolin, Adumb2043, Amyev, and Asa16, which all belong to the same cluster (AZ) as Phage Berrie. The phages used for comparison were Phage Adolin, Adumb2043, Amyev, and Asa16. There was no function call for this gene. /note=Starterator: Start site 6 was manually annotated in 13/23 non-draft genes in this pham. Start 6 is 38048 in Berrie. This evidence agrees with the site predicted by Glimmer and GeneMark.
/note=Location call: The gathered evidence suggests that the original start site call at 38048 by Glimmer and Genemark is reasonable and is most likely the start site. In addition, it also suggests that the gene is a real gene. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: Add the start codon in the auto-annotation; other than that: I agree with the primary annotator. CDS 38263 - 38541 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Berrie_54" /note=Original Glimmer call @bp 38263 has strength 17.26; Genemark calls start at 38263 /note=SSC: 38263-38541 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD82_gp56 [Arthrobacter phage Phives] ],,NCBI, q4:s3 96.7391% 1.12033E-27 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.431, -3.8116689268127657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD82_gp56 [Arthrobacter phage Phives] ],,YP_010677690,75.0,1.12033E-27 SIF-HHPRED: SIF-Syn: NKF, upstream is pham 55203, downstream is pham 96259, just like in phage Lego. /note=is length good? add functions in synteny box. - janelle /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer and GeneMark agree on the start site of 38,263 (GTG) /note=Coding Potential: Good coding potential in the 1st frame, start 38,263 encompasses all the coding potential. /note=SD (Final) Score: Has final score of -3.812 and z-score of 2.431 which is the best of the available options. /note=Gap/overlap: -4 bp gap, favorable, likely indicative of an operon. Additionally, the -4bp gap is much more reasonable than the 304bp gap the LORF has. /note=Phamerator: Pham 30015 as of 1/20/22. Conserved in many other phages belonging to AZ, like phage Elezi. No function called. /note=Starterator: Start 3 is the most annotated start site (17/18 non-draft).
The start is found in Berrie and corresponds to start 38,263. /note=Location call: Likely a real gene starting at 38,263, as evidenced by the above information. /note=Function call: Phagesdb BLAST and NCBI BLAST both have hits for function unknown with phage Phives (65.2% identity, 75% aligned, 96.7% coverage, e-value of 8.2e-28). No significant hits on HHPRED nor CDD. Thus, going with NKF. /note=Transmembrane domains: No hits on TmHmm nor Topcons for transmembrane protein. /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: I would comment on why it being indicative of an operon may override the importance of selecting the LORF. I would also make sure to mention if a function was called in the Phamerator section. You could select up to one more phage for NCBI Blast as evidence of NKF because there are other phages that meet the threshold, but I agree with your call. Everything else looks good! Just make sure to update synteny later if the upstream and downstream genes end up having functions :) CDS 38569 - 38805 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Berrie_55" /note=Original Glimmer call @bp 38569 has strength 14.05; Genemark calls start at 38569 /note=SSC: 38569-38805 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_LONDON_54 [Arthrobacter phage London] ],,NCBI, q1:s1 97.4359% 1.62736E-22 GAP: 27 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.923, -2.7863799983944713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_54 [Arthrobacter phage London] ],,QOP64357,40.0,1.62736E-22 SIF-HHPRED: SIF-Syn: Berrie has synteny with Elezi. The downstream gene is in pham 30015 and the upstream gene is in pham 97798. /note=changed starterator box to NA bc only one site. add start codon. is length acceptable? add comparison genes and functions in phamerator. - janelle /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start at 38569. /note=Coding Potential: Coding Potential was found on the forward strand in both Host- and Self- Trained GeneMark, indicating that this is a forward gene. The auto-annotated start site and stop site perfectly correlates to the coding potential. /note=SD (Final) Score: -2.786. This is the best final score, as there is only one potential start site given in PECAAN. /note=Gap/overlap: 27 bp gap. This is the best gap, as there is only one potential start site given in PECAAN. /note=Phamerator: As of 1/23/22, this gene is in pham 96259, which has 34 members, all of which are from cluster AZ. /note=Starterator: The start number called the most often in the published annotations is 5, it was called in 16 of the 26 non-draft genes in the pham, however Berrie does not possess this start site. Start Site 11 is called for this gene, and correlates to 38569, the start site called by both Glimmer and GeneMark. This start site is found in 8 of 34 (23.5%) of genes in pham and has 6 manual annotations. It is called 100.0% of the time when present. /note=Location call: Based on the above evidence, this is a real gene with start site at 38569. This start site was called by both Glimmer and GeneMark, and is the only potential start site given in PECAAN, and was validated by Starterator. /note=Function call: NKF. Phives, Asa16, Elezi, Eraser, London, and Niobe were checked under PhagesDB as evidence due to their low e-values and all have no known function. Hypothetical proteins from these phages were checked under NCBI BLASTp as evidence due to their high coverage and low e-values. No hits were found in CDD or HHpred. /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: I agree with this call. For the TMHMM/TOPCONS, perhaps explain the meaning behind no readouts -- such as what the function of this gene can/cannot be. Otherwise good job. 
CDS 38817 - 38999 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Berrie_56" /note=Original Glimmer call @bp 38817 has strength 16.89; Genemark calls start at 38817 /note=SSC: 38817-38999 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_56 [Arthrobacter phage London] ],,NCBI, q1:s2 100.0% 1.82673E-30 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.266, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_56 [Arthrobacter phage London] ],,QOP64359,96.7213,1.82673E-30 SIF-HHPRED: SIF-Syn: Berrie shares synteny with Adumb2043. The upstream gene is from pham 96259 and the downstream gene is from pham 12611. /note=Primary Annotator Name: Santos, Charysa. /note=Auto-annotation: Glimmer and GeneMark. Both start at 38817. It has a Glimmer score of 16.89. /note=Coding Potential: Coding potential in this ORF is found in the complementary sequence, which suggests that it might be a reverse gene. Coding potential is found in GeneMark Host and Self. /note=SD (Final) Score: -1.993. It is the only final score on PECAAN for this gene. This start site (@38817) minimizes the gap size (11). /note=Gap/overlap: 11 bp gap. This is a very small gap, meaning there is minimal space for another gene in between the previous gene and this one. /note=Phamerator: As of 01/21/22, this gene is found in Pham 97798 and is conserved in other members of cluster AZ, including Cassia_55 and Crewmate_61. /note=Starterator: The “Most Annotated” start site (12) is present in 13 of 25 non-draft genes in this pham, and is present in Berrie. This start site corresponds to base pair position 38817, the auto-annotated start site. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 38817.
/note=Function call: Phagesdb BLAST presented hits with no known function, and HHPRED presented hits with very low probability (<24.8), low coverage (<18.3333), and very high e-values (>120). The top hit for NCBI BLAST was a hypothetical protein, with high % identity (98.3333), high alignment (100), high coverage (100), and a low e-value (2.01923e-32). Additionally, there were no hits for CDD. Thus, there is not enough evidence to call the function of this gene (NKF). /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: Everything looks awesome! Notes are super comprehensive and thorough. In the synteny box, maybe just include the functions of the upstream and downstream genes (that both are NKF in Adumb2043 too?) CDS 38999 - 39214 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Berrie_57" /note=Original Glimmer call @bp 38999 has strength 13.48; Genemark calls start at 38999 /note=SSC: 38999-39214 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ASCELA_59 [Arthrobacter phage Ascela]],,NCBI, q1:s1 100.0% 4.70858E-31 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.847, -3.90505122954242, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ASCELA_59 [Arthrobacter phage Ascela]],,WGH21582,88.7324,4.70858E-31 SIF-HHPRED: SIF-Syn: No known function, upstream gene is no known function, downstream gene is endolysin, which is similar to phage Lizalica. Phage Lizalica has an extra gene upstream between the corresponding genes for Berrie. /note=is the length acceptable? - janelle /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 38999. The start codon for this site is ATG. 
/note=Coding Potential: There is reasonable coding potential covered by both the Host-Trained GeneMark and the Self-Trained GeneMark. This coding potential is covered by the start site. /note=SD (Final) Score: The Final Score for the start site is -3.905, which is the best Final Score of the candidates. The Z-score is 2.847, which is the highest among the candidates. /note=Gap/overlap: There is an overlap of 1 bp between the start of this gene and the end of the upstream gene. This may indicate that this gene is part of an operon. /note=Phamerator: As of January 24, 2022, this gene is found in pham 12611. This pham is found in other phages in the AZ cluster, such as phage Phives and phage Amyev. There was no function called for this gene in Phamerator. /note=Starterator: A start site is conserved among members of this pham. The most-conserved start site is number 5 and it corresponds to position 38999, which is the auto-annotated start site in the phage. 10 of 14 non-draft genes call this start. /note=Location call: This is a real gene with a start site of 38999. Both Glimmer and GeneMark call the same start site. This start also has the best final score, best z-score, and the largest open reading frame. The start site covers all of the coding potential predicted by both the Host-Trained and the Self-Trained GeneMark. The start site is also supported by Starterator. /note=Function call: Function unknown (NKF). The top hit in NCBI BLAST has an 81.6901% identity, 87.3239% alignment, 95.7747% coverage, and an e-value of 3.15569e-30. This hit does not have a function called. The top hit in PhagesDB BLAST does not have a known function and it has an e-value of 7e-26. HHpred was not informative because the top hit had a large e-value. CDD did not have any hits. There was no function called for other genes in this pham, according to Phamerator. /note=Transmembrane domains: There are no transmembrane domains predicted by TOPCONS or TmHmm. This is not a transmembrane protein. 
/note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: Looks good! Just remember the synteny!
CDS 39288 - 40256 /gene="58" /product="gp58" /function="endolysin" /locus tag="Berrie_58" /note=Original Glimmer call @bp 39288 has strength 10.94; Genemark calls start at 39288 /note=SSC: 39288-40256 CP: yes SCS: both ST: NA BLAST-Start: [endolysin [Arthrobacter phage Lego]],,NCBI, q1:s1 99.3789% 3.09128E-174 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.255, -2.0162541296952132, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage Lego]],,QIN94457,85.4545,3.09128E-174 SIF-HHPRED: b.84.3.2 (A:) Peptidoglycan hydrolase LytM {Staphylococcus aureus [TaxId: 1280]},,,d1qwya_,40.9938,99.3 SIF-Syn: Gene architecture conserved in phage Kaylissa. As of 1/24/22, pham 12611 upstream (NKF), 56784 downstream (NKF). /note=starterator start is not most called but is still called 14/73 times-- would starterator menu be SS? - janelle /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: GeneMark and Glimmer call start site @ 39288, codon ATG. /note=Coding Potential: There is coding potential present in the third ORF (Host and Self-Trained GeneMark), and the selected start codon does include all the coding potential. There is also some coding potential in the fifth ORF, but it is too short to be a gene, and thus should be ignored. /note=SD (Final) Score: This Start Site does have the best Final score (-2.016) as well as the best Z-score (3.255) and the longest ORF, which supports the idea that this is the right start site. /note=Gap/overlap: There is a gap of 73 bp, but there is no coding potential present in the gap, so I do not think there is a gene missing. This gap appears to be conserved in fellow AZ phage Adumb2043. This gene is especially large at 969 bp. /note=Phamerator: Pham 98094 on 1/24/22. Present in several different clusters (AZ, AN, FF, AO, etc.). 
Most functions are variations on endolysin. /note=Starterator: Most annotated start for this pham (28) is not present in this phage. 37/73 non-draft phages call site 28. Start site called in Berrie is 21, which corresponds to bp 39288. Start site 21 has been manually annotated 14 times. /note=Location call: Based on the good coding potential and the number of hits in Phagesdb Blast, this is a real gene with start site 39288. /note=Function call: HHPred has top two hits for analogous functions >90% probability, >40% coverage. NCBI Blast has three hits for endolysin >86% ID, >99% coverage. Calling endolysin. /note=Transmembrane domains: No transmembrane domains present according to TmHmm. /note=Secondary Annotator Name: Montoya, Cinthya /note=Secondary Annotator QC: I would elaborate on the SD score (i.e., what makes it the best vs others) and on the Gap (is it conserved in other phages?). Everything else looks great!
CDS 40385 - 40726 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Berrie_59" /note=Original Glimmer call @bp 40385 has strength 13.59; Genemark calls start at 40385 /note=SSC: 40385-40726 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_59 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 5.07427E-73 GAP: 128 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.02, -2.5052746077145835, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_59 [Arthrobacter phage Janeemi]],,UVK63579,98.2301,5.07427E-73 SIF-HHPRED: SIF-Syn: NKF, the gene is of pham 56784. Downstream is a gene of pham 17795, just like in phage Kaylissa. Upstream is a gene of pham 98094, an endolysin. There are no other phages with the endolysin immediately adjacent to the gene, however in some phages such as Kaylissa and Lego there is a single gene between them. /note=add start codon. is length acceptable? add comparison gene in phamerator. add more detail to starterator. 
- janelle /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: 40,385 is selected as the start site by both Glimmer and GeneMark. /note=Coding Potential: Most of the coding potential is within the putative ORF, with a small amount of coding potential being seen before the start site. GeneMark Self shows less coding potential outside of the start site than GeneMark Host. /note=SD (Final) Score: -2.505. This is not the best final score, but the best final score (-2.236) only spans 168 bp (compared with 342 bp from start site 40,385). /note=Gap/overlap: 128 bp, an acceptable gap. /note=Phamerator: As of 1/23/2022, this gene is in pham 56784. There are 11 other phages within cluster AZ that have this gene, 9 of which are non-draft. There is no function currently called for this pham. /note=Starterator: Start site 4 is called 100% of the time when present (in 4 non-draft genes). This corresponds to 40,385 in Berrie. Start site 7 is called in most of the non-draft genes of this pham, but is not called when start site 4 is present. /note=Location call: Based upon the evidence above, this is a real gene with the start site of 40,385. /note=Function call: No known function. NCBI BLAST returned many hits with low e-values. For example, from the NCBI BLASTp results, there is a 93.8% identity match with 100% coverage to a hypothetical protein gene from phage Phives, with the returned e-value 1.98e-64. HHPRED and CDD did not return any informative hits. /note=Transmembrane domains: There are no predicted TMDs in TOPCONS or TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: I agree with the overall analysis, everything looks good! 
CDS 40816 - 40995 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Berrie_60" /note=Original Glimmer call @bp 40816 has strength 17.88; Genemark calls start at 40816 /note=SSC: 40816-40995 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ITER_61 [Arthrobacter phage Iter] ],,NCBI, q1:s1 98.3051% 9.47341E-29 GAP: 89 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.02, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_61 [Arthrobacter phage Iter] ],,URQ05049,93.2203,9.47341E-29 SIF-HHPRED: SIF-Syn: NKF (from pham 17795), upstream gene is from pham 56784, downstream is from pham 18955, just like in phage Kaylissa /note=add gene length is acceptable. add more info in starterator and function call. - janelle /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 40816, start codon ATG. /note=Coding Potential: There is reasonable coding potential, and the chosen start site covers all of it. /note=SD (Final) Score: -2.505, which is the best final score. /note=Gap/overlap: 89 bp gap, which is the shortest gap. The other start site has a 200-bp gap. /note=Phamerator: Pham 17795 as of 1/24/2022. The gene is conserved in other phages in cluster AZ; Kaylissa is used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 6, and it corresponds to 40816 in Berrie. 9/9 of final genes called site 6. /note=Location call: Real gene, start at 40816. /note=Function call: NKF. No BLAST returned hits with known function, but the gene matched some proteins with no known function in phages of cluster AZ. /note=Transmembrane domains: No hit for transmembrane domains. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Agree with evidence, location, and function calls. good job. 
CDS 41120 - 41446 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="Berrie_61" /note=Original Glimmer call @bp 41120 has strength 17.92; Genemark calls start at 41120 /note=SSC: 41120-41446 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_60 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 1.35832E-57 GAP: 124 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.923, -2.996490344739583, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_60 [Arthrobacter phage Janeemi]],,UVK63580,91.7431,1.35832E-57 SIF-HHPRED: SIF-Syn: /note=missing synteny box - janelle /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer and GeneMark agree on the start site and start codon: 41120, ATG. /note=Coding Potential: There is coding potential within the putative ORF. The start site covers this coding potential. /note=SD (Final) Score: -2.996 is for the start site 41120, which is the best possible score. /note=Gap/overlap: 124 bp gap for start site 41120 is relatively large, but this is the smallest gap of all start-site candidates because 41120 is the LORF. Gap has no coding potential. This gene and the gap are syntenic with Adumb2043, a non-draft phage. /note=Phamerator: Gene is found in Pham 18955 as of 01/24/2022. This pham is in nearly all members of cluster AZ (conserved). No function given. /note=Starterator: Most annotated start is 14 (19/23 call it). This start site is found in Berrie as 41120. Hence, the auto-annotated start site is highly conserved among other non-draft phages. /note=Location call: Above evidence suggests this is a real gene and starts at 41120. /note=Function call: Function unknown, NKF. Phagesdb and NCBI Blast have significant hits for hypothetical/unknown proteins in phages Asa16, Elezi, and Phives (score 184, E-value < 10^-30, coverage >95%). No significant hits in HHPred or CDD. /note=Transmembrane domains: Neither TmHmm nor Topcons predicts any TMHs. 
No evidence to suggest this gene product is associated with the membrane. /note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: Looks good!
CDS 41517 - 41828 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="Berrie_62" /note=Original Glimmer call @bp 41517 has strength 12.96; Genemark calls start at 41517 /note=SSC: 41517-41828 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_62 [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 3.37215E-66 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.02, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_62 [Arthrobacter phage London] ],,QOP64365,99.0291,3.37215E-66 SIF-HHPRED: SIF-Syn: Function: NKF, upstream gene is NKF, downstream gene is NKF. This is conserved in phage Asa16 (AZ), Eraser (AZ). /note=is length acceptable? add more details to phamerator. is starterator box correct if site only called 10% of the time? - janelle /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer both agree on the same start site, which is 41517. The start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. There is coding potential predicted by Host-trained GeneMark and Self-trained GeneMark. The chosen start site includes all of the coding potential in both Host-trained GeneMark and Self-trained GeneMark. /note=SD (Final) Score: The Final Score is the best option of -2.505. The Z-score is 3.02, which is significant. /note=Gap/overlap: There is a 70 bp gap. Somewhat large, but ultimately reasonable because the gap is conserved in other phages of the same pham and there is no other evidence that there might be a new gene in the gap. /note=Phamerator: Pham 80686. Date: 1/21/2022. Pham number 80686 has 360 members, 31 are drafts. /note=Starterator: Start 37 was not called the most often within the pham. 
However, this site was called most often within the cluster AZ. Start 37 agrees with the site predicted by Glimmer and GeneMark. Start 37 also has 26 MA’s and was called 100% of the time it was present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 41517. /note=Function call: NKF. Top 3 Phagesdb BLAST hits suggest that this is a hypothetical protein, which is equivalent to unknown function (E-value < 3e-55, which is significant). The top 3 NCBI BLAST hits all support that there is a high possibility that this gene codes for a recombination directionality factor (% identity > 94.1%; % coverage = 100%; E-value < 4.2e-64). There is no significant HHpred hit. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: Maybe add that start 37 was called 100% of the time it was present! Also, I think all the coding potential is contained in the ORF. Great job!
CDS 41818 - 42174 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="Berrie_63" /note=Original Glimmer call @bp 41818 has strength 14.2; Genemark calls start at 41818 /note=SSC: 41818-42174 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE11_gp64 [Arthrobacter phage Warda] ],,NCBI, q1:s1 100.0% 1.15229E-46 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.02, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE11_gp64 [Arthrobacter phage Warda] ],,YP_010677904,77.1186,1.15229E-46 SIF-HHPRED: SIF-Syn: NKF, upstream gene is not called yet (80686), downstream is also not called yet (96132), just like in phage Adumb2043. /note=is length acceptable? wouldn't call starterator SS since site only called 2/133 non draft genes. 
- janelle /note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Glimmer and Genemark both agree on a start site of 41818. The start codon GTG is called. /note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: final score=-2.505 for start site 41818. It is the best final score on PECAAN with the smallest gap/overlap. /note=Gap/overlap: gap=-11 (overlap=11). Most reasonable overlap given out of the potential start sites. /note=Phamerator: Pham: 55605. Date 01/23/22. It is conserved in 132 other non-draft phages from various clusters, including many phages from cluster AZ such as Adolin (AZ), Adumb2043 (AZ), and Warda (AZ). There is no function called for the gene. /note=Starterator: Start site 16 in Starterator was manually annotated in 2/133 non-draft genes in this pham. Start 16 is 41818 in Berrie. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 41818 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Unknown function for the gene. All of the Phagesdb BLAST top hits state “function unknown”. All NCBI BLAST top hits state “function unknown” for the gene as well. NKF, there is no hit for CDD and no good hit for HHpred since all e-values are extremely high. /note=Transmembrane domains: No TMDs called and no evidence suggesting TMD function within the other databases; neither TMHMM nor TOPCONS calls any. /note=Secondary Annotator Name: Bryan Teoh /note=Secondary Annotator QC: Agree with annotation call, all notes are accurate and detailed. 
CDS 42174 - 42350 /gene="64" /product="gp64" /function="membrane protein" /locus tag="Berrie_64" /note=Original Glimmer call @bp 42174 has strength 12.86; Genemark calls start at 42174 /note=SSC: 42174-42350 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 94.8276% 3.26224E-14 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.548, -5.509286325636915, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Powerpuff] ],,QGZ17364,75.8621,3.26224E-14 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is in pham 55605, downstream gene is in pham 17297, just like in phage Elezi. /note=add site covers all coding potential. overlap could be operon. checked topcons as evidence- janelle /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and Genemark both agree on a start site of 42174. The start codon ATG was called. /note=Coding Potential: There is strong coding potential within the ORF with a start site of 42174 for both the host and self-trained Genemark coding maps. /note=SD (Final) Score: -5.509 for the start site 42174. This negative SD score (<-2) is highly indicative that this start site is the correct start site. Although the start site at 42189 has a Z value <2, this start site includes less of the coding potential in the host and self-trained Genemark maps thus it is likely not the appropriate start site. /note=Gap/overlap: There is a 1 bp overlap with the upstream gene, which stops at position 42174, the same position at which this gene starts. This lack of a gap is a good indication that there should be no additional gene inserted upstream of this gene. /note=Phamerator: Information collected on 1/24/2022. The gene is found in pham 96132. All of the other phages that also had genes within this pham were within cluster AZ, the same cluster as Berrie. No functions are listed on phamerator. /note=Starterator: Information collected on 1/21/2022. 
Start site 3 was the most annotated start site for the genes that are in this pham, called for 16/21 genes. For this particular gene, start number 3 corresponds to position 42174. This is the same start site that was agreed upon by Glimmer and Genemark. /note=Location call: This gene seems like a real gene because start site 42174 covers all coding potential within the ORF, and because Glimmer and Genemark agree on this start site. /note=Function call: Relevant PhagesDB BLAST hits match to genes with no known function. Analogous genes from phages within the same cluster as Berrie (AZ) are included within these hits, and all have no known function. NCBI BLAST hits also match to proteins that are hypothetical with no listed functions. HHpred hits do match to genes with functions, but e values are very large (>10) meaning that these hits are not probable. However, the most specific evidence garnered from all possible sources is from TmHmm and SOSUI, which point to the function of this gene being a membrane protein because each calls one transmembrane domain. /note=Transmembrane domains: TmHmm predicts one transmembrane domain. TOPCONS was not loading in PECAAN; used SOSUI instead. SOSUI also calls one transmembrane domain. Since SOSUI and TMHMM both call a transmembrane domain for this gene, I can conclude that this protein is a membrane protein. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: Can we call a protein function with no matches as long as it shows TMDs? Also, I changed the ">" sign to "<" for your Z-Score notes. 
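The GAP values and gene lengths quoted in these records can be checked directly from the CDS coordinates. The following is a minimal sketch, assuming forward-strand genes with inclusive 1-based coordinates; the function names are hypothetical and are not part of PECAAN or DNA Master. Under that assumption a negative gap is an overlap, and the arithmetic reproduces the GAP fields listed for gp64 through gp66.

```python
# Hypothetical helpers (not a PECAAN API): gap and length arithmetic for
# forward-strand genes, assuming inclusive 1-based coordinates.

def gap_bp(upstream_stop, start):
    """Bases strictly between the upstream stop and this start.

    A negative value means the two genes overlap by that many bases.
    """
    return start - upstream_stop - 1


def gene_length(start, stop):
    """Length in bp of a gene spanning start..stop inclusive."""
    return stop - start + 1


# (name, start, stop) copied from the CDS lines of these records.
GENES = [
    ("gp63", 41818, 42174),
    ("gp64", 42174, 42350),
    ("gp65", 42340, 42504),
    ("gp66", 42497, 42613),
]


def report(genes):
    """Return (name, gap to upstream gene, length) for each gene after the first."""
    rows = []
    for (_, _, prev_stop), (name, start, stop) in zip(genes, genes[1:]):
        rows.append((name, gap_bp(prev_stop, start), gene_length(start, stop)))
    return rows


if __name__ == "__main__":
    for name, gap, length in report(GENES):
        kind = "gap" if gap >= 0 else "overlap"
        print(f"{name}: {abs(gap)} bp {kind}, length {length} bp")
```

With these coordinates, a start that coincides with the upstream stop (as for gp64) comes out as a gap of -1, that is, a 1 bp overlap.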
CDS 42340 - 42504 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="Berrie_65" /note=Original Glimmer call @bp 42340 has strength 9.53 /note=SSC: 42340-42504 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein HOU48_gp69 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 96.2963% 5.32796E-15 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.02, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp69 [Arthrobacter phage DrManhattan] ],,YP_009815412,73.5849,5.32796E-15 SIF-HHPRED: SIF-Syn: This gene is in pham 17297, the upstream gene is in pham 96132, and the downstream gene is in pham 14469, just like in phage Eraser. /note=add gene length is acceptable, comment on gap. add comparison gene in phamerator. checked off good ncbi blast hits for NKF - janelle /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: Glimmer called the gene with a start site of 42340. GeneMark did not call the gene. The start codon is GTG. /note=Coding Potential: The gene has good coding potential in the forward direction and the start site includes all of the coding potential. /note=SD (Final) Score: The final score is -2.505 which is the best final score on PECAAN. /note=Gap/overlap: This gene has an 11 bp overlap. /note=Phamerator: Pham 17297 as of 1/22/2022. This pham is in 31 phages, all of which are in cluster AZ. No function is called. /note=Starterator: The most conserved start site is 9 and it is called in 22 of the 23 non-draft phages. Berrie does have start site 9 which corresponds to position 42340. /note=Location call: This is likely a real gene with start site 42340. Although this gene was not called by GeneMark and has an 11 bp overlap, the gene has a good final score, some good hits in Phagesdb BLAST (e-value < 2x10^-14), and synteny with other phages, which have similar overlaps. 
/note=Function call: BLASTp, CDD, and HHpred are all uninformative or lacking hits, therefore, this gene has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator.
CDS 42497 - 42613 /gene="66" /product="gp66" /function="membrane protein" /locus tag="Berrie_66" /note=Original Glimmer call @bp 42497 has strength 11.38; Genemark calls start at 42497 /note=SSC: 42497-42613 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp65 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 3.26096E-11 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.311, -4.447172733638442, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein HOU52_gp65 [Arthrobacter phage Yang] ],,YP_009815683,86.8421,3.26096E-11 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is NKF, downstream gene appears to be HNH endonuclease, just like in final phage Niobe. /note=add start codon. checked off good NCBI hits- janelle /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site of 42497. /note=Coding Potential: This gene has good coding potential within the putative ORF, and covers all this coding potential. /note=SD (Final) Score: There are only two suggested starts for this gene. The RBS Final Score of -4.447 for the original start at 42497 is the highest of the suggested starts, and is therefore reasonable. The Z-score for this start is also higher at 2.311 and therefore the better score. /note=Gap/overlap: This gene has an overlap of 8 bp with its upstream gene because its start is at 42497, whereas the stop of the upstream gene is 42504. This gene however, is only 117 bp in length instead of the suggested minimum of 120 bp. 
This could indicate that the gene is not real, but 120 is simply a guideline and thus does not appear to have weight in this case. /note=Phamerator: This gene was found in pham 14469, which has 28 members, 8 of which are drafts. Additionally, every member phage in this pham belongs to cluster AZ. /note=Starterator: Using information from the Starterator analysis run most recently on 1/21/22, it was found that the most conserved start site number is 6, which was called in 17 of the 20 non-draft genomes. The auto-annotated start is called at start number 6 (42497), which matches the most conserved start. Phage Berrie’s track marks start site 6 with a yellow line, which denotes it as an auto-annotated start. This start in Berrie’s track corresponds to that of other phages in the cluster, such as phages Yang and Tbone. Start site 6 has been determined to be the Final Human Annotated start, as represented by a green line on the track representing these phages. The analogous start site between Berrie and other phages in this cluster is therefore promising, indicating that the auto-annotated start site 6 at 42497 is indeed correct. /note=Location call: The evidence gathered indicates that the suggested start site of 42497 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: No known function. Every single hit on PhagesDB BLASTp lists genes of unknown function. NCBI BLASTp has a few top hits with the known function of membrane protein. These hits have high percent identities (76% and above), but there are other NCBI BLASTp hits that are listed as hypothetical proteins or unknown functions. The top hits on HHPRED have mixed gene functions, a few of which have cytochrome complex as the listed function. Given the above data, as well as the following transmembrane domain information, it appears that the function of this gene is membrane protein. /note=Transmembrane domains: TmHmm predicts 1 TMH. A few hits from TopCons. 
The protein is most likely a transmembrane protein. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: I agree with your location call and function call, really detailed notes! CDS 42601 - 42954 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="Berrie_67" /note=Original Glimmer call @bp 42598 has strength 4.68; Genemark calls start at 42652 /note=SSC: 42601-42954 CP: no SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_ITER_68 [Arthrobacter phage Iter]],,NCBI, q6:s4 88.8889% 1.07681E-45 GAP: -13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.453, -4.726606065093262, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_68 [Arthrobacter phage Iter]],,URQ05056,77.6786,1.07681E-45 SIF-HHPRED: SIF-Syn: /note=AF: function reverted to NKF. not enough evidence to support HNH /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Glimmer called the start at 42598 and GeneMark called it at 42652. GTG. /note=Coding Potential: The ORF starting at 42652 has good coding potential. The start 42598 called by Glimmer covers all coding potential while the start 42652 called by GeneMark does not include all coding potential. /note=SD (Final) Score: Start 42598 has a final score of -3.976 with a z-score of 2.453. Start 42652 has a final score of -6.218 with a z-score of 1.275. Start 42598 has the better score out of the two and appears to be the best option. There is an option with the best final score/z-score but the gap upstream is too large. /note=Gap/overlap: There is a 16bp overlap upstream with the start 42598. The 42652 start has a 38bp gap with coding potential within the gap. /note=Phamerator: This gene belongs to the pham 16105 as of 01/22/2022. The pham has 34 members, out of which 25 members are non-draft phages of cluster AZ. /note=Starterator: Starterator calls start number 10 at 42598. This start number has 2 manual annotations. 
This is not the most annotated start number 13 but Berrie does not have the most annotated start. /note=Location call: Based on the evidence above, this is a real gene with the start 42598. /note=Function call: HNH endonuclease. Top phagesdb (e-values <1e-37) and NCBI (e-values <8e-43, identity >62%, 90%+ coverage) BLASTp hits return with HNH endonuclease as the function. There are no reliable HHpred hits. No CDD hits are found. /note=Transmembrane domains: This is not a membrane protein. No TMDs are predicted by TMHMMs or TOPCONS. /note=Secondary Annotator Name: Wang, Jennifer Yiyang /note=Secondary Annotator QC: I agree with the start site 42598 and the function annotated by primary annotator. Great job on putting in detailed explanations! CDS 43237 - 43434 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="Berrie_68" /note=Original Glimmer call @bp 43237 has strength 9.42; Genemark calls start at 43237 /note=SSC: 43237-43434 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_68 [Arthrobacter phage London] ],,NCBI, q1:s9 87.6923% 9.323E-26 GAP: 282 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.255, -2.606079664606164, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_68 [Arthrobacter phage London] ],,QOP64371,75.0,9.323E-26 SIF-HHPRED: SIF-Syn: Unknown function. In comparing Berrie to several other AZ cluster phage, there is no presence of an upstream gene to suggest that there should be a gene added within the large gap present. The downstream gene was also conserved, although it did not seem to be highly related. Phives, Eraser, Adolin, and Crewmate were all used in this comparison. /note=add if covers all coding potential and gene length. - janelle /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Both Glimmer and GeneMark call start site 43237. The Glimmer score is 9.42. The start codon is ATG. 
/note=Coding Potential: There is coding potential for both start sites and the stop site in host-trained and self-trained GeneMark. /note=SD (Final) Score: -2.606. This is the best final score called. /note=Gap/overlap: The gap is 282 bp, which is very large. However, there is no viable coding potential within that gap. The starts with shorter gaps do not include all of the coding potential and are therefore not appropriate start sites. /note=Phamerator: The pham number of this gene is 17702. The date is 1/19/22. There are 26 members, which are all in cluster AZ. /note=Starterator: Berrie calls the most annotated start site. The start site is number 20 and is called 100% of the time it is present. This start site in Berrie is 43237, which agrees with Glimmer and GeneMark. /note=Location call: This is most likely a real gene. Based on coding potential, GeneMark, Glimmer, and Starterator, 43237 is most likely the start site. /note=Function call: Phages Function Frequency was not informative. NCBI BLASTp and PhagesDB BLASTp call function unknown for several phages with good e-values. CDD and HHPRED were also not informative. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within TmHmm and TOPCONS. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with the location call and functional call. I think you may have made a mistake writing "both start sites" in the coding potential section. For Starterator, mention how many non-draft genomes called the same start number out of the total non-draft genomes in the pham. For the function call, I would mention what the good e-values are and maybe some additional supporting values, such as % identity, for more evidence. See the Annotation Lab Manual Sample PECAAN Notes for reference. I would check off the top two hits from PhagesDB BLAST and NCBI BLAST for more support. 
State the two programs used to determine that there were no TMDs: TmHmm and TOPCONS. For the synteny box, you should mention functions and/or pham numbers of the upstream & downstream genes. Overall, good work! CDS 43434 - 43736 /gene="69" /product="gp69" /function="HNH endonuclease" /locus tag="Berrie_69" /note= /note=SSC: 43434-43736 CP: yes SCS: neither ST: NI BLAST-Start: [HNH endonuclease [Arthrobacter phage Iter] ],,NCBI, q1:s1 99.0% 9.68455E-64 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.21, -4.782222827850149, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Iter] ],,URQ05058,97.0,9.68455E-64 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,67.0,98.1 SIF-Syn: /note=added gene based on synteny with other phages, good CP