CDS 535 - 834 /gene="1" /product="gp1" /function="HNH endonuclease" /locus tag="PickleBack_1" /note= /note=SSC: 535-834 CP: yes SCS: neither ST: NA BLAST-Start: [HNH endonuclease [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 1.99932E-65 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.346, -3.8137921535956054, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Mycobacterium phage Conspiracy] ],,YP_008857938,100.0,1.99932E-65 SIF-HHPRED: HNH endonuclease; Structural Genomics, PSI-Biology, Protein Structure Initiative, Northeast Structural Genomics Consortium (NESG), nicking endonuclease, HYDROLASE; HET: MSE; 2.599A {Geobacter metallireducens},,,4H9D_C,78.7879,96.4 SIF-Syn: CDS 872 - 1321 /gene="2" /product="gp2" /function="terminase, small subunit" /locus tag="PickleBack_2" /note=Original Glimmer call @bp 872 has strength 14.29; Genemark calls start at 872 /note=SSC: 872-1321 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 2.10339E-104 GAP: 37 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.914, -2.9525085049809894, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Mycobacterium phage Conspiracy] ],,YP_008857939,100.0,2.10339E-104 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_A,58.3893,97.1 SIF-Syn: Good synteny with other members of the pham. terminase small subunit, upstream gene has no known function and the downstream is minor tail protein, just like in Conspiracy, and Lev2. /note=Glania - Checked terminase function evidence in PhagesDB BLAST and HHPred. /note= /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation: Glimmer and Genemark both call the start site to be at 872. /note=Coding Potential: The gene does have good coding potential in the Host-Trained GeneMark in the forward direction and must be a real gene. 
The start site of 872 covers all coding potential. /note=SD (Final) Score: The start codon at 450 bp length is GTG, with a final score of -2.953, which is the highest (least negative) final score. The Z score is 2.914, which indicates a fairly high probability of a credible ribosome binding site. /note=Gap/overlap: The gap for the start site of 872 is 248 with the upstream gene, which is quite large but since there is not a strong coding potential in that area and the length of the gene is acceptable (450 bp), the chosen start site is reliable. /note=Phamerator: Pham number 48579 as of (09/29/22). This gene with a start site of 872 is conserved and has synteny with Benedict, Lev2, Naca, and many more within cluster A5. The function is consistent and found in the approved function list. /note=Starterator: (analysis was run on 09/23/22) Pham number 48579 has 866 members, 37 are drafts. Start site 52 has 34 MAs of 829. Start #52 at 872 was found in 4.2% of genes in pham but since the pham includes a high number of members it is not concerning and 34 MA is reliable. The start site is called 100% of the time when present. /note=Location call: start site 872. Based on the information obtained on Glimmer, Genemark, Phamerator, and Starterator I will call start site 872 to be the start site for this gene. /note=Function call: Terminase small subunit is the predicted function for this gene. Conspiracy has been checked as the evidence based on the phagesdb Blast with a high score of 299, the E-value is very small (2e-81). The hit on HHpred has a high (enough) coverage (58.38) and the probability is also 97.1. A hit with the identity number of 100% is also recorded on the NCBI blast. /note=Transmembrane domains: 0 reported on TOPCONS and TMHMM, not a transmembrane protein /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with this location and functional call. May also want to check additional evidence in NCBI BLAST for small subunit terminase. 
Synteny box may need to be updated because gene 1 of Pickleback has NKF and downstream gene of Conspiracy does not have a called function (but are in the same pham). CDS 1427 - 2389 /gene="3" /product="gp3" /function="minor tail protein" /locus tag="PickleBack_3" /note=Original Glimmer call @bp 1427 has strength 17.83; Genemark calls start at 1427 /note=SSC: 1427-2389 CP: yes SCS: both ST: SS BLAST-Start: [tail subunit [Mycobacterium phage Chadwick] ],,NCBI, q1:s1 100.0% 0.0 GAP: 105 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.123, -2.6835611257758942, yes F: minor tail protein SIF-BLAST: ,,[tail subunit [Mycobacterium phage Chadwick] ],,YP_009207668,98.75,0.0 SIF-HHPRED: SIF-Syn: Very good synteny of Phams with other A5 phages such as Conspiracy, Bonamassa, and George, to name a few. Upstream of another Minor Tail protein and downstream of a Terminase. /note=Glania - Checked "minor tail protein" function evidence boxes in PhagesDB BLAST and HHPred. Checked DeepTMHMM, no TMDs /note= /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start@1427 (Glimmer and Genemark agree) ATG. /note=Coding Potential: GenemarkS indicates very strong coding potential in the train 1400bp-2400bp whereas Glimmer indicates strong but slightly spottier coding potential for this region on the same ORF. Overall, it looks very promising. Very good synteny with other A5 phages. /note=SD (Final) Score: -2.684 /note=Gap/overlap:105bp /note=Phamerator: Pham 199 on 9.29.22. Pham 199 contains 260 members of which all members are Cluster A except for two unclustered phages (C3 & PP). /note=Starterator: Start site 27 @1427 is the most Manually Annotated (225/250 annotated phages) start site for this pham and is called 98.7% of the time that it is present. 
/note=Location call: Start @1427 ATG because Glimmer and Genemark agree, there is good coding potential encapsulated by that start site, it has the most favorable Z and SD scores, and because it is the most conserved and most manually annotated start site for the pham the gene belongs to. /note=Function call: Minor Tail Protein. This protein has been called as a Minor Tail Protein three times in Phagesdb and several NCBI BLASTp hits list this protein as either a tail subunit, structural tail protein, or as a minor tail protein. HHPred hits agree with NCBI Blastp. On Phagesdb BLAST, however, this protein has not been associated with any functional call. /note=Transmembrane domains: None. TmHmm and Topcons do not call any transmembrane domains. /note=Secondary Annotator Name:Olvera, Kevin /note=Secondary Annotator QC: I agree with this annotation except for the selection of the PhagesDB BLAST hits with unknown function as evidence since the function call has been determined as a minor tail protein. The GM coding potential drop-down menu has not been filled out CDS 2386 - 3441 /gene="4" /product="gp4" /function="minor tail protein" /locus tag="PickleBack_4" /note=Original Glimmer call @bp 2386 has strength 8.25; Genemark calls start at 2854 /note=SSC: 2386-3441 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein X823_gp04 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.963, -4.629584883725491, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein X823_gp04 [Mycobacterium phage Conspiracy] ],,YP_008857941,100.0,0.0 SIF-HHPRED: SIF-Syn: This gene has function of minor tail protein. The upstream gene is in pham 199 and the downstream gene is in 48611, which is also observed in Lev2, ForGetIt, and Scorpia. 
/note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Both Glimmer and GeneMark call the gene, but Glimmer calls the gene start at 2386, while GeneMark calls the gene start at 2854. The start codon for start at 2386 is ATG, while the start codon for 2854 is GTG. /note=Coding Potential: Coding potential is strongest in both GeneMark Self and Host between 2854 and 3441. There is very little coding potential before 2800 in both GeneMark Self and Host, although there is a spike in coding potential between 2600-2700 bp. It is unlikely that this gene is a reverse gene because it is surrounded by forward genes. /note=SD (Final) Score: The SD final score for start site 2854 is -5.771 with a Z-score of 1.92. The SD final score for start site 2386 is -4.630 with a Z-score of 1.963. Start site 2386 has a better RBS and Z score. /note=Gap/overlap: Start site of 2854 has a large gap of 464 bp, but start site of 2386 has an overlap of 4 bp, which may indicate an operon. The 4 bp overlap is observed in other phages such as Lev2. /note=Phamerator: As of 09/28/2022, this gene was found in pham 48633, which currently has 371 member phages. The pham that this gene is in is found in other phages of Cluster A to which PickleBack belongs. Other annotated genomes denote the function of these gene to be for the minor tail protein. /note=Starterator: The most annotated start site in this pham is start 11, which is in 315/359 of the non-draft genes. This start site at 2386 is found in Pickleback. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 2386. Although there is not as strong coding potential, start site 2386 has a slightly better RBS and Z score, with a more reasonable ORF and overlap of genes. It is also the most annotated start site. /note=Function call: Multiple phagesDB BLAST has hits with suggested function of minor tail protein with E-value of 0 or <10^-137. 
Similar results show up in NCBI blast for the function of minor tail protein in other phages such as Danforth and Phillis (100% coverage, >60% identity, and E-value <10^-98). There were no significant matches from CDD or HHPRED. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. *don`t forget to check evidence boxes for NCBI Blast* CDS 3447 - 4088 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="PickleBack_5" /note=Original Glimmer call @bp 3447 has strength 5.24 /note=SSC: 3447-4088 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein X823_gp05 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 3.91686E-144 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.018, -4.654077828316241, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp05 [Mycobacterium phage Conspiracy] ],,YP_008857942,99.5305,3.91686E-144 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Glimmer start site: 3447; No GeneMark start site /note=Coding Potential:Coding potential in this gene is in the forward direction. There is coding potential present in GeneMark Self. /note=SD (Final) Score:-4.654. This is not the best score on PECAAN but it is acceptable. /note=Gap/overlap: gap of +5 bp, although not ideal, this creates the longest ORF /note=Phamerator:This gene is part of pham 48611 as of 09/23/22. This gene is conserved in Conspiracy (A5) and Lev2 (A5). /note=Starterator: Date run: 09/23/22. Start site 22 is the most annotated being called in 271 out of 421 non-draft genomes. Site 22 is at position 3447 in PickleBack and is supported by Glimmer. 
/note=Location call: This is a real gene with start at 3447 /note=Function call: NKF- The top result from NCBI BLAST is a hypothetical protein found in multiple Mycobacterium phages with 100% coverage and an E-value of 3.9e-144. The top results from PhagesDB also have the gene with an unknown function /note=Transmembrane domains: There were no TMDs /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. tRNA 4132 - 4205 /gene="6" /product="tRNA-Trp(cca)" /locus tag="PICKLEBACK_6" /note=tRNA-Trp(cca) CDS 4262 - 5176 /gene="7" /product="gp7" /function="lysin A" /locus tag="PickleBack_7" /note=Original Glimmer call @bp 4313 has strength 9.38; Genemark calls start at 4262 /note=SSC: 4262-5176 CP: yes SCS: both-gm ST: NI BLAST-Start: [LysA [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: 173 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.13, -5.102966088545787, no F: lysin A SIF-BLAST: ,,[LysA [Mycobacterium phage Conspiracy] ],,YP_008857943,100.0,0.0 SIF-HHPRED: SIF-Syn: Lysin A. There is synteny both upstream and downstream of the gene when compared to phage Conspiracy. The gene upstream of this gene belongs to pham 48661 and downstream belongs to pham 49036 in Pickleback and Conspiracy. /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark do not agree on the start site (Glimmer: 4313, GeneMark: 4262). /note=Coding Potential: The coding potential for this ORF is on the forward strand only, thus confirming that this is a forward gene. Coding potential is found for this ORF for both GeneMark Self and Host-Trained formats. /note=SD (Final) Score: The best final score is -4.333 with a Z-score higher than 2. However, this calls for the start site 4355 which is called by neither Glimmer nor GeneMark. To confirm the start site, Starterator called 4262 as the start site and this had a final score of -5.103. /note=Gap/overlap: 193. 
This gap is large enough to fit another gene (>100bp) however there is no coding potential on forward or reverse strands in this region. /note=Phamerator:45815 (as of 09/29/2022). This pham has 97 members, 6 of which are drafts. The phages within this pham however belong to different clusters namely A, H, L and D. /note=Starterator: Start site 18 (4262) is manually annotated in 69 phages. It is the most annotated start site,and is called 87.8% of the time it is present. The start codon is ATG. /note=Location call: This is a real gene with a start site at 4262. This is called by GeneMark and starterator. /note=Function call: LysinA. The top three phagesDB and NCBI Blast hits call the function lysinA with good e-values for both (0). They also have good coverage (100%). HHPred had no relevant hits with good e-values or coverage. However, due to the fact that there is compelling evidence from PhagesDB and NCBI Blast, as well as synteny with other phages in the cluster A5 such as Conspiracy and Aragog, the function call as lysinA stands. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: Looks great! CDS 5173 - 5982 /gene="8" /product="gp8" /function="lysin B" /locus tag="PickleBack_8" /note=Original Glimmer call @bp 5173 has strength 9.86; Genemark calls start at 5173 /note=SSC: 5173-5982 CP: yes SCS: both ST: SS BLAST-Start: [lysin B [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.15, -4.882892989731512, no F: lysin B SIF-BLAST: ,,[lysin B [Mycobacterium phage Conspiracy] ],,YP_008857944,100.0,0.0 SIF-HHPRED: Gene 12 protein; alpha/beta sandwich, CELL ADHESION; 2.0A {Mycobacterium phage D29},,,3HC7_A,89.9628,100.0 SIF-Syn: Lysin B protein, upstream gene is Lysin A, downstream is terminase large subunit, just like in phages Lev2 and Phlorence. 
/note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and Genemark. Both call the start at 5173. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. Coding potential is found both in Host-trained and Self-trained Genemark. /note=SD (Final) Score: -4.883. It is the second best score on PECAAN. /note=Gap/overlap: Upstream overlap: 4 base pairs. This overlap is fine and suggests that this gene is an operon gene. Downstream gap: 24 base pairs.This gap is medium-sized and cannot fit other genes, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham: 49036. Date 9/29/2022. It is conserved; it is found in Phlorence (A) and Lev2 (A). /note=Starterator: Start site 3 in Starterator was manually annotated on 9/23/22 in 47 non-draft genes in this pham. Start 3 is 5173 in Pickleback. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 5,173. /note=Function call: Lysin B. The top three phagesdb BLAST hits have the function of lysin B function (E-value <10^-158), and the top NCBI BLAST hits also have the function of lysin B. (100% coverage, 100% identity, and E-value = 0). HHPred and CDD did not have remarkable function calls. /note=Transmembrane domains: No transmembrane domains are detected. TMHMM and TOPCONS does not detect any transmembrane domains. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location and function call and agree with the first annotator. 
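The GAP values quoted in the record headers above (for example -4 bp between lysin A and lysin B, 37 bp between gp1 and gp2) appear consistent with counting the bases strictly between the upstream stop and the downstream start. A minimal Python sketch of that arithmetic follows; the function name `gap_between` is hypothetical, and the formula is an assumption inferred from the coordinates in this file rather than a documented PECAAN definition.

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """Number of bases strictly between two forward-strand genes.

    Assumption: 1-based inclusive coordinates, as used in the CDS lines here.
    A negative result means the two genes overlap by that many bases.
    """
    return next_start - prev_stop - 1


# Coordinates copied from the records above.
lysin_a_stop, lysin_b_start = 5176, 5173
gp1_stop, gp2_start = 834, 872

print(gap_between(lysin_a_stop, lysin_b_start))  # overlap between gene 7 and gene 8
print(gap_between(gp1_stop, gp2_start))          # gap between gene 1 and gene 2
```

A result of -4 corresponds to the 4 bp overlap that the annotators read as a possible sign of an operon.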
CDS 5979 - 7757 /gene="9" /product="gp9" /function="terminase, large subunit" /locus tag="PickleBack_9" /note=Original Glimmer call @bp 6006 has strength 9.84; Genemark calls start at 6006 /note=SSC: 5979-7757 CP: no SCS: both-cs ST: SS BLAST-Start: [terminase large subunit [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.979, -7.381088087284097, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Mycobacterium phage Conspiracy] ],,YP_008857945,100.0,0.0 SIF-HHPRED: SIF-Syn: The function is terminase large subunit. The upper stream gene`s function is Lysin B and the lower stream gene`s function is portal protein. This order is just similar to the other members of the cluster such as Tiger and Lev2 to name a few. /note=stop @ 7757: Tricky start. Terminase, large subunit. Very diverse pham. (Start: 35 @5979 has 16 MA`s [and also would result in an operon]), (Start: 39 @6006 has 28 MA`s). Decided to go with operon/LORF. - AF 7/21/2023 /note=---- /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation: Glimmer and Genemark both have called 6006 as the start site /note=Coding Potential: The coding potential is strong on both self and host-trained GeneMark and covers the whole area within the ORF in the forward direction. /note=SD (Final) Score: The final score is -2.276 which is the best final score (least negative). The Z score is favorable since it`s higher than 2 (3.067). /note=Gap/overlap: The Gap is 23 with upstream gene, it is not the best but is reasonable and all other data are rational for this start site. The length of the gene is reasonable (1752 bp) /note=Phamerator: The pham number as of 09/23/22 is 49977. The gene is conserved in phages Tiger, Benedict, and George to name a few, all in the same cluster as Pickleback. 
The function called for the gene is a large terminase subunit and is consistent in all of the named phages in both Phamerator and phams databases and is also on the approved SEA-PHAGES list. /note=Starterator: This analysis was run 09/30/22 pham 49977 has 1247 members, 89 are drafts. Start: 38 @6006 has 28 MA`s /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 6006 bp. Since Starterator does not have any information on this gene, using the other data found, the start site of 6006 is agreeable. /note=Function call: multiple phagesDB BLAST has hits with the suggested function large subunit terminase with e-values of zero in Lev2, Tiger, and Jovo to name a few. HHPRED have hits with coverages 84% to 89% (best 3 marked as evidence) with 100% probability observed in all, and e-value of 6.4e-30 or smaller. /note=Transmembrane domains: Neither TMHMM nor TOPCON predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I have QC`ed the location and function call and agree with the annotator. CDS 7757 - 9190 /gene="10" /product="gp10" /function="portal protein" /locus tag="PickleBack_10" /note=Original Glimmer call @bp 7757 has strength 13.31; Genemark calls start at 7757 /note=SSC: 7757-9190 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.872, -4.966271270034708, no F: portal protein SIF-BLAST: ,,[portal protein [Mycobacterium phage Conspiracy] ],,YP_008857946,100.0,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_E,90.3564,100.0 SIF-Syn: Pickleback shows synteny with Phage George in which a Terminase Large Subunit protein is Upstream of a Portal Protein which is upstream of a Capsid Maturation protease. 
Additionally there is synteny of phams across these phages. /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start @7757 (Glimmer and GeneMark agree) ATG /note=Coding Potential: Very good coding potential between the predicted start site @7757 and 9190 on GenemarkS though coding potential rises slightly after 7757 and drops slightly before 9190 on the host-trained Genemark. The gene is likely real. /note=SD (Final) Score: -4.966 /note=Gap/overlap: -1 gap, possibly a member of an operon. /note=Phamerator: Pham 48570 as of 9.29.22. There are 1544 members of this pham and members range across many clusters such as BD, BL, F, K, and AZ though a large portion are from cluster A. /note=Starterator: Start site 73 @7757 is found in 19.06% of genes in this pham and is called 99.7% of the time when it is present. It was called 290 times out of 1438 non-draft genomes in this pham. /note=Location call: Start site @7757 (ATG) because Glimmer and Genemark agree, there is very good coding potential encompassed by this gene with very clear boundaries with low potential. Additionally, this start site is manually annotated at an incredibly high rate when it is present. /note=Function call: Portal protein. HHPred, CDD, NCBI BLASTp, and Phagesdb BLAST all call this protein as a Portal Protein with incredibly low e-values and very high probability. /note=Transmembrane domains: None. Zero predicted transmembrane domains on TmHmm and Topcons. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location and function call and agree with the first annotator. 
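Each CDS record in this file opens with the same run of fields (coordinates, /gene, /product, /function, /locus tag). The sketch below shows one way those headers could be pulled into structured form in Python. The names `CdsHeader`, `HEADER_RE`, and `parse_cds_header` are hypothetical, and the pattern assumes the field order seen in this file; tRNA records, which carry no /function field, would not match and return None.

```python
import re
from typing import NamedTuple, Optional


class CdsHeader(NamedTuple):
    start: int
    stop: int
    gene: str
    product: str
    function: str
    locus_tag: str


# Assumed field order, taken from the CDS lines in this file.
HEADER_RE = re.compile(
    r'CDS (\d+) - (\d+) /gene="([^"]*)" /product="([^"]*)" '
    r'/function="([^"]*)" /locus tag="([^"]*)"'
)


def parse_cds_header(text: str) -> Optional[CdsHeader]:
    """Return the first CDS header found in text, or None if there is none."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    start, stop, gene, product, function, locus_tag = match.groups()
    return CdsHeader(int(start), int(stop), gene, product, function, locus_tag)


example = (
    'CDS 7757 - 9190 /gene="10" /product="gp10" '
    '/function="portal protein" /locus tag="PickleBack_10"'
)
header = parse_cds_header(example)
print(header)
```

Using `search` rather than `match` lets the same function be applied to a line where the header is preceded by the tail of the previous record.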
CDS 9187 - 10026 /gene="11" /product="gp11" /function="capsid maturation protease" /locus tag="PickleBack_11" /note=Original Glimmer call @bp 9187 has strength 5.77; Genemark calls start at 9187 /note=SSC: 9187-10026 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.151, -4.881534226067333, no F: capsid maturation protease SIF-BLAST: ,,[capsid maturation protease [Mycobacterium phage Conspiracy] ],,YP_008857947,100.0,0.0 SIF-HHPRED: SIF-Syn: This gene has the function of capsid maturation protease. The upstream gene is the portal protein in pham 48570, while the downstream gene is the scaffolding protein in pham 48582. This synteny is observed in other phages such as Scorpia, Lev2, and Naca. /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Both Glimmer and GeneMark call the start site of the gene at 9187. /note=Coding Potential: There is high coding potential shown in the forward direction in both Genemark Self and Host. /note=SD (Final) Score: The final score for start site of 9187 is -4.882 with a Z score of 2.151. There are other start sites which have better final and Z scores, but coding potential indicates start site of 9187 is better. /note=Gap/overlap: The overlap for start site of 9187 is -4, which supports the idea that this gene is part of an operon. Other possible start sites have gaps larger than 100, which makes them more unlikely. /note=Phamerator: The pham number as of September 29, 2022 is 44303. The gene is conserved in many phages such as CactusRose, Coog, and Cookiedough, which are also in Cluster A with Pickleback. The function call for the gene is a capsid maturation protease, which is in the approved SEA-PHAGES function list. /note=Starterator: The most annotated start site is start site 2, which is called in 610/736 non-draft genes in the pham. Start site 2 is called in Pickleback, which at 9187. 
The evidence agrees with the auto annotation data from Glimmer and GeneMark. /note=Location call: Based on the evidence, the start site for this gene is 9187, which agrees with the auto-annotation. /note=Function call: Most likely function of this gene is for the capsid maturation protease. There are multiple hits in PhagesDB BLAST, such as Tiger, Florence, and ForGetIt, which also call the capsid maturation protein with E-value <10^-162. NCBI BLAST also calls multiple hits in phages which have 100% coverage, e-values of 0, and over 65% identity. HHPRED and CDD do not have any relevant hits. /note=Transmembrane domains: There are no predicted TMDs in either TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mehra, Muskaan /note=Secondary Annotator QC: Too many sources of evidence ticked. CDS 10082 - 10594 /gene="12" /product="gp12" /function="scaffolding protein" /locus tag="PickleBack_12" /note=Original Glimmer call @bp 10082 has strength 13.1; Genemark calls start at 10082 /note=SSC: 10082-10594 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Mycobacterium phage Archetta] ],,NCBI, q1:s1 100.0% 1.02932E-115 GAP: 55 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.914, -2.6814417326944517, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Mycobacterium phage Archetta] ],,YP_010060975,100.0,1.02932E-115 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_g,66.4706,98.4 SIF-Syn: This gene is conserved in phages Lev2 and Archetta as a scaffolding protein. Both of these phages are in the same subcluster, A5, as PickleBack. In all three of these phages, the gene is preceded by a capsid maturation protease and followed by a major capsid protein. /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Glimmer and GeneMark agree on 10082 /note=Coding Potential: Coding Potential is in the forward direction. 
There is coding potential found in both GeneMark Host and Self. /note=SD (Final) Score:-2.681, This is the best score on PECAAN /note=Gap/overlap: The gap is +55 bp, but does not contain the longest ORF. However the length created by this start site is very consistent with the other genes in this pham. /note=Phamerator: This gene belongs to pham 48582 as of 10/14/22 and is also present in the phages Lev2 (A5) and Conspiracy (A5). /note=Starterator: Start site 27 is the most called start site with 664 MA’s out of 737 non-draft genomes. This site is at position 10082 in PickleBack and is supported by both GeneMark and Glimmer. /note=Location call:This is a real gene with its start site at 10082 /note=Function call:Scaffolding Protein- The top hit, a scaffolding protein, from NCBI BLAST has 100% identity, coverage, and alignment with an E-value of 1e-115. This is consistent with results from PhagesDB Blast and to a lesser extent, HHPRED. /note=Transmembrane domains:There were no TMDs detected in TmHmm which suggests that this is not a membrane protein. /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with the location and function call made. I noticed you forgot to indicate the date that you ran phamerator. May also want to include information about function of other genes in the same pham. Could probably update synteny box with information about upstream and downstream genes. 
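Several notes above weigh a candidate start by the gene length it produces (450 bp for gene 2, 933 bp for gene 13). As a hedged illustration, the length implied by a pair of 1-based inclusive coordinates can be checked in Python as below; the function names `orf_length` and `codon_count` are hypothetical, and the multiple-of-three check is a general property of open reading frames rather than something stated in these notes.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene given 1-based inclusive coordinates."""
    return stop - start + 1


def codon_count(start: int, stop: int) -> int:
    """Number of codons, including the stop codon.

    Raises ValueError if the span is not a whole number of codons,
    which would suggest a mistyped coordinate.
    """
    length = orf_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"span {start}-{stop} is {length} bp, not a multiple of 3")
    return length // 3


# Coordinates copied from the scaffolding protein record (gene 12).
print(orf_length(10082, 10594))
print(codon_count(10082, 10594))
```

Applied to gene 2 (872 to 1321) the same function gives 450 bp, matching the length quoted in that gene's notes.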
CDS 10621 - 11553 /gene="13" /product="gp13" /function="major capsid protein" /locus tag="PickleBack_13" /note=Original Glimmer call @bp 10621 has strength 19.06; Genemark calls start at 10621 /note=SSC: 10621-11553 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Mycobacterium phage Archetta] ],,NCBI, q1:s1 100.0% 0.0 GAP: 26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.594, -3.284949965096066, no F: major capsid protein SIF-BLAST: ,,[major capsid protein [Mycobacterium phage Archetta] ],,YP_010060976,100.0,0.0 SIF-HHPRED: major capsid protein; acne, bacteriophage, HK97-like, VIRUS; 3.7A {Propionibacterium phage PA6},,,3JB5_B,94.8387,100.0 SIF-Syn: Major capsid protein, upstream gene is scaffolding protein, downstream is head-to-tail adapter, just like in phage Lev2 /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Glimmer and GeneMark both call the start at 10621 (ATG codon). /note=Coding Potential: The ORF has reasonable coding potential in the forward strand. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.285. This is the best final score with a high Z-value of 2.594. /note=Gap/overlap: 26 bp gap. Reasonable gap size. Gene length of 933 bp is acceptable. /note=Phamerator: Pham: 48576. Date: 9/28/2022. Conserved in AbbyRanger, Achebe, and Abdiel, all from cluster A. /note=Starterator: Start site 6 was called by 464/912 non-draft members but not present in PickleBack. Start site 19 is 10621 in PickleBack. It was manually annotated in 81/912 non-draft genes in this pham, called 100% time when present. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 10621. /note=Function call: Major capsid protein. The top 10 top hits on phagesdb BLAST have the function of major capsid protein with e-values of e-177 to e-176. The top 5 NCBI BLASTp hits suggested function is major capsid protein. 
High query coverage (100%), high % identity (99.35%+), and low E-value (0). HHpred displayed hits with Propionibacterium phage PA6 (99.7% probability, 94.8387% coverage, e-value: 9.6e-29) and Staphylococcus phage 80alpha (99.97% probability, 96.129% coverage, e-value: 1.4e-28). CDD had a relevant hits for Phage_capsid (e-value of 3.01e-27) and major_cap_HK97 (e-value of 3.33e-17) /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 11622 - 11996 /gene="14" /product="gp14" /function="head-to-tail adaptor" /locus tag="PickleBack_14" /note=Original Glimmer call @bp 11622 has strength 10.33; Genemark calls start at 11622 /note=SSC: 11622-11996 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 1.04716E-84 GAP: 68 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.067, -2.8035499123971284, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Mycobacterium phage Jovo] ],,YP_008859039,100.0,1.04716E-84 SIF-HHPRED: Yqbg; Putative Head-Tail Connector Protein Yqbg from Bacillus subtilis and similar proteins.,,,cd08053,70.9677,98.7 SIF-Syn: Head to tail adaptor. Gene downstream is pham 48616 and upstream gene is 48576, just like the phage Aragog. /note=Glania - Checked HHPred evidence based on function requirements. /note= /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Both Glimmer and GeneMark called start as 11622. /note=Coding Potential: There is high coding potential throughout the gene in both host and self trained GeneMark. Coding potential is found in the forward direction, the gene is forward. /note=SD (Final) Score: -2.804- This is the best final (least negative) score. /note=Gap/overlap: 68. It is too small to be a gene and has no coding potential. 
Results in the longest possible ORF. /note=Phamerator: 48583 (as of 9/23/22). Most other genes in this pham belong to phages in cluster A. Pham has 763 members, 34 of which are drafts. /note=Starterator: Start number 8 @11622. Start 8 is called in 6% of genes in this pham, and is called 100% of the time it is present. This start site is the same as called by Glimmer and GeneMark. /note=Location call: 11622- This is called by Glimmer, GeneMark, and Starterator. Based on the above data, it is the best possible location call. /note=Function call: Head to tail adaptor. PhagesDB and NCBI BLAST only results in genes with this function and all have low E-values. The best result from NCBI BLAST is mycobacterium phage Jovo, which has a query cover of 100%, 100% identity, and 1e-84 E value. Jovo calls the function of this gene as a head to tail adaptor. HHpred`s best hit was a “phage protein” that had a 84% coverage, 99.9% probability and a 4.6e-21 e value. The second best hit was a head to tail connector that had a 71% coverage, 98.7% probability and a 0.0000013 e-value. /note=Transmembrane domains: There are no TMDs called by either TMHMM or TOPCONS. This aligns with the hypothesized function of the gene. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
*check evidence box for NCBI Blast* CDS 11993 - 12133 /gene="15" /product="gp15" /function="hypothetical protein" /locus tag="PickleBack_15" /note=Original Glimmer call @bp 11993 has strength 7.09; Genemark calls start at 11993 /note=SSC: 11993-12133 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X816_gp76 [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 7.62499E-27 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.238, -4.184392147131822, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X816_gp76 [Mycobacterium phage Jovo] ],,YP_008859040,100.0,7.62499E-27 SIF-HHPRED: SIF-Syn: Head-to-tail connector protein. There is synteny both upstream and downstream of the gene when compared to phage Conspiracy. The gene upstream of this gene belongs to pham 48583 and downstream belongs to pham 38358 in Pickleback and Conspiracy. /note=Glania - Changing to NKF because there is no evidence as a portal protein (Pickleback_9 has more evidence of being portal protein.), and HHPred does not match requirements for head-to-tail connector/adaptor. /note=--- /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark. Both call the same start site (11993). /note=Coding Potential: The coding potential for this ORF is on the forward strand only, thus confirming that this is a forward gene. Although the coding potential doesn’t cover the whole gene, there is strong evidence from synteny that this gene does in fact exist. /note=SD (Final) Score: -4.184. This is the best (least negative) final score that has a good Z-value (2.238). /note=Gap/overlap: There is an overlap of 4 bp. This indicates that this gene is part of an operon. /note=Phamerator: This gene belongs to pham 48616 (as of 9/29/2022). This pham has 411 members, 12 of which are drafts. The phages in this pham belong to cluster A. /note=Starterator: Start site 3 is manually annotated in 393 phages. 
It is the most annotated start site, and is called 100% of the time it is present. The start codon is GTG. /note=Location call: This is a real gene with start site 11993. This is called by Glimmer, GeneMark and Starterator. /note=Function call: The predicted function is a head to tail connector protein. There is strong evidence from PhagesDB and NCBI Blast with high coverage (100%) and low e-values (<10^-25). HHPred had no significant hits (all had low coverage and/or high e-values). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I have no unconsidered input to provide on what to put as the function. CDS 12130 - 12492 /gene="16" /product="gp16" /function="head-to-tail stopper" /locus tag="PickleBack_16" /note=Original Glimmer call @bp 12130 has strength 8.24; Genemark calls start at 12130 /note=SSC: 12130-12492 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 8.84187E-82 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.415, -3.8079208184165134, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Mycobacterium phage Jovo] ],,YP_008859041,100.0,8.84187E-82 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_F,97.5,99.2 SIF-Syn: Head-to-tail stopper. Upstream gene is pham 50419 (function not stated in similar phages or in Pickleback). Downstream gene NKF is 49985 (function not stated in similar phages or in Pickleback), just like Lev2 and Phlorence. Part of operon in similar phages. /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and Genemark. Both call the start at 12130. 
/note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. Coding potential is found both in Host-trained and Self-trained Genemark. /note=SD (Final) Score: -3.808. This is the best score on PECAAN. /note=Gap/overlap: Upstream overlap: 4 base pairs. This overlap is fine and suggests that this gene is part of an operon. Downstream gap: none; the gene shares its last base (12492) with the start of the next gene. Because there is no gap downstream of this gene, this is compelling evidence that the location call is correct. /note=Phamerator: Pham: 38358. Date 9/29/2022. It is conserved; it is found in Phlorence (A) and Lev2 (A). /note=Starterator: Start site 24 is manually annotated in 532 non-draft genes in this pham (Starterator report dated 9/23/22). Start 24 is 12130 in Pickleback. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 12,130. /note=Function call: Head-to-tail stopper. The top three phagesdb BLAST hits have the function of head-to-tail stopper (E-value <10^-64), and the top NCBI BLAST hits also have the function of head-to-tail stopper (100% coverage, 100% identity, and E-value = 0). HHPred and CDD did not have remarkable function calls. /note=Transmembrane domains: No transmembrane domains are detected by TMHMM or TOPCONS. /note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: Thorough and to the point. Looks good. 
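The Gap/overlap values quoted in these notes can be checked directly from the CDS coordinates. The sketch below is an illustration only, not part of the annotation record: it assumes that the GAP field equals start minus upstream stop minus 1 (negative values meaning overlap), a convention inferred here from the GAP fields of genes 15 through 17 rather than taken from PECAAN documentation. The names `gap_to_upstream`, `describe`, `gaps`, and `GENES` are hypothetical helpers introduced for this example.

```python
from typing import Dict, List, Tuple


def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Bases lying strictly between the upstream stop and this start.

    Assumed convention: a negative result means the two genes overlap.
    """
    return start - upstream_stop - 1


def describe(gap: int) -> str:
    """Render a gap value the way the notes phrase it."""
    if gap < 0:
        return f"{-gap} bp overlap"
    return f"{gap} bp gap"


# (gene, start, stop) for forward genes 14-17, copied from the CDS lines.
GENES: List[Tuple[str, int, int]] = [
    ("gp14", 11622, 11996),
    ("gp15", 11993, 12133),
    ("gp16", 12130, 12492),
    ("gp17", 12492, 12905),
]


def gaps(genes: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Map each gene (except the first) to its gap with the gene before it."""
    return {
        current[0]: gap_to_upstream(current[1], upstream[2])
        for upstream, current in zip(genes, genes[1:])
    }


if __name__ == "__main__":
    for name, gap in gaps(GENES).items():
        print(f"{name}: GAP {gap} ({describe(gap)})")
```

Under that assumed convention, the 4 bp overlaps of genes 15 and 16 and the 1 bp overlap of gene 17 with their upstream neighbors match the GAP fields in the corresponding SSC notes.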
CDS 12492 - 12905 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="PickleBack_17" /note=Original Glimmer call @bp 12492 has strength 6.49; Genemark calls start at 12492 /note=SSC: 12492-12905 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X816_gp74 [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 2.05314E-94 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.033, -4.482213237254482, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X816_gp74 [Mycobacterium phage Jovo] ],,YP_008859042,100.0,2.05314E-94 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 12492 bp. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. The coding potential is only in the forward direction. /note=SD (Final) Score: The final score at the chosen start site is -4.482 and the z score is 2.033, which are both not the best numbers but are acceptable considering the guidelines. /note=Gap/overlap: The gap with the upstream gene is -1 bp (a 1 bp overlap), which is within the acceptable range even though the gene overlaps the previous gene. /note=Phamerator: The pham number as of October 4, 2022 is 49985. The gene is conserved in phages Dublin George, and Lev2 to name a few. There is no known function designated for this gene in any of those phages. /note=Starterator: The analysis was run on 09/30/22. There are 770 members of this Pham. Start site number 36 at 12492 was manually annotated 480 times, which is strong and convincing support. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 12492 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Multiple phagesDB BLAST hits have no known function, with small e-values of 5e-77. 
On HHPRED Has alignment with PF17395.5 with a probability of 99.99%, 84.67% coverage, and e-value of 7.5e-26. /note=Transmembrane domains: TMHMM predicts no TMD. TOPCONS also predicts no TMD. Based on this evidence this gene is not a transmembrane gene. /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with the location and function calls. It might be helpful to note that some phages in the pham call a function for head to tail connector protein, as it could be a possible function? Make sure to fill out the synteny box! CDS 12905 - 13312 /gene="18" /product="gp18" /function="tail terminator" /locus tag="PickleBack_18" /note=Original Glimmer call @bp 12905 has strength 3.88; Genemark calls start at 12905 /note=SSC: 12905-13312 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 2.7033E-95 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.138, -5.358093050197135, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Mycobacterium phage Jovo] ],,YP_008859043,100.0,2.7033E-95 SIF-HHPRED: TAIL-TO-HEAD JOINING PROTEIN GP17; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_G,93.3333,99.1 SIF-Syn: As with phage ElTiger69, Pickleback_17 is downstream of a gene in pham 48581 and upstream of a gene in pham 48593. /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start site @12905 according to both Glimmer and Genemark (GTG) /note=Coding Potential: GenemarkS predicts very favorable coding potential for the region between 12905 and 13312bp. However, Host-trained genemark predicts somewhat weak coding potential relative to previous annotated genes. However, it does not predict stronger coding potential for other open-reading frames over this region. Thus there is favorable coding potential for this gene/start site. 
/note=SD (Final) Score: -5.358 /note=Gap/overlap: -1bp overlap, indicating that the gene is likely part of an operon. /note=Phamerator: Pham 23 as of 9.29.22. This pham contains 770 members, the vast majority of which are cluster A, with a handful of cluster CA phages and a few unclustered phages. /note=Starterator: Start site 31 is not the most manually annotated start site for this pham (that is start site 27, which is not present in this gene), but start site 31 is called 96.7% of the time it is present and is present in 15.7% of the 770 members of this pham. /note=Location call: Start @12905 (GTG) because Glimmer and Genemark agree, because the Z and SD scores are both the best available for this start site, and because this start site is called at a very high rate for this pham when it is present. /note=Function call: Tail Terminator Protein. Phagesdb BLAST and NCBI Blastp show strong evidence for this function. However, HHPRED and NCBI Blastp also show evidence for the function being a Head-To-Tail connector protein, although at a much lower frequency than the Tail terminator call. CDD had no functional hits. /note=Transmembrane domains: None. No predicted transmembrane domains on TmHmm or Topcons. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 13327 - 14154 /gene="19" /product="gp19" /function="major tail protein" /locus tag="PickleBack_19" /note=Original Glimmer call @bp 13327 has strength 18.01; Genemark calls start at 13327 /note=SSC: 13327-14154 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 0.0 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.582, -3.451961863597044, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Mycobacterium phage Jovo] ],,YP_008859044,99.6364,0.0 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_J,58.5455,96.4 SIF-Syn: Major tail protein, which has upstream gene encoding tail terminator and downstream gene encoding tail assembly chaperone, which is also observed in Lev2, Aragog, and Dublin. /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 13327. /note=Coding Potential: There is high coding potential in both GeneMark Self and Host in the forward direction, and the chosen start site contains all coding potential. /note=SD (Final) Score: The Z-score is 2.582 and the final score is -3.452 for the start site of 13327. Both are the best option. /note=Gap/overlap: The gap with the upstream gene is 14, which is a reasonable size. Other possible start sites have gaps greater than 100, which makes them more unlikely. /note=Phamerator: The pham number as of September 29, 2022 is 48593. The gene is conserved in other phages such as Anon, AppleCloud, and Bowtie, which are also in the A cluster with Pickleback. The function call is for a major tail protein. /note=Starterator: The most annotated start site is start number 20, which is called in 463/563 non-draft genes in the pham. However, this site is not found in Pickleback. Instead, Pickleback has a start at site 3, which has 33/563 calls. 
/note=Location call: Based on the evidence, the start site for this gene is at 13327, which agrees with Glimmer and GeneMark. /note=Function call: Most likely function of this gene is the major tail protein. There are multiple hits in PhagesDB BLAST, such as AgentM, Aragog, and Bluefalcon, which also call the major tail protein with E-value <-155. NCBI BLAST also calls multiple hits in phages which has 100% coverage, e-values of 0, and over 65% identity. HHPRED and CDD do not have any relevant hits. /note=Transmembrane domains: No transmembrane domains are reported in TmHmm or Topcons, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location and function call and agree with the first annotator. CDS 14265 - 14645 /gene="20" /product="gp20" /function="tail assembly chaperone" /locus tag="PickleBack_20" /note=Original Glimmer call @bp 14265 has strength 12.07; Genemark calls start at 14265 /note=SSC: 14265-14645 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 2.43387E-83 GAP: 110 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.321, -4.395041241556842, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Mycobacterium phage Conspiracy] ],,YP_008857957,100.0,2.43387E-83 SIF-HHPRED: GP24_25 ; Mycobacteriophage tail assembly protein,,,PF17388.5,94.4444,100.0 SIF-Syn: This gene is present in phages Lev2 and Conspiracy. However, in these phages` genomes and others in this cluster, there is another gene that overlaps this entire gene. This larger gene does not appear to be present in PickleBack. /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation:Glimmer and GeneMark agree on 14265 /note=Coding Potential:Coding potential in the forward direction found in GeneMark Host and GeneMark Self. 
/note=SD (Final) Score: -4.395. This is not the highest score on PECAAN. /note=Gap/overlap: The gap is +110 bp and this is the smallest gap on PECAAN. /note=Phamerator: This gene belongs to Pham 49986 as of 09/30/22. This gene is present in phages Lev2 (A5) and Conspiracy (A5). /note=Starterator: This gene belongs to Pham 49986. PickleBack does not contain the most annotated start site. The most annotated start site for this pham is start 49, which is called in 361 out of 735 non-draft genes. PickleBack does contain start 47, which has 295 manual annotations. Start 47 is at position 14265 in PickleBack. /note=Location call: This is a real gene with start site at 14265. /note=Function call: Tail Assembly Chaperone. This function is called by the top 10 results on PhagesDB BLAST, and all 10 results have e-values better than 1e-64. The top HHPRED result is a tail assembly chaperone with 100% identity and coverage and an E-value of 2.4e-83. The top NCBI BLAST results also strongly point toward tail assembly chaperone. /note=Transmembrane domains: This gene has 0 TMDs predicted in either TmHmm or TopCons. /note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: Looks good Kevin! 
CDS join(14265..14618,14618..15058) /gene="21" /product="gp21" /function="tail assembly chaperone" /locus tag="PickleBack_21" /note= /note=SSC: 14265-15058 CP: no SCS: neither ST: NI BLAST-Start: GAP: -381 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.321, -4.395041241556842, no F: tail assembly chaperone SIF-BLAST: SIF-HHPRED: SIF-Syn: CDS 15063 - 18404 /gene="22" /product="gp22" /function="tape measure protein" /locus tag="PickleBack_22" /note=Original Glimmer call @bp 15063 has strength 11.63; Genemark calls start at 15063 /note=SSC: 15063-18404 CP: yes SCS: both ST: SS BLAST-Start: [tapemeasure protein [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.395, -4.060228844556081, no F: tape measure protein SIF-BLAST: ,,[tapemeasure protein [Mycobacterium phage Conspiracy] ],,YP_008857958,100.0,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,77.9874,99.8 SIF-Syn: Synteny: Tape measure protein. Gene downstream is pham 48574 and upstream gene is 42271, just like the phage Micasa and Cuco. /note=Glania - Added another phage comparison to synteny box. /note= /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 15063. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The coding potential remains high throughout the self trained GeneMark, but spikes a lot in host trained GeneMark. /note=SD (Final) Score: -4.060. It is the best final score on PECAAN. /note=Gap/overlap: 4 base pair gap. Too short to include another gene and no coding potential is present. /note=Phamerator: The pham number as of 9/23/22 is 48618. The gene is conserved in phages BaconJack, Bumblebee11, and Parliament, all in the same cluster as Pickleback. 
The function call for the gene is a tape measure protein and it is consistent between Phamerator and the phams database. It is on the approved SEA-PHAGES list. /note=Starterator: This pham has 400 members and 18 drafts. 28 members call start site 1, which correlates to a start site of 15063 bp for Pickleback. It is called 100% of the time it is present. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 15063 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Multiple phagesDB BLAST has hits with the suggested function tape measure protein with e values of 0. HHPRED’s best hit was 6V8I_AF, which is a tape measure protein. It has a 99.8% probability, 78% coverage, and 5.5e-10 e score. The best result from NCBI BLAST is from multiple mycobacterium phages, which has a query cover of 100%, 100% identity, and an e value of 0. The function of this gene is called as a tape measure protein. /note=Transmembrane domains: TMHMM predicts two TMDs. Based on this evidence this gene can be assumed to have a real TMD and is therefore a “membrane protein”, but the functional call of tape measure is more specific. This aligns with the hypothesized function of the gene. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC:I agree with this annotation. There has been a very thorough consideration of all evidence categories. 
CDS 18468 - 19481 /gene="23" /product="gp23" /function="minor tail protein" /locus tag="PickleBack_23" /note=Original Glimmer call @bp 18468 has strength 13.83; Genemark calls start at 18468 /note=SSC: 18468-19481 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage Lev2]],,NCBI, q1:s1 100.0% 0.0 GAP: 63 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.318, -3.8730906109286094, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage Lev2]],,QGJ97221,100.0,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BC,97.0326,99.9 SIF-Syn: Minor tail protein. There is synteny both upstream and downstream of the gene when compared to phage Conspiracy. The gene upstream of this gene belongs to pham 48585 and downstream belongs to pham 48618 in Pickleback and Consipiracy. /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark. Both call the same start site (18468). /note=Coding Potential: The coding potential for this ORF is on the forward strand only, thus confirming that this is a forward gene. Coding potential is found for this ORF for both GeneMark Self and Host-Trained formats. /note=SD (Final) Score: -3.873. This is the best (least negative) final score that has a good Z-value (2.318). /note=Gap/overlap: There is a gap of 63bp. This gap is relatively large however this gene is conserved in several other phages and the gap was seen in the other phages as well, such as Conspiracy and Aragog. There is also no coding potential in either direction for this. /note=Phamerator: This gene belongs to pham 48574. This pham has 1125 members, 84 of which are drafts. Phages in this pham all belong to different clusters. /note=Starterator: Start site 94 (18468) is manually annotated in 80 phages. It is not the most annotated start site (87), but is called 76.4% of the time it is present. The start codon is TTG. 
/note=Location call: This is a real gene with start site 18468. This is called by Glimmer, GeneMark and Starterator. /note=Function call: The predicted function is a minor tail protein. There is strong evidence from PhagesDB and NCBI Blast with high coverage (~100%) and low e-values (0). HHPred had 1 significant hit, which calls the same function (99% probability, 97% coverage and e-value of 4.8e-19). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 19478 - 21277 /gene="24" /product="gp24" /function="minor tail protein" /locus tag="PickleBack_24" /note=Original Glimmer call @bp 19478 has strength 10.21; Genemark calls start at 19478 /note=SSC: 19478-21277 CP: yes SCS: both ST: SS BLAST-Start: [tail protein [Mycobacterium phage Zolita] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.095, -4.349108688274226, no F: minor tail protein SIF-BLAST: ,,[tail protein [Mycobacterium phage Zolita] ],,YP_010060819,99.3322,0.0 SIF-HHPRED: Tail-Associated Lysin, gp59; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AE,93.823,99.0 SIF-Syn: Minor tail protein, upstream gene is minor tail protein, downstream is NKF pham 41884 (function not stated in similar phages or in Pickleback), just like in phages Lev2 and Phlorence. Part of operon in Lev2 and Phlorence. /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and Genemark. Both call the start at 19478. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. Coding potential is found both in Host-trained and Self-trained Genemark. /note=SD (Final) Score: -4.349. This is the best score on PECAAN. /note=Gap/overlap: Upstream overlap: 4 bp. 
This overlap is acceptable and suggests that this gene is part of an operon. Downstream gap: 14 base pairs. This is below the 50 bp threshold and therefore cannot fit another gene. This gap is considered acceptable. /note=Phamerator: Pham: 49975. Date 9/30/2022. It is conserved; it is found in Phlorence (A) and Lev2 (A). /note=Starterator: Start site 106 is manually annotated in 500 non-draft genes in this pham (Starterator report dated 9/30/22). Start 106 is 19478 in Pickleback. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 19,478. /note=Function call: Minor tail protein. The top three phagesdb BLAST hits have the function of minor tail protein (E-value = 0), and the top NCBI BLAST hits also have the function of minor tail protein (100% coverage, >97% identity, and E-value = 0). HHPred also identified a notable tail protein hit (e < 10^-8, 94% coverage). CDD did not have remarkable function calls. /note=Transmembrane domains: No transmembrane domains are detected by TMHMM or TOPCONS. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 21291 - 21755 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="PickleBack_25" /note=Original Glimmer call @bp 21291 has strength 9.67; Genemark calls start at 21291 /note=SSC: 21291-21755 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp68 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 2.13111E-110 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.13, -4.801936092881806, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp68 [Mycobacterium phage Conspiracy] ],,YP_008857961,100.0,2.13111E-110 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation start source: Both Glimmer and GeneMark call the gene and they agree on the start site at 21291bp. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. The coding potential is only in the forward direction. The gene is 465 bp in length which is reasonable. /note=SD (Final) Score: The final score at the chosen start site is -4.802 and the z score is 2.13, which are both not the best numbers but are acceptable considering the guidelines. /note=Gap/overlap: The gap with the upstream gene is 13 bp, which is within an acceptable range. /note=Phamerator: The Pham number as of October 4, 2022, is 41884. The gene is conserved in phages Dublin George, and Lev2 to name a few. There is no known function designated for this gene in any of those phages. /note=Starterator: The analysis was run on 09/30/22. There are 835 members of this Pham (pham41884). Start site number 56 @21291 has 500 MAs. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 21291 bp. Starterator agrees with Glimmer and Genemark with GTG being the start codon. /note=Function call: Multiple phagesDB BLAST hits have no known function, with small e-values of 3e-89. 
HHPRED has an alignment with PF10910.1 with a probability of 100%, 85.7% coverage, and e-value of 1.2e-38. NCBI BLAST hits also support the unknown function with an e-value of 2.1e-110 and probability and coverage of 100%. /note=Transmembrane domains: TMHMM predicts no TMD. TOPCONS also predicts no TMD. Based on this evidence this gene is not a transmembrane gene. /note=Secondary Annotator Name: Mehra, Muskaan /note=Secondary Annotator QC: All good! CDS 21748 - 22050 /gene="26" /product="gp26" /function="membrane protein" /locus tag="PickleBack_26" /note=Original Glimmer call @bp 21748 has strength 9.62; Genemark calls start at 21748 /note=SSC: 21748-22050 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp67 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 1.2066E-63 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.837, -2.827683592113848, no F: membrane protein SIF-BLAST: ,,[hypothetical protein X823_gp67 [Mycobacterium phage Conspiracy] ],,YP_008857962,100.0,1.2066E-63 SIF-HHPRED: SIF-Syn: /note=Glania - ran DeepTMHMM, 2 predicted TMRs, evidence towards function of membrane protein. /note= /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start site at 21748 (Glimmer and Genemark agree), GTG /note=Coding Potential: Host-trained Genemark shows minimal coding potential in the region between 21748bp and approximately 21850bp but shows high coding potential from 21850bp to 22050bp. This is in contrast to GenemarkS, which shows a high level of coding potential for the entire expanse of the gene. When compared to Lev2_25 it was concluded that the start site at 21748 was likely correct. /note=SD (Final) Score: -2.828 (Z-score = 2.837). Both SD and Z scores are maximized. /note=Gap/overlap: -8bp overlap. Large overlap is somewhat concerning but other genes in the same pham have the same -8 overlap (ex: Lev2_25 and ForGetIt_25). /note=Phamerator: Pham 116 on 10.4.22. 
This pham has 349 members, all from phages that are Cluster A, unclustered, or singletons. /note=Starterator: Start site 7 is found in 13.2% of the genes in this pham and was called 100% of the time when it was present. Pickleback_25 also does not contain the most annotated start site. /note=Location call: Start site @21748bp because Glimmer and Genemark agree and there is good coding potential for the bounded region (GenemarkS and synteny with other phages). The large overlap is concerning but other genes in pham 116 (looking at Lev2 and ForGetIt) have a similar -8 overlap. /note=Function call: NKF. PhagesDB Blast, NCBI BLASTp, CDD, and HHPred all show no functional hits for this protein. PhagesDB, however, lists genes in this pham as minor tail protein a total of 23 times. Despite this, there is no other evidence for this functional call. /note=Transmembrane domains: None. TmHmm predicts 1 TMH while Topcons predicts none. This is not sufficient evidence to indicate TMDs. /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I have QC'ed the location and function call and agree with the annotator. CDS 22047 - 22367 /gene="27" /product="gp27" /function="membrane protein" /locus tag="PickleBack_27" /note=Original Glimmer call @bp 22047 has strength 9.71; Genemark calls start at 22077 /note=SSC: 22047-22367 CP: yes SCS: both-gl ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage Phlorence] ],,NCBI, q1:s1 100.0% 1.7226E-69 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.949, -4.741344217713455, no F: membrane protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage Phlorence] ],,ATW60001,100.0,1.7226E-69 SIF-HHPRED: SIF-Syn: /note=DeepTMHMM detects one TMD. /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Glimmer calls the start site at 22047 while GeneMark calls the start site at 22077. 
/note=Coding Potential: The gene has high coding potential within the ORF that is called by both Glimmer and GeneMark. /note=SD (Final) Score: The final score for start site at 22047 is -4.741 with a Z-score of 1.949, while the final score for start site at 22077 is -4.253 with a Z-score of 2.14. While the scores for start site of 22077 are better, they may be irrelevant because the gene has a 4bp overlap, which means they are organized into an operon. /note=Gap/overlap: For the start site of 22047, there is a 4bp overlap, which indicates that this gene is most likely part of an operon. The other start site has a gap of 26bp, which is reasonable, but the presence of an operon is more likely. /note=Phamerator: As of October 4, 2022, the gene is found in pham 49993. Pham 49993 is in other members of cluster A such as Astro, BabyRay, and Baehexic. The pham database most frequently calls this gene’s function as coding for a minor tail protein. /note=Starterator: The most annotated start site of this pham is start site 23, which is called in 225/642 non-draft genes. However, this start site is not found in Pickleback. Start site 40 is called in Pickleback at start site 22047 and is found in 46/668 genes in the pham. This agrees with the site predicted by Glimmer and supports the gene being part of an operon. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site of 22047. /note=Function call: Minor tail protein. The top Phagesdb BLAST matches including AgentM, ForGetIt, and Phlorence have a e-value of 4e-56. A few NCBI BLAST matches also have the function of minor tail protein with high coverage, high identity, and low e-value. HHPRED and CDD do not yield any hits. /note=Transmembrane domains: TMHMM predicts 1 transmembrane domain, but Topcons does not predict any transmembrane domains, therefore there is insufficient evidence to call this a transmembrane protein. 
/note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: Very thorough notes, excellent. I'm not sure if this is correct, but I think you need to click NA or NI for the start-site dropdown instead of SS since they don't agree. Not sure, though. CDS 22364 - 23761 /gene="28" /product="gp28" /function="minor tail protein" /locus tag="PickleBack_28" /note=Original Glimmer call @bp 22364 has strength 8.01; Genemark calls start at 22364 /note=SSC: 22364-23761 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage Aragog]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.989, -2.5052746077145835, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage Aragog]],,ATW60974,99.7849,0.0 SIF-HHPRED: SIF-Syn: This gene is found in both phages Archetta (A5) and Phlorence (A5). In all three of these phages, the gene is preceded by a minor tail protein upstream and followed by a gene belonging to pham 623 downstream. /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Glimmer and GeneMark agree on 22364 /note=Coding Potential: There is forward coding potential found in both GeneMark Self and Host /note=SD (Final) Score: -2.505, this is the best score on PECAAN /note=Gap/overlap: -4 bp, this gene is likely part of an operon /note=Phamerator: This gene is part of Pham 1076 as of 09/30/22 and is found in Lev2 (A5) and Phlorence (A5) /note=Starterator: Start 9, with 43 MAs, is the most annotated start in this pham and is found in PickleBack; however, it is not called. Instead, PickleBack calls start 10 with 34 MAs. Start 10 is located at 22364 in PickleBack /note=Location call: This is a real gene with start at 22364 /note=Function call: Minor tail protein. PhagesDB BLAST and NCBI BLAST both have minor tail protein results with e-values of zero. NCBI BLAST results also have greater than 97 percent identity and alignment.
/note=Transmembrane domains: There are no predicted TMDs in either TmHmm or Topcons. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location and function call and agree with the first annotator. CDS 23771 - 24055 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="PickleBack_29" /note=Original Glimmer call @bp 23771 has strength 8.43; Genemark calls start at 23771 /note=SSC: 23771-24055 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein O156_gp65 [Mycobacterium phage LittleCherry] ],,NCBI, q1:s1 100.0% 4.33114E-62 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.229, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein O156_gp65 [Mycobacterium phage LittleCherry] ],,YP_008430687,100.0,4.33114E-62 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Both Glimmer and GeneMark call the start at 23771 bp (ATG codon). /note=Coding Potential: Both GeneMark Host and GeneMark Self show reasonable coding potential in the forward strand. /note=SD (Final) Score: -2.011. The best final score on PECAAN. /note=Gap/overlap: 9 bp gap. This start site creates the longest ORF with an acceptable length of 285 bp. /note=Phamerator: Pham: 623. Date: 10/4/2022. Conserved in AgentM, Aragog, and Artemis2UCLA, all from cluster A. /note=Starterator: Start site 17 is manually annotated in 83/140 non-draft genes in this pham. Start site 17 is 23771 bp in PickleBack. This agrees with the start site called by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 23771. /note=Function call: Multiple hits on phagesdb BLAST had e-values lower than 3e-50 but had no known function. NCBI BLAST had strong hits of hypothetical proteins with e-values lower than 1e-61 but also had no known function. CDD and HHpred did not have any significant hits.
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS complement (24107 - 25546) /gene="30" /product="gp30" /function="serine integrase" /locus tag="PickleBack_30" /note=Original Glimmer call @bp 25546 has strength 11.84; Genemark calls start at 25546 /note=SSC: 25546-24107 CP: yes SCS: both ST: SS BLAST-Start: [serine integrase [Mycobacterium phage Jovo] ],,NCBI, q1:s19 100.0% 0.0 GAP: 220 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.837, -3.1164791313608173, no F: serine integrase SIF-BLAST: ,,[serine integrase [Mycobacterium phage Jovo] ],,YP_008859055,96.3783,0.0 SIF-HHPRED: A118 serine integrase; site-specific recombination, coiled-coil, RECOMBINATION; 2.541A {Listeria innocua},,,5UDO_D,70.3549,100.0 SIF-Syn: Serine integrase. The downstream gene is in pham 48617 and the upstream gene is in pham 623, just like in phages Phlorence and Scorpia. /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 25546. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. The coding potential is present but does fluctuate. /note=SD (Final) Score: -3.116. It is the best final score on PECAAN. /note=Gap/overlap: 220 bp gap. No coding potential is present in the gap, so there is no additional gene. /note=Phamerator: The pham number as of 9/23/22 is 48648. The gene is conserved in phages BlueFalcon, Datway, and HamSlice, all in the same cluster as Pickleback. Most are in cluster A, but some are in other clusters. The function call for the gene is serine integrase and it is consistent between Phamerator and the phams database. It is on the approved SEA-PHAGES list. /note=Starterator: This pham has 332 members and 24 drafts.
30 members call start site 98, which corresponds to a start site of 25546 bp for Pickleback. Start site 87, which corresponds to 25600bp, has more manual annotations, but the best Z score and the most likely ORF length, based on the lengths of other genes in the pham, lead to the conclusion that the start site is 25546. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 25546 bp. Starterator doesn’t agree with Glimmer and GeneMark, as another start site has more manual annotations, but the start site called by Glimmer and GeneMark has better data. /note=Function call: Serine integrase. Multiple PhagesDB BLAST hits have the suggested serine integrase function with e-values of 0. HHPRED’s best hit was 5UDO_D, which is a serine integrase. It has a 99.8% probability, 70% coverage, and an e-value of 2.1e-35. The best NCBI BLAST results are from multiple Mycobacterium phages, with a query cover of 100%, 96% identity, and an e-value of 0. The function of this gene is called as a serine integrase. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with this location and function call. CDS 25767 - 26186 /gene="31" /product="gp31" /function="membrane protein" /locus tag="PickleBack_31" /note=Original Glimmer call @bp 25767 has strength 5.61; Genemark calls start at 25824 /note=SSC: 25767-26186 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein X823_gp62 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 2.57161E-90 GAP: 220 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.34, -3.8879567634597363, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein X823_gp62 [Mycobacterium phage Conspiracy] ],,YP_008857967,100.0,2.57161E-90 SIF-HHPRED: SIF-Syn: There is synteny both upstream and downstream of the gene when compared to phage Conspiracy.
The gene upstream of this gene belongs to pham 50006 and the gene downstream belongs to pham 27644 in both Pickleback and Conspiracy. Conspiracy, however, calls no known function for this gene. /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark do not agree on the start site (Glimmer: 25767, GeneMark: 25824). /note=Coding Potential: The coding potential for this ORF is on the forward strand only, thus confirming that this is a forward gene. Coding potential is found for this ORF for both GeneMark Self and Host-Trained formats. /note=SD (Final) Score: The best final score is -3.888 with a Z-score higher than 2 (2.34). This final score is for the start site called by Glimmer (25767). /note=Gap/overlap: 220 bp gap. This gap is large enough to fit another gene (>100bp); however, there is no coding potential on the forward or reverse strands in this region. The gap is also conserved in other related phages. /note=Phamerator: 50020 (as of 10/04/2022). This pham has 415 members, 12 of which are drafts. The phages within this pham mostly belong to cluster A. /note=Starterator: Start site 41 (25767) is manually annotated in 37 phages. It is, however, not the most annotated start site, and is called 92.7% of the time it is present. The start codon is ATG. /note=Location call: This is a real gene with a start site at 25767. This is called by Glimmer and Starterator. /note=Function call: Membrane protein. Although PhagesDB BLAST and HHPred predict no known function with good e-values, and NCBI BLAST predicts a DNA binding domain with a good e-value (~e-90), TMHMM predicts four TMHs and therefore there is compelling evidence that this is a membrane protein. /note=Transmembrane domains: TMHMM predicts four TMHs. Based on this evidence, this gene can be assumed to have real TMHs and is therefore a “membrane protein”. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC: I agree with the annotation given the tricky start call.
The function call as a membrane protein makes the most sense due to the TmHmm predictions. CDS complement (26286 - 26795) /gene="32" /product="gp32" /function="deoxycytidylate deaminase" /locus tag="PickleBack_32" /note=Genemark calls start at 26795 /note=SSC: 26795-26286 CP: yes SCS: genemark ST: SS BLAST-Start: [deoxycytidylate deaminase [Mycobacterium phage Aragog]],,NCBI, q1:s1 100.0% 6.20663E-117 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.055, -6.629076085957458, yes F: deoxycytidylate deaminase SIF-BLAST: ,,[deoxycytidylate deaminase [Mycobacterium phage Aragog]],,ATW60978,99.4083,6.20663E-117 SIF-HHPRED: DEOXYCYTIDYLATE DEAMINASE; PYRIMIDINE METABOLISM, NUCLEOTIDE BIOSYNTHESIS, ZINC, HEXAMER, HYDROLASE, METAL-BINDING, PHOSPHOPROTEIN, ALLOSTERIC ENZYME; 2.1A {HOMO SAPIENS} SCOP: c.97.1.0,,,2W4L_D,75.1479,99.7 SIF-Syn: The function is deoxycytidylate deaminase. The upstream gene's function is membrane protein and the downstream gene has no known function. This gene has synteny with phages Lev2 and AgentM, even though the upstream (pham 50020) and downstream (pham 49169) genes in these phages have no known function. /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation start source: GeneMark calls the start site at 26795 bp. Glimmer has not called any start site for this gene. /note=Coding Potential: The ORF has reasonable coding potential and the chosen start site covers all of the coding potential. The coding potential is only in the reverse direction. The gene is 510 bp in length, which is reasonable. /note=SD (Final) Score: The final score at the chosen start site is -6.629 and the Z-score is 1.055. This is the only start site called by the auto-annotation, and it follows the guidelines. /note=Gap/overlap: The overlap with the upstream gene is 4 bp (-4 bp gap), which is the most favorable arrangement. The start codon is ATG for this start site.
/note=Phamerator: The Pham number as of October 5, 2022, is 27644. The gene is conserved in phages Aragog, Archetta, and Lev2, to name a few. The function called for those phages is deoxycytidylate deaminase. /note=Starterator: The analysis was run on 09/30/22. Pham number 27644 has 44 members, 2 of which are drafts. Start site number 4 @26795 has 37 MAs. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 26795 bp. Starterator agrees with GeneMark, with ATG being the start codon. /note=Function call: The function called for this gene is deoxycytidylate deaminase. Multiple PhagesDB BLAST hits have the function of deoxycytidylate deaminase with small e-values of 5e-97. HHPRED has an alignment with 2W4L_D with a probability of 99.7%, 75.14% coverage, and an e-value of 3.7e-16. NCBI BLAST hits also confirm the deoxycytidylate deaminase function, with an e-value of 6.2e-117 and probability and coverage of 100%. CDD as well as other databases have hits with the same function, with an identity of 42.7, alignment of 55%, coverage of 72%, and e-value of 4.1e-37, which is marked as an extra piece of evidence for this function. /note=Transmembrane domains: TMHMM predicts no TMD. TOPCONS also predicts no TMD. Based on this evidence, this gene does not encode a transmembrane protein. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator.
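The length and gap figures quoted in these notes can be rechecked from the CDS coordinates alone. The following is a minimal Python sketch, assuming inclusive 1-based coordinates and assuming that a gap is the number of bases strictly between two neighbouring features (a negative value meaning an overlap). The helper names orf_length and gap_between are hypothetical and are not part of PECAAN or DNA Master.

```python
def orf_length(low, high):
    """Length in bp of a feature with inclusive coordinates low..high."""
    return high - low + 1


def gap_between(first, second):
    """Bases strictly between two features given as (low, high) pairs.

    `first` must lie to the left of `second` on the genome.
    A negative result is an overlap of that many bases.
    """
    return second[0] - first[1] - 1


# Coordinates copied from the CDS lines of the records (assumed inclusive).
gp30 = (24107, 25546)
gp31 = (25767, 26186)
gp32 = (26286, 26795)
gp33 = (26792, 26965)

print(orf_length(*gp32))        # length of gene 32
print(gap_between(gp32, gp33))  # gene 32 / gene 33 junction
print(gap_between(gp30, gp31))  # gene 30 / gene 31 junction
```

Under these assumptions the three printed values are 510, -4, and 220, which agree with the 510 bp length, the -4 bp gap, and the 220 bp gap recorded for these genes.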
CDS complement (26792 - 26965) /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="PickleBack_33" /note=Original Glimmer call @bp 26965 has strength 6.69; Genemark calls start at 26965 /note=SSC: 26965-26792 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp60 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 3.45003E-34 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.106, -4.4057176022952405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp60 [Mycobacterium phage Conspiracy] ],,YP_008857969,100.0,3.45003E-34 SIF-HHPRED: DUF2292 ; Uncharacterized small protein (DUF2292),,,PF10055.12,71.9298,73.9 SIF-Syn: As with phages Lev2 and BlueFalcon, Pickleback_33 is downstream of a gene in pham 27644 and upstream of a gene in pham 50004. /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start site @26965 GTG, Glimmer and GeneMark agree. /note=Coding Potential: Host-trained GeneMark shows poor coding potential (<50%) for this region but does not predict any coding potential for this region on other ORFs. This is in great contrast to GenemarkS, which predicts rather high coding potential for this region. /note=SD (Final) Score: Final score = -4.406, Z = 2.106. Both are maximized. /note=Gap/overlap: 8bp gap is good. /note=Phamerator: Pham 49169 on 10.4.22. This pham contains 31 genes, all from Cluster A phages. /note=Starterator: Start site 7 is found in 100% of genes in this pham and is the most annotated start site with 25/26 annotated genes calling this start site (called 96.8% of the time). /note=Location call: Start site @26965 due to favorable coding potential on GenemarkS, synteny with other fully annotated A5 phages (BlueFalcon and Lev2), maximized final and Z scores, and favorable gap (8bp). /note=Function call: NKF. There are no functional hits on HHPred, NCBI BLASTp, or CDD. HHPred hits are all for proteins of unknown function and BLASTp hits likewise do not point towards any function.
/note=Transmembrane domains: None. TmHmm and Topcons both predict no TMDs. /note=Secondary Annotator Name: Mehra, Muskaan /note=Secondary Annotator QC: Maybe include more information about why an 8bp gap is good. Also include the number of manual annotations that call the given start site. HHPred evidence selected has a high e-value (11). No NCBI BLAST evidence checked despite there being significant hits. CDS complement (26974 - 27132) /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="PickleBack_34" /note=Original Glimmer call @bp 27132 has strength 8.97; Genemark calls start at 27132 /note=SSC: 27132-26974 CP: yes SCS: both ST: SS BLAST-Start: [lipoprotein [Mycobacterium phage Zolita] ],,NCBI, q1:s1 98.0769% 3.27412E-25 GAP: 88 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.123, -2.6835611257758942, yes F: hypothetical protein SIF-BLAST: ,,[lipoprotein [Mycobacterium phage Zolita] ],,YP_010060830,94.2308,3.27412E-25 SIF-HHPRED: SIF-Syn: The function for this protein is lipoprotein. It has pham 28 upstream and pham 49169 downstream, similar to Lev2. /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Glimmer and GeneMark both call the start at 27132. /note=Coding Potential: There is coding potential in both GeneMarkS and GeneMark Host in the predicted open reading frame. /note=SD (Final) Score: The final score for the start site of 27132 is -2.684 with a Z-score of 3.123. Both are the best scores. /note=Gap/overlap: While there is a relatively large gap of 88bp, the other suggested start sites are less likely due to Z-score, final score, and coding potential. The gene is on the smaller side with a length of 159 bp, but this is still greater than 140bp. /note=Phamerator: As of October 4, 2022, this gene was found in pham 50004. This pham is also found in other Cluster A phages such as Alma, Albee, and Burger. A few phages have functions called for this gene, primarily with a function of lipoprotein.
However, most have no known/called function. /note=Starterator: Start site 48, which is located at 27132, is called for this gene; 29/522 of the phages call this start site. Pickleback does not have the most annotated start site, 49. /note=Location call: This gene is a real gene and it starts at 27132 with a stop at 26974. /note=Function call: It is possible that this gene encodes a lipoprotein. PhagesDB BLAST does call many with unknown function, but Bluefalcon and Zolita do call a lipoprotein with low E-values (around e-22). HHPRED calls a lipoprotein in Streptococcus pneumoniae with 46% and E-value of 0.00081. NCBI BLAST only has 1 hit that is not a hypothetical protein and calls a lipoprotein with good coverage and good E-value. /note=Transmembrane domains: While TMHMM calls 1 transmembrane domain, TOPCONS does not predict any transmembrane domains; therefore, there is insufficient evidence to state that the gene codes for a transmembrane domain. /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I have QC'ed the location and function call and agree with the annotator.
CDS complement (27221 - 27682) /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="PickleBack_35" /note=Original Glimmer call @bp 27682 has strength 15.57; Genemark calls start at 27682 /note=SSC: 27682-27221 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp58 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 1.1699E-104 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.989, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp58 [Mycobacterium phage Conspiracy] ],,YP_008857971,100.0,1.1699E-104 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Glimmer and GeneMark agree on 27682 /note=Coding Potential: Coding potential in the reverse direction is found in both GeneMark Host and Self /note=SD (Final) Score: -2.505, this is the least negative score on PECAAN /note=Gap/overlap: -11 bp, this overlap produces the longest ORF /note=Phamerator: This gene belongs to pham 28 as of 09/30/22 and is found in phages Aragog (A5) and Bluefalcon (A5). /note=Starterator: Start 66 is the start with the most MAs with 447 out of 701 non-draft genomes. This start is called in PickleBack and is found at position 27682. /note=Location call: This is a real gene with start at 27682 /note=Function call: NKF. The top results from both PhagesDB BLAST and NCBI BLAST are unknown function/hypothetical proteins. /note=Transmembrane domains: There are zero TMDs predicted in either TmHmm or TopCons. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location and function call and agree with the first annotator.
CDS complement (27672 - 27956) /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="PickleBack_36" /note=Original Glimmer call @bp 27920 has strength 7.86; Genemark calls start at 27920 /note=SSC: 27956-27672 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein X823_gp57 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 8.23427E-62 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.423, -3.790975366313636, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp57 [Mycobacterium phage Conspiracy] ],,YP_008857972,100.0,8.23427E-62 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Both Glimmer and GeneMark call the start at 27920 bp (ATG codon). However, start site 27956 (TTG codon) is a better candidate. /note=Coding Potential: Both GeneMark Host and GeneMark Self show reasonable coding potential in the reverse strand using start site 27956. /note=SD (Final) Score: -3.791. This is the best final score. /note=Gap/overlap: -4 bp. The overlap suggests this gene is part of an operon. This start site creates the longest ORF with an acceptable length of 285 bp. /note=Phamerator: Pham: 1840. Date: 10/6/2022. Conserved in AgentM, Aragog, and Benedict, all from cluster A. /note=Starterator: Start site 7 (27956 bp) is manually annotated in 27/46 non-draft genes in this pham. It is present in PickleBack but was not called by the auto-annotation. Based on this, start site 7 is the most likely start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 27956. /note=Function call: NKF. Multiple hits on phagesdb BLAST had e-values < 4e-48 but had no known function. NCBI BLAST had strong hits of hypothetical proteins with e-values lower than 2e-56 but also had no known function. CDD and HHpred did not have any significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs.
/note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location and function call and agree with the first annotator. CDS complement (27953 - 28396) /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="PickleBack_37" /note=Original Glimmer call @bp 28396 has strength 13.42; Genemark calls start at 28396 /note=SSC: 28396-27953 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp56 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 2.51946E-103 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.218, -2.4811409279978642, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp56 [Mycobacterium phage Conspiracy] ],,YP_008857973,100.0,2.51946E-103 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 28396. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.481. It is the best final score on PECAAN. /note=Gap/overlap: 19 bp gap. Too short to contain another gene. /note=Phamerator: The pham number as of 9/30/22 is 1811. The gene is conserved in phages Benedict, BlueFalcon, and Bonamassa, all in the same cluster as Pickleback. Most are in cluster A, but some are in cluster BD. /note=Starterator: This pham has 48 members and 1 draft. 30 members call start site 6, which corresponds to a start site of 28396 bp for Pickleback. It is called 96.7% of the time it is present. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 28396 bp. Starterator, Glimmer, and GeneMark all agree with each other. /note=Function call: No known function. Multiple phagesDB BLAST hits give the function as unknown with a 1e-85 e value (ex.
Archetta and Conspiracy). No HHPRED results are relevant due to high e-values, but all results here return a DUF. The best NCBI BLAST results are from multiple Mycobacterium phages, with a query cover of 100%, 100% identity, and an e-value of 2.52e-103. The hit is listed as a hypothetical protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: I agree with everything. Good. CDS complement (28416 - 30260) /gene="38" /product="gp38" /function="DNA polymerase I" /locus tag="PickleBack_38" /note=Original Glimmer call @bp 30260 has strength 13.31; Genemark calls start at 30260 /note=SSC: 30260-28416 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.989, -2.970161406017234, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Mycobacterium phage Conspiracy] ],,YP_008857974,100.0,0.0 SIF-HHPRED: Prex DNA polymerase; DNA polymerase, TRANSFERASE; 2.9A {Plasmodium falciparum (isolate 3D7)},,,5DKU_B,95.9283,100.0 SIF-Syn: DNA Polymerase I. There is synteny both upstream and downstream of the gene when compared to phage Conspiracy. The gene upstream of this gene belongs to pham 1811 and the gene downstream belongs to pham 50147 in both Pickleback and Conspiracy. /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark both call the same start site (30260). /note=Coding Potential: The coding potential for this ORF is on the reverse strand only, thus confirming that this is a reverse gene. Coding potential is found for this ORF for both GeneMark Self and Host-Trained formats. /note=SD (Final) Score: The best final score is -2.970 with a Z-score higher than 2 (2.989). This final score is for the start site 30260. /note=Gap/overlap: 13 bp gap. This gap is not large enough to fit another gene (<100bp).
The gap is also conserved in other related phages. /note=Phamerator: 49976 (as of 10/04/2022). This pham has 1530 members, 109 of which are drafts. The phages within this pham most frequently belong to cluster A. /note=Starterator: Start site 211 (30260) is manually annotated in 778 phages. It is the most annotated start site, and is called 99% of the time it is present. The start codon is ATG. /note=Location call: This is a real gene with a start site at 30260. This is called by Glimmer, GeneMark, and Starterator. /note=Function call: DNA Polymerase I. There is strong evidence from PhagesDB and NCBI BLAST with high coverage (~97%) and low e-values (0). HHPred had 3 significant hits which call the same function (100% probability, ~97% coverage, and e-value of 0). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with the location and function call. It might be helpful to note that the phages within the pham also call DNA Polymerase I and to include the total number of phages in the pham under Starterator. CDS complement (30274 - 30660) /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="PickleBack_39" /note=Original Glimmer call @bp 30660 has strength 13.71; Genemark calls start at 30660 /note=SSC: 30660-30274 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp54 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 8.80885E-89 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.218, -2.4811409279978642, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp54 [Mycobacterium phage Conspiracy] ],,YP_008857975,100.0,8.80885E-89 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and GeneMark both call the start at 30660.
/note=Coding Potential: Coding potential in the ORF is in the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in Host-trained and Self-trained GeneMark. /note=SD (Final) Score: -2.481. This is the best score on PECAAN and covers the greatest open reading frame and coding potential. /note=Gap/overlap: Upstream overlap: 4 bp. This overlap is acceptable and suggests that this gene is part of an operon. Downstream gap: 14 base pairs. This is below the 50 bp threshold and therefore cannot fit another gene. This gap is considered acceptable. /note=Starterator: As of 9/30/22, start site 31 is manually annotated in 37 non-draft genes in this pham. It is the second-most called start site. Start 31 is 30660 in Pickleback. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Phamerator: Pham: 50147. Date 09/30/22. It is conserved; it is found in Phlorence (A) and Lev2 (A). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 30660. /note=Function call: Unknown function. The top three phagesdb BLAST hits have unknown function (E-value <10^-69), and the top five NCBI BLAST hits are also proteins of unknown function (100% coverage, 98%+ identity, and E-value <10^-88). HHPred’s top hit is also a protein with no known function (coverage = 98%, probability = 99.9%, e-value < 10^-20). /note=Transmembrane domains: No transmembrane domains are detected. TMHMM and TOPCONS do not detect any transmembrane domains. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC: I agree with this annotation. Given the lack of evidence for a specific protein, NKF is the most logical function call here.
CDS complement (30657 - 30920) /gene="40" /product="gp40" /function="helix-turn-helix DNA binding domain" /locus tag="PickleBack_40" /note=Original Glimmer call @bp 30881 has strength 4.49; Genemark calls start at 30920 /note=SSC: 30920-30657 CP: yes SCS: both-gm ST: SS BLAST-Start: [HTH DNA binding domain protein [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 8.88922E-55 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.837, -3.2925703904164987, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[HTH DNA binding domain protein [Mycobacterium phage Jovo] ],,YP_008859065,100.0,8.88922E-55 SIF-HHPRED: DNA-invertase; helix-turn-helix, site-specific recombinase, RECOMBINATION; 3.51A {Enterobacteria phage Mu},,,3UJ3_X,90.8046,96.8 SIF-Syn: /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation start source: GeneMark calls the start site at 30920 bp but Glimmer calls the start site at 30881bp. /note=Coding Potential: The ORF has strong coding potential and the start site 30920 covers all of the coding potential, whereas the start site of 30881 chosen by Glimmer misses a small area. The coding potential is only in the reverse direction. The gene is 264 bp in length with 30920 as the start site and 225bp in length with 30881 as the start site. Also, the start @30881 has a start codon of TTG, which is less favorable than the ATG start codon @30920, which is usually preferred. /note=SD (Final) Score: The final score @start 30920 is -3.293, which is also the least negative score, and the Z-score is 2.837. /note=Gap/overlap: The overlap with the upstream gene is 8 bp (-8 bp gap), which is not ideal but is still not too large. /note=Phamerator: The Pham number as of October 6, 2022, is 50100. This pham has 231 members. The gene is conserved in many phages such as AgentM and Lev2, to name a few. The function called for those phages is helix-turn-helix DNA binding domain.
/note=Starterator: The analysis was run on 09/30/22. Pham number 50100 has 231 members, 12 of which are drafts. Start site number 28 @30920 has 29 MAs, which strongly supports the other data found. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 30920 bp despite originally being called @30881. 30920 covers the whole potential coding area, has a better final score and Z-score, and is manually annotated much more than the other start site. /note=Function call: The function called for this gene is helix-turn-helix DNA-binding domain protein. Multiple PhagesDB BLAST hits have the function of helix-turn-helix DNA-binding domain protein with small e-values of 5e-42. HHPRED has an alignment with 3UJ3_X with a probability of 96.8%, 90.8% coverage, and an e-value of 0.0396, which is not very small but is the best hit. NCBI BLAST hits also confirm the helix-turn-helix DNA-binding domain protein function, with an e-value of 8.8e-55 and probability and coverage of 100%. CDD as well as other databases have hits with the same function, even though the hits are not very strong in terms of coverage and identity. /note=Transmembrane domains: TMHMM predicts no TMD. TOPCONS also predicts no TMD. Based on this evidence, this gene does not encode a transmembrane protein. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator.
*might want to check the box for CDD hit, since hit makes sense for this gene function even if it doesn't match cutoffs* CDS complement (30913 - 31632) /gene="41" /product="gp41" /function="ThyX-like thymidylate synthase" /locus tag="PickleBack_41" /note=Original Glimmer call @bp 31632 has strength 12.71; Genemark calls start at 31632 /note=SSC: 31632-30913 CP: yes SCS: both ST: NI BLAST-Start: [ThyX-like thymidylate synthase [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 1.64632E-177 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.218, -2.4811409279978642, yes F: ThyX-like thymidylate synthase SIF-BLAST: ,,[ThyX-like thymidylate synthase [Mycobacterium phage Jovo] ],,YP_008859066,100.0,1.64632E-177 SIF-HHPRED: Thymidylate synthase thyX; ThyX, FAD, FdUMP, Flavoprotein, Methyltransferase, Nucleotide biosynthesis, Transferase, Structural Genomics, Seattle Structural Genomics Center for Infectious; HET: UFP, FAD; 1.9A {Mycobacterium tuberculosis},,,3GWC_B,98.7448,100.0 SIF-Syn: This gene shows synteny with phages Aragog and Cuco, in which a HTH DNA binding domain protein (pham 50100) is upstream and a gene in pham 48686 is downstream. /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start@31632 ATG (Glimmer and Genemark agree) /note=Coding Potential: Both GenemarkS and the host-trained Genemark show very good coding potential for the region between 30913bp and 31362bp with no coding potential on other ORFs for that region. /note=SD (Final) Score: SD = -2.481, Z = 3.218, both scores are maximized. /note=Gap/overlap: 55bp gap, this is within the acceptable range. /note=Phamerator: Pham 43112 on 10.6.22. This pham has 599 members, most of which are cluster A, with a notably large number of cluster CA phages and a few from other clusters such as EN, CQ, DS, DF, and DD. There are many singletons in this pham as well. 
/note=Starterator: Start site 40 (236 MA’s out of 568 manually annotated genes) is present in 45.5% of the genes in this pham and it is called 89.7% of the time it is present. The most annotated start site is not present in this gene. /note=Location call: Start site @31632 because Glimmer and Genemark agree, host-trained Genemark and GenemarkS both show good coding potential for the entire expanse of the predicted gene, and the start site has high SD/Z scores. Also, the start site is called at a high rate. /note=Function call: ThyX-like thymidylate synthase. PhagesDB BLAST (E = 1e-137), NCBI BLASTp (E = 1.6e-177), HHPRED (E = 1.1e-38), and CDD (E = 0) all agree on this gene’s function as a ThyX-like thymidylate synthase, with several matching hits and very low e-values supporting this function. /note=Transmembrane domains: None. TMHMM and TOPCONS predict no TMDs. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: Only 2-3 evidence boxes should be checked. Starterator box needs to be checked. More detail should be provided in the function call. Everything else looks good. CDS complement (31688 - 32212) /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="PickleBack_42" /note=Original Glimmer call @bp 32212 has strength 11.35; Genemark calls start at 32212 /note=SSC: 32212-31688 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein AVV40_gp49 [Mycobacterium phage Swirley] ],,NCBI, q1:s1 100.0% 4.41514E-116 GAP: 25 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.699, -3.8898912632369362, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein AVV40_gp49 [Mycobacterium phage Swirley] ],,YP_009208930,98.2857,4.41514E-116 SIF-HHPRED: SIF-Syn: There is synteny with Aragog and Cuco, as pham 27 encoding ribonucleotide reductase is upstream and pham 43112 encoding ThyX-like thymidylate synthase is downstream. 
/note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Glimmer and GeneMark both call the start site at 32212. /note=Coding Potential: There is high coding potential in both GeneMark Host and Self through the denoted open reading frame. /note=SD (Final) Score: The final score is -3.890 with a Z-score of 2.699. Both are the best scores. /note=Gap/overlap: The gap is 25bp with a gene length of 525bp. This is a reasonable gap, especially compared to other possible start sites which would give gaps of 150bp or greater. /note=Phamerator: As of October 5, 2022, the gene is found in pham 48646. This pham is also included in cluster A to which Pickleback belongs to, such as Wander, Wile, and LittleCherry. There was no agreed upon function for this gene. /note=Starterator: While Pickleback does not contain the most annotated start site for this pham (which is start site 15 called in 127/324 genes in the pham), it does contain start site 12 which starts at 32212 and has 19/324 MA’s. This supports the auto-annotated start site and final scores. /note=Location call: This is a real gene and the start site that is most likely is at 32212. /note=Function call: There is no known function for this gene. PhagesDB BLAST does not yield any phages with called function, and HHPRED does not yield any results with good e-values or coverage. NCBI BLAST does confirm that this gene does code for a hypothetical protein. CDD does have one hit for a DNA polymerase III subunit with an e-value of 0.000025, but the coverage and identity are too low to be significant. /note=Transmembrane domains: No transmembrane domains are predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Mehra, Muskaan /note=Secondary Annotator QC: No NCBI BLAST evidence checked. 
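The gene lengths and gap/overlap values quoted in the Gap/overlap notes can be checked against the CDS coordinates. The following is a minimal Python sketch, assuming 1-based inclusive coordinates and the convention that a negative gap denotes an overlap; the function names are illustrative and are not part of PECAAN or any annotation tool.

```python
def gene_length(left: int, right: int) -> int:
    """Length in bp of a feature given 1-based inclusive coordinates."""
    return right - left + 1


def gap_between(right_of_lower: int, left_of_higher: int) -> int:
    """Number of bases between two adjacent features.

    A negative value means the features overlap by that many bases.
    """
    return left_of_higher - right_of_lower - 1


# Gene 42 (31688 - 32212) and its neighbour gene 43 (32238 - 34307)
print(gene_length(31688, 32212))   # 525
print(gap_between(32212, 32238))   # 25

# Gene 43 (32238 - 34307) and gene 44 (34304 - 34474) overlap by 4 bp
print(gap_between(34307, 34304))   # -4
```

These values agree with the 525 bp length and 25 bp gap recorded for gene 42 and the -4 bp gap recorded for gene 43.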
CDS complement (32238 - 34307) /gene="43" /product="gp43" /function="ribonucleotide reductase" /locus tag="PickleBack_43" /note=Original Glimmer call @bp 34307 has strength 9.9; Genemark calls start at 34307 /note=SSC: 34307-32238 CP: yes SCS: both ST: SS BLAST-Start: [ribonucleoside-triphosphate reductase, adenosylcobalamin-dependent [Mycobacterium phage Jovo] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.218, -2.4811409279978642, yes F: ribonucleotide reductase SIF-BLAST: ,,[ribonucleoside-triphosphate reductase, adenosylcobalamin-dependent [Mycobacterium phage Jovo] ],,YP_008859068,100.0,0.0 SIF-HHPRED: RNR_II_monomer; Class II ribonucleotide reductase, monomeric form. Ribonucleotide reductase (RNR) catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides.,,,cd01676,98.2583,100.0 SIF-Syn: /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Glimmer and GeneMark agree on 34307 /note=Coding Potential: Both GeneMark Host and Self show coding potential in the reverse direction /note=SD (Final) Score: -2.481; this is the best score on PECAAN /note=Gap/overlap: -4 bp, this is likely part of an operon. This overlap also produces the LORF /note=Phamerator: This gene is part of Pham 27, as of 09/30/22. This gene is also found in Aragog (A5) and Bluefalcon (A5) /note=Starterator: Start 47 has the most MAs, being called in 441 out of 705 non-draft genomes. Start 47 is called in PickleBack at position 34307 /note=Location call: This is a real gene at position 34307 /note=Function call: Ribonucleotide reductase: the top results on PhagesDB BLAST, NCBI BLAST, and HHPRED that match ribonucleotide reductase all have e-values of zero. /note=Transmembrane domains: There are no TMDs predicted in either TMHMM or TOPCONS /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I have reviewed the functional and locational calls of this gene and believe them to be plausible. 
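The SD (Final) Score notes restate values that already appear in each record's RBS field. As a minimal Python sketch, the field can be split into its five comma-separated parts. The key names below are assumptions chosen for illustration: the numeric order (Z score, then final score) is inferred from the annotator notes, and the meaning of the trailing yes/no flag is not stated in these notes.

```python
def parse_rbs(field: str) -> dict:
    """Split the value of an 'RBS:' field into its five comma-separated parts."""
    matrix, spacing, z_score, final_score, flag = [p.strip() for p in field.split(",")]
    return {
        "scoring_matrix": matrix,          # e.g. "Kibler 6" (key name is an assumption)
        "spacing_matrix": spacing,         # e.g. "Karlin Medium" (key name is an assumption)
        "z_score": float(z_score),
        "final_score": float(final_score),
        "flag": flag,                      # meaning not stated in the notes; kept verbatim
    }


# RBS field of gene 43
rbs = parse_rbs("Kibler 6, Karlin Medium, 3.218, -2.4811409279978642, yes")
print(rbs["z_score"], round(rbs["final_score"], 3))  # 3.218 -2.481
```

The rounded final score matches the -2.481 quoted in the SD (Final) Score note for gene 43.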
CDS complement (34304 - 34474) /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="PickleBack_44" /note=Original Glimmer call @bp 34474 has strength 15.7; Genemark calls start at 34474 /note=SSC: 34474-34304 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein O156_gp49 [Mycobacterium phage LittleCherry] ],,NCBI, q1:s1 100.0% 4.70511E-33 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.365, -4.824049307458754, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein O156_gp49 [Mycobacterium phage LittleCherry] ],,YP_008430703,100.0,4.70511E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Both Glimmer and GeneMark call the start at 34474 bp (ATG codon). /note=Coding Potential: Both GeneMark Host and GeneMark Self show reasonable coding potential in the reverse strand. /note=SD (Final) Score: -4.824. Not the best final score, but this is less important since the gene is part of an operon. /note=Gap/overlap: -1 bp. This start site creates the longest ORF with an acceptable length of 171 bp. /note=Phamerator: Pham: 39644. Date: 10/6/2022. Conserved in Airmid, Anthony, and Aragog, all from cluster A. /note=Starterator: Start site 40 is manually annotated in 62/602 non-draft genes in this pham. It is called 100% of the time when present. This start site agrees with the start site called by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 34474. /note=Function call: NKF. Multiple hits on phagesdb BLAST had e-values < 2e-27 but had no known function. NCBI BLAST had strong hits to hypothetical proteins with e-values < 5e-32 but also had no known function. CDD and HHpred did not have any significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS complement (34474 - 34710) /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="PickleBack_45" /note=Original Glimmer call @bp 34710 has strength 12.68; Genemark calls start at 34710 /note=SSC: 34710-34474 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp48 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 6.25796E-48 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.489, -3.50848394673489, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp48 [Mycobacterium phage Conspiracy] ],,YP_008857981,100.0,6.25796E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 34710. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.508 It is the best final score on PECAAN. /note=Gap/overlap: -8. This start position is well supported by Glimmer, GeneMark, and Starterator. This overlap is also seen in phage Conspiracy. /note=Phamerator: The pham number as of 9/30/22 is 50136. The gene is conserved in phages Anon, MaryWells, and PetterN all in the same cluster as Pickleback. The genes in this pham are in multiple different clusters, but the majority are in cluster E. /note=Starterator: This pham has 194 members and 16 drafts. 37 members call start site 36, which correlates to a start site of 34710 bp for Pickleback. It is called 100% of the time it is present. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 34710 bp. Starterator, Glimmer, and GeneMark all agree with each other. /note=Function call: No known function. Multiple phagesDB BLAST hits give the function as unknown with a 3e-38 e value (ex. AgentM and Aragog). No HHPRED results are relevant due to high e value, but all results here return a DUF. 
The best result from NCBI BLAST is from multiple mycobacterium phages, which has a query cover of 100%, 100% identity, and an e value of 6.26e-48. It is listed as a hypothetical protein. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: “I have QC’ed this location and function call and agree with the first annotator.” CDS complement (34703 - 34798) /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="PickleBack_46" /note=Genemark calls start at 34798 /note=SSC: 34798-34703 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein AVT30_gp45 [Mycobacterium phage UnionJack] ],,NCBI, q1:s1 100.0% 3.56247E-12 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.51, -5.070883472086901, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein AVT30_gp45 [Mycobacterium phage UnionJack] ],,YP_009198820,93.9394,3.56247E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Genemark only. Genemark calls the start at 34798. /note=Coding Potential: Coding potential in the ORF is in the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in Host-trained and Self-trained Genemark. /note=SD (Final) Score: -5.071. This is the second best score on PECAAN and covers the greatest open reading frame and coding potential. /note=Gap/overlap: Upstream overlap: 1 bp. This overlap is acceptable. Downstream gap: 7 base pairs. This is below the 50 bp threshold and therefore cannot fit another gene. This gap is considered acceptable. /note=Starterator: Start site 4 in Starterator was manually annotated on 9/30/22 in 37 non-draft genes in this pham. It is the most called start site. Start 4 is 34798 in Pickleback. This evidence agrees with the site predicted by GeneMark. /note=Phamerator: Pham: 43495. Date 09/30/22. 
It is conserved; it is found in Phlorence (A) and Lev2 (A). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 34798. /note=Function call: Unknown function. The top three phagesdb BLAST hits have unknown function (E-value <10^-12), and 3 out of 5 top NCBI BLAST hits are also proteins of unknown function (100% coverage, 93%+ identity, and E-value <10^-11). HHPred and CDD do not offer insightful evidence for function. /note=Transmembrane domains: No transmembrane domains are detected. TMHMM and TOPCONS do not detect any transmembrane domains. /note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: Looks great Pearl! CDS complement (34798 - 35424) /gene="47" /product="gp47" /function="DNA binding protein" /locus tag="PickleBack_47" /note=Original Glimmer call @bp 35424 has strength 11.9; Genemark calls start at 35424 /note=SSC: 35424-34798 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 2.18968E-151 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.94, -5.729349449255814, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Mycobacterium phage Conspiracy] ],,YP_008857983,100.0,2.18968E-151 SIF-HHPRED: RNA polymerase sigma-E factor; regulation, DNA-BINDING, TRANSMEMBRANE, TRANSCRIPTION; 2.0A {Escherichia coli} SCOP: a.4.13.2, a.177.1.1, l.1.1.1,,,1OR7_B,75.4808,99.7 SIF-Syn: There is synteny both upstream and downstream of the gene when compared to phage Conspiracy. The gene upstream of this gene belongs to pham 43495 and the gene downstream belongs to pham 18 in Pickleback and Conspiracy. Conspiracy, however, calls no known function for this gene. /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark. Both call the same start site (35424) /note=Coding Potential: The coding potential for this ORF is on the reverse strand only, thus confirming that this is a reverse gene. 
Coding potential is found for this ORF for both GeneMark Self and Host-Trained formats. /note=SD (Final) Score: The best final score is -3.430 with a Z-score higher than 2 (2.914). However, since GeneMark, Glimmer and starterator all call the start site as 35424 with compelling evidence (54 manual annotations), it is likely that this is the real start site. It has a final score of -5.729 and Z-score of 1.94 /note=Gap/overlap: -4bp. This gap indicates that this gene is part of an operon. /note=Phamerator: 151 (as of 10/04/2022). This pham has 305 members, 15 of which are drafts. The phages within this pham most frequently belong to cluster A. /note=Starterator: Start site 13 (35424) is manually annotated in 54 phages. It is however not the most annotated start site, and is called 58.2% of the time it is present. The start codon is TTG. /note=Location call: This is a real gene with a start site at 35424. This is called by Glimmer, GeneMark and starterator. /note=Function call: DNA binding protein. There is strong evidence from PhagesDB and NCBI Blast with high coverage (~100%) and low e-values (100bp) and there is coding potential observed here in both host trained and self-trained formats. When you compare this phage genome to other related genomes, there is a gene present in this gap. Therefore, a gene needs to be added. /note=Phamerator: 50676 (as of 10/06/2022). This pham has 20 members, 2 of which are drafts. The phages within this pham all belong to cluster A. /note=Starterator: Start site 7 (38756) is manually annotated in 13 phages. It is the most annotated start site, and is called 78.9% of the time it is present. The start codon is TTG. /note=Location call: This is a real gene with a start site at 38756. This is called by Glimmer, GeneMark and starterator. /note=Function call: NKF. There is strong evidence from PhagesDB and NCBI Blast with high coverage (~100%) and low e-values ( 1). 
/note=Transmembrane domains:Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: “I have QC’ed this location and function call and agree with the first annotator.” CDS complement (38677 - 38901) /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="PickleBack_56" /note= /note=SSC: 38901-38677 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein CL86_gp042 [Mycobacterium phage SkiPole] ],,NCBI, q1:s1 67.5676% 1.7021E-23 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.989, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein CL86_gp042 [Mycobacterium phage SkiPole] ],,YP_009019117,75.4098,1.7021E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: No start predicted by Glimmer and GeneMark. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.443. It is the best final score on PECAAN and corresponds to the LORF. /note=Gap/overlap: -1, indicative of operon. /note=Phamerator: No pham is listed but Pham Maps shows that the corresponding gene in other phages in this cluster is pham 47433 (Aragog, Conspiracy, Phlorence). PhagesDB also shows that many similar genes are this pham. Genes in this pham belong to cluster A phages. /note=Starterator: N/A. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 38901 bp. This is part of the gene candidates. Synteny for this gene shows that this gene in other phages in this cluster share the overlap of 1, making this the most likely candidate for start position. /note=Function call: No known function. Multiple phagesDB BLAST hits give the function as unknown with a 7e-19 e value (ex. AgentM and Archetta). 
There are no significant HHPred results, due to high e values. The best result from NCBI BLAST is from multiple mycobacterium phages, which has a query cover of 67%, 75% identity, and an e value of 1.7e-23. It is listed as a hypothetical protein. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I have reviewed the location and function calls for this gene and I agree with the primary annotator. CDS complement (38901 - 39125) /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="PickleBack_57" /note=Original Glimmer call @bp 39125 has strength 12.23; Genemark calls start at 39125 /note=SSC: 39125-38901 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp36 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 1.50393E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.995, -4.702334767722353, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp36 [Mycobacterium phage Conspiracy] ],,YP_008857993,100.0,1.50393E-45 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and Genemark. Both call the start at 39125. /note=Coding Potential: Coding potential in the ORF is in the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in Host-trained and Self-trained Genemark. /note=SD (Final) Score: -4.702. This is the best score on PECAAN and covers the greatest open reading frame and coding potential. /note=Gap/overlap: Upstream overlap: 4 bp. This overlap is acceptable and suggests that this gene is part of an operon. Downstream gap: 78 base pairs This gap is under 100 base pairs and is acceptable. /note=Starterator: Start site 1 in Starterator was manually annotated on 9/30/22 in 12 non-draft genes in this pham. It is the most called start site. Start 1 is 39125 in Pickleback. 
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Phamerator: Pham: 5051. Date 09/30/22. It is conserved; it is found in Phlorence (A) and Lev2 (A). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39125. /note=Function call: Unknown function. The top three phagesdb BLAST hits have unknown function (E-value <10^-35), and the 3 top NCBI BLAST hits are also proteins of unknown function (100% coverage, 100% identity, and E-value <10^-44). HHPred and CDD do not offer insightful evidence for function. /note=Transmembrane domains: No transmembrane domains are detected. TMHMM and TOPCONS do not detect any transmembrane domains. /note=Secondary Annotator Name: Kidd, Conner /note=Secondary Annotator QC: Looks good. Just one thing about the gap/overlap section: Doesn't a gap of over 120bp, not 50bp, suggest that a gene may be present? That is all. CDS complement (39122 - 39256) /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="PickleBack_58" /note=Original Glimmer call @bp 39244 has strength 11.05; Genemark calls start at 39244 /note=SSC: 39256-39122 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein X823_gp35 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 9.73793E-22 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.218, -2.4811409279978642, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp35 [Mycobacterium phage Conspiracy] ],,YP_008857994,100.0,9.73793E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation: Glimmer and Genemark both call the start site to be at 39244. /note=Coding Potential: The gene does have good coding potential in the Host-Trained GeneMark in the reverse direction and must be a real gene. The start at 39256 can be a better option considering the host-trained and self-trained GeneMark output. The gene is 135 bp long with the start site @39256. 
/note=SD (Final) Score: The start site at 39244 has a final score of -4.273 and a Z score of 2.9743; both follow the guidelines. The start site @39256 has a final score of -2.481 and a Z score of 3.218. The scores for start @39256 are closer to the guidelines. /note=Gap/overlap: The gap for the start site of 39244 is 31 bp with the upstream gene. The gap for start @39256 is 19 bp, which is better. /note=Phamerator: Pham number 1359 as of (10/09/2022). This gene with a start site of 39244 is conserved and has synteny with AgentM and Lev2. /note=Starterator: (analysis was run on 09/23/22) Pham number 1359 has 68 members, 4 are drafts. Start site 8 @39256 has 62 MAs, whereas start site 9 @39244 has only one MA. This data strongly supports the start site being @39256 rather than the auto-annotated start site at 39244. /note=Location call: start site 39256. Based on the information obtained from Starterator, the host-trained and self-trained GeneMark output, and the scores, I will call start site 39256 to be the correct start site for this gene. /note=Function call: NKF. Many hits on phagesdb have NKF for this gene with an e-value of 1e-18. No data on HHPRED. NCBI BLAST found it to be a hypothetical protein, with e-value 9.73793e-22 and identity and coverage of 100%. CDD has no info. /note=Transmembrane domains: 0 reported on TOPCONS and TMHMM, not a transmembrane protein /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with the location and function call, but the drop down menus need to be updated to reflect that. Update synteny box! 
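The comparison of the two candidate start sites for gene 58 can be laid out as data. The Python sketch below is only an illustration of that comparison: the ordering rule (manual annotations first, then final score, then Z score) is an assumption made for this example and is not an official annotation rule, and the class and function names are hypothetical. The values are those quoted in the notes for gene 58.

```python
from dataclasses import dataclass


@dataclass
class StartCandidate:
    position: int
    final_score: float        # least negative is best
    z_score: float            # higher is better
    manual_annotations: int   # Starterator MA count


def rank(candidates):
    """Order candidates best-first under the assumed illustrative rule."""
    return sorted(
        candidates,
        key=lambda c: (c.manual_annotations, c.final_score, c.z_score),
        reverse=True,
    )


candidates = [
    StartCandidate(39244, -4.273, 2.9743, 1),   # auto-annotated start
    StartCandidate(39256, -2.481, 3.218, 62),   # start chosen by the annotator
]
best = rank(candidates)[0]
print(best.position)  # 39256
```

Under this assumed ordering, the start at 39256 ranks first on every criterion, which is consistent with the location call recorded for this gene.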
CDS complement (39276 - 39734) /gene="59" /product="gp59" /function="endonuclease VII" /locus tag="PickleBack_59" /note=Original Glimmer call @bp 39734 has strength 7.04; Genemark calls start at 39734 /note=SSC: 39734-39276 CP: yes SCS: both ST: SS BLAST-Start: [endonuclease VII domain-containing protein [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 9.7591E-108 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.837, -3.6727816321281046, no F: endonuclease VII SIF-BLAST: ,,[endonuclease VII domain-containing protein [Mycobacterium phage Conspiracy] ],,YP_008857995,100.0,9.7591E-108 SIF-HHPRED: Restriction endonuclease Hpy99I; ENDONUCLEASE-DNA COMPLEX, RESTRICTION ENZYME, HPY99I, PSEUDOPALINDROME, HYDROLASE-DNA COMPLEX; HET: 1PE; 1.5A {Helicobacter pylori},,,3GOX_A,86.8421,99.9 SIF-Syn: This gene shows synteny with phage ForGetIt in which a gene in pham 49983 (endonuclease VII) is in between a gene in pham 1359 (upstream) and a gene in pham 5655 (downstream). /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: Start @39734 ATG (Glimmer and Genemark agree) /note=Coding Potential: There is good coding potential predicted for this region by GenemarkS; however, this coding potential is very close to, and nearly overlapping with, coding potential for another gene on the same ORF. Host-trained Genemark predicts relatively poor coding potential for this region; however, the issue of overlap does not appear here, and this is the only region of the genome with coding potential on any ORF /note=SD (Final) Score: SD = -3.673, Z-score = 2.837. Both scores are maximized by this start site. /note=Gap/overlap: 0bp. This would explain the appearance of continuous coding potential on GenemarkS. /note=Phamerator: Pham 49983 on 10.6.22. This pham has 867 members, the majority of which are cluster A; however, there are significant numbers of Cluster CA and CQ phages with a smattering of J, E, O, and K phages. 
/note=Starterator: Start site 54 is only found in 22 out of 867 genes in this pham (2.5%) but is manually annotated 100% of the time that it is present. This gene does not contain the most-annotated start site for this pham. /note=Location call: Start @39734 due to moderately favorable coding potential, agreement between Glimmer and Genemark, favorable SD and Z scores, and the resolution of possibly questionable coding potential by the gap/overlap. /note=Function call: Endonuclease VII. All sources agree that this gene is endonuclease VII. NCBI BLASTp, PhagesDB BLAST, CDD, and HHPRED all show very low e-values and high probability that this gene is endonuclease VII. /note=Transmembrane domains: None. TMHMM and TOPCONS predict 0 TMDs. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The 0bp gap may mean this gene is part of an operon. CDS complement (39735 - 39890) /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="PickleBack_60" /note=Original Glimmer call @bp 39890 has strength 3.67; Genemark calls start at 39890 /note=SSC: 39890-39735 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp33 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 3.60032E-30 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.652, -3.5120652346393775, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp33 [Mycobacterium phage Conspiracy] ],,YP_008857996,100.0,3.60032E-30 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Glimmer and GeneMark both call the start site at 39890. /note=Coding Potential: There is high coding potential predicted by both Host-Trained and Self GeneMark throughout the ORF. /note=SD (Final) Score: The final score of start site at 39890 is -3.512 with a Z-score of 2.652. 
These scores are not the best scores; the best score is for start site of 39740, with a final score of -2.845 and Z-score of 2.837. However, this start site causes the gene length to be 6bp, so we can disregard this. /note=Gap/overlap: There is an 11bp overlap resulting in a 156bp long gene. While the 11bp overlap is unusual, the other start site options are unreasonable. For example, there is a start site which results in a 26bp overlap or a 79bp gap, which are both extremely large. The gene size of 156bp is also reasonable. /note=Phamerator: As of October 5, 2022, this gene is found in pham 5655, which has 11 members. All members of the pham belong to Cluster A, just like Pickleback. No function is called for any of the genes. /note=Starterator: The most annotated start site for pham 5655 is start site 2, which is called in 9/10 of the non-draft genes in the pham. Pickleback does call this start site 2, which corresponds to a start at 39890. /note=Location call: This gene is a real gene and is located at 39890 to 39735. /note=Function call: The top hits for PhagesDB BLAST all have relatively low e-values around e-26, but these phages have unknown functions for this gene. Similarly, NCBI BLAST has results for hypothetical proteins only. HHPRED and CDD have no relevant results. Therefore, this gene has no known function at the moment. /note=Transmembrane domains: No transmembrane domains are predicted by TMHMM or TOPCONS, therefore it is most likely not a membrane protein. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
*only need 2-3 evidence for phagesdb blast, leave synteny box blank since gene is NKF* CDS complement (39880 - 39978) /gene="61" /product="gp61" /function="membrane protein" /locus tag="PickleBack_61" /note=Genemark calls start at 39978 /note=SSC: 39978-39880 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein X823_gp32 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 1.03828E-12 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.018, -5.165218170325335, no F: membrane protein SIF-BLAST: ,,[hypothetical protein X823_gp32 [Mycobacterium phage Conspiracy] ],,YP_008857997,100.0,1.03828E-12 SIF-HHPRED: SIF-Syn: This gene is present in phages Aragog (A5) and Bluefalcon (A5); however, it is not identified as a membrane protein in those genomes. In phage Bonamassa (A5) this gene is identified as a transmembrane protein. /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Only GeneMark at 39978 /note=Coding Potential: There is strong coding potential in the reverse direction in both GeneMark Host and Self. /note=SD (Final) Score: -5.165; this is not the best score on PECAAN /note=Gap/overlap: +72 bp; although this gap produces the LORF, the length of this gene is 99 bp, which is shorter than the typical 120 bp length minimum /note=Phamerator: This gene is part of pham 2483 as of 09/30/22. Phages Aragog (A5) and Bluefalcon (A5) also include this gene. Almost all other genes in this pham also have a length of 99 bp. /note=Starterator: Start 3 has the most MAs, being called in 32 out of 33 non-draft genomes. Start 3 is present in PickleBack at position 39978. /note=Location call: This is a real gene with start 3 at 39978 /note=Function call: Membrane protein: the top hit on HHPRED has a probability of 78.1, percent coverage of 71.875, and e-value of 3. Although this e-value is not ideal, this top hit was for a transmembrane protein, which is consistent with the TMDs predicted in TMHMM and SOSUI. 
/note=Transmembrane domains: One TMD is predicted by TMHMM, TOPCONS, and SOSUI. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: The HHPRED result checked does not have a significant e-value. Include detail on how TMHMM and TOPCONS have led to this function call. CDS complement (39971 - 40054) /gene="62" /product="gp62" /function="membrane protein" /locus tag="PickleBack_62" /note= /note=SSC: 40054-39971 CP: yes SCS: neither ST: SS BLAST-Start: [hypothetical protein X823_gp31 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 3.64707E-8 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.318, -3.935403931688939, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein X823_gp31 [Mycobacterium phage Conspiracy] ],,YP_008857998,100.0,3.64707E-8 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is in Pham 2483, downstream is in Pham 51478, just like in phages Jovo, Lev2, and Cuco. /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Neither Glimmer nor GeneMark predicts a start site for this gene. /note=Coding Potential: Reasonable coding potential in this ORF is on the reverse strand only (found in both GeneMark Self and Host), indicating that this is a reverse gene. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.935. This is the best final score on PECAAN. /note=Gap/overlap: -4 bp. An overlap of 4 bp suggests this gene is part of an operon. The gene length of 84 bp is shorter than the 120 bp expected of protein-coding genes. However, Pham Maps shows that this gene length is found in other phages such as Jovo and Cuco. /note=Phamerator: Pham not listed. Pham Maps shows that the corresponding gene is found in phages Phlorence, Lev2, and Jovo, which are in pham 3148. PhagesDB shows that many similar genes are in pham 3148. Genes in this pham belong to cluster A phages. 
/note=Starterator: N/A /note=Location call: Considering all of the evidence above, this is a real gene and has a start site at 40054 bp. /note=Function call: Membrane protein. Multiple hits on PhagesDB BLAST had e-values < 2e-8 but had no known function. NCBI BLAST had strong hits to hypothetical proteins with e-values < 4e-16 but also had no known function. No significant hits for CDD and HHpred. TMHMM and TOPCONS each predict 1 TMD. /note=Transmembrane domains: TMHMM predicts just one TMD. TOPCONS also predicts one TMD. Based on this evidence, this gene can be assumed to have a real TMD and is therefore a “membrane protein”. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I have reviewed this gene and agree with the evidence and conclusions presented by the primary annotator. CDS complement (40051 - 40203) /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="PickleBack_63" /note=Original Glimmer call @bp 40203 has strength 9.0; Genemark calls start at 40203 /note=SSC: 40203-40051 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp30 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 5.9243E-27 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.226, -6.343734332103373, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp30 [Mycobacterium phage Conspiracy] ],,YP_008857999,100.0,5.9243E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Both Glimmer and GeneMark call the start at 40203 (ATG codon). /note=Coding Potential: GeneMark Host shows very little coding potential, but GeneMark Self shows reasonable coding potential on the reverse strand. /note=SD (Final) Score: -6.344. Not the best final score on PECAAN, but this is overlooked since the overlap suggests the gene is part of an operon. /note=Gap/overlap: -4 bp. The overlap of 4 bp suggests the gene is part of an operon. Acceptable gene length of 153 bp with the longest ORF. /note=Phamerator: Pham: 50104. Date: 10/6/2022. 
Conserved in AgentM, Airmid, and Aragog, all from cluster A. /note=Starterator: Start site 33 in Starterator was manually annotated in 206/225 non-draft genes in this pham. Start 33 is 40203 in PickleBack. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on evidence above, this is a real gene. Most likely start site is 40203. /note=Function call: NKF. Multiple hits on PhagesDB BLAST had e-values < 2e-21 but had no known function. NCBI BLAST had strong hits to hypothetical proteins with e-values < 1e-25 but also had no known function. No significant hits for CDD and HHpred. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Mehra, Muskaan /note=Secondary Annotator QC: All good! CDS complement (40200 - 40589) /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="PickleBack_64" /note=Original Glimmer call @bp 40589 has strength 14.85; Genemark calls start at 40589 /note=SSC: 40589-40200 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein X823_gp29 [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 5.92166E-89 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.229, -2.45827804503836, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein X823_gp29 [Mycobacterium phage Conspiracy] ],,YP_008858000,100.0,5.92166E-89 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 40589. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.458. It is the best final score on PECAAN. /note=Gap/overlap: 8. Too short to contain another gene. /note=Phamerator: The pham number as of 9/30/22 is 50048. The gene is conserved in phages IronMan, Kazan, and Miramae, all in the same cluster as Pickleback. 
There is a mix of different clusters in this pham, but about half are cluster A. /note=Starterator: This pham has 342 members and 20 drafts. 194 members call start site 102, which corresponds to a start site of 40589 bp for Pickleback. This is the most often called start number in this pham and is called 93.6% of the time it is present. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 40589 bp. Starterator, Glimmer, and GeneMark all agree with each other. /note=Function call: No known function. Multiple PhagesDB BLAST hits give the function as unknown with an e-value of 2e-68 (e.g., ForGetIt and Conspiracy). One HHPred result has a significant e-value. It has a probability of 99.4%, a coverage of 52%, and an e-value of 2.5e-11. The function is unknown for this result. The best NCBI BLAST result, shared by multiple Mycobacterium phages, has a query cover of 100%, 100% identity, and an e-value of 5.92e-89. It is listed as a hypothetical protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I have QC'ed the location and function call and agree with the annotator. 
CDS complement (40598 - 41404) /gene="65" /product="gp65" /function="DnaB-like dsDNA helicase" /locus tag="PickleBack_65" /note=Original Glimmer call @bp 41404 has strength 14.45; Genemark calls start at 41404 /note=SSC: 41404-40598 CP: yes SCS: both ST: SS BLAST-Start: [DnaB-like dsDNA helicase [Mycobacterium phage Conspiracy] ],,NCBI, q1:s1 100.0% 0.0 GAP: 42 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.739, -2.9754816289569264, no F: DnaB-like dsDNA helicase SIF-BLAST: ,,[DnaB-like dsDNA helicase [Mycobacterium phage Conspiracy] ],,YP_008858001,100.0,0.0 SIF-HHPRED: Replicative DNA helicase; Helicase, DnaB, Helicase loader protein, DnaC, Structural Genomics, REPLICATION, HYDROLASE; 3.1A {Escherichia coli},,,6KZA_B,94.7761,100.0 SIF-Syn: DnaB-like dsDNA helicase. There is synteny both upstream and downstream of the gene when compared to phage Conspiracy. The gene upstream of this gene belongs to pham 50048 and downstream belongs to pham 49998 in Pickleback and Conspiracy. /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark. Both call the same start site (41404). /note=Coding Potential: The coding potential for this ORF is on the reverse strand only, thus confirming that this is a reverse gene. Coding potential is found for this ORF in both GeneMark Self and Host-Trained formats. /note=SD (Final) Score: The best final score is -2.975 with a Z-score higher than 2 (2.739). This final score is for the start site 41404. /note=Gap/overlap: 42. This gap is not large enough to fit another gene (<100bp) and there is no coding potential observed here in either the host-trained or self-trained format. When this phage genome is compared to other related genomes, this gap is conserved. /note=Phamerator: 9 (as of 10/06/2022). This pham has 937 members, 51 of which are drafts. The phages within this pham mostly belong to cluster A. /note=Starterator: Start site 52 (41404) is manually annotated in 578 phages. 
It is the most annotated start site, and is called 92.3% of the time it is present. The start codon is ATG. /note=Location call: This is a real gene with a start site at 41404. This is called by Glimmer, GeneMark, and Starterator. /note=Function call: DnaB-like dsDNA helicase. There is strong evidence from PhagesDB and NCBI BLAST with high coverage (~98%) and low e-values (