CDS 1 - 438 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="Bolt007_1" /note=Original Glimmer call @bp 1 has strength 14.76; Genemark calls start at 1 /note=SSC: 1-438 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_1 [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 9.76641E-95 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.018, -2.5052746077145835, yes F: terminase, small subunit SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_1 [Arthrobacter phage Prairie] ],,QTF82098,98.6207,9.76641E-95 SIF-HHPRED: Small Terminase subunit; viral genome packaging motor, small terminase, Pseudomonas phage PaP3, DNA BINDING PROTEIN, VIRAL PROTEIN; 3.95A {Pseudomonas phage NV1},,,7JOQ_G,71.7241,98.2 SIF-Syn: There is synteny with the other cluster FH genomes. In the 4 other genomes, this small terminase subunit gene is followed downstream by a large terminase subunit gene. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Both Glimmer and GeneMark call this gene, and they agree on a start site at bp 1. /note=Coding Potential: There is coding potential within the putative ORF, and the start site covers all of the coding potential. This coding potential appears in both host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score is -2.505, the highest of the candidate start sites. This start site also has the highest Z-score, at 3.018. /note=Gap/overlap: This is the first gene of the genome and it starts at bp 1, so there is no gap or overlap. /note=Phamerator: The pham on 1/9/2022 was 55852. This pham is present in cluster FH phages Bumble, Klevey, Lilmac1015, and Prairie. /note=Starterator: 24 of 42 genes in this pham call the start site that corresponds to bp 1 in Bolt007. This is not the most annotated start site in the pham, but Bolt007 does not contain the most annotated start. /note=Location call: Based on the collected evidence, this gene is a real gene with a start site at bp 1. /note=Function call: terminase, small subunit.
Phagesdb BLAST has two top hits with an E-value of 7e-74 that have the function terminase, small subunit. HHpred also had two hits with E-values < 0.00059 that have the function terminase, small subunit; these hits had coverage > 71% and probability > 97%. CDD did not have any hits, and NCBI BLASTp had no relevant hits with known functions. /note=Transmembrane domains: None. Neither TMHMM nor TOPCONS predicts a transmembrane domain. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: I agree with the location/function call. Great work. CDS 428 - 1867 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="Bolt007_2" /note=Original Glimmer call @bp 428 has strength 12.5; Genemark calls start at 428 /note=SSC: 428-1867 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.949, -5.331409440583776, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage Klevey]],,UAW09361,96.8685,0.0 SIF-HHPRED: Large subunit terminase; large terminase, VIRAL PROTEIN; 2.2A {Deep-sea thermophilic phage D6E},,,5OE8_B,93.9457,100.0 SIF-Syn: In Bolt007, the upstream gene is the small terminase subunit, this gene is the large terminase subunit, and the downstream gene is the portal protein. In Bumble, the upstream gene has no function listed, the conserved gene is a terminase, and the downstream gene is a portal protein. /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: GeneMark lists a start site of 428. Glimmer lists a start site of 428. The start codon for both is TTG. /note=Coding Potential: All coding potential lies between 428 and 1,867 on the forward strand, as seen in both Glimmer and host-trained GeneMark.
/note=SD (Final) Score: The final score is -5.331, which indicates a high likelihood that the start site is 428. /note=Gap/overlap: There is an overlap of 11 bp, but this is consistent with other phages such as Bumble, Klevey, Lilmac1015, and Prairie, so it is not abnormal. /note=Phamerator: Pham 39494 on 1/7/2022. It is conserved in phage Bumble (FH) and phage Prairie (FH). /note=Starterator: Start site 7 in Starterator was manually annotated in 4/47 genes in this pham. Start 7 is 428 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The suggested start site is 428 because its Z-score and final score fall within ranges that indicate a high likelihood. Additionally, it is conserved in other phages such as Bumble, Klevey, Lilmac1015, and Prairie. /note=Function call: terminase, large subunit. Determined using Phagesdb BLAST, where Klevey and Lilmac1015 (the phages with the highest similarity) have the same function, and HHpred, where protein 5OE8_B has the same function. /note=Transmembrane domains: No transmembrane domains were predicted, which is consistent with a protein that does not need to cross the membrane to carry out its function. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: I agree with your location call and function call. However, for the function call, it would be better to provide values such as E-values, percent identity, percent coverage, etc. The top three NCBI BLAST hits may all be considered as evidence. Overall, good job! Detailed and clear notes.
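The GAP values in the SSC summaries (for example `GAP: -11 bp gap` for gp2 and `GAP: 39 bp gap` for gp3) follow directly from the CDS coordinates. A minimal Python sketch of that arithmetic, assuming forward-strand genes and the 1-based, inclusive coordinates used in the `CDS start - stop` lines; `gap_bp` is a hypothetical helper, not part of PECAAN:

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Return the gap between two forward-strand genes.

    Coordinates are 1-based and inclusive, as in the CDS ranges.
    A positive value counts intergenic bases; a negative value is
    the length of the overlap.
    """
    return start - upstream_stop - 1


# (upstream stop, downstream start) pairs taken from the CDS lines
pairs = {
    "gp1 -> gp2": (438, 428),
    "gp2 -> gp3": (1867, 1907),
    "gp3 -> gp4": (3382, 3372),
}

for label, (stop, start) in pairs.items():
    gap = gap_bp(stop, start)
    kind = "gap" if gap >= 0 else "overlap"
    print(f"{label}: {abs(gap)} bp {kind}")
```

A result of -1, as for gp9, gp10, and gp12, corresponds to a 1 bp overlap where the upstream stop codon and the downstream start codon share a base.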
CDS 1907 - 3382 /gene="3" /product="gp3" /function="portal protein" /locus tag="Bolt007_3" /note=Original Glimmer call @bp 1907 has strength 14.48; Genemark calls start at 1907 /note=SSC: 1907-3382 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: 39 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.864, -3.6727816321281046, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Klevey]],,UAW09362,97.556,0.0 SIF-HHPRED: Portal protein; G20C, portal protein, bacteriophage, transport protein; 1.9A {Thermus phage P7426},,,5NGD_B,80.4481,100.0 SIF-Syn: Portal protein. There is synteny because the upstream gene is a terminase large subunit (pham 39494) and the downstream gene is a capsid maturation protease (pham 43301), just like in phage Klevey. /note=Sasha - The annotator checked evidence for 19 phages in Phagesdb BLAST and narrowed that down to the top 4. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 1907, with a start codon of ATG. /note=Coding Potential: There is high coding potential in the last forward-strand frame within the gene range for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score is -3.673 and the Z-score is 2.864; both are the best among the start site options. /note=Gap/overlap: There is a 39 bp gap, which is reasonable because the previous and subsequent genes are all on the forward strand. This start site produces the longest ORF, 1476 bp, which is acceptable because it is consistent with the idea that genes are densely packed. /note=Phamerator: Pham: 83575. Date analyzed: 01/07/2022. The gene is conserved in clusters AO2 and FH and is found in phages Prairie, Lilmac1015, Bumble, and Klevey. /note=Starterator: Start site 24 is called in 37 out of 253 non-draft members in this pham, 19 of which were manually annotated.
Start site 24 corresponds to 1907 bp in Bolt007. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is 1907. /note=Function call: Portal protein. Multiple Phagesdb BLAST hits have the portal protein function (E-value < 1E-121), and 3 out of 3 top NCBI BLAST hits also have the portal protein function (100% coverage, 70%+ identity, and E-value < 10^-170). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene product is not a membrane protein. /note=Secondary Annotator Name: Wang, Jennifer Yiyang /note=Secondary Annotator QC: I agree with the location and function call. I noticed that you checked a lot of evidence boxes for Phagesdb BLAST; I'm not sure you need that many. I think the top three hits will be enough, but correct me if I'm wrong. Also, for Phamerator, the gene is found not only in phages Bumble and Klevey but also in Prairie and Lilmac1015; maybe include that. Great work overall! CDS 3372 - 5204 /gene="4" /product="gp4" /function="capsid maturation protease" /locus tag="Bolt007_4" /note=Original Glimmer call @bp 3372 has strength 14.56; Genemark calls start at 3372 /note=SSC: 3372-5204 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease [Arthrobacter phage Klevey]],,NCBI, q1:s1 97.541% 0.0 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.919, -5.3934175870462076, no F: capsid maturation protease SIF-BLAST: ,,[capsid maturation protease [Arthrobacter phage Klevey]],,UAW09363,89.604,0.0 SIF-HHPRED: Phage_min_cap2 ; Phage minor capsid protein 2,,,PF06152.14,14.0984,99.4 SIF-Syn: capsid maturation protease; upstream gene is portal protein, downstream gene is NKF. In Bumble, the upstream gene and the conserved gene are the same, and the downstream gene has no known function. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 3372. GeneMark calls the start at 3372. The start codon is ATG.
/note=Coding Potential: The coding potential in this ORF is only on the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The final score is -5.393 and the Z-score is 1.919. There are start sites with better final scores and Z-scores; however, they have large gaps. Other genes from phages in the same cluster call the start at 3372. /note=Gap/overlap: There is an overlap of 11 bp. This is a small and normal overlap. /note=Phamerator: Pham 43301 on 1/7/2022. It is conserved in phage Bumble (FH) and phage Prairie (FH). /note=Starterator: Start site 5 in Starterator was manually annotated in 29/49 genes in this pham. Start 5 is 3372 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 3372. /note=Function call: The likely function is capsid maturation protease. PhagesDB’s two top hits predicted the MuF-like minor capsid protein and capsid maturation protease functions, with E-values of 0 and identities of 88% and 85%, respectively. NCBI’s two top hits also predicted the capsid maturation protease function, with E-values of 0 and identities of 84% and 55%, respectively. The CDD database had no hits. HHpred had 2 hits suggesting the capsid maturation protease and MuF-like minor capsid protein functions, with E-values of 3.6e-11 and probabilities of 99.4% and 99.3%, respectively. MuF-like minor capsid protein is no longer an approved function for SEA-PHAGES. /note=Transmembrane domains: No transmembrane domains were called by TMHMM or TOPCONS, so it is not a membrane protein. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with the start site and function call. You should add information about the functions listed in Phamerator to your notes. Also, I am not sure the overlap indicates an operon, since an operon overlap is usually only 4 bp.
Additionally, according to the lab manual, the synteny box should follow this format (Example: "Portal protein, upstream gene is terminase, downstream is capsid maturation protease, just like in phage XXX"). Maybe try to make the synteny box more concise. Otherwise, looks good. CDS 5214 - 5522 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Bolt007_5" /note=Original Glimmer call @bp 5214 has strength 15.28; Genemark calls start at 5214 /note=SSC: 5214-5522 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_5 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 3.58057E-46 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.244, -5.112587272376205, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_5 [Arthrobacter phage Klevey]],,UAW09364,94.1176,3.58057E-46 SIF-HHPRED: SIF-Syn: NKF, upstream gene is capsid maturation protease (Pham 43301), downstream is scaffolding protein (Pham 20169), similar to phage Bumble from the same cluster, FH. /note=Sasha Semaan/AF: The original function call was Imm-like superinfection immunity protein, but I disagree with this. All other evidence suggests that this gene has no known function, including that none of the 100 other members of this pham call any function at all. There is a SEA-PHAGES forum thread about this function. Unlike other genes that are assigned this function, this gene does not have any TMDs. /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and GeneMark both agree on start site 5214; the start codon is ATG. /note=Coding Potential: Reasonable coding potential in the putative ORF, covered by the chosen start site. /note=SD (Final) Score: The original start site of 5214 has a final score of -5.113 and a Z-score of 2.244. This start site has the best ribosome binding site score, and the other starts are not better because their gaps are much larger. /note=Gap/overlap: The gap of 9 bp with the upstream gene is reasonable, and so is the gene length. /note=Phamerator: 95739 - 1/7/22.
This gene's pham is present in other members of the cluster, FH. Bumble was used for comparison. No function is called. /note=Starterator: Conserved start site number 6, @5214; 26/79 other members of the pham call the same start site number. /note=Location call: Real gene with most likely start site @5214, conserved in Starterator. /note=Function call: HHpred hit: Imm-like superinfection immunity protein with 99.9% probability, 81.37% coverage, and an E-value of 4.2e-22. /note=Transmembrane domains: No TMDs predicted. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: I agree with the overall location call and function call, but there are several issues. Make sure to fill out the GM coding capacity box and mark genes as evidence for Phagesdb BLAST. The start codon for this start site is not GTG, and the gap is 9, not 6. I would also like to see more details in the Phamerator section. CDS 5542 - 6699 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="Bolt007_6" /note=Original Glimmer call @bp 5542 has strength 12.41; Genemark calls start at 5542 /note=SSC: 5542-6699 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage Abba] ],,NCBI, q10:s4 97.6623% 6.5921E-84 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.599, -5.9375763810740265, no F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Abba] ],,YP_009887272,62.9243,6.5921E-84 SIF-HHPRED: Phage_GPO ; Phage capsid scaffolding protein (GPO) serine peptidase,,,PF05929.14,35.0649,99.8 SIF-Syn: This gene has strong synteny with all annotated phages in the cluster. Its position barely shifts, if at all, on the pham maps, so this gene is not only real but also consistently positioned relative to other phages. Upstream are portal proteins and terminases. Downstream are major capsid proteins and head fiber proteins.
/note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Glimmer and GeneMark agree on start site @5542 F. /note=Coding Potential: There is coding potential spanning from the start site to the stop site. /note=SD (Final) Score: This gene does not have the best Z-score or final score. However, its scores are still good. Taken together with length and gap, it seems like the best candidate. /note=Gap/overlap: The gap is 19 bp and the gene length is 1158 bp. It is the longest candidate and has the shortest gap. /note=Phamerator: This gene is part of pham 20169 as of January 12, 2022. Bolt007 belongs to cluster FH. The maps show similarities with all the other phages, so the gene is conserved. /note=Starterator: The start number is 4, and it is not the most common start. However, start site @5542 has 4 manual annotations (MA), so this seems like a reliable call. It has the same start number as 5 of 31 phages in the pham (about 16.1%). This pham has 31 members and 2 drafts. /note=Location call: This is a true gene with start site @5542 F. /note=Function call: This appears to be a scaffolding protein. While there is some good evidence for it being a capsid maturation protease, there is substantially more evidence for it being a scaffolding protein. CDD does not show any evidence. Phagesdb BLAST and NCBI BLAST provide overwhelming and significant evidence for it being a scaffolding protein. HHpred has one hit suggesting that it is a scaffolding protein. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The synteny box has a specific format to follow in the annotation manual. You should include data for the function call (such as E-value, identity, etc.).
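The ORF lengths quoted in these notes (1158 bp for gp6, 1476 bp for gp3) count both endpoints, and a complete CDS that includes its stop codon should have a length divisible by 3. A short Python sketch of that check, assuming the CDS range includes the stop codon (the usual GenBank convention); `cds_length` and `protein_length` are illustrative names only:

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS given 1-based, inclusive coordinates."""
    return stop - start + 1


def protein_length(start: int, stop: int) -> int:
    """Amino acids encoded, assuming the range ends with a stop codon."""
    length = cds_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"{start}-{stop} is {length} bp, not a whole number of codons")
    return length // 3 - 1  # the stop codon encodes no amino acid


# Coordinates taken from the CDS lines for gp3 and gp6
for name, start, stop in [("gp3", 1907, 3382), ("gp6", 5542, 6699)]:
    print(name, cds_length(start, stop), "bp,", protein_length(start, stop), "aa")
```

Choosing a start further upstream (a "longer candidate") always shifts the length by a multiple of 3 within the same reading frame, so the divisibility check mainly catches coordinate typos.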
CDS 6714 - 7100 /gene="7" /product="gp7" /function="head fiber protein" /locus tag="Bolt007_7" /note=Original Glimmer call @bp 6714 has strength 6.37; Genemark calls start at 6714 /note=SSC: 6714-7100 CP: yes SCS: both ST: NI BLAST-Start: [head fiber protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 8.63179E-82 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.616, -3.4259484787582526, yes F: head fiber protein SIF-BLAST: ,,[head fiber protein [Arthrobacter phage Klevey]],,UAW09366,99.2188,8.63179E-82 SIF-HHPRED: Capsid fiber protein; bacteriophage, phi29, prohead, VIRUS; HET: SO4; 1.8A {Bacillus phage phi29},,,6QYY_E,85.9375,99.7 SIF-Syn: Head fiber protein, upstream gene is scaffolding protein, downstream is a major capsid protein, just like in phage Klevey. /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Glimmer and GeneMark both call the start at 6714. /note=Coding Potential: Coding potential is on the forward strand only, indicating this is a forward gene; this ORF has good coding potential in both GeneMark S and GeneMark Host, and start site 6714 includes all of the coding potential. /note=SD (Final) Score: The final score is the best option at -3.426, and the Z-score is the highest at 2.616. /note=Gap/overlap: The gap is 14 bp, and this gene appears to be conserved in other phages. /note=Phamerator: 94937. Date 01/06/2022. It is conserved; found in Bumble (FH) and Klevey (FH). /note=Starterator: Start site 5 (6714) is called in 4/4 non-draft cluster FH phages; Bolt007 also belongs to cluster FH. /note=Location call: Given the evidence, this is a real gene with start site at 6714 bp. /note=Function call: Head fiber protein. The top three Phagesdb BLAST hits have the function head fiber protein (E-value < 10^-65), and 2 out of 5 top NCBI BLAST hits also have the function head fiber protein (100% coverage, 96%+ identity, and E-value < 10^-81).
HHpred had two hits for capsid fiber protein (an alternative name for head fiber protein) with 99.6% and 99.7% probability, respectively, 85.9% coverage for both, and E-value < 10^-14. CDD had no relevant hits. /note=Transmembrane domains: No TMDs were predicted by TMHMM or TOPCONS, suggesting this is not a membrane protein. /note=Secondary Annotator Name: Batteikh, Maysaa /note=Secondary Annotator QC: I agree with this call. Make sure to look at Phamerator and Starterator; they have been updated. Also, check NCBI BLAST as evidence. Update: Starterator and Phamerator evidence have been added, and NCBI BLASTp evidence has been checked as well. CDS 7115 - 7996 /gene="8" /product="gp8" /function="major capsid protein" /locus tag="Bolt007_8" /note=Original Glimmer call @bp 7115 has strength 17.37; Genemark calls start at 7115 /note=SSC: 7115-7996 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.352, -3.9776535502172963, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage Klevey]],,UAW09367,95.5932,0.0 SIF-HHPRED: d.183.1.1 (A:104-383) Major capsid protein gp5 {Bacteriophage HK97 [TaxId: 37554]},,,d2fsya1,92.8328,99.9 SIF-Syn: Major capsid protein, upstream gene is head fiber protein, downstream is NKF. In Bumble, the upstream and downstream genes are NKF, but the conserved gene is a major capsid protein, as in Bolt007. /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 7115. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host, and the chosen start site covers all the coding potential. /note=SD (Final) Score: The final score is -3.978, which is the best score on PECAAN. /note=Gap/overlap: There is a gap of 14 bp upstream of the start site, which is reasonable.
There is no coding potential in the gap, and the length of the gene is acceptable. /note=Phamerator: Pham: 74109. Date 01/14/22. It is conserved; found in Bumble (FH) and Klevey (FH). /note=Starterator: Start site 23 in Starterator was manually annotated in 76/143 non-draft genes in this pham. Start 22 is 7115 in Bolt007. Note that this start number is not the most conserved one. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7115. /note=Function call: Major capsid protein. The top three Phagesdb BLAST hits have the function major capsid protein (E-value < 10^-22), and the top two NCBI BLAST hits also have the function major capsid protein (100% coverage, 90%+ identity, and E-value = 0). HHpred had a hit for major capsid protein with an E-value of 1e-151, and the phages are from cluster FH. CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains were predicted, which is reasonable for this function. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with this location and function call. Don't forget to mention whether the chosen start site covers all the coding potential in the coding potential section. Also, in the gap/overlap section, mention whether the gene length is acceptable given the auto-annotated start site. For transmembrane domains, state that TMHMM and TOPCONS did not predict any TMDs, and don't use first person. Lastly, in the synteny box, make sure to include the pham numbers for the genes that had NKF. I would also check off Bumble as extra evidence in Phagesdb BLAST. Good work!
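The SD (Final) Score notes restate two numbers from the `RBS:` field of the SSC summary; for gp8 the field reads `Kibler 6, Karlin Medium, 2.352, -3.9776535502172963, yes`, which the note rounds to -3.978. The Python sketch below extracts those numbers with a regular expression. It assumes the field layout seen in this export (two scoring settings, then Z-score, final score, and a yes/no flag that appears to mark the best-scoring start); that layout is inferred from these notes rather than from PECAAN documentation, and `parse_rbs` is a hypothetical helper.

```python
import re

# Layout assumed from the SSC strings in these notes:
# "RBS: <setting>, <setting>, <z-score>, <final score>, <yes|no>"
RBS_PATTERN = re.compile(
    r"RBS: (?P<s1>[^,]+), (?P<s2>[^,]+), "
    r"(?P<z>-?\d+(?:\.\d+)?), (?P<final>-?\d+(?:\.\d+)?), (?P<best>yes|no)"
)


def parse_rbs(ssc: str) -> dict:
    """Pull the RBS settings, Z-score, final score, and flag from an SSC string."""
    m = RBS_PATTERN.search(ssc)
    if m is None:
        raise ValueError("no RBS field found")
    return {
        "settings": (m["s1"], m["s2"]),
        "z_score": float(m["z"]),
        "final_score": float(m["final"]),
        "best_final": m["best"] == "yes",
    }


ssc = ("SSC: 7115-7996 CP: yes SCS: both ST: SS GAP: 14 bp gap LO: yes "
       "RBS: Kibler 6, Karlin Medium, 2.352, -3.9776535502172963, yes "
       "F: major capsid protein")
rbs = parse_rbs(ssc)
print(rbs["z_score"], round(rbs["final_score"], 3), rbs["best_final"])
```

Rounding to three decimals reproduces the values written in the notes (2.352 and -3.978 for gp8).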
CDS 7996 - 8418 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="Bolt007_9" /note=Original Glimmer call @bp 7996 has strength 17.45; Genemark calls start at 7996 /note=SSC: 7996-8418 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_9 [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 1.30242E-79 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.126, -4.371954607757725, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_9 [Arthrobacter phage Prairie]],,QTF82106,90.7143,1.30242E-79 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Both GeneMark and Glimmer call 7996 as the start site (GTG). /note=Coding Potential: There is coding potential within the putative ORF. The entire coding potential is covered by the proposed start site. /note=SD (Final) Score: -4.372 is the best possible score, and it belongs to start site 7996. /note=Gap/overlap: -1 bp (a 1 bp overlap) with start site 7996. This is a very favorable gap. Start site 7996 also gives the longest ORF (LORF), which adds to the evidence that it is the correct start site. /note=Phamerator: The gene is found in pham 95598 as of 01/08/2022. This pham is in all members of cluster FH (conserved). No function is given on the pham page of Phagesdb. /note=Starterator: The most annotated start is 3 (23/31 call it), but this start is not present in this gene. Start sites 8 (7996) and 32 (8359) are present. Start 8 is more conserved (6/31), especially among phages in cluster FH. Start 32 is not called anywhere else. Hence, 7996 is the best start site. /note=Location call: Based on the above evidence, the gene is real and most likely starts at 7996. /note=Function call: Phagesdb BLAST and NCBI BLAST agree that this is a protein of unknown function. Phagesdb: phages both in the same cluster and in cluster FL suggest NKF (E < 1E-10). NCBI BLAST: hypothetical protein; hits from phages Prairie and Klevey with 87% identity, E < 1E-70.
However, Phagesdb function frequency suggests that this might be a major capsid and protease fusion protein, based on subcluster AN (frequency 44%). Because no other evidence supports this, the gene is tentatively marked as unknown function. CDD yielded no hits. None of the HHpred hits have E-values < 10^-3 (no significant hits). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMHs. There is no evidence to suggest that this gene product is associated with the membrane. /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I agree with this annotation and the location/function calls. CDS 8418 - 8918 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="Bolt007_10" /note=Original Glimmer call @bp 8418 has strength 15.9; Genemark calls start at 8418 /note=SSC: 8418-8918 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_10 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 3.41382E-97 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.385, -4.118950045394365, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_10 [Arthrobacter phage Klevey]],,UAW09369,98.1928,3.41382E-97 SIF-HHPRED: SIF-Syn: The function is NKF (14863); the upstream gene is NKF and the downstream gene is head-to-tail adaptor, just like in phages Klevey, Bumble, Prairie, and Lilmac1015. /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer both agree on the same start site, 8418. The start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Both host-trained and self-trained GeneMark predict coding potential, and the chosen start site includes all of the coding potential in both. /note=SD (Final) Score: The final score is the best option at -4.119. The Z-score is 2.385, which is significant.
/note=Gap/overlap: There is a 1 bp overlap with the upstream gene, which suggests that the two genes may be part of an operon. /note=Phamerator: Pham: 14863. Date: 1/07/2022. It is conserved; found in 50 phages, 3 of which are drafts. /note=Starterator: Start 5 is called most often in the published annotations; it was called in 24 of the 47 non-draft genes in the pham. Start 5 is 8418 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 8418. /note=Function call: NKF. The top two non-draft Phagesdb BLAST hits indicate that the function is unknown (Lilmac1015_10: E-value = 9e-87; Klevey_10: E-value = 2e-86), and the top two NCBI BLAST hits are hypothetical proteins, which likewise indicates unknown function (UAW09369: 100% coverage, 96.39% identity, and E-value = 3e-97; QTF82107: 100% coverage, 91.57% identity, and E-value = 7e-89). The top hit that has a region also suggests a hypothetical protein (96.4% coverage, 57.76% identity, and E-value = 5e-47). HHpred had no significant hits. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with your location call and your function call. Don't forget to fill out the synteny box.
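Several function calls above weigh BLAST and HHpred hits by E-value and coverage (for example, gp9 treats HHpred hits without E-values below 10^-3 as not significant). As an illustration only, the Python sketch below tallies the functions of hits that pass an E-value cutoff and a coverage cutoff. The 70% coverage default is a hypothetical value chosen for the example, not a SEA-PHAGES rule, and `Hit` and `supported_functions` are made-up names; the two hits are the NCBI BLAST values reported for gp10.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    function: str
    coverage: float  # percent of the query covered by the alignment
    identity: float  # percent identity
    evalue: float


def supported_functions(hits, max_evalue=1e-3, min_coverage=70.0):
    """Count the functions of hits that pass both cutoffs.

    Both default cutoffs are illustrative assumptions.
    """
    passing = [h for h in hits if h.evalue < max_evalue and h.coverage >= min_coverage]
    return Counter(h.function for h in passing)


# NCBI BLAST hits reported for gp10
hits = [
    Hit("UAW09369", "hypothetical protein", 100.0, 96.39, 3e-97),
    Hit("QTF82107", "hypothetical protein", 100.0, 91.57, 7e-89),
]
print(supported_functions(hits))
```

Tightening `max_evalue` to 1e-90 keeps only UAW09369. A tally like this only summarizes one kind of evidence; the notes also weigh synteny, HHpred structure hits, and whether a function is on the SEA-PHAGES approved list.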
CDS 8922 - 9386 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="Bolt007_11" /note=Original Glimmer call @bp 8922 has strength 10.03; Genemark calls start at 8922 /note=SSC: 8922-9386 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_11 [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 5.48833E-107 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.922, -4.797923340030006, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_11 [Arthrobacter phage Prairie] ],,QTF82108,99.3506,5.48833E-107 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and GeneMark both agree on a start site of 8922, with an ATG start codon. /note=Coding Potential: Both the host-trained and self-trained coding potential maps show coding potential across the entire ORF, from start site 8922 to stop site 9386. /note=SD (Final) Score: -4.798. While this is not the best SD score (the best being that of start site 9189, with an SD score of -4.837), start site 9189 does not encompass the most coding potential within the ORF, whereas start site 8922 does. An SD score of -4.798 is still reasonable. /note=Gap/overlap: There is a 3 bp gap with the upstream gene, which stops at position 8918, whereas this gene starts at 8922. This is a reasonable gap, as it is very small. /note=Phamerator: Information collected on 1/14/2022. The gene is found in pham 33172. The only other phages within cluster FH that also have genes in this pham are Lilmac1015, Bumble, and Prairie. The genes in this pham that have function calls are called minor tail protein or head-to-tail adaptor. /note=Starterator: Information collected on 1/14/2022. Start site 9 was the most annotated start site for the genes in this pham, called for 35/56 (62.5%).
In this gene, start 9 corresponds to 8922, the same start site that Glimmer and GeneMark agreed upon. /note=Location call: This appears to be a real gene, because start site 8922 covers all coding potential within the ORF and because Glimmer and GeneMark agree on this start site. /note=Function call: Multiple PhagesDB BLAST hits (E-values 2e-48 and 3e-48) indicate that similar genes function as a head-to-tail adaptor and/or, more generally, a minor tail protein. The top NCBI BLAST results, with high query coverage (>98%), extremely low E-values (<1e-57), and moderate percent identity (~57%), agree with PhagesDB BLAST: the function could be either a minor tail protein or a head-to-tail connector. However, SEA-PHAGES has stated that for a gene to have the function head-to-tail adaptor, certain HHpred hits must be present. A potentially matching HHpred hit observed for this gene was HK97-gp10, but the SEA-PHAGES function call spreadsheet indicates that the HHpred hit must be HK97-gp6. Ultimately, NKF is called. /note=Transmembrane domains: No transmembrane domains indicated by TMHMM or TOPCONS. CDS 9386 - 9970 /gene="12" /product="gp12" /function="hypothetical protein" /locus tag="Bolt007_12" /note=Original Glimmer call @bp 9386 has strength 16.87; Genemark calls start at 9386 /note=SSC: 9386-9970 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_12 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 3.92503E-128 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.123, -4.455662434380964, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_12 [Arthrobacter phage Klevey]],,UAW09371,97.4227,3.92503E-128 SIF-HHPRED: SIF-Syn: NKF, upstream gene is head-to-tail adaptor (33172), downstream is also NKF (95155), just like in phage Klevey.
/note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Glimmer and Genemark both agree on a start site of 9386. The start codon ATG is called. /note=Coding Potential: Both host-trained and self-trained GeneMark maps show coding potential across the entire ORF between start site 9386 and stop site 9970. /note=SD (Final) Score: -4.456 for start site 9386. It is not the best final score on PECAAN, but this start gives the smallest gap with the upstream gene. /note=Gap/overlap: 1 bp overlap with the upstream gene, which stops at 9386. The start site 9386 encompasses the largest ORF and neither creates a large gap with the upstream gene nor creates too much of an overlap. /note=Phamerator: Pham: 2644. Date 01/08/22. It is conserved; found in Bumble (FH), Klevey (FH), Lilmac1015 (FH), and Prairie (FH), which are in the same cluster as Bolt007, as well as in 45 other members from other clusters. There is no function called for the gene. /note=Starterator: Start site 4 in Starterator was manually annotated in 18/47 non-draft genes in this pham. Start 4 is 9386 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene with a start site at 9386 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Unknown function. Most of the top PhagesDB BLAST hits state “function unknown”, although the second-best hit states “tail terminator”. However, most top NCBI BLAST hits also state “function unknown”, and there is no other evidence for the function “tail terminator”, so that hit is not informative. NKF: there is no CDD hit and no good HHpred hit (high e-values and non-informative functions). /note=Transmembrane domains: No TMDs are called by TMHMM or TOPCONS, and no evidence in the other databases suggests a TMD.
/note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with the evidence provided. You could explain why the overlap is reasonable, and maybe check Lilmac1015 to see whether tail terminator could be a reasonable function. CDS 10029 - 10229 /gene="13" /product="gp13" /function="hypothetical protein" /locus tag="Bolt007_13" /note=Original Glimmer call @bp 10029 has strength 9.35; Genemark calls start at 10074 /note=SSC: 10029-10229 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_13 [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 1.03005E-31 GAP: 58 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.39, -4.66341334897022, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_13 [Arthrobacter phage Prairie]],,QTF82110,91.1765,1.03005E-31 SIF-HHPRED: SIF-Syn: This gene is in pham 95155, the upstream gene is in pham 2644, and the downstream gene is in pham 19983, just like in phage Prairie. /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: Genemark lists a start site of 10074. Glimmer lists a start site of 10029. The start codon for both is ATG. /note=Coding Potential: The gene has coding potential in the forward direction, and both start sites include all of the coding potential. /note=SD (Final) Score: Start site 10074 has a final score of -6.381, which is good. However, the best final score belongs to start site 10029, with -4.663. /note=Gap/overlap: Start site 10074 has a gap of 103 bp and start site 10029 has a gap of 58 bp. /note=Phamerator: Pham 95155 as of 1/9/2022. This pham is in 28 phages, including all 5 phages of cluster FH (Bolt007 among them). All other phages with this pham are in cluster AO. /note=Starterator: Start site 5 is highly conserved among members of this pham. It corresponds to position 10029. 25 out of 26 non-draft annotations call this start site. /note=Location call: This is likely a real gene with start site 10029.
This is the superior start site because it gives the longest ORF, the best final score, and the smallest gap. /note=Function call: BLASTp, CDD, and HHpred are all uninformative or lack hits; therefore, this gene has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: Everything looks good, just some minor comments. You may want to mention whether there is coding potential in the gap. The NCBI BLAST results below appear to have significant hits. For the synteny box, it may be helpful to include the functions of the genes in both Bolt007 and Prairie. CDS 10226 - 11680 /gene="14" /product="gp14" /function="tail sheath protein" /locus tag="Bolt007_14" /note=Original Glimmer call @bp 10226 has strength 20.37; Genemark calls start at 10226 /note=SSC: 10226-11680 CP: yes SCS: both ST: SS BLAST-Start: [tail sheath protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.546, -3.4932654573164204, yes F: tail sheath protein SIF-BLAST: ,,[tail sheath protein [Arthrobacter phage Klevey]],,UAW09373,96.6942,0.0 SIF-HHPRED: Tail sheath protein Gp18; bacteriophage T4, phage tail terminator protein, phage sheath protein, VIRAL PROTEIN; 15.0A {Enterobacteria phage T4},,,3J2M_Y,99.5868,100.0 SIF-Syn: Tail sheath protein; the upstream gene is in pham 95155 and the downstream gene is a tail tube protein, just like in phage Klevey. /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and Genemark both call the start site at 10226. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both Genemark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -3.493. This is the best final score on PECAAN. /note=Gap/overlap: -4 bp.
There is overlap between this gene and the upstream gene; however, a 4 bp overlap is typical of genes in an operon. The 4 bp overlap reads ATGA, in which the start codon of this gene (ATG) overlaps the stop codon of the upstream gene (TGA). /note=Phamerator: Pham 19983. Date 1/8/22. It is conserved and found in Lilmac1015 (FH). /note=Starterator: Start site 1 in Starterator was manually annotated in 51/51 non-draft genes in this pham. Start 1 is 10226 in Bolt007. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 10226. /note=Function call: Tail Sheath Protein. The top 4 PhagesDB BLAST hits have the function of tail sheath protein (E-value = 0), and the top 3 NCBI BLAST hits also have the function of tail sheath protein (100% coverage, 72%+ identity, E-value = 0). According to CDD, it belongs to the Phage_sheath_1 superfamily, with a phage tail sheath protein subtilisin-like domain (E-value 2e-14). HHpred had a hit for tail sheath protein with 100% probability, 99.6% coverage, and an E-value of 0. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
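The GAP values in these notes follow one convention: gap = (this gene's start) - (upstream gene's stop) - 1, so a negative value is an overlap, and a forward-strand gene's length is stop - start + 1. The sketch below is a minimal Python illustration of that arithmetic. The helper names are hypothetical and are not part of PECAAN, Phamerator, or Starterator. It reproduces several GAP fields recorded above, along with the ATGA start/stop overlap described for gp14.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between the upstream gene's last base and this gene's first base.

    Both genes are assumed to be on the forward strand; a negative result
    means the two genes overlap by that many bases.
    """
    return start - upstream_stop - 1


def orf_length(start: int, stop: int) -> int:
    """Inclusive length of a forward-strand feature such as 'CDS 16226 - 17575'."""
    return stop - start + 1


def is_atga_overlap(seq: str) -> bool:
    """True if a 4 bp overlap reads ATGA: the downstream ATG start codon
    (first three bases) shares 'TG' with the upstream TGA stop codon
    (last three bases)."""
    seq = seq.upper()
    return len(seq) == 4 and seq[:3] == "ATG" and seq[1:] == "TGA"


if __name__ == "__main__":
    # (upstream stop, this gene's start) pairs taken from the notes above
    examples = {
        "gp11": (8918, 8922),
        "gp12": (9386, 9386),
        "gp13": (9970, 10029),
        "gp14": (10229, 10226),
        "gp15": (11680, 11701),
    }
    for gene, (up_stop, start) in examples.items():
        print(gene, gap_bp(up_stop, start))
    # prints, one per line: gp11 3, gp12 -1, gp13 58, gp14 -4, gp15 20
```

The same arithmetic gives the 1350 bp ORF quoted for gp20 (16226 to 17575) and the 189 bp length quoted for gp23 (19009 to 19197).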
CDS 11701 - 12105 /gene="15" /product="gp15" /function="tail tube protein" /locus tag="Bolt007_15" /note=Original Glimmer call @bp 11701 has strength 13.84; Genemark calls start at 11701 /note=SSC: 11701-12105 CP: yes SCS: both ST: SS BLAST-Start: [tail tube protein [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 1.45638E-91 GAP: 20 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -4.0637723674875, yes F: tail tube protein SIF-BLAST: ,,[tail tube protein [Arthrobacter phage Prairie] ],,QTF82112,99.2537,1.45638E-91 SIF-HHPRED: Phage_T4_gp19 ; T4-like virus tail tube protein gp19,,,PF06841.15,97.0149,100.0 SIF-Syn: Synteny with Bumble is present for both the upstream and downstream genes. The upstream genes are in pham 54537, while the downstream gene is in pham 19983. /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Glimmer and GeneMark both called 11701 as the start site. /note=Coding Potential: The ORF has coding potential, which is seen in both host-trained and self-trained GeneMark. The chosen start site, 11701, includes all the coding potential for the gene. /note=SD (Final) Score: The final score of -4.064 is reasonable, with a relatively high Z-score of 2.77. /note=Gap/overlap: There is a 20 bp gap between this gene and the upstream gene, which is reasonable. /note=Phamerator: As of 1/9/2022, this gene belongs to pham 2049, which is conserved and found in 56 other genes, 53 of which are non-drafts and 4 of which are from the same cluster, FH. Those phages are Bumble, Klevey, Prairie, and Lilmac1015. /note=Starterator: The most annotated start site in this pham is 1, which was called by 24/51 non-draft genes. Start 1 does not correspond to the Bolt007 start at 11701. Start 4 corresponds to 11701 and is the same start site called for the other 4 non-draft genes in this pham that belong to cluster FH.
/note=Location call: Coding potential in both host-trained and self-trained GeneMark indicates that 11701 is the most reasonable start site, and it gives a modest gap. Starterator agrees with Glimmer and GeneMark on the start site at 11701. /note=Function call: Tail Tube Protein. All PhagesDB BLAST hits have the function tail tube protein, with low e-values (<1e-40). Similar results were seen in NCBI BLAST, with most of the hits being for that function with low e-values (90%) and 100% query coverage. HHpred had a hit for tail assembly chaperone protein with 99.8% probability, 76.97% coverage, and an e-value of 1.4e-17. CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: I agree with this call. CDS 12739 - 12972 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="Bolt007_17" /note=Original Glimmer call @bp 12739 has strength 4.21 /note=SSC: 12739-12972 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_17 [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 5.04886E-44 GAP: -43 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.595, -7.508518214240948, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_17 [Arthrobacter phage Prairie] ],,QTF82114,100.0,5.04886E-44 SIF-HHPRED: SIF-Syn: NKF gene (pham 1619); the upstream gene is a tail assembly chaperone and the downstream gene is a tape measure protein, like in phage Klevey. /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer start site 12739; none listed for GeneMark. /note=Coding Potential: Coding potential in the first frame appears strongest with a start site at 12676 (GTG codon) and a stop at 12972. This is the longest ORF (LORF). The start site encompasses the full coding potential of the gene.
Supported by both GeneMark self and host. /note=SD (Final) Score: -5.944, which is the best RBS final score, at 12676. /note=Gap/overlap: -106 bp gap, a large overlap, though smaller than in other phages in the same cluster. /note=Phamerator: Pham 1619 as of 1/8/22. This pham is conserved in Bumble, Klevey, and Prairie. /note=Starterator: Start site number 2 is called in 100% of the other phages where it is present (4/4). /note=Function call: No hits on CDD and no significant hits on HHpred. NCBI BLAST had a good hit with a phage in the same cluster (94.8% identity, 100% aligned, 78.6% coverage, 8.3e-43), but it was a hypothetical protein. Thus, going with NKF. /note=Transmembrane domains: TMHMM and TOPCONS showed no TMDs. /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: I agree! There is evidence for the start site being at 12676: only Glimmer places the start site at 12739, while the rest of the evidence supports 12676. CDS 12985 - 15567 /gene="18" /product="gp18" /function="tape measure protein" /locus tag="Bolt007_18" /note=Original Glimmer call @bp 12985 has strength 20.54; Genemark calls start at 12985 /note=SSC: 12985-15567 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.033, -4.582905859694018, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage Klevey]],,UAW09376,94.0698,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,84.186,99.3 SIF-Syn: This gene displays synteny with other phages from cluster FH, with an NKF gene from pham 1619 upstream, a matching tape measure protein, and a downstream LysM-like peptidoglycan binding protein. /note=Sasha - Removed evidence checked in HHpred due to low coverage (between 2% and 6%).
Checked another piece of evidence with appropriate coverage (84%), e-value (3.2e-5), and probability (99.3%). Function call remains the same. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and GeneMark call this gene with a start site at 12985 bp. /note=Coding Potential: There is coding potential on GeneMark and GeneMarkS between 12985 and 15567 bp. The start site covers all of this coding potential. /note=SD (Final) Score: The final score is -4.583. There is one higher final score, but that start results in a very large gap between genes. /note=Gap/overlap: There is a 12 bp gap upstream of this gene. This is a reasonable gap, and the same gap exists in other cluster FH genomes such as Klevey. /note=Phamerator: This gene is in pham 95692 as of 1/9/2022; there were 28 members of this pham. This pham is also present in phages Bumble, Klevey, Lilmac1015, and Prairie. /note=Starterator: The most annotated start, start site 1 (12985 bp in Bolt007), is called; it is called in 25/26 non-draft genes in this pham. /note=Location call: Based on the collected evidence, this is a real gene with a start site at 12985 bp. /note=Function call: Tape Measure protein. There are multiple hits on PhagesDB and NCBI BLAST with e-values of 0, suggesting that the function is tape measure protein. HHpred also has hits with e-values <7.7e-9 and probabilities >99.7%, also with a function of tape measure protein. CDD did have hits, but they were uninformative because the hits had unknown function. /note=Transmembrane domains: Yes. TMHMM predicts 6 transmembrane domains. TOPCONS also predicts a transmembrane domain. This is consistent with the function call, as tape measure proteins often have predicted transmembrane domains. /note=Secondary Annotator Name: Batteikh, Maysaa /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator based on the evidence provided. Note: for Starterator, state that the most annotated start for the pham is 1.
Also, check the NCBI BLAST hits with e-values below e-6 (on the second and third pages). Pages 3 and 4 of the PhagesDB BLAST results also show hits with very low e-values for this function that should be checked. CDS 15564 - 16226 /gene="19" /product="gp19" /function="LysM-like peptidoglycan binding protein" /locus tag="Bolt007_19" /note=Original Glimmer call @bp 15564 has strength 13.51; Genemark calls start at 15618 /note=SSC: 15564-16226 CP: yes SCS: both-gl ST: SS BLAST-Start: [LysM-like peptidoglycan binding protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 2.96962E-138 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.098, -2.627458653341447, yes F: LysM-like peptidoglycan binding protein SIF-BLAST: ,,[LysM-like peptidoglycan binding protein [Arthrobacter phage Klevey]],,UAW09377,99.0909,2.96962E-138 SIF-HHPRED: SIF-Syn: In Bolt007, the upstream gene has the function tape measure protein, this gene has the function LysM-like peptidoglycan binding protein, and the downstream gene has the function minor tail protein. In Prairie, the upstream gene has the function tape measure protein, the conserved gene has the function LysM-like peptidoglycan binding protein, and the downstream gene has the function minor tail protein. /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: Genemark lists a start site of 15618. Glimmer lists a start site of 15564. The start codon for both is ATG. /note=Coding Potential: All coding potential lies between 15564 and 16226 on the forward strand, as seen in both Glimmer and host-trained Genemark. /note=SD (Final) Score: The final score of -2.627 indicates a high likelihood that the start site is 15564. /note=Gap/overlap: There is an overlap of 4 bp, which is consistent with other phages that have the same overlap (Bumble, Klevey, Lilmac1015, and Prairie), so it is not abnormal. /note=Phamerator: Pham 56305 on 1/7/2022.
It is conserved in phage Bumble (FH) and phage Prairie (FH). /note=Starterator: Start site 2 in Starterator was manually annotated in 4/48 genes in this pham (all 4 are other FH phages). Start 2 is 15564 in Bolt007. This evidence agrees with the site predicted by Glimmer. /note=Location call: The suggested start site is 15564, because its Z-score and final score fall in ranges that indicate a high likelihood for this start. /note=Function call: LysM-like peptidoglycan binding protein. Determined using PhagesDB BLAST, where Klevey and Lilmac1015 (the phages with the highest similarity) have the same function, and CDD, where the domain PRK11198 has the same function. Additionally, NCBI BLAST accession UAW09377 has the same function. /note=Transmembrane domains: No transmembrane domains, which is consistent with its function because it does not need to cross the membrane. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: I agree with the location call. I think you could select the second-best CDD hit as evidence as well. Also select QKY79785 from NCBI BLAST. Other than that, you're set. Nice work!! CDS 16226 - 17575 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="Bolt007_20" /note=Original Glimmer call @bp 16226 has strength 12.25; Genemark calls start at 16226 /note=SSC: 16226-17575 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Prairie]],,NCBI, q1:s1 99.7773% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.062, -4.442181083645917, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Prairie]],,QTF82117,90.989,0.0 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,64.3653,99.9 SIF-Syn: Minor Tail Protein.
Upstream is a LysM-like peptidoglycan binding protein (pham 56305) and downstream is an NKF gene (pham 14649), similar to phage Klevey. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 16226, with a start codon of ATG. /note=Coding Potential: There is high coding potential in the first frame in the forward direction across the gene range, in both host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score is -4.442 and the Z-score is 2.062, both of which are the best among the candidate start sites. /note=Gap/overlap: There is a 1 bp overlap, which is reasonable because both genes are on the forward strand. This start site produces the longest ORF (1350 bp), which is acceptable because phage genes are densely packed. /note=Phamerator: Pham: 2994. Date Analyzed: 01/07/2022. The gene is conserved in FH and found in phages Bumble and Klevey. /note=Starterator: Start site 8 is called in 6 out of 50 of the non-draft members in this pham, 47 of which were manually annotated. Start site 24 correlates to 16226 bp in Bolt007. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 16226. /note=Function call: Minor Tail Protein. Multiple PhagesDB BLAST hits have the minor tail protein function (E-value < 1E-140), three HHpred hits have a tail protein function (>63% coverage and E-value < 1e-19), and 5 out of 5 top NCBI BLAST hits also have the minor tail protein function (>99% coverage, 55%+ identity, and E-value <10^-166). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene is not a membrane protein. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: I agree with your location call and function call. Overall, good work! The notes are concise and clear.
It would be better to add information about HHpred, since its top hits are checked as evidence. CDS 17572 - 17976 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="Bolt007_21" /note=Original Glimmer call @bp 17572 has strength 20.45; Genemark calls start at 17572 /note=SSC: 17572-17976 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_21 [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 5.32195E-72 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.874, -4.9157003279346805, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_21 [Arthrobacter phage Prairie]],,QTF82118,98.5075,5.32195E-72 SIF-HHPRED: SIF-Syn: NKF (pham 14649); the upstream gene is a minor tail protein (pham 2994) and the downstream gene is a receptor binding protein (pham 8373). In Prairie, the upstream gene has the function minor tail protein, and the conserved gene and the downstream gene have unknown functions. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 17572. Genemark calls the start at 17572. The start codon is GTG. /note=Coding Potential: The coding potential in this ORF is only on the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The final score is -4.916 and the Z-score is 1.874. There are start sites with better final scores and Z-scores; however, they have large gaps. Genes from other phages in the same cluster call a start near this one. /note=Gap/overlap: There is an overlap of 4 bp. This is a small and normal overlap, which is evidence of an operon. /note=Phamerator: Pham 14649 on 1/7/2022. It is conserved in phage Bumble (FH), phage Klevey (FH), Lilmac1015 (FH), and phage Prairie (FH). /note=Starterator: Start site 25 in Starterator was found in 5/50 genes in this pham. It was manually annotated 4 times for cluster FH. Start 25 is 17572 in Bolt007.
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 17572. /note=Function call: The likely function is unknown. PhagesDB’s two top hits predicted unknown function with e-values of 1e-68 and 2e-68 and identities of 93% and 92%, respectively. NCBI’s two top hits also predicted unknown function, with e-values of 5e-72 and 1e-71 and identities of 93% each. The CDD database had no hits. HHpred had uninformative hits with low probability, low coverage, and high e-values. There is not enough evidence to hypothesize a function for this gene. /note=Transmembrane domains: No transmembrane domains were called in TMHMM or TOPCONS. It is not a membrane protein. /note=Secondary Annotator Name: Wang, Yiyang (Jennifer) /note=Secondary Annotator QC: I agree with the location and function calls. All of the evidence categories have been considered and the notes are informative. Great job! Maybe just add the pham numbers of the upstream and downstream genes in the synteny box. CDS 17973 - 19007 /gene="22" /product="gp22" /function="hypothetical protein" /locus tag="Bolt007_22" /note=Original Glimmer call @bp 17973 has strength 11.31; Genemark calls start at 17973 /note=SSC: 17973-19007 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_22 [Arthrobacter phage Klevey]],,NCBI, q1:s1 99.7093% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.021, -4.59040903311284, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_22 [Arthrobacter phage Klevey]],,UAW09380,83.965,0.0 SIF-HHPRED: SIF-Syn: /note=Sasha - Changed function call to No Known Function: Evidence initially checked in HHpred had a low coverage of 27% and was the only evidence provided to suggest that the function would be a receptor binding protein.
However, there were significant PhagesDB BLAST hits to phages in the same cluster that had very small e-values and no known function called. NCBI BLAST also provides evidence to suggest no known function, with an e-value of 0 and 99% for hypothetical proteins. /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and Genemark both agree on start site 17973; the start codon is ATG. /note=Coding Potential: Reasonable coding potential in the putative ORF, covered by the chosen start site. /note=SD (Final) Score: The original start site of 17973 has a final score of -4.59 and a Z-score of 2.021. This start site does not have the best ribosome binding site score, but the other starts are not better because their gaps are much larger. /note=Gap/overlap: A -4 bp gap (4 bp overlap) with the upstream gene is reasonable (it indicates an operon), and so is the gene length. /note=Phamerator: Pham 8373 (1/7/22). This pham is present in other members of cluster FH. The phage that I used for comparison is Klevey_22. No function called. /note=Starterator: Conserved start site number 2, @17973; 2/2 other members of the pham call the same start site number. /note=Location call: Real gene with most likely start site @17973, conserved in Starterator. /note=Function call: HHpred indicates the function is receptor binding protein, with 97.9% probability, 27.61% coverage, and an E-value of 0.00012. /note=Transmembrane domains: No TMDs predicted. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with the start site and function call. I would add more detail to your notes overall, especially the function call notes (mention the PhagesDB BLAST and NCBI BLAST hits in the function call notes section as well as HHpred).
CDS 19009 - 19197 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="Bolt007_23" /note=Original Glimmer call @bp 19009 has strength 6.61; Genemark calls start at 18997 /note=SSC: 19009-19197 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_23 [Arthrobacter phage Klevey]],,NCBI, q4:s3 95.1613% 2.75018E-20 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.829, -2.838547390232814, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_23 [Arthrobacter phage Klevey]],,UAW09381,73.7705,2.75018E-20 SIF-HHPRED: SIF-Syn: Surprisingly, this phage does not display synteny for gene 23, although it does for the majority of its genome with all of the confirmed phages in this cluster. Upstream are minor tail proteins, chaperones, and a peptidoglycan binding protein. Downstream are an endolysin and another minor tail protein. As mentioned, phages Prairie and Bumble show no synteny for this gene. /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Glimmer called 19009 F and Genemark called 18997 F. They are not far apart, so other evidence must be checked. /note=Coding Potential: Coding potential extends from roughly 19000 F to 19200 F, which matches the candidate gene's coordinates. /note=SD (Final) Score: The final score of -2.839 (start 19009) is the highest (least negative) of all candidates, and the Z-score of 2.829 is also the highest. This is the best score among all candidate starts. /note=Gap/overlap: There is a 1 bp gap, which is acceptable. Its length of 189 bp fits in well. /note=Phamerator: This gene belongs to pham 7007 as of January 10, 2022. The start site is not conserved throughout the phages in this cluster. Some are close but slightly off, and phages such as Bumble and Prairie do not have this gene in their genomes. This phage belongs to cluster FH. /note=Starterator: Start number 7 is the only one of the three starts in the report that Bolt007 has; the other two are the most annotated starts.
The report lists start site 19009 F for number 7, which has no manual annotations. This looks like the best start site. Because it is found in 1 of the 3 pham members, this start site is present in 33.3% of the phages. /note=Location call: Interestingly, this gene is small compared with its counterparts in the other phages, where gene 23 is larger. The ATG start codon is common. There are 3 members in this pham, and 1 of them is a draft. /note=Function call: NKF. It might be a minor tail protein, but there is no strong evidence to confirm this. The support from PhagesDB, although 100%, comes from a phage in a different subcluster and is the only line of evidence suggesting this function. Both BLASTs, HHpred, and CDD have no significant results. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: More detail is required in the PECAAN notes. There is extraneous information in some sections, and other sections are lacking important information. If there are two potential start sites, it is important to indicate which one you are discussing. I recommend taking a second look at the annotation lab manual. CDS 19247 - 20917 /gene="24" /product="gp24" /function="minor tail protein" /locus tag="Bolt007_24" /note=Original Glimmer call @bp 19247 has strength 14.68; Genemark calls start at 19247 /note=SSC: 19247-20917 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: 49 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.401, -6.731060450590343, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Klevey]],,UAW09382,87.7698,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein; the upstream gene is NKF, like in phage Klevey. The downstream gene is a membrane protein, unlike in phage Klevey, where its function has not been determined.
/note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Glimmer and GeneMark both call the start at 19247. /note=Coding Potential: Coding potential is on the forward strand only, indicating this is a forward gene; this ORF has good coding potential in both GeneMark S and GeneMark Host, and the start site 19247 includes all of the coding potential. /note=SD (Final) Score: The final score is -6.731. This is not the best final score; however, the start sites with better (less negative) final scores are not viable (very large gap with the upstream gene, or not covering all coding potential). /note=Gap/overlap: The gap is 49 bp, and this gene appears to be conserved in other phages. /note=Phamerator: Pham 83788. Date 01/12/2022. It is conserved; found in Bumble (FH) and Klevey (FH). /note=Starterator: Start site 3 is called in 43/47 non-draft genes in the pham and corresponds to start site 19247 in Bolt007. /note=Location call: With the evidence currently available, this appears to be a real gene with a start site at 19247 bp. /note=Function call: Minor tail protein. The top PhagesDB BLAST hit has the function of minor tail protein (E-value = 0), and 2 out of 3 top NCBI BLAST hits also have the function of minor tail protein (>99% coverage, >75% identity, and E-value = 0). HHpred had one hit for minor tail protein with 98.8% probability and an E-value <10^-7. CDD had only one hit, for tail spike protein, but its E-value was too high to be viable, with low coverage (9.8%) and low identity (30%). /note=Transmembrane domains: No results from TMHMM or TOPCONS, suggesting this is not a membrane protein. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. For synteny, you could mention that the downstream gene is in the same pham as in Klevey.
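Across these records, a "better" final score means a higher (less negative) value. For gp13, start 10029 (-4.663) beats 10074 (-6.381), and for gp17, start 12676 (-5.944) beats 12739 (-7.509). The annotators sometimes still reject the top-scoring start because of a large gap or incomplete coverage of coding potential. The Python sketch below illustrates that comparison. The function name is hypothetical, and the optional `viable` set is an assumption standing in for those manual checks, not a PECAAN feature.

```python
from typing import Dict, Optional, Set


def best_start(final_scores: Dict[int, float],
               viable: Optional[Set[int]] = None) -> int:
    """Return the candidate start with the highest (least negative) final score.

    final_scores maps start coordinate -> final score.
    viable, if given, restricts the choice to starts already judged acceptable
    on other evidence (gap size, coverage of coding potential); this filter is
    a hypothetical stand-in for the annotator's judgement.
    """
    pool = {s: v for s, v in final_scores.items() if viable is None or s in viable}
    if not pool:
        raise ValueError("no viable candidate starts")
    # max() on final score: -4.663 > -6.381, so the less negative score wins
    return max(pool, key=pool.get)


if __name__ == "__main__":
    print(best_start({10029: -4.663, 10074: -6.381}))  # 10029 (gp13)
    print(best_start({12676: -5.944, 12739: -7.509}))  # 12676 (gp17)
```

Ranking by score alone is only one line of evidence. The `viable` filter is where gap, coding-potential, and Starterator considerations would override it.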
CDS 20917 - 21198 /gene="25" /product="gp25" /function="membrane protein" /locus tag="Bolt007_25" /note=Original Glimmer call @bp 20917 has strength 8.59; Genemark calls start at 20917 /note=SSC: 20917-21198 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 2.88991E-58 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.929, -4.862260131515226, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Prairie]],,QTF82121,97.8495,2.88991E-58 SIF-HHPRED: SIF-Syn: Membrane protein; the upstream gene is a minor tail protein and the downstream gene is NKF. In phage Bumble it is protein number 24 and is NKF; the upstream gene is a minor tail protein and the downstream gene is NKF. /note=Sasha Semaan: Changed function call to membrane protein. There was 1 hit in TMHMM and 1 hit in TOPCONS, as well as evidence in NCBI BLAST, to suggest that this gene is a membrane protein. /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 20917. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.862, which is the best score on PECAAN. /note=Gap/overlap: There is a 1 bp overlap with the upstream gene, which is reasonable. /note=Phamerator: Pham 11380. Date 01/14/22. It is conserved; found in Bumble (FH) and Klevey (FH). /note=Starterator: Start site 9 in Starterator was manually annotated in 44/47 non-draft genes in this pham. Start 9 is 20917 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 20917. /note=Function call: Division protein in the membrane. 
The top three PhagesDB BLAST hits have an unknown function (E-value <6e-46), while the top two NCBI BLAST hits have the function of membrane protein (<98% coverage, <93% identity, and E-values of 2.8e-58 and 3.8e-57). HHpred had a hit for division protein in the membrane with <74% probability, <65% coverage, and E-value of 4.7-42. CDD had no relevant hits. /note=Transmembrane domains: There is one TMD hit in TMHMM and no hit in TOPCONS, so there is not enough evidence for a transmembrane domain; this is not in alignment with the function. /note=Secondary Annotator Name: Batteikh, Maysaa /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator based on the evidence provided for the location call, but the function call is incorrect: it should be unknown function, since there is only one hit in TMHMM and no hits in TOPCONS, indicating it is not a membrane protein. CDS 21191 - 21403 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="Bolt007_26" /note=Original Glimmer call @bp 21191 has strength 15.08; Genemark calls start at 21191 /note=SSC: 21191-21403 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_26 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 1.08277E-39 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.098, -2.338663114094478, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_26 [Arthrobacter phage Klevey]],,UAW09384,100.0,1.08277E-39 SIF-HHPRED: SIF-Syn: NKF; the upstream gene is a membrane protein (11380) and the downstream gene is NKF (22563). Phage Klevey has the same pham numbers in the same order, but 11380 in Klevey is not marked as a membrane protein. /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer and GeneMark agree on the start site and start codon: 21191, GTG. /note=Coding Potential: There is coding potential within the putative ORF. The start site covers this coding potential. /note=SD (Final) Score: -2.339. This is the best score. 
The only possible start site is 21191. The Z-score for this start site is more than 3, which adds evidence for this being the start site. /note=Gap/overlap: -8 bp gap (an 8 bp overlap). This is a reasonable overlap. The called start site gives the LORF. This gap is syntenic with phages Klevey, Lilmac1015, and Prairie in the same cluster. Hence, this start site is very likely. /note=Phamerator: The gene is found in Pham 94146 as of 01/08/2022. This pham is in all members of cluster FH (conserved). No function is given on the pham page of PhagesDB. /note=Starterator: The most annotated start is 5 (31/47 call it), but this start is not called in my gene. Start site 8 is the only one that is called and is conserved (4/50) among all phages in cluster FH except for one. Hence, 21191 is the best start. /note=Location call: The above evidence suggests this is a real gene that starts at 21191. /note=Function call: PhagesDB BLAST suggests unknown function for nearly all hits, with its two strongest hits having e-values < 10^-30. NCBI BLAST gives the same finding as PhagesDB. HHpred and CDD yielded no significant hits. /note=Transmembrane domains: Neither TmHmm nor TOPCONS predicts any TMHs. There is no evidence to suggest this gene product is associated with the membrane. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with this location and function call. Maybe mention the percent coverage and identity for NCBI BLAST along with the e-values as more evidence in the function call. Also, I think you should checkmark phage Prairie for PhagesDB BLAST as good evidence for no known function. Otherwise, great job! 
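The GAP values in the SSC notes of these records are consistent with a simple rule for forward-strand genes using 1-based, inclusive coordinates: gap = (start of this gene) - (end of the upstream gene) - 1, with negative values meaning overlap. For example, gene 26 starts at 21191 and gene 25 ends at 21198, giving -8. The Python sketch below recomputes gaps, gene lengths, and a whole-codon check from the CDS coordinates in this section. The helper names are illustrative only and are not part of PECAAN, Glimmer, or GeneMark.

```python
# Illustrative sketch: recompute gap/overlap and CDS length from coordinates.
# Assumptions (not taken from any annotation tool): forward-strand genes,
# 1-based inclusive coordinates, and CDS ends that include the stop codon.

def gap(upstream_end: int, start: int) -> int:
    """Bases between the upstream gene's last base and this gene's first base.

    Negative values are overlaps: -4 means the two genes share 4 bp.
    """
    return start - upstream_end - 1


def cds_length(start: int, end: int) -> int:
    """Length of a forward-strand CDS in bp, stop codon included."""
    return end - start + 1


# (start, end) for each CDS, copied from the CDS lines in this section.
CDS = {
    24: (19247, 20917),
    25: (20917, 21198),
    26: (21191, 21403),
    27: (21400, 21801),
    28: (21798, 22505),
    29: (22502, 22624),
    30: (22624, 22917),
    31: (22989, 23309),
    32: (23310, 24443),
    33: (24459, 26861),
    34: (26872, 28398),
    35: (28395, 29036),
    36: (29244, 29426),
    37: (29435, 30487),
}

# Compare each gene with the gene immediately upstream of it.
for gene in sorted(CDS)[1:]:
    start, end = CDS[gene]
    g = gap(CDS[gene - 1][1], start)
    n = cds_length(start, end)
    kind = "overlap" if g < 0 else "gap"
    print(f"gene {gene}: {kind} {abs(g)} bp, length {n} bp, "
          f"whole codons: {n % 3 == 0}")
```

Run as-is, it prints one line per gene from 25 to 37; the computed values match the GAP fields in the SSC notes, including the 207 bp gap before gene 36 and the 1053 bp length reported for gene 37.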
CDS 21400 - 21801 /gene="27" /product="gp27" /function="membrane protein" /locus tag="Bolt007_27" /note=Original Glimmer call @bp 21400 has strength 16.41; Genemark calls start at 21400 /note=SSC: 21400-21801 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 4.34648E-60 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.255, -4.03907523433314, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Prairie]],,QTF82123,93.2331,4.34648E-60 SIF-HHPRED: SIF-Syn: The function is membrane protein (22563); the upstream gene is NKF and the downstream gene is endolysin, just like in phages Klevey, Prairie, and Lilmac1015. /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer both agree on the same start site, which is 21400. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. There is coding potential predicted by both host-trained and self-trained GeneMark, and the chosen start site includes all of the coding potential in both. /note=SD (Final) Score: The final score of -4.039 is the best option. The Z-score is 2.255, which is significant. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene, which suggests the two genes may be in an operon. /note=Phamerator: Pham 22563. Date: 1/07/2022. It is conserved in Bumble (FH), Klevey (FH), Lilmac1015 (FH), and Prairie (FH). /note=Starterator: Start 1 is called most often in the published annotations; it was called in 3 of the 4 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 21400. /note=Function call: Membrane protein. The top three PhagesDB BLAST hits and the only NCBI BLAST hit with region all suggest unknown function (low E-value). 
However, the top two NCBI BLAST hits provide evidence of a membrane protein (identity > 85.7%, 100% coverage, E-value < 1.15e-59). /note=Transmembrane domains: Both TMHMM and TOPCONS predict TMDs. Therefore, there is a high possibility that this is a membrane protein. There are 4 TMDs predicted by TmHmm, and all TOPCONS programs detect TMDs. /note=Secondary Annotator Name: KRUG, KELLEY /note=Secondary Annotator QC: I agree with the annotation and the location/function calls. Remember to fill out the synteny box. CDS 21798 - 22505 /gene="28" /product="gp28" /function="endolysin" /locus tag="Bolt007_28" /note=Original Glimmer call @bp 21798 has strength 14.75; Genemark calls start at 21798 /note=SSC: 21798-22505 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 1.46789E-148 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.018, -2.583959800616441, no F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage Klevey]],,UAW09386,93.1915,1.46789E-148 SIF-HHPRED: Putative phage lysin; endolysin, prophage, Lytic activity, HYDROLASE; 1.9A {Streptococcus phage phi7917},,,5D74_A,94.0426,99.6 SIF-Syn: Endolysin; the upstream gene is not called yet (22563), and the downstream gene is also not called yet (3558), just like in phage Klevey. /note=comment on length. overlap hints at operon. is it the most annotated start? need more detail to function call. checked off more HHpred evidence. has a lot of ncbi blast checked off - janelle /note=PECAAN Notes /note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Glimmer and GeneMark both agree on a start site of 21798. The start codon ATG is called. /note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: The final score is -2.584 for start site 21798. It is the best final score on PECAAN, with the smallest gap. /note=Gap/overlap: 4 bp overlap. 
This is the most reasonable overlap of the potential start sites. /note=Phamerator: Pham: 55133. Date 01/08/22. It is conserved; found in Klevey (FH), Lilmac1015 (FH), and Prairie (FH), which are in the same cluster as Bolt007, as well as in 5 other members from other clusters. The function called for the gene is endolysin. /note=Starterator: Start site 5 in Starterator was manually annotated in 4/7 non-draft genes in this pham. Start 5 is 21798 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 21798 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: The function for the gene is endolysin. Most of the PhagesDB BLAST top hits state “endolysin”, and the NCBI BLAST top hits also state “endolysin”. CDD has one hit, and HHpred has one good hit stating “endolysin” (low e-value with informative function). /note=Transmembrane domains: No TMDs are called, and there is no evidence suggesting TMD function in the other databases; neither TMHMM nor TOPCONS calls any. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with your location and function call. Don't forget to fill out the coding potential drop-down menu. In the function call, do you mean high e-value or low e-value? I would also list e-values for BLAST results. Otherwise great! 
CDS 22502 - 22624 /gene="29" /product="gp29" /function="membrane protein" /locus tag="Bolt007_29" /note=Original Glimmer call @bp 22502 has strength 21.23; Genemark calls start at 22502 /note=SSC: 22502-22624 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Lilmac1015]],,NCBI, q1:s1 100.0% 1.26947E-9 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.495, -6.279843477631437, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Lilmac1015]],,UKH48316,87.5,1.26947E-9 SIF-HHPRED: SIF-Syn: /note=changed to membrane protein bc tmhmm and topcon hits. does start site cover all CP? overlap hints at operon. - janelle /note=Primary Annotator Name: Allison Whang /note=Auto-annotation: Glimmer and GeneMark both agree on a start site of 22502. The start codon ATG was called. /note=Coding Potential: There is strong coding potential within the ORF with a start site of 22502 in both the host-trained and self-trained GeneMark coding maps. /note=SD (Final) Score: -6.280 for the start site 22502. This strongly negative final score does not by itself support this start site, since higher (less negative) scores indicate a stronger RBS. /note=Gap/overlap: 4 bp overlap with the upstream gene. This is a very small overlap and, being less than 30 bp, does not violate the guiding principles of genome annotation. /note=Phamerator: Information collected on 1/14/2022. The gene is found in pham 3558. All of the other phages with genes in this pham are in cluster FH: Klevey and Lilmac1015. No functions are listed in Phamerator. /note=Starterator: Information collected on 1/14/2022. Start site 7 was the most annotated start site for the genes in this pham, called for 2/2 (100%). For this gene, start number 7 corresponds to 22502. This is the same start site that was agreed upon by Glimmer and GeneMark. 
/note=Location call: This seems to be a real gene because start site 22502 covers all coding potential within the ORF and Glimmer and GeneMark agree on this start site. /note=Function call: The only relevant PhagesDB BLAST hit matches a gene with unknown function (e-value 5e-10), and it has a lower score of 62%. NCBI BLAST returns no hits at all. Relevant HHpred hits match genes with unknown function, so they do not add new information. Thus, this gene probably has NKF. /note=Transmembrane domains: TmHmm predicts one transmembrane domain, but TOPCONS does not indicate the presence of a transmembrane domain. Because a membrane-protein call requires either 2 transmembrane domains called by TmHmm or 1 called by TmHmm and 1 called by TOPCONS, this result does not meet that threshold. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: Agree with annotator. CDS 22624 - 22917 /gene="30" /product="gp30" /function="membrane protein" /locus tag="Bolt007_30" /note=Original Glimmer call @bp 22624 has strength 19.8; Genemark calls start at 22624 /note=SSC: 22624-22917 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Klevey]],,NCBI, q5:s6 95.8763% 3.59502E-19 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.156, -2.6835611257758942, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Klevey]],,UAW09388,86.7347,3.59502E-19 SIF-HHPRED: SIF-Syn: This gene is in pham 48697, the upstream gene is in pham 3558, and the downstream gene is in pham 16204, just like in phage Prairie. /note=comment on length. add function on phamerator. is start site 3 most conserved? - janelle /note=PECAAN Notes /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: GeneMark and Glimmer agree on a start site of 22624. The start codon is ATG. 
/note=Coding Potential: The gene has good coding potential in the forward direction, and the start site includes all of the coding potential. /note=SD (Final) Score: The final score is -2.684, which is the best final score on PECAAN by a large margin. /note=Gap/overlap: There is a 1 bp overlap, suggesting that this gene may be part of an operon. /note=Phamerator: Pham 48697 as of 1/9/2022. This pham is in 15 phages, including all 5 members of cluster FH (which includes Bolt007) as well as phage ArV2, a singleton. /note=Starterator: Start site 3 is conserved among members of this pham. It corresponds to position 22624, and 10 out of 13 non-draft annotations call this start site. /note=Location call: This is likely a real gene with start site 22624. /note=Function call: This gene is likely a membrane protein. It has 2 NCBI BLAST hits for membrane proteins with e-values less than 8.8037e-19, identity greater than 75%, and coverage greater than 95%. HHpred and CDD are uninformative, but the gene has transmembrane domains according to TmHmm. /note=Transmembrane domains: This gene has 2 TMDs according to TmHmm and SOSUI and is therefore a membrane protein. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with the evidence. I would not say it has the best final score; however, the overlap is better than the top option because it is a much smaller overlap. 
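Several records in this section decide whether TMD predictions support a membrane-protein call, and the gene 29 notes state the threshold used: either 2 TMDs called by TmHmm, or 1 called by TmHmm and 1 called by TOPCONS. A minimal Python sketch of that threshold as quoted in these notes follows. It is an assumption drawn from the notes, not an official SEA-PHAGES specification, and the function name is hypothetical.

```python
# Illustrative sketch of the membrane-protein threshold quoted in the gene 29
# notes; the function name is hypothetical and this is not an official rule.

def meets_tmd_threshold(tmhmm_tmds: int, topcons_tmds: int) -> bool:
    """True if TmHmm calls >= 2 TMDs, or TmHmm and TOPCONS each call >= 1."""
    if tmhmm_tmds < 0 or topcons_tmds < 0:
        raise ValueError("TMD counts cannot be negative")
    return tmhmm_tmds >= 2 or (tmhmm_tmds >= 1 and topcons_tmds >= 1)


# (TmHmm TMDs, TOPCONS TMDs) from the Transmembrane domains notes in this section.
examples = {
    "gene 27": (4, 1),  # TOPCONS count not stated; "detects TMDs" taken as >= 1
    "gene 29": (1, 0),  # one TmHmm TMD, none from TOPCONS
    "gene 30": (2, 0),  # TOPCONS not reported; 0 assumed
    "gene 31": (0, 0),  # neither tool predicts a TMD
}

for gene, (tmhmm, topcons) in examples.items():
    print(gene, meets_tmd_threshold(tmhmm, topcons))
```

The check covers only TMD evidence; in these records the final function call also weighs BLAST and HHpred hits.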
CDS 22989 - 23309 /gene="31" /product="gp31" /function="baseplate wedge protein" /locus tag="Bolt007_31" /note=Original Glimmer call @bp 22989 has strength 12.54; Genemark calls start at 22989 /note=SSC: 22989-23309 CP: yes SCS: both ST: SS BLAST-Start: [baseplate wedge protein [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 1.73404E-65 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.521, -5.654388976111635, yes F: baseplate wedge protein SIF-BLAST: ,,[baseplate wedge protein [Arthrobacter phage Prairie] ],,QTF82126,99.0566,1.73404E-65 SIF-HHPRED: Baseplate wedge protein gp25; contractile sheath, baseplate, wedge, sheath polymerization, viral protein; HET: MSE; 2.47A {Enterobacteria phage T4},,,5IW9_A,96.2264,99.7 SIF-Syn: Baseplate wedge protein; the upstream gene is in pham 48697 and the downstream gene is a baseplate J protein, just like in phage Klevey. /note=would change starterator box to NI. comment on length. add function in phamerator. - janelle /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and GeneMark both call the start site at 22989. The start codon is ATG, which is a common start codon. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both GeneMark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -5.654. It is the best final score on PECAAN. /note=Gap/overlap: 71 bp. This gap is a little large; however, it is conserved in the genome of phage Klevey, and the gap has no coding potential. /note=Phamerator: Pham 16204. Date 1/8/22. It is conserved and found in Bumble (FH). /note=Starterator: Start site 18 in Starterator was manually annotated in 48/51 non-draft genes in this pham. Start 18 is 22989 in Bolt007. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 22989. /note=Function call: Baseplate wedge protein. 
The top 10 PhagesDB BLAST hits have the function of baseplate wedge protein (E-value < 5e-26), and the top 8 NCBI BLAST hits also have the function of baseplate wedge protein (94%+ coverage, 55%+ identity, E-value < 9e-31). HHpred had a hit for baseplate wedge protein gp25 with 99.75% probability, 96.2% coverage, and E-value of 2.2e-16. CDD suggests that the domain is a member of superfamily pfam04965 with baseplate wedge protein gp25 function (E-value 1.68e-6). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with your calls. Please make sure to fill out the Pham starterator and all GM coding capacity drop-down menus. Please also mention the start codon for Auto-annotation. CDS 23310 - 24443 /gene="32" /product="gp32" /function="baseplate J protein" /locus tag="Bolt007_32" /note=Original Glimmer call @bp 23310 has strength 17.34; Genemark calls start at 23310 /note=SSC: 23310-24443 CP: yes SCS: both ST: SS BLAST-Start: [baseplate J protein [Arthrobacter phage Pippa]],,NCBI, q1:s1 99.4695% 7.12565E-133 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.018, -3.3503726477288405, yes F: baseplate J protein SIF-BLAST: ,,[baseplate J protein [Arthrobacter phage Pippa]],,QOP66272,69.1489,7.12565E-133 SIF-HHPRED: Baseplate_J ; Baseplate J-like protein,,,PF04865.17,71.3528,100.0 SIF-Syn: Synteny is present with the upstream gene in Klevey; the upstream gene is in pham 16204. The downstream gene does not have synteny with any genes from the phages in cluster FH. /note=added start codon. comment on length. - janelle /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Both Glimmer and GeneMark called 23310 as the start site, with start codon ATG. /note=Coding Potential: The ORF has coding potential, which is seen in both host-trained and self-trained GeneMark. 
The chosen start site, 23310, includes all of the coding potential for the gene, which is on the forward strand only. /note=SD (Final) Score: The final score of -3.350 is the best score on PECAAN, and its z-score of 3.018 is the highest z-score. /note=Gap/overlap: There is no gap or overlap between this gene and the upstream gene. /note=Phamerator: As of 1/10/2022, this gene belongs to pham 3830, which has 56 members, 53 of which are non-draft genes, and includes 4 other phages from cluster FH. The genes in this pham have a baseplate J protein function. /note=Starterator: A conserved start site in this pham is 10, which is called by 45 of the 51 non-draft genes. The most annotated start, 10, corresponds to the start site of 23310 for this gene. /note=Location call: From the evidence presented, 23310 is the best start site for this gene; it includes all the coding potential and has no gap or overlap with the upstream gene. The Starterator information supports the start site of 23310. /note=Function call: Baseplate J protein. PhagesDB BLAST has multiple hits with this function, with low e-values (1e-166). NCBI BLAST also has multiple hits with this function; the top two hits have e-values < 2.1e-132, high coverage (99.46%), and decent identity (56%). HHpred has hits with high coverage (91% and higher), 100% probability, and low e-values (less than 1e-27). CDD has one hit with a low e-value of 7.4e-15 and decent coverage (64.7%) but low identity (27.27%). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. 
All of the evidence categories have been considered. CDS 24459 - 26861 /gene="33" /product="gp33" /function="minor tail protein" /locus tag="Bolt007_33" /note=Original Glimmer call @bp 24459 has strength 13.87; Genemark calls start at 24459 /note=SSC: 24459-26861 CP: yes SCS: both ST: NI BLAST-Start: [minor tail protein [Arthrobacter phage Giantsbane]],,NCBI, q226:s16 68.125% 0.0 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.757, -5.080915947802713, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Giantsbane]],,QGZ17225,72.9825,0.0 SIF-HHPRED: Tail fiber protein; tail fiber G7C phage hydrolase-type esterase , SGNH hydrolase-type esterase domain (IPR013831), Adsorption of; HET: MSE, PEG; 2.411A {Escherichia phage vB_EcoP_G7C},,,4QNL_A,29.875,95.9 SIF-Syn: The function for this gene is minor tail protein. There is no direct synteny with any of the phages in the same cluster; however, the upstream genes include a baseplate J protein and a baseplate wedge protein, which are related to the tail, and the downstream gene, a hydrolase, also has a function related to the tail, just like in phage Klevey. /note=some checked evidence do not have low e values - janelle /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used, and both agreed on the same start site. The called start site is 24459. /note=Coding Potential: The gene contains reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The final score of -5.081 is not the best, but it is still reasonable to suggest the presence of a credible RBS, since the start site covers all the coding potential. Also, the z-score is 1.757. /note=Gap/overlap: The 15 bp gap with the upstream gene is very reasonable. The length of the gene (2403 bp) is acceptable given the auto-annotated start site. 
/note=Phamerator: As of January 9, 2022, the gene is found in pham 90248. The gene is conserved in phage Giantsbane, which belongs to cluster AU, and Giantsbane was the phage used for comparison. The function call for this gene is minor tail protein. The function call is consistent between Phamerator and the phams database, and it is approved on the SEA-PHAGES function list. /note=Starterator: The start site chosen for this gene is 24459, which is start number 1. There is 1 non-draft member and 1 draft member in this pham, and the non-draft member (1/1) calls start site 3; however, start 1 makes the most sense for this gene, even though it is not called by the non-draft member. /note=Location call: The gathered evidence suggests that the original start site call at 24459 by Glimmer and GeneMark is reasonable and is most likely the correct start site. It also suggests that the gene is a real gene. /note=Function call: Multiple PhagesDB BLAST hits have small e-values, with the suggested function being minor tail protein. NCBI BLASTp gave one hit with the suggested function being minor tail protein. PhagesDB BLAST gave hits with e-values of 0 to 3e-30, while NCBI BLASTp gave an e-value of 0. The top hit for NCBI BLASTp and PhagesDB BLAST shows a reasonable identity value (>60%) and 68% query coverage. HHpred hits were not considered relevant or used because of low coverage (<30%), despite 99.9% probability and e-values lower than 1e-23. Similarly, CDD had no relevant hits and was not used because of low coverage (<30%) and identity (<35%), although its hits had small e-values (on the order of 1e-40). /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. 
/note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: I would check more boxes in your evidence results to support your minor tail protein call. I agree with the start site/gene candidate. I would also talk to the professors, as I see evidence boxes for potentially other functions in CDD and NCBI BLAST. Lastly, I would uncheck the boxes with insignificant e-values. CDS 26872 - 28398 /gene="34" /product="gp34" /function="glycoside hydrolase" /locus tag="Bolt007_34" /note=Original Glimmer call @bp 26872 has strength 17.51; Genemark calls start at 26872 /note=SSC: 26872-28398 CP: yes SCS: both ST: NI BLAST-Start: [esterase [Gordonia phage Sadboi] ],,NCBI, q82:s97 83.6614% 1.83133E-86 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.263, -1.9310779259753799, yes F: glycoside hydrolase SIF-BLAST: ,,[esterase [Gordonia phage Sadboi] ],,YP_009852656,45.9108,1.83133E-86 SIF-HHPRED: Glycoside hydrolase BT_1002; glycoside hydrolase, fucosidase, plant pectin, hydrolase; HET: CA; 2.0A {Bacteroides thetaiotaomicron},,,5MQP_B,52.7559,99.9 SIF-Syn: Hydrolase; the upstream gene is a minor tail protein and the downstream gene is NKF (pham 95186), just like in phage Klevey. /note=might be esterase looking at functions in pham, phagesdb blast, and ncbi blast- janelle /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer start site 26872; the GeneMark start agrees. Start codon ATG. /note=Coding Potential: Good coding potential in the 1st frame with start at 26872 and stop at 28398; the start site encompasses the full coding potential of the gene. It has the LORF. /note=SD (Final) Score: The RBS final score of -1.931 is the best one, and it also has the best z-score, 3.263. /note=Gap/overlap: It has a gap of 10 bp, which is the smallest available option. This is a reasonable gap size and is conserved in phage Klevey. /note=Phamerator: Pham 85916 as of 1/8/22. 
The pham is not conserved in other FH cluster phages. /note=Starterator: Start number 1 is called most often (3/4) but is not present in Bolt007. Start number 2, at 26872, is the likely start for Bolt007. /note=Location call: Likely a real gene with a start site of 26872, as evidenced by the information above. /note=Function call: HHpred had a significant hit for glycoside hydrolase (99.8% probability, 74.0% coverage, 4.8e-17 e-value), and NCBI BLAST had two significant hits for glycoside hydrolase (38.8% identity, 53.5% aligned, 88.6% coverage, 2.73e-75 e-value). PhagesDB BLAST had a hit for glycoside hydrolase with an e-value of 3e-70. There were no hits in CDD. Thus, glycoside hydrolase was chosen as the function. /note=Transmembrane domains: No TmHmm or TOPCONS hits. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: I agree with this call; be sure to mention the PhagesDB BLAST evidence in the PECAAN notes, as you have marked it below. CDS 28395 - 29036 /gene="35" /product="gp35" /function="minor tail protein" /locus tag="Bolt007_35" /note=Original Glimmer call @bp 28395 has strength 13.23; Genemark calls start at 28395 /note=SSC: 28395-29036 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_35 [Arthrobacter phage Klevey]],,NCBI, q3:s2 99.061% 1.81513E-138 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.263, -6.254680115992223, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_35 [Arthrobacter phage Klevey]],,UAW09393,95.7547,1.81513E-138 SIF-HHPRED: Tail_P2_I ; Phage tail protein (Tail_P2_I),,,PF09684.13,63.3803,99.7 SIF-Syn: There is observed synteny. Upstream there is a gene encoding a hydrolase in both Bolt007 and Klevey. Downstream there is a gene with NKF from Pham 17962 in both phages. There is less synteny observed with other phages of cluster FH. /note=AF: Called as minor tail based on HHpred evidence and phage Abba. 
/note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site of 28395. Start codon GTG. /note=Coding Potential: There is coding potential in the putative ORF in both host-trained and self-trained GeneMark, and a start at 28395 bp covers all of it. /note=SD (Final) Score: The final score is -6.255, the worst final score. However, this start gives the smallest overlap (4 bp), suggesting the gene is part of an operon, in which case the final score is less relevant. /note=Gap/overlap: The overlap is 4 bp, which suggests that the gene may be part of an operon. /note=Phamerator: The pham as of 1/9/2022 is 95186. Bumble, Klevey, Lilmac1015, and Prairie also have genes in this pham. /note=Starterator: Start site 54 (28395 bp for Bolt007) was called only in Bolt007 and in no other non-draft genomes; other members of cluster FH call a different start site that is not present in Bolt007. /note=Location call: This appears to be a real gene with a start site at 28395. /note=Function call: NKF. PhagesDB has 3 hits with e-values < 1e-112; however, these hits have no known function. HHpred and CDD were uninformative (very low coverages from HHpred). NCBI BLAST also has hits with no known function, with e-values < 1.23e-135, >99% coverage, and >91% identity. /note=Transmembrane domains: No. TmHmm predicts 0 transmembrane domains. /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: Please mention the most common start site number in Starterator and how many non-draft pham members have this start site. 
I agree with the start site and function. CDS 29244 - 29426 /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="Bolt007_36" /note=Original Glimmer call @bp 29244 has strength 22.12; Genemark calls start at 29244 /note=SSC: 29244-29426 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Prairie] ],,NCBI, q1:s1 98.3333% 4.88619E-5 GAP: 207 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.018, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Prairie] ],,QTF82132,58.3333,4.88619E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: GeneMark lists a start site of 29244. Glimmer lists a start site of 29244. The start codon for both is ATG. /note=Coding Potential: All coding potential is found within the bounds of 29244 and 29426, in the forward direction, as seen in both Glimmer and host-trained GeneMark. /note=SD (Final) Score: The final score of -2.505 indicates a high likelihood that the start site is 29244. /note=Gap/overlap: There is a 207 bp gap, but this is consistent with other phages such as Klevey and Prairie, so it is not abnormal. /note=Phamerator: Pham 17962 on 1/7/2022. It is conserved in phage Klevey (FH) and phage Prairie (FH). /note=Starterator: Start site 4 in Starterator was manually annotated in 2/3 genes in this pham. Start 4 is 29244 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The suggested start site is 29244 because its z-score and final score fall in ranges that indicate a high likelihood of this start site. /note=Function call: NKF. This was determined from PhagesDB BLAST, where Klevey and Lilmac1015 (the phages with the highest similarity) have the same function, and from HHpred, whose hit PF13179.9 has unknown function. 
/note=Transmembrane domains: No transmembrane domains are predicted, so there is no evidence that this protein crosses the membrane to carry out its function. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with this annotation, but a few things need to be addressed. In the Starterator section, mention that the manual annotation was done for non-draft genes. I also do not think PF13179.9 should be checked off for HHpred, since its e-value is high. Additionally, PhagesDB BLAST might support the function call and may need to be checked; it would be good to check with the professors. Also, mention that TMHMM and TOPCONS did not predict any TMDs. Do not forget to include the pham numbers for the upstream/downstream genes in the synteny box. Overall, good work! CDS 29435 - 30487 /gene="37" /product="gp37" /function="exonuclease" /locus tag="Bolt007_37" /note=Original Glimmer call @bp 29435 has strength 16.36; Genemark calls start at 29435 /note=SSC: 29435-30487 CP: yes SCS: both ST: SS BLAST-Start: [Cas4 family exonuclease [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.869, -4.925297296037788, no F: exonuclease SIF-BLAST: ,,[Cas4 family exonuclease [Arthrobacter phage Prairie]],,QTF82133,92.8367,0.0 SIF-HHPRED: c.52.1.13 (A:) lambda exonuclease {Bacteriophage lambda [TaxId: 10710]},,,d3sm4a_,53.7143,100.0 SIF-Syn: Exonuclease; the upstream gene is NKF (pham 17962) and the downstream gene is a RecT-like ssDNA binding protein (pham 93370). /note=Comment on length and functions in Phamerator. Has a lot of evidence checked off. - Janelle /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 29435, with start codon ATG.
/note=Coding Potential: There is high coding potential in the middle forward frame within the gene range for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score is -4.925 and the Z-score is 1.869. These are not the best scores among the candidate starts, but they are acceptable. /note=Gap/overlap: There is an 8 bp gap, which is reasonable because the upstream and downstream genes are also on the forward strand. This start site produces the longest ORF (1053 bp). The short gap is consistent with the dense gene packing of phage genomes, where small gaps occur between genes in operons. /note=Phamerator: Pham: 95653. Date Analyzed: 01/07/2022. The gene is conserved in FH and found in phages Klevey and Prairie. /note=Starterator: Start site 33 is called in 4 of the 129 non-draft members of this pham, and 3 of those were manually annotated with this start site. Start site 33 corresponds to 29435 bp in Bolt007. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 29435. /note=Function call: Exonuclease. Multiple PhagesDB BLAST hits have the exonuclease function (e-value < 1e-80). HHpred shows three exonuclease hits (>53% coverage and e-value < 1e-24). All 3 top NCBI BLAST hits also have the exonuclease function (>96% coverage, ≥44% identity, and e-value < 1e-94). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene product is not a membrane protein. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: I agree with location/function call. Great work!
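The ORF length quoted for gp37 (1053 bp) follows directly from the inclusive CDS coordinates. The Python sketch below assumes 1-based, inclusive, forward-strand coordinates, as in the CDS lines of this export. `cds_length` is a hypothetical helper for illustration, not a PECAAN or Phamerator function.

```python
# Sketch only: cds_length is a hypothetical helper, assuming 1-based,
# inclusive coordinates for a forward-strand CDS (as in "CDS 29435 - 30487").
def cds_length(start: int, stop: int) -> int:
    """Return the CDS length in bp, counting both end coordinates."""
    return stop - start + 1

length = cds_length(29435, 30487)  # gp37, exonuclease
codons = length // 3               # codon count, including the stop codon
print(length, length % 3, codons)  # 1053 0 351
```

A complete CDS should satisfy `length % 3 == 0`. Here, 351 codons correspond to 350 amino acids plus the stop codon.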
CDS 30497 - 31360 /gene="38" /product="gp38" /function="RecT-like ssDNA binding protein" /locus tag="Bolt007_38" /note=Original Glimmer call @bp 30497 has strength 15.4; Genemark calls start at 30497 /note=SSC: 30497-31360 CP: yes SCS: both ST: SS BLAST-Start: [RecT-like ssDNA binding protein [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 0.0 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.118, -4.325675514574479, no F: RecT-like ssDNA binding protein SIF-BLAST: ,,[RecT-like ssDNA binding protein [Arthrobacter phage Prairie] ],,QTF82134,95.0704,0.0 SIF-HHPRED: RecT ; RecT family,,,PF03837.17,67.5958,100.0 SIF-Syn: RecT-like ssDNA binding protein; the upstream gene is an exonuclease and the downstream gene has unknown function. In Bumble, the upstream gene is an exonuclease, the conserved gene is a RecT-like ssDNA binding protein, and the downstream gene has unknown function. /note=Comment about length. Add function to Phamerator. - Janelle /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 30497. GeneMark calls the start at 30497. The start codon is ATG. /note=Coding Potential: The coding potential in this ORF is only on the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The final score is -4.326 and the Z-score is 2.118. Other start sites have better final scores and Z-scores, but they create significantly larger gaps. This start site gives the longest possible ORF. /note=Gap/overlap: There is a gap of 9 bp. This is a small and normal gap. /note=Phamerator: Pham 93370 on 1/7/2022. It is conserved in phage Bumble (FH), phage Klevey (FH), phage Lilmac1015 (FH), and phage Prairie (FH). /note=Starterator: Start site 37 in Starterator was found in 17/248 genes in this pham. It was manually annotated 4 times for cluster FH. Start 37 is 30497 in Bolt007.
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 30497. /note=Function call: The likely function is RecT-like ssDNA binding protein. PhagesDB's two top hits predicted RecT-like ssDNA binding protein function, with e-values on the order of 1e-147 and identities of 91%. NCBI's two top hits also predicted RecT-like ssDNA binding protein function, with e-values of 0 and 1e-150 and identities of 92% and 77%, respectively. The CDD database had a significant hit within the RecT superfamily, with an identity of 43% and an e-value of 0. HHpred had 1 significant hit suggesting RecT function, with an e-value of 1.6e-32 and a probability of 100%. /note=Transmembrane domains: No transmembrane domains were called by TMHMM or TOPCONS. It is not a membrane protein. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: Good job! I agree with your location call and function call. Don't forget to check the suggested start. CDS 31353 - 31598 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Bolt007_39" /note=Original Glimmer call @bp 31353 has strength 24.35; Genemark calls start at 31353 /note=SSC: 31353-31598 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_39 [Arthrobacter phage Klevey]],,NCBI, q2:s4 95.0617% 5.51693E-12 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.841, -4.967562678515084, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_39 [Arthrobacter phage Klevey]],,UAW09397,60.241,5.51693E-12 SIF-HHPRED: SIF-Syn: NKF; the upstream gene is a RecT-like ssDNA binding protein (pham 93370) and the downstream gene is a helix-turn-helix DNA binding domain protein (pham 95516), similar to phage Klevey from the same cluster, FH. /note=Checked off evidence. Add to function call.
- Janelle /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and GeneMark both agree on start site 31353; the start codon is ATG. /note=Coding Potential: There is reasonable coding potential in the putative ORF, and the chosen start site covers it. /note=SD (Final) Score: The original start site, 31353, has a final score of -4.968 and a Z-score of 1.841. This start site does not have the best ribosome binding site score, but the other starts are not better choices because they create much larger gaps. /note=Gap/overlap: The gap of -8 bp (an 8 bp overlap) with the upstream gene is reasonable, and so is the gene length. /note=Phamerator: Pham 16741 as of 1/7/22. This pham is present in other members of cluster FH; Klevey_39 was used for comparison. No function is called. /note=Starterator: Start site 5 (31353 in Bolt007) is conserved; 3/3 other members of the pham call the same start site. /note=Location call: This is a real gene with the most likely start site at 31353, which is conserved in Starterator. /note=Function call: None of the databases returned hits except HHpred. /note=Transmembrane domains: No TMDs are predicted. /note=Secondary Annotator Name: Wang, Yiyang Jennifer /note=Secondary Annotator QC: I agree with the location and function call. However, I do see that pham 16741 is found in Lilmac1015, Klevey, and Prairie, so make sure to update that. Also, don't forget to check the evidence boxes for PhagesDB and NCBI BLAST even if there's no known function for the gene; evidence is still needed to call it NKF. Maybe check the annotation lab manual and try to add more details to the notes.
CDS 31595 - 31807 /gene="40" /product="gp40" /function="helix-turn-helix DNA binding domain" /locus tag="Bolt007_40" /note=Original Glimmer call @bp 31595 has strength 15.67; Genemark calls start at 31595 /note=SSC: 31595-31807 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Prairie]],,NCBI, q4:s3 91.4286% 3.20077E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.978, -4.681417770958569, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Prairie]],,QTF82136,76.0563,3.20077E-25 SIF-HHPRED: SIF-Syn: Bolt007 is most closely related to Klevey here, since the helix-turn-helix gene is gene 40 in both phages. Bolt007 shares extensive synteny with all phages in this cluster. The genes are shifted slightly and carry different gene numbers in other phages, but all of them are present. Upstream are an ssDNA binding protein and an exonuclease. Downstream are a replication initiator and another ssDNA binding protein. /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Both Glimmer and GeneMark agree that 31595 (forward strand) is the start site for this gene. /note=Coding Potential: Coding potential covers the whole gene and extends somewhat beyond it on both sides, as confirmed by self-trained and host-trained GeneMark. /note=SD (Final) Score: 31595 does not have the highest Z-score (31445 scores higher, at 2.1). Its Z-score of 1.978 is still acceptable, given that its final score of -4.681 is the best of all candidate starts. /note=Gap/overlap: The gap is -4 bp (a 4 bp overlap), suggesting this gene is probably part of an operon. The gene length is 213 bp. This is not the longest possible ORF, but this start avoids the larger gaps or overlaps created by the other candidates. /note=Phamerator: Pham 95516 as of Jan 10, 2022. The gene belongs to cluster FH. The start site is not well conserved among the other phages.
/note=Starterator: Start site 24 corresponds to 31595 (forward strand) in Bolt007. It is not the most common start and is very uncommon (about 4%). It is shared with Klevey_40 (FH), Lilmac1015_42 (FH), and Prairie_39 (FH). Whenever this start is present, it is called 100% of the time, which is very reassuring. The pham has 33 members, 2 of which are drafts. /note=Location call: This is a real gene and the start site appears to be 31595 (forward strand). The start codon is not ATG, but the start is supported by the other comparisons, so it is plausible. /note=Function call: PhagesDB function frequency reports helix-turn-helix in 20% of cases, as seen in subcluster A5, and the same function also appears in other clusters. The helix-turn-helix call is strengthened by significant e-values in both BLAST searches. CDD also returns a significant hit, indicating a domain conserved with other phages. /note=Transmembrane domains: TMHMM predicts no TMDs, and TOPCONS does not predict any either, so this is not a membrane protein. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with start site and function call. I would add more detail to the Phamerator and function call sections; see the annotation lab manual for the required elements per section. Additionally, the synteny box should follow this format according to the lab manual (Example: "Portal protein, upstream gene is terminase, downstream is capsid maturation protease, just like in phage XXX".). TOPCONS was working for me as of 1/17, so if something still doesn't show up I would ask the professors.
CDS 31804 - 32085 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Bolt007_41" /note=Original Glimmer call @bp 31804 has strength 11.6; Genemark calls start at 31804 /note=SSC: 31804-32085 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_41 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 1.42231E-44 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.0425830684175623, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_41 [Arthrobacter phage Klevey]],,UAW09399,87.0968,1.42231E-44 SIF-HHPRED: SIF-Syn: NKF, just like phage Klevey. The upstream gene is an HTH DNA binding domain, unlike in phage Klevey. The downstream gene is NKF, like in phage Klevey. /note=Add more to Starterator. - Janelle /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Glimmer and GeneMark both call the start at 31804. /note=Coding Potential: Coding potential is on the forward strand only, indicating this is a forward gene. This ORF has good coding potential in both GeneMark S and GeneMark Host, and the start site 31804 includes all of the coding potential. /note=SD (Final) Score: The final score, -3.043, is the best option, and the Z-score, 2.77, is the highest. /note=Gap/overlap: The gap is -4 bp, so this gene is likely part of an operon. /note=Phamerator: Pham 9839. Date 01/12/2022. It is conserved; found in Prairie (FH) and Klevey (FH). /note=Starterator: Start site 1 is called in all non-draft genes in the pham and corresponds to 31804 in Bolt007. /note=Location call: With the evidence available so far, this appears to be a real gene with start site at 31804 bp. /note=Function call: No Known Function (NKF). The top three PhagesDB BLAST hits have no known function (e-value < 1e-37), and 2 of the top 5 NCBI BLAST hits are annotated as 'hypothetical protein' (100% coverage, ≥79% identity, and e-value < 1e-44).
HHpred had two hits, one to an unknown-function domain and one to a hypothetical protein; however, all HHpred hits had very low probability (<50%) and high e-values (>1). CDD had no relevant hits. /note=Transmembrane domains: No TMDs were predicted by TMHMM or TOPCONS, suggesting this is not a membrane protein. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: Glimmer and GeneMark call the start site at 31804, so why are you talking about start sites at 6714 and 19247? These numbers are not even close; you must have gotten your wires crossed somehow. I do agree with the function call, however. Update: Sorry about that, I mixed this up with the PECAAN notes for a different gene I was annotating. I have fixed this. CDS 32082 - 32393 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="Bolt007_42" /note=Original Glimmer call @bp 32082 has strength 15.44; Genemark calls start at 32082 /note=SSC: 32082-32393 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_41 [Arthrobacter phage Prairie]],,NCBI, q16:s8 85.4369% 1.21208E-19 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.692, -5.216491422862646, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_41 [Arthrobacter phage Prairie]],,QTF82138,63.8298,1.21208E-19 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream is ssDNA binding protein, similar to phage Klevey from the same cluster, FH. /note=Unchecked HHpred because of its high e-value. - Janelle /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 32082. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -5.216, which is the best score in PECAAN. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene. This is reasonable, and the gene could be part of an operon.
/note=Phamerator: Pham 49247. Date 01/14/22. It is conserved; found in Lilmac1015 (FH) and Klevey (FH). /note=Starterator: Start site 1 in Starterator was manually annotated in 3/5 non-draft genes in this pham. Start 1 is 32082 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32082. /note=Function call: Unknown. The top three PhagesDB BLAST hits have an unknown function (e-value < 5e-18), and the top two NCBI BLAST hits also have an unknown function (<83% coverage, <38% identity, and e-value < 1.7e-19). HHpred hits had low coverage (22%-73%), <35% probability, and very high e-values (up to 83). CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts a TMD. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. You should mention that the -4 gap is evidence of an operon. Do not check HHpred if e-values are high.
CDS 32404 - 32961 /gene="43" /product="gp43" /function="ssDNA binding protein" /locus tag="Bolt007_43" /note=Original Glimmer call @bp 32404 has strength 13.15; Genemark calls start at 32404 /note=SSC: 32404-32961 CP: yes SCS: both ST: SS BLAST-Start: [ssDNA binding protein [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 8.39982E-97 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.88, -4.884705646117852, yes F: ssDNA binding protein SIF-BLAST: ,,[ssDNA binding protein [Arthrobacter phage Prairie] ],,QTF82139,98.3784,8.39982E-97 SIF-HHPRED: Single-stranded DNA-binding protein 2; Single-stranded DNA-binding protein, Streptomyces Coelicolor, DNA damage, DNA repair, DNA replication, DNA-binding, Phosphoprotein, DNA BINDING PROTEIN; 2.141A {Streptomyces coelicolor} SCOP: b.40.4.3,,,3EIV_B,64.8649,100.0 SIF-Syn: ssDNA binding protein; the upstream gene is NKF (49247) and the downstream gene is a RepA-like replication initiator (21492). The gene order matches phage Klevey, although Klevey's downstream gene has a different function call. /note=Comment on length. Add comparison genes in Phamerator. - Janelle /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer and GeneMark agree on the start site and start codon: 32404, ATG. /note=Coding Potential: There is coding potential within the putative ORF. The start site covers this coding potential. /note=SD (Final) Score: The score of -4.885 for start site 32404 is the best possible score. /note=Gap/overlap: The 10 bp gap is reasonable and has no coding potential. The called start site gives the longest ORF (LORF). This gene and the gap are syntenic with most other non-draft phages, except Bumble. /note=Phamerator: The gene is found in pham 80951 as of 01/08/2022. This pham is in all members of cluster FH (conserved). The pham page lists the function as ssDNA binding protein. /note=Starterator: The most annotated start is 80 (called in 212/385 genes). This start site is found in Bolt007 (32404).
Bolt007 also has start site 120, but this start is not conserved or annotated as the start site in any other phage. Hence, 32404 is the best start site. /note=Location call: The above evidence suggests this is a real gene that starts at 32404. /note=Function call: PhagesDB BLAST suggests ssDNA binding protein (Prairie and Klevey, e-value < 1e-100). PhagesDB function frequency suggests the same function (AR subcluster, 20%). NCBI BLAST agrees (ssDNA binding protein); the best two hits are Klevey and Pseudarthrobacter sp. B4EP4b (e-value < 1e-80). HHpred also suggests ssDNA binding protein, with the two significant hits from Streptomyces coelicolor and Mycobacterium leprae (>60% coverage for both). CDD suggests the same function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. There is no evidence that this gene product is associated with the membrane. /note=Secondary Annotator Name: BATTEIKH, MAYSAA /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator based on the evidence provided. CDS 33103 - 34071 /gene="44" /product="gp44" /function="RepA-like replication initiator" /locus tag="Bolt007_44" /note=Original Glimmer call @bp 33103 has strength 9.95; Genemark calls start at 33103 /note=SSC: 33103-34071 CP: yes SCS: both ST: SS BLAST-Start: [RepA-like replication initiator [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 0.0 GAP: 141 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.922, -2.7863799983944713, no F: RepA-like replication initiator SIF-BLAST: ,,[RepA-like replication initiator [Arthrobacter phage Prairie]],,QTF82140,91.9753,0.0 SIF-HHPRED: CHROMOSOME REPLICATION INITIATION PROTEIN; DNAD, DNA REPLICATION, PRIMOSOME, REPLICATION; 2.3A {GEOBACILLUS KAUSTOPHILUS HTA426},,,2VN2_B,22.6708,98.2 SIF-Syn: The function is RepA-like replication initiator (21492); the upstream gene is an ssDNA binding protein and the downstream gene is NKF, just like in phages Prairie and Lilmac1015.
In Klevey and Bumble, the genes in the same pham are called helix-turn-helix DNA binding domain protein. /note=A couple of HHpred hits suggest a RepA-like protein rather than just HTH. -AF /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer both agree on the same start site, 33103. The start codon is TTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Both host-trained and self-trained GeneMark predict coding potential, and the chosen start site includes all of it. /note=SD (Final) Score: The final score is the best option, at -2.786. The Z-score is 2.922, which is significant. /note=Gap/overlap: There is a 141 bp gap. This is somewhat large but ultimately reasonable. The gap is conserved in other phages (Bumble, Klevey, Lilmac1015, Prairie), and it contains no coding potential that might indicate a new gene. /note=Phamerator: Pham 21492. Date: 1/07/2022. It is conserved in Bumble (FH), Klevey (FH), Lilmac1015 (FH), and Prairie (FH). /note=Starterator: The start number called most often in the published annotations is 1; it was called in 4 of the 4 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 33103. /note=Function call: RepA-like replication initiator. The top three PhagesDB BLAST hits are called either RepA-like replication initiator (e-value ≤ 1e-162) or helix-turn-helix DNA binding domain protein (e-value = 1e-162). NCBI BLAST shows the same split (RepA-like replication initiator: 85.8% identity, 100% coverage, e-value = 0; helix-turn-helix DNA binding domain protein: 86.1% identity, 100% coverage, e-value = 0). There is no significant HHpred hit.
CDD had no relevant hits. The e-values for the two functions are almost the same, so the BLAST evidence supports either function about equally. RepA-like replication initiator may be the better option because it is the more specific function (asked the professor). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with this location. However, for the function call, it would be good to give values such as e-values, percent coverage, percent identity, percent probability, and percent query coverage. I would also check with the professors to ensure that it is fine to put both functions in the function call notes. Mention how it was concluded that the function is RepA-like replication initiator. Don't forget to fill out the synteny box! Overall, good work! CDS 34105 - 34668 /gene="45" /product="gp45" /function="helicase loader" /locus tag="Bolt007_45" /note=Original Glimmer call @bp 34105 has strength 14.38; Genemark calls start at 34105 /note=SSC: 34105-34668 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_44 [Arthrobacter phage Prairie] ],,NCBI, q1:s1 100.0% 1.09933E-108 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.536, -3.5316035464369326, yes F: helicase loader SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_44 [Arthrobacter phage Prairie] ],,QTF82141,94.6524,1.09933E-108 SIF-HHPRED: a.179.1.1 (A:2-67) Replisome organizer (g39p helicase loader/inhibitor protein) {Bacteriophage Spp1 [TaxId: 10724]},,,d1no1a1,35.8289,99.2 SIF-Syn: NKF; the upstream gene function is RepA-like replication initiator (21492) and the downstream gene is also NKF (15291), just like in phage Lilmac1015. /note=Add function in Phamerator. Can add more detail in gap. - Janelle /note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Glimmer and GeneMark both agree on a start site of 34105. The start codon ATG is called.
/note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: The final score for start site 34105 is -3.532. It is the best final score in PECAAN, and this start gives the smallest gap. /note=Gap/overlap: The gap is 33 bp. This is quite large, but it is the most reasonable gap among the potential start sites. /note=Phamerator: Pham: 55019. Date 01/08/22. It is conserved; found in Bumble (FH), Klevey (FH), Lilmac1015 (FH), and Prairie (FH), which are all within the same cluster as Bolt007. There is no function called for the gene. /note=Starterator: Start site 1 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 1 is 34105 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 34105 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Unknown function for the gene. All of the PhagesDB BLAST top hits state "function unknown", and all NCBI BLAST top hits state "function unknown" as well. NKF: there is no hit for CDD and no good hit for HHpred, since the HHpred coverages are all below 40%. /note=Transmembrane domains: No TMDs are called. Neither TMHMM nor TOPCONS predicts one, and no other database suggests a membrane function. /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I agree with the annotation and location/function calls. For the gap, you could mention that there is no coding potential in the gap and that the gap is conserved in Klevey to explain why it is reasonable. Don't forget to select yes in the "All GM Coding Capacity" drop-down box. Don't forget to update the synteny box (the upstream gene appears to have a different function call in Bolt007 and Klevey, so make a note of it).
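The GAP values in the SSC fields (for example 8 bp for gp37, -8 bp for gp39, and 141 bp for gp44) can be recomputed from the coordinates of neighboring forward-strand genes as start minus upstream stop minus 1. A negative value means the genes overlap. Below is a minimal Python sketch, assuming the coordinates copied from the CDS lines above. `gap` is a hypothetical helper, not a PECAAN function.

```python
# Sketch only: recompute GAP values from forward-strand CDS coordinates.
# (gene, start, stop) tuples are copied from the CDS lines for gp36-gp45.
cds = [
    (36, 29244, 29426), (37, 29435, 30487), (38, 30497, 31360),
    (39, 31353, 31598), (40, 31595, 31807), (41, 31804, 32085),
    (42, 32082, 32393), (43, 32404, 32961), (44, 33103, 34071),
    (45, 34105, 34668),
]

def gap(upstream_stop: int, start: int) -> int:
    """Bases strictly between two genes; a negative value means overlap."""
    return start - upstream_stop - 1

# Pair each gene with its upstream neighbor and compute the gap.
gaps = {g: gap(prev[2], s) for prev, (g, s, _) in zip(cds, cds[1:])}
print(gaps)
# {37: 8, 38: 9, 39: -8, 40: -4, 41: -4, 42: -4, 43: 10, 44: 141, 45: 33}
```

The recomputed values match the GAP fields of gp37 through gp45 in this section. The same formula gives 17 bp for gp46 (34686 - 34668 - 1).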
CDS 34686 - 34916 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="Bolt007_46" /note=Original Glimmer call @bp 34686 has strength 19.66; Genemark calls start at 34686 /note=SSC: 34686-34916 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_46 [Arthrobacter phage Klevey]],,NCBI, q1:s1 75.0% 1.24811E-26 GAP: 17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.01, -4.613842206812587, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_46 [Arthrobacter phage Klevey]],,UAW09404,72.6027,1.24811E-26 SIF-HHPRED: SIF-Syn: NKF; the upstream gene is pham 55019 and the downstream gene is pham 23030, just like in phage Klevey. /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and GeneMark both agree on a start site of 34686. The start codon ATG was called. /note=Coding Potential: There is strong coding potential within the ORF with a start site of 34686 in both the host-trained and self-trained GeneMark coding maps. /note=SD (Final) Score: The final score for start site 34686 is -4.614. /note=Gap/overlap: There is a 17 bp gap with the upstream gene, which stops at position 34668, whereas this gene starts at 34686. This is a reasonable gap, as it is very small. /note=Phamerator: Information collected on 1/14/2022. The gene is found in pham 15291. All of the other phages with genes in this pham are in cluster FH: Klevey, Prairie, and Lilmac1015. No functions are listed in Phamerator. /note=Starterator: Information collected on 1/14/2022. Start site 1 was the most annotated start site for the genes in this pham, called in 3/3 (100%). For this gene, start number 1 corresponds to 34686. This is the same start site that Glimmer and GeneMark agree on.
/note=Location call: This appears to be a real gene, because start site 34686 covers all coding potential within the ORF and Glimmer and GeneMark agree on this start site. /note=Function call: Relevant PhagesDB BLAST hits match genes with no known function. Analogous genes from phages within the same cluster as Bolt007 (FH) are included in these hits, and all have no known function. NCBI BLAST hits also match two hypothetical proteins with no listed functions. HHpred hits do match genes with functions, but the e-values are very large (>1) and all coverages are below 31%, so these hits are not reliable. Thus, this gene most likely has NKF. /note=Transmembrane domains: No transmembrane domains are indicated by TMHMM or TOPCONS. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with your location and function calls. Just make sure to check the PhagesDB BLAST and NCBI BLAST evidence. CDS 34913 - 35149 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Bolt007_47" /note=Original Glimmer call @bp 34913 has strength 14.6; Genemark calls start at 34913 /note=SSC: 34913-35149 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_46 [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 7.10648E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.246, -6.289697455117486, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_46 [Arthrobacter phage Prairie]],,QTF82143,97.4359,7.10648E-45 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: GeneMark and Glimmer agree on a start site of 34913. The start codon is ATG. /note=Coding Potential: The gene has good coding potential in the forward direction, and the start site includes all of the coding potential. /note=SD (Final) Score: The suggested start site has a final score of -6.290, which is not the best; start site 34997 has a much better final score (-3.867).
/note=Gap/overlap: There is a 4 bp overlap, suggesting that this gene may be part of an operon. /note=Phamerator: Pham 23030 as of 1/9/2022. This pham is in all 5 phages of cluster FH, including Bolt007. No function is called. /note=Starterator: Start site 3 is conserved among members of this pham and corresponds to position 34913 in Bolt007. 2 of the 4 non-draft annotations call this start site. /note=Location call: This is likely a real gene with start site 34913. This start does not have the best final score, but it is the only start site that includes all of the coding potential without creating a significant gap. /note=Function call: BLASTp, CDD, and HHpred are all uninformative or lack hits; therefore, this gene has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: I agree with the annotator. CDS 35146 - 35313 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="Bolt007_48" /note=Original Glimmer call @bp 35146 has strength 13.38; Genemark calls start at 35146 /note=SSC: 35146-35313 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_48 [Arthrobacter phage Klevey]],,NCBI, q9:s4 85.4545% 1.17582E-14 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.694, -3.201021426436999, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_48 [Arthrobacter phage Klevey]],,UAW09406,70.3704,1.17582E-14 SIF-HHPRED: SIF-Syn: NKF; the upstream gene is in pham 23030 and the downstream gene is in pham 7404, just like in phage Klevey. /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and GeneMark both call the start site at 35146, with start codon ATG. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both GeneMark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -3.201.
This is not the best final score on PECAAN, but it is still reasonably negative. /note=Gap/overlap: 4 bp. There is overlap between this gene and the upstream gene; however, a 4 bp overlap is typical of a gene found in an operon. The 4 bp overlap is ATGA, which signifies the start codon of this gene overlapping with the stop codon of the upstream gene. /note=Phamerator: Pham 429. Date 1/8/22. It is conserved and found in Bumble (FH). /note=Starterator: Start site 4 in Starterator was manually annotated in 3/4 non-draft genes in this pham. Start 4 is 35146 in Bolt007. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 35146. /note=Function call: NKF. PhagesDB and NCBI BLAST show no phage hits with known function. HHpred only shows phage hits with large e-values. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with the evidence. Do not forget to mention the start codon. CDS 35310 - 35522 /gene="49" /product="gp49" /function="membrane protein" /locus tag="Bolt007_49" /note=Original Glimmer call @bp 35310 has strength 5.58 /note=SSC: 35310-35522 CP: yes SCS: glimmer ST: NI BLAST-Start: [membrane protein [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 7.82303E-19 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.297, -6.183672319573302, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Klevey]],,UAW09407,78.5714,7.82303E-19 SIF-HHPRED: SIF-Syn: Synteny is present with the upstream and downstream genes, as in Klevey. The upstream gene is in pham 429, while the downstream gene is in pham 95561. /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Only Glimmer provides a start site for this gene, which is 35310.
The start codon is GTG. /note=Coding Potential: The ORF has coding potential, which is seen in both host-trained and self-trained GeneMark. The chosen start site, 35310, covers all the coding potential for the gene, which is only in the forward direction. /note=SD (Final) Score: -6.184. This is the only option for a start site for this gene, with a z-score of 1.297. /note=Gap/overlap: There is a 4 bp overlap, which is indicative of an operon. /note=Phamerator: As of 1/11/2022 this gene belongs to pham 7404, which has 3 non-draft genes, all three of them from cluster FH. /note=Starterator: The most annotated start is 3, which is conserved by all the members of the pham. The start site for the most annotated start at 3 for this gene is 34280. /note=Location call: From the evidence gathered, this gene’s start site is at 35310. /note=Function call: Unknown function. PhagesDB BLAST has three hits with low e-values (<1e-20) for an unknown function. HHpred has no hits with significant values; all e-values are greater than 0. No evidence was presented by CDD. NCBI BLAST had 2 hits of membrane protein with significant % coverage and 70% identity, with low e-values of 7.8e-19, but there was no evidence provided by TMHMM or TOPCONS to back it up; therefore, this gene doesn't have a known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with all your calls. Make sure to check NCBI BLAST boxes for evidence. For the Synteny box, please include the function of this gene, and it may be helpful to include the function of the genes in the specific pham. Please include the start codon for Auto-Annotation.
CDS 35519 - 35992 /gene="50" /product="gp50" /function="RusA-like resolvase (endonuclease)" /locus tag="Bolt007_50" /note=Original Glimmer call @bp 35519 has strength 11.64; Genemark calls start at 35519 /note=SSC: 35519-35992 CP: yes SCS: both ST: NI BLAST-Start: [RusA-like resolvase [Arthrobacter phage Klevey]],,NCBI, q3:s2 98.7261% 8.47267E-101 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.978, -4.681417770958569, no F: RusA-like resolvase (endonuclease) SIF-BLAST: ,,[RusA-like resolvase [Arthrobacter phage Klevey]],,UAW09408,95.5128,8.47267E-101 SIF-HHPRED: Crossover junction endodeoxyribonuclease rusA; Homologous recombination, DNA repair, resolvase, HYDROLASE; 1.2A {Escherichia coli} SCOP: d.79.6.1,,,2H8E_A,61.7834,99.6 SIF-Syn: The function of this gene is RusA-like resolvase (endonuclease). There would have been synteny, but phage Klevey has an additional gene upstream of this gene. However, all the other genes upstream and downstream are the same as in phage Klevey. For example, the upstream gene is NKF and in pham 7404, and the downstream gene is NKF and in pham 2964, just like in phage Klevey. /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used, and both agreed on the same start site. The called start site is 35519. /note=Coding Potential: The gene contains reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The chosen start site has an SD score of -4.681 and a z-score of 1.978, which is not the best, but is the most reasonable since this start site covers all the coding potential whereas the one with a better SD score does not. /note=Gap/overlap: The 4 bp overlap with the upstream gene is reasonable. The length of the gene (474 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of January 9, 2022, the gene is found in pham 95561.
The gene is conserved in phages Klevey, Prairie, Bumble, and LilMac1015, which all belong to the same cluster (FH) as phage Bolt007 and were used for comparison. The function call for this gene is a RusA-like resolvase (endonuclease) protein. The function call is consistent between Phamerator and the phams database. It is approved on the SEA-PHAGES function list. /note=Starterator: The start site choice that is conserved among the members of the pham in which this gene belongs is start site 35519, which is start number 11. There are 34 non-draft members and 2 draft members in this pham, and 6/34 non-draft members call start site 28; however, the start site that made the most sense for this gene is 11, which is called by 1/34 non-draft members. /note=Location call: The gathered evidence suggests that the original start site call at 35519 by Glimmer and GeneMark is reasonable and is the most likely start site. It also suggests that the gene is a real gene. /note=Function call: PhagesDB BLAST and NCBI BLASTp have multiple hits with small e-values, with the suggested function being a RusA-like resolvase (endonuclease) protein. PhagesDB BLAST gave hits with e-values of e-80 to e-64, while NCBI BLASTp gave e-values of e-101 to e-80. The top NCBI BLASTp and PhagesDB BLAST hits sorted by e-value show high identity values (>89%) and >94% query coverage. HHpred had two hits for RusA-like resolvase (endonuclease) protein with 99.6% probability, >58% coverage, and an e-value of e-14. CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 36072 - 37448 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="Bolt007_51" /note=Original Glimmer call @bp 36072 has strength 20.99; Genemark calls start at 36072 /note=SSC: 36072-37448 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_52 [Arthrobacter phage Klevey]],,NCBI, q1:s1 99.7817% 0.0 GAP: 79 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.018, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_52 [Arthrobacter phage Klevey]],,UAW09409,68.5009,0.0 SIF-HHPRED: SIF-Syn: NKF (pham 2964), upstream gene is RusA-like resolvase, downstream gene is NKF (pham 14996), just like in phage Klevey /note=Comment on SD score and length. Add function to Phamerator. - Janelle /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer and GeneMark agree on the start site being 36072 /note=Coding Potential: Good coding potential in the 3rd frame; start site 36072 encompasses all the coding potential. Has ATG start codon. /note=SD (Final) Score: Has an RBS final score of -2.505 and a z-score of 3.018. /note=Gap/overlap: Gap of 79 bp, which is significant, but it is conserved in phage Klevey. No coding potential is present in the gap. /note=Phamerator: Pham is 2964 as of 1/8/22. Conserved in phages Klevey, Lilmac1015, and Prairie. /note=Starterator: Start site 3 is the most called start (3/3 non-draft genomes). Bolt007 also has start site 3, corresponding to 36072. /note=Location call: Likely a real gene with start at 36072, as evidenced by the above information. /note=Function call: No hits on CDD, no significant hits on HHpred, two significant hits on NCBI BLAST (61.6% identity, 68.5% aligned, 99.8% coverage, e-value of 0), but both are hypothetical proteins. Thus, going with NKF. /note=Transmembrane domains: No hits on TMHMM or TOPCONS /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: I agree with the gene candidate call.
I would uncheck any evidence with hypothetical protein, as that does not tell us anything. I agree, though, that the function is unknown. CDS 37522 - 38010 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Bolt007_52" /note=Original Glimmer call @bp 37522 has strength 16.0; Genemark calls start at 37522 /note=SSC: 37522-38010 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_53 [Arthrobacter phage Klevey]],,NCBI, q1:s1 99.3827% 6.46398E-94 GAP: 73 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.536, -3.513874779476501, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_53 [Arthrobacter phage Klevey]],,UAW09410,88.4146,6.46398E-94 SIF-HHPRED: SIF-Syn: There is synteny with other cluster FH phages. The upstream gene in Prairie is a gene with NKF in pham 2964, there is a gene in the same pham as this gene, and the downstream gene encodes a helix-turn-helix DNA binding domain protein. However, in Klevey and Lilmac1015 there is an extra gene downstream that is not currently present in the genome of Bolt007. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and GeneMark agree on a start site of 37522 bp. /note=Coding Potential: There is coding potential in both host-trained and self-trained GeneMark between 37522 and 38010 bp. This start site covers all of that coding potential. /note=SD (Final) Score: The final score is -3.514; this is the best final score displayed on PECAAN. /note=Gap/overlap: There is a gap of 73 bp, which is large but reasonable and is conserved in other cluster FH phages. /note=Phamerator: As of January 12, 2022, this gene belonged to pham 14996. This pham has only 5 members, all of which are in cluster FH. /note=Starterator: The most annotated start site corresponds to 37522 bp. This start is called by 3/4 other non-draft genomes. /note=Location call: This gene is most likely a real gene based on all the above evidence, with start site 37522 bp.
/note=Function call: NKF. There were 3 PhagesDB BLAST hits with e-values <9e-75 and two NCBI BLAST hits with e-values <2.7e-92 with coverages >98%. These hits all had no known function, but they do support that this is a real gene. HHpred and CDD were uninformative. /note=Transmembrane domains: None. TMHMM predicted 0 transmembrane domains. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: I agree with this call. Make sure to specify PhagesDB BLAST results and comment on evidence or lack thereof from HHpred and CDD. CDS 38007 - 38369 /gene="53" /product="gp53" /function="helix-turn-helix DNA binding domain" /locus tag="Bolt007_53" /note=Original Glimmer call @bp 38007 has strength 15.26; Genemark calls start at 38007 /note=SSC: 38007-38369 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Klevey]],,NCBI, q3:s1 93.3333% 2.0461E-37 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.864, -2.8454123590742793, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Klevey]],,UAW09411,60.1504,2.0461E-37 SIF-HHPRED: Putative uncharacterized protein; DNA BINDING PROTEIN; NMR {Hyperthermus butylicus},,,2LVS_A,76.6667,99.2 SIF-Syn: In Bolt007, the upstream gene has a function of NKF, this gene has a function of helix-turn-helix DNA binding domain, and the downstream gene has a function of NKF. In Prairie, the upstream gene has no function listed, the conserved gene has a function of helix-turn-helix DNA binding domain, and the downstream gene is NKF. /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: GeneMark lists a start site of 38007. Glimmer lists a start site of 38007. The start codon for both is GTG. /note=Coding Potential: All coding potential is found within the bounds of 38007 and 38369, as seen in both Glimmer and Host-Trained GeneMark, and it is in the forward direction.
/note=SD (Final) Score: -2.845 is the final score, which indicates a high likelihood that our SS is 38007. /note=Gap/overlap: There's an overlap of 4 base pairs, but this is consistent with other phages such as Bumble, Klevey, Lilmac1015, and Prairie, so it's not abnormal. /note=Phamerator: Pham 93787 on 1/7/2022. It is conserved in phage Bumble (FH) and phage Prairie (FH). /note=Starterator: Start site 93 in Starterator was manually annotated in 10/436 genes in this pham. Start 93 is 38007 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The suggested start site is 38007 because its z-score and final score fall within ranges that indicate a high likelihood for this start. Additionally, it's conserved in other phages such as Klevey, Lilmac1015, and Prairie. /note=Function call: helix-turn-helix DNA binding domain. Determined using PhagesDB BLAST, which showed that Klevey, Prairie, and Lilmac1015 (the phages with the highest similarity) have the same function, and HHpred, which showed that protein 2LVS_A has the same function. /note=Transmembrane domains: Has no transmembrane domains, which makes sense because it does not need to go through the membrane to carry out its function. /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: I agree with the start site and the function; please include more information about the PhagesDB BLAST and HHpred results, such as their e-values and coverage.
CDS 38369 - 38677 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Bolt007_54" /note=Original Glimmer call @bp 38369 has strength 16.63; Genemark calls start at 38369 /note=SSC: 38369-38677 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_54 [Arthrobacter phage Prairie]],,NCBI, q2:s4 98.0392% 3.92805E-28 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.798, -3.2558475833917586, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_54 [Arthrobacter phage Prairie]],,QTF82151,67.3077,3.92805E-28 SIF-HHPRED: SIF-Syn: NKF. Upstream is a helix-turn-helix DNA binding domain of pham 93787 and downstream is NKF of pham 49214. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 38369 with a start codon of ATG. /note=Coding Potential: There is high coding potential in the middle frame in the forward direction within the gene range for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The Final Score is -3.256, and the Z-score is 2.798, which are not the best scores among the other possibilities; however, they are acceptable and can still be considered. /note=Gap/overlap: There is a 1 bp overlap, which is reasonable because both genes are on the forward strand. This start site produces the longest ORF of 309 bp, which is acceptable because it is consistent with the idea that the genes must be densely packed. /note=Phamerator: Pham: 22655. Date Analyzed: 01/07/2022. The gene is conserved in FH and found in phage Bumble. /note=Starterator: Start site 4 is called in 2 out of 5 of the non-draft genes in this pham. 1 of the 4 members of pham 22655 was manually annotated for this start site. Start site 4 correlates to 38369 bp in Bolt007. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 38369. /note=Function call: Inconclusive.
There are a few PhagesDB and NCBI BLAST hits with E-values < 1e-22 (56% identity, 97% coverage). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene is not a membrane protein. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: The final score/z-score are the best possibilities, so please edit the SD score section. The 38369 start site doesn't produce the longest ORF, so edit the gap/overlap section. The -1 bp overlap indicates an operon, which makes this start site very favorable. Otherwise, I agree with the location call. Include HHpred/CDD observations in your function call. Otherwise, I agree with the function call. Nice work! CDS 38674 - 38871 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Bolt007_55" /note=Original Glimmer call @bp 38662 has strength 14.3; Genemark calls start at 38674 /note=SSC: 38674-38871 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_57 [Arthrobacter phage Klevey]],,NCBI, q1:s1 96.9231% 1.93306E-18 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.829, -2.838547390232814, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_57 [Arthrobacter phage Klevey]],,UAW09413,71.6418,1.93306E-18 SIF-HHPRED: SIF-Syn: NKF; upstream and downstream genes are also NKF. All functions are conserved in phage Klevey. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 38662. GeneMark calls the start at 38674. The start codon is GTG. /note=Coding Potential: The coding potential in this ORF is only on the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self; however, there is less coding potential covering the gene in GeneMark Self. /note=SD (Final) Score: The Final Score is -2.839 and the Z-score is 2.829. There are start sites with better Final Scores; however, they have significantly larger gaps.
This start site allows for the longest possible ORF. /note=Gap/overlap: There is an overlap of 4 bp. This is a small and normal overlap, which is evidence of an operon. /note=Phamerator: Pham 49214 on 1/7/2022. It is conserved in phage Klevey (FH) and phage Prairie (FH). /note=Starterator: Start site 3 in Starterator was found in 3/4 of genes in this pham. It was manually annotated 2 times for cluster FH. Start 3 is 38674 in Bolt007. This evidence agrees with the site predicted by GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 38674. /note=Function call: The likely function is unknown. PhagesDB’s two top hits predicted unknown function with e-values of 5e-18 and 4e-12, and identities of 67% and 50%, respectively. NCBI’s two top hits also predicted unknown function with e-values of 2e-18 and 3e-11 and identities of 64% and 49%, respectively. The CDD database had no hits. HHpred had uninformative hits with low probability, low coverage, and a high e-value. There is not enough evidence to hypothesize a function for this gene. /note=Transmembrane domains: No transmembrane domains were called in TMHMM. 1 transmembrane domain was called in TOPCONS. It is not a membrane protein. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: Good job! The notes are detailed. I agree with your location call and function call. CDS 38868 - 39032 /gene="56" /product="gp56" /function="membrane protein" /locus tag="Bolt007_56" /note=Original Glimmer call @bp 38868 has strength 16.44; Genemark calls start at 38868 /note=SSC: 38868-39032 CP: yes SCS: both ST: SS BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.053, -4.601664605840989, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (Pham 49214), downstream is NKF (Pham 11055), similar to phage Klevey from the same cluster, FH.
/note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and GeneMark both agree on start site 38868; the start codon is GTG. /note=Coding Potential: Reasonable coding potential in the putative ORF, covered by the chosen start site. /note=SD (Final) Score: The original start site of 38868 has a final score of -4.602 and a Z-score of 2.053. This start site does not have the best Ribosome Binding Site score, but the other start is not better because its gap is much larger. /note=Gap/overlap: The 4 bp overlap with the upstream gene is reasonable (it indicates an operon), and so is the gene length. /note=Location call: Real gene with most likely start site @38868 /note=Function call: None of the databases had hits with an acceptable e-value. The function is called membrane protein. /note=Transmembrane domains: 1 TMD predicted by both SOSUI and TMHMM. /note=Secondary Annotator Name: Wang, Yiyang Jennifer /note=Secondary Annotator QC: I agree with the location and function call. The primary annotation is not completed. Please complete it and add more details. CDS 39029 - 39352 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Bolt007_57" /note=Original Glimmer call @bp 39029 has strength 2.88; Genemark calls start at 39029 /note=SSC: 39029-39352 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_57 [Arthrobacter phage Prairie]],,NCBI, q1:s6 99.0654% 3.69112E-67 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.864, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_57 [Arthrobacter phage Prairie]],,QTF82154,91.0714,3.69112E-67 SIF-HHPRED: SIF-Syn: This gene only has decent synteny with phage Prairie. The other phages have it under another gene number or, as with Bumble, have it considerably shifted. Upstream there are a helix-turn-helix and a type of endonuclease. Downstream there is a ParB-like nuclease domain. It remains NKF for the other phages too.
/note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Both Glimmer and GeneMark call start site 39029 F. /note=Coding Potential: The coding potential spans the entire region from the start site to the stop site, so there appears to be coding potential and this is likely a real gene. /note=SD (Final) Score: This gene has the best z-score and final score out of the three candidates. /note=Gap/overlap: It has a 4 bp overlap, which means it could be part of an operon. It is the longest of the candidates, with a length of 324 bp. /note=Phamerator: This gene belongs to pham 11055 as of January 12, 2022. The gene's start site is not very conserved throughout the other phages, but it is still roughly similar to the others except for Bumble. /note=Starterator: Bolt007 has the most common start site. Its start number is 8, and it has 2 MAs. This start is found in 80% of the genes in the pham (4/5). There are 5 member phages, and 1 is a draft. /note=Location call: This appears to be a real gene with start site @39,029 F. /note=Function call: There is a lack of any evidence throughout the BLAST searches, CDD, and HHpred. /note=Transmembrane domains: TOPCONS and TMHMM do not predict anything. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with start site and function call. Add more detail to just about every section; follow the annotation lab manual for specific guidelines. Also, the synteny box should follow this format (Example: "Portal protein, upstream gene is terminase, downstream is capsid maturation protease, just like in phage XXX".).
CDS 39345 - 39497 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="Bolt007_58" /note=Original Glimmer call @bp 39345 has strength 12.27; Genemark calls start at 39345 /note=SSC: 39345-39497 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_KLEVEY_60 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 1.71036E-24 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.387, -5.854359841779658, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_60 [Arthrobacter phage Klevey]],,UAW09416,98.0,1.71036E-24 SIF-HHPRED: SIF-Syn: NKF, upstream and downstream genes are NKF, just like in phage Klevey. /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Glimmer and GeneMark both call the start at 39345. This is also the only start site called. /note=Coding Potential: There is no ORF depicted in GeneMark Host or GeneMark Self, but this can be explained by the fact that the start codon is TTG. /note=SD (Final) Score: The final score is the best and only option at -5.854, and the z-score is likewise the best and only option at 1.387. /note=Gap/overlap: The gap is -8 bp (an 8 bp overlap); this was also observed in the non-draft phage Klevey. /note=Phamerator: Pham 11629. Date 01/19/2022. It is conserved; found in Prairie (FH) and Klevey (FH). /note=Starterator: Start site 8 is called in all non-draft genes in the pham (4 manual annotations), which corresponds to start site 39345 for Bolt007. /note=Location call: With the evidence currently available, this appears to be a real gene with start site at 39345 bp; despite the lack of coding potential, there is synteny with other non-draft cluster FH phages, suggesting this is a real gene. /note=Function call: No Known Function (NKF). The top three PhagesDB BLAST hits have no known function (E-value < 10^-20), and 2 out of the top 5 NCBI BLAST hits also have the function ‘hypothetical protein’ (100% coverage, 98%+ identity, and E-value < 10^-23). HHpred had two hits for DUF (not marked as evidence).
CDD had no relevant hits. /note=Transmembrane domains: No results from TMHMM or TOPCONS, suggesting this is not a membrane protein. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: I disagree with the call that this is not a real gene. I think it is a real gene because I do see coding potential in GeneMark and there are some good PhagesDB BLAST hits. This gene is from pham 11629, and the other cluster FH phages have a gene from this pham at this same position in their genomes. Update: PECAAN notes have been updated. CDS 39494 - 39859 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Bolt007_59" /note=Original Glimmer call @bp 39494 has strength 16.29; Genemark calls start at 39494 /note=SSC: 39494-39859 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_61 [Arthrobacter phage Klevey]],,NCBI, q5:s2 96.6942% 3.29896E-48 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.444, -5.73456940637445, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_61 [Arthrobacter phage Klevey]],,UAW09417,80.6723,3.29896E-48 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream is NKF, similar to phage Klevey from the same cluster, FH. /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 39494 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is -5.735, which is the best score on PECAAN. /note=Gap/overlap: There is an overlap of 4 bp with its upstream gene, which is reasonable /note=Phamerator: Pham: 4160. Date 01/14/22. It is conserved; found in Bumble (FH) and Klevey (FH). /note=Starterator: Start site 1 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 1 is 39494 in Bolt007.
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39494. /note=Function call: Unknown. The top three PhagesDB BLAST hits have an unknown function (E-value <2e-40), and the top two NCBI BLAST hits also have an unknown function (called hypothetical proteins) (<80% coverage, <46% identity, and E-value <3.2e-48). HHpred had hits with very low coverage (39%-66%), <72% probability, and a very high E-value of 29. CDD had no relevant hits. /note=Transmembrane domains: There is no TMD in TMHMM or TOPCONS. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The overlap is not 4. For synteny, if the gene is NKF, point out the synteny of phams compared to Klevey rather than whether Klevey is also NKF. You should not check HHpred if the e-values are large. CDS 39856 - 40212 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Bolt007_60" /note=Original Glimmer call @bp 39856 has strength 15.31; Genemark calls start at 39856 /note=SSC: 39856-40212 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_PRAIRIE_60 [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 5.00012E-17 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.551, -5.5915060314559355, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_60 [Arthrobacter phage Prairie]],,QTF82157,56.2963,5.00012E-17 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 4160), downstream is NKF (pham 5020), just like in phage Klevey. /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer and GeneMark agree on the start site and start codon: 39856, GTG. /note=Coding Potential: There is coding potential within the putative ORF. The start site covers this coding potential.
/note=SD (Final) Score: -5.592 is the score for start site 39856, which is the best possible score. /note=Gap/overlap: The 4 bp overlap for start site 39856 is highly favorable. The gap has no coding potential. The called start site is not the LORF. The LORF start site has the start codon TTG (a less likely start codon) and an overlap of 202 bp, which is unreasonable. This gene and the gap are syntenic with other non-draft phages /note=Phamerator: The gene is found in pham 2010 as of 01/11/2022. This pham is found in all members of cluster FH and only in cluster FH (conserved). No function is given on the pham page of PhagesDB. /note=Starterator: The most annotated start is 5 (2/4 call it), but this start is not called in my gene. Start site 4 (39856) is called and located close to start site 5. Start site 4 is not found in any other phage in Starterator. Bolt007 also has start sites 1, 9, and 10, which are also not found in any other phage in Starterator. In summary, the pham members are too diverse and no consensus can be reached; hence, Starterator is not very informative. /note=Location call: The above evidence suggests this is a real gene that starts at 39856. /note=Function call: PhagesDB BLAST suggests that it has unknown function (Prairie and Klevey have e-value < 10^-3). NCBI BLAST suggests the same thing, with Prairie and Klevey again being the two most significant hits. PhagesDB function frequency suggests that this might be a scaffolding protein based on subcluster F1 (frequency 94%). CDD yielded no hits. None of the HHpred hits have E-values <10^-3 (no significant hits). Considering all the evidence, this protein is most likely NKF since there is no other evidence to back up the idea that it is a scaffolding protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMHs. There is no evidence to suggest this gene product is associated with the membrane.
/note=Secondary Annotator Name: BATTEIKH, MAYSAA /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator based on the evidence provided. CDS 40338 - 40949 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="Bolt007_61" /note=Original Glimmer call @bp 40401 has strength 14.94; Genemark calls start at 40401 /note=SSC: 40338-40949 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_61 [Arthrobacter phage Prairie]],,NCBI, q1:s1 99.0148% 5.21446E-104 GAP: 125 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.881, -4.90156673569703, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_61 [Arthrobacter phage Prairie]],,QTF82158,95.0739,5.21446E-104 SIF-HHPRED: SIF-Syn: The function of this gene is NKF, the upstream gene is NKF, and the downstream gene is a ParB-like nuclease domain protein. This arrangement is conserved in phages Klevey (FH), Prairie (FH), and Lilmac1015 (FH). /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer both agree on the same start site, 40401. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. There is coding potential predicted by Host-trained GeneMark and Self-trained GeneMark. However, the auto-annotated start site does not include all of the coding potential in both Host-trained GeneMark and Self-trained GeneMark. The suggested site includes all of the coding potential only in Host-trained GeneMark. /note=SD (Final) Score: The final score of -4.902 is the best option. The Z-score is 1.881. /note=Gap/overlap: There is a 125 bp gap. This is somewhat large but ultimately reasonable, because the gap is conserved in other phages of the same pham and there is no coding potential in the gap that might indicate a new gene.
/note=Starterator: The start number called most often in the published annotations is 14; it was called in 26 of the 46 non-draft genes in the pham. This evidence does not agree with the site predicted by Glimmer and GeneMark, which is start 18. Start 14 was chosen because it has 26 manual annotations (MAs). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 40338. /note=Function call: NKF. The top three non-draft PhagesDB BLAST hits indicate that the function is unknown (Prairie, Klevey, and Lilmac1015: E-value < 1e-101). The top two NCBI BLAST hits are also strong hits to hypothetical proteins. There is no significant HHpred hit. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: I agree with this location and function call. I am just wondering whether any of the other start sites covered all the coding potential in both Host-trained and Self-trained GeneMark. For Phamerator, mention whether the gene is conserved in any specific phages. For the function call, I would also check the top two NCBI BLAST hits and state their corresponding values as evidence, since those are very relevant to the function. Don't forget to fill out the synteny box! Good job! /note=Reply: I’ve added more evidence to the Starterator and Phamerator notes. NCBI BLAST had not been updated when I annotated the gene; I’ve now updated the function call. The synteny box is now completed.
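The GAP values in these records follow directly from the CDS coordinates. The sketch below is minimal and assumes 1-based inclusive coordinates as written in the CDS lines. `gap_bp` is a hypothetical helper, not a PECAAN function:

```python
def gap_bp(upstream_end: int, downstream_start: int) -> int:
    """Gap between two adjacent genes on 1-based, inclusive coordinates.

    Positive values are intergenic gaps; negative values are overlaps
    (-4 means the two genes share 4 bp).
    """
    return downstream_start - upstream_end - 1


# Coordinates taken from the CDS lines for genes 60-63.
print(gap_bp(40212, 40338))  # gene 60 -> gene 61: 125 bp gap
print(gap_bp(40949, 40946))  # gene 61 -> gene 62: -4 (4 bp overlap)
print(gap_bp(41710, 41703))  # gene 62 -> gene 63: -8 (8 bp overlap)
```

The same arithmetic works for complement genes, as long as the neighboring coordinates are taken left to right along the genome.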
CDS 40946 - 41710 /gene="62" /product="gp62" /function="ParB-like nuclease domain" /locus tag="Bolt007_62" /note=Original Glimmer call @bp 40946 has strength 21.83; Genemark calls start at 40946 /note=SSC: 40946-41710 CP: yes SCS: both ST: SS BLAST-Start: [ParB-like nuclease domain protein [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 8.96884E-157 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.156, -2.2364030944336752, yes F: ParB-like nuclease domain SIF-BLAST: ,,[ParB-like nuclease domain protein [Arthrobacter phage Prairie]],,QTF82159,93.4866,8.96884E-157 SIF-HHPRED: ParB domain protein nuclease; ParB-N, pnob8, partition, HYDROLASE; HET: MSE, CIT; 2.45A {Sulfolobus solfataricus},,,5K5D_B,44.4882,99.5 SIF-Syn: ParB-like nuclease domain protein, upstream gene is NKF, downstream is also NKF (49554), just like in phage Klevey. /note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Glimmer and GeneMark both agree on a start site of 40946. The start codon ATG is called. /note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: Final score = -2.236 for start site 40946. It is the best final score on PECAAN and has the smallest gap. /note=Gap/overlap: 4 bp overlap. This is the most reasonable overlap among the potential start sites. /note=Phamerator: Pham: 15583. Date 01/08/22. It is conserved and found in Klevey (FH), Lilmac1015 (FH), and Prairie (FH), which are all in the same cluster as Bolt007. The function called for the gene is ParB-like nuclease domain protein. /note=Starterator: Start site 7 in Starterator was manually annotated in 3/3 non-draft genes in this pham. Start 7 is 40946 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 40946 bp.
Starterator agrees with Glimmer and GeneMark. /note=Function call: The function of this gene is ParB-like nuclease domain protein. Most of the PhagesDB BLAST top hits state “ParB-like nuclease domain protein”, and the NCBI BLAST top hits state the same. CDD has several ParB-like nuclease domain hits with good e-values, although all of them have low % coverage. HHpred has two good hits with an informative function, stating “ParB-like nuclease domain protein”. /note=Transmembrane domains: Neither TMHMM nor TOPCONS calls any TMDs, and no evidence in the other databases suggests a TMD. /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I agree with this annotation and the location/function calls. Make sure to fill out the GM coding capacity as yes, and don't forget to update the synteny. You could mention that the 4 bp overlap may indicate an operon. CDS 41703 - 41837 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="Bolt007_63" /note=Genemark calls start at 41703 /note=SSC: 41703-41837 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_65 [Arthrobacter phage Klevey]],,NCBI, q1:s1 95.4545% 6.92749E-11 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.864, -3.4175091270247986, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_65 [Arthrobacter phage Klevey]],,UAW09421,75.0,6.92749E-11 SIF-HHPRED: SIF-Syn: NKF, upstream gene is a ParB-like nuclease domain protein, downstream gene is an oxidoreductase, just like in phage Klevey. /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: There is no start site indicated by Glimmer. The start site indicated by GeneMark is 41703. The start codon associated with this start site is ATG. /note=Coding Potential: There is coding potential within the entire ORF between the GeneMark-indicated start site 41703 and the stop site.
The coding potential is not uniformly high across the ORF; rather, it peaks in the middle of the ORF. /note=SD (Final) Score: -3.418. There is only one listed start site for this gene, and its Z-score of 2.864 (> 2.0) indicates that this start site is feasible. /note=Gap/overlap: 8 bp overlap with the previous gene. This is a small overlap, and overlaps of <30 bp are considered reasonable. Since the corresponding gene from Klevey also has this overlap, I would not consider it an issue. /note=Phamerator: Information collected on 1/14/2022. The gene is found in pham 49554. All of the other phages with genes in this pham (Klevey and Lilmac1015) are in cluster FH. No functions are listed on Phamerator. /note=Starterator: Information collected on 1/14/2022. Start site 1 was the most annotated start site for the genes in this pham, called for 2/2 (100%). For this gene, start number 1 corresponds to 41703. This is the same start site called by GeneMark. /note=Location call: This appears to be a real gene because start site 41703 covers all coding potential within the ORF, and this start site, while not identified by Glimmer, is reasonable given synteny with corresponding genes from other phages in the same cluster as Bolt007. /note=Function call: Relevant PhagesDB BLAST hits all match unknown functions. The only NCBI BLAST hit also matches an unknown function. There are no reasonable HHpred hits, as all of them have an extremely large e-value (>1). Thus, the function of this gene remains unknown (NKF). /note=Transmembrane domains: No transmembrane domains are indicated by TMHMM or TOPCONS. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with your location and function calls. Make sure to fill out the coding capacity drop-down. Also check the PhagesDB and NCBI BLAST evidence.
In the Starterator PECAAN notes you say it agrees with Glimmer, but Glimmer did not call a start. CDS 41852 - 41995 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="Bolt007_64" /note= /note=SSC: 41852-41995 CP: yes SCS: neither ST: NA BLAST-Start: GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.694, -3.120979338716238, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Added gene, though it may not be real. There is a spike of coding potential on GM-host; GM-self has almost nothing. -AF CDS 41988 - 42932 /gene="65" /product="gp65" /function="oxidoreductase" /locus tag="Bolt007_65" /note=Original Glimmer call @bp 41988 has strength 11.86; Genemark calls start at 41988 /note=SSC: 41988-42932 CP: yes SCS: both ST: SS BLAST-Start: [oxidoreductase [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.77, -3.4897410997597818, yes F: oxidoreductase SIF-BLAST: ,,[oxidoreductase [Arthrobacter phage Prairie]],,QTF82161,95.2229,0.0 SIF-HHPRED: JBP-like; oxygenase domain of J-binding protein (JBP) 1 and JBP2 thymidine hydroxylases and similar proteins, including uncharacterized bacterial and phage proteins.,,,cd18894,80.5732,100.0 SIF-Syn: This gene is an oxidoreductase and the downstream gene is in pham 55612, just like in phage Bumble. The upstream gene is in a different pham in Bolt007 compared to Bumble, but the gene is the same size in both phages. /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: GeneMark and Glimmer agree on a start site of 41988. The start codon is ATG. /note=Coding Potential: The gene has good coding potential in the forward direction, and the start site includes all of the coding potential. /note=SD (Final) Score: The final score of -3.490 is the best final score on PECAAN. /note=Gap/overlap: This gene has a 150 bp gap from gene 63 (which ends at 41837). This is rather large, but other phages of cluster FH have similar gaps. /note=Phamerator: Pham 9896 as of 1/11/2022.
This pham is in 40 phages of diverse clusters, although cluster B phages, such as Apex, are by far the most numerous. The function of this pham is oxidoreductase. /note=Starterator: This pham is quite large, with 40 non-draft members. The most conserved start site is 29, and it is called in only 16 of the 40 non-draft phages. Bolt007 does not have start site 29 and instead has start site 28, which corresponds to position 41988. Start site 28 is called in only 3 of 40 phages. /note=Location call: This is likely a real gene with start site 41988. /note=Function call: This gene is very likely an oxidoreductase. HHpred, BLASTp, and CDD all have very strong hits for oxidoreductase (e-value < 4x10^-17). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: I agree with the annotator, but add more information to the "Gap/Overlap" section explaining why the large gap is normal (e.g., other phages have similarly large gaps). Update: more detail added to the "Gap" section.
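Several function calls above depend on an E-value cutoff, and the gene 60 notes use 10^-3. The sketch below shows a minimal version of that filter. It assumes the same cutoff applies to every tool. The record layout and the `significant_hits` name are hypothetical, and the E-values are copied from the gene 59, 60, 62, and 63 notes:

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    tool: str
    description: str
    e_value: float


E_VALUE_CUTOFF = 1e-3  # cutoff cited in the gene 60 function call

hits = [
    Hit("gp59", "HHpred", "low-coverage hit", 29.0),
    Hit("gp60", "NCBI BLAST", "hypothetical protein SEA_PRAIRIE_60", 5.00012e-17),
    Hit("gp62", "NCBI BLAST", "ParB-like nuclease domain protein", 8.96884e-157),
    Hit("gp63", "NCBI BLAST", "hypothetical protein SEA_KLEVEY_65", 6.92749e-11),
]


def significant_hits(hits: list[Hit], cutoff: float = E_VALUE_CUTOFF) -> list[Hit]:
    """Keep only hits whose E-value is below the cutoff."""
    return [h for h in hits if h.e_value < cutoff]


for h in significant_hits(hits):
    print(f"{h.gene} {h.tool}: {h.description} (E = {h.e_value:.2g})")
```

Passing the cutoff does not by itself assign a function. The gp60 and gp63 hits are significant but match hypothetical proteins, which is why those genes remain NKF.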
CDS 42929 - 43672 /gene="66" /product="gp66" /function="glycosyltransferase" /locus tag="Bolt007_66" /note=Original Glimmer call @bp 42929 has strength 13.43; Genemark calls start at 42929 /note=SSC: 42929-43672 CP: yes SCS: both ST: SS BLAST-Start: [glycosyltransferase [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 4.29956E-173 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.764, -3.0377949497172554, yes F: glycosyltransferase SIF-BLAST: ,,[glycosyltransferase [Arthrobacter phage Prairie]],,QTF82162,98.3806,4.29956E-173 SIF-HHPRED: Polypeptide N-acetylgalactosaminyltransferase; GalNAc-Ts, GalNAc-T3, long-range glycosylation preference, (glyco)peptides, Molecular dynamics, specificity, enzyme kinetics, FGF23, phosphate homeostasis, TRANSFERASE; HET: EDO, NAG, UDP, NGA; 1.96A {Taeniopygia guttata},,,6S22_A,91.0931,99.8 SIF-Syn: Glycosyltransferase, upstream gene is oxidoreductase, downstream gene is glucosaminyl deacetylase, just like in phage Lilmac1015. /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and GeneMark both call the start site at 42929. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both GeneMark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -3.038. This is the best final score on PECAAN. /note=Gap/overlap: -4. This gene overlaps the upstream gene by 4 bp, which is typical of genes in an operon. The overlapping sequence is ATGA: the ATG start codon of this gene overlaps the TGA stop codon of the upstream gene. /note=Phamerator: Pham 55612. Date 1/11/22. It is conserved and found in Lilmac1015 (FH). /note=Starterator: Start site 5 in Starterator is not the most annotated start site, but it was manually annotated in 3/10 non-draft genes in this pham. Start 5 is 42929 in Bolt007. This evidence agrees with the start site predicted by Glimmer and GeneMark.
/note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 42929. /note=Function call: Glycosyltransferase. The top 3 PhagesDB BLAST hits have the function of glycosyltransferase (E-value < 1e-137), and the top 2 NCBI BLAST hits also have the function of glycosyltransferase (100% coverage, 95%+ identity, E-value < 2e-171). HHpred had a hit for polypeptide N-acetylgalactosaminyltransferase, a family of glycosyltransferases, with 99.8% probability, 91.1% coverage, and an E-value of 1.3e-20. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with the evidence. Do not forget to mention the start codon. CDS 43669 - 44259 /gene="67" /product="gp67" /function="glucosaminyl deacetylase" /locus tag="Bolt007_67" /note=Original Glimmer call @bp 43669 has strength 13.75; Genemark calls start at 43669 /note=SSC: 43669-44259 CP: yes SCS: both ST: SS BLAST-Start: [glucosaminyl deacetylase [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 4.22348E-107 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.978, -4.619104450198239, no F: glucosaminyl deacetylase SIF-BLAST: ,,[glucosaminyl deacetylase [Arthrobacter phage Prairie]],,QTF82163,86.2245,4.22348E-107 SIF-HHPRED: Uncharacterized protein; CE-14 deacetylase, METAL BINDING PROTEIN; HET: CD, HEZ, TRS, CL, MSE; 1.51A {Pyrococcus furiosus},,,4XLZ_F,98.9796,100.0 SIF-Syn: Synteny with Klevey is present for both the upstream and the downstream gene. The upstream gene is in pham 55612, while the downstream gene is in pham 3854. /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Glimmer and GeneMark called 43669 as the start site. /note=Coding Potential: The ORF has coding potential, which is seen in both host-trained and self-trained GeneMark.
The chosen start site, 43669, includes all the coding potential for the gene, which lies only in the forward direction. /note=SD (Final) Score: -4.619. This is the best final score on PECAAN, with the second-best Z-score of 1.978. /note=Gap/overlap: A 4 bp overlap (gap of -4), which could indicate an operon; this is the best start site with a reasonable overlap. /note=Phamerator: As of 1/10/2022, this gene belongs to pham 67355, which has 11 members. Of these, 10 are non-draft genes and 3 belong to cluster FH: Lilmac1015, Prairie, and Klevey. /note=Starterator: The most annotated start site for this pham is 5, which is called by 7 of the 10 non-draft genes. This gene neither has nor calls start 5; rather, it calls start number 4, at 43669. /note=Location call: From the evidence collected, 43669 is the best start site for this gene because it has the best overlap, contains all the coding potential, and is supported by Starterator. /note=Function call: Glucosaminyl deacetylase. PhagesDB BLAST has multiple hits with this function, 3 of which have very low e-values (<1e-83). NCBI BLAST also has multiple hits with this function; the top two hits have e-values <3.1e-102, high coverage (100%), and decent identity (80%). CDD had one hit with this function that could be considered evidence, with a low e-value (5.0313e-11) and decent coverage (52%) but low identity (32%). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with your calls. Please mention the start codon. HHpred hits were not mentioned in the function call and were not selected as evidence. For the synteny box, including the function is helpful.
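The gene 66 note describes a 4 bp "ATGA" overlap, and gene 67 has a similar 4 bp overlap. In this arrangement the downstream ATG start begins one base before the upstream TGA stop, so both codons fit inside the same four bases. The sketch below is minimal and assumes 1-based inclusive coordinates on the forward strand. `overlap_span` is a hypothetical helper, and the ATGA string is the one given in the gene 66 note:

```python
from typing import Optional, Tuple


def overlap_span(upstream_end: int, downstream_start: int) -> Optional[Tuple[int, int]]:
    """Return the (first, last) shared positions of two forward-strand genes,
    or None if they do not overlap. Coordinates are 1-based and inclusive."""
    if downstream_start > upstream_end:
        return None
    return downstream_start, upstream_end


# Gene 65 ends at 42932 and gene 66 starts at 42929.
first, last = overlap_span(42932, 42929)
print(first, last, last - first + 1)  # 42929 42932 4

overlap = "ATGA"            # bases 42929..42932, as stated in the gene 66 note
start_codon = overlap[:3]   # positions 42929..42931 begin gene 66
stop_codon = overlap[-3:]   # positions 42930..42932 end gene 65
print(start_codon, stop_codon)  # ATG TGA
```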
CDS complement (44305 - 44487) /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="Bolt007_68" /note= /note=SSC: 44487-44305 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_KLEVEY_70 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 7.52163E-17 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.451, -3.708514821076885, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_70 [Arthrobacter phage Klevey]],,UAW09426,70.0,7.52163E-17 SIF-HHPRED: SIF-Syn: /note=Added gene. CP is present on both GM-host and GM-self. The gene is also present in Klevey. CDS complement (44484 - 44660) /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="Bolt007_69" /note=Original Glimmer call @bp 44660 has strength 18.43; Genemark calls start at 44666 /note=SSC: 44660-44484 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_PRAIRIE_68 [Arthrobacter phage Prairie]],,NCBI, q1:s1 87.931% 2.48924E-10 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.864, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PRAIRIE_68 [Arthrobacter phage Prairie]],,QTF82165,42.5287,2.48924E-10 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF and in pham 8210, downstream is being added, just like in phage Prairie. /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used; Glimmer calls the start site at 44660 and GeneMark calls it at 44666. The called start site is 44660. /note=Coding Potential: The gene contains reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The SD score of -2.906 is the best option, and the Z-score is the highest at 2.864. /note=Gap/overlap: The 4 bp overlap with the upstream gene is reasonable. The length of the gene (177 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of January 9, 2022, the gene is found in pham 3854.
The gene is conserved in phages Klevey, LilMac1015, and Prairie, which belong to the same cluster (FH) as Bolt007; these were the phages used for comparison. There was no function call for this gene. /note=Starterator: Start number 7 corresponds to 44660 in Bolt007. There are 3 non-draft members and 1 draft member in this pham. Of the non-draft members, 2/3 call start 3, but the start that made the most sense for this gene is start 7, which is called by 1/3 non-draft members. /note=Location call: The gathered evidence suggests that this is a real gene and that the original Glimmer start site call at 44660 is reasonable and is the most likely start site. /note=Function call: PhagesDB BLAST and NCBI BLASTp have multiple hits with small e-values but no known function. PhagesDB BLAST gave hits with e-values on the order of 1e-9 and 1e-10, while NCBI BLASTp gave e-values on the order of 1e-10. The top NCBI BLASTp and PhagesDB BLAST hits, sorted by e-value, show >50% identity and >87% query coverage. All HHpred hits have high e-values, so none were used as evidence. CDD also had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
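The 177 bp length quoted for gene 69 follows from its coordinates, and a real CDS should contain a whole number of codons. The sketch below is minimal and assumes 1-based inclusive coordinates given in either order, since the SSC field lists complement genes start-first (e.g. 44660-44484). `cds_length` and `codon_count` are hypothetical helpers:

```python
def cds_length(a: int, b: int) -> int:
    """Length in bp of a CDS spanning a..b (1-based, inclusive, either order)."""
    return abs(b - a) + 1


def codon_count(a: int, b: int) -> int:
    """Number of codons, including the stop codon; raises if not a whole number."""
    length = cds_length(a, b)
    if length % 3 != 0:
        raise ValueError(f"{length} bp is not a multiple of 3")
    return length // 3


print(cds_length(44660, 44484))   # gene 69: 177
print(codon_count(44660, 44484))  # 59 codons, i.e. 58 amino acids plus a stop
print(cds_length(45404, 46402))   # gene 73: 999
print(cds_length(46968, 47492))   # gene 76: 525
```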
CDS complement (44657 - 44836) /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="Bolt007_70" /note=Original Glimmer call @bp 44836 has strength 18.64; Genemark calls start at 44836 /note=SSC: 44836-44657 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_73 [Arthrobacter phage Klevey]],,NCBI, q3:s14 88.1356% 0.00319695 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.355, -3.8308929311341546, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_73 [Arthrobacter phage Klevey]],,UAW09429,35.8696,0.00319695 SIF-HHPRED: SIF-Syn: NKF (pham 8210), upstream is NKF (pham 3854), downstream is helix-turn-helix DNA binding domain protein (pham 94753), just like in phage Klevey /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer and GeneMark both say the start is 44836. /note=Coding Potential: Good coding potential in the 4th frame with start site 44836 and stop 44657. Start 44836 encompasses all coding potential. It has an ATG start codon. /note=SD (Final) Score: The final score of -3.831 is the best option, and its Z-score of 2.355 is also the best. /note=Gap/overlap: The 4 bp overlap (gap of -4) is small and reasonable and could indicate an operon. /note=Phamerator: Pham 8210 as of 1/10/22. The gene is conserved in phage Klevey. /note=Starterator: Start site number 4 is the most called site, called in 3/3 non-draft genomes. Site 4 is present in Bolt007 and is called, corresponding to 44836. /note=Location call: Likely a real gene with start site 44836, as evidenced by the above information. /note=Function call: No hits on CDD. NCBI BLAST had a hit for a hypothetical protein from phage Klevey (31.5% identity, 35.9% aligned, 88.1% coverage, and e-value of 0.0032). HHpred had a hit for YcbB, but the evidence appears weak (90.5% probability, 74.6% coverage, e-value of 0.77). Thus, NKF. /note=Transmembrane domains: Neither TmHmm nor TOPCONS predicts a TMD.
/note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: I would once again uncheck the evidence boxes, as there is insufficient evidence to assign any function. I agree, though, that there is no known function, and I agree with the gene candidate. CDS complement (44833 - 45075) /gene="71" /product="gp71" /function="helix-turn-helix DNA binding domain" /locus tag="Bolt007_71" /note=Original Glimmer call @bp 45075 has strength 15.29; Genemark calls start at 45075 /note=SSC: 45075-44833 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Prairie]],,NCBI, q1:s1 100.0% 5.90995E-43 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.118, -4.325675514574479, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Prairie]],,QTF82167,95.0617,5.90995E-43 SIF-HHPRED: Regulatory protein cox; helix-turn-helix, DNA binding, VIRAL PROTEIN; 2.401A {Enterobacteria phage P2},,,4LHF_A,98.75,98.6 SIF-Syn: There is synteny with the genomes of Klevey and Prairie, with an upstream gene in pham 8210, a downstream gene in pham 14190, and a matching helix-turn-helix DNA binding domain gene. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: GeneMark and Glimmer both call this gene with a start site of 45075 bp. /note=Coding Potential: There is coding potential on host-trained and self-trained GeneMark in this range, and the start site at 45075 bp covers all of the coding potential. /note=SD (Final) Score: -4.326 is the best and only final score displayed on PECAAN. /note=Gap/overlap: There is a gap of -4 (a 4 bp overlap); this is a reasonable overlap and suggests that the gene is part of an operon. /note=Phamerator: As of January 11, 2022, this gene belonged to pham 94753. There are 165 members of this pham, including 3 other members of cluster FH. /note=Starterator: Start 53 was found in 9/165 non-draft genes; this corresponds to 45075 bp in Bolt007.
This is not the most annotated start site, but Bolt007's gene does not have the most annotated start site. /note=Location call: Based on all the evidence collected above, this gene is most likely a real gene with start site 45075. /note=Function call: Helix-turn-helix DNA binding domain. PhagesDB BLAST had 3 hits matching this function with e-values <2e-34, all belonging to cluster FH phages. HHpred had hits for a DNA binding protein with an e-value of 2.7e-9 and a putative excisionase (excisionases have HTH motifs) with an e-value of 3e-7. CDD was uninformative. /note=Transmembrane domains: None. Neither TmHmm nor TOPCONS predicts any transmembrane domains. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: I agree with this call. Update the pham number (it is different now) and review the Starterator results. CDS complement (45072 - 45407) /gene="72" /product="gp72" /function="hypothetical protein" /locus tag="Bolt007_72" /note=Original Glimmer call @bp 45407 has strength 3.92; Genemark calls start at 45407 /note=SSC: 45407-45072 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_75 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 7.29476E-41 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.919, -5.648690092149513, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_75 [Arthrobacter phage Klevey]],,UAW09431,84.6939,7.29476E-41 SIF-HHPRED: SIF-Syn: In Bolt007, the upstream gene is a helix-turn-helix DNA binding domain protein, this gene is NKF, and the downstream gene is NKF. In Prairie, the upstream gene is a helix-turn-helix DNA binding domain protein, the conserved gene is NKF, and the downstream gene is NKF. /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: GeneMark lists a start site of 45407. Glimmer lists a start site of 45407. The start codon for both is GTG.
/note=Coding Potential: All coding potential is found within the bounds of 45407 and 45072, as seen in both Glimmer and Host-Trained GeneMark, and it is in the reverse direction. /note=SD (Final) Score: -5.649 is the final score for start site 45407. /note=Gap/overlap: There is a 4 bp overlap, which is not abnormal and could indicate that the gene is part of an operon. /note=Phamerator: Pham 14190 on 1/7/2022. It is conserved in phage Klevey (FH) and phage Prairie (FH). /note=Starterator: Start site 5 in Starterator was manually annotated in 3/3 genes in this pham. Start 5 is 45407 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The suggested start site is 45407 because its Z-score and final score fall within acceptable ranges. Additionally, it is conserved in other phages such as Klevey, Lilmac1015, and Prairie. /note=Function call: NKF. PhagesDB BLAST shows that the most similar genes, from Klevey and Lilmac1015, also have no known function, and the HHpred hit PF13773.9 likewise has no known function. /note=Transmembrane domains: No transmembrane domains are predicted, so this is not expected to be a membrane protein. /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: I agree with the start site and function call. Please include more evidence in the function call by mentioning the e-value, coverage, and identity.
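For complement genes such as gp72 (SSC 45407-45072, GTG start), the start codon is read on the reverse strand at the higher coordinate. The sketch below is minimal and uses a short made-up sequence, not Bolt007 data. `reverse_complement` and `cds_sequence` are hypothetical helpers, and the toy coordinates are for illustration only:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_sequence(genome: str, left: int, right: int, complement: bool) -> str:
    """Coding-strand sequence of left..right (1-based, inclusive)."""
    region = genome[left - 1:right]  # convert to a 0-based, end-exclusive slice
    return reverse_complement(region) if complement else region


toy_genome = "GGTTAGCCCACGG"  # made-up 13 bp sequence
cds = cds_sequence(toy_genome, 3, 11, complement=True)  # like complement(3..11)
print(cds)       # GTGGGCTAA
print(cds[:3])   # GTG: start codon, at the right-hand coordinate (11)
print(cds[-3:])  # TAA: stop codon, at the left-hand coordinate (3)
```

This is why the SSC field for complement genes lists the start coordinate first (e.g. 45407-45072 for gp72).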
CDS complement (45404 - 46402) /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="Bolt007_73" /note=Original Glimmer call @bp 46402 has strength 16.63; Genemark calls start at 46402 /note=SSC: 46402-45404 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_76 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 0.0 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.694, -3.2619778523784246, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_76 [Arthrobacter phage Klevey]],,UAW09432,93.0723,0.0 SIF-HHPRED: SIF-Syn: NKF. Upstream is an NKF gene in pham 14190 and downstream is an NKF gene in pham 6172. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 46402, with a start codon of ATG. /note=Coding Potential: There is high coding potential in the first frame in the reverse direction within the gene range for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score is -3.262 and the Z-score is 2.694, both of which are the best among the start site options. /note=Gap/overlap: There is a 15 bp gap, which is small and reasonable; the neighboring genes are on the same (reverse) strand. This start site produces the longest ORF, 999 bp, which is acceptable: phage genes are densely packed, though some gaps are allowed within operons. /note=Phamerator: Pham: 95551. Date Analyzed: 01/07/2022. The gene is conserved in cluster FH and found in phages Klevey and Prairie. /note=Starterator: Start site 80 is called in 20 out of 304 of the non-draft genes in this pham. 2 of the 262 members of pham 95551 were manually annotated for this start site. Start site 80 corresponds to 46402 bp in Bolt007. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 46402. /note=Function call: Inconclusive.
A few PhagesDB and NCBI BLAST hits have E-values < 1e-99 (85% identity, 100% coverage), but they are all hypothetical proteins with no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene is not a membrane protein. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: I agree with the location call. Nice work! Check the top two NCBI BLAST hits as evidence. Include CDD and HHpred in the function call notes. Uncheck that hit in HHpred because it's not a strong hit. CDS complement (46418 - 46678) /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="Bolt007_74" /note=Original Glimmer call @bp 46678 has strength 15.74; Genemark calls start at 46678 /note=SSC: 46678-46418 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_77 [Arthrobacter phage Klevey]],,NCBI, q1:s1 100.0% 1.26596E-25 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.876, -2.740768097004632, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_77 [Arthrobacter phage Klevey]],,UAW09433,71.2644,1.26596E-25 SIF-HHPRED: SIF-Syn: NKF; the upstream and downstream genes have no known function. All functions are conserved in phage Klevey. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 46678. GeneMark calls the start at 46678. The start codon is ATG. /note=Coding Potential: The coding potential in this ORF is only on the reverse strand, suggesting it is a reverse gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The final score is -2.741 and the Z-score is 2.876. There is only one listed start site. /note=Gap/overlap: There is an overlap of 4 bp. This is a small and normal overlap, which may indicate an operon. /note=Phamerator: Pham 6172 on 1/7/2022. It is conserved in phage Klevey (FH) and phage Prairie (FH). /note=Starterator: Start site 1 in Starterator was found in 3/3 of the genes in this pham.
It was manually annotated 2 times for cluster FH. Start 1 is 46678 in Bolt007. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 46678. /note=Function call: The likely function is unknown. PhagesDB’s two top hits predicted unknown function with e-values of 1e-21 and 5e-21, and identities of 60% and 67%, respectively. NCBI’s two top hits also predicted unknown function with e-values of 1e-25 and 2e-25 and identities of 63% and 73%, respectively. The CDD database had no hits. HHpred had uninformative hits with low probability, low coverage, and high e-values. There is not enough evidence to hypothesize a function for this gene. /note=Transmembrane domains: No transmembrane domains were called by TMHMM or TOPCONS, suggesting it is not a membrane protein. /note=Secondary Annotator Name: Ma, Yiwen (Kristy) /note=Secondary Annotator QC: Good work! I agree with your location call and function call. CDS complement (46675 - 46893) /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="Bolt007_75" /note=Original Glimmer call @bp 46893 has strength 22.71; Genemark calls start at 46893 /note=SSC: 46893-46675 CP: yes SCS: both ST: SS BLAST-Start: GAP: 74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.018, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and Genemark both agree on start site 46893; the start codon is ATG. /note=Coding Potential: Reasonable coding potential in the putative ORF, covered by the chosen start site. /note=SD (Final) Score: The original start site of 46830 has a final score of -6.664 and a Z-score of 1.167. This start site does not have the best ribosome binding site score, but the other start is not better because its gap is much larger. 
/note=Location call: Real gene with most likely start site @46893 /note=Function call: None of the databases had hits with an acceptable e-value. /note=Transmembrane domains: 0 TMDs predicted /note=Secondary Annotator Name: Wang, Jennifer Yiyang /note=Secondary Annotator QC: I agree with the location call. Please complete the annotation. CDS complement (46968 - 47492) /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="Bolt007_76" /note=Original Glimmer call @bp 47492 has strength 16.2; Genemark calls start at 47492 /note=SSC: 47492-46968 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_79 [Arthrobacter phage Klevey]],,NCBI, q14:s14 91.3793% 2.58938E-83 GAP: 103 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.018, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_79 [Arthrobacter phage Klevey]],,UAW09435,77.7143,2.58938E-83 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Both Glimmer and Genemark call a start site of @47,492 R. /note=Coding Potential: There is coding potential in both host-trained and self-trained GeneMark, covering the start and stop sites of this gene. /note=SD (Final) Score: The Z-score is the highest of all candidates at 3.018, and the final score is the best at -2.523. /note=Gap/overlap: There is a gap of 103 bp; while large, it is the smallest among all candidate start sites. The ORF length of 525 bp is the longest of the candidates. /note=Phamerator: The pham that this gene belongs to is 9144 as of January 10, 2022. There is little conservation of the start site, but under different gene numbers the gene is found in the same vicinity in the other phages except Bumble. /note=Starterator: It has start number 2, and all the phages in this report (4 in total) have this start number as well. There are 3 manual annotations stating the start site is @47492 R. There are 4 members and 1 of them is a draft. 
When found, this gene is called with start number 2 100% of the time. This supports the start site. /note=Location call: This appears to be a real gene with an ATG start codon. The start site is @47,492 R. /note=Function call: Phagesdb function frequency says it is a major capsid hexamer protein 100% of the time, but that is the only line of evidence supporting this. Phagesdb BLAST mentions the major hexamer, but the e-value is far too high to be significant. I will go with no known function due to lack of evidence. /note=Transmembrane domains: TOPCONS does not return any prediction. TMHMM has no hits, so it is unlikely to be a membrane protein. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with start site and function calls. I don't see any TOPCONS information either, so it's safe to assume that there are no transmembrane domains here. I would try to add slightly more detail to each notes section; again, please see the annotation manual. Again, the synteny box should be in this format (Example: "Portal protein, upstream gene is terminase, downstream is capsid maturation protease, just like in phage XXX"). CDS complement (47596 - 48750) /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="Bolt007_77" /note=Original Glimmer call @bp 48750 has strength 14.33; Genemark calls start at 48750 /note=SSC: 48750-47596 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_80 [Arthrobacter phage Klevey]],,NCBI, q4:s3 99.2188% 1.86588E-159 GAP: 150 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.799, -5.071299467497814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_80 [Arthrobacter phage Klevey]],,UAW09436,75.25,1.86588E-159 SIF-HHPRED: SIF-Syn: NKF, upstream and downstream genes are also NKF, just as in phage Klevey /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Glimmer and GeneMark both call the start at 48750. 
/note=Coding Potential: This ORF has good coding potential in both GeneMark S and GeneMark Host, with start site 48750 covering all of the coding potential. /note=SD (Final) Score: The final score is -5.071; this is not the best final score, but the start sites with better (less negative) final scores do not cover all the coding potential. Start site 48750 also has the smallest gap to the upstream gene among the candidates. The Z-score is 1.799. /note=Gap/overlap: The gap is 150 bp; this large gap was also observed in the non-draft phage Klevey. /note=Phamerator: 3568. Date 01/12/2022. It is conserved; found in Prairie (FH) and Klevey (FH) /note=Starterator: Start site 1 (6714) is not called in any of the 3 non-draft genes in the pham; it is only called in Bolt007. However, the other start sites are not supported by the other evidence: they have excessive gaps with the upstream gene and no coding potential in the gap region, making them unsuitable start sites. /note=Location call: As of now, with the available evidence, this appears to be a real gene with start site at 48750 bp. /note=Function call: No Known Function (NKF). The top three phagesdb BLAST hits have no known function (E-value < 10^-147), and 2 out of 3 top NCBI BLAST hits also have the function ‘hypothetical protein’ (>99% coverage, 75%+ identity, and E-value < 10^-159). All HHpred hits have extremely low coverage (<9%), very low probability (<23%) and extremely high e-values (>150), making them unsuitable for use as evidence. CDD had no relevant hits. /note=Transmembrane domains: No results from TMHMM or TOPCONS, suggesting this is not a membrane protein. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: I agree with the location call of start site 48750 and I also agree with the function call. Please fill out the synteny box.
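The Gap/overlap and ORF-length statements in these notes come down to simple coordinate arithmetic. Below is a minimal Python sketch, assuming 1-based inclusive coordinates as written in the CDS lines; the helper names `cds_length` and `gap_between` are hypothetical and not part of PECAAN or any annotation tool. For these complement genes the upstream neighbor lies at higher coordinates, so each printed gap corresponds to the GAP field of the lower-coordinate gene in the pair.

```python
# Hypothetical helpers (not a PECAAN API): coordinate arithmetic behind
# the Gap/overlap and length notes. Coordinates are 1-based and inclusive.

def cds_length(left: int, right: int) -> int:
    """Length in bp of a CDS spanning left..right, inclusive."""
    return right - left + 1


def gap_between(left_cds: tuple[int, int], right_cds: tuple[int, int]) -> int:
    """Spacing between two CDSs ordered by genome position.

    Positive values are gaps in bp; negative values are overlaps in bp.
    """
    return right_cds[0] - left_cds[1] - 1


# Complement CDSs gp73 to gp78 from this section, as (left, right).
cds = {
    "gp73": (45404, 46402),
    "gp74": (46418, 46678),
    "gp75": (46675, 46893),
    "gp76": (46968, 47492),
    "gp77": (47596, 48750),
    "gp78": (48901, 49680),
}

names = list(cds)
for prev, nxt in zip(names, names[1:]):
    # For complement genes, this value is the GAP recorded on `prev`.
    print(f"{prev} -> {nxt}: {gap_between(cds[prev], cds[nxt])} bp")

for name, (left, right) in cds.items():
    print(f"{name}: {cds_length(left, right)} bp")
```

Run as-is, the computed gaps (15, -4, 74, 103 and 150 bp) match the GAP fields recorded for gp73 through gp77, and every CDS length is a multiple of 3 (gp73 is 999 bp).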
CDS complement (48901 - 49680) /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="Bolt007_78" /note=Original Glimmer call @bp 49680 has strength 13.27; Genemark calls start at 49680 /note=SSC: 49680-48901 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KLEVEY_81 [Arthrobacter phage Klevey]],,NCBI, q1:s1 99.6139% 4.19375E-132 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.018, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KLEVEY_81 [Arthrobacter phage Klevey]],,UAW09437,83.5878,4.19375E-132 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is NKF and there is no downstream gene; this is the last gene of the Bolt007 genome. Similar to Klevey from cluster FH. /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 49680. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. Coding potential also exists upstream of the start site (around 49750), but there is no start codon at that location. /note=SD (Final) Score: The final score is -2.443, which is the best score in PECAAN. /note=Gap/overlap: This is the last gene, so there is no upstream gene and therefore no gap or overlap. /note=Phamerator: Pham: 96226. Date 01/14/22. It is conserved; found in Lilmac015 (FH) and Klevey (FH). /note=Starterator: Start site 24 in Starterator was manually annotated in 85/117 non-draft genes in this pham. Start 29 is 49680 in Bolt007, which is not the most conserved start number. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 49680, although the coding potential upstream of the start site should be considered. /note=Function call: Unknown. 
The top three phagesdb BLAST hits have an unknown function (E-value