CDS 21 - 470 /gene="1" /product="gp1" /function="helix-turn-helix DNA binding domain" /locus tag="GravityBall_1" /note=Original Glimmer call @bp 21 has strength 15.52 /note=SSC: 21-470 CP: yes SCS: glimmer ST: SS BLAST-Start: [HTH DNA binding protein [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 9.05479E-105 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.69, -3.5499432015425856, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[HTH DNA binding protein [Arthrobacter phage Timinator] ],,ASR78031,100.0,9.05479E-105 SIF-HHPRED: SIF-Syn: The downstream gene is a terminase, large subunit. This gene displays synteny with other phages in cluster AO2, including BarretLemon and StevieBAY /note=Primary Annotator Name: Castellanos, Sebastian /note=Auto-annotation: Glimmer was the only reference listed on PECAAN. It gave the start site as nucleotide position 21 for the start codon GTG. /note=Coding Potential: The start and stop sites provided by the annotation encompass all of the putative coding potential depicted in the host-trained GeneMark map as well as the self-trained GeneMark map. /note=SD (Final) Score: -3.550. The final score is the highest for the proposed start site of 21. Additionally, the z-score for this start site is the largest of all candidates - surpassing the threshold of 2 for significance as evidence. /note=Gap/overlap: Since this is the first of all of the draft genes in the GravityBall genome, this start site has no overlap with any previous gene. There are no other start sites that generate a longer open reading frame than this one - according to PECAAN, it is the LORF. The length of the gene is definitely acceptable at 450 base pairs. /note=Phamerator: Pham 86038 - this pham has 37 members, of which only 3 were drafts including GravityBall. 
The gene is conserved in several other phages within cluster AO2, including BarretLemon and StevieBAY, both of which listed functions of "helix-turn-helix DNA-binding domain". /note=Starterator: The auto-annotation start site was listed as “9” in the track listings (again beginning at nucleotide position 21), and this was the most commonly called start site (23/34 non-draft listings), with eight other members of cluster AO2 having called it (Note: the Phamerator and Starterator reports were accessed October 2, 2023). /note=Location call: Based on all of the aforementioned evidence, this is most likely a real gene and its start site is at 21. /note=Function call: Helix-turn-helix. The PhagesDB Blast revealed more than 40 hits with other Arthrobacter genomes with e-values below the threshold of 10^-6. Almost all of these low e-value hits had their functions listed as “Helix-turn-helix DNA-binding domain” (sometimes abbreviated as HTH DNA-binding domain). This includes the two top hits from the genomes of phages Timinator_1 (e-value = 4e-80) and StevieBAY_1 (e-value = 4e-80). /note=NCBI Blast also had as its top hit a helix-turn-helix binding domain associated with the phage Timinator (e-value = 9e-105). However, a number of other hits with low e-values were described as RNase proteins, including the second result, associated with the phage BarretLemon (e-value = 4e-104) (The placement of both of these hits as well as their e-values aligned well with the NCBI listings generated on PECAAN and were selected as evidence). Research listed in the NCBI database details how some RNases possess helix-turn-helix binding domains (specifically C. jejuni RNase R), and so this was considered a likely gene product. /note=HHpred returned a top result of a protein with unknown function. This protein had 98.2% probability, 29.5302% coverage, and an e-value of 0.000011. The second highest ranked result was a helix-turn-helix protein associated with Actinobacteria. 
This family was listed as “functionally unclassified” and had 97.87% probability, 29.5302% coverage, and an E-value of 0.00021. /note=CDD returned no hits at all. Based on these results, it was decided that the most accurate function classification would be “helix-turn-helix protein” since no RNase proteins were listed among the HHpred or CDD hits. /note= /note=Transmembrane domains: No transmembrane domains were predicted by TMHMM /note=Secondary Annotator Name: Infante, Ariana /note=Secondary Annotator QC: /note=Auto-Annotation: I would omit the comparison to CallinAllBarbz just because I don’t think it would be relevant to people outside of MIMG 103AL and BL. /note=SD (final) score: I think you can omit the meaning behind the final score. Additionally, mentioning the z-score of the chosen start site would be beneficial; mention that it is the highest and greater than 2. /note=Phamerator: Include the date at which pham number noted. Also, I think you confused this section with the starterator section. Change the location of everything you wrote after the pham number to the starterator section. To the phamerator section, add how this gene is conserved in other members of AO2 who are in the same pham, give examples, and state that they have a function call that is consistent/found in the approved functions list. /note=Starterator: In addition to what I mentioned above, also write down the basepair coordinate at which the most conserved start site corresponds to in GB (because start 9 also agrees with the start being at 21). 
CDS 467 - 1936 /gene="2" /product="gp2" /function="terminase" /locus tag="GravityBall_2" /note=Original Glimmer call @bp 467 has strength 15.7; Genemark calls start at 467 /note=SSC: 467-1936 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.266, -5.942549405124707, no F: terminase SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage BarretLemon] ],,YP_009303071,100.0,0.0 SIF-HHPRED: Large subunit terminase; large terminase, VIRAL PROTEIN; 2.2A {Deep-sea thermophilic phage D6E},,,5OE8_A,93.456,100.0 SIF-Syn: Gene 2 is a terminase, large subunit protein. Its downstream gene is a portal protein, and its upstream gene is a helix-turn-helix DNA binding domain, which matches phages BarretLemon and Timinator (cluster AO2). /note=Primary Annotator Name: Bhatt, Khushi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 467. The start codon was GTG, which has the highest probability. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating this is a forward gene. Coding potential is found both in GeneMark Self and Host, and the selected start site covers all the coding potential. /note=SD (Final) Score: -5.943. It is the best final score on PECAAN. The Z-score was 2.266, which was also the best score on PECAAN. /note=Gap/overlap: Overlap: -4bp upstream. The overlap is small and matches with cluster AO2 phages BarretLemon and Timinator, which makes it reasonable. Additionally, a -4bp overlap is evidence that this gene is likely in an operon. /note=Phamerator: pham: 1677. Date: 10/02/23. There are 57 pham members, 5 of which are drafts. This pham is conserved in phages BarretLemon and Timinator in addition to 16 other cluster AO2 phages. 
Phages BarretLemon and Timinator in addition to the other 16 cluster AO2 phages assigned a terminase, large subunit function to their genes in pham 1677, which is approved by SEA-PHAGES guidelines. Therefore, it is consistent for gene 2 in phage GravityBall to also have a terminase, large subunit function. /note=Starterator: Starterator was informative. There are 52 non-draft members of this pham. 24/52 non-draft genes call start site 5, which GravityBall possesses for this gene. Start site 5 was also autoannotated for this gene in GravityBall. This start site agrees with the Glimmer and GeneMark prediction, has the highest Z-score, highest final score, contains all the coding potential, has the longest ORF, and starts with GTG. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 467. /note=Function call: Terminase, large subunit. The top 98/107 phagesdb BLAST hits have either a terminase or terminase, large subunit function. The top 27 hits all have e-values of 0.0, and the remaining 73 hits all have significant e-values below 1e-6. The top 23 NCBI BLAST hits also have a terminase, large subunit function with e-values of 0.0, and the top hit has 100% coverage, 100% identity, and 0.0 e-value. HHpred’s top five hits also had a terminase, large subunit function, and the top hit had a 100% probability, 93.5% coverage, and e-value of 9.6e-38. The top three CDD hits had terminase or terminase, large subunit functions, and the top hit had a 84.3% coverage and 8.3e-11 e-value, although it had a low %identity of 19.3%. According to SEA-PHAGES guidelines, a terminase, large subunit function can be assigned to this gene. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liu, Rodia /note=Secondary Annotator QC: I have QC`d this gene and agree with the location and function calls. 
Note: For the Starterator part, both showed “Manual Annotations of this start: 24 of 52.” CDS 1978 - 3444 /gene="3" /product="gp3" /function="portal protein" /locus tag="GravityBall_3" /note=Original Glimmer call @bp 1978 has strength 19.42; Genemark calls start at 1978 /note=SSC: 1978-3444 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: 41 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.36, -5.055463787873258, no F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage StevieBAY]],,QJD53333,100.0,0.0 SIF-HHPRED: Portal protein; G20C, portal protein, bacteriophage, transport protein; 1.9A {Thermus phage P7426},,,5NGD_B,79.7131,100.0 SIF-Syn: Gene 3 is a portal protein of pham 96167. Its downstream gene (pham 1692) is a capsid maturation protease, and its upstream gene (pham 1677) is a terminase, large subunit protein, which matches phages BarretLemon and Timinator of cluster AO2. /note=Primary Annotator Name: Liang, Edwin /note=Auto-annotation: Both Glimmer and GeneMark call the most plausible start at position 1978 with the start codon ATG. /note=Coding Potential: The coding potential on the forward strand is found both in GeneMark Self and Host. All reasonable coding potential in the ORF is covered by the start site. /note=SD (Final) Score: -5.055, which is best among all other possible start sites considering Z-score, gap, and coding potential. Z-score is at least 2 and the second highest among all other proposed start sites. /note=Gap/overlap: 41bp gap. Acceptable gap length. The gap is conserved among other AO2 cluster phages. /note=Phamerator: 96167. Portal protein called. Conserved, found in AO2 phages. /note=Starterator: Start site 26 was manually annotated in 53 of 260 phages in this pham. Start 26 is 1978 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on this evidence, this is a real gene and its most plausible start site is 1978. /note=Function call: Portal protein. The top three PhagesDB Blast hits list portal protein as the function with e-values of 0.00. NCBI BLAST’s top hit is a portal protein, QJD53333 (100% coverage, 100% identity, and E-value of 0.00). CDD returned one hit for DUF935 (e-value of 1.02e-07), bacterial proteins of unknown function. HHPred had a top hit for 5NGD_B with 100% probability, 79.71% coverage, and an E-value of 3.4e-36. /note=Transmembrane domains: DeepTMHMM predicts no TMD for this protein. There is no evidence to support that this is a transmembrane protein. /note=Secondary Annotator Name: Almeida, Tarissa /note=Secondary Annotator QC: I have QC'd this gene and agree with all the calls made by the primary annotator. In synteny box, add pham numbers for upstream and downstream genes. CDS 3437 - 5149 /gene="4" /product="gp4" /function="capsid maturation protease" /locus tag="GravityBall_4" /note=Original Glimmer call @bp 3437 has strength 20.52; Genemark calls start at 3437 /note=SSC: 3437-5149 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.937, -5.135095111895543, no F: capsid maturation protease SIF-BLAST: ,,[capsid maturation protease [Arthrobacter phage Timinator] ],,ASR78034,100.0,0.0 SIF-HHPRED: Gln_amidase ; Papain fold toxin 1, glutamine deamidase,,,PF15644.9,17.8947,98.1 SIF-Syn: This gene has synteny with other cluster AO2 phages such as BossLady and LeeroyJAY. The preceding gene is a portal protein, with which there is a small overlap, and the following gene has NKF. /note=Primary Annotator Name: Sanoyca, Alicia /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site 3437 (ATG). /note=Coding Potential: The gene does not have great coding potential within the ORF, but it is present throughout. 
The chosen start site covers all the coding potential. /note=SD (Final) Score: This start site does not have the best (least negative) RBS final score, but its score of -5.135 is higher than most other start site calls, so it is reasonable to consider it a credible ribosome binding site. The gene may belong to an operon, which could explain the low final scores. The Z-score is 1.937, which is less than the desired >2 threshold. /note=Gap/overlap: Start 3437 yields an 8bp overlap with the previous gene, which is reasonable if the gene is in an operon. The overlap is conserved in other phages of this subcluster such as Timinator and LeeroyJAY. It yields the longest ORF of 1713bp. /note=Phamerator: As of 10/2/23, this gene belongs to Pham 1692, which is conserved in many other members of the AO2 cluster. I used phages LeeroyJ and StevieBAY for comparison. The consistent function call for this gene is ‘capsid maturation protease’, which is included in the approved function list. /note=Starterator: The Starterator site for this gene (start 4 at Start: 3437) is conserved in 49% of the non-draft genes in the Pham. It is found in 25 of 51 manual annotations (MAs). /note=Location call: This is likely a real gene and the likely start site is 3437. The gene may be in an operon due to its overlap. /note=Function call: Capsid maturation protease. NCBI BLASTp has hits that correspond to e-values of 0. CDD calls a Papain fold toxin 1, glutamine deamidase domain with low coverage (17.3%) but a low e-value (3.7e-8), which tracks with the protein having a capsid-related function. HHpred also calls this domain with 98.1% probability, and it mentions 3 other domains with high probabilities (>98%) and significant e-values, but none with high % coverage. The SEA-PHAGES approved functions list mentions that a significant hit to D29 and L5 is sufficient evidence, and after running HHpred with Uniprot externally (not in PECAAN) we confirmed that there was a hit to the CMP on D29. 
/note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: For gap/overlap, I'd suggest mentioning synteny with other phages instead of naming all other start sites; it might be more concise. For location call, I would state "this is a real gene at X start" and use Starterator/Glimmer/GeneMark as evidence. The other evidence you mention is in other sections of your notes. Check relevant evidence boxes for NCBI Blast, HHPRED, and CDD. Check Starterator box and GM coding capacity. CDS 5153 - 5470 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="GravityBall_5" /note=Original Glimmer call @bp 5150 has strength 18.91; Genemark calls start at 5150 /note=SSC: 5153-5470 CP: no SCS: both-cs ST: SS BLAST-Start: [immunity protein [Arthrobacter phage BarretLemon] ],,NCBI, q1:s2 100.0% 1.34795E-68 GAP: 3 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.95, -3.875201829906135, yes F: hypothetical protein SIF-BLAST: ,,[immunity protein [Arthrobacter phage BarretLemon] ],,YP_009303074,99.0566,1.34795E-68 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 5150. /note=Coding Potential: Coding potential in this gene is on both the forward and reverse strand, but the reverse strand can be disregarded since it is not large enough to be considered significant. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.109. Although this start site does not have the highest final score, it gives the longest gene. It also has the best z-score, 2.95. /note=Gap/overlap: 0. This may be part of an operon and this gap is conserved in other phages such as BarretLemon and BossLady. /note=Phamerator: pham: 116409. Date: 10/4/23. The gene is conserved in phages BarretLemon, BossLady, and Grekaycon, all in the same cluster as GravityBall. 
/note=Starterator: Start site 11 was manually annotated 15 of 60 non-draft genes in this pham. Start 11 is 5150 in GravityBall. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 5150. /note=Function call: NKF. The top three PhagesDB hits are also NKF (e-value 1e-57), and one NCBI Blast suggests an immunity protein while the other 2 are hypothetical proteins. HHPred had a hit for immunity protein with 99.8% probability and 68.8769% coverage, and E-value 8.5e-20. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: De Guzman, Arieanne /note=Secondary Annotator QC: Can comment on good z-score in SD section. Otherwise, I agree with the above location and function calls. CDS complement (5593 - 5829) /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="GravityBall_6" /note=Original Glimmer call @bp 5829 has strength 19.18; Genemark calls start at 5829 /note=SSC: 5829-5593 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp06 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 7.71463E-48 GAP: 128 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.1, -4.669046746593814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp06 [Arthrobacter phage BarretLemon] ],,YP_009303075,100.0,7.71463E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 5829 with a start codon of ATG which has a high potential for being the start site. /note=Coding Potential: The Coding Potential on the Reverse Strand is found in both GeneMark Host and Self. All reasonable coding ORF is covered by this start site. /note=SD (Final) Score: The FS score for this start site is -4.669 with a Z-score of 2.1. 
While this is not the highest FS score, it is the start site with the Z-score above 2 that maintains an ORF that covers all the coding potential. /note=Gap/overlap: 128 bp gap upstream with a gene length of 237 bp which is sufficient to switch gene orientation. Gene length and gap are conserved in comparison to BarretLemon. /note=Phamerator: Pham: 106669. Date: 10/4/2023. This pham has 15 members. It is conserved; found in BarretLemon and BossLady. /note=Starterator: Start Site 10 was manually annotated 14/14 non-draft genes in this pham. Start Site 10 is at 5829 in GravityBall. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 5829. /note=Function call:NKF. Top results for PhagesDB (e-value: 8e-37) and NCBI (e-value: 8e-37) were for an unknown protein. Any options with a function did not have a low enough e-value to be considered. HHpred had 1 hit (Probability: >80%, Coverage: >80%); however, the e-value for the hit was 4.4 on PECAAN and 32 on HHpred which is too high to make a call. CDD did not provide any relevant hits. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Trinh, Uyen /note=Secondary Annotator QC: Gap/overlap: Not sure you need “As this gap is greater than 50 bp, it is sufficient to allow for a switch in gene orientation from forward to reverse direction.” or potentially it can be shortened! Please let me know what TA / Prof says about it. /note=Phamerator: Note down how many phages the gene is conserved in total. /note=Function call: Change to → “NKF. The top result for PhagesDB (e-value: 8e-37) and NCBI (e-value: 8e-37) were for an unknown protein. Any options with a function did not have a low enough e-value to be considered. 
Additionally, while HHpred had 1 hit with high probability (>80%) and high coverage (>60%), the e-value for the hit was 4.4 on PECAAN and 32 on HHpred which is too high to make a call. CDD did not provide any relevant hits.” Could be shortened even further too! /note=Looks good! CDS 5958 - 7103 /gene="7" /product="gp7" /function="scaffolding protein" /locus tag="GravityBall_7" /note=Original Glimmer call @bp 5958 has strength 14.18; Genemark calls start at 5958 /note=SSC: 5958-7103 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage JKerns]],,NCBI, q1:s1 100.0% 3.60932E-166 GAP: 128 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.677, -5.864785151983202, no F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage JKerns]],,QIQ62819,76.2963,3.60932E-166 SIF-HHPRED: Phage_GPO ; Phage capsid scaffolding protein (GPO) serine peptidase,,,PF05929.14,36.2205,99.8 SIF-Syn: Scaffolding protein, upstream gene is NKF, downstream is minor capsid protein, just like in phage StevieBAY. /note=Primary Annotator Name: Infante, Ariana /note=Auto-annotation: Glimmer and GeneMark both call the start of the gene at 5958, with a start codon of ATG. This start codon has a high probability of occurrence. /note=Coding Potential: The ORF has reasonable coding potential in the forward direction, with the selected start site of 5958 covering all of the coding potential seen on the self-trained and host-trained GeneMark. /note=SD (Final) Score: -5.865. This is not the best final score on PECAAN, since other start sites have less negative scores, but it is reasonable with a z-score of 1.677. This is also not the best z-score, since two suggested start sites have z-scores higher than 2, but they do not cover all of the coding potential. /note=Gap/overlap: The gap with the upstream gene is reasonable at 128 bp. The other start site candidates do create a longer ORF, but they fail to cover all of the coding potential. 
The length of the gene is 1146 bp, making it larger than the accepted starting length of 120 bp. /note=Phamerator: The Pham number as of October 2nd, 2023 is 114862. The gene is conserved in Abba, BarretLemon, BossLady, and Brent, who are all in the same cluster of AO2 as GravityBall. The function call for the gene is a scaffolding protein, which is consistent in Phamerator and found in the approved function list. /note=Starterator: There are 110 non-draft members in this Pham. 33 of the 110 non-draft members call start site 15, which corresponds to a start site of 5958 for GravityBall. /note=Location call: Based on the above evidence, this is a real gene that has a likely start site at 5958 bp. /note=Function call: Scaffolding protein. Multiple PhagesDB BLAST hits (BarretLemon, LeeroyJ, StevieBAY, Timinator, Pippa, and Jordan) list the function of scaffolding protein with e-values less than 10^-154 and of 0. NCBI BLAST has three hits that correspond to 100% query coverage, over 71.60% identity, and e-values between e^-166 to e^-163. HHPRED had a hit for phage capsid scaffolding protein with a probability of 99.72% and an e-value of 1.5e^16. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sebastian Castellanos /note=Secondary Annotator QC: I agree with the location and function calls. I would change the name given for GravityBall`s cluster to A02 in the phamerator report and give names for the genes which were listed on PhagesDB as hits. 
CDS 7142 - 7528 /gene="8" /product="gp8" /function="minor capsid protein" /locus tag="GravityBall_8" /note=Original Glimmer call @bp 7142 has strength 17.42; Genemark calls start at 7142 /note=SSC: 7142-7528 CP: yes SCS: both ST: SS BLAST-Start: [head scaffolding protein [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 4.17669E-85 GAP: 38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.456, -2.0949393225970705, yes F: minor capsid protein SIF-BLAST: ,,[head scaffolding protein [Arthrobacter phage BarretLemon] ],,YP_009303077,100.0,4.17669E-85 SIF-HHPRED: Capsid fiber protein; bacteriophage, phi29, prohead, VIRUS; HET: SO4; 1.8A {Bacillus phage phi29},,,6QYY_C,85.9375,99.6 SIF-Syn: Minor capsid protein, the upstream gene is a scaffolding protein, and the downstream gene is a major capsid protein, just like the phage BarretLemon. /note=Primary Annotator Name: Liu, Rodia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 7142. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -2.095, and the Z-score is the highest at 3.456. /note=Gap/overlap: The gap with the upstream gene is 38 bp and the gap with the downstream gene is 22 bp. Both are less than 50 bp, so neither gap is large enough to hold a new gene. /note=Phamerator: pham: 116188. Date 10/03/2023. It is conserved; found in BarretLemon (AO) and LeeroyJ (AO). /note=Starterator: (Start: 7 @7142 has 12 MAs) The Starterator site for this gene is conserved among the only other FP and AO2 cluster phage in the Pham, but it is not conserved across the other Pham members. Start site 7 in Starterator was manually annotated in 12/162 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7142. /note=Function call: Minor capsid protein. The top three PhagesDB BLAST hits had the function of minor capsid protein (E-value 98%, 80%+ identity, and E-value >10E-3) and probabilities less than 80%. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and therefore is not a membrane protein. /note=Secondary Annotator Name: Chan, Rose /note=Secondary Annotator QC: I have QC`d this gene and agree with the location and function call. CDS 9513 - 9980 /gene="12" /product="gp12" /function="hypothetical protein" /locus tag="GravityBall_12" /note=Original Glimmer call @bp 9513 has strength 15.39; Genemark calls start at 9513 /note=SSC: 9513-9980 CP: yes SCS: both ST: SS BLAST-Start: [tail completion or Neck1 protein [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 3.20551E-109 GAP: 4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.137, -4.673768653567473, yes F: hypothetical protein SIF-BLAST: ,,[tail completion or Neck1 protein [Arthrobacter phage BarretLemon] ],,YP_009303081,100.0,3.20551E-109 SIF-HHPRED: Phage_tail_S ; Phage virion morphogenesis family,,,PF05069.16,89.6774,99.7 SIF-Syn: /note=Primary Annotator Name: Trinh, Uyen /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 9513. Start codon is ATG, which is a common start codon, providing more evidence for the location call. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.674. It is the best final score on PECAAN. Z-score of 2.137 (> 2) also provides evidence for the location call. /note=Gap/overlap: Gap of 4 bp with the upstream gene. Small, which is indicative of an appropriate start site. /note=Phamerator: 116441. Date 10/04/2023. This gene is conserved in 57 other phages. 
/note=Starterator: There are 52 non-draft members of this Pham. 32 non-draft members call start site 8, which corresponds to a start site of 9513 bp for GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene with a start site at 9513 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top three phagesdb BLAST hits have unknown functions with e-values greater than 10^-6. While there were HHPred, NCBI Blast, and CDD hits, the required domains were not present for head to tail adaptor protein function since there was no "SPP1 15 (5A21 chain C or D in the macromolecular complex) OR an HHPRED alignment to one of the following crystal structures: HK97 gp6 or Bacillus protein yqbG" as detailed on SEA-PHAGES functional assignments. There was also not enough evidence to call minor tail protein as the gene is not downstream of a tape measure protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chong, Truman /note=Secondary Annotator QC: I agree with the start call. However, I don't agree with the function call. PhagesDB Blast, NCBI Blast, and HHpred have good evidence (low e-values, high % coverage, high % identity) supporting a gene function. It could be helpful if you specify what required domains are not present (i.e., e-values, coverage, etc.). Also, the synteny box can be left blank when the gene is NKF. 
CDS 9980 - 10591 /gene="13" /product="gp13" /function="hypothetical protein" /locus tag="GravityBall_13" /note=Original Glimmer call @bp 9980 has strength 26.91; Genemark calls start at 9980 /note=SSC: 9980-10591 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TIMINATOR_13 [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 1.64629E-146 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.68, -3.5707972674952195, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TIMINATOR_13 [Arthrobacter phage Timinator] ],,ASR78043,100.0,1.64629E-146 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Castellanos, Sebastian /note=Auto-annotation: Both Glimmer and GeneMark auto-annotated the start site of this gene to be at nucleotide position 9980. The start codon recorded at this position is ATG. /note=Coding Potential: The start and stop sites provided by the Glimmer and GeneMark annotation (nucleotide position 9980) encompass all of the putative coding potential depicted in the host-trained GeneMark map as well as the self-trained GeneMark map. /note=SD (Final) Score:-3.571. The final score is the highest for the proposed start site of 9980. This same start site also has the highest z-score at 2.68, well above the threshold for significance as evidence (2). /note=Gap/overlap: This proposed start site has an overlap of one nucleotide with the preceding gene. This was taken as further evidence in favor of this start site since this overlap is usually associated with membership in an operon. /note=Phamerator: Pham 1671 - The graphical output associated with this Pham noted that this gene is unique from all other iterations (13+3). This gene is preserved in several other members of the subcluster AO2, including BarretLemon and Grekaycon, both of which listed "no known function" for their analogs. /note=Starterator: The “most annotated” start site found was “4”, called in 33 of the 52 non-draft genomes. 
However, this start site was not present in GravityBall - that was “3”, in alignment with the auto-annotated start position at 9980. This start site was called 90.5% of the time when present. (Note: the Phamerator and Starterator reports were both accessed on October 4, 2023) /note=Location call: Based on all of the aforementioned evidence, this is a real gene and its start site is most likely at 9980. /note=Function call: /note=The PhagesDB Blast revealed more than 60 hits with other Arthrobacter genomes with e-values below the threshold of 10^-6. Almost all of these low e-value hits had their functions listed as “function unknown”. This includes the two top hits from the genomes of phages Timinator_13 (e-value = e-111) and StevieBAY_13 (e-value = e-111). However, there were four hits classified as “tail terminators”: Wyborn_14 (e-value = 4e-63), Schomber_42 (e-value = 3e-07), Kabocha_44 (e-value = 3e-07), Hanem_43 (e-value = 3e-07). /note=NCBI Blast similarly returned three top hits of “hypothetical protein”: SEA_TIMINATOR_13 (e-value = 2e-146), BJD79_gp13 (e-value = 2e-145), and SEA_PIPPA (e-value = 4e-117) (The placement of these hits as well as their e-values aligned well with the NCBI listings generated on PECAAN and were selected as evidence). /note=CDD returned no hits at all. /note=HHPred returned two low e-value hits for proteins with a “minor tail protein” function from the preferred databases of PDB and Pfam (e-values = 0.000072 and 0.0005); however, consulting the SEA-PHAGES forum as to whether they should be considered evidence in favor of this classification disqualified them. /note=Based on these results, it was decided that the function classification of this gene should be “no known function”. /note= /note=Transmembrane domains: No transmembrane domains were predicted by PECAAN /note=Secondary Annotator Name: Infante, Ariana /note=Secondary Annotator QC: /note=Coding potential: Mention what start site encompasses all of the coding potential. 
/note=SD (Final) score: Remove the meaning behind final score. Add z-score and how it is the highest value greater than 2. /note=Gap/overlap: The start site has an overlap rather than a gap with the upstream gene. Mention length of gene. /note=Phamerator: Include date at which pham number noted. Also, I think you confused this section with the starterator section. Change the location of everything you wrote after the pham number to the starterator section. To the phamerator section, add how this gene is conserved in other members of AO2 who are in the same pham, give examples, and state that they have no function called. /note=Starterator: Just move the portion from phamerator that I mentioned above here. /note=Function call: Select on PECAAN which NCBI hit you used as evidence (currently unselected). CDS 10649 - 10888 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="GravityBall_14" /note=Original Glimmer call @bp 10649 has strength 10.07; Genemark calls start at 10679 /note=SSC: 10649-10888 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein BJD79_gp14 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 7.01112E-49 GAP: 57 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.448, -5.973704629307399, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp14 [Arthrobacter phage BarretLemon] ],,YP_009303083,100.0,7.01112E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhatt, Khushi /note=Auto-annotation: Glimmer and GeneMark. Glimmer called the start at 10649, whereas GeneMark called the start at 10679. The start site at 10649 was autoannotated, and both the starts at 10649 and 10679 correspond to an ATG start codon. /note=Coding Potential: Coding potential for this ORF was on the forward strand only, indicating this is a forward gene. Coding potential is found in both GeneMark Self and Host, and the selected start site at 10649 covers all the coding potential. /note=SD (Final) Score: -5.974. 
It is the best final score on PECAAN. The Z-score was 1.448, which is below the ideal Z-score of 2 but is nonetheless the best score on PECAAN. /note=Gap/overlap: Gap: 57 bp upstream. The gap is rather large but minimized by the start site at 10649, and this gap is conserved in the cluster AO2 phages BarretLemon and Timinator, which makes it reasonable. /note=Phamerator: pham: 114286. Date: 10/02/23. There are 36 pham members, 2 of which are drafts. This pham is conserved in the cluster AO2 phages BarretLemon and Timinator. Phages BarretLemon and Timinator did not assign a function to their genes in pham 114286. Therefore, it would be consistent that this gene does not have a function in phage GravityBall. /note=Starterator: Starterator was informative. There are 34 non-draft members of this pham. 33/34 non-draft genes call start site 7, which GravityBall possesses for this gene. Start site 7 was also autoannotated for this gene in GravityBall. This start site agrees with the Glimmer prediction, has the highest Z score, highest final score, contains all the coding potential, has the longest ORF, and starts with ATG. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 10649. /note=Function call: NKF. The top two phagesdb BLAST hits are listed as function unknown, and they have significant e-values of 6e-39. The top two NCBI BLAST hits also have a function of hypothetical protein, and the top hit has 100% identity, 100% coverage, and an e-value of 7.01e-49. There are no significant HHPred hits, as all the hits have e-values that do not meet the 1e-6 significance threshold. There are no CDD hits either. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liu, Rodia /note=Secondary Annotator QC: I have QC'd this gene and agree with the location and function calls. 
CDS 10891 - 12342 /gene="15" /product="gp15" /function="tail sheath protein" /locus tag="GravityBall_15" /note=Original Glimmer call @bp 10891 has strength 17.67; Genemark calls start at 10891 /note=SSC: 10891-12342 CP: yes SCS: both ST: SS BLAST-Start: [tail sheath protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.377, -2.17469248771465, yes F: tail sheath protein SIF-BLAST: ,,[tail sheath protein [Arthrobacter phage StevieBAY]],,QJD53345,100.0,0.0 SIF-HHPRED: Phage tail sheath protein; STRUCTURAL PROTEIN, Contractile injection system; 3.6A {Streptomyces coelicolor A3(2)},,,8BKY_E,99.1718,100.0 SIF-Syn: Gene 15 is a tail sheath protein with pham 1542. Its downstream gene(pham 1545) is a tail tube protein, and its upstream gene(pham 114286) has no known function but has synteny with phages BarretLemon and Timinator of cluster AO2. /note=Primary Annotator Name: Liang, Edwin /note=Auto-annotation: Both Glimmer and GeneMark call the most plausible start at position 10891 with the start codon ATG. /note=Coding Potential: The coding potential is found on the forward strand of both GeneMark Host and Self. All reasonable coding potential in the ORF is covered by the start site 10891. /note=SD (Final) Score: -2.175. The final score is lowest among all other start sites, with a favorable Z-score of 3.377. /note=Gap/overlap: 2bp gap. Acceptable gap length. Gap length is conserved in other AO2 cluster phages. /note=Phamerator: 1542. Tail sheath protein called. Conserved, found in AO2 phages. /note=Starterator: Start site 2 was manually annotated in 58 of 60 phages in this pham. Start site 2 is 10891 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on this evidence, this is a real gene and the most plausible start site is at 10891. /note=Function call: Tail sheath protein. 
The top hits of PhagesDB Blast with e-values of 0 call this gene a tail sheath protein. The top hit of NCBI Blast is a tail sheath protein (QJD53345) with 100% coverage, 100% identity, and e-value of 0. From CDD, the top hit was a tail sheath protein (pfam04984) with e-value 3.06423e-8. HHPred’s top hit is for a phage tail sheath protein (8BKY_E) with 100% probability, 99% coverage, and e-value of 0. /note=Transmembrane domains: DeepTMHMM predicts no TMD for this protein. There is no evidence to support that this is a transmembrane protein. /note=Secondary Annotator Name: Almeida, Tarissa /note=Secondary Annotator QC: I have QC'd this gene and agree with all calls made by primary annotator. I would add pham numbers to the synteny box for upstream and downstream genes. CDS 12356 - 12778 /gene="16" /product="gp16" /function="tail tube protein" /locus tag="GravityBall_16" /note=Original Glimmer call @bp 12356 has strength 11.06; Genemark calls start at 12356 /note=SSC: 12356-12778 CP: yes SCS: both ST: SS BLAST-Start: [tail tube protein [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 7.9479E-96 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.864, -3.2017655760970833, yes F: tail tube protein SIF-BLAST: ,,[tail tube protein [Arthrobacter phage Timinator] ],,ASR78046,100.0,7.9479E-96 SIF-HHPRED: Phage_T4_gp19 ; T4-like virus tail tube protein gp19,,,PF06841.15,92.8571,99.9 SIF-Syn: Gene STOP@12778 is a tail tube protein. Its upstream is called as a tail sheath protein whereas the downstream gene has no known function. This synteny is observed in other AO2 phages such as LeeroyJ and StevieBAY. /note=Primary Annotator Name: Sanoyca, Alicia /note=Auto-annotation: Both GeneMark and Glimmer predict a start site of 12356 (ATG). /note=Coding Potential: There is some coding potential within the predicted start and stop sites of the gene in the forward coding direction of reading frame 2 in GeneMark. 
/note=SD (Final) Score: The final score is the best/least negative (-3.202) and the Z-score is the highest (2.864), suggesting this is likely the start site. /note=Gap/overlap: There is a 13 bp gap with the previous gene, which is the best among all other options. This start site yields the longest gene length of 423 bp. /note=Phamerator: As of 10/04/23, this gene belongs to Pham 1545. This pham is conserved in other members of the AO2 cluster, i.e. LeeroyJ and StevieBAY. The function call is consistently a tail tube protein. /note=Starterator: 27/60 non-draft genes in the pham called this start site. It is found in 28/69 (40.6%) genes in the pham and is called 100% of the time when present. Start1 @12356 is the most called start site. /note=Location call: This is a real gene with a likely start site at 12356. Starterator/Glimmer/GeneMark all agree with this start site. /note=Function call: Phagesdb BLAST calls this gene as a tail tube protein for many other pham members of the same AO2 cluster, such as StevieBAY and LeeroyJ. NCBI BLAST also displays many hits to tail tube proteins with high % identity and very low e-values. HHPRED shows >99% probability matches to viral tail tube proteins with high coverage (>95%) and very low e-values. CDD does not yield any results. The final function call is tail tube protein. /note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: For coding potential, mention which forward reading frame contains the ORF. For location call, I would state "this is a real gene at X start" and use Starterator/Glimmer/GeneMark as evidence. The other evidence may be redundant. For function call, I would name 1-2 specific hits from BLASTp and HHPRED with the coverage and probability percentages. Be sure to fill out the synteny box and select the starterator dropdown. 
For evidence, select 2-3 hits maximum from each database. CDS 12965 - 13522 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="GravityBall_17" /note=Original Glimmer call @bp 12965 has strength 22.13; Genemark calls start at 12965 /note=SSC: 12965-13522 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SICARIUS2_18 [Arthrobacter phage Sicarius2]],,NCBI, q2:s4 99.4595% 4.96418E-74 GAP: 186 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.355, -2.5074698667202133, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SICARIUS2_18 [Arthrobacter phage Sicarius2]],,QWY81923,73.7113,4.96418E-74 SIF-HHPRED: Phage_TAC_5 ; Phage XkdN-like tail assembly chaperone protein, TAC,,,PF08890.14,75.1351,99.7 SIF-Syn: /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site 12965. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.507. It is the best final score on PECAAN. Has the best z-score of 3.355. /note=Gap/overlap: Gap: 186 bp. This gap is somewhat large but reasonable since it is conserved in other phages (BarretLemon, StevieBAY), and there is no coding potential in the gap that might also be a gene. /note=Phamerator: pham: 1683. Date: 10/4/23. It is conserved, found in BarretLemon, BossLady, and StevieBAY. /note=Starterator: Start site 12 was manually annotated in 17 of 52 non-draft genes in this pham. Start site 12 is 12965 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 12965. /note=Function call: NKF. 
The top four PhagesDB hits are also NKF (e-value e-103), and one NCBI Blast hit (BarretLemon) calls it a tail assembly chaperone while the other 3 are hypothetical proteins. HHPred had a hit for tail assembly protein with 98.19% probability, 78.9189% coverage, and E-value 0.000016. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: De Guzman, Arieanne /note=Secondary Annotator QC: Can comment on Z-score in SD section. Otherwise, I agree with the above location and function calls. CDS 13552 - 13728 /gene="18" /product="gp18" /function="hypothetical protein" /locus tag="GravityBall_18" /note=Original Glimmer call @bp 13552 has strength 12.32; Genemark calls start at 13552 /note=SSC: 13552-13728 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp18 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 7.4918E-32 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.855, -5.509176084620483, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp18 [Arthrobacter phage BarretLemon] ],,YP_009303087,100.0,7.4918E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Both call the site at 13552 with a start codon of ATG which has a high potential of being the start site. /note=Coding Potential: The Coding Potential on the Forward Strand is found in both GeneMark Host and Self. All reasonable coding ORF is covered by this start site. /note=SD (Final) Score: The FS score for this site is -5.509 with a Z-score of 1.855 that uses the more common ATG start site and covers all coding potential. /note=Gap/overlap: 29 bp gap upstream with a gene length of 177 bp. Gene Length and gap are conserved in comparison to BarretLemon. /note=Phamerator: Pham: 107095. Date: 10/8/2023. This pham has 7 members. It is conserved and found in both BarretLemon and LeeroyJ. 
/note=Starterator: Start Site 2 was manually annotated for 4/6 non-draft genes in this pham. Start Site 2 is at 13552 in GravityBall. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 13552. /note=Function call: NKF because the top results for PhagesDB (e-value: 4e-25) and NCBI (e-value: 7e-32) indicated either function unknown or hypothetical protein. Additionally, both HHpred and CDD did not provide any relevant hits. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Trinh, Uyen /note=Secondary Annotator QC: SD Final Score: Could be shortened, potentially only writing about the final location call. /note=Phamerator: Note down how many phages the gene is conserved in total. /note=Function call: Change to → “NKF. The top results for PhagesDB (e-value: 4e-25) and NCBI (e-value: 7e-32) indicated either function unknown or hypothetical protein. Additionally, both HHpred and CDD did not provide any relevant hits.” /note=Good job! CDS 13748 - 16381 /gene="19" /product="gp19" /function="tape measure protein" /locus tag="GravityBall_19" /note=Original Glimmer call @bp 13748 has strength 17.3; Genemark calls start at 13796 /note=SSC: 13748-16381 CP: yes SCS: both-gl ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 0.0 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.898, -3.7234890726758456, yes F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage Timinator] ],,ASR78049,100.0,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,11.6306,99.9 SIF-Syn: There is synteny. 
When this gene is a tape measure protein, downstream there is a LysM-like peptidoglycan binding protein and upstream there is NKF which is seen in LeeroyJ and Timinator. /note=Primary Annotator Name: Infante, Ariana /note=Auto-annotation: Glimmer calls the start site at 13748 while GeneMark calls the start site at 13796. Both start sites have a start codon of ATG. /note=Coding Potential: There is reasonable coding potential within the ORF. The start site of 13748 covers all of the coding potential seen on the self-trained and host-trained GeneMark. /note=SD (Final) Score: The start site of 13748 has the best, least negative final score (-3.723) and a z-score higher than 2 (2.898). /note=Gap/overlap: The gap with the upstream gene is 19 bp with the chosen start site of 13748. This gap is relatively small and conserved in other phages such as Timinator and StevieBay while other start site candidates create a longer gap between the upstream gene. The length of the gene (2634 bp) is acceptable. /note=Phamerator: The pham number as of October 4th, 2023 is 116570. The gene is conserved in BossLady, StevieBAY, and other phages that belong to the same subcluster as GravityBall (AO2). The function call for this gene is tape measure protein – it is a consistent call within Phamerator and is found in the approved function list. /note=Starterator: There are 31 non-draft members in this pham. 29 of the 31 non-draft genes call start site 1, which corresponds to a start site of 13748 in GravityBall. /note=Location call: Based on the above evidence, this is a real gene that has a likely start site at 13748. /note=Function call: Tape measure protein. The top three PhagesDB BLAST hits list the function of tape measure protein with e-values of 0, with the same cluster (AO2), and pham number (116570). The top two NCBI BLAST hits list identities greater than 99%, coverages of 100%, and e-values of 0. 
HHPRED had two hits with probabilities greater than 99.8% and e-values from 2.3e-13 to 6.2e-12, but had low coverages between 11.6% and 12.4%. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts three TMDs, ranging from 10 to 15 aa long. Based on evidence of the function of the gene being a tape measure protein, this gene can be assumed to have real TMDs and is therefore a “membrane protein.” /note=Secondary Annotator Name: Castellanos, Sebastian /note=Secondary Annotator QC: I agree with the location and function calls. I would specifically name which start site you are referring to throughout the different sections of the location call notes just so that it is entirely clear which one you are describing (the Glimmer or GeneMark start site) throughout. This may have changed in the past couple of weeks but the gap of the start site you chose is 19, not 17. Also 'forgoing' doesn't really work in the answer to the Gap/overlap section. In the phamerator section, you name Abba and Beans as phages in which the gene is conserved but neither of these are in the subcluster AO2 so I would recommend explaining the distinction in your notes or else remove them. In the function call section, name the phages to which the top two or so hits from each database belong. 
CDS 16381 - 17046 /gene="20" /product="gp20" /function="LysM-like peptidoglycan binding protein" /locus tag="GravityBall_20" /note=Original Glimmer call @bp 16381 has strength 18.23; Genemark calls start at 16423 /note=SSC: 16381-17046 CP: yes SCS: both-gl ST: SS BLAST-Start: [LysM-like peptidoglycan-binding protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 5.32915E-160 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.763, -3.3418724531147843, no F: LysM-like peptidoglycan binding protein SIF-BLAST: ,,[LysM-like peptidoglycan-binding protein [Arthrobacter phage StevieBAY]],,QJD53350,100.0,5.32915E-160 SIF-HHPRED: LysM domain-containing protein; extracellular contractile injection system, STRUCTURAL PROTEIN; 2.7A {Algoriphagus machipongonensis},,,7AEB_P,89.5928,99.8 SIF-Syn: LysM-like peptidoglycan binding protein, the upstream gene is a tape measure protein, and the downstream gene is a minor tail protein, just like the phage BarretLemon. /note=Primary Annotator Name: Liu, Rodia /note=Auto-annotation: Glimmer calls the start at 16381 with the start codon ATG, and GeneMark calls the start at 16423 with the start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential of the selected start site is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -3.342 and the Z-score is the highest at 2.763. /note=Gap/overlap: The Glimmer start (16381) has an overlap of 1 bp, indicating that this gene is likely part of an operon, and the GeneMark start (16423) has a 49 bp gap, which is less than 50 bp, and there is no coding potential for a new gene in the gap between the gene and its upstream counterpart according to GeneMark and Glimmer. /note=Phamerator: pham: 104993. Date: 10/4/23. It is conserved; found in BarretLemon (AO2) and BossLady (AO2). The function is LysM-like peptidoglycan-binding protein in BarretLemon and BossLady. 
/note=Starterator: (Start: 9 @16381 has 45 MA`s) Start site 9 in Starterator was manually annotated in 45/60 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer(start at 16381). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 16381. /note=Function call: LysM-like peptidoglycan binding protein. The top three PhagesDB BLAST hits had the function of LysM-like peptidoglycan binding protein. (E-value 96% with tail function calls. No hits for CDD. Gene exhibits synteny with BarrettLemon and is a large gene (>2,000 bp) downstream of the tape measure protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and therefore is not a membrane protein. /note=Secondary Annotator Name: Chan, Rose /note=Secondary Annotator QC: I have QC`d this gene and agree with the location and function call. CDS 20965 - 22824 /gene="24" /product="gp24" /function="minor tail protein" /locus tag="GravityBall_24" /note=Original Glimmer call @bp 20965 has strength 23.31; Genemark calls start at 20965 /note=SSC: 20965-22824 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.136, -5.2479250502251915, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage StevieBAY]],,QJD53354,99.8384,0.0 SIF-HHPRED: L-SHAPED TAIL FIBER PROTEIN; VIRAL PROTEIN, BACTERIAL VIRUSES, CAUDOVIRALES, SIPHOVIRIDAE, INFECTION; HET: FLC; 2.52A {ENTEROBACTERIA PHAGE T5},,,4UW8_F,17.609,98.7 SIF-Syn: Minor tail protein, upstream gene is also a minor tail protein just like in phage BossLady, downstream gene is not called but has the same pham number 85656 as phage BossLady. /note=Primary Annotator Name: Trinh, Uyen /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 20965. Start codon is ATG, which is a common start codon, providing more evidence for the location call. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -5.248. It is the second best final score on PECAAN. Z-score of 2.136 (> 2) also provides evidence for the location call. /note=Gap/overlap: Gap of 55 bp with the upstream gene. Small, which is indicative of an appropriate start site. /note=Phamerator: 102580. Date 10/04/2023. This gene is conserved in 56 other phages. /note=Starterator: There are 51 non-draft members of this Pham. 46 non-draft members call start site 3, which correlates to a start site of 20965 bp for GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 20965 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Minor tail protein. The top three phagesdb BLAST hits have the function of minor tail protein (E-value = 0), and the top 4 NCBI BLAST hits also have the function of minor tail protein (99% coverage, 99%+ identity, and E-value = 0). HHpred had a hit for L-shaped tail fiber protein with 98% probability, 69% coverage, and E-value of 1.3e-7. CDD had no relevant hits. Many members of the pham call this a minor tail protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chong, Truman /note=Secondary Annotator QC: Everything looks good. I agree with the start and function calls. 
CDS 22824 - 23111 /gene="25" /product="gp25" /function="membrane protein" /locus tag="GravityBall_25" /note=Original Glimmer call @bp 22890 has strength 13.56; Genemark calls start at 22824 /note=SSC: 22824-23111 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein BJD79_gp25 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 2.8852E-61 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.277, -4.394706008439538, no F: membrane protein SIF-BLAST: ,,[hypothetical protein BJD79_gp25 [Arthrobacter phage BarretLemon] ],,YP_009303094,100.0,2.8852E-61 SIF-HHPRED: SIF-Syn: /note=Gene (stop@23111) /note=PECAAN Notes /note=Primary Annotator Name: Castellanos, Sebastian /note=Auto-annotation: Glimmer and GeneMark flagged different start sites for this gene (22890 and 22824) with start codons ATG and GTG, respectively. /note=Coding Potential: The putative coding potential displayed by the host-trained GeneMark showed a minor spike in the stretch between the two proposed start sites (below the 50% mark). The self-trained GeneMark, however, more clearly showed that the coding potential began well behind the Glimmer start site of 22890. /note=SD (Final) Score: -4.395 (start @22824) & -6.394 (start @22890) - The final score is second lowest for the Glimmer start site and second highest (least negative) for the GeneMark start site. The GeneMark start site also had the second highest z-score at 2.277, surpassing the threshold of 2 for significance as evidence, whilst the start site with the highest z-score was not autoannotated. This was taken as further evidence in favor of GeneMark having called the correct start site. /note=Gap/overlap: The GeneMark start site (start @22824) had an overlap of 1 whereas the Glimmer start site had a gap of 65 nucleotides with the previous gene. Such a large gap is indicative of the Glimmer start site being incorrect, whereas the overlap associated with the GeneMark start site suggests membership in an operon. 
The length of the gene with this start site is definitely acceptable at 288 base pairs. /note=Phamerator: Pham 85656 - This gene is preserved in several other members of the subcluster AO2, including BarretLemon and Grekaycon, both of which listed "no known function" for their analogs. /note=Starterator: This pham has 54 members, of which only 5 were drafts including GravityBall. The Glimmer auto-annotation start site was listed as “22” in the track listings; however, the most commonly called start was the GeneMark auto-annotation start (26/49 non-draft listings), with eleven other members of the cluster AO2 having called it. (Phamerator and Starterator reports were both accessed on October 4, 2023) /note=Location call: Based on all of the evidence aforementioned, this is most likely a real gene and its start site is at 22824. /note=Function call: No known function. The PhagesDB Blast revealed more than 50 hits with other Arthrobacter genes with e-values below the threshold of 10^-6. Almost all of these low e-value hits had their functions listed as “function unknown”. This includes the two top hits from the genomes of phages Timinator_25 (e-value = 6e^-35) and StevieBAY_25 (e-value = 6e^-35). /note=NCBI Blast also had as its top two hits two proteins of no known function associated with the phages BarretLemon (e-value = 8e^-44) and Pippa (e-value = 6e^-39) - Both aligned with the results listed on PECAAN and were marked as evidence. /note=HHpred returned no hits which were admissible as evidence. The top hit was a protein of no known function (e-value = 5.4) and the second best was a cell division protein associated with a bacterial cell (e-value = 6.6). /note=CDD returned no hits at all. 
/note=Based on these results, it was decided that the most accurate function classification would be “No Known Function” /note= /note=Transmembrane domains: One transmembrane domain was predicted on PECAAN, so a TMHMM analysis was performed, but no transmembrane domains were identified. /note=Secondary Annotator Name: Infante, Ariana /note=Secondary Annotator QC: /note=SD (final) score: Omit meaning behind final score. Include information on z-score. /note=Gap/overlap: Mention the length of the gene with the chosen start site. /note=Phamerator: Include date at which pham number noted. Also, I think you confused this section with the starterator section. Change the location of everything you wrote after the pham number to the starterator section. To the phamerator section, add how this gene is conserved in other members of AO2 who are in the same pham, give examples, and state that they have no function called. /note=Starterator: Just add what I mentioned above to this section and it should be good :) /note=Function call: Mark the two phages mentioned in PhagesDB section on PECAAN as evidence. /note=Transmembrane domains: Double check with the DeepTMHMM analysis website because I think your gene is a membrane protein. There is one TMD that is predicted that follows the length requirements. CDS 23108 - 23329 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="GravityBall_26" /note=Original Glimmer call @bp 23108 has strength 10.15; Genemark calls start at 23108 /note=SSC: 23108-23329 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp26 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 3.28466E-44 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.84, -5.251091297337415, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp26 [Arthrobacter phage BarretLemon] ],,YP_009303095,100.0,3.28466E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhatt, Khushi /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start at 23108. The start codon was GTG, which has a high prevalence. /note=Coding Potential: Coding potential for this ORF was on the forward strand only, indicating this is a forward gene. Coding potential is found in both GeneMark Self and Host, and the selected start site 23108 covers all the coding potential. /note=SD (Final) Score: -5.251. It is the best final score on PECAAN. The Z-score was 1.84, which is below the ideal Z-score of 2 but is nonetheless the best score on PECAAN. /note=Gap/overlap: Overlap: -4bp upstream. The overlap is small and matches with phages Timinator and LeeroyJ, which are both in cluster AO2. Additionally, a -4bp overlap is evidence that this gene is likely in an operon, which makes this overlap reasonable. /note=Phamerator: pham: 106357. Date: 10/02/23. There are 56 pham members, 4 of which are drafts. This pham is conserved in both the cluster AO2 phages Timinator and LeeroyJ. Neither phage Timinator nor phage LeeroyJ assigned a function to their genes in pham 106357. Therefore, it is consistent that this gene in phage GravityBall does not have a function either. /note=Starterator: Starterator was informative. There are 52 non-draft members of this pham. 33/52 non-draft genes call start site 8, which GravityBall possesses for this gene. Start site 8 was also autoannotated for this gene in GravityBall. This start site agrees with the Glimmer and GeneMark prediction, has the highest Z score, highest final score, contains all the coding potential, has the longest ORF, and starts with GTG. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 23108. /note=Function call: NKF. The top two phagesdb BLAST hits are listed as function unknown, and they have significant e-values of 2e-35. The top two NCBI BLAST hits also have a function of hypothetical protein, and the top hit has 100% identity, 100% coverage, and an e-value of 3.28e-44. 
There are no significant HHPred hits, as all the hits have non-significant e-values that do not pass the 1e-6 threshold. There are no CDD hits either. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liu, Rodia /note=Secondary Annotator QC: I have QC'd this gene and agree with the location and function calls. CDS 23326 - 23739 /gene="27" /product="gp27" /function="membrane protein" /locus tag="GravityBall_27" /note=Original Glimmer call @bp 23326 has strength 9.44; Genemark calls start at 23326 /note=SSC: 23326-23739 CP: yes SCS: both ST: SS BLAST-Start: [holin [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 1.25403E-90 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.025, -3.3442433900004698, yes F: membrane protein SIF-BLAST: ,,[holin [Arthrobacter phage StevieBAY]],,QJD53357,100.0,1.25403E-90 SIF-HHPRED: SIF-Syn: Gene 27 is a holin protein of pham 96585. Its downstream gene(pham 103558) is an endolysin protein, and its upstream gene(pham 106357) has no known function. This matches phages BarretLemon and Timinator of cluster AO2. /note=Primary Annotator Name: Liang, Edwin /note=Auto-annotation: Glimmer and GeneMark call the start site at position 23326 with start codon ATG. /note=Coding Potential: The coding potential is found on the forward strand of both GeneMark Host and Self. All reasonable coding potential in the ORF is covered by the start site 23326. /note=SD (Final) Score: -3.344. Final score is the lowest among other start sites. The Z-score is 3.025, which is over 2. /note=Gap/overlap: 4bp overlap. Overlap is reasonable compared to gaps of other start sites. /note=Phamerator: Pham 96585. Holin protein. Conserved, found in other AO2 phages. /note=Starterator: Start site 2, 23326, was manually annotated 39 times for GravityBall. 100% of genes in this pham were manually annotated at start site 2. The evidence agrees with Glimmer and GeneMark. 
/note=Location call: Based on this evidence, this is a real gene and the most plausible start site is at 23326. /note=Function call: Holin protein. The top three hits from PhagesDB Blast list it as a holin protein and have e-values of 5e-71 and 2e-70. The NCBI Blast search yielded hits with holin protein as well. HHPred and CDD had no data to support this function, but synteny was found and an endolysin gene was found downstream of holin gene. TMD is found, which is reasonable for a holin protein. /note=Transmembrane domains: DeepTMHMM predicts a TMD for this protein. There is evidence to support that this is a transmembrane protein. /note=Secondary Annotator Name: Almeida, Tarissa /note=Secondary Annotator QC: I have QC`d this gene and agree with calls by the primary annotator. I would edit gap/overlap section to say 4bp overlap rather than -4 bp gap for clarity. Try not to refer to genes by number in synteny box and use their pham numbers instead. CDS 23729 - 25318 /gene="28" /product="gp28" /function="endolysin" /locus tag="GravityBall_28" /note=Original Glimmer call @bp 23729 has strength 16.15; Genemark calls start at 23729 /note=SSC: 23729-25318 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.194, -2.5410833118725082, no F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage StevieBAY]],,QJD53358,100.0,0.0 SIF-HHPRED: Putative phage lysin; endolysin, prophage, Lytic activity, HYDROLASE; 1.9A {Streptococcus phage phi7917},,,5D74_B,40.8318,99.4 SIF-Syn: Gene STOP@25318 is an endolysin. Its upstream is called as a holin protein whereas the downstream gene has no known function. This synteny is observed in other AO2 phages such as LeeroyJ and StevieBAY. /note=Primary Annotator Name: Sanoyca, Alicia /note=Auto-annotation: Both GeneMark and Glimmer call start site 23729 (ATG). 
/note=Coding Potential: The gene has reasonable coding potential within the ORF, and this chosen start site covers all the coding potential. Coding potential is located on reading frame 2 of the direct sequence. /note=SD (Final) Score: Start 23759 has the best/highest Z-score (3.276) and the best/least negative final score (-2.315). However, start 23729 has comparable Z and final scores, 3.194 and -2.541 respectively. /note=Gap/overlap: Start site 23729 has an overlap of 11 with the preceding gene, and it yields the longest gene length of 1590bp. The presence of an overlap may suggest that this gene is part of an operon. Start site 23759 is also probable, as it results in a gap of 19. /note=Phamerator: As of 10/04/23 this gene belongs to pham 103558, which belongs to other phages of the AO2 cluster. Phamerator calls this gene to be an endolysin, which is found in the approved function list. /note=Starterator: Start site 10 @23729 is the most annotated start site, as it is called in 22/42 (52.4%) of the non-draft genes in the pham, including StevieBAY and BarretLemon. /note=Location call: This is likely a real gene, as it is in a conserved pham among other phages within the same AO2 cluster and the ORF covers reasonable coding potential. Furthermore, this gene has a synteny with other AO2 cluster phages. The most likely start site is 23729, as it yields the longest gene length and it is consistent with the starterator report, which mentions that this start site is the most commonly annotated start site for non-draft genes in the pham. /note=Function call: Phagesdb BLAST calls this gene an endolysin for many other pham members within the AO2 cluster, such as StevieBAY and BarretLemon. NCBI BLAST also shows a hit to endolysins with 100% identity and e-values of 0. 
HHPRED shows matches to endolysin with high probabilities (>98%) but questionable e-values (lowest 7.7e-10) and coverage of 40.8% and lower; however, these are for phages that infect different bacteria, and the approved functions list mentions differences in endolysin structure among phages of different hosts (Streptococcus, Staphylococcus, Clostridium perfringens). CDD displays a hit to a CHAP domain with low %identity/alignment/coverage, suggesting this gene may have a CHAP domain specific to Arthrobacter phages. The SEA-PHAGES forums also say StevieBAY has an endolysin rather than lysin A, and since this is a syntenic gene with a shared cluster phage, this gene can be confidently called as an endolysin. /note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: For coding potential, mention which forward reading frame contains the ORF. Under function call, I would name 1-2 specific hits from BLASTp and HHPRED with the coverage and probability percentages. Be sure to fill out the synteny box. CDS 25318 - 25686 /gene="29" /product="gp29" /function="membrane protein" /locus tag="GravityBall_29" /note=Original Glimmer call @bp 25318 has strength 20.72; Genemark calls start at 25318 /note=SSC: 25318-25686 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp29 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s2 100.0% 4.15814E-80 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.497, -4.40029072999159, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein BJD79_gp29 [Arthrobacter phage BarretLemon] ],,YP_009303098,99.187,4.15814E-80 SIF-HHPRED: SIF-Syn: Membrane protein. Upstream is endolysin and downstream is baseplate tail protein, also found in StevieBAY. /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site 25318. 
/note=Coding Potential: Coding potential in this gene is on both the forward and reverse strand but the reverse strand can be disregarded since it is not large enough to be considered significant. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -4.400. It is the best final score on PECAAN. Has best z-score of 2.497. /note=Gap/overlap: Gap: -1 bp. This is a small overlap and indicates that this gene is likely part of an operon. /note=Phamerator: pham: 92. Date: 10/9/23. It is conserved, found in BarretLemon, BossLady, and StevieBAY. /note=Starterator: Start site 27 was manually annotated in 17 of 122 non-draft genes in this pham. Start site 27 is 25318 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 25318. /note=Function call: NKF. The top four NCBI Blast hits are also NKF (e-value 4e-80), and the top four hits on PhagesDB are also NKF. HHPred had no hits with high enough coverage or low E-value, and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts 2 TMDs; one is 20 aa long (from aa 34 to 54) and the other is 20 aa long (from aa 71 to 91). Although all other evidence points to NKF, there is at least 1 TMD around 17-22 aa long, so this is a membrane protein. /note=Secondary Annotator Name: De Guzman, Arieanne /note=Secondary Annotator QC: Missing section for SD. Since DeepTMHMM predicts at least 1 TMD around 17-22 aa long, function can be called as membrane protein. 
CDS 25730 - 26065 /gene="30" /product="gp30" /function="baseplate wedge protein" /locus tag="GravityBall_30" /note=Original Glimmer call @bp 25730 has strength 9.58; Genemark calls start at 25730 /note=SSC: 25730-26065 CP: yes SCS: both ST: SS BLAST-Start: [baseplate protein [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 8.437E-73 GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.052, -5.4158953075094765, yes F: baseplate wedge protein SIF-BLAST: ,,[baseplate protein [Arthrobacter phage BarretLemon] ],,YP_009303099,100.0,8.437E-73 SIF-HHPRED: Putative tail lysozyme; extracellular contractile injection system, STRUCTURAL PROTEIN; 2.7A {Algoriphagus machipongonensis},,,7AEB_W,93.6937,99.7 SIF-Syn: Baseplate wedge protein, upstream is similarly conserved gap preceded by a conserved ORF with an unknown function, and baseplate J protein downstream, just as in many AO2 phages like BarretLemon, LeeroyJ, and Timinator. /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Both call the site at 25730 with a start codon of ATG which has a high potential of being a start site. /note=Coding Potential: The Coding Potential on the Forward Strand is found in both GeneMark Host and Self. All reasonable coding ORF is covered by this start site. /note=SD (Final) Score: The FS score for this site is -5.416 with a Z-score of 2.052, lending this ATG start codon at 25730 to be the most likely start site due to it having the least negative FS score and the highest Z-score above 2. /note=Gap/overlap: Gap 43 bp upstream with a total gene length of 336 bp. Gene Length and Gap are both conserved in comparison to LeeroyJ and BarretLemon. /note=Phamerator: Pham: 85603. Date: 10/9/2023. This pham has 57 members. It is conserved in and found in both BarretLemon and LeeroyJ, both of which call the function as Baseplate Wedge Protein, an approved function. 
/note=Starterator: Start Site 18 was manually annotated for 50/52 non-draft genes in the pham with call rate of 98.2%. Start Site 18 is 25730 in GravityBall. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 25730. /note=Function call: Baseplate Wedge Protein. Top results for PhagesDB (e-value: 4e-59) and NCBI (e-value: 9e-73) indicated as such with high coverage, and a low e-value (<1e-6) for other phages. Additionally, HHpred provided many hits indicating either a tail-related lysozyme, baseplate wedge, or a protein associated with contraction of the tail, all with high coverage (>90%), high probability (>99%), and low e-value (<1e-3). CDD provided more insight with evidence of a domain hit for the GPW_gp25 superfamily (e-value: 2.80e-4), which is related to phage protein Gene 25 from T4, a structural component of the outer wedge of the baseplate that has acidic lysozyme activity. This likely ties the different possibilities listed by HHpred to the function of the baseplate wedge protein. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Trinh, Uyen /note=Secondary Annotator QC: Phamerator: Note down how many phages the gene is conserved in total. Could potentially be shortened (especially function call) even though I understand why you would want to keep it all! Good job! Appropriate evidence from HHPred, CDD, NCBI, Phagesdb BLAST were checked. 
CDS 26058 - 27188 /gene="31" /product="gp31" /function="baseplate J protein" /locus tag="GravityBall_31" /note=Original Glimmer call @bp 26058 has strength 22.77; Genemark calls start at 26058 /note=SSC: 26058-27188 CP: yes SCS: both ST: SS BLAST-Start: [baseplate J protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.377, -2.4634880269616195, yes F: baseplate J protein SIF-BLAST: ,,[baseplate J protein [Arthrobacter phage StevieBAY]],,QJD53361,100.0,0.0 SIF-HHPRED: Baseplate_J ; Baseplate J-like protein,,,PF04865.17,71.2766,100.0 SIF-Syn: Baseplate J protein. The upstream gene is baseplate wedge protein. The downstream gene is minor tail protein. These functions are also seen in Timinator and StevieBAY. /note=Primary Annotator Name: Infante, Ariana /note=Auto-annotation: Glimmer and GeneMark both call the start of this gene at 26058, with a start codon of ATG. This start codon has a high probability of occurrence. /note=Coding Potential: The ORF has reasonable coding potential in the forward direction, with the selected start site of 26058 covering all of the coding potential seen on the self-trained and host-trained GeneMark. /note=SD (Final) Score: The start site of 26058 has the best, less negative final score (-2.463) and the best z-score (3.377). /note=Gap/overlap: The overlap with the upstream gene is -8 bp. The other candidates for start sites create gaps larger than 79 bp, but they have poor z- and final scores in comparison to the chosen start site, thus ruling them out. The length of the gene of 1131 bp is acceptable. /note=Phamerator: The pham number as of October 4th, 2023 is 1549. The gene is conserved in Abba, Beans, BossLady, and Brent; they all belong to the same cluster as GravityBall which is AO. The function call for this gene is baseplate J protein, which is consistent across Phamerator and is found in the approved function list. 
/note=Starterator: There are 60 non-draft members in this pham. 24 of the 60 non-draft members call start site 10, which corresponds to a start site at 26058 in GravityBall. /note=Location call: Based on the above evidence, this is a real gene that has a likely start site at 26058. Starterator is in agreement with Glimmer and GeneMark. /note=Function call: Baseplate J protein. The top four PhagesDB BLAST hits (StevieBAY, BarretLemon, LeeroyJ, and Timinator) with a listing of baseplate J protein have e-values of 0, belong to the same cluster as GravityBall (AO2), and are a part of the same pham. The top four NCBI BLAST hits with a listing of baseplate J protein have e-values of 0, identities greater than 99%, and coverages of 100%. The top HHpred hit has a high probability of 100%, a high coverage of 71.3%, and a low e-value of 7e-31. The top CDD hit has a high coverage of 64.6% and a low e-value of 7.8e-18. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Castellanos, Sebastian /note=Secondary Annotator QC: I agree with the location and function calls. In the gap/overlap section, I would remove the line that genes with overlaps longer than 0 or negative 1 cannot be part of operons since my notes say that overlaps of negative four are also potentially members of operons and Juliet told me that longer gaps than that are a gray area. In the phamerator section, Abba, Beans, and Brent are not members of the subcluster A02 so I would specify that or else not include them. For the databases in the function call section, I would include the names of each of the hits. 
CDS 27188 - 28471 /gene="32" /product="gp32" /function="minor tail protein" /locus tag="GravityBall_32" /note=Original Glimmer call @bp 27188 has strength 15.17; Genemark calls start at 27188 /note=SSC: 27188-28471 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.183, -4.582905859694018, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage StevieBAY]],,QJD53362,100.0,0.0 SIF-HHPRED: Tail_P2_I ; Phage tail protein (Tail_P2_I),,,PF09684.14,32.0843,99.7 SIF-Syn: Minor tail protein, the upstream gene is a baseplate J protein, and the downstream gene is a membrane protein. BarretLemon, Jordan, and StevieBAY had the same function of the upstream gene (31), but their downstream gene is not called. /note=Primary Annotator Name: Liu, Rodia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 27188. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -4.583 and Z-score is 2.183. /note=Gap/overlap: -1. This indicates that this gene is likely part of an operon. /note=Phamerator: pham: 102773. Date: 10/4/23. It is conserved; found in BarretLemon (AO) and Bumble (AO). The function is tail protein in BarretLemon and Bumble. /note=Starterator: (Start: 6 @27188 has 22 MA`s). Start site 6 in Starterator was manually annotated in 23/23 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 27188. /note=Function call: Minor tail protein. The top three PhagesDB BLAST hits had the function of minor tail protein and tail protein 
(E-value = 0), and the NCBI BLAST hits had the function of minor tail protein and tail protein (100% coverage, 99%+ identity, and E-value = 0). HHpred had hits for tail protein (99% probability and E-value < e-16). CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Bhatt, Khushi /note=Secondary Annotator QC: I agree with this call, and all the relevant information has been checked off! There are a few spots that can be slightly corrected otherwise: /note=1) For the "auto-annotation" section, Glimmer and GeneMark only predict the start site, not the stop site. I would correct this section to remove the description of the stop site. /note=2) For the "gap/overlap" section, I would state whether this overlap of -1bp is conserved in other phages, which would strengthen your evidence for selecting this start site. /note=3) In the "phamerator" section, make sure that phage BarretLemon is properly spelled! Also, many genes can be in the same pham but may perhaps have different functions. To strengthen your call, I would omit the phrase "but not all phages in this cluster have the same function," and I would describe the phages that do support your call (ex. the ones with low e-value BLAST hits). /note=4) Your function section has extremely strong evidence! To strengthen it even further, I would even explicitly state that you have e-values of "0," which are the most desirable/strongest pieces of evidence. /note=5) In the synteny box, make sure the ending clause says " is not called" as opposed to "no called" for proper grammar! 
CDS 28673 - 28807 /gene="33" /product="gp33" /function="membrane protein" /locus tag="GravityBall_33" /note=Original Glimmer call @bp 28673 has strength 18.69; Genemark calls start at 28673 /note=SSC: 28673-28807 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp33 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 2.18438E-21 GAP: 201 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.781, -3.3060637489568596, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein BJD79_gp33 [Arthrobacter phage BarretLemon] ],,YP_009303102,100.0,2.18438E-21 SIF-HHPRED: SIF-Syn: This gene`s function is a membrane protein. The gene upstream has the function of minor tail protein with a pham number 102733. The gene downstream has the function of exonuclease with a pham number 104794 /note=Primary Annotator Name: Almeida, Tarissa /note=Auto-annotation: Both Glimmer and GeneMark call the start site for this gene at 28673. The start codon is ATG which has a high probability of being used. /note=Coding Potential: The ORF in this gene has reasonable coding potential in the forward direction as supported by Glimmer and GeneMark. The start site chosen covers all of the coding potential which makes it a likely candidate for a start. /note=SD (Final) Score: The final score is -3.306 and is the best score out of all start candidates. The Z-score is 2.781 and is the best score of all possible start sites. /note=Gap/overlap: This gene has a gap of 201 bp, which is large but is well conserved in other phages such as BarretLemon and BossLady. /note=Phamerator: Pham number 116695 date 10/4/23 and conserved in phage BarretLemon and BossLady. /note=Starterator: Start: 2 @28673 has 16 manual annotations and is called 100% in genes in this pham. This information agrees with the information from Glimmer and GeneMark and is similar to phages BarretLemon, Beans, and BossLady. 
/note=Location call: Based on the information found, this is a real gene and, with high confidence, its start site is 28673. /note=Function call: This gene`s function is membrane protein. There are strong PhagesDB Blast hits for NKF as well as NCBI hypothetical protein hits with significant e-values; however, DeepTMHMM predicts a transmembrane domain, which allows the more specific call. /note=Transmembrane domains: DeepTMHMM predicts a TMD for this gene, and its length falls within the 17-22 aa threshold. /note=Secondary Annotator Name: Liang, Edwin /note=Secondary Annotator QC: I have QC`d this gene and agree with the function and location calls. CDS 28804 - 29823 /gene="34" /product="gp34" /function="exonuclease" /locus tag="GravityBall_34" /note=Original Glimmer call @bp 28804 has strength 22.51; Genemark calls start at 28804 /note=SSC: 28804-29823 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.92, -5.090006274910881, no F: exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage StevieBAY]],,QJD53364,100.0,0.0 SIF-HHPRED: Uncharacterized protein R354; MIMIVIRE, Cas4-like, nuclease, R354, NUCLEAR PROTEIN; 2.806A {Acanthamoeba polyphaga mimivirus},,,5YET_B,87.3156,99.9 SIF-Syn: There is a 4 base pair overlap upstream of this putative gene, which is conserved in phages StevieBAY and Timinator. There is a 4 base pair gap downstream of this protein which is conserved in phages StevieBAY and BossLady. Both putative gene products in GravityBall and StevieBAY are 339 amino acids long with identical start and stop sites. /note=Primary Annotator Name: Anand, Sasha /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 28804 with an ATG codon. 
/note=Coding Potential: The ORF for this gene encompasses all of the coding potential on the first reading frame in the forward direction. /note=SD (Final) Score: -5.090, this is not the best final score out of all potential start sites. However, there is an overlap of 4 bp with the previous gene, indicating this gene may be part of an operon. Final score can be disregarded. The Z-score for a gene with this designated start site is 1.92, which is slightly below the threshold value of 2.0 and that of other potential starts. Besides this discrepancy, all other evidence indicates that this is a real gene. /note=Gap/overlap: This start site has an overlap of 4 bp with the previous gene. This corroborates the idea that this gene is likely part of an operon. /note=Phamerator: pham: 104794. Date: 10/4/2023. It is conserved; all of the non-draft genomes share similarities in this region and are located in the same pham for this gene. GravityBall displays synteny with at least two other AO2 sub-cluster members, BossLady and Grekaycon. /note=Starterator: Start site 38 has the most MAs out of all potential start sites in Pham 104794. This phage does not contain the most annotated start site. Start site 24 (@28804) has 23 MAs, with 14/16 members of the AO2 cluster calling this start. /note=Location call: Based on the evidence above, the most likely start site for this gene is 28804. It has an ATG start codon. /note=Function call: Cas4 exonuclease. The top four PhagesDB BLAST hits have the function of exonuclease (E-value 0.0), and the top NCBI BLAST hits have the function of exonuclease (100% coverage, 100% identity, and E-value 0.0). HHPRED had a hit for Cas-4-like nuclease with a probability of 99.9, 87.3% coverage, and E-value 1.6e-27. It follows the SEA-PHAGES guideline for this function with another HHPRED alignment to the nuclease superfamily PD-(D/E)XK with a probability of 99.57, 55.16% coverage, and E-value 8.3e-16. CDD had no relevant hits. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. DeepTMHMM predicts this hypothetical protein is globular and located inside the viral membrane, which corroborates the function call that this is an exonuclease contained within the viral capsid. /note=Secondary Annotator Name: Sanoyca, Alicia /note=Secondary Annotator QC: Select if the start site covers all coding potential. CDS 29827 - 30672 /gene="35" /product="gp35" /function="RecT-like DNA pairing protein" /locus tag="GravityBall_35" /note=Original Glimmer call @bp 29827 has strength 22.18; Genemark calls start at 29827 /note=SSC: 29827-30672 CP: yes SCS: both ST: SS BLAST-Start: [RecT-like ssDNA annealing protein [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.68, -3.859592806742189, no F: RecT-like DNA pairing protein SIF-BLAST: ,,[RecT-like ssDNA annealing protein [Arthrobacter phage BarretLemon] ],,YP_009303104,100.0,0.0 SIF-HHPRED: RecT; DNA Recombination, DNA Annealing, DNA BINDING PROTEIN; 4.5A {Listeria innocua Clip11262},,,7UBB_E,77.2242,100.0 SIF-Syn: RecT-like DNA pairing protein; upstream gene is an exonuclease (pham number 104794) and downstream gene has same pham number of 99654 as StevieBAY. /note=Primary Annotator Name: De Guzman, Arieanne /note=Auto-annotation: Glimmer and GeneMark. Both call start at 29827. /note=Coding Potential: Self and Host-Trained GeneMark found coding potential contained between the chosen start site and stop site. ORF contains reasonable coding potential in the forward strand only. /note=SD (Final) Score: -3.860. Best SD score in PECAAN. Contains the second best Z-score in PECAAN of 2.68. /note=Gap/overlap: 3 bp. Small gap with no coding potential. Acceptable as gap is below the recommended 50bp limit. Gap is conserved in other Cluster AO2 phages (StevieBAY, LeeroyJ). /note=Phamerator: Pham 114707. Date: 10/4/23. Conserved as found in StevieBAY (AO2) and LeeroyJ (AO2). 
/note=Starterator: Start site 38 was manually annotated in 117/241 non-draft genes in the pham and was the most often called start site. Start site 38 is present in GravityBall as 29827 and thus agrees with Glimmer and Genemark. /note=Location call: Above evidence points to a real gene with a start site of 29827. /note=Function call: RecT-like DNA pairing protein. Top hits from Phagesdb BLASTp have an e-value of > 10^-161 with function calls of RecT-like DNA pairing protein. Top phage hit from NCBI BLASTp has an e-value of 0 and a function call of RecT-like DNA pairing protein. All hits from CDD called this function with e-values of 0 and coverages of >69%. HHpred’s top hit has an e-value of 0, coverage of 77%, and probability of 100% with a RecT function call. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and therefore is not a membrane protein. /note=Secondary Annotator Name: Chan, Rose /note=Secondary Annotator QC: I have QC`d this gene and agree with the location and function call. CDS 30686 - 30898 /gene="36" /product="gp36" /function="helix-turn-helix DNA binding domain" /locus tag="GravityBall_36" /note=Original Glimmer call @bp 30686 has strength 9.3; Genemark calls start at 30686 /note=SSC: 30686-30898 CP: yes SCS: both ST: SS BLAST-Start: [endonuclease [Arthrobacter phage BarretLemon] ],,NCBI, q1:s8 100.0% 1.06702E-40 GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.898, -3.1513923047253267, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[endonuclease [Arthrobacter phage BarretLemon] ],,YP_009303105,90.9091,1.06702E-40 SIF-HHPRED: Related to ribosomal protein YmL20, mitochondrial; Neurospora crassa, translating Mitoribosomes, tRNA, mRNA, mL108, TRANSLATION; HET: NAD, K, ATP, MG, SPM;{Neurospora crassa},,,6YWE_a,68.5714,96.0 SIF-Syn: helix-turn-helix DNA binding domain protein, upstream gene is RecT-like DNA pairing protein, downstream gene is ssDNA-binding protein, just like in phage BarretLemon. 
/note=Primary Annotator Name: Trinh, Uyen /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 30686. Start codon is GTG, which is the second most common start codon, providing evidence for the location call. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.151. It is the best final score on PECAAN. Z-score of 2.136 (> 2) also provides evidence for the location call. /note=Gap/overlap: Gap of 13 bp with the upstream gene. Small, which is indicative of an appropriate start site. /note=Phamerator: 99654. Date 10/09/2023. This gene is conserved in 37 other phages. /note=Starterator: There are 34 non-draft members of this Pham. 16 non-draft members call start site 36, which correlates to a start site of 30686 bp for GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 30686 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: helix-turn-helix DNA binding domain protein. The top three phagesdb BLAST hits have the function of helix-turn-helix DNA binding domain protein (E-value <10^-32). While the HHPred hits have high E-values ( >10e-3) of 0.086 and 0.14, there was high probability (>80-90), high coverage (> 35%), and the hits have the helix-turn-helix DNA binding domain. The sequence alignment includes "2-3 alpha helices in the sequence separated by small spacer (turn) regions of 3-4 amino acids" that are specified in the SEA-PHAGES functional assignment. NCBI had multiple hits for the suggested function and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chong, Truman /note=Secondary Annotator QC: Everything looks good. 
I agree with the start and function calls. Perhaps it would be beneficial to also include the qualifiers for the NCBI hits as they also contribute to the helix-turn-helix function call. CDS 30898 - 31446 /gene="37" /product="gp37" /function="SSB protein" /locus tag="GravityBall_37" /note=Original Glimmer call @bp 30922 has strength 19.8; Genemark calls start at 30898 /note=SSC: 30898-31446 CP: yes SCS: both-gm ST: SS BLAST-Start: [ssDNA-binding protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 3.74641E-127 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.194, -3.30700010583914, yes F: SSB protein SIF-BLAST: ,,[ssDNA-binding protein [Arthrobacter phage StevieBAY]],,QJD53367,100.0,3.74641E-127 SIF-HHPRED: Single-stranded DNA-binding protein 2; Single-stranded DNA-binding protein, Streptomyces Coelicolor, DNA damage, DNA repair, DNA replication, DNA-binding, Phosphoprotein, DNA BINDING PROTEIN; 2.141A {Streptomyces coelicolor} SCOP: b.40.4.3,,,3EIV_D,67.5824,100.0 SIF-Syn: This gene displays synteny with analogs in other members of the AO2 subcluster - specifically BarretLemon and BossLady. In both cases, the genes are also flanked on both sides by helix-turn-helix DNA-binding domains. /note=Gene (stop@31446) /note=PECAAN Notes /note=Primary Annotator Name: Castellanos, Sebastian /note=Auto-annotation: Glimmer and GeneMark flagged different start sites for this gene (30922 and 30898) with start codons GTG and ATG, respectively. /note=Coding Potential: The putative coding potential displayed by the host-trained GeneMark shows a gradual increase which is encompassed for the most part by both start sites, but the self-trained GeneMark showed the coding potential rising at a steeper slope at approximately nucleotide 30900, which was better accommodated by a start site at 30898. /note=SD (Final) Score: -5.991 (start @30922) & -3.307 (start @30898) - The final score is highest (least negative) for the GeneMark start site. 
Additionally, the GeneMark start site had a much better z-score of 3.194 v. Glimmer’s 1.702 (the threshold for preferability being 2). This was taken as further evidence in favor of GeneMark having called the correct start site. /note=Gap/overlap: The GeneMark start site (start @30898) had an overlap of 1 whereas the Glimmer start site (start @30922) had a gap of 23 nucleotides with the previous gene. Such a large gap is indicative of the Glimmer start site being incorrect, whereas the overlap associated with the GeneMark start site suggests membership in an operon. The length of the gene with the start site of 30898 is definitely acceptable at 549 base pairs. /note=Phamerator: Pham 117511 - This gene was conserved in several other members of the subcluster AO2, including BarretLemon and BossLady, both of which listed the function of their analogs as "single-stranded DNA-binding protein" /note=Starterator: This pham has 138 members, of which 19 are drafts. The most commonly annotated start site - corresponding in this gene to the GeneMark autoannotated start site - was listed as 21 (95/119 non-draft genomes called it). GravityBall_37 was the only gene that possessed this particular start site but did not call it. The start site for this gene (corresponding to the Glimmer start site), was listed as 25, had no manual annotations, and was only called 3.1% of the time when present. This was taken as further evidence that the GeneMark start site was correct. (Phamerator and Starterator reports were both accessed on October 11, 2023) /note=Location call: Based on all of the evidence aforementioned, this is most likely a real gene and its start site is at 30898. /note=Function call: ssDNA-binding protein. The PhagesDB Blast revealed 100 hits with e-values below the threshold of 10^-6. Almost all of these were listed as “ssDNA-binding proteins”, including the top two hits - associated with StevieBay_37 (e-value = e^-100) and NathanVaag_35 (e-value = 4e^-83). 
NCBI Blast also had as its top two hits ssDNA-binding proteins associated with StevieBAY (e-value = 3.75e^-127) and Pippa (e-value = 2.12e^-88). Both of these were marked as evidence. /note=HHpred also returned a number of hits listed as single-stranded DNA-binding domains. The top two, a Streptomyces coelicolor protein and a Mycobacterium tuberculosis protein, had e-values of 3.1e^-29 and 2.1e^-26, respectively, as well as 100% coverage. Both were marked as evidence in PECAAN. /note=CDD also returned multiple hits described as ssDNA-binding proteins. The top result, PRK07772 - a provisional protein with 100% and an e-value of 0, was not marked as evidence. The second result, cd04496 - an Escherichia coli protein classified as SSB_OF (responsible for protecting single-stranded DNA intermediates during metabolic processes) - with an e-value = 1.63e^-38 was. /note=Based on these results, it was decided that the most accurate function classification would be “ssDNA-binding protein” /note=Transmembrane domains: No transmembrane domains were predicted on PECAAN. As such, no TMHMM analysis was performed. /note=Secondary Annotator Name: Infante, Ariana /note=Secondary Annotator QC: SD (final) score: Omit what the final score means. /note=Gap/overlap: Mention the length of the gene with the chosen start site. /note=Phamerator: Include the date on which the pham number was noted. Also, I think you confused this section with the Starterator section. Move everything you wrote after the pham number to the Starterator section. To the Phamerator section, add how this gene is conserved in other members of AO2 who are in the same pham, give examples, and state that they have a function call that is consistent/found in the approved functions list. /note=Starterator: Just add what I mentioned above to this section and it should be good :) /note=Function call: Mark the two phages mentioned in PhagesDB section on PECAAN as evidence.
/note=Synteny Box: Also write down the function of upstream and downstream genes. CDS 31655 - 31888 /gene="38" /product="gp38" /function="helix-turn-helix DNA binding domain" /locus tag="GravityBall_38" /note=Original Glimmer call @bp 31655 has strength 11.52; Genemark calls start at 31655 /note=SSC: 31655-31888 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA-binding domain protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 4.57181E-46 GAP: 208 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.287, -2.4329349954350334, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA-binding domain protein [Arthrobacter phage StevieBAY]],,QJD53368,100.0,4.57181E-46 SIF-HHPRED: RNA polymerase sigma factor; sigma factor, Streptomyces, transcription, ECF56; HET: GOL; 2.2A {Streptomyces tsukubensis NRRL18488},,,7QH5_A,97.4026,96.5 SIF-Syn: Gene 38 is a helix-turn-helix DNA binding domain protein. Its downstream gene is also a helix-turn-helix DNA binding domain protein, and its upstream gene is a Dpr-like ssDNA binding protein, which matches phages BarretLemon and Timinator (cluster AO2). /note=Primary Annotator Name: Bhatt, Khushi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 31655. The start codon is ATG, which has a high prevalence. /note=Coding Potential: Coding potential for this ORF was on the forward strand only, indicating this is a forward gene. Coding potential is found in both GeneMark Self and Host, and the selected start site 31655 covers all the coding potential. /note=SD (Final) Score: -2.433. It is the best final score on PECAAN. The Z-score was 3.287, which is the best Z-score and crosses the Z-score threshold of 2. /note=Gap/overlap: Gap: 208bp upstream. This gap is large, however, there is no coding potential found in this gap in both GeneMark Self and Host, and this gap is also conserved in the cluster AO2 phages StevieBAY and JKerns. 
The synteny and absence of coding potential thus make this gap reasonable. /note=Phamerator: 3191. Date: 10/02/23. There are 25 pham members, 1 of which is a draft. This pham is conserved in the cluster AO2 phages StevieBAY and JKerns. Both phages assigned a helix-turn-helix DNA-binding domain protein function to their genes in pham 3191, which is consistent with SEA-PHAGES approved function list guidelines since both genes possess 2-3 alpha helices with spacers of 3-4 amino acids in between. /note=Starterator: Starterator was informative. There are 24 non-draft members of this pham. All 24/24 non-draft members of this pham call start site 16, which GravityBall possesses for this gene. Start site 16 was autoannotated for this gene in GravityBall. This start site agrees with the Glimmer and GeneMark predictions, has the highest Z-score, the highest final score, contains all the coding potential, has the longest ORF, and starts with ATG. /note=Location call: Based on the above evidence, this is a real gene, and the most likely start site is at 31655. /note=Function call: Helix-turn-helix DNA-binding domain. The top three phagesdb BLAST hits have a helix-turn-helix DNA-binding domain protein function, and the top two hits have significant e-values of 8e-36 and 2e-30. The top three NCBI BLAST hits also have a helix-turn-helix DNA-binding domain protein function, and the top hit has 100% identity, 100% coverage, and a significant e-value of 4.57e-46. While the top HHPred hits had non-significant/suboptimal e-values, a SEA-PHAGES forum post (https://seaphages.org/forums/post/10139/) indicated that similar HHpred hits to sigma factors are clear evidence of a helix-turn-helix DNA-binding domain function, even with e-values not meeting the 1e-6 significance threshold.
There were no CDD hits for this gene in phage GravityBall, but the multiple significant hits on both the phagesdb BLAST and the NCBI BLAST in addition to the forum discussion of the HHPred results suggested a helix-turn-helix DNA-binding domain function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liu, Rodia /note=Secondary Annotator QC: I have QC'd this gene and agree with the location and function calls. Note: If possible, I would also check the helix-turn-helix hit on HHPRED. Although its coverage is only 70% and its e-value is 0.46, the probability is high (95%); furthermore, helix-turn-helix binding domains normally show low coverage due to the narrow binding site. CDS 31885 - 32421 /gene="39" /product="gp39" /function="helix-turn-helix DNA binding domain" /locus tag="GravityBall_39" /note=Original Glimmer call @bp 31885 has strength 17.94; Genemark calls start at 31885 /note=SSC: 31885-32421 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage EastWest]],,NCBI, q1:s1 100.0% 1.21723E-92 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.297, -4.4157344874810125, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage EastWest]],,UGL61924,84.7826,1.21723E-92 SIF-HHPRED: SIF-Syn: Gene 39 is a helix-turn-helix DNA binding domain protein of pham 86401. Its downstream gene (pham 119207) has no known function, and its upstream gene (pham 3191) is another helix-turn-helix DNA binding domain protein, which matches phages BarretLemon and Timinator of cluster AO2. /note=Primary Annotator Name: Liang, Edwin /note=Auto-annotation: Both Glimmer and GeneMark call the start site at position 31885 with the start codon ATG. /note=Coding Potential: All reasonable coding potential in the ORF is encapsulated by start site 31885 based on GeneMark Host and Self data.
/note=SD (Final) Score: -4.416. This is the best final score among the candidates with a reasonable gap and length. The Z-score is 2.297, which is over the threshold of 2. /note=Gap/overlap: 4 bp overlap. Acceptable overlap length. Other start sites create gaps that are too large, have poor final scores, or yield an unreasonable gene length. /note=Phamerator: Pham 86401. Helix-turn-helix DNA-binding domain protein. Conserved and found in other AO2 phages. /note=Starterator: Start site 7, 31885, was manually annotated 23 times. 92% of genes in this pham were manually annotated at start site 7. This agrees with Glimmer and GeneMark. /note=Location call: Based on this evidence, this is a real gene and the most plausible start site is at 31885. /note=Function call: Helix-turn-helix DNA binding domain protein. Top hits of PhagesDB and NCBI BLAST list this HTH protein, but also a RepA-like replication initiator. HHPred analysis finds that there are at least 2-3 alpha helices separated by 3-4 amino acid spacers, meaning a HTH DNA binding domain protein is most likely. PF09681.13 used as confirmation for HTH-binding domain. /note=Transmembrane domains: No TMD is found, which is reasonable for this type of protein. /note=Secondary Annotator Name: Almeida, Tarissa /note=Secondary Annotator QC: I have QC'd this gene and agree with calls made by the primary annotator. I would reword gap/overlap section to be 4 bp overlap for clarity. Refer to genes by pham number in synteny box.
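The gap/overlap and gene-length figures quoted in the notes for genes 37 to 39 can be re-derived from the CDS coordinates. The following is only a minimal sketch, assuming 1-based inclusive coordinates and forward-strand genes; the function names are hypothetical and are not part of PECAAN, Glimmer, or GeneMark.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene given 1-based inclusive coordinates."""
    return stop - start + 1


def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - upstream_stop - 1


# Coordinates taken from the CDS lines for genes 37, 38, and 39.
print(gene_length(30898, 31446))      # length of gene 37
print(gap_to_upstream(31655, 31446))  # gap before gene 38
print(gap_to_upstream(31885, 31888))  # overlap before gene 39 (negative value)
```

With these coordinates the sketch gives 549 bp, a 208 bp gap, and a 4 bp overlap (-4), which matches the values recorded in the notes.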
CDS 32729 - 33523 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="GravityBall_40" /note=Original Glimmer call @bp 32696 has strength 17.3; Genemark calls start at 32729 /note=SSC: 32729-33523 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein BJD79_gp40 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s12 100.0% 0.0 GAP: 307 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.1, -4.749088834314575, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp40 [Arthrobacter phage BarretLemon] ],,YP_009303109,96.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanoyca, Alicia /note=Auto-annotation: Glimmer calls start site 32696 (GTG) whereas GeneMark calls start site 32729 (ATG). /note=Coding Potential: Start site 32729 is closer to where the coding potential of this gene begins to peak, as start site 32696 leaves a large gap of no coding potential at the beginning of the gene. All of the coding potential is contained within both potential ORFs. The coding potential is located on the 2nd reading frame of the direct sequence. /note=SD (Final) Score: 32729 has the better/higher Z-score (2.1) and least negative Final Score (-4.749), as compared to 32696, which has 1.719 and -5.959 respectively. /note=Gap/overlap: Start site 32696 results in the shortest gap with the preceding gene (274 bp), and it yields the longest ORF of 828 bp. Start 32729 yields the third shortest gap of 307 bp. Both gaps are fairly large, suggesting there might be an uncalled gene within the gap, but those regions do not have any coding potential and the synteny maps with same-cluster phages also show a gap between matched genes. /note=Phamerator: As of 10/04/23, this gene belongs to the pham 116640, which is commonly annotated in other cluster AO2 phages such as StevieBAY and BarretLemon. The phamerator report does not call any function for this gene. /note=Starterator: Starterator calls start:2 @32696, which is found in 1/23 manual annotations.
GravityBall contained the “most annotated” start: 3 which was called in 10/23 (43.5%) of non-draft genomes @32714, and the second most popular start: 5 @32729 which was called in 7/23 (30.4%) manual annotations. Due to this contention, starterator does not seem informative. /note=Location call: This is a real gene with a most likely start site of 32729. There is a substantial amount of manual annotations for non-draft genomes within this pham. 32729 is also close to when coding potential begins to spike in the host-trained GeneMark. While the Z-scores and Final scores for the start sites of this gene were contentious, 32729 had better scores compared to the other top candidate start@32696. /note=Function call: NCBI BLAST shows many hits with high %identity/alignment/coverage and near 0 e-values to various hypothetical proteins of other Arthrobacter phages. Phagesdb BLAST lists ‘unknown function’ for other genes within this pham present in same-cluster phages, such as StevieBAY and BarretLemon. HHPRED and CDD do not yield any significant results. Therefore, there can not be a function call for this protein. /note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: For coding potential, mention which forward reading frame contains the ORF. I would shorten the location call, since it mentions the notes made under Starterator and Phamerator already. Use synteny to further support the location call here. Under function call, I would name 1-2 specific hits from BLASTp and HHPRED with the coverage and probability percentages, even though it is hypothetical. Make sure to only select 2-3 evidence hits max for the databases below. 
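The Starterator percentages quoted for gene 40 follow directly from the call counts. This is a small sketch of that arithmetic only; the helper name is hypothetical and it does not reproduce how Starterator itself computes or reports its statistics.

```python
def call_percentage(calls: int, total: int) -> float:
    """Share of manual annotations calling a start site, as a percentage to 1 dp."""
    return round(100 * calls / total, 1)


# Counts taken from the Starterator note for gene 40 (23 manual annotations).
print(call_percentage(10, 23))  # start 3 @32714
print(call_percentage(7, 23))   # start 5 @32729
print(call_percentage(1, 23))   # start 2 @32696
```

The first two values reproduce the 43.5% and 30.4% given in the note. No start site reaches a majority, which is the reason Starterator was judged uninformative for this gene.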
CDS 33520 - 34062 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="GravityBall_41" /note=Original Glimmer call @bp 33520 has strength 8.82; Genemark calls start at 33520 /note=SSC: 33520-34062 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_41 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 1.05796E-127 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.702, -7.070428535374837, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_41 [Arthrobacter phage StevieBAY]],,QJD53371,100.0,1.05796E-127 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site 33520. /note=Coding Potential: Coding potential in this gene is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -7.070. Has z-score of 1.702. These are not the best scores in either category, but this start site provides the most reasonable gap. /note=Gap/overlap: Gap: -4 bp. This is a small overlap and indicates that this gene is likely part of an operon. /note=Phamerator: pham: 11790. Date: 10/11/23. It is conserved, found in BarretLemon, BossLady, and StevieBAY. /note=Starterator: Start site 2 was manually annotated in 22 of 22 non-draft genes in this pham. Start site 2 is 33520 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 33520. /note=Function call: NKF. The top four NCBI Blast hits are also NKF (e-value 1e-127), and the top four hits on PhagesDB are also NKF. HHPred had no hits with high enough coverage or low E-value, and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 34070 - 34555 /gene="42" /product="gp42" /function="RusA-like resolvase" /locus tag="GravityBall_42" /note=Original Glimmer call @bp 34070 has strength 10.36; Genemark calls start at 34031 /note=SSC: 34070-34555 CP: yes SCS: both-gl ST: SS BLAST-Start: [RusA-like resolvase [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 9.83953E-111 GAP: 7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.297, -5.324814838162144, no F: RusA-like resolvase SIF-BLAST: ,,[RusA-like resolvase [Arthrobacter phage StevieBAY]],,QJD53372,100.0,9.83953E-111 SIF-HHPRED: d.79.6.1 (A:) automated matches {Escherichia coli [TaxId: 562]} | CLASS: Alpha and beta proteins (a+b), FOLD: Bacillus chorismate mutase-like, SUPFAM: Holliday junction resolvase RusA, FAM: Holliday junction resolvase RusA,,,SCOP_d2h8ea_,75.7764,99.8 SIF-Syn: /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start site at 34070 with start codon GTG and GeneMark calls the start site at 34031 with start codon GTG. /note=Coding Potential: The coding potential on the Forward Strand is found in both GeneMark Host and Self. All reasonable coding potential is covered by both start sites called by Glimmer and GeneMark. /note=SD (Final) Score: Start Site 34070 has a less negative FS score of -5.325 and a higher Z-score of 2.297 while Start Site 34031 has a FS score of -6.376 and a Z-score of 1.66, making Start Site 34070 a better start site. /note=Gap/overlap: Start Site 34070 has a gap of 7 bp with a gene length of 486 bp while Start Site 34031 has an overlap of 32 bp with a gene length of 525 bp. The gap and gene length using start site 34070 is conserved with similar AO2 phages such as BarretLemon, Timinator, and LeeroyJ. /note=Phamerator: Pham: 86340. Date: 10/10/2023. This pham has 27 members. 
It is conserved and found in BarretLemon, Timinator, and LeeroyJ all of which call the function of the gene as RusA-like resolvase, an approved function. /note=Starterator: Start Site 18 in Starterator was manually annotated in 5/26 non-draft genes in this pham. Start Site 18 is 34070 in GravityBall. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 34070 as it is highly conserved with other phages in subcluster AO2. /note=Function call: RusA-like Resolvase. The top four PhagesDB BLAST hits have the function of RusA-like Resolvase (e-value: <1e-88), and the top three NCBI BLAST hits also have the function of RusA-like Resolvase (100% coverage, >96.89% identity, and e-value: <1e-106). HHpred had multiple hits for RusA-like Resolvase (>99% probability, e-value: <2.3e-12). CDD had no relevant hits. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Trinh, Uyen /note=Secondary Annotator QC: Phamerator: Note down how many phages the gene is conserved in total. Other than that very good PECAAN notes. I agree with the function and location call! CDS 34603 - 34935 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="GravityBall_43" /note=Original Glimmer call @bp 34603 has strength 18.64; Genemark calls start at 34603 /note=SSC: 34603-34935 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_43 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 7.46067E-71 GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.1, -4.731360067354143, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_43 [Arthrobacter phage StevieBAY]],,QJD53373,100.0,7.46067E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Infante, Ariana /note=Auto-annotation: Glimmer and GeneMark call the start of this gene at 34603 with a start codon of ATG. 
This start codon has a high probability of occurring. /note=Coding Potential: The ORF has reasonable coding potential in the forward direction, with the selected start site of 34603 covering all of the coding potential seen on the host- and self-trained GeneMark /note=SD (Final) Score: The start site has the second-best final score (-4.731) and z-score (2.111), but both values are still reasonable. /note=Gap/overlap: The gap with the upstream gene is 47 bp. The other start candidate with the best final score and z-score creates a large gap of 281 bp and does not reach the acceptable gene length of 120 bp. The chosen start site creates a gene of acceptable length at 333 bp. /note=Phamerator: The pham number as of October 7th, 2023 is 106559. The gene is conserved in StevieBAY, Timinator, BossLady, and many other phages belonging to the same cluster as GravityBall (AO). No function is called for this gene. /note=Starterator: There are 21 non-draft members in the pham. All non-draft members call start site 1, which corresponds to a start at 34603 in GravityBall. /note=Location call: Based on the above evidence, this is a real gene that has a likely start site at 34603. Starterator is in agreement with Glimmer and GeneMark. /note=Function call: The top three PhagesDB BLASTp hits have the function of no known function with e-values less than 1e-55. The top three NCBI BLASTp hits have the function as hypothetical proteins with e-values less than 2.25e-57, identities greater than 86%, and 100% coverages. One hit on HHpred produced a decent percent probability (77.7%) and coverage (53.6%), but a very poor e-value (31). After checking on the approved functions list and the requirements (2-3 alpha helices separated by spacer regions) to call this gene as being a helix-turn-helix DNA binding domain as noted on HHpred, this could be a potential functional call. CDD produced no hits. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Castellanos, Sebastian /note=Secondary Annotator QC: I agree with the location and function calls. I would describe the requirements given by the approved functions list and/or forums that warranted the calling of a helix-turn-helix DNA binding domain function and list the names of the phages to which the PhagesDB and NCBI hits belonged as well as the origin of the protein listed in HHPred. Also the synteny menu will need to be filled out if the final function call is not NKF. CDS 34997 - 35182 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="GravityBall_44" /note=Original Glimmer call @bp 34997 has strength 19.23; Genemark calls start at 34997 /note=SSC: 34997-35182 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_44 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 4.73942E-37 GAP: 61 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.398, -4.152357863761557, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_44 [Arthrobacter phage StevieBAY]],,QJD53374,100.0,4.73942E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Rodia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 34997 bp with the start codon ATG. /note=Coding Potential: Coding potential in this ORF is primarily on the forward strand, although there is some slight coding potential on the reverse reading frames. However, the majority of the coding potential indicates that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.152 and Z-score is 2.398. /note=Gap/overlap: The gap with the upstream gene is 61 bp.
Although the gap is wider than 50 bp, there is no coding potential found in the gap between this gene and its upstream gene, indicating that no gene needs to be added and that the selected start site includes all of the coding potential. /note=Phamerator: pham: 86333. Date: 10/9/23. It is conserved; found in Beans (AO), Jordan (AO), and LeeroyJ (AO). Pham 86333 has 27 members, 2 are drafts. /note=Starterator: (Start: 15 @34997 has 3 MAs) Start site 15 in Starterator was manually annotated in 3/25 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 34997. /note=Function call: NKF. Phagesdb BLAST hits have the unknown function with low e-value (e2) and low %coverage (<50%), which do not yield any significant results. CDD had no relevant hits. Therefore, there cannot be a function call for this protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Bhatt, Khushi /note=Secondary Annotator QC: I agree with this call, and all the relevant evidence has been selected! There are just a few spots that I would tweak: /note=1) Remember that Glimmer and GeneMark only predict the start sites, not the stop sites. As a result, it may be best if you only describe the start site in the "auto-annotation" section and omit the stop site. /note=2) Upon looking at the GeneMark Self, there is some slight coding potential in the reverse reading frames (ex. 4 and 6), although the majority is in the forward strand as agreed by your call. You may want to mention it for thoroughness, although I agree with your overall interpretation. /note=3) For the "gap/overlap" section, I would slightly rephrase your sentence to make it clearer for readers.
Specifically, rather than saying "there is no coding potential for the new gene...," I would say that, "there is no coding potential found in the gap between this gene and its upstream gene, indicating that no gene needs to be added and that the selected start site includes all of the coding potential." /note=4) To be extra thorough, you want to mention in the "phamerator" section that the phages found in the same pham also had no function called, which is consistent with your call! You may also want to describe the total number of draft and non-draft pham members. /note=5) For the "starterator" section, it seems like a minority of the total phages in this pham called start site 3. To strengthen your evidence, you could describe the number of phages that possessed start site 3 and the percentage of times this start site is called whenever it is present. Nonetheless, I agree with your start site call, as it reduces the gap/overlap between this gene and its upstream gene and also includes the coding potential in addition to possessing the best Z/final scores. /note=6) To be more specific, you may want to specify the exact e-values in your "function section." Otherwise, this looks great! CDS 35182 - 35835 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="GravityBall_45" /note=Original Glimmer call @bp 35179 has strength 19.65; Genemark calls start at 35179 /note=SSC: 35182-35835 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_45 [Arthrobacter phage StevieBAY]],,NCBI, q1:s2 100.0% 5.32756E-157 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.276, -3.4160307515230497, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_45 [Arthrobacter phage StevieBAY]],,QJD53375,99.5413,5.32756E-157 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Almeida, Tarissa /note=Auto-annotation: Both Glimmer and GeneMark agree and call the start site for this gene as 35179. 
The start codon for this gene is ATG which has a high probability of being used. /note=Coding Potential: The ORF in this gene has reasonable coding potential in the forward direction in Glimmer and GeneMark. There are 2 consecutive start sites with high probability, but due to wet lab data, we will call the second start site, 35182, as the most likely start site. /note=SD (Final) Score: The final score is -3.416, which isn’t the best final score but is still a good score. The Z-score for this gene is 3.276, which is the best score out of all possible start sites. /note=Gap/overlap: There is a 1 bp overlap with the upstream gene, which is indicative of an operon. The overlap is reasonable and conserved in other phages. /note=Phamerator: 10/9/23, the pham number for this gene is 118002 and it is conserved in phages BossLady and StevieBAY. /note=Starterator: 10/10/23, the selected start, 2 @ 35182, has very few manual annotations, but due to wet lab experimental data we still call this the start site. /note=Location call: Based on the previous information, 35182 is the most likely start site for this gene, and this gene is real. /note=Function call: Based on NCBI BLAST and PhagesDB Blast, there is no known function for this particular gene, as the hits with low e-values are hypothetical proteins. /note=Transmembrane domains: There were no hits for a transmembrane domain in this gene. /note=Secondary Annotator Name: Liang, Edwin /note=Secondary Annotator QC: I have QC'd this gene and agree with the function and location calls. Note: edit the start site to 35182 and fill out the "All GM Coding Capacity" box.
CDS 35828 - 36292 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="GravityBall_46" /note=Original Glimmer call @bp 35828 has strength 13.39; Genemark calls start at 35828 /note=SSC: 35828-36292 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_46 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 92.8571% 3.6203E-91 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.016, -5.4877970958852, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_46 [Arthrobacter phage StevieBAY]],,QJD53376,87.5817,3.6203E-91 SIF-HHPRED: SIF-Syn: Synteny: There is a 8 base pair overlap upstream of this putative gene, which is conserved in phage BossLady. There is no synteny directly downstream in BossLady, but there is synteny both upstream and downstream in StevieBAY and BarretLemon. /note=Primary Annotator Name: Anand, Sasha /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 35828 with start site ATG. /note=Coding Potential: The ORF for this gene encompasses all of the coding potential on the second reading frame in the forward direction. /note=SD (Final) Score: -5.488, this is the best final score on PECAAN across all potential starts for this putative gene. The Z-score for a gene with this designated start site is 2.02, which is greater than the threshold value of 2.0. /note=Gap/overlap: There is an overlap of 8 base pairs with the previous gene. This is a reasonable overlap that indicates this is likely a real gene that is 465 bp long. This is a reasonable length for a functional protein and encompasses the longest ORF. /note=Phamerator: pham: 106544. Date: 10/4/2023. It is well conserved; some non-draft genomes share high sequence similarity and low E-values in this region and are located in the same pham. GravityBall displays synteny with BossLady, JKerns, and Grekaycon. /note=Starterator: Start 5 (@35828) has 10 MAs and is the most annotated start. 
It is called in 9/14 non-draft AO2 genomes. /note=Location call: Based on the evidence above, the most likely start site for this gene is 35828. It has an ATG start codon. /note=Function call: No function call can be made. Although there are multiple hits for this gene with other conserved genes across the AO2 sub-cluster, none of these proteins declare a function. The top hits in PhagesDB BLAST are hypothetical proteins in phages StevieBAY and BarretLemon, with significant E-values of 2e-74 and 4e-50. TMHMM does not predict any TMDs in this gene. The top NCBI BLAST hit is with a hypothetical protein in Arthrobacter phage StevieBAY (92.8% coverage, 85.0% identity, and E-value 3.62e-91). CDD and HHPred did not yield any relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. DeepTMHMM predicts this hypothetical protein is globular and located inside the viral membrane. /note=Secondary Annotator Name: Sanoyca, Alicia /note=Secondary Annotator QC: Include the start codon in the Auto-annotation section. Select if the start codon encompasses all coding potential. Include gene length in the gap/overlap section, and mention if this is the longest open reading frame or not. Mark if starterator is helpful or not. CDS 36285 - 36674 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="GravityBall_47" /note=Original Glimmer call @bp 36285 has strength 20.1; Genemark calls start at 36285 /note=SSC: 36285-36674 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_47 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 3.05433E-87 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.952, -3.0425830684175623, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_47 [Arthrobacter phage StevieBAY]],,QJD53377,98.4496,3.05433E-87 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: De Guzman, Arieanne /note=Auto-annotation: Glimmer and GeneMark. Both call start at 36285.
/note=Coding Potential: Self and Host-Trained GeneMark found coding potential contained in the chosen start site and stop site. ORF contains reasonable coding potential in the forward strand only. /note=SD (Final) Score: -3.043. Best SD score in PECAAN. Z-score of 2.952. /note=Gap/overlap: -8 bp. Overlap is only slightly larger than accepted, but is conserved in other Cluster AO2 phages (BarretLemon, LeeroyJ). /note=Phamerator: Pham 11657. Date: 10/9/23. Conserved as found in BarretLemon (AO2) and LeeroyJ (AO2). /note=Starterator: Start site 18 was manually annotated in 15/22 non-draft genes in the pham and was the most often called start site. Start site 18 is present in GravityBall as 36285 and thus agrees with Glimmer and Genemark. /note=Location call: Above evidence points to a real gene with a start site of 36285. /note=Function call: NKF; BLASTp (NCBI and Phagesdb BLASTp), CDD, and HHpred did not give any significant data to provide a function for this gene. Hits from Phagesdb BLASTp and NCBI BLASTp had low e-values (top 2 phage hits had e-values <10^-83 in NCBI; top 2 non-draft phage hits had e-values <10^-70) but no known functions. CDD had no hits and HHpred’s hits had high e-values (> 50) and probabilities less than 80%. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Chan, Rose /note=Secondary Annotator QC: I have QC'd this gene and agree with the location and function call.
CDS 36671 - 37084 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="GravityBall_48" /note=Original Glimmer call @bp 36671 has strength 8.91; Genemark calls start at 36671 /note=SSC: 36671-37084 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_48 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 5.43361E-94 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.803, -3.4030855957987485, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_48 [Arthrobacter phage StevieBAY]],,QJD53378,97.8102,5.43361E-94 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Trinh, Uyen /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 36671. Start codon is ATG, which is the most common start codon, providing evidence for the location call. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.403. It is the best final score on PECAAN. Z-score of 2.803 (> 2) also provides evidence for the location call. /note=Gap/overlap: Overlap of 4 bp with the upstream gene, which is indicative of an operon. /note=Phamerator: 86126. Date 10/09/2023. This gene is conserved in 33 other phages. /note=Starterator: There are 32 non-draft members of this Pham. 10 non-draft members call start site 24, which correlates to a start site of 36671 bp for GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 36671 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. The top three phagesdb BLAST hits have no known function (E-value <10^-75). NCBI BLAST, CDD, and HHpred did not have relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Chong, Truman /note=Secondary Annotator QC: I agree with the start and function calls. However, with regard to NCBI Blast, hypothetical proteins still count as hits for NKF and can be checked off as evidence if they meet the parameters (low e-value, high coverage). CDS 37081 - 37368 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="GravityBall_49" /note=Original Glimmer call @bp 37081 has strength 11.14; Genemark calls start at 37081 /note=SSC: 37081-37368 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_49 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 4.40474E-63 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.103, -4.80230064365488, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_49 [Arthrobacter phage StevieBAY]],,QJD53379,100.0,4.40474E-63 SIF-HHPRED: SIF-Syn: This gene displayed synteny with other members of the AO2 subcluster, specifically BarretLemon and LeeroyJ, both of whose analogs were also listed as NKF and were flanked on both sides by other genes of no known function. /note=Primary Annotator Name: Castellanos, Sebastian /note=Auto-annotation: Glimmer and GeneMark both auto-annotated the start site of this gene as nucleotide 37081, with the start codon ATG. /note=Coding Potential: The host-trained GeneMark shows a small stretch of coding potential between 3700 and the predicted start site far below the 50% baseline. The spike which marks the beginning of the bulk of the coding potential occurs past the start site. The self-trained GeneMark similarly showed a stretch of insignificant coding potential in the same region, followed by a spike to the 50% line just inside the limits of the start site. Both were taken as evidence that the start site accommodated coding potential well. /note=SD (Final) Score: -4.802 - The final score was not the least negative of all the potential start sites. 
However, this was not taken as evidence against the auto-annotated start site, for reasons explained in the Gap description. The z-score was also not the highest of all the candidate start sites at 2.103, but this point against the start site was likewise abrogated for reasons described in the gap/overlap section and this z-score was still above the threshold of 2. /note=Gap/overlap: The autoannotated start site had an overlap of 4 nucleotides with the previous gene, meaning it was likely a member of an operon. The aforementioned candidate start sites with higher final scores, including the LORF, mostly had implausibly large overlaps with the previous gene. /note=Phamerator: Pham 86556 - This gene was conserved in several other members of the subcluster AO2, including BarretLemon and LeeroyJ, both of which listed no known function for their analogs. /note=Starterator: This pham has 22 members, of which GravityBall was the only draft. The most commonly annotated start site - corresponding in this gene to the autoannotated start site - was listed as 14 - 100% of the non-draft genomes called it. This was taken as further evidence in favor of this start site. /note=Location call: Based on all of the evidence aforementioned, this is most likely a real gene and its start site is at 37081. /note=Function call: No-known function. The PhagesDB Blast revealed 23 hits with e-values below the threshold of 10^-6, all of these were listed as “no known function”. The top two hits - associated with StevieBay_49 (e-value = 4e^-52) and Timinator_49 (e-value = 8e^-52) - were marked as evidence on PECAAN. /note=NCBI Blast also had as its top two hits hypothetical protein products of the same genes belonging to StevieBay and Timinator as were referenced by PhagesDB (e-values = 4.4e^-63 and 1.2e^-62, respectively). These were likewise marked as evidence in PECAAN. /note=HHpred returned no hits that were admissible as evidence - the two top hits had e-values of 1.6 and 14. 
/note=CDD returned no hits at all. /note=Transmembrane domains: No transmembrane domains were predicted on PECAAN. As such, no TMHMM analysis was performed. /note=Secondary Annotator Name: Infante, Ariana /note=Secondary Annotator QC: Coding potential: I think you can simplify this section down, e.g., just mention where the main bulk of the coding potential is seen (don’t think mentioning the 50% is necessary) and how the chosen start site of 37081 encompasses all of it. /note=SD (final) score: Omit meaning of final score. Mention z-score. Mention that although not the best, they are still reasonable values. /note=Gap/overlap: Mention the length of the gene with the chosen start site. /note=Phamerator: Include date at which pham number noted. Also, I think you confused this section with the starterator section. Change the location of everything you wrote after the pham number to the starterator section. To the phamerator section, add how this gene is conserved in other members of AO2 who are in the same pham, give examples, and state that they have no function called. /note=Starterator: Just add what I mentioned above to this section and it should be good :) /note=Function call: Mark NCBI hits used as evidence on PECAAN. CDS 37365 - 37583 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="GravityBall_50" /note=Genemark calls start at 37365 /note=SSC: 37365-37583 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_50 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 1.96676E-42 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.623, -5.622485528283668, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_50 [Arthrobacter phage StevieBAY]],,QJD53380,100.0,1.96676E-42 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhatt, Khushi /note=Auto-annotation: Only GeneMark. GeneMark calls the start site at 37365, and the start codon is GTG, which has a high prevalence. 
/note=Coding Potential: Coding potential for this ORF was primarily on the forward strand (third reading frame), and there was some coding potential on the reverse strand (fifth reading frame), but this did not match the forward directionality of the nearby surrounding genes and the synteny exhibited by other cluster AO2 phages. This gene is thus a forward gene, and coding potential is found in both GeneMark Self and Host. The selected start site 37365 covers all the coding potential. /note=SD (Final) Score: -5.622. This was the best final score on PECAAN. The Z-score was 1.623, which was also the best Z-score, although it is beneath the Z-score significance threshold of 2. /note=Gap/overlap: Overlap: -4 bp upstream. This overlap is small and suggests that this gene is a part of an operon. This overlap is consistent with cluster AO2 phages BarretLemon and Timinator. The synteny and overlap suggestive of gene-membership in an operon make this overlap reasonable. /note=Phamerator: 67640. Date: 10/02/23. There are 20 pham members, 1 of which is a draft. This pham is conserved in the cluster AO2 phages BarretLemon and Timinator, both of which assign an unknown function to their genes within pham 67640. Therefore, it is consistent that this gene in phage GravityBall does not have a function either. /note=Starterator: Starterator was informative. There are 19 non-draft members of this pham. 14/19 non-draft pham members call start site 6, and all non-draft pham members with the most annotated start site (start site 6) call it. Phage GravityBall possesses start site 6, and this was also the auto-annotated start at 37365 bp. This start site agrees with the GeneMark prediction, has the highest Z-score, the highest final score, contains all the coding potential, has the longest ORF, and starts with GTG. /note=Location call: Based on the evidence above, this is a real gene, and the most likely start site is at 37365. /note=Function call: NKF. 
The top two phagesdb BLAST hits are listed as function unknown, and they have significant e-values of 1e-34. The top two NCBI BLAST hits also have a function of hypothetical protein, and the top hit has 100% identity, 100% coverage, and an e-value of 1.97e-42. There are no significant HHPred hits, as all the hits have non-significant values that do not pass the 1e-6 threshold. There are no CDD hits as well. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liu, Rodia /note=Secondary Annotator QC: I have QC'd this gene and agree with the location and function calls. Note: For the starterator part, "This start site agrees with the Glimmer and GeneMark predictions," I believe that only GeneMark called a start site. CDS 37679 - 37921 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="GravityBall_51" /note=Original Glimmer call @bp 37679 has strength 21.44; Genemark calls start at 37673 /note=SSC: 37679-37921 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_51 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 2.86472E-49 GAP: 95 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.133, -2.6814417326944517, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_51 [Arthrobacter phage StevieBAY]],,QJD53381,100.0,2.86472E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liang, Edwin /note=Auto-annotation: Glimmer calls the start site at position 37679, but GeneMark calls it at 37673. /note=Coding Potential: All reasonable coding potential in the ORF is covered by both start sites in GeneMark Host and Self. /note=SD (Final) Score: -2.681 is the best (least negative) final score, with a Z-score of 3.133. This score is more favorable than that of the start site with the longest ORF, which has a -5.88 final score and 2.01 Z-score. /note=Gap/overlap: The gap is 95 bp, which is reasonable. 
It is conserved in other phages of cluster AO2, such as TaeYoung and BarretLemon. /note=Phamerator: 3627. Conserved and found in other AO2 phages. /note=Starterator: Start site 3, 37679, has the most manual annotations (13). Called 82.4% of the time when this start site is present. /note=Location call: Based on this evidence, this is a real gene and the most plausible start site is at 37679. /note=Function call: NKF. No known function. PhagesDB and NCBI Blast top hits result in hypothetical proteins with NKF. HHPred has hits, but they are unreliable (e-value > 1). /note=Transmembrane domains: No TMD. DeepTMHMM states protein is primarily outside of membrane. /note=Secondary Annotator Name: Almeida, Tarissa /note=Secondary Annotator QC: I have QC'd this gene and agree with calls made by primary annotator. I would mention that the gap observed is conserved in other phages and which phages it is conserved in. Synteny box does not have to be filled in if the gene has no function. CDS 37958 - 38311 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="GravityBall_52" /note=Original Glimmer call @bp 37958 has strength 11.99; Genemark calls start at 37958 /note=SSC: 37958-38311 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_52 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 3.26022E-81 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.1, -4.731360067354143, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_52 [Arthrobacter phage StevieBAY]],,QJD53382,100.0,3.26022E-81 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanoyca, Alicia /note=Auto-annotation: Both GeneMark and Glimmer agree on the start site 37958 (ATG). /note=Coding Potential: There is reasonable coding potential within the putative ORF and the chosen start site appears to cover all of the coding potential. The coding potential is located on the direct strand on reading frame 2. 
/note=SD (Final) Score: Start site 37958 has the 2nd best/highest Z-score of 2.1, but it has the best/least negative final score of -4.731. /note=Gap/overlap: Assuming start site 37958, the gap with the previous gene is 36 bp, which is reasonable. All other start sites would have gaps >100bp. This start site also allows for the longest open reading frame. /note=Phamerator: As of 10/09/23, this gene belongs in pham 116761, and is conserved in other cluster AO2 phages such as StevieBAY and Timinator. The phamerator report does not call any function for this gene. /note=Starterator: Starterator calls start num: 6 @37958, which is the most annotated start site called in 8/14 (57.1%) non-draft genomes in this pham. /note=Location call: This is a real gene with a likely start site @37958. GeneMark, Glimmer, and Starterator are all in agreement. /note=Function call: NCBI BLAST shows many hits with high %identity/alignment/coverage and near 0 e-values to various hypothetical proteins of other Arthrobacter phages such as Timinator and StevieBAY. Phagesdb BLAST also lists “function unknown” for the other genes within this pham present in same-cluster phages. CDD and HHPRED do not yield any significant results. Therefore, there cannot be a function call for this gene. /note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: For coding potential, mention which forward reading frame contains the ORF. I would shorten the location call, since it mentions the notes made under Starterator and Phamerator already -- "Based on the evidence above...". Use synteny to further support the location call here. Under function call, I would name 1-2 specific hits from BLASTp and HHPRED with the coverage and probability percentages, even though it is hypothetical. 
CDS 38308 - 38601 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="GravityBall_53" /note=Original Glimmer call @bp 38308 has strength 20.76; Genemark calls start at 38308 /note=SSC: 38308-38601 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_53 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 4.43839E-65 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.052, -4.843798539558957, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_53 [Arthrobacter phage StevieBAY]],,QJD53383,100.0,4.43839E-65 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site 38308. /note=Coding Potential: Coding potential in this gene is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD: -4.844. This is the best final score on PECAAN. Best z-score of 2.052. /note=Gap/overlap: Gap: -4 bp. This is a small overlap and indicates that this gene is likely part of an operon. The start site that gives the LORF does not have the most favorable z-score or gap. /note=Phamerator: pham: 117894. Date: 10/11/23. It is conserved, found in BarretLemon, BossLady, and StevieBAY. /note=Starterator: Start site 14 was manually annotated in 16 of 22 non-draft genes in this pham. Start site 14 is 38308 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 38308. /note=Function call: NKF. The top four NCBI Blast hits are also NKF (e-value 5e-65), and the top four hits on PhagesDB are also NKF. HHPred had no hits with high enough coverage or low E-value, and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: De Guzman, Arieanne /note=Secondary Annotator QC: Missing section for SD, otherwise I agree with above location and function call. CDS 38611 - 38904 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="GravityBall_54" /note=Original Glimmer call @bp 38611 has strength 19.76; Genemark calls start at 38611 /note=SSC: 38611-38904 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_54 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 4.86233E-64 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.84, -5.188777976577085, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_54 [Arthrobacter phage StevieBAY]],,QJD53384,100.0,4.86233E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 38611 with a start codon of GTG which has a high potential of being the start site. /note=Coding Potential: The Coding Potential on the Forward Strand is found in both GeneMark Host and Self. All reasonable coding ORF is covered by this start site. /note=SD (Final) Score: The FS score for this site is -5.189 with a Z-score of 1.84. While this site has the second best FS score and Z-score, it provides the longest ORF that covers all coding potential. /note=Gap/overlap: Gap of 9 bp upstream with a total gene length of 294 bp. Gene Length and Gap are both conserved in comparison to LeeroyJ and BarretLemon. /note=Phamerator: Pham: 103781. Date: 10/10/2023. This pham has 23 members. It is conserved and found in BarretLemon, Timinator, and LeeroyJ; although, all have unknown functions. /note=Starterator: Start Site 5 in Starterator was manually annotated in 22/22 non-draft genes in this pham. Start Site 5 is 38611 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 38611 as it is highly conserved with other phages in subcluster AO2. /note=Function call: Unknown Function. NKF because the top result for PhagesDB (e-value: 3e-50) and NCBI (e-value: 5e-64) were for an unknown protein. Additionally, while HHpred had hits with high probability (>80%) and high coverage (>60%), the e-value for the hits were much larger than the threshold of <1e-3 at 27 and 30 on HHpred which is too high to make a call. CDD did not provide any relevant hits. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Trinh, Uyen /note=Secondary Annotator QC: Phamerator: Note down how many phages the gene is conserved in total. Other than that very good PECAAN notes. CDS 38966 - 39343 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="GravityBall_55" /note=Original Glimmer call @bp 38966 has strength 21.46; Genemark calls start at 38966 /note=SSC: 38966-39343 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_55 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 2.24525E-82 GAP: 61 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.515, -3.8995386480556817, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_55 [Arthrobacter phage StevieBAY]],,QJD53385,100.0,2.24525E-82 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Infante, Ariana /note=Auto-annotation: Glimmer and GeneMark both call the start of this gene at 38966 with a start codon of ATG. This start codon has a high probability of occurring. 
/note=Coding Potential: The ORF has reasonable coding potential in the forward direction, with the selected start site of 38966 covering all of the coding potential seen on the host- and self-trained GeneMark /note=SD (Final) Score: The start site has the best, least negative final score (-3.900) and the best z-score higher than 2 (2.515). /note=Gap/overlap: The gap with the upstream gene is 61 bp. The other start site candidates create a larger gap that fails to cover all of the coding potential seen on GeneMark. This gap is also conserved in Timinator and StevieBAY. The length of the gene is acceptable at 378 bp. /note=Phamerator: The pham number as of October 10th, 2023 is 106542. The gene is conserved in StevieBAY, Timinator, BossLady, and many other phages which all belong to the same cluster as GravityBall (AO). No function is called for this gene. /note=Starterator: There are 21 non-draft members in the pham. All non-draft members call start site 3, which corresponds to a start at 38966 in GravityBall. /note=Location call: Based on the above evidence, this is a real gene that has a likely start site at 38966. Starterator is in agreement with Glimmer and GeneMark. /note=Function call: The top three PhagesDB BLASTp hits (StevieBAY, Jordan, Shade) have the function of no known function with e-values less than 1e-43. The top NCBI BLASTp hit (StevieBAY) has the function of a hypothetical protein with an e-value of 2.24e-82, 100% identity, and 100% coverage. HHpred and CDD produced no viable hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Castellanos, Sebastian /note=Secondary Annotator QC: I agree with the location and function calls. I would name the phages to which the PhagesDB Blast and NCBI Blast hits belong. 
CDS 39440 - 40417 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="GravityBall_56" /note=Original Glimmer call @bp 39440 has strength 20.96; Genemark calls start at 39440 /note=SSC: 39440-40417 CP: yes SCS: both ST: SS BLAST-Start: [RNA polymerase sigma factor [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: 96 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.133, -2.6814417326944517, yes F: hypothetical protein SIF-BLAST: ,,[RNA polymerase sigma factor [Arthrobacter phage StevieBAY]],,QJD53386,99.3846,0.0 SIF-HHPRED: SIF-Syn: Helix-turn-helix DNA binding domain, the upstream gene is NKF, and the downstream gene is NKF. No reference for this gene, but the pham maps are similar. /note=Primary Annotator Name: Liu, Rodia /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 39440 with the start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -2.681 and the Z-score is the highest at 3.133. /note=Gap/overlap: 96. Although the gap is wider than 50 bp, there is no coding potential in the gap between this gene and its upstream gene. /note=Phamerator: pham: 3370. Date: 10/10/23. It is conserved; found in Beans (AO), Jordan(AO), and LeeroyJ(AO). /note=Starterator: (Start: 2 @39440 has 19 MA's). Start site 2 in Starterator was manually annotated in 19/23 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39440. /note=Function call: Helix-turn-helix DNA binding domain. Phagesdb BLAST hits had the function of RNA polymerase sigma factor. (E-value 150bp gaps and very short ORFs. 
/note=Phamerator: As of 10/09/23, this gene belongs to pham 3002, which is conserved in other cluster AO2 phages such as LeeroyJ and Timinator. The phamerator report does not call any function for this gene. /note=Starterator: Starterator calls start num: 17 @42942, which is the most annotated start site called in 22/26 (84.6%) non-draft genomes in this pham. /note=Location call: This is a real gene with a likely start site of @42942, as it is consistently selected by GeneMark, Glimmer, and Starterator. /note=Function call: NCBI BLAST shows many hits with high %identity/alignment/coverage and near 0 e-values to various hypothetical proteins of other Arthrobacter phages, such as Timinator and StevieBAY. Phagesdb BLAST also lists “function unknown” for the other genes within this pham present in same-cluster phages, such as LeeroyJ and Timinator. CDD and HHPRED do not yield any significant results. Therefore, there cannot be a function call for this gene. /note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: For coding potential, mention which forward reading frame contains the ORF. I would shorten the location call, since it mentions the notes made under Starterator and Phamerator already -- "Based on the evidence above...". For function, I would provide the specific stats for 2-3 proteins with the phages they belong to. 
CDS 43322 - 43765 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="GravityBall_65" /note=Original Glimmer call @bp 43322 has strength 19.27; Genemark calls start at 43322 /note=SSC: 43322-43765 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp65 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 6.73758E-96 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.111, -2.7254235724530456, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp65 [Arthrobacter phage BarretLemon] ],,YP_009303134,95.9184,6.73758E-96 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site 43322. /note=Coding Potential: Coding potential in this gene is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD: -2.725. This is the best final score on PECAAN. Has best z-score of 3.111. /note=Gap/overlap: Gap: -2 bp. This is a small overlap and is the start site with the highest z-score and lowest final score. /note=Phamerator: pham: 106853. Date: 10/11/23. It is conserved, found in BarretLemon, BossLady, and StevieBAY. /note=Starterator: Start site 14 was manually annotated in 5 of 8 non-draft genes in this pham. Start site 14 is 43322 in GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 43322. /note=Function call: NKF. The top four NCBI Blast hits are also NKF (e-value 7e-96), and the top four hits on PhagesDB are also NKF. HHPred had 4 top hits calling endoribonuclease; however, none of them had high enough coverage or low E-value, and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: De Guzman, Arieanne /note=Secondary Annotator QC: Missing SD section, otherwise I agree with above location and function call. CDS 43762 - 43938 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="GravityBall_66" /note=Original Glimmer call @bp 43747 has strength 8.35; Genemark calls start at 43747 /note=SSC: 43762-43938 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_TIMINATOR_67 [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 3.00063E-33 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.046, -7.128060851526577, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TIMINATOR_67 [Arthrobacter phage Timinator] ],,ASR78096,100.0,3.00063E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 43747 with a start codon of GTG which has a high potential of being the start site. /note=Coding Potential: The Coding Potential on the Forward Strand is found in both GeneMark Host and Self. All reasonable and alternate coding ORF is covered by start sites 43747 and 43762. /note=SD (Final) Score: The FS score for 43747 is -6.050 with a Z-score of 1.585. While this site does not have the best FS score and Z-score, it provides the longest ORF that covers all coding potential. Additionally, the FS score for 43762 is -7.128 with a Z-score of 1.046, which are more negative and less positive, respectively, than those of 43747. /note=Gap/overlap: 43747 has an overlap of 19 bp with a total gene length of 192 bp. 43762 has an overlap of -4 bp, indicative of an operon, which is similarly conserved in many AO2 phages such as BarretLemon, LeeroyJ, and Timinator, and has an acceptable gene length of 177 bp. /note=Phamerator: Pham: 106722. Date: 10/10/2023. This pham has 13 members. It is conserved and found in BarretLemon, Timinator, and LeeroyJ; although, all have unknown functions. 
/note=Starterator: Start Site 2 in Starterator was manually annotated in 8/12 non-draft genes in this pham. Start Site 2 is 43762 in GravityBall, supporting the claim that this gene is part of an operon. /note=Location call: Due to the presence of the -4 bp gap, this gene appears to be part of an operon that is conserved throughout various phages in the AO2 subcluster. As such, the start site should be 43762 despite poorer FS score and Z-score due to the nature of operons. /note=Function call: Unknown Function. NKF because the top 3 results for PhagesDB (e-value: <3e-21) and NCBI (e-value: <3e-23) indicated either function unknown or hypothetical protein. Additionally, both HHpred and CDD did not provide any relevant hits (e-value threshold: <1e-3). /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator QC: May need to ask whether providing information on the non-start but Glimmer / Genemark start site is necessary. Possibly extraneous info in PECAAN notes, but I would love to know if you keep it because right now you have a very comprehensive explanation. /note=Very good notes that could potentially be shortened. CDS 43935 - 44126 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="GravityBall_67" /note=Original Glimmer call @bp 43935 has strength 6.37; Genemark calls start at 43935 /note=SSC: 43935-44126 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TIMINATOR_68 [Arthrobacter phage Timinator] ],,NCBI, q1:s1 100.0% 1.45632E-37 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.68, -3.649482460397077, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TIMINATOR_68 [Arthrobacter phage Timinator] ],,ASR78097,100.0,1.45632E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Infante, Ariana /note=Auto-annotation: Glimmer and GeneMark both call the start of this gene at 43935 with a start codon of ATG. This start codon has a high probability of occurring. 
/note=Coding Potential: The ORF has reasonable coding potential in the forward direction, with the selected start site of 43935 covering all of the coding potential seen on the host- and self-trained GeneMark /note=SD (Final) Score: The start site has the best, less negative final score (-3.649) and the best z-score higher than 2 (2.68). /note=Gap/overlap: There is an overlap of -4 bp with the upstream gene. This suggests that the gene is a part of an operon. The length of the gene is acceptable at 192 bp. /note=Phamerator: The pham number as of October 11th, 2023 is 87293. The gene is conserved in StevieBAY, Timinator, BarretLemon, and many other phages that all belong to the same subcluster as GravityBall (AO2). No function is called for this gene. /note=Starterator: There are 11 non-draft members in the pham. 7 out of the 11 non-draft members call start site 8, which corresponds to a start at 43935 in GravityBall. /note=Location call: Based on the above evidence, this is a real gene that has a likely start site at 43935. Starterator is in agreement with Glimmer and GeneMark. /note=Function call: The top three PhagesDB BLASTp hits (LeeroyJ, Timinator, and BarretLemon) have the function of no known function with e-values less than 8e-36. The top three NCBI BLASTp hits have the function as a hypothetical protein with e-values less than 6.14e-37, identities greater than 98%, and 100% coverage. HHpred and CDD produced no viable hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Castellanos, Sebastian /note=Secondary Annotator QC: I agree with the location and function calls. For the phamerator report, change the cluster of GravityBall to AO2. For the function call section, give the names of the phages to which the hits from PhagesDB and NCBI Blast belong. 
CDS 44228 - 44848 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="GravityBall_68" /note=Original Glimmer call @bp 44228 has strength 11.32; Genemark calls start at 44228 /note=SSC: 44228-44848 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp68 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 9.44034E-145 GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.834, -3.340404459692468, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp68 [Arthrobacter phage BarretLemon] ],,YP_009303137,100.0,9.44034E-145 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Rodia /note=Auto-annotation: Glimmer and GeneMark. Both calls start at 44228 bp with the start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The Z score is 2.834, and the final score is -3.340. /note=Gap/overlap: 101. Although the gap is wider than 50 bp,there is no coding potential in the gap between this gene and its upstream gene as predicted by Glimmer and GeneMark. Furthermore, Jordan(AO) has preserved the same gap 101 bp. /note=Phamerator: pham: 1701. Date: 10/10/23. It is conserved; found in BarretLemon(AO), Jordan(AO), and StevieBAY(AO). /note=Starterator: (Start: 15 @44228 has 27 MA`s). The most commonly annotated when present, start site 15 in Starterator was manually annotated in 27/51 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 44228. /note=Function call: NKF. Phagesdb BLAST lists unknown functions (E-value= e-112) for other genes within this pham present in same-cluster phages. 
NCBI BLAST shows many hits as hypothetical proteins of other Arthrobacter phages with high %identity (>98%) and 100%coverage and near 0 e-values(190 bp), which is not conserved and does not contain majority of coding potential. /note=Phamerator: Pham 11620. Date 10/10/23. Conserved as found in BarretLemon (AO2) and JKerns (AO2). /note=Starterator: Start site 3 was manually annotated in 22/23 non-draft genes in the pham and was the most often called start site. Start site 3 is present in GravityBall as 45608 and thus agrees with Glimmer and Genemark. /note=Location call: Above evidence points to a real gene with a start site of 45608. /note=Function call: Acetyltransferase; top non-draft phage hits from Phagesdb BLASTp have low e-values (<10^-118) with function calls of acetyltransferase. Top 2 NCBI BLASTp phage hits call this function with low e-values (10^-151), high query coverage (100%), and high % identity (99%). HHpred and CDD hits were not relevant (DUF). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and therefore is not a membrane protein. /note=Secondary Annotator Name: Chan, Rose /note=Secondary Annotator QC: I have QC`d this gene and agree with the function and location call. 
CDS complement (46405 - 46734) /gene="72" /product="gp72" /function="helix-turn-helix DNA binding domain" /locus tag="GravityBall_72" /note=Original Glimmer call @bp 46734 has strength 13.89; Genemark calls start at 46734 /note=SSC: 46734-46405 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA-binding domain protein [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 1.05393E-73 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.82, -5.309117108752867, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA-binding domain protein [Arthrobacter phage StevieBAY]],,QJD53403,99.0826,1.05393E-73 SIF-HHPRED: Recombination Directionality Factor RdfS; Excisionase, Recombination Directionality Factor, winged helix-turn-helix, superhelix, DNA BINDING PROTEIN; HET: GOL; 2.45A {Mesorhizobium japonicum R7A},,,8DGL_B,70.6422,98.5 SIF-Syn: helix-turn-helix DNA binding domain, upstream gene is acetyltransferase, downstream gene is DNA Polymerase III sliding clamp just like phage StevieBay. /note=Primary Annotator Name: Trinh, Uyen /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 46734. Start codon is ATG, which is the most common start codon, providing evidence for the location call. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -5.309. It is the best final score on PECAAN. Z-score of 1.82. /note=Gap/overlap: Gap of 33 bp with the upstream gene. Small, which is indicative of an appropriate start site. /note=Phamerator: 117605. Date 10/10/2023. This gene is conserved in 85 other phages. /note=Starterator: There are 78 non-draft members of this Pham. 25 non-draft members call start site 29, which correlates to a start site of 46734 bp for GravityBall. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Considering all of the evidence above, this is a real gene and has a start site at 46734 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: helix-turn-helix DNA binding domain protein. The top three phagesdb BLAST hits have the function of AlpA-like DNA binding protein; however, this seems to be an outdated functional assignment from older phages. The subsequent phagesdb BLAST hits have a helix-turn-helix DNA binding domain protein function call (E-value <10^-40). HHpred had a hit for helix-turn-helix DNA binding domain protein with 98% probability, 70% coverage, and E-value of 0.0000013. The sequence alignment includes HTHs that are specified in SEA-PHAGES functional assignment. NCBI had some hits for the helix-turn-helix function call and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chong, Truman /note=Secondary Annotator QC: Everything looks good. I agree with the start and function calls. 
CDS complement (46768 - 48036) /gene="73" /product="gp73" /function="DNA polymerase III sliding clamp (Beta)" /locus tag="GravityBall_73" /note=Original Glimmer call @bp 48036 has strength 24.11; Genemark calls start at 47997 /note=SSC: 48036-46768 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA polymerase III sliding clamp [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 0.0 GAP: 79 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.355, -2.297359520375101, yes F: DNA polymerase III sliding clamp (Beta) SIF-BLAST: ,,[DNA polymerase III sliding clamp [Arthrobacter phage StevieBAY]],,QJD53404,97.1564,0.0 SIF-HHPRED: DNA polymerase III, beta subunit; TM0262, DNA Polymerase III, beta subunit, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure; 2.0A {Thermotoga maritima} SCOP: d.131.1.1, l.1.1.1,,,1VPK_A,96.9194,100.0 SIF-Syn: This gene displayed synteny with other members of the subcluster AO2, specifically Shade and BarretLemon, both of which also described their analogs as DNA polymerase III sliding clamp proteins. Both phages aforementioned also had 73rd genes flanked by another DNA-binding domain in the upstream direction and a gene with "no known function" in the downstream direction. /note=Primary Annotator Name: Castellanos, Sebastian /note=Auto-annotation: Glimmer and GeneMark flagged different start sites for this gene (48036 and 47997) with start codons ATG and GTG, respectively. /note=Coding Potential: The GeneMark and Glimmer start sites both seem to encompass all of the putative coding potential depicted in the host and self-trained GeneMark, though the Glimmer start site includes more of the steep ascending slope leading to the stretch at 100% /note=SD (Final) Score: -2.297 (start @48036) & -4.615 (start @47997) - The final score of the Glimmer annotated start site was the largest of all the candidates, as was the z-score (3.355 v. 2.54 for the GeneMark start site). 
This was taken as evidence in favor of the Glimmer start site. /note=Gap/overlap: The Glimmer auto-annotated start site had a gap of 79 nucleotides with the previous gene and the GeneMark auto-annotated start site had a gap of 118. /note=Phamerator: Pham 85396 - This gene was conserved in several other members of the subcluster AO2, including Shade and BarretLemon, both of which also listed their 73rd gene as encoding a DNA-polymerase III sliding clamp. /note=Starterator: This pham has 80 members, of which 10 were drafts. The most commonly annotated start site was 23 - called in 31 of the 70 non-draft genomes. GravityBall_73 did not have this start site, instead calling start 15 (corresponding to the Glimmer annotated start site), which was called 96.2% of the time when present. This was taken as further evidence in favor of the Glimmer annotated start site. (The Phamerator and Starterator reports were accessed on October 11, 2023) /note=Location call: Based on all of the evidence aforementioned, this is most likely a real gene and its start site is at 48036. /note=Function call: DNA Polymerase III sliding clamp. The PhagesDB Blast revealed 14 hits with e-values of 0, all of which were listed as “DNA Polymerase III sliding clamp” or “DNA Polymerase III sliding clamp beta subunit”. The top two hits - associated with StevieBAY_74 and Timinator_74 - were marked as evidence on PECAAN. /note=NCBI Blast had the same protein products as were referenced by PhagesDB (e-value = 0) as the top two hits. They were likewise marked as evidence in PECAAN. /note=HHpred returned multiple hits which were described as DNA Polymerase III Beta subunits, including five with e-values of 0. The top two hits - 1VPK_A & 3T0P_B (associated with Thermotoga maritima and Eubacterium rectale, respectively) - with coverages in excess of 90% and probabilities of 100%, were marked as evidence in PECAAN. 
/note=CDD returned as its top hit a protein specifically identified as the beta processivity factor of DNA Polymerase III. This protein, with an e-value of 3.78e^-44 and 93% coverage, was marked as evidence in PECAAN. /note= /note=Transmembrane domains: No transmembrane domains were predicted on PECAAN. As such, no TMHMM analysis was performed. /note=Secondary Annotator Name: Infante, Ariana /note=Secondary Annotator QC: /note=Gap/overlap: Mention the length of the gene. /note=Phamerator: Include date at which pham number noted. Also, I think you confused this section with the starterator section. Change the location of everything you wrote after the pham number to the starterator section. To the phamerator section, add how this gene is conserved in other members of AO2 who are in the same pham, give examples, and state that they have a function call that is consistent/found in the approved functions list. /note=Starterator: Just add what I mentioned above to this section and it should be good :) CDS complement (48116 - 48760) /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="GravityBall_74" /note=Original Glimmer call @bp 48760 has strength 23.35; Genemark calls start at 48760 /note=SSC: 48760-48116 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_75 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 1.55812E-153 GAP: 67 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.1, -4.669046746593814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_75 [Arthrobacter phage StevieBAY]],,QJD53405,100.0,1.55812E-153 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bhatt, Khushi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 48760. The start codon is ATG, which has a high prevalence. /note=Coding Potential: Coding potential for this ORF was on the reverse strand only, indicating that this is a reverse gene. 
Coding potential is found in both GeneMark Self and Host, and the selected start site 48760 covers all the coding potential. /note=SD (Final) Score: -4.669. It is not the best final score on PECAAN, and it also does not have the highest Z-score (2.1), but this is the most annotated start site on Starterator and exhibits synteny with other phages, which is why the start site at 48760 is selected. /note=Gap/overlap: Gap: 67bp. This is the second smallest gap, but this gap and start site are conserved in other cluster AO2 phages, including StevieBAY and Timinator. There is no coding potential found in this gap as well. /note=Phamerator: 3131. Date: 10/02/23. There are 25 pham members, 1 of which is a draft. This pham is conserved in the AO2 phages StevieBAY and Timinator, both of which assign no function to their genes in pham 3131. Therefore, it is consistent that the corresponding pham 3131 gene in phage GravityBall has no function too. /note=Starterator: Starterator was informative. There are 24 non-draft pham members, and 12/24 non-draft pham members call start site 13, which phage GravityBall possesses. Start site 13, which matches the start at 48760, was autoannotated in GravityBall. This start site agrees with the Glimmer and GeneMark predictions, has the highest coding potential, starts with ATG, and exhibits synteny with multiple phages in cluster AO2, including phages StevieBAY and Timinator. /note=Location call: Based on the evidence above, this is a real gene, and the most likely start site is at 48760. /note=Function call: NKF. The top two phagesdb BLAST hits are listed as function unknown, and they have significant e-values of 1e-155 and 1e-107. The top two NCBI BLAST hits also have a function of hypothetical protein, and the top hit has 100% identity, 100% coverage, and an e-value of 1.56e-153. There are no significant HHPred hits, as all the hits have non-significant values that greatly exceed the 1e-6 threshold. There are no CDD hits as well. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liu, Rodia /note=Secondary Annotator QC: I have QC`d this gene and agree with the location and function calls. CDS complement (48828 - 49043) /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="GravityBall_75" /note=Original Glimmer call @bp 49043 has strength 23.53; Genemark calls start at 49043 /note=SSC: 49043-48828 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BJD79_gp75 [Arthrobacter phage BarretLemon] ],,NCBI, q1:s1 100.0% 2.14337E-44 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.1, -4.669046746593814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BJD79_gp75 [Arthrobacter phage BarretLemon] ],,YP_009303144,100.0,2.14337E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liang, Edwin /note=Auto-annotation: Glimmer and GeneMark both call the start site at 49043. /note=Coding Potential: All reasonable coding potential in the ORF is covered by both start sites in GeneMark Host and Self. /note=SD (Final) Score: -4.669 is the lowest final score, with a favorable Z-score over 2.1. The start codon for this start site is GTG. /note=Gap/overlap: 85bp gap. Reasonable gap length. Other start sites have short gene lengths or lower final scores. /note=Phamerator: Pham 106727. Conserved in some AO2 phages. /note=Starterator: Start site 5, 49043, has 5 manual annotations. This start site is manually annotated 50% of the time. /note=Location call: Based on this evidence, this is a real gene and the most plausible start site is at 49043. /note=Function call: NKF. No hits given in PhagesDB or NCBI Blast. HHPred gives results, but they are unrelated to bacteriophages. CDD has no hits as well. /note=Transmembrane domains: No TMD is found. This hypothetical protein is located inside the cell membrane. 
/note=Secondary Annotator Name: Almeida, Tarissa /note=Secondary Annotator QC: I have QC`d this gene and agree with calls made by primary annotator. Check PhagesDB BLAST and NCBI as evidence for NKF. Synteny box does not have to be filled for NKF genes. CDS complement (49129 - 49629) /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="GravityBall_76" /note=Original Glimmer call @bp 49629 has strength 20.23; Genemark calls start at 49587 /note=SSC: 49629-49129 CP: no SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_STEVIEBAY_77 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 1.86053E-118 GAP: 113 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.161, -8.688548617674662, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_77 [Arthrobacter phage StevieBAY]],,QJD53407,99.3976,1.86053E-118 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanoyca, Alicia /note=Auto-annotation: Glimmer calls start site 49629 (TTG) while GeneMark calls start site 49587 (ATG). /note=Coding Potential: There is reasonable coding potential within the putative ORF. Start @49629 covers all of the coding potential while start @49587 does not. /note=SD (Final) Score: Start site 49629 has the worst/lowest Z-score of 0.161 and the worst final score of -8.689. Start site 49581 has the best/highest Z-score of 3.133, and the best/least negative final score of -2.601. /note=Gap/overlap: Start site 49629 results in a 113bp gap between the preceding gene, which is the smallest gap out of the possible start codons. This start site also results in the longest open reading frame. /note=Phamerator: As of 10/09/23, this gene belongs to pham 8092, which is conserved in 8 other cluster AO2 phages. The phamerator report does not call any function for this gene. /note=Starterator: Starterator calls start num: 3 @49629, which is the most annotated start site present as it is called in 4/7 (57.1%) non-draft genomes in this pham. 
/note=Location call: This is likely a real gene with a likely start site of 49629. This start site is consistently selected by Glimmer and Starterator. This start site also covers all coding potential and leads to the smallest preceding gap and longest open reading frame. This start site has the worst final and Z scores, but since all other evidence is consistent, the start site will be called as 49629. Low Z and final scores may have to do with low transcription rates, so it is not improbable that this is the start site. /note=Function call: NCBI BLAST shows many hits with high %identity/alignment/coverage and near 0 e-values to various hypothetical proteins of other Arthrobacter phages, such as LeeroyJ and StevieBAY. Phagesdb BLAST also lists “function unknown” for the other genes within this pham present in same-cluster phages, such as StevieBAY and Timinator. CDD and HHPRED do not yield any significant results. Therefore, there cannot be a function call for this gene. /note=Transmembrane domains: There are no predicted TMDs, further suggesting this is not a transmembrane protein. /note=Secondary Annotator Name: Anand, Sasha /note=Secondary Annotator QC: Select dropdown for GM coding capacity. Make sure to note which reading frame the coding potential is on. I would also explain the TTG start site, which is uncommon. Explain why a gap of 113 is acceptable for this gene based on starterator/phamerator evidence. For location call, state "based on the evidence above...". The other sections contain the evidence you mention. For function, state 1-2 specific proteins with coverage and probability, naming the phages and function call/NKF. 
CDS complement (49743 - 50741) /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="GravityBall_77" /note=Original Glimmer call @bp 50741 has strength 23.06; Genemark calls start at 50747 /note=SSC: 50741-49743 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_LEEROYJ_78 [Arthrobacter phage LeeroyJ]],,NCBI, q1:s3 100.0% 0.0 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.1, -4.669046746593814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LEEROYJ_78 [Arthrobacter phage LeeroyJ]],,AYD86549,99.4012,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chan, Rose /note=Auto-annotation: Both Glimmer and GeneMark call the gene. Glimmer calls the start site of 50741 but GeneMark calls the start site of 50747. /note=Coding Potential: Coding potential in this gene is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD: -4.669. This is the best final score on PECAAN. Has third best z-score of 2.1. /note=Gap/overlap: Gap: -2 bp. This is a small overlap and is the start site with the highest z-score and lowest final score. /note=Phamerator: pham: 114655. Date: 10/11/23. It is conserved, found in BarrettLemon, BossLady, and StevieBAY. /note=Starterator: Start site 84 was manually annotated in 20 of 297 non-draft genes in this pham. Start site 84 is 50741 in GravityBall. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 50741. /note=Function call: NKF. The top four NCBI Blast hits are also NKF (e-value 0), and the top four hits on PhagesDB are also NKF. HHPred had no relevant hits, and CDD had 1 hit with a domain of unknown function (e-value 1.50e-61). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: De Guzman, Arieanne /note=Secondary Annotator QC: Missing section for SD, otherwise I agree with the above location and function call. CDS complement (50817 - 51050) /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="GravityBall_78" /note=Original Glimmer call @bp 50954 has strength 6.27; Genemark calls start at 51050 /note=SSC: 51050-50817 CP: yes SCS: both-gm ST: NA BLAST-Start: [hypothetical protein SEA_STEVIEBAY_79 [Arthrobacter phage StevieBAY]],,NCBI, q1:s1 100.0% 2.53714E-49 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.473, -6.574823874961161, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_STEVIEBAY_79 [Arthrobacter phage StevieBAY]],,QJD53409,100.0,2.53714E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chong, Truman /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start site at 50954 with start codon TTG and GeneMark calls the start site at 51050 with start codon ATG. /note=Coding Potential: The Coding Potential on the Forward Strand is found in both GeneMark Host and Self. All reasonable and alternate coding ORF is covered by start sites 50954, and 51050. /note=SD (Final) Score: The FS score for 51050 is -6.575 with a Z-score of 1.473 which is the third best scores; however, with an ATG codon it is the most likely start site. /note=Gap/overlap: 50954 has a total gene length of 138 bp while 51050 has a total gene length of 234 bp. /note=Phamerator: Pham: 112569. Date: 10/11/2023. N/A, this is an orpham. /note=Starterator: Date: 10/11/2023. N/A, this is an orpham. /note=Location call: Since start site 51050 has a start codon of ATG and has a longer total gene length, it is more likely than the TTG sites to be the start site of this gene. /note=Function call: Unknown Function. NKF because the top 3 results for PhagesDB (e-value: <2e-21) and NCBI (e-value: <6e-23) indicated either function unknown or hypothetical protein. 
Additionally, neither HHpred nor CDD provided any relevant hits (e-value: <1e-3). /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Trinh, Uyen /note=Secondary Annotator QC: SD (Final score): Only mention the final score for the start site you are suggesting as the location call. Location call: This could be summarized further. Also, unsure if in NCBI hits, you need to check "hypothetical protein" as evidence. Would love to know what our TA or prof says about this! Overall your PECAAN notes look good and accurate and appropriate evidence has been checked off.