CDS 86 - 550 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="TforTroy_1" /note=Original Glimmer call @bp 86 has strength 9.67; Genemark calls start at 86 /note=SSC: 86-550 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 1.18373E-97 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -1.953940808934884, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Yang] ],,YP_009815619,96.1039,1.18373E-97 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,53.2468,98.8 SIF-Syn: Terminase small subunit; the downstream gene is in pham 98020, just like in phages Lego, YesChef, and Tbone. /note=Primary Annotator Name: Bartolome, Alexandra /note=Auto-annotation: Glimmer and GeneMark both call the same start site at 86. It has a start codon of ATG. /note=Coding Potential: GeneMark shows coding potential on the forward strand, and the predicted start site at 86 includes all of it. /note=SD (Final) Score: The final score is -1.954 and the Z-score is 3.211. Both are the best scores among the candidate start sites. /note=Gap/overlap: Since this is the first gene in the genome, there is no gap or overlap with another gene. The gene is 465 bp long, which is a reasonable length, and this start gives the longest ORF. /note=Phamerator: As of 03/30/22, this gene is part of pham 102037. This pham is conserved in phages such as Adolin, Crewmate, and YesChef, which are all in cluster AZ with TforTroy. /note=Starterator: Start site number 42 was manually annotated in 24/145 non-draft genes in this pham. In TforTroy this start site is at 86, the same start site predicted by Glimmer and GeneMark. While it is not the most annotated start site, it is still highly conserved in the pham. /note=Location call: This is likely a real gene with a start site at 86, predicted by both Glimmer and GeneMark. 
It is highly conserved in the pham and includes all of the coding potential. /note=Function call: The top five PhagesDB BLAST hits have E-values from 2e-79 to 6e-71 and have the function terminase small subunit. The top two NCBI BLAST hits also have the function terminase small subunit, with E-values of 8.8e-98 and 1.6e-90. The top two HHpred hits have E-values of 2.6e-8 and 6.3e-8 and have the function terminase small subunit. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicts any TMDs. It is not a membrane protein. /note=Secondary Annotator Name: Pham, Britney /note=Secondary Annotator QC: It would be good to state which reading frame the coding potential is on. Phamerator does have a function call, so it may be worth mentioning. Otherwise, I agree with the primary annotator. CDS 547 - 2244 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="TforTroy_2" /note=Original Glimmer call @bp 556 has strength 10.13; Genemark calls start at 547 /note=SSC: 547-2244 CP: yes SCS: both-gm ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.087, -4.394706008439538, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage Yang] ],,YP_009815620,99.292,0.0 SIF-HHPRED: DNA packaging protein Gp17; large terminase, Alternative initiation, ATP-binding, DNA-binding, Hydrolase, Nuclease, Nucleotide-binding; HET: PO4; 2.8A {Bacteriophage T4},,,3CPE_A,92.9204,100.0 SIF-Syn: The upstream gene is part of pham 102883 and is a terminase small subunit. The downstream gene is in pham 102899 and is a portal protein. 
Gene order is conserved compared with phages Amyev and Adolin. /note=Primary Annotator Name: Fields, Brooke /note=Auto-annotation: Glimmer calls this gene with a start site at 556, and GeneMark calls it at 547. /note=Coding Potential: The GeneMark self- and host-trained maps display substantial coding potential on the forward strand for both start sites, 556 and 547. /note=SD (Final) Score: Start site 547 has the best final score of -4.395 with a Z-score of 2.087. Start site 556 has a final score of -5.299 and a Z-score of 1.685. Given these scores, 547 is the more likely start site. /note=Gap/overlap: Start site 547 gives a 4 bp overlap and a gene length of 1698 bp. Start site 556 gives a 5 bp gap and a gene length of 1689 bp. /note=Phamerator: This gene is found in pham 98020 (4/5/22), which has 1059 members. Cluster A is the most common cluster in this pham; however, cluster AZ phages are also present. Phamerator did not assign a function to this gene. It was compared with phages KeAlii (1698 bp, cluster AZ, terminase large subunit) and Lego (1707 bp, cluster AZ, terminase large subunit). /note=Starterator: 98/1057 members in pham 98020 are drafts (4/1/22). The start number called most often in the published annotations is 102; it was called in 577 of the 959 non-draft genes in the pham. TforTroy_2 does not use the most annotated start. (Start: 106 @547 has 22 MAs) /note=Location call: Given the evidence, 547 is the most likely start site. /note=Function call: The top two NCBI BLAST hits strongly support the function “terminase large subunit”. The CDD hit for phage terminase places the protein in the Terminase_1 superfamily. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein. 
/note=Secondary Annotator Name: Scavetti, Alexa /note=Secondary Annotator QC: I have QC'd this annotation, and I agree with the primary annotator. Don't forget to fill out the synteny box and the Starterator drop-down menu, and to check the PhagesDB BLAST evidence! CDS 2278 - 3813 /gene="3" /product="gp3" /function="portal protein" /locus tag="TforTroy_3" /note=Original Glimmer call @bp 2278 has strength 10.76; Genemark calls start at 2278 /note=SSC: 2278-3813 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -1.953940808934884, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Yang] ],,YP_009815621,95.6947,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_Q,81.409,100.0 SIF-Syn: /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: Glimmer and GeneMark both call the start site at 2278 with start codon ATG. /note=Coding Potential: The ORF has reasonable coding potential on the forward strand, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -1.954. It is the best final score on PECAAN. /note=Gap/overlap: 33 bp gap. Although no other phage in the cluster has the same gap, it is only 3 bp longer than the acceptable gap size and is far too short to contain another gene. Other phages in this cluster usually have an 18 bp upstream gap. /note=Phamerator: Pham: 101476. Date: 3/31/2022. It is conserved and found in Reedo and Yang. /note=Starterator: Start site 30 in Starterator was manually annotated in 1/298 non-draft genes in this pham. Start 30 is 2278 in TforTroy. Start site 57 @2398 in Starterator was also manually annotated in 1/298 non-draft genes in this pham. 
However, start 57 is unlikely to be TforTroy's start site because it would leave a large upstream gap that is not seen in other phages in this cluster. Thus, the Starterator evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the data presented above, this is a real gene with 2278 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits within this cluster are all portal proteins (E-value 0), and 3 out of 100 NCBI BLASTp hits (88%+ coverage, 74%+ identity, and E-value 0) likewise are portal proteins. CDD had a hit for portal protein with 77% coverage and an E-value of 1.65e-43. The top three HHpred hits are all for portal protein with 99%+ probability, 76%+ coverage, and E-value of 80%). Three of the top five NCBI BLAST hits call an ADP-ribosyltransferase domain along with a MuF-like fusion protein, which should be ignored according to SEA-PHAGES (> 99% coverage, > 79% identity, E-value = 0). HHpred had two hits for ADP-ribosyltransferase with 99+% probability, approximately 36% coverage, and E-values < 2.5e-17. CDD had no relevant hits. This protein is on the approved SEA-PHAGES function list. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Vazquez, Gilda /note=Secondary Annotator QC: I have QC'd this annotation, and I agree with the primary annotator. Given the current PECAAN notes and evidence, I agree with the location call for this gene.
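The gene 2 notes weigh two candidate starts (547 and 556) by gene length, upstream gap or overlap, and final score. A minimal sketch of that arithmetic follows, assuming forward-strand genes with 1-based, inclusive coordinates, so that length = stop - start + 1 and gap = start - upstream stop - 1 (negative values are overlaps). The `Candidate` class and the ranking rule (higher final score first, Z-score as tie-breaker) are hypothetical illustrations, not PECAAN's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate start of a forward-strand gene."""
    start: int
    final_score: float  # PECAAN SD (final) score; less negative is better
    z_score: float


def gene_length(start: int, stop: int) -> int:
    """Gene length in bp, assuming 1-based inclusive coordinates."""
    return stop - start + 1


def upstream_gap(upstream_stop: int, start: int) -> int:
    """Positive = gap in bp; negative = overlap with the upstream gene."""
    return start - upstream_stop - 1


def best_candidate(cands: list[Candidate]) -> Candidate:
    # Illustrative rule only: rank by final score, break ties with the Z-score.
    return max(cands, key=lambda c: (c.final_score, c.z_score))


# Gene 2 of TforTroy: stop at 2244; the upstream gene (gene 1) ends at 550.
STOP, UPSTREAM_STOP = 2244, 550
candidates = [Candidate(547, -4.395, 2.087), Candidate(556, -5.299, 1.685)]

for c in candidates:
    print(c.start, gene_length(c.start, STOP), upstream_gap(UPSTREAM_STOP, c.start))
print("preferred start:", best_candidate(candidates).start)
```

Run as-is, it prints `547 1698 -4`, then `556 1689 5`, then `preferred start: 547`, matching the lengths, overlap, and gap quoted in the gene 2 notes. The same two helpers reproduce the 465 bp length of gene 1 and the 33 bp gap upstream of gene 3.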
CDS 5889 - 6248 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="TforTroy_5" /note=Genemark calls start at 5889 /note=SSC: 5889-6248 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein HOU52_gp05 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 8.85086E-69 GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp05 [Arthrobacter phage Yang] ],,YP_009815623,93.2773,8.85086E-69 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is a VIP2-like ADP-ribosyltransferase toxin and the downstream gene is a scaffolding protein. Phages Adolin and Crewmate show the same order of gene functions. /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Only GeneMark called the start site at 5889. The start codon for this start site is GTG. /note=Coding Potential: Both self-trained and host-trained GeneMark show reasonable coding potential in the forward frame, and the chosen start site (5889) includes all of it. /note=SD (Final) Score: The final score for this start site is -1.954, the highest among the candidate start sites. Its Z-value of 3.211 is also the highest among all of the candidates. /note=Gap/overlap: There is a 47 bp gap. Gaps of similar size are seen in other phages, such as Crewmate and Asa15, which both have a 56 bp gap. /note=Phamerator: As of March 31, 2022, the pham number is 102200. The gene is conserved in 44 phages: 10 in cluster EJ and 34 in cluster AZ. /note=Starterator: As of March 31, 2022, the pham number is 102200. The start called most often in the published annotations is 12; it was called in 27 of the 29 non-draft genes in the pham. TforTroy does not have start 12; its most annotated start is 11. 
Start 11 corresponds to 5889, the start site called by GeneMark, and it has 1 MA; it is the only candidate start in TforTroy with any MAs. /note=Location call: Considering all of the evidence above, this is a real gene. The most likely start site is 5889, the one called by GeneMark. This start site also has the highest SD score and Z-value among the candidate start sites, and it gives the longest open reading frame. TforTroy does not have the pham's most annotated start, but 5889 is the only one of its candidate starts that other annotators have chosen. Together, this makes 5889 the best start site for this gene. /note=Function call: The function of this gene is unknown: all NCBI hits are hypothetical proteins with 100% or close to 100% coverage, high identity (~100%), and low E-values (~e-63). PhagesDB BLAST hits to other finalized cluster AZ phages with low E-values also have unknown function. The PhagesDB function frequency table lists differing functional calls, which do not provide significant evidence for any function. CDD finds no conserved domain for this gene. HHpred also supports NKF, because its hits have extremely high E-values (~500), low coverage (less than 40-50%), and low probability (less than 80-90%). Taken together, the data from these searches do not support assigning a function to this ORF. /note=Transmembrane domains: None. Both TMHMM and TOPCONS predict 0 TMDs. /note=Secondary Annotator Name: Rodriguez, Sean /note=Secondary Annotator QC: I agree with this annotation and functional call. Make the "Coding Potential" section a little more concise. I agree with your conclusions, but maybe say "the chosen start site covers the region of significant coding potential." 
Refer to another phage that has a 47 (or approximately 47) bp gap to strengthen the evidence that the gap is conserved. Mention whether the gene length given by your chosen start site is reasonable. It might be a little extreme to call a 47 bp gap "very big." Make it clear that TforTroy does not have start site 12 (the most annotated start site) and that its most annotated start site is 11. Include a sentence about the PhagesDB BLAST hits you marked as evidence, not just the highest functional frequency. Uncheck any HHpred hits, as their E-values are too high to be informative. Simplify the HHpred statement to "there were no informative/relevant hits" or something similar; there is no need to explain why they aren't informative. CDS 6367 - 6927 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="TforTroy_6" /note=Genemark calls start at 6367 /note=SSC: 6367-6927 CP: yes SCS: genemark ST: SS BLAST-Start: [head scaffolding protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 2.41552E-102 GAP: 118 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -3.3503726477288405, yes F: scaffolding protein SIF-BLAST: ,,[head scaffolding protein [Arthrobacter phage Yang] ],,YP_009815624,92.973,2.41552E-102 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_f,60.2151,98.3 SIF-Syn: Scaffolding protein. The upstream gene (pham 106440) has unknown function and the downstream gene is a major capsid protein, just like in phages Adolin and Adumb2043. /note=Primary Annotator Name: Sun, Xingzheng /note=Auto-annotation: GeneMark calls the start at 6367. Glimmer does not call a start. /note=Coding Potential: The ORF has strong coding potential, and the chosen start covers all of the coding potential. /note=SD (Final) Score: The final score of -3.350 is the best, and the Z-score of 2.978 is the highest. /note=Gap/overlap: Gap of 118 bp. 
It is a fairly large gap, but it is conserved in other phages, such as Adumb2043 and Amyev. /note=Phamerator: The pham number is 98009 as of 3/25/22. The gene is conserved in phages Adolin and Asa16, which are in the same cluster, AZ. /note=Starterator: Start site 12 was called in 29 of 32 non-draft genes; it corresponds to start 6367 in TforTroy, the start called by GeneMark. /note=Location call: Based on the evidence above, this is a real gene starting at 6367. /note=Function call: Scaffolding protein. Three of the top four PhagesDB BLAST hits support the function scaffolding protein, with E-values <2e-73. HHpred has a hit for scaffold protein with 98.3% probability, 60.2% coverage, and an E-value of 0.0002. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS, suggesting that it is not a membrane protein. /note=Secondary Annotator Name: Chang, Julia /note=Secondary Annotator QC: I agree with all sections that the primary annotator wrote. CDS 6957 - 7907 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="TforTroy_7" /note=Original Glimmer call @bp 6957 has strength 11.24; Genemark calls start at 6957 /note=SSC: 6957-7907 CP: yes SCS: both ST: SS BLAST-Start: [major head protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 99.3671% 0.0 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.114, -2.8084998623841946, yes F: major capsid protein SIF-BLAST: ,,[major head protein [Arthrobacter phage Yang] ],,YP_009815625,99.0476,0.0 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_C,91.7721,99.9 SIF-Syn: Major capsid protein. The upstream gene is a scaffolding protein and the downstream gene is a head-to-tail adaptor, just like in phages Crewmate and Adolin. 
/note=Primary Annotator Name: Pham, Britney /note=Auto-annotation: Glimmer and GeneMark both call the start site at 6957, with start codon ATG. /note=Coding Potential: GeneMark Self and GeneMark Host show good coding potential in the third reading frame. The strong, typical coding potential indicates that this is likely a real gene. There is little coding potential toward the end of the gene, near the stop site. /note=SD (Final) Score: The final score of -2.808 is the highest (least negative) of all the candidate start sites, and the Z-score of 3.114 is also the highest of all the candidate start sites. /note=Gap/overlap: There is a 29 bp gap, and this start gives the longest reasonable ORF. The gap is small enough that no gene needs to be added before or after this gene. /note=Phamerator: The gene is found in pham 57253 as of 3/31/2022. It appears to be conserved (commonly annotated) in cluster AZ, though it is also found in other clusters. In comparisons with other AZ phages, such as Amyev and Crewmate, Phamerator shows a function call of major capsid protein. /note=Starterator: This seems like a reasonable start site choice. This start site corresponds to Start 8, which was called in 157/209 non-draft genes in the pham. The auto-annotated start number is 8, which is also the most annotated start (6957 in TforTroy), meaning other annotators agree that this is the best start site. /note=Location call: The start site is most likely 6957, with the stop at 7907. This is most likely a real gene, and there is no need to add another gene in PECAAN. /note=Function call: Major capsid protein. On PhagesDB, the function is supported by strong hits to Yang_7 (e-170, cluster AZ) and Warda_7 (e-161, cluster AZ), both with host Arthrobacter globiformis B-2979. 
On NCBI, there were hits to Tbone (E-value 0) and Crewmate (E-value 0) with the same function, major capsid protein. The function frequency table shows that the main function called is major capsid protein, meaning other manual annotators agreed on the function. CDD had 1 hit with an E-value of 1.91e-12, and HHpred had 32 hits calling major capsid protein or T7-like capsid protein with low E-values. Overall, the function is major capsid protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. This supports the function call, because the major capsid protein forms the shell that encloses and protects the viral DNA and is not a membrane protein. /note=Secondary Annotator Name: Mao, Xuanting /note=Secondary Annotator QC: I have QC'd this location call and agree with the primary annotator. One correction: the gap is 29 bp, not 26. CDS 7981 - 8379 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="TforTroy_8" /note=Original Glimmer call @bp 7981 has strength 10.24; Genemark calls start at 7981 /note=SSC: 7981-8379 CP: yes SCS: both ST: SS BLAST-Start: [head-tail adaptor Ad1 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 4.64962E-75 GAP: 73 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.211, -2.0162541296952132, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-tail adaptor Ad1 [Arthrobacter phage Yang] ],,YP_009815626,93.8931,4.64962E-75 SIF-HHPRED: 15 PROTEIN; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_C,82.5758,99.0 SIF-Syn: Head-to-tail adaptor. The upstream gene is a major capsid protein and the downstream gene is NKF, as in phages Eraser and Powerpuff, both in cluster AZ with TforTroy. /note=Primary Annotator Name: Scavetti, Alexa /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start site at 7981 with start codon ATG. /note=Coding Potential: Yes. Both self- and host-trained GeneMark show coding potential in the forward direction in the region from 7981 bp to 8379 bp. The chosen start site of 7981 covers all of the coding potential for this region. /note=SD (Final) Score: -2.016 with a Z-value of 3.211. This is the best possible SD score and Z-value according to PECAAN. /note=Gap/overlap: This gene is a reasonable length. The auto-annotated start site of 7981 leaves an upstream gap of 73 bp and does not give the longest ORF. However, this site provides a better SD score and is more conserved in Starterator than the LORF start site, so it is still the preferred start site. There is no synteny evidence suggesting that a gene should be added in this gap. /note=Phamerator: Pham number 76440 as of 3/31/2022. Conserved in other AZ cluster phages, including Eraser and DrManhattan. The function call is head-to-tail adaptor, which is consistent between Phamerator and the phams database and is on the approved SEA-PHAGES list. /note=Starterator: Start site 4 is conserved among 30/32 non-draft genomes in the pham. This start site corresponds to position 7981 in TforTroy and was called by both Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 7981. Starterator agrees with Glimmer and GeneMark, and this site has the best SD and Z-score. /note=Function call: The top two non-draft hits on both PhagesDB and NCBI BLASTp, sorted by E-value, suggest the function is head-to-tail adaptor, with high query coverage (>98%), high identity (>88%), and low E-values (<5e-46). HHpred also called head-to-tail adaptor with high probability (~99%) and a low E-value (2.3e-8). CDD did not return any hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so it is not a membrane protein. 
This agrees with the functional call of head-to-tail adaptor, which is not associated with the membrane. /note=Secondary Annotator Name: Saha, Atul /note=Secondary Annotator QC: Everything looks correct, and all relevant evidence is included. CDS 8391 - 8504 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="TforTroy_9" /note=Original Glimmer call @bp 8391 has strength 17.74; Genemark calls start at 8391 /note=SSC: 8391-8504 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp09 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 6.33596E-16 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.057, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp09 [Arthrobacter phage Yang] ],,YP_009815627,100.0,6.33596E-16 SIF-HHPRED: SIF-Syn: NKF; the upstream gene is a head-to-tail adaptor and the downstream gene is a head-to-tail stopper, as in DrSierra. /note=Primary Annotator Name: Taylor, Amaya /note=Auto-annotation: Glimmer and GeneMark both call the start site at 8391. The start codon given by PECAAN is ATG. /note=Coding Potential: Coding potential is present only in the middle reading frame of the forward strand, confirming that this is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The final score is -2.276, and it is the best final score on PECAAN. /note=Gap/overlap: There is an 11 bp gap, which is conserved in other phages such as DrSierra. This reasonable gap supports 8391 as the start site. /note=Phamerator: Pham: 101644. Date: 03/31/22. The gene is conserved and found in Warda_9 and Kaylissa_9. /note=Starterator: Start site 1 in Starterator was manually annotated in 20/21 non-draft genes in this pham. This agrees with the start site predicted by Glimmer and GeneMark, as it corresponds to 8391 in TforTroy. 
/note=Location call: Based on the evidence above, this is a real gene and its likely start site is 8391. /note=Function call: Unknown; the PhagesDB BLAST hits are relatively weak (E-values such as 9e-15 and scores such as 77) and do not indicate a function for this gene. /note=Transmembrane domains: TMHMM predicts no TMHs, and TOPCONS also gives no TMD predictions, indicating that this protein is not a transmembrane protein. /note=Secondary Annotator Name: Sun, Xingzheng /note=Secondary Annotator QC: I agree with the location and functional call in this annotation. All of the evidence has been properly considered. (You may include other phages to compare in the synteny box.) CDS 8501 - 8851 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="TforTroy_10" /note=Original Glimmer call @bp 8501 has strength 14.18; Genemark calls start at 8501 /note=SSC: 8501-8851 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 4.09648E-74 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.068, -3.16089827114923, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Cassia]],,WGH21083,100.0,4.09648E-74 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,95.6897,99.6 SIF-Syn: The upstream gene is NKF (pham 101664) and the downstream gene is NKF (pham 102596). The next gene with a known function is gene 12 (pham 2023), as also observed in phage Niobe. This gene shows synteny with other phage genomes: it is consistently placed next to the same colored genes upstream and downstream, as seen in phages Tweety19 and YesChef. This conservation supports that it is a real gene. 
/note=Primary Annotator Name: Vazquez, Gilda /note=Auto-annotation: Glimmer and GeneMark were both used, and both call the same start site, 8501. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF. The coding potential is visible within the gene's coordinates in the second direct (forward) frame. /note=SD (Final) Score: The final score for the called start is -3.161 and the Z-score is 3.068. These are the best values observed for this gene. /note=Gap/overlap: There is a 4 bp upstream overlap. This is acceptable, as the overlap is well under 30 bp. /note=Phamerator: The gene (11) is found in pham 98989, as of 03/31/2022. It is conserved in other members of the AZ cluster. Other phages, such as JohnDoe and London, were used for comparison. Phamerator does not list a function call for this gene. /note=Starterator: Start 5, a reasonable start site, corresponds to 8501 in TforTroy. It was called 100% of the time when present: it is called in 27/32 non-draft genes and found in 38 of 46 genes in the pham. This is also the start site called by both GeneMark and Glimmer. /note=Location call: Based on the start site values and coding potential data above, this is likely a real, and possibly functional, gene. The start site does not need to be changed. /note=Function call: PhagesDB BLAST hits include many products with unknown function and with the function head-to-tail stopper. The best E-values for non-draft phages were 2e-48 and 4e-48, both for head-to-tail stopper, with 100% identity in the evidence hits. NCBI BLAST hits were also head-to-tail stoppers, with E-values such as 4e-58 and 1e-57. The highest identity was 81%, decreasing further down the list, and these hits had ~98% coverage. HHpred showed 99.64% probability for head completion protein and 99.58% for head-tail adaptor, with 95%+ coverage and E-values of >8.1e-14. CDD returned no hits. 
All data support the function head-to-tail stopper. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, showing that it is not a membrane protein. This is consistent with the function call of head-to-tail stopper. /note=Secondary Annotator Name: Torres Espinosa, Michael /note=Secondary Annotator QC: I have verified the evidence, and I agree with the primary annotator. Please select an option in the GM coding capacity drop-down box. CDS 8862 - 9164 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="TforTroy_11" /note=Original Glimmer call @bp 8862 has strength 7.28; Genemark calls start at 8862 /note=SSC: 8862-9164 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp11 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 9.27627E-60 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.4811409279978642, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp11 [Arthrobacter phage Yang] ],,YP_009815629,99.0,9.27627E-60 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is a head-to-tail stopper in pham 98989 and the downstream gene is a tail terminator in pham 2023, just like in phages London and Niobe. /note=Primary Annotator Name: Bartolome, Alexandra /note=Auto-annotation: Glimmer and GeneMark both call the same start site at 8862. It has a start codon of ATG. /note=Coding Potential: GeneMark predicts coding potential on the forward strand of the ORF, and the start site at 8862 includes it. /note=SD (Final) Score: The final score is -2.481 and the Z-score is 3.211, which are both the best scores. /note=Gap/overlap: There is a 10 bp gap, the smallest gap among the candidate start sites. The gene is 303 bp long, and this start gives the longest ORF. /note=Phamerator: As of 04/05/22, this gene is in pham 102596. This pham is conserved in phages Adolin, DrManhattan, and Iter, which are all in cluster AZ with TforTroy. 
/note=Starterator: Start site number 5 has the most manual annotations, with 23/27 non-draft genes in the pham calling it. It is at 8862 in TforTroy, as predicted by Glimmer and GeneMark. /note=Location call: This is likely a real gene with a start site at 8862, as predicted by Glimmer and GeneMark. This start includes all of the coding potential, is highly conserved in pham 102596, and is the most manually annotated start site. /note=Function call: NKF. The top five PhagesDB BLAST hits have low E-values from 1e-47 to 8e-43 and no known function. The top two NCBI BLAST hits have low E-values of 6.9e-60 and 1.7e-53 and no known function. HHpred and CDD were uninformative, as there were no significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so this is not a transmembrane protein. /note=Secondary Annotator Name: Torres Espinosa, Michael /note=Secondary Annotator QC: I have verified the evidence for this annotation, and I agree with the primary annotator. CDS 9164 - 9577 /gene="12" /product="gp12" /function="tail terminator" /locus tag="TforTroy_12" /note=Original Glimmer call @bp 9164 has strength 8.99; Genemark calls start at 9164 /note=SSC: 9164-9577 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 3.83175E-90 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.507, -3.430952136556091, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Yang] ],,YP_009815630,97.8102,3.83175E-90 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,94.1606,99.3 SIF-Syn: The downstream gene is a major tail protein and is part of pham 102101. 
Conservation of gene order is observed in comparison with Crewmate and Amyev. /note=Primary Annotator Name: Fields, Brooke /note=Auto-annotation: Both Glimmer and GeneMark call this gene with a start site at 9164. /note=Coding Potential: The GeneMark self map shows substantial coding potential within the ORF, and the start site covers all of the coding potential of the gene. The host GeneMark map also shows coding potential from start site 9164. /note=SD (Final) Score: This start site has the best final score, -3.431, with a Z-score of 2.507. /note=Gap/overlap: There is a 1 bp overlap, which appears reasonable. The gene is 414 bp long. /note=Phamerator: The gene is found in pham 2023 (3/31/22), which has 46 members. Clusters “AZ” and “EH” both appear throughout pham 2023. The pham database did not call a function for this gene. It was compared with phages Amyev and Crewmate, whose genes are both 414 bp long and function as tail terminators. /note=Starterator: The start site for this pham is start 2, which corresponds to 9164 in TforTroy. It is found in 37/46 genes in the pham and is called 100% of the time when present. Start 2 was manually annotated 25 times in total, 19 of them in cluster AZ. (Start: 2 @9164 has 25 MAs) /note=Location call: Based on the evidence, this is a real gene, and 9164 is the most likely start site. /note=Function call: The top three NCBI BLAST hits strongly support the function “tail terminator”. CDD found no conserved domain. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Pham, Britney /note=Secondary Annotator QC: I agree with the primary annotator and with the function call. I would go into more detail on the CDD and HHpred hits. Synteny has not been filled out yet. 
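The GAP values that PECAAN reports for consecutive genes can be rechecked in one pass, which would also catch slips like the 26 vs 29 bp correction noted in the gene 7 QC. A minimal sketch follows, under the same assumptions (forward strand, 1-based inclusive coordinates, negative gap = overlap). The start and stop coordinates are copied from the CDS lines for genes 5 to 13, and `REPORTED_GAPS` holds the GAP values from their SSC notes; the `gaps` helper is a hypothetical name for illustration.

```python
# (start, stop) for TforTroy genes 5-13, copied from the CDS lines in this record.
GENES = {
    5: (5889, 6248),
    6: (6367, 6927),
    7: (6957, 7907),
    8: (7981, 8379),
    9: (8391, 8504),
    10: (8501, 8851),
    11: (8862, 9164),
    12: (9164, 9577),
    13: (9590, 10141),
}

# GAP values reported by PECAAN for genes 6-13 (negative = overlap).
REPORTED_GAPS = {6: 118, 7: 29, 8: 73, 9: 11, 10: -4, 11: 10, 12: -1, 13: 12}


def gaps(genes: dict[int, tuple[int, int]]) -> dict[int, int]:
    """Gap between each gene's start and the previous gene's stop (forward strand)."""
    numbers = sorted(genes)
    return {
        cur: genes[cur][0] - genes[prev][1] - 1
        for prev, cur in zip(numbers, numbers[1:])
    }


computed = gaps(GENES)
for gene, gap in computed.items():
    status = "ok" if gap == REPORTED_GAPS[gene] else "MISMATCH"
    print(f"gene {gene}: {gap:+d} bp ({status})")
```

With these coordinates every gene prints `ok`, so the recorded gaps and overlaps (including the 29 bp gap before gene 7 and the 1 bp overlap before gene 12) follow directly from the CDS coordinates.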
CDS 9590 - 10141 /gene="13" /product="gp13" /function="major tail protein" /locus tag="TforTroy_13" /note=Original Glimmer call @bp 9590 has strength 15.86; Genemark calls start at 9590 /note=SSC: 9590-10141 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 8.71028E-124 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.222, -2.0720764396375664, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Yang] ],,YP_009815631,97.2678,8.71028E-124 SIF-HHPRED: Phage_TTP_1 ; Phage tail tube protein,,,PF04630.15,72.6776,97.4 SIF-Syn: 4/5/2022 Major tail protein, upstream gene is tail terminator, downstream is tail assembly chaperone, just like in phage Amyev and Yang. /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 9590 with start codon ATG. /note=Coding Potential: The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -2.072. It is the best final score on PECAAN. /note=Gap/overlap: 12 bp gap. This acceptable gap is also conserved in other phages. /note=Phamerator: Pham: 102101. Date: 4/5/2022. It is conserved and found in Amyev and Yang. /note=Starterator: Start site 50 in Starterator was manually annotated in 75/557 non-draft genes in this pham. Start 50 is 9590 in TforTroy. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the data presented above, this is a real gene with 9590 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits within this cluster are all major tail proteins (E-value 80%). All of the top 5 NCBI BLAST hits call a tail assembly chaperone (> 99% coverage, > 80% identity, e value < 2.06e-46), with the exception of one function call with significantly lower % identity. HHpred and CDD had no relevant hits. 
This protein is on the approved SEA-PHAGES function list. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I QC'd this gene and agree with the primary annotator. Don't forget to complete your synteny box; you can also check the evidence in HHpred. CDS join(10233..10496,10496..10837) /gene="15" /product="gp15" /locus tag="TforTroy_15" /note= /note=SSC: 10233-10837 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage Cassia]],,NCBI, q3:s2 99.005% 3.95641E-137 GAP: -270 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.222, -2.0720764396375664, yes F: SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Cassia]],,WGH21088,98.0,3.95641E-137 SIF-HHPRED: SIF-Syn: CDS 10854 - 13178 /gene="16" /product="gp16" /function="tape measure protein" /locus tag="TforTroy_16" /note=Original Glimmer call @bp 10854 has strength 9.85; Genemark calls start at 10854 /note=SSC: 10854-13178 CP: yes SCS: both ST: SS BLAST-Start: [tail length tape measure protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.211, -2.86135216970947, yes F: tape measure protein SIF-BLAST: ,,[tail length tape measure protein [Arthrobacter phage Yang] ],,YP_009815634,95.2196,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,21.447,99.9 SIF-Syn: Tape measure protein, upstream gene is tail assembly chaperone and downstream gene is minor tail protein. Phages Adolin and Adumb2043 show this order of gene function as well. /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the start site at 10854 bp. The start codon is GTG. 
/note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site (10854) includes all of the coding potential, which is found in both self-trained and host-trained GeneMark in the forward frame. /note=SD (Final) Score: The SD score for this gene is -2.861, which is the highest among all of the start sites. The Z-value for this gene is 3.211, which is also the highest among all of the start sites. /note=Gap/overlap: There is a 16 bp gap between this gene and the upstream gene, which is the smallest gap among the candidate start sites. /note=Phamerator: The pham number as of March 31, 2022 is 101437. The gene is conserved in 205 phages, which are divided into 3 clusters (AZ, AK, and EE). /note=Starterator: The pham number as of March 31, 2022 is 101437. The start site called most often in published annotations is start 3, which was called in 72 of the 168 non-draft genes in the pham. TforTroy has the most annotated start but does not call it. The start site number for TforTroy is 9, at 10854, and 25 MAs chose this start site. /note=Location call: Considering all of the evidence above, this is a real gene, and the start site is likely at 10854 because this start site has the highest SD score and Z-value. Moreover, this start site is the one called by both Glimmer and GeneMark, and it contains all of the coding potential. Furthermore, other researchers also chose this start site over the other potential candidates. /note=Function call: The function for this gene is likely tape measure protein because the PhagesDB Function Frequency shows a 71% frequency of tape measure protein as the function for cluster AK and a 29% frequency for cluster AZ. Moreover, finalized cluster AZ phages such as Yang and Phives also indicate the function to be tape measure protein, with e-values of 0. 
Significant hits from HHpred also support this call: the hits have a high probability (~99%) and low e-values (~e-10). Moreover, significant hits from NCBI BLAST suggest that this gene is a tape measure protein, with very high coverage (~100%), high identity (85%), and e-values of 0. The significant hit from CDD does not meet the acceptable coverage (its coverage is only 9.68992%), but its e-value is very low (~e-15), and CDD also suggests the function tape measure protein. Combining all of the evidence above, the function of this gene is convincingly tape measure protein. /note=Transmembrane domains: Yes. TmHmm predicts 4 TMHs, which implies that this protein is likely a transmembrane protein. TOPCONS also predicts TMHs, supporting this call. /note=Secondary Annotator Name: Mendoza, Alleana /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 13171 - 14046 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="TforTroy_17" /note=Original Glimmer call @bp 13171 has strength 8.45; Genemark calls start at 13171 /note=SSC: 13171-14046 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.824, -2.7653702713535186, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London] ],,QOP64320,96.2199,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BD,97.2509,100.0 SIF-Syn: Synteny maintained with phages in cluster AZ. As in phage Yang, the upstream gene is a tape measure protein and the downstream gene is a minor tail protein. /note=Primary Annotator Name: Saha, Atul /note=Auto-annotation: Glimmer and GeneMark call the start at 13,171 bp. The start codon is ATG. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The 13,171 start site has the best Z-score (2.824) and the highest final score (-2.765). /note=Gap/overlap: Start @ 13,171 has an overlap of 8 bases, which is a large overlap. However, this overlap is conserved in other subcluster AZ phages (Powerpuff, Lego, Adumb2043). /note=Phamerator: pham: 101622. Date 3.31.22. It is conserved; found in Powerpuff, Lego, Adumb2043 (all subcluster AZ). /note=Starterator: Pham 101622 has 46 members, 11 of which are drafts. Start site 27 in Starterator was manually annotated in 18 non-draft genes for subcluster AZ phages in this pham. Start 27 is 13171 in TforTroy. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 13171 bp. Starterator agrees with Glimmer and GeneMark. Full coding potential is included in this reading frame. /note=Function call: Minor tail protein. The top three PhagesDB BLAST hits have the function minor tail protein (E-value = 1e-52), and the top 8 NCBI BLAST hits also have the function minor tail protein (100% coverage, 85%+ identity, and E-value listed on PECAAN as 0). CDD returned one hit with the function phage tail protein and an E-value of 1.6e-3. The low coverage and relatively high E-value of the CDD hit indicate that the function cannot be determined from domain structure alone. HHpred had hits for distal tail protein with 100% probability, 97%+ coverage, and E-value < 4.2e-26. /note=Transmembrane domains: TMHMM and TOPCONS do not predict a transmembrane domain. NOT a membrane protein. /note=Secondary Annotator Name: Rodriguez, Sean /note=Secondary Annotator QC: I agree with this annotation and functional call. Mention whether the start site covers all of the coding potential. 
Mention the length of the ORF with your selected start site and whether this is reasonable. Mention whether the phams database called a function for this gene and whether this function is on the approved list (I assume that it is). Mention how many non-draft genes there are in total for context. Since all of the function calls are for minor tail protein, only check 2-3 boxes each for PhagesDB and NCBI BLAST. Fill out the synteny box. CDS 14059 - 15054 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="TforTroy_18" /note=Original Glimmer call @bp 14059 has strength 11.11; Genemark calls start at 14059 /note=SSC: 14059-15054 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.222, -2.0720764396375664, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Yang] ],,YP_009815636,94.8641,0.0 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,54.9849,99.4 SIF-Syn: Minor tail protein. Both the upstream and downstream genes are minor tail proteins as well, just like in phages Amyev and Asa16. /note=Primary Annotator Name: Sun, Xingzheng /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 14059. /note=Coding Potential: The ORF has strong coding potential, and the chosen start covers all of the coding potential. It is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is the best, at -2.072, and the Z-score is the highest, at 3.222. /note=Gap/overlap: The gap of 12 bp is reasonable. /note=Phamerator: The Pham number is 15182 as of 04/01/22. The gene is conserved in phages Adumb2043 and Amyev, which are in the same cluster, AZ. 
/note=Starterator: Start site 4 was called in 20 of 20 non-draft genes; it corresponds to the start at 14059 in TforTroy, which GeneMark also calls. /note=Location call: Based on the evidence above, this is a real gene starting at 14059. /note=Function call: Minor tail protein. Three of the top four PhagesDB BLAST hits suggest the function minor tail protein, with e-values <1e-172. HHpred had a hit for a phage receptor binding protein with 99.4% probability, 54.9849% coverage, and an E-value of 3.8e-11. NCBI BLAST returned minor tail protein hits with 91.5408% identity, 94.8641% aligned, and 100% coverage. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS, suggesting that this is not a membrane protein. /note=Secondary Annotator Name: West, Julie /note=Secondary Annotator QC: I have reviewed the primary annotator's calls and evidence and agree with their annotations. CDS 15054 - 16175 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="TforTroy_19" /note=Original Glimmer call @bp 15054 has strength 10.73; Genemark calls start at 15054 /note=SSC: 15054-16175 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.317, -3.9718914447963414, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London] ],,QOP64322,95.9786,0.0 SIF-HHPRED: 43 kDa tail protein; BACTERIOPHAGE MU, BASEPLATE, GENE PRODUCT 44, STRUCTURAL PROTEIN; 2.1A {Enterobacteria phage Mu} SCOP: b.106.1.1,,,1WRU_A,98.3914,99.6 SIF-Syn: This gene exhibits synteny with other phages in the same cluster. This gene is number 19 in the TforTroy genome, belongs to Pham 99540, and has the function minor tail protein. The gene upstream of it belongs to Pham 15182 and also has the function minor tail protein. The gene downstream of it belongs to Pham 96077 and has the function minor tail protein. 
Other phages in the same AZ cluster, such as Adumb2043 and Amyev, have genes 18, 19, and 20 in the same order, belonging to the same phams and with the same functions as genes 18, 19, and 20 in TforTroy. Consequently, this gene shows synteny with other phage genomes in the same cluster. /note=Primary Annotator Name: Torres Espinosa, Michael /note=Auto-annotation: Both Glimmer and GeneMark auto-annotate this gene, and both call the start position at 15054. The start codon is ATG. /note=Coding Potential: This gene shows reasonable coding potential. GeneMark host and GeneMark self both show strong coding potential between the called start and stop sites. /note=SD (Final) Score: The RBS final score is -3.972, which is the best final score on PECAAN. This start also has the best Z-score, 2.317. /note=Gap/overlap: There is a 1 bp overlap with the upstream gene. This 1 bp overlap suggests that the gene is part of an operon with the upstream gene. /note=Phamerator: As of April 5, 2022, the pham for this gene is 99540. This gene is also conserved in phages Adumb2043, Amyev, and Asa16, each of which is part of cluster AZ. /note=Starterator: There are 86 non-draft annotations for this pham. 20/86 non-draft annotations call start site 36, which corresponds to base pair coordinate 15054 in TforTroy. This is the start site called by GeneMark and Glimmer, and it is the second most common start site called by non-draft annotations of this pham. /note=Location call: The evidence above suggests this is a real gene with a start site at bp coordinate 15054. Starterator agrees with GeneMark and Glimmer. /note=Function call: The function of this gene is minor tail protein. Of the top 21 non-draft hits in PhagesDB BLAST, which spanned E-values from 0 to 2e-55, all had the function minor tail protein. The top 11 hits in NCBI BLAST have the function minor tail protein and e-values of 0.0. 
These e-values were reported as 0.0 because they were very small; they were all smaller than 4e-176, the e-value of hit number 12. Coverage was 100%, and percent identity ranged from 89.81% to 91.69%. CDD did not provide any hits, so it was not informative. HHpred was informative: it had a hit for a bacteriophage tail protein with an E-value of 2.3e-12, 99.65% probability, and 98.4% coverage. /note=Transmembrane domains: Neither TMHMM nor TOPCONS calls any TMDs for this protein. This protein is not a membrane protein. /note=Secondary Annotator Name: Chang, Julia /note=Secondary Annotator QC: I agree with all of the primary annotator's comments, but don't forget to fill out the drop-down menu for "All GM Coding Capacity". Also, for the gap/overlap section, did you mean to say there is a 1 bp overlap upstream? A negative number usually indicates an overlap, while a positive number indicates a gap. CDS 16181 - 19015 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="TforTroy_20" /note=Original Glimmer call @bp 16181 has strength 6.87; Genemark calls start at 16181 /note=SSC: 16181-19015 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Amyev] ],,NCBI, q16:s16 98.411% 0.0 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.672, -3.7378191248686026, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Amyev] ],,YP_010677723,82.5593,0.0 SIF-HHPRED: SIF-Syn: minor tail protein, upstream gene function is minor tail protein and the downstream gene is NKF. Because the genes and their functions are conserved in Adumb, this demonstrates synteny and suggests that this is a conserved gene. /note=Primary Annotator Name: Pham, Britney /note=Auto-annotation: Glimmer and GeneMark both call the start site at 16181, with start codon ATG. /note=Coding Potential: GeneMark Self and GeneMark Host show that the gene has good coding potential in the second open reading frame. 
There is high typical coding potential, indicating that this is likely a real gene, although there is little coding potential toward the end, near the stop site. /note=SD (Final) Score: The gene has a final score of -3.738, which is the highest (least negative) final score of all the possible start sites, and a Z-score of 2.672, which is also the highest of all the possible start sites. /note=Gap/overlap: There is a gap of 5 bp, which is too small for another gene to be added upstream. Because there is no overlap, there is no evidence from the gap that the genes are part of an operon. This gap is acceptable because it is below the 50 bp gap threshold. /note=Phamerator: The gene is found in pham 96077 as of 4/5/2022. It appears to be conserved in, and found only in, cluster AZ. In comparisons with other AZ phages, such as Amyev and Elezi, the gene length also appears reasonably conserved, with similar called functions. /note=Starterator: This is a reasonable start site choice. This start site corresponds to Start 4, the start called most often. Start 4 is found in 28 of the 29 genes in the pham and received 19/19 of the manual annotations, being called 100% of the time when present. /note=Location call: The start site is likely 16181, with a stop site of 19015. The gene is most likely real, and there is no need to add another gene in PECAAN. /note=Function call: Minor tail protein. From both NCBI and PhagesDB, the function was determined through strong PhagesDB hits to Amyev_20 (e-value 0, cluster AZ) and Asa16 (e-value 0, cluster AZ). On NCBI, there were hits to Niobe (e-value 0) and Eraser (e-value 0) with the same function, minor tail protein. The function frequency table shows that the main function called is minor tail protein, meaning other manual annotators agreed on the function. CDD had 0 hits, and HHpred had 74 hits calling tail spike protein with high e-values. 
Overall, the function is called as minor tail protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Hatashita, Anthony /note=Secondary Annotator QC: It would be good to state whether the length of the gene is acceptable in the gap/overlap section. There also seem to be a few typos in the Function call section, and the synteny box is not quite complete. Other than that, I agree with this annotation. CDS 19024 - 19365 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="TforTroy_21" /note=Original Glimmer call @bp 19024 has strength 7.28; Genemark calls start at 19024 /note=SSC: 19024-19365 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_POWERPUFF_21 [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 97.3451% 1.95245E-53 GAP: 8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.222, -2.45827804503836, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_POWERPUFF_21 [Arthrobacter phage Powerpuff] ],,QGZ17319,86.7257,1.95245E-53 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Scavetti, Alexa /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 19024 with start codon ATG. /note=Coding Potential: Yes. Both Self- and Host-Trained GeneMark show coding potential in the forward direction in the region from 19024 bp to 19365 bp. The chosen start site of 19024 covers all of the coding potential for this region. /note=SD (Final) Score: -2.458 with a Z-value of 3.222. This is the best possible SD score and Z-value according to PECAAN. /note=Gap/overlap: There is a reasonable 8 bp gap between this gene and the upstream gene, creating the longest possible ORF without excessive overlap. This gene is also a reasonable length. /note=Phamerator: Pham number 55453 as of 4/5/2022. Conserved in other AZ cluster phages, including Eraser and Powerpuff. 
No function has been called, which is consistent across other phages containing this pham. /note=Starterator: Start site 11 is conserved among 15/23 non-draft genomes in the pham. This start site corresponds to position 19024 in TforTroy and was called by both Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 19024. Starterator agrees with Glimmer and GeneMark, and this site has the best SD and Z-score. /note=Function call: The top two non-draft hits on both PhagesDB and NCBI BLASTp, sorted by e-value, suggest the function is unknown, with high query coverage (>97%), high % identity (>73%), and low e-values (<2e-43). HHpred and CDD did not return any useful hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so it is likely not a membrane protein. /note=Secondary Annotator Name: Mao, Xuanting /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. (Don't forget to complete your synteny box!) CDS 19431 - 19886 /gene="22" /product="gp22" /function="membrane protein" /locus tag="TforTroy_22" /note=Genemark calls start at 19431 /note=SSC: 19431-19886 CP: yes SCS: genemark ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Tweety19] ],,NCBI, q1:s1 93.3775% 7.59156E-59 GAP: 65 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.703, -3.6716023243868, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Tweety19] ],,YP_010678412,72.9032,7.59156E-59 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Taylor, Amaya /note=Auto-annotation: Glimmer did not call a start site for this gene. GeneMark called the start site at 19431. /note=Coding Potential: Coding potential was found in both GeneMark Self and GeneMark Host. Coding potential is only present in the bottom ORF of the forward strand, confirming that this is a forward gene. /note=SD (Final) Score: The final score is -3.672 and is the best final score on PECAAN. 
/note=Gap/overlap: The gap is 65 bp and is conserved in other phages such as Asa16. This is a reasonable gap, supporting 19431 as the start site. /note=Phamerator: pham: 55522. Date: 04/05/2022. The start site is conserved and found in Asa16. /note=Starterator: Start site 10 in Starterator was manually annotated in 11/24 non-draft genes in this pham. This evidence agrees with the start site predicted by GeneMark Host and Self. /note=Location call: Based on the evidence above, this is a real gene and its likely start site is 19431. /note=Function call: unknown. The top PhagesDB BLAST hit (e-value 2e-57, score 219) has no known function, and HHpred also describes the gene as having unknown function. /note=Transmembrane domains: NCBI BLAST describes the gene as a membrane protein with 98% coverage, and TmHmm predicts 4 TMHs, indicating that this protein is likely a transmembrane protein. TOPCONS does not provide any data supporting that this protein is a transmembrane protein, but there is sufficient evidence to conclude that it is. /note=Secondary Annotator Name: Saha, Atul /note=Secondary Annotator QC: Looks all correct. All relevant evidence is included. CDS 19887 - 20318 /gene="23" /product="gp23" /function="membrane protein" /locus tag="TforTroy_23" /note=Genemark calls start at 19887 /note=SSC: 19887-20318 CP: yes SCS: genemark ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 1.04223E-79 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.393, -3.7329837339109084, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Elezi] ],,YP_010678001,94.4056,1.04223E-79 SIF-HHPRED: SIF-Syn: /note=DeepTMHMM: 1 TMD -AF /note=Primary Annotator Name: Vazquez, Gilda /note=Auto-annotation: Glimmer did not call a start site, and GeneMark called a start site at 19887. 
/note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. Coding potential is visible within the gene's coordinates in the third forward reading frame, and it is higher after 19900. /note=SD (Final) Score: The final score for the called gene is -3.733, with a Z-score of 2.393. This is the best value observed for this gene. Other start sites listed had higher final scores but low Z-scores, such as 0.739. /note=Gap/overlap: There is a 0 bp overlap with the upstream gene. This is acceptable because the gene does not overlap the upstream gene by more than 30 bp. This also supports that no additional gene needs to be added and that the originally called start site can remain unchanged. /note=Phamerator: The gene (25) is found in pham 21631, as of 04/05/2022. It is conserved in some members of the AZ cluster, such as phages Elezi and Eraser. The function call of this gene is not listed. /note=Starterator: The most called start site is number 6, at 19887, called by 8/13 non-draft genes. It was found in 13/20 genes in the pham and was called 84.6% of the time. This is also the start site called by GeneMark. /note=Location call: This is likely a real gene based on the start site, gap, Z-score, final score, and coding potential data. Thus, it may be a functional gene, and the start site does not need to be changed. /note=Function call: PhagesDB BLAST hits were mainly products with unknown function. The best e-values for non-draft phages were 2e-69 (function unknown, 89% identity) and 3e-69 (function unknown, 88% identity). NCBI BLAST hits were listed as membrane protein, with e-values such as 5e-60 and 7e-60. The highest percent identity was 89.51%, but there is more evidence suggesting unknown function. Since TmHmm predicts 1 TMH and TOPCONS displays a transmembrane domain in all of its programs, there is evidence to support a membrane protein. 
HHpred's top hit, a protein of unknown function, had a probability of 98.1%, and there were no further significant hits. CDD did not display any significant hits. /note=Transmembrane domains: TmHmm predicts only 1 TMH, and TOPCONS also predicts a transmembrane domain. Together, these show that it is a membrane protein. /note=Secondary Annotator Name: Sun, Xingzheng /note=Secondary Annotator QC: I agree with the location call in this annotation. All of the evidence has been properly considered. However, it might be a transmembrane protein because it seems that both TmHmm and TOPCONS gave predictions of transmembrane domains. CDS 20302 - 20556 /gene="24" /product="gp24" /function="membrane protein" /locus tag="TforTroy_24" /note=Original Glimmer call @bp 20302 has strength 13.91; Genemark calls start at 20326 /note=SSC: 20302-20556 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein HOU52_gp23 [Arthrobacter phage Yang] ],,NCBI, q2:s1 98.8095% 3.77507E-42 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.114, -2.297359520375101, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein HOU52_gp23 [Arthrobacter phage Yang] ],,YP_009815641,97.5904,3.77507E-42 SIF-HHPRED: SIF-Syn: membrane protein, the upstream gene is in pham 21631, while the downstream gene is deoxynucleoside monophosphate kinase, just like in phage DrSierra /note=Primary Annotator Name: Bartolome, Alexandra /note=Auto-annotation: Glimmer predicts a start site of 20302 with a start codon of ATG, while GeneMark predicts a start site of 20326 with a start codon of GTG. /note=Coding Potential: The predicted start site 20302 includes all coding potential predicted by GeneMark in the forward strand of the ORF. /note=SD (Final) Score: The start site 20302 has a final score of -2.297 and a Z-score of 3.114. These are the best scores. /note=Gap/overlap: There is a 17 bp overlap (GAP: -17 bp), but it is reasonable because the overlap is conserved in other phages such as Elezi and London. 
With the 20302 start, this gene is 255 bp long, which gives the longest ORF. /note=Phamerator: As of 04/05/22, this gene is in pham 10993. This pham is conserved in phages such as Adolin, Elezi, and London, which are all in cluster AZ with TforTroy. /note=Starterator: Start site number 6 has the most manual annotations, called in 18/25 non-draft genes in pham 10993. It is at 20302, as predicted by Glimmer. /note=Location call: This is a real gene with a start site most likely at 20302, since it was the most manually annotated start site and includes all coding potential. /note=Function call: membrane protein. PhagesDB BLAST, HHpred, and CDD hits were not informative and showed no known function for this gene. However, one of the top three NCBI BLAST hits, with a low e-value of 3.15e-35, has the function membrane protein. /note=Transmembrane domains: TMHMM predicts two TMDs, while TOPCONS does not predict any. Based on the TMHMM prediction, this gene is called a membrane protein. /note=Secondary Annotator Name: Sun, Xingzheng /note=Secondary Annotator QC: I agree with the location and functional call in this annotation. All of the evidence has been properly considered. (I do think it is a membrane protein as well, but I believe TOPCONS did predict some results.) CDS 20704 - 21321 /gene="25" /product="gp25" /function="deoxynucleoside monophosphate kinase" /locus tag="TforTroy_25" /note=Original Glimmer call @bp 20704 has strength 13.57; Genemark calls start at 20704 /note=SSC: 20704-21321 CP: yes SCS: both ST: SS BLAST-Start: [adenylate kinase [Arthrobacter phage Yang] ],,NCBI, q3:s2 99.0244% 2.52195E-113 GAP: 147 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.978, -2.442961286954254, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[adenylate kinase [Arthrobacter phage Yang] ],,YP_009815642,89.2157,2.52195E-113 SIF-HHPRED: c.37.1.1 (A:) Deoxynucleoside monophosphate kinase {Bacteriophage T4 [TaxId: 10665]},,,d1deka_,92.6829,99.9 SIF-Syn: /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start site at 20704 with start codon TTG. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.443; Z-score: 2.978. Both are the best final score and Z-score on PECAAN. /note=Gap/overlap: 147 bp gap. This is a large gap. However, there is no coding potential in the gap, so I do not believe a gene should be added upstream to fill it. /note=Phamerator: Pham: 54294. Date: 4/5/2022. It is conserved and found in Amyev and Yang. /note=Starterator: Start site 39 in Starterator was manually annotated in 30/155 non-draft genes in this pham. Start 39 is 20704 in TforTroy. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the data presented above, this is a real gene with 20704 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits within this cluster are all deoxynucleoside monophosphate kinases (E-value 99% coverage, > 72% identity, e value < 2.61e-93). HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Scavetti, Alexa /note=Secondary Annotator QC: I have QC'd this annotation and I agree with the primary annotator. Don't forget to fill out your synteny box! 
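Several calls above pick the start with the best (least negative) final score and, where available, the highest Z-score. The comparison can be sketched in Python as below, under simplifying assumptions: only the gp25 values (-2.443 and 2.978) come from this record, the two alternative candidates are hypothetical, and the `StartCandidate` type and `best_start` function are illustrative names. Annotators also weigh coding potential, gap/overlap, and Starterator, so this ranking is not PECAAN's own method.

```python
from typing import NamedTuple


class StartCandidate(NamedTuple):
    """One candidate start site (hypothetical structure for illustration)."""
    position: int       # start coordinate on the forward strand
    final_score: float  # SD final score; less negative is better
    z_score: float      # higher is better


def best_start(candidates: list[StartCandidate]) -> StartCandidate:
    # Rank by final score first, then use the Z-score to break ties.
    return max(candidates, key=lambda c: (c.final_score, c.z_score))


candidates = [
    StartCandidate(20704, -2.443, 2.978),  # gp25 values reported above
    StartCandidate(20731, -4.912, 1.204),  # hypothetical alternative
    StartCandidate(20776, -5.380, 1.050),  # hypothetical alternative
]

chosen = best_start(candidates)
print(chosen.position)  # 20704
```

Ranking on a (final score, Z-score) tuple keeps the rule explicit; a record where the two scores disagree, as in some notes above, would still need the annotator's judgment.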
CDS 22221 - 23054 /gene="27" /product="gp27" /function="exonuclease" /locus tag="TforTroy_27" /note=Original Glimmer call @bp 22221 has strength 10.78; Genemark calls start at 22221 /note=SSC: 22221-23054 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Yang] ],,NCBI, q1:s1 99.639% 0.0 GAP: 205 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.211, -2.606079664606164, yes F: exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Yang] ],,YP_009815644,99.2806,0.0 SIF-HHPRED: DNA replication ATP-dependent helicase/nuclease DNA2; DNA binding protein, Hydrolase-DNA complex; HET: ADP; 2.36A {Mus musculus},,,5EAN_A,83.0325,98.8 SIF-Syn: /note=AF: called as exonuclease per harmonization. /note=Primary Annotator Name: West, Julie /note=Auto-annotation: GeneMark and Glimmer call the start @22221. /note=Coding Potential: There is strong coding potential in GeneMark Self and Host. The gene shows synteny with phages Crewmate and Iter. /note=SD (Final) Score: The start @22221 has the least negative final score (-2.606) in PECAAN. The Z-score is 3.211, which is acceptable. /note=Gap/overlap: There is a 205 bp gap upstream, which is significant, but it is the smallest gap among the longer, more reasonable ORFs. The gap appears to be conserved in other AZ genomes, such as phage Crewmate. /note=Phamerator: Pham 98715 has 102 non-draft genomes as of 4/5/2022. /note=Starterator: Starterator predicts start 37 @22221, which is the most annotated start in the pham, with 30 manual annotations. This analysis was run on 4/1/2022. /note=Location call: Given that start @22221 has a GTG start codon, together with the evidence above, this is the most probable start site. /note=Function call: RecB-like exonuclease. NCBI BLASTp had several hits with an e-value of 0 and 93% coverage for this function. There are hits for “RepA like helicase” as well as exonucleases. 
Per the SEA-PHAGES approved function list, if both domains are present, the RecB-like exonuclease function should be used. PhagesDB BLASTn gave several exonuclease hits with e-values of ~10e-159. HHpred and CDD did not provide informative hits. /note=Transmembrane domains: TOPCONS and TMHMM do not predict transmembrane domains. /note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I QC'd this gene and agree with the primary annotator. CDS 23051 - 23188 /gene="28" /product="gp28" /function="hypothetical protein" /locus tag="TforTroy_28" /note=Original Glimmer call @bp 23051 has strength 8.74; Genemark calls start at 23051 /note=SSC: 23051-23188 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ICEWARRIOR_67 [Streptomyces phage IceWarrior] ],,NCBI, q6:s2 82.2222% 1.56762E-6 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.806, -3.15492870275902, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ICEWARRIOR_67 [Streptomyces phage IceWarrior] ],,QAY16279,56.8627,1.56762E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Julia /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 23051 bp with codon ATG. /note=Coding Potential: There is high coding potential in both GeneMark Self and GeneMark Host. Coding potential is on the forward strand only, suggesting that this is a forward gene. This high coding potential suggests that this is a real gene. /note=SD (Final) Score: The final score is best at -3.155 and the Z-score is best at 2.806; both meet the thresholds for reasonable final scores and Z-scores of a real gene. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene (stop @23054, forward strand). The overlap is conserved in other non-draft phage genomes such as Crewmate and DrSierra. /note=Phamerator: The Pham number as of 04/01/2022 is 48058. This gene is also conserved in phages Crewmate and Iter, which are in the same subcluster AZ. 
/note=Starterator: Start site 12, at position 23051 bp in TforTroy, is conserved among other members of this pham, such as FidgetOrca and Crewmate. 15 of the 42 non-draft phage annotations called this start site, along with 2 manual annotations. /note=Location call: Based on the gathered evidence, this is a real gene, because the start covers all of the coding potential and is conserved in both Phamerator and Starterator. /note=Function call: NKF. The top two hits for both PhagesDB BLAST and NCBI BLAST had small e-values ranging from 1e-19 to 10e-6, but all hits had unknown functions. CDD and HHpred had no significant hits. /note=Transmembrane domains: Since neither TMHMM nor TOPCONS predicted any TMDs, it can be assumed that this gene is not a membrane protein. /note=Secondary Annotator Name: Vazquez, Gilda /note=Secondary Annotator QC: I have QC'd this annotation, and I agree with the primary annotator. Given current PECAAN notes and evidence for this gene, I agree with the location call for this gene. CDS 23185 - 23553 /gene="29" /product="gp29" /function="nucleoside deoxyribosyltransferase" /locus tag="TforTroy_29" /note=Original Glimmer call @bp 23185 has strength 7.72; Genemark calls start at 23209 /note=SSC: 23185-23553 CP: yes SCS: both-gl ST: SS BLAST-Start: [MazG-like pyrophosphatase [Arthrobacter phage Amyev] ],,NCBI, q1:s1 97.541% 3.66222E-72 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.461, -6.154904741023137, no F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[MazG-like pyrophosphatase [Arthrobacter phage Amyev] ],,YP_010677731,93.4426,3.66222E-72 SIF-HHPRED: c.23.14.1 (A:9-160) Nucleoside 2-deoxyribosyltransferase {Trypanosome (Trypanosoma brucei) [TaxId: 5691]},,,d2f62a2,88.5246,99.7 SIF-Syn: /note=Primary Annotator Name: Hatashita, Anthony /note=Auto-annotation: Both Glimmer and GeneMark predict start sites, but they disagree: Glimmer predicts start site 23185, while GeneMark predicts start site 23209. 
/note=Coding Potential: There is coding potential for both predicted start sites in both self- and host-trained GeneMark; however, coding potential is stronger for start site 23185, predicted by Glimmer. /note=SD (Final) Score: The Z-score of 1.461 is fairly weak, and the final score of -6.155 is very low. These are not the best Z-score and final score; however, this start site aligns most closely with the coding potential shown by self- and host-trained GeneMark. /note=Gap/overlap: A gap of -4 bp (a 4 bp overlap) is reasonable and may indicate that this gene is part of an operon. Start site 23185, with its -4 bp gap, is stronger than other potential start sites with longer ORFs, because their gaps are unreasonably large or small. The length of the gene is acceptable given this start site. /note=Phamerator: The gene is found in pham 67497 as of 3/31/2022. The gene is conserved in other members of the subcluster, such as Adumb2043 and Amyev. There were no functions called for this gene. /note=Starterator: A reasonable start site is conserved among members of the pham: start site 32, at base pair 23185 in TforTroy. Start site 32 is found in 30 of 44 genes in the pham and is called 96.7% of the time when present. /note=Location call: Altogether, the gathered evidence suggests that this is a real gene, as it has good coding potential, is conserved in Phamerator, and shows synteny with other genes in the subcluster. Start site 23185 is the best option because it is conserved in Starterator and covers all of the coding potential. /note=Function call: The predicted function for the gene is nucleoside deoxyribosyltransferase. The top Phagesdb BLAST results sorted by E-value suggest the function nucleoside deoxyribosyltransferase, with E-values of 6e-64 and 6e-62 for phages Crewmate and Amyev. 
For HHpred, of the top hits sorted by E-value, the only one with a known function is nucleoside 2-deoxyribosyltransferase, with 99.7% probability, 88.5% coverage, and an E-value of 1.1e-15. /note=Transmembrane domains: No transmembrane domains are predicted for this gene, so it does not code for a membrane protein. /note=Secondary Annotator Name: Bartolome, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 23550 - 23954 /gene="30" /product="gp30" /function="LAGLIDADG endonuclease" /locus tag="TforTroy_30" /note=Original Glimmer call @bp 23550 has strength 9.82; Genemark calls start at 23550 /note=SSC: 23550-23954 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 3.09428E-90 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.498, -3.513874779476501, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Yang] ],,YP_009815647,99.2537,3.09428E-90 SIF-HHPRED: LAGLIDADG_1_child_1_child_1 LAGLIDADG endonuclease. LAGLIDADG endonuclease,,,cd09953,68.6567,99.5 SIF-Syn: LAGLIDADG endonuclease; the upstream gene is a nucleoside deoxyribosyltransferase and the downstream gene is a recombination directionality factor. Phages Tbone and Elezi show this order of gene functions as well. /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Both Glimmer and GeneMark called the same start site, 23550. The start codon for this start site is ATG. /note=Coding Potential: The ORF has reasonable coding potential in both self-trained and host-trained GeneMark in the forward frame, and the chosen start site (23550) includes all of it. /note=SD (Final) Score: The final score of -3.514 is the highest among the potential start sites. The Z-score of 2.498 is also the highest. 
/note=Gap/overlap: There is a 4 bp overlap with the upstream gene's stop codon, so this gene is likely part of an operon. This is also supported by the overlap sequence ATGA, in which the ATG start codon overlaps the TGA stop codon. /note=Phamerator: The Pham number as of 4/1/2022 is 100678. The gene is conserved in 55 phages, most of which belong to cluster AZ. /note=Starterator: The start site called most often in published annotations is start 18, called in 26 of the 40 non-draft genes in the pham (26 MAs). In TforTroy, start 18 is at 23550, which is also the start site called by Glimmer and GeneMark. /note=Location call: The start site for this gene is likely 23550 because it was called by Glimmer and GeneMark. It includes all of the coding potential and has the best final score and Z-score. Phamerator and Starterator also provide evidence supporting this start site. /note=Function call: The function for this gene is LAGLIDADG endonuclease, because most of the NCBI hits call it LAGLIDADG endonuclease with high coverage (~100%), high % identity (~100%), and low e-values (~e-91). In PhagesDB, LAGLIDADG endonuclease has a 28% function frequency for this pham. CDD places this gene in the LAGLIDADG-like domain family. HHpred also supports LAGLIDADG endonuclease: the significant hits have high probability (~100%), coverage of ~68%, and low e-values (~e-13). Taken together, these searches support calling this ORF a LAGLIDADG endonuclease. /note=Transmembrane domains: None. Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Fields, Brooke /note=Secondary Annotator QC: I agree with the primary annotator. Great notes! 
CDS 24076 - 24789 /gene="31" /product="gp31" /function="recombination directionality factor" /locus tag="TforTroy_31" /note=Original Glimmer call @bp 24076 has strength 12.77; Genemark calls start at 24076 /note=SSC: 24076-24789 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Lego]],,NCBI, q1:s1 100.0% 1.52819E-159 GAP: 121 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.202, -4.423424636048095, no F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Lego]],,QIN94429,95.7806,1.52819E-159 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,88.1857,100.0 SIF-Syn: /note=Primary Annotator Name: Saha, Atul /note=Auto-annotation: Glimmer and GeneMark call the start at 24,076. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. GM Self and Host indicate some coding potential after the stop site, suggesting that this gene could be part of an operon. /note=SD (Final) Score: The 24,076 start site has the best Z-score (2.202) and the second-highest final score (-4.423). Start @24712 has a higher final score (-4.190), but its 757 bp gap is unrealistic (such a gap is also not found in other phages in the same cluster). /note=Gap/overlap: Start @24,076 has a gap of 121 bases, which is large but the smallest of the potential start sites. This gap is conserved in other subcluster AZ phages (DrSierra, Lego, Powerpuff). There is no coding potential in the gap. This indicates that this gene could be the first gene in an operon. /note=Phamerator: pham: 4822. Date 4.5.22. It is conserved; found in Powerpuff, Lego, DrSierra (all subcluster AZ). /note=Starterator: Pham 101622 has 111 members, 24 of which are drafts. Start site 24 in Starterator was manually annotated in 43 of the 87 non-draft genes in this pham. 
Start 24 is 24076 in TforTroy. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene with a start site at 24076 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Recombination Directionality Factor. Per the PhagesDB Function Frequency, members of pham 4822 are recombination directionality factors. The top three phagesdb BLAST hits with assigned functions have the function recombination directionality factor (E-value = 1e-129), and the top 8 NCBI BLAST hits also have the function recombination directionality factor (100% coverage, 94%+ identity, and E-value ≤ 6e-158). CDD returned no hits. HHpred had one hit for recombination directionality factor protein, with 100% probability, 88% coverage, and an E-value of 5.3e-34. /note=Transmembrane domains: TMHMM and TOPCONS do not predict a transmembrane domain. NOT a membrane protein. /note=Secondary Annotator Name: Mendoza, Alleana /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. However, this start does not have the highest final score: start site 24712 has a final score of -4.190. Consider including a brief explanation of why the predicted start site is still the better option despite this. For the NCBI BLAST hits, did you mean to write recombination directionality factor rather than minor tail protein? Lastly, don’t forget to fill out the Synteny box. 
CDS 24790 - 24936 /gene="32" /product="gp32" /function="membrane protein" /locus tag="TforTroy_32" /note=Original Glimmer call @bp 24790 has strength 10.79; Genemark calls start at 24790 /note=SSC: 24790-24936 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 5.01172E-23 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.087, -4.665772780726076, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Cassia]],,WGH21104,100.0,5.01172E-23 SIF-HHPRED: SIF-Syn: Membrane protein. The Pham number is 103000. Upstream is a recombination directionality factor; downstream is pham 64637, just like in phages Adumb2043 and Amyev. /note=Primary Annotator Name: Sun, Xingzheng /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 24790 bp. /note=Coding Potential: The ORF has strong coding potential, and the chosen start covers all of the coding potential. It is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is the best at -4.666, which is almost the same as the other top three starts. The Z-score is average at 2.087. /note=Gap/overlap: A gap of 0 bp, which is a good value. /note=Phamerator: The Pham number is 103000 as of 04/01/22. The gene is conserved in phages Berrie and Cassia, in the same cluster AZ. /note=Starterator: Start site 4 was called in 1 of 1 non-draft genes and is found in 4 of 6 genes in the pham; it corresponds to start 24790 in TforTroy, the start called by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene starting at 24790. /note=Function call: Function unknown, but it can be called Membrane Protein. Phagesdb BLAST returned no hits to non-draft phage genomes and suggests function unknown. HHpred had no relevant hits. NCBI BLAST hits a membrane protein with 68.75% identity, 77.033% aligned, 100% coverage, and an e-value of 9.34595e-12. /note=Transmembrane domains: Qualifies to be called Membrane Protein. 1 TMD predicted by TMHMM. 
1 TMD detected by TOPCONS. /note=Secondary Annotator Name: Rodriguez, Sean /note=Secondary Annotator QC: I agree with this annotation and functional call. In GeneMark Self and Host, there are regions of coding potential that are not covered by the chosen start site. Potentially ask the professor/TA why this is happening. Go more into depth about this gap; is it conserved among other AZ phages? Given what we saw in the coding potential, does such a short gene length make sense? The Starterator evidence is not as informative as you're making it out to be; the pham is nearly 100% drafts, so there's no real validation of the MA of start site 4 anywhere other than TforTroy. Could the coding potential for the previous gene be playing a role? Reconsider whether Starterator is informative, and if you keep your position, please provide more justification. CDS 25014 - 25364 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="TforTroy_33" /note=Genemark calls start at 25014 /note=SSC: 25014-25364 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_33 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 1.39332E-75 GAP: 77 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.068, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_33 [Arthrobacter phage Cassia]],,WGH21106,98.2759,1.39332E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Torres Espinosa, Michael /note=Auto-annotation: Only GeneMark calls this gene. The start site called by GeneMark is 25014, and the start codon is GTG. /note=Coding Potential: This gene shows adequate coding potential. GeneMark Host shows strong coding potential spanning from the called start site to the called stop site. GeneMark Self also shows mostly strong coding potential spanning from the called start site to the called stop site. /note=SD (Final) Score: The final score of the selected start is -2.253. 
This is the best score because it is the closest to 0. The selected Z-score is 3.068, which is also the best Z-score because it is the largest. /note=Gap/overlap: The gap with the upstream gene is 77 bp long. This is somewhat large, but it is the best option. The other options include large overlaps, the smallest of which is 343 bp. /note=Phamerator: As of April 6, 2022, the gene belongs to pham 64637. This gene is conserved in phages Adolin, Adumb2043, and Crewmate. Each of these phages belongs to cluster AZ, the same cluster as TforTroy. /note=Starterator: There are 23 non-draft annotations for this pham. 22/23 non-draft annotations call start site 7, which corresponds to base pair coordinate 25014 in TforTroy. This agrees with the start site called by GeneMark. /note=Location call: The evidence above indicates that this is a real gene with a start site at base pair coordinate 25014. Starterator agrees with GeneMark. /note=Function call: This gene has no known function. The top 10 non-draft hits in PhagesDB BLAST have E-values from 6e-57 to 1e-55, and none of them has a known function. The top 10 hits in NCBI BLAST have E-values from 6e-72 to 1e-67, 100% coverage, and percent identity from 86.21% to 92.24%; none of them has a known function. CDD was not informative and showed no hits. HHpred was also not informative, because all of the hits it displayed had very high E-values. /note=Transmembrane domains: TMHMM and TOPCONS do not call any TMDs for this gene. This is not a membrane protein. /note=Secondary Annotator Name: West, Julie /note=Secondary Annotator QC: I have reviewed the evidence and agree with the primary annotator. 
CDS 25368 - 25634 /gene="34" /product="gp34" /function="NrdH-like glutaredoxin" /locus tag="TforTroy_34" /note=Genemark calls start at 25368 /note=SSC: 25368-25634 CP: yes SCS: genemark ST: SS BLAST-Start: [NrdH-like glutaredoxin [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 1.375E-55 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.057, -3.1837611541087343, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[NrdH-like glutaredoxin [Arthrobacter phage Cassia]],,WGH21107,97.7273,1.375E-55 SIF-HHPRED: Putative oxidoreductase; APC23140, meticillin-resistant Staphylococcus aureus, oxidoreductase, thioredoxin fold, Structural Genomics, PSI-2, Protein Structure Initiative, Midwest Center for; HET: MSE; 1.5A {Staphylococcus aureus subsp. aureus},,,3IV4_A,86.3636,99.2 SIF-Syn: /note=Primary Annotator Name: Pham, Britney /note=Auto-annotation: Glimmer does not call a start site. GeneMark calls the start site at 25368, with start codon TTG. /note=Coding Potential: GeneMark Self and GeneMark Host show good coding potential in the third reading frame. This typical, high coding potential indicates that this could be a real gene. There does not seem to be much coding potential toward the end, near the stop site. /note=SD (Final) Score: The gene has a final score of -3.184, which is the highest (least negative) final score of all the possible start sites, and a Z-score of 3.057, which is also the highest of all the possible start sites. /note=Gap/overlap: There is a 3 bp gap, which is too small for a gene to be added in that region. This gap is reasonable, and this start gives the longest ORF with a reasonable gap and a sufficient sequence length. /note=Phamerator: The gene is found in pham 96417 as of 4/5/2022. The pham has 949 members, and the function called is glutaredoxin. The members of the pham come from many different clusters. /note=Starterator: Start number 89 was manually called 14 times. This is not the most annotated start site. 
The most annotated start site is 107, which was called in 161 of the 873 non-draft genes in the pham. /note=Location call: The start site is likely 25368, with the stop at 25634. No gene needs to be added upstream, as there are no large gaps. The start site is open to debate because it is not the most annotated start in the pham, and it could be investigated further. /note=Function call: Glutaredoxin. Both NCBI and PhagesDB support this function, with strong PhagesDB hits to Lizalica (e-value 2e-41) and Tweety19 (e-value 6e-41). On NCBI, there were hits to DrSierra (e-value 1e-50) and Warda (e-value 2e-48) with the same function, glutaredoxin. The function frequency table shows that the main function called is glutaredoxin, meaning other manual annotators agreed on the function. CDD had 1 hit, and HHpred had 250 hits calling glutaredoxin, with e-values of 1.5e-9. Overall, the function is called glutaredoxin. /note=Transmembrane domains: TOPCONS and TMHMM did not predict any transmembrane domains, so this does not appear to be a membrane protein. /note=Secondary Annotator Name: Chang, Julia /note=Secondary Annotator QC: I agree with the primary annotator. CDS 25631 - 25831 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="TforTroy_35" /note=Genemark calls start at 25631 /note=SSC: 25631-25831 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_35 [Arthrobacter phage Cassia]],,NCBI, q1:s1 96.9697% 4.58412E-31 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.978, -3.3503726477288405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_35 [Arthrobacter phage Cassia]],,WGH21108,89.5522,4.58412E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Taylor, Amaya /note=Auto-annotation: Glimmer did not call a start site for this gene. GeneMark called the start site at 25631. /note=Coding Potential: Coding potential was found in both GeneMark Self and GeneMark Host. 
Coding potential is present only in the middle reading frame of the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: The final score is -3.350 and is the best final score in PECAAN. /note=Gap/overlap: PECAAN reports a 4 bp overlap (gap of -4 bp), indicating that this gene may be part of an operon. ATG is the start codon. /note=Phamerator: pham: 57401. Date 04/05/22. The start site is conserved and found in Amyev_34 and Astro_54. /note=Starterator: Start site 14 in Starterator was manually annotated in 30/32 non-draft genes in this pham. This evidence agrees with the start site predicted by GeneMark Host and Self. /note=Location call: Based on the evidence above, this is a real gene, and its likely start site is 25631. /note=Function call: unknown. The top Phagesdb BLAST hit has an e-value of 5e-26 and a score of 114, and its function is unknown. /note=Transmembrane domains: TMHMM predicts no TMHs, and TOPCONS also predicts none, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Mao, Xuanting /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. (I think you could add more detail to the function call, such as noting that the significant hits from NCBI BLAST also suggest that this gene is NKF.) 
CDS 25828 - 26055 /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="TforTroy_36" /note=Original Glimmer call @bp 25828 has strength 11.11; Genemark calls start at 25828 /note=SSC: 25828-26055 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp35 [Arthrobacter phage Yang] ],,NCBI, q1:s1 96.0% 4.33722E-18 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.596, -3.2441186272167593, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp35 [Arthrobacter phage Yang] ],,YP_009815653,75.0,4.33722E-18 SIF-HHPRED: SIF-Syn: Upstream is NKF (pham 55522) and downstream is Holliday junction resolvase (pham 102725). This gene displays synteny with phage Iter to the right of the gene and with Crewmate to the left of the gene. Phage Yang is the only phage in which synteny is observed both upstream and downstream of the gene, which supports that it is a real gene. /note=Primary Annotator Name: Vazquez, Gilda /note=Auto-annotation: Glimmer and GeneMark both agree on the same start site, 25828. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF. Coding potential is most visible within the gene coordinates in the first forward frame. The chosen start site covers all of the coding potential of the gene. /note=SD (Final) Score: The final score for the called start is -3.244, with a Z-score of 2.596. These are the best values observed for this gene. /note=Gap/overlap: There is a 4 bp overlap (gap of -4 bp) with the upstream gene. This is acceptable for gene density, since the overlap is not more than 30 bp. /note=Phamerator: The gene (38) is found in pham 20561 as of 04/05/2022. It is conserved in other members of cluster AZ, such as Yang, but is not observed in other phages. /note=Starterator: The most called start site is number 2, at 25828, called by 2/2 non-draft genes. It is found in 4/4 genes in the pham and is called 100% of the time. 
This is the same start site called by Glimmer and GeneMark. /note=Location call: Based on the data and values collected, this is likely a real gene. The originally called start site is supported and does not need to be adjusted. /note=Function call: Phagesdb BLASTp returned products with unknown function. The best e-values for non-draft phages were 1e-16 and 4e-16 (both function unknown), with 59% and 58% identity, respectively. NCBI BLAST products were listed as hypothetical proteins, with e-values such as 4e-19 and 3e-18; the highest percent identity was 59.46%. HHpred showed at most 52.07% probability for a tail terminator protein, which is not enough to support this function. CDD returned no hits. All current data support that the function is NKF. /note=Transmembrane domains: TMHMM shows 0 predicted TMHs, and TOPCONS predicts no TMDs. These findings support that it is not a membrane protein. /note=Secondary Annotator Name: Saha, Atul /note=Secondary Annotator QC: Looks all correct. All relevant evidence included. 
CDS 26052 - 26408 /gene="37" /product="gp37" /function="Holliday junction resolvase" /locus tag="TforTroy_37" /note=Genemark calls start at 26052 /note=SSC: 26052-26408 CP: yes SCS: genemark ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 2.44245E-72 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.903, -2.6013996449736907, yes F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage Yang] ],,YP_009815654,95.7983,2.44245E-72 SIF-HHPRED: HOLLIDAY-JUNCTION RESOLVASE; HYDROLASE, ENZYME, HOMOLOGOUS RECOMBINATION, HOLLIDAY JUNCTION RESOLVING ENZYME, NUCLEASE, ARCHAEA, THERMOPHILE; HET: EDO, SO4; 1.8A {SULFOLOBUS SOLFATARICUS} SCOP: c.52.1.18,,,1OB8_B,83.8983,99.4 SIF-Syn: Holliday junction resolvase; the upstream gene is in pham 34494, as in phage Yang, and the downstream gene is in pham 788, as in phage Crewmate. /note=Primary Annotator Name: Bartolome, Alexandra /note=Auto-annotation: GeneMark predicts a start site at 26052 with a start codon of GTG. /note=Coding Potential: The start site 26052 includes the coding potential predicted by GeneMark on the forward strand of the ORF. /note=SD (Final) Score: The final score for this start site is -2.601 and the Z-score is 2.903. These are the best scores. /note=Gap/overlap: There is a 4 bp overlap (gap of -4 bp), which suggests that this gene is part of an operon. This gene is 357 bp long, and while it is not the longest ORF, it has the most reasonable overlap and the best Z-score and final score compared to other start sites. /note=Phamerator: As of 04/06/22, this gene is in pham 102725. This pham is conserved in phages Adolin, Crewmate, and Eraser, which are all in cluster AZ with TforTroy. /note=Starterator: Start number 39 has the most manual annotations, with 37/88 non-draft genes in the pham calling it. In TforTroy, this start site is at 26052, as predicted by GeneMark. 
/note=Location call: This is a real gene with a start site most likely at 26052, which includes the coding potential, is the most manually annotated start site, and is highly conserved in pham 102725. /note=Function call: Holliday junction resolvase. The top Phagesdb BLAST hits have the suggested function Holliday junction resolvase, with small e-values of 1e-51 to 2e-57. The top two NCBI BLAST hits have the function Holliday junction resolvase, with 100% coverage, greater than 92.5% alignment, and e-values of 1.8e-72 to 8.2e-69. The top two HHpred hits have the function Holliday junction resolvase, with greater than 83.9% coverage and e-values of 6.8e-12 to 1.1e-9. /note=Transmembrane domains: TMHMM and TOPCONS do not predict transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Saha, Atul /note=Secondary Annotator QC: Looks all correct. All relevant evidence included. CDS complement (26378 - 26539) /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="TforTroy_38" /note=Genemark calls start at 26539 /note=SSC: 26539-26378 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_OBITOO_41 [Arthrobacter phage ObiToo]],,NCBI, q1:s1 98.1132% 1.95149E-19 GAP: 198 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.913, -4.741583905672826, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBITOO_41 [Arthrobacter phage ObiToo]],,WGH21217,80.0,1.95149E-19 SIF-HHPRED: SIF-Syn: The downstream gene is part of pham 102725 and is a Holliday junction resolvase. The upstream gene is part of pham 83049 and is a DNA primase/helicase. /note=Primary Annotator Name: Fields, Brooke /note=Auto-annotation: GeneMark calls this gene with a start site at 26539. Glimmer does not call this gene. /note=Coding Potential: There is evidence of coding potential at this start site on both the self- and host-trained GeneMark maps. /note=SD (Final) Score: It has the strongest final score, -4.742. 
and the z-score is 1.913 /note=Gap/overlap: Gap is 198 bp which is larger than length of gene which is 162 bp /note=Phamerator: A member of Pham 788 (4/10/22) out of 32 members, AZ is the dominant cluster in the pham. No function assigned for this gene. It was compared to genes Liebe and Lizalica /note=Starterator: This gene called for the most annotated start. 17/23 nondraft genes call for most annotated. (Start: 21 @26539 has 17 MA`s) /note=Location call: True start is 26539 /note=Function call: No CDD hits. HHpred did not provide significant evidence to suggest a function. NCBI suggest it is a hypothetical protein /note=Transmembrane domains: Neither TMHMM or TOPCONS predicted any TMD’s /note=Secondary Annotator Name: Sun, Xingzheng /note=Secondary Annotator QC: I agree with the location and functional call in this annotation. All of the evidence has been considered. I think you can call it NKF. (Suggestion: you could put more words for explanations. Also, fill out the Synteny). CDS 26738 - 29230 /gene="39" /product="gp39" /function="DNA primase/helicase" /locus tag="TforTroy_39" /note=Original Glimmer call @bp 26738 has strength 10.62; Genemark calls start at 26738 /note=SSC: 26738-29230 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage Lego]],,NCBI, q1:s1 100.0% 0.0 GAP: 198 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.442961286954254, yes F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage Lego]],,QIN94438,94.6988,0.0 SIF-HHPRED: Primase; primase, helicase, ssDNA-binding protein, TRANSFERASE; HET: SO4; 2.406A {Nitratiruptor phage NrS-1},,,6K9C_A,40.241,100.0 SIF-Syn: DNA primase/helicase, upstream gene is part of Pham 788, downstream gene belongs to Pham 54541, just like in phages Asa16 and Tweety19. /note=Primary Annotator Name: Rodriguez, Sean /note=Auto-annotation: Glimmer and GeneMark. Both agree on a start site at 26738 bp. 
/note=Coding Potential: Coding potential is found in GeneMark Self and Host along the entire putative ORF. The putative ORF has significant coding potential only in the forward strand, indicating that this is a forward gene. The chosen start site covers all of the coding potential. /note=SD (Final) Score: The start site at 26738 bp has a final score of -2.443, which is the best of the candidate start sites. The Z-score of 2.978 is also the best. /note=Gap/overlap: 198 bp. This gap is large, but gaps of similar size are conserved in other AZ phages like Crewmate and Niobe. Since the upstream gene is coded on the reverse strand, such a gap might be needed to fit promoters and related transcription machinery. The auto-annotated start site gives the longest ORF, 2493 bp, which is acceptable. /note=Phamerator: The pham number is 83049 as of April 5, 2022. The gene is conserved in other cluster AZ phages, including Adolin and Amyev. The function call in the phams database is DNA primase/helicase, which matches Phamerator. /note=Starterator: There are 85 non-draft members in this pham, and 38/85 call start site 27. This does not correspond to the start site at 26738 bp in TforTroy (start 35), but start 35 is still annotated frequently within this pham (33/85). /note=Location call: Considering the above evidence, this is a real gene with a start site at 26738 bp. /note=Function call: 3 of the top 5 hits on PhagesDB BLAST called DNA primase/helicase (e-values = 0, % identities > 89%). All of the top 5 NCBI BLAST hits had the same function call (> 99% coverage, > 89% identity, e-value = 0). HHpred had two hits, one for primase, helicase, ssDNA-binding protein and one for DNA primase (> 99% probability, > 36% coverage, e-value < 4.9e-21). CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein.
/note=Secondary Annotator Name: Pham, Britney /note=Secondary Annotator QC: I agree with the function and the start site called for this gene. Synteny still needs to be filled out. CDS 29241 - 29351 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="TforTroy_40" /note=Genemark calls start at 29241 /note=SSC: 29241-29351 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQE13_gp38 [Arthrobacter phage Elezi] ],,NCBI, q1:s2 100.0% 4.99896E-13 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.222, -1.9310779259753799, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp38 [Arthrobacter phage Elezi] ],,YP_010678016,86.4865,4.99896E-13 SIF-HHPRED: SIF-Syn: NKF. The upstream gene function is DNA primase/helicase and the downstream gene function is DNA polymerase I, just like in phage Adumb2043. /note=Primary Annotator Name: West, Julie /note=Auto-annotation: Glimmer makes no prediction, but GeneMark predicts the start at 29241. /note=Coding Potential: There is good coding potential in the predicted ORF. There also appears to be coding potential after this gene that is not already accounted for, so a gene likely needs to be added. /note=SD (Final) Score: -1.931. This is the least negative score in PECAAN. /note=Gap/overlap: There is a 10 bp gap upstream. This is the smallest possible gap among the start sites. /note=Phamerator: Pham 54541. As of 4/7/2022, there are 26 non-draft members. /note=Starterator: Start site @29241 has 9 manual annotations and was called 100% of the time when present as of 4/7/2022. /note=Location call: Start @29241 has an ATG codon, gives the largest reasonable ORF, and is supported by Starterator. It covers all of the coding potential. Synteny is shown with phage Kaylissa. /note=Function call: NKF. PhagesDB BLAST gave several hits with low e-values and BLAST scores over 70. NCBI BLASTp gave many hits with no known function, with low e-values and identities over 90%. HHpred and CDD were not informative.
/note=Transmembrane domains: TOPCONS and TMHMM do not predict any transmembrane domains. /note=Secondary Annotator Name: Scavetti, Alexa /note=Secondary Annotator QC: I have QC'd this annotation and I agree with the primary annotator. CDS 29351 - 29473 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="TforTroy_41" /note= /note=SSC: 29351-29473 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein HOU52_gp40 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 1.34622E-16 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.694, -5.966914258667748, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp40 [Arthrobacter phage Yang] ],,YP_009815658,95.0,1.34622E-16 SIF-HHPRED: SIF-Syn: /note=added gene -AF CDS 29677 - 31542 /gene="42" /product="gp42" /function="DNA polymerase I" /locus tag="TforTroy_42" /note=Original Glimmer call @bp 29677 has strength 12.34; Genemark calls start at 29677 /note=SSC: 29677-31542 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Yang] ],,NCBI, q1:s24 100.0% 0.0 GAP: 203 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.442961286954254, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Yang] ],,YP_009815659,93.0124,0.0 SIF-HHPRED: DNA polymerase I; mycobacteria, DNA polymerase, Flap endonuclease, TRANSFERASE; 2.713A {Mycolicibacterium smegmatis},,,6VDE_B,97.1014,100.0 SIF-Syn: The gene's function, DNA polymerase I, is syntenic with the same function in non-draft phages DrSierra and Kaylissa. /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 29677 bp with codon TTG. /note=Coding Potential: There is high coding potential on both the self and host GeneMark maps. Coding potential is on the forward strand only, suggesting that it is a forward gene. The ORF covers all of the coding potential, and this high coding potential suggests that this is a real gene.
/note=SD (Final) Score: The final score is best at -2.443 and the Z-score is best at 2.978, both meeting the threshold for reasonable final and Z-scores of a real gene. /note=Gap/overlap: There is a large gap upstream of this gene: 325 bp from the upstream gene (stop @29351, forward strand). However, this is still reasonable, as a similarly large gap is conserved in other non-draft phage genomes such as Lizalica and Crewmate. /note=Phamerator: The pham number as of 04/01/2022 is 47481. This gene is also conserved in phages Adolin and Adumb2043, which are in the same subcluster, AZ. /note=Starterator: Start site 62, which corresponds to 29677 bp in TforTroy, is conserved among other members of this pham, such as Asa16 and Adolin. 52 of the 889 non-draft phage annotations called this start site, along with 38 out of 815 manual annotations. /note=Location call: Based on the gathered evidence, this is a real gene, because it covers all of the coding potential and is conserved in both Phamerator and Starterator. /note=Function call: DNA polymerase I. The top two non-draft phage hits on PhagesDB BLAST had the function DNA polymerase, with e-values that were essentially 0. The top two hits on NCBI BLAST also had the function DNA polymerase I, with e-values of 0. HHpred had multiple hits for DNA polymerase I with 100% probability, 69% coverage, and an E-value of 2.3e-68. CDD had a hit for DNA polymerase I with an e-value of 1.33e-61. /note=Transmembrane domains: Since neither TMHMM nor TOPCONS predicted any TMDs, it can be assumed that this gene is not a membrane protein. /note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I QC'd this gene and agree with the primary annotator. Great detail.
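The gap values quoted for genes 40 to 42 follow from simple coordinate arithmetic. Below is a minimal Python sketch, assuming 1-based, inclusive, forward-strand coordinates as in the CDS lines above; `gap_bp` is a hypothetical helper, not a PECAAN function. It reproduces the -1 bp GAP of the added gene 41, the 325 bp gap from the gene 40 stop at 29351 cited in the gene 42 notes, and the 203 bp GAP value once gene 41 is counted.

```python
def gap_bp(upstream_stop: int, downstream_start: int) -> int:
    """Bases strictly between two forward-strand genes.

    Coordinates are 1-based and inclusive, so adjacent genes
    (stop at N, start at N + 1) have a gap of 0, and a negative
    value is an overlap of that many bases.
    """
    return downstream_start - upstream_stop - 1


print(gap_bp(29351, 29351))  # gene 40 stop -> gene 41 start: -1 (1 bp overlap)
print(gap_bp(29351, 29677))  # gene 40 stop -> gene 42 start: 325
print(gap_bp(29473, 29677))  # added gene 41 stop -> gene 42 start: 203
```

A negative result is read as an overlap, which is how the -1, -4, and -8 bp values elsewhere in these records are meant.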
CDS 31539 - 31715 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="TforTroy_43" /note=Original Glimmer call @bp 31554 has strength 9.08; Genemark calls start at 31554 /note=SSC: 31539-31715 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein PQD82_gp44 [Arthrobacter phage Phives] ],,NCBI, q1:s1 100.0% 2.85861E-30 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.59, -5.4987073736983305, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD82_gp44 [Arthrobacter phage Phives] ],,YP_010677678,100.0,2.85861E-30 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is DNA polymerase I and the downstream gene is DNA ligase. Phages Crewmate and Lizalica show this order of gene functions as well. /note=changed start to -4 (operon) -AF /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Both Glimmer and GeneMark called the same start site at 31554. The start codon for this start site is ATG. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site (31554) includes all of it; coding potential is found in both self-trained and host-trained GeneMark in the forward frame. /note=SD (Final) Score: The SD score, -4.981, is the highest among all of the potential start sites. The Z-value for this start site, 2.294, is also the highest. /note=Gap/overlap: There is an 11 bp gap. This gap is also observed in phages Crewmate and Lizalica. /note=Phamerator: The pham number as of April 5, 2022 is 49784. The gene is conserved in 46 phages; 27 of them belong to cluster AZ and the others belong to clusters DN and K. /note=Starterator: The pham number as of April 5, 2022 is 49784. The start site called most often in the published annotations is start 20, which was called in 16 of the 30 non-draft genes in the pham. TforTroy is one of the genes that call this most annotated start site, at 31554, which is the one called by Glimmer and GeneMark.
Moreover, there are 20 MAs for this start site. /note=Location call: Considering all of the evidence above, this is a real gene. The start site is likely 31554, the one called by Glimmer and GeneMark, because it has the highest SD score and Z-value and is the start site chosen by most annotators. /note=Function call: The function for this gene is likely NKF, because many significant NCBI hits are hypothetical proteins with high coverage (~98%), high % identity (~85%), and low e-values (~e-18). PhagesDB function frequency and CDD do not provide any data for this gene. Hits to other non-draft phages in PhagesDB BLAST also indicate that the function of this gene is unknown. Moreover, the only HHpred hit is not a good one: it suggests that this gene is an RNA annealing protein, but it has a very high e-value (250), low % coverage (22.6415%), and a low probability (19.4). Taken together, these searches indicate that this gene is likely NKF. /note=Transmembrane domains: None. Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Bartolome, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
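The gene 43 notes discuss the Glimmer/GeneMark start at 31554 (11 bp gap), while the CDS line uses 31539 after the start was changed to give a -4 bp overlap. The short sketch below, assuming 1-based inclusive coordinates and using hypothetical helper names (`gap_bp`, `orf_length`), compares both candidates against the gene 42 stop at 31542 and checks that each ORF is a whole number of codons.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    # 1-based inclusive coordinates: a negative value is an overlap.
    return start - upstream_stop - 1


def orf_length(start: int, stop: int) -> int:
    # Inclusive length of a forward-strand ORF, stop codon included.
    return stop - start + 1


UPSTREAM_STOP = 31542  # end of gene 42
STOP = 31715           # end of gene 43

# Candidate starts: the Glimmer/GeneMark call and the final (changed) call.
for start in (31554, 31539):
    length = orf_length(start, STOP)
    print(start, gap_bp(UPSTREAM_STOP, start), length, length % 3 == 0)
```

Both candidates stay in frame with the stop at 31715, so the choice between them rests on the RBS scores, Starterator, and the operon-style overlap rather than on the reading frame.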
CDS 31708 - 32010 /gene="44" /product="gp44" /function="DNA ligase" /locus tag="TforTroy_44" /note=Original Glimmer call @bp 31708 has strength 10.38; Genemark calls start at 31708 /note=SSC: 31708-32010 CP: yes SCS: both ST: SS BLAST-Start: [DNA ligase [Arthrobacter phage London]],,NCBI, q1:s1 99.0% 1.91968E-58 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.970161406017234, yes F: DNA ligase SIF-BLAST: ,,[DNA ligase [Arthrobacter phage London]],,QOP64343,95.0,1.91968E-58 SIF-HHPRED: d.142.2.2 (A:1-314) Adenylation domain of NAD+-dependent DNA ligase {Thermus filiformis [TaxId: 276]},,,d1dgsa3,95.0,99.6 SIF-Syn: Synteny with other phages in cluster AZ. As in Powerpuff, the upstream gene is a short gene of unknown function (numbered 43, pham 49784) and the downstream gene is a slightly longer gene of unknown function (numbered 44, pham 16103). 5.24.22 /note=Primary Annotator Name: Saha, Atul /note=Auto-annotation: Glimmer and GeneMark call the start at 31708. The start codon is GTG. 31708 is the only start site that is called. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The 31708 start site has a Z-score of 2.978 and a final score of -2.970. /note=Gap/overlap: Start @31708 has an overlap of 8 bases with the upstream gene. This overlap is conserved in other subcluster AZ phages (DrManhattan, Lego, Powerpuff). /note=Phamerator: Pham 9766, as of 4.7.22. It is conserved; it is found in Powerpuff, Lego, and DrManhattan (all subcluster AZ). /note=Starterator: Pham 9766 has 36 members, 10 of which are drafts. Start site 16 in Starterator was manually annotated in 12 of 26 non-draft genes in this pham. Start 16 is 31708 in TforTroy. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 31708 bp.
Starterator agrees with Glimmer and GeneMark. /note=Function call: DNA ligase. Per the PhagesDB Function Frequency, members of pham 4822 are DNA ligases. The top three non-draft PhagesDB BLAST hits with assigned functions have the function DNA ligase (E-value = 1e-47), and the top 3 NCBI BLAST hits also have the function DNA ligase (>99% coverage, >91% identity, and E-value ≤1.596e-58). CDD returned no hits. HHpred had multiple hits for DNA ligase; the top three had >99.5% probability, 95% coverage, and E-value <2.1e-14. /note=Transmembrane domains: TMHMM and TOPCONS do not predict a transmembrane domain. It is NOT a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 32007 - 32312 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="TforTroy_45" /note=Original Glimmer call @bp 32007 has strength 11.96; Genemark calls start at 32007 /note=SSC: 32007-32312 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp43 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 5.63739E-60 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.722, -5.080915947802713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp43 [Arthrobacter phage Yang] ],,YP_009815661,96.0396,5.63739E-60 SIF-HHPRED: SIF-Syn: Function unknown. The pham number is 16103. The upstream gene function is DNA ligase (pham 9766), and the downstream gene is a DNA binding protein (pham 97248), just like in phages DrSierra and Elezi. /note=Primary Annotator Name: Sun, Xingzheng /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 32007. /note=Coding Potential: The ORF has strong coding potential, and the chosen start covers all of the coding potential. It is found in both GeneMark Self and Host, on the third row of the map. /note=SD (Final) Score: The final score is the best, at -5.081. The Z-score is the highest, at 1.722. /note=Gap/overlap: There is a 4 bp overlap, so this gene might be part of an operon.
/note=Phamerator: The pham number is 16103 as of 04/01/22. The gene is conserved in phages Adolin and Adumb2043, which are in the same cluster, AZ. /note=Starterator: Start site 30 was called in 64 of 73 non-draft genes and is found in 85 of 95 genes in the pham; it corresponds to start 32007 in TforTroy, as called by GeneMark. /note=Location call: Based on the evidence above, this is a real gene starting at 32007. /note=Function call: Function unknown. PhagesDB BLAST hits phages Yang and Crewmate, both with function unknown and e-values smaller than 3e-45. HHpred had no relevant hits. The top NCBI BLAST hit has unknown function, with 94.0594% identity, 96.0396% aligned, 100% coverage, and an e-value of 4e-60. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS, suggesting that this is not a membrane protein. /note=Secondary Annotator Name: Mendoza, Alleana /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. However, consider citing at least one more hit as evidence for both the PhagesDB BLAST and the NCBI BLAST results. CDS 32506 - 33342 /gene="46" /product="gp46" /function="DNA binding protein" /locus tag="TforTroy_46" /note=Original Glimmer call @bp 32506 has strength 12.42; Genemark calls start at 32506 /note=SSC: 32506-33342 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage Lego]],,NCBI, q1:s1 100.0% 3.09632E-143 GAP: 193 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.583959800616441, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage Lego]],,QIN94444,87.5,3.09632E-143 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta`, TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,96.0432,100.0 SIF-Syn: /note=Primary Annotator Name: Torres Espinosa, Michael /note=Auto-annotation: Auto-annotation is present for GeneMark and Glimmer. Both call the start site at base pair coordinate 32506. The start codon is ATG.
/note=Coding Potential: Coding potential for this gene is strong. GeneMark Host shows strong coding potential between this gene's start and stop sites, and GeneMark Self shows strong potential in the same region. /note=SD (Final) Score: The RBS final score selected for this gene is -2.584. This is the best score available in PECAAN. The Z-score for this gene is 2.978, which is also the best Z-score available for this gene in PECAAN. /note=Gap/overlap: This gene has a 193 bp gap with the upstream gene. This is a very large gap, but it is reasonable because it lacks coding potential and is conserved in other phage genomes in the same cluster, such as Adolin, Adumb2043, and Amyev. Furthermore, it is the smallest gap listed in PECAAN for this gene. The length of the gene for the auto-annotated start site is 837 bp, which is reasonable. /note=Phamerator: As of April 7, 2022, this gene belongs to pham 97248. This gene is conserved in Adolin, Elezi, and Eraser, which are in cluster AZ, the same cluster as TforTroy. /note=Starterator: 34 non-draft annotations exist for this pham. 22/34 non-draft annotations call start site 22. This start site corresponds to base pair coordinate 32506 in TforTroy, which agrees with the start sites called by GeneMark and Glimmer. /note=Location call: Previous evidence suggests this gene is real and has a start site at base pair position 32506. Starterator agrees with GeneMark and Glimmer. /note=Function call: The function of this gene is DNA binding protein. The top 6 non-draft hits in PhagesDB BLAST have the function DNA binding protein and E-values ranging from e-113 to e-111. The top 5 hits in NCBI BLAST have the function DNA binding protein. These hits have E-values ranging from 7e-138 to 3e-116, coverage of 100%, and percent identity ranging from 66.43% to 74.19%. CDD returned a hit for an RNA polymerase sigma factor domain with an E-value of 2.06e-4.
Sorted by E-value, the top 4 hits in HHpred were for an RNA polymerase sigma factor. These hits had E-values ranging from 1.4e-26 to 2.8e-25, coverage ranging from 96.04% to 96.40%, and a probability of 100%. /note=Transmembrane domains: TMHMM and TOPCONS do not show any TMDs for this gene. Therefore, this gene is not a membrane protein. /note=Secondary Annotator Name: Rodriguez, Sean /note=Secondary Annotator QC: I agree with this annotation and functional call. Please include the length of the gene with the putative start site and comment on whether this length is reasonable. Also, mention a couple of specific phages where the gap is conserved. Mention the function calls from the phams database (there are two different, but related, function calls that occur within AZ phages). Please elaborate on the probability and coverage as a whole for the HHpred hits. This is a minor detail, but I checked the NCBI BLAST hits and couldn't find those % identity values. Have they changed since you started this annotation? CDS 33428 - 33973 /gene="47" /product="gp47" /function="helix-turn-helix DNA binding domain" /locus tag="TforTroy_47" /note=Original Glimmer call @bp 33428 has strength 13.72; Genemark calls start at 33428 /note=SSC: 33428-33973 CP: no SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Cassia]],,NCBI, q55:s3 69.6133% 4.3038E-66 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.722, -5.160958035523474, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Cassia]],,WGH21118,87.5969,4.3038E-66 SIF-HHPRED: HTH_Tnp_IS630 ; Transposase,,,PF01710.19,51.3812,98.8 SIF-Syn: Helix-turn-helix DNA binding domain. This gene is an orpham, meaning that it is the only member of its pham, so it currently shows no synteny: it does not appear in other phages at the same location.
/note=HHpred contains requisite HTH secondary structure -AF /note=Primary Annotator Name: Pham, Britney /note=Auto-annotation: Glimmer and GeneMark both call the start site at 33428, with start codon ATG. /note=Coding Potential: GeneMark Self and GeneMark Host show that the gene has good coding potential in the second reading frame. The coding potential is high and typical, indicating that this could be a real gene. The potential decreases near the stop site. /note=SD (Final) Score: The gene has a final score of -5.161, which is not the highest (least negative) final score of all the possible start sites. Its Z-score of 1.722 is also not the highest of all the possible start sites. The start site with the highest final score is 33764, with a final score of -3.292 and a Z-score of 2.603. /note=Gap/overlap: There is a gap of 85 bp, which is still a reasonable gap even though it is over the 50 bp gap maximum. It does not seem that another gene can be added upstream of this gene. The alternative start site at 33764 would give a gap of 421 bp, which is quite large and could accommodate another gene. /note=Phamerator: As of 4/7/2022, the gene is in pham 102968 and is the only member of the pham, so it appears to be a singleton that is not found in other phages. /note=Starterator: Because this gene appears to be a singleton, it has no Starterator report. /note=Location call: The start site seems to be 33764, with a stop site of 33973. Another possible start site is 33428; however, it did not have as high coding potential and had a less favorable Z-score and final score. Because of this, 33764 seems to be the favored start site. /note=Function call: Helix-turn-helix DNA binding domain protein. The function was determined from strong hits in both NCBI and PhagesDB; the PhagesDB hits include Iter (2e-24) and Phives_50 (9e-24).
On NCBI, there were hits with Warda (e-value 1e-27) and Tbone (e-value 1e-25) with the same function, helix-turn-helix DNA binding domain protein. The function frequency table shows that the main function called is helix-turn-helix DNA binding domain protein, meaning that other manual annotators agreed on this function. CDD had 0 hits, and HHpred had 250 hits, including helix-turn-helix transcriptional regulator (0.00063) and DNA-binding response regulator. Overall, the function is deemed to be helix-turn-helix DNA binding domain protein, even though HHpred does not provide strong evidence. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: West, Julie /note=Secondary Annotator QC: I have reviewed the evidence and agree with the primary annotator. CDS 34022 - 34618 /gene="48" /product="gp48" /function="SprT-like protease" /locus tag="TforTroy_48" /note=Genemark calls start at 34022 /note=SSC: 34022-34618 CP: yes SCS: genemark ST: SS BLAST-Start: [SprT-like domain-containing protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 4.98369E-139 GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.882, -2.707694805492614, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like domain-containing protein [Arthrobacter phage Yang] ],,YP_009815666,97.9798,4.98369E-139 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: ADP, MLZ, FLC; 1.5A {Homo sapiens},,,6MDW_A,49.4949,99.7 SIF-Syn: /note=Primary Annotator Name: Scavetti, Alexa /note=Auto-annotation: GeneMark only. Start site 34022 was called with start codon ATG. /note=Coding Potential: Yes. Both Self- and Host-Trained GeneMark show coding potential in the forward direction in the region from 34022 bp to 34618 bp. The chosen start site of 34022 covers all of the coding potential for this region. /note=SD (Final) Score: -2.708 with a Z-value of 2.882.
This is the best possible SD score and Z-value according to PECAAN. /note=Gap/overlap: There is a reasonable 48 bp gap between this gene and the upstream gene, creating the longest possible ORF and not exceeding the 50 bp gap limit. This gene is also a reasonable length (597 bp). /note=Phamerator: Pham number 19450 as of 4/8/2022. Conserved in other AZ cluster phages, including Eraser and DrManhattan. Function call is SprT-like protease, which is consistent between Phamerator and the phams database and is included on the approved SEA-PHAGES list. /note=Starterator: Start site 40 is the most annotated start for pham 19450, and is conserved among 29/58 non-draft genomes in the pham. This start site corresponds to position 34022 in TforTroy and was called by GeneMark. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 34022. Starterator agrees with GeneMark, and this site has the best SD and Z-score. /note=Function call: The top two non-draft hits on both PhagesDB and NCBI BLASTp, sorted by e-value, suggested the function is SprT-like protease, with high query coverage (100%), high % identity (>92%), and low e-values (<1e-105). HHpred and CDD also called SprT-like protease with high probability (>99%) and low e-values (<1e-4). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, so it is not a membrane protein. This agrees with the functional call of SprT-like protease, which is not associated with the membrane. /note=Secondary Annotator Name: Chang, Julia /note=Secondary Annotator QC: I agree with the primary annotator. 
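The length and gap checks reported for gene 48 can be reproduced from the coordinates. Below is a minimal sketch, assuming 1-based inclusive coordinates and treating the 50 bp gap limit mentioned in these notes as a simple cutoff; `summarize_forward_cds` and `GAP_LIMIT_BP` are hypothetical names, not part of PECAAN.

```python
GAP_LIMIT_BP = 50  # gap guideline cited in the annotation notes (assumed cutoff)


def summarize_forward_cds(upstream_stop: int, start: int, stop: int) -> dict:
    """Length, codon count, protein length, and gap for a forward-strand CDS."""
    length = stop - start + 1  # inclusive, stop codon included
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    gap = start - upstream_stop - 1  # negative values are overlaps
    return {
        "length_bp": length,
        "codons": length // 3,
        "protein_aa": length // 3 - 1,  # the stop codon is not translated
        "gap_bp": gap,
        "within_gap_limit": gap <= GAP_LIMIT_BP,
    }


# Gene 48: 34022 - 34618, with the upstream gene 47 ending at 33973.
print(summarize_forward_cds(33973, 34022, 34618))
```

Applied to gene 47 (33428 - 33973, upstream stop 33342), the same check gives an 85 bp gap that exceeds the cutoff, matching the concern raised in that gene's notes.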
CDS 34740 - 35558 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="TforTroy_49" /note=Genemark calls start at 34740 /note=SSC: 34740-35558 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_47 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 0.0 GAP: 121 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.679, -3.133663537764895, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_47 [Arthrobacter phage Cassia]],,WGH21120,99.6324,0.0 SIF-HHPRED: SIF-Syn: The upstream gene is an SprT-like protease (pham 19450) and the downstream gene is a membrane protein (pham 78437), as observed in phage Crewmate. This gene also displays overall synteny with phage Crewmate, but not with other phages in the AZ cluster. /note=Primary Annotator Name: Vazquez, Gilda /note=Auto-annotation: Glimmer did not call a start site. GeneMark called 34740. /note=Coding Potential: This gene displays coding potential within the ORF. Coding potential is observed between the gene coordinates in the third direct (forward) frame. /note=SD (Final) Score: The best final score for the called gene is -3.134, with a Z-score of 2.679. These are the best values observed for this gene. /note=Gap/overlap: There is a 121 bp gap upstream of the gene. This exceeds the usual 50 bp gap guideline and should be noted. However, other phages such as Crewmate display the same synteny, so no additional gene likely needs to be added. /note=Phamerator: The gene (52) is found in pham 102975 as of 04/07/2022. It is conserved in other members of the AZ cluster, such as Crewmate, but is not observed in other phages. /note=Starterator: A reasonable start site is start 7, at 34740, called by 3/8 non-draft genes. It was observed in 7/12 genes in the pham and was called 100% of the time. This is the same start site called by GeneMark. /note=Location call: Based on the collected values and data above, including coding potential, this is likely a real gene.
It may be a functional gene, and the start site does not need to be changed. /note=Function call: PhagesDB BLASTp displayed products with function unknown. The first two non-draft phage hits have e-values of e-134 and 3e-96 (both function unknown), with 59% and 58% identity, respectively. NCBI BLAST products were listed as hypothetical proteins, with e-values such as 2e-113 and 4e-75. The highest percent identity was 89.71%. The top HHpred hit, a hypothetical protein, had a probability of 89.49. CDD did not return any hits. The data support that the function of this gene is NKF. /note=Transmembrane domains: TOPCONS predicts 0 TMDs and TMHMM predicts 0 TMHs, so this is not a membrane protein, which is consistent with the NKF call. /note=Secondary Annotator Name: Mao, Xuanting /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. (Don't forget to complete your synteny box!)
CDS 35647 - 37065 /gene="50" /product="gp50" /function="serine integrase" /locus tag="TforTroy_50" /note=Original Glimmer call @bp 35644 has strength 10.44; Genemark calls start at 35644 /note=SSC: 35647-37065 CP: no SCS: both-cs ST: NI BLAST-Start: [integrase [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: 88 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.8, -2.8970853586582503, no F: serine integrase SIF-BLAST: ,,[integrase [Arthrobacter phage Yang] ],,YP_009815667,98.0932,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fields, Brooke /note=Auto-annotation: Both Glimmer and GeneMark call this gene with a start site at 35644. /note=Coding Potential: Both the self and host GeneMark maps display substantial coding potential at this start site. /note=SD (Final) Score: Its final score is close to the optimal final score (-3.867), with a Z-score of 2.8. /note=Gap/overlap: There is a gap of 31 bp, which makes sense because the pham map suggests the insertion of a new gene after this one. /note=Phamerator: This gene is in pham 78437, which has 539 members. Phamerator did not assign a function. Clusters A and AZ dominate the pham. It was compared to phage Warda (serine integrase, cluster AZ, 1419 bp) and phage Niobe (serine integrase, cluster AZ, 1422 bp). /note=Starterator: This gene does not use the most annotated start site. Start site 35644 was found in 22 of 539 (4.1%) genes in the pham, and there are no manual annotations of this start. /note=Location call: Given the substantial evidence (Z-score, coding potential), we can conclude that 35644 is the true start site. /note=Function call: The top 5 NCBI BLAST hits show significant evidence that the function of this gene is “serine integrase”. The CDD hit is for a recombinase and shows a relationship between the recombinase and serine recombinase superfamilies. HHpred also displayed hits of /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs; therefore, it is not a membrane protein.
/note=Secondary Annotator Name: Saha, Atul /note=Secondary Annotator QC: Finish up the function call section. Select PhagesDB function BLAST evidence. CDS 37334 - 37600 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="TforTroy_51" /note=Genemark calls start at 37334 /note=SSC: 37334-37600 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein HOU52_gp50 [Arthrobacter phage Yang] ],,NCBI, q1:s1 97.7273% 1.05178E-27 GAP: 268 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp50 [Arthrobacter phage Yang] ],,YP_009815668,82.0225,1.05178E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mendoza, Alleana /note=Auto-annotation: GeneMark. Only GeneMark calls this gene, with start site 37334 and start codon GTG. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of it. /note=SD (Final) Score: -2.505. This is the only candidate for this gene, so it has the best final score in PECAAN. /note=Gap/overlap: 268 bp gap. This is a large gap; however, there is no coding potential in the gap that would suggest adding a gene upstream to fill it. Therefore, I do not believe a gene should be added. This gap is also conserved in other phages. /note=Phamerator: Pham: 102793. Date: 4/8/2022. It is conserved and found in Amyev and Yang. /note=Starterator: Start site 8 in Starterator was manually annotated in 21/26 non-draft genes in this pham. Start 8 is 37334 in TforTroy. This evidence agrees with the start site predicted by GeneMark. /note=Location call: Based on the data presented above, this is a real gene with 37334 as the most likely start site. /note=Function call: The top three PhagesDB BLASTp hits all have unknown functions (E-value 80%). The top two NCBI BLAST hits are an Arthrobacter membrane protein (> 96% coverage, > 87% identity, e-value < 3.45e-19).
HHpred and CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS each predict one TMD. Thus, this gene can be assumed to have a real TMD and can be considered a membrane protein. No other available evidence provides a more specific function. /note=Secondary Annotator Name: Sun, Xingzheng /note=Secondary Annotator QC: I agree with the location and function calls in this annotation. All of the evidence has been properly considered. CDS 42180 - 42344 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="TforTroy_65" /note=Genemark calls start at 42180 /note=SSC: 42180-42344 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQD88_gp64 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 92.5926% 1.06021E-24 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.882, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp64 [Arthrobacter phage Amyev] ],,YP_010677767,86.7924,1.06021E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Julia /note=Auto-annotation: GeneMark called the start site at 42180 bp with start codon ATG. Glimmer did not call a start site. /note=Coding Potential: GeneMark shows high coding potential in both the self-trained and host-trained maps. Coding potential is on the forward strand only, suggesting that this is a forward gene. The ORF covers all of the coding potential, which suggests that this is a real gene. /note=SD (Final) Score: The final score (-2.708) and the Z-score (2.882) are both the best among the candidate starts, and both meet the thresholds expected for a real gene. /note=Gap/overlap: There is an 8 bp overlap with the upstream gene (stop @ 42187, forward strand). This overlap is syntenic with other non-draft phage genomes such as Adolin and Elezi. /note=Phamerator: The pham number as of 04/01/2022 is 17297. This gene is also conserved in phages Iter and Janeemi, which are in the same subcluster, AZ.
/note=Starterator: Start site 9 is conserved among other members of this pham, such as JohnDoe and Nitro, at position 37830 bp. Of the 36 non-draft phage annotations, 35 called this start site, along with 23 of 24 manual annotations. /note=Location call: Based on the gathered evidence, this is a real gene because it covers all of the coding potential and is conserved in both Phamerator and Starterator. /note=Function call: NKF. PhagesDB BLAST returned no hits. The top two NCBI BLAST hits had small e-values ranging from 8e-25 to 9e-22, but all hits have unknown functions. CDD and HHpred had no significant hits. /note=Transmembrane domains: Since neither TMHMM nor TOPCONS predicted any TMDs, it can be assumed that this gene is not a membrane protein. /note=Secondary Annotator Name: Pham, Britney /note=Secondary Annotator QC: I agree with the start site and the function call. The synteny box still needs to be filled out. CDS 42341 - 42454 /gene="66" /product="gp66" /function="membrane protein" /locus tag="TforTroy_66" /note=Original Glimmer call @bp 42341 has strength 13.83; Genemark calls start at 42341 /note=SSC: 42341-42454 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 7.34627E-11 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.978, -2.583959800616441, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Amyev] ],,YP_010677768,91.8919,7.34627E-11 SIF-HHPRED: SIF-Syn: Membrane protein of unknown function, upstream gene is pham 17297, downstream is an endonuclease (pham 16105), just like in phage Crewmate. /note=Primary Annotator Name: Hatashita, Anthony /note=Auto-annotation: Both Glimmer and GeneMark call the same start site, 42341. The start codon called is GTG. /note=Coding Potential: The gene has strong coding potential within the putative ORF, and the chosen start site covers all of the coding potential.
/note=SD (Final) Score: The SD score is the best. The Z-score is above the threshold of 2, and the final score is the least negative of the options. /note=Gap/overlap: The -4 bp gap (a 4 bp overlap) with the upstream gene is reasonable and indicates that the gene may be part of an operon. The length of the gene is also reasonable. /note=Phamerator: The gene is found in pham 14469 as of 4/5/2022. The pham is conserved in other members of the cluster, including phages Amyev, Adumb2043, and Crewmate. No function was called for the gene in Phamerator or the phams database. /note=Starterator: Start site 6 is a reasonable choice that is conserved among members of the pham; it corresponds to base pair 42341 in TforTroy. /note=Location call: Altogether, the information suggests that this is a real gene, as it is conserved in Phamerator and has good coding potential. Further, the most likely start site is the suggested one, which is conserved in Starterator and covers all of the coding potential. /note=Function call: The top NCBI BLASTp hit, sorted by E-value, suggests the function membrane protein, with high query coverage (100%), high identity (78.3784%), and a low E-value of 5.45686e-11. /note=Transmembrane domains: TMHMM and TOPCONS each predict one TMH, so the protein is a membrane protein, although its exact function is unknown. /note=Secondary Annotator Name: Scavetti, Alexa /note=Secondary Annotator QC: I have QC’d this annotation and I agree with the primary annotator. Don’t forget to fill out the Starterator drop-down menu and check off evidence in the boxes below!
CDS 42451 - 42801 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="TforTroy_67" /note=Original Glimmer call @bp 42499 has strength 9.38; Genemark calls start at 42499 /note=SSC: 42451-42801 CP: yes SCS: both-cs ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage DrManhattan] ],,NCBI, q5:s3 93.1034% 1.04283E-37 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.05, -4.3913063680686815, no F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage DrManhattan] ],,YP_009815413,70.6897,1.04283E-37 SIF-HHPRED: restriction endonuclease PacI; HNH restriction endonuclease, beta-beta-alpha-metal active site, 8 base-pair rare cutter, HYDROLASE-DNA complex; HET: SO4; 1.92A {Pseudomonas alcaligenes},,,3M7K_A,39.6552,83.4 SIF-Syn: /note=AF: leaving as NKF per recent harmonization (not enough evidence to support HNH) /note=Primary Annotator Name: Mao, Xuanting /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the start site at 42499 bp. The start codon at this site is GTG. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site (42499) includes all of it; coding potential is found in both the self-trained and host-trained GeneMark maps in the forward frame. /note=SD (Final) Score: The final score for this start site is -6.218, which is not the best because other start sites have less negative scores. The Z-score for this start site is 1.247. /note=Gap/overlap: The start site at 42499 gives a 44 bp gap, whereas the start site at 42451 gives a 4 bp overlap, which is syntenic with other genes in the pham map. /note=Phamerator: The pham number as of March 31, 2022 is 16105. The gene is conserved in 39 phages, of which 38 belong to cluster AZ and 1 is a singleton. /note=Starterator: The pham number as of March 31, 2022 is 16105.
The start called most often in the published annotations is number 18 (42451 in TforTroy); it was called in 9 of the 26 non-draft genes in the pham, with 9 manual annotations. TforTroy has this most annotated start site but does not call it; its called start is number 23, which has no manual annotations. /note=Location call: I would consider changing the start site from 42499 to 42451 because 42451 gives a 4 bp overlap, a pattern also seen in other phages such as Adolin and Elezi. Moreover, the start site at 42451 has the highest final score (-4.391) of all candidate start sites and the second-highest Z-score (2.05). It also covers all of the coding potential in both the self-trained and host-trained GeneMark maps in the forward frame. Starterator also shows that this is the start site most often chosen by annotators. /note=Function call: The function of this gene is likely HNH endonuclease: many significant NCBI BLAST hits have this function, with low e-values (~e-26), high coverage (~80%), and high identity (~59%). CDD has no data for this gene. However, PhagesDB Function Frequency gives a 30% likelihood (within cluster AZ) that this gene is an HNH endonuclease. PhagesDB BLAST hits to other finalized phages, such as Phives, also suggest HNH endonuclease as the function, with low e-values (~e-30). HHpred provides no useful information at this time, because its best hit has an e-value of only 0.057. Overall, other finalized phages in the pham map, such as Adolin and Adumb2043, show a high degree of synteny at this position with genes annotated as HNH endonuclease. /note=Transmembrane domains: None. Neither TMHMM nor TOPCONS predicts any TMDs.
/note=Secondary Annotator Name: Taylor, Amaya /note=Secondary Annotator QC: I QC’d this gene and agree that the start site needs to be changed. I don’t believe you need to check TMHMM and TOPCONS as evidence, since no data are given. CDS 43098 - 43283 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="TforTroy_68" /note=Genemark calls start at 43095 /note=SSC: 43098-43283 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein HOU52_gp67 [Arthrobacter phage Yang] ],,NCBI, q1:s2 100.0% 4.34041E-30 GAP: 296 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.211, -3.560322174045489, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp67 [Arthrobacter phage Yang] ],,YP_009815685,91.9355,4.34041E-30 SIF-HHPRED: SIF-Syn: There is synteny with other phages in cluster AZ. Compared with phage Crewmate, there is a significant gap upstream and an endonuclease upstream of that gap in both phages. /note=AF: chose SS 19 because it is the second of two tandem starts (different from what harmonization selected) /note=Primary Annotator Name: Saha, Atul /note=Auto-annotation: GeneMark calls the start at 43095. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both the self-trained and host-trained GeneMark maps. /note=SD (Final) Score: The start site at 43095 has the best Z-score (2.202) and the second-highest final score (-2.861). The highest final score (-2.034) belongs to the LORF, but its TTG start codon makes it an unlikely start. /note=Gap/overlap: The start @ 43095 gives a gap of 293 bases, which is large but the second smallest among the potential start sites. This gap is conserved in other subcluster AZ phages (Elezi, Lego, DrManhattan). There is no coding potential in the gap. /note=Phamerator: pham: 17702. Date 4.6.22. It is conserved; found in Elezi, Lego, DrSierra (all subcluster AZ). /note=Starterator: Pham 17702 has 30 members, 10 of which are drafts.
Start site 22 in Starterator was manually annotated in 12 of 27 non-draft genes in this pham. Start 22 is 43095 in TforTroy. This evidence agrees with the site predicted by GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene with a start site at 43095 bp. Starterator agrees with GeneMark. /note=Function call: No known function was returned by PhagesDB BLAST, NCBI BLAST, HHpred, or CDD. The top NCBI BLAST hits are for Arthrobacter phages Yang, Lizalica, and Crewmate, with identities of 65-91%, coverage of 90-100%, and e-values from 1.6e-16 to 4.5e-31. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts a transmembrane domain, so this is NOT a membrane protein. /note=Secondary Annotator Name: Vazquez, Gilda /note=Secondary Annotator QC: I have QC’d this annotation, and I agree with the primary annotator. Given the current PECAAN notes and evidence for this gene, I agree with the location call. CDS 43283 - 43591 /gene="69" /product="gp69" /function="HNH endonuclease" /locus tag="TforTroy_69" /note= /note=SSC: 43283-43591 CP: no SCS: neither ST: NI BLAST-Start: [HNH endonuclease [Arthrobacter phage DrSierra] ],,NCBI, q1:s1 98.0392% 2.02719E-51 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.491, -3.8156109669835954, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage DrSierra] ],,YP_010678391,95.0,2.02719E-51 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,62.7451,97.2 SIF-Syn: /note=added gene -AF