CDS 130 - 558 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="Emotion_1" /note=Original Glimmer call @bp 130 has strength 10.9; Genemark calls start at 130 /note=SSC: 130-558 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 2.95461E-58 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.188, -2.2763497933341483, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Elezi] ],,QNJ56502,76.6667,2.95461E-58 SIF-HHPRED: Terminase_4 ; Phage terminase, small subunit,,,PF05119.15,47.1831,98.7 SIF-Syn: Terminase, small subunit, no upstream gene, downstream gene is terminase, large subunit, just like in phage Warda. /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 130. Start codon is ATG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF has reasonable coding potential and the predicted start site covers all of it. /note=SD (Final) Score: SD final score is -2.276. Z-score at 3.188. These are the best scores on PECAAN. /note=Gap/overlap: This is the first gene so there is no overlap with an upstream gene possible. The length of the gene is acceptable. /note=Phamerator: Pham 48777. Date 9/27/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Warda (AZ), Tweety19 (AZ), Crewmate (AZ), and more. The function called for this gene was terminase small subunit. Function call was consistent and is in the approved functions list. /note=Starterator: Start site 50 was manually annotated in 32/135 non-draft genes in this pham. Start site 50 is 130, which agrees with the start site predicted by Glimmer and GeneMark. Start site 50 was the most annotated for this pham. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 130. 
/note=Function call: Predicted function terminase, small subunit. The top 2 Phagesdb BLAST hits have the function of terminase, small subunit (E-value = 5e-49), and the top 3 NCBI BLAST hits also have the function of terminase, small subunit (100% coverage, 66%+ identity, and E-value <10^-57). HHpred has two good hits for terminase, small subunit (98.7% probability, 47% coverage, and E-value of 6.83e-8; 98.5% probability, 57% coverage, E–value of 0.0000015). CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: Need to mention if the length of the gene is reasonable or not; Make note of the updated Starterator data, a different start site was called for Emotion. For function call, no information was mentioned for TOPCONS. Synteny box needs to be corrected: there is an upstream gene but no down stream gene. Also, there is a different Pham #. (This QC comment has been addressed; 10/17/22) CDS 551 - 2290 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="Emotion_2" /note=Original Glimmer call @bp 551 has strength 15.58; Genemark calls start at 551 /note=SSC: 551-2290 CP: yes SCS: both ST: NI BLAST-Start: [terminase large subunit [Pseudarthrobacter siccitolerans] ],,NCBI, q20:s7 96.7185% 0.0 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.359, -2.838489286749966, yes F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Pseudarthrobacter siccitolerans] ],,WP_200900743,84.127,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,91.5371,100.0 SIF-Syn: Terminase, large subunit, upstream gene is terminase small subunit, downstream gene is portal protein just like in phage Warda /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 551 Start Codon: ATG /note=Coding Potential: Reasonable 
coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best. The final score of -2.838 is the least negative and the Z-score of 3.359 is a good Z-score /note=Gap/overlap: Overlap of 8 bp. Other start sites produce a longer ORF but have a worse Z-score and larger overlap. /note=Phamerator: The gene is in pham 51380 as of 10/16/22. The pham is conserved among cluster A, which is not the cluster Emotion is in. Some phages used for comparison were 40AC, Abbyshoes, and AbbysRanger. The function commonly called is terminase large subunit /note=Starterator: There is not a reasonable start site choice conserved among the members of the pham. The most common start site is 56, found in 599 of the 1157 non-draft phages in the pham. Emotion does not contain this start site. Emotion calls site 43 at 551, which is only found in 3 genes. /note=Location call: The gene is a real gene as it is conserved in Phamerator and has good coding potential. 551 is a likely potential start site as it has a good RBS as well as Z-score. The start site is likely as it produces the longest ORF with a minimized gap. /note=Function call: Terminase, large subunit. PhagesDB BLAST showed that the function called was terminase, large subunit with relatively high identities of greater than 75% and low E-values of 0. NCBI BLAST also calls the protein as terminase, large subunit with high identities of greater than 75%, high coverage of greater than 95%, and low E-values of 0. HHpred top 5 queries showed that the function called was large terminase with high probability of 100%, high coverage of greater than 85%, and low E-values of less than 10^-31. CDD had one hit with low identity, high coverage of 70%, and low E-value of 2.99*10^-8. /note=Transmembrane domains: No transmembrane domains called. 
/note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I agree with most of the annotation, but the Starterator section needs some editing. CDS 2309 - 3673 /gene="3" /product="gp3" /function="portal protein" /locus tag="Emotion_3" /note=Original Glimmer call @bp 2309 has strength 22.23; Genemark calls start at 2309 /note=SSC: 2309-3673 CP: yes SCS: both ST: NI BLAST-Start: [portal protein [Arthrobacter phage Tweety19]],,NCBI, q4:s3 96.696% 0.0 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.348, -2.4811409279978642, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Tweety19]],,QNO12665,85.6828,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_E,92.2907,100.0 SIF-Syn: Portal protein. Terminase small subunit and terminase large subunit genes are upstream. Downstream are scaffolding and major capsid protein, but there are 2 genes between gene (stop@3673 F) and the scaffolding protein that do not have a match in phages Warda (AZ) and Tbone (AZ). Otherwise, Warda and Tbone have the same gene function order upstream and downstream. /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: Both Glimmer and GeneMark were used to autoannotate the gene and agreed on an ATG start site at 2309. /note=Coding Potential: Very strong coding potential from ~2320 to ~3640 with few dips in between. /note=SD (Final) Score: -2.481. It is the best on PECAAN. /note=Gap/overlap: 18 bp gap (without coding potential), a reasonable gap and ORF length /note=Phamerator: Pham 48570 (accessed 9/27/22); highly conserved across several clusters, prominently clusters A and AZ. 1544 hits in total, the large majority of which call portal protein function, indicating strong evidence this is a real gene. 
/note=Starterator: Not informative, since the most annotated start site is not present and the most likely start site is present only twice in the entire pham and is not called in the other gene. Evidence still agrees with the site auto-annotated by Glimmer and GeneMark. /note=Location call: Coding potential, synteny, overlap and spacing, and gene length indicate this is a real gene. Matching GeneMark and Glimmer start site auto-annotation, along with start codon identity, lack of large gaps before start site, coverage of coding potential and favorable RBS and Z-score indicate 2309 is the correct start site. /note=Function call: The top 3 NCBI BLASTp hits, sorted by E-value, suggested portal protein function. These hits had high query coverage (>96%), moderately high %identity (>71%), and low E-values (0). The top 6 PhagesDB BLASTp non-draft hits by E-value also suggested function as a portal protein. These hits had moderately high positive coverage (>87%), moderately high %identity (>73%), and low E-values (0). CDD yielded 1 hit with low E-value (2e-43) and decent coverage (87%) for portal protein in the Gp6 family. Although the identity and coverage are quite low (19% and 35%), the hit matches the portal protein function called by Phamerator, NCBI BLASTp, and HHpred. HHpred yielded 4 hits with strong probability (>99.8%), coverage (>91%) and E-value (>4e-17) for portal protein functionality. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is highly unlikely to be a membrane protein. 
/note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: pham changed to 50040 CDS 3677 - 4474 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="Emotion_4" /note=Original Glimmer call @bp 3677 has strength 15.17; Genemark calls start at 3677 /note=SSC: 3677-4474 CP: yes SCS: both ST: SS BLAST-Start: [MuF-like minor capsid protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 84.5283% 1.01838E-106 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.945, -2.7653702713535186, yes F: hypothetical protein SIF-BLAST: ,,[MuF-like minor capsid protein [Arthrobacter phage Tweety19]],,QNO12666,72.1569,1.01838E-106 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Both Glimmer and GeneMark were used and call the start site 3677. The start codon called is ATG /note=Coding Potential: There’s good coding potential within the putative ORF as the chosen start site covers the potential. Coding potential is found in both GeneMark Self and Host only on the forward strand indicating this is a forward gene. /note=SD (Final) Score: -2.765. This is the best SD score on PECAAN. /note=Gap/overlap: There is an upstream gap of 3bp. Other alternatives were not chosen as they result in a gap greater than 200 bp (with one greater than 80 bp). The length is acceptable (>120 basepairs). /note=Phamerator: Pham 47725 as of 9/28/22. While the cluster of emotion is unknown, the pham is conserved in multiple clusters such as the AZ cluster in members Liebe and Maureen. Of the phages with this gene that call a gene function, they all call the MuF-like minor capsid protein function meaning the function called is consistent. However, this function corresponds to “hypothetical protein” on the approved function list. /note=Starterator: Start site #5 is most reasonable as it’s the most annotated site in the 11 member pham, called in 3/8 non-draft genomes and corresponds to basepair 3677. 
/note=Location call: This is a real gene and the most likely start site is 3677 as there is good coding potential and it is conserved in Phamerator. Start site #5 is most likely as it’s the most annotated site in the pham and called in 3/8 non-draft genomes. /note=Function call: NKF. The top 3 PhagesDB BLAST hits (E-value <10^-44) and the top NCBI BLAST hit call the MuF-like minor capsid protein function, which corresponds to the function “hypothetical protein.” HHpred did not provide strong evidence for function due to poor E-value and % coverage of hits. The CDD hit is not informative due to very low % identity and coverage in addition to poor E-value. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: needs attention: Check to see if draft genes are informative in Starterator report. If not, please remove from Starterator call. Also, check capitalization on PhagesDB. [QC comments addressed 10/17/22] /note=**Note** There is strong evidence to call the function MuF-like minor capsid protein, but this function is no longer being called as it was recently removed from the SEA-PHAGES approved function list (April 2022 forum). The program evidence was checked because it indicates the function MuF-like minor capsid protein, which is no longer called, and therefore supports calling NKF. 
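The GAP values reported for genes 2 to 4 can be reproduced from the listed coordinates. The following is a minimal sketch, assuming all genes are on the forward strand and that a gap is counted as the number of bases strictly between the upstream stop and the start (so a negative value is an overlap). The function name `gap_to_upstream` is hypothetical and is not part of PECAAN or any other tool named in these notes.

```python
def gap_to_upstream(upstream_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - upstream_stop - 1


# (start, stop) coordinates for genes 1-4, copied from the CDS lines above.
cds = [(130, 558), (551, 2290), (2309, 3673), (3677, 4474)]

# Compare each gene with the one immediately upstream of it.
gaps = [gap_to_upstream(prev[1], cur[0]) for prev, cur in zip(cds, cds[1:])]
print(gaps)
```

Under this convention the results agree with the GAP fields recorded for genes 2, 3, and 4 (-8, 18, and 3 bp).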
CDS 4555 - 5112 /gene="5" /product="gp5" /function="scaffolding protein" /locus tag="Emotion_5" /note=Original Glimmer call @bp 4555 has strength 22.01; Genemark calls start at 4555 /note=SSC: 4555-5112 CP: yes SCS: both ST: NA BLAST-Start: [scaffolding protein [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 97.2973% 7.03907E-42 GAP: 80 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.027, -2.66371296573402, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Powerpuff] ],,QGZ17304,65.9218,7.03907E-42 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_g,45.9459,96.9 SIF-Syn: Scaffolding protein, upstream gene has NKF, downstream gene is major capsid protein, just like in phage Adolin. /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark both call the start site at 4555 (ATG). /note=Coding Potential: Yes, coding potential in this sequence is on the forward strand. This coding potential is found on both Host-trained and Self-trained GeneMark. /note=SD (Final) Score: -2.664. This is the only entry on PECAAN. /note=Gap/overlap: 80 bp; the gap is smaller than 120 bp and conserved in several other phages. The length of the gene is 558 bp. /note=Phamerator: The pham number is 1850; Date: 9/28/2022. All members of the pham call the function of a scaffolding protein, indicating the protein is conserved. This gene is conserved in Adolin, Adumb2043, and Amyev, which are all in cluster AZ. /note=Starterator: There are 37 non-draft members of this pham, and 34 out of 37 members call start site #16. It is the most conserved start site among the members within the pham, and it corresponds to the start site at 4555 in Emotion. /note=Location call: Based on all evidence, the most likely start site is @4555 /note=Function call: Scaffolding protein. 
The top 3 non-draft PhagesDB BLAST hits have the function of scaffolding protein (E-value < 10^-40), and 5 out of 5 top NCBI BLAST hits also have the function of a scaffolding protein (coverage >97% and E-value < 10^-40). HHpred has a hit for scaffold protein, albeit with an E-value of 0.081, which is considered together with all other evidence. CDD had no relevant hits. /note=Transmembrane domains: both TMHMM and TOPCONS display no TMDs /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: would be helpful to note all pham members call function (conserved), everything else looks good and I agree! CDS 5160 - 6125 /gene="6" /product="gp6" /function="major capsid protein" /locus tag="Emotion_6" /note=Original Glimmer call @bp 5160 has strength 20.61; Genemark calls start at 5160 /note=SSC: 5160-6125 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage Liebe] ],,NCBI, q1:s1 100.0% 0.0 GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.798, -3.123974469270304, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage Liebe] ],,YP_009817038,87.8882,0.0 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_D,93.4579,100.0 SIF-Syn: Major capsid protein, upstream gene is scaffolding protein, downstream is head to tail adaptor, just like in phage Adolin (AZ). /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Glimmer and GeneMark agree on start at 5160. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, agreeing with the auto-annotation call. Start site 5160 covers all the coding potential, which is apparent in both GeneMark Self and Host. /note=SD (Final) Score: -3.124, which is the best final score on PECAAN. The Z score is also the highest at 2.798. /note=Gap/overlap: Gap is small at 47 bp. There is no overlap and adjoining genes are also in the forward direction. 
/note=Phamerator: Pham 228, which has a function call of major capsid protein. Pham 228 is a well-conserved pham with 248 members across actinobacteriophage clusters BD, L, AZ, DZ, M, EH, V, BQ, and DU. /note=Starterator: Start site 5 (5160) is called 100% of the time when present in pham 228. 43 of 224 non-draft genes have called this start site. Phages with this called start site are primarily of cluster AZ, but may also be of EH, BQ, BD6, or singletons. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 5160. /note=Function call: Major capsid protein. PhagesDB BLAST hits have very low E-values (1e-145) and solely call major capsid protein or “function unknown” for draft genomes. HHpred also calls major capsid protein with high probability (100 or 99.9%), high coverage (up to 93%), and low E-value (1.2e-26). CDD calls pfam11561 (E-value 7.49e-15), which is a coat protein for phage P22 and is related in function to the major capsid protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: I agree with this annotation, all the categories have been considered. 
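Each record carries a compact summary note (`SSC: ... GAP: ... RBS: ...`) that repeats the start, stop, gap, Z-score, and final score discussed in the longer notes. The sketch below pulls those values out with a regular expression. The pattern, the function name `parse_summary`, and the dictionary keys are illustrative assumptions based only on the layout visible in these records, not an official PECAAN format description. The sample string is the gene 6 summary with the BLAST-Start field left out for brevity.

```python
import re

# Assumed layout: "SSC: start-stop ... GAP: n bp gap ... RBS: a, b, z, final, ..."
SUMMARY = re.compile(
    r"SSC: (?P<start>\d+)-(?P<stop>\d+) .*?"
    r"GAP: (?P<gap>-?\d+) bp gap .*?"
    r"RBS: [^,]+, [^,]+, (?P<z>-?[\d.]+), (?P<final>-?[\d.]+),"
)


def parse_summary(note: str) -> dict:
    """Extract coordinates, gap, Z-score, and final score from a summary note."""
    match = SUMMARY.search(note)
    if match is None:
        raise ValueError("summary note does not match the assumed layout")
    return {
        "start": int(match["start"]),
        "stop": int(match["stop"]),
        "gap": int(match["gap"]),
        "z_score": float(match["z"]),
        "final_score": float(match["final"]),
    }


# Gene 6 summary from the record above (BLAST-Start field omitted).
note = (
    "SSC: 5160-6125 CP: yes SCS: both ST: SS GAP: 47 bp gap LO: yes "
    "RBS: Kibler 6, Karlin Medium, 2.798, -3.123974469270304, yes "
    "F: major capsid protein"
)
summary = parse_summary(note)
print(summary)
```

A parsed summary makes it straightforward to cross-check the free-text notes, for example that the gap and scores quoted by the annotator match the values PECAAN recorded.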
CDS 6195 - 6596 /gene="7" /product="gp7" /function="head-to-tail adaptor" /locus tag="Emotion_7" /note=Original Glimmer call @bp 6195 has strength 16.88; Genemark calls start at 6195 /note=SSC: 6195-6596 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Tweety19]],,NCBI, q1:s1 98.4962% 2.34049E-67 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.188, -2.338663114094478, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Tweety19]],,QNO12669,88.6364,2.34049E-67 SIF-HHPRED: 15 PROTEIN; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_C,81.203,99.0 SIF-Syn: The region with Gene (stop@ 6596 F) has strong synteny with genomes Adolin and Adumb2043. /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Glimmer Start@6195, GeneMark Start@6195, start codon sequence: ATG /note=Coding Potential: Both the start and stop site that form the ORF are in the right locations. There is high coding potential covering the ORF. /note=SD (Final) Score: The SD final score is -2.339. This is the only provided final score for this gene. /note=Gap/overlap: There is a gap of 69 bp between this gene and the downstream gene and a space of 11 between this gene and the upstream gene. /note=Phamerator: As of 09/23/22 this gene was a part of pham 48979, which has 64 members total and 17 of those members are draft genes. Emotion has strong synteny with most, if not all, genomes that call start 7 for this region containing Gene (stop@6596 F). Among the final drafts that have synteny are Adolin, DrManhattan, and YesChef. JohnDoe, Cassia, and Tutumahutu are some of the drafts that have synteny in this region. 
It should be noted that while there is synteny, the genes are not completely matched; for example, the gene 7 of Emotion is not the counterpart of gene 7 on Adolin and instead is the counterpart of gene 8 on Adolin. On all of these that call start 7, the counterparts are slightly shifted. /note=Starterator: In pham 48979, the most annotated start is start 7, which was called by more than half of the genes at 36 out of 64 genes. It also has the most manual annotations of any of the other starts, making up 24 of 47 manual annotations. For Emotion, start 7 is at 6195 which matches the auto generated start and Glimmer start provided in PECAAN. /note=Location call: Glimmer and Starterator all indicate that 6195 is the start of this gene. PECAAN also only provides one suggested candidate for this gene, and it has this start as well. So it is highly likely that 6195 is the start of this gene. /note=Function call: Head to tail adaptor. All the results for Phagesdb function call identify this gene as a head-to-tail adaptor. For Phagesdb Blast, almost all of the results call this gene as a head-to-tail adaptor as well. This includes the top hits Janeemi which has an low e-value at 2e-54 and Lizalica which also has an e-value of 2e-54. The top hits on HHpred do not call this gene as a head-to-tail adaptor. The top results on HHpred have high probability of at least 99%, high coverage of at least 81%, low e-values at 6.3e-10 and 1.9e-8. These hits suggest that this gene is for a hypothetical protein and that it is a viral protein that might be involved with the head and the tail. NCBI blast has top hits that are at least 80% aligned, have 90% coverage, and low e-values below 2.3e-67. One of the hits indicates that this is a hypothetical protein, but two other of the top hits identify this as a head-to-tail adaptor. According to CDD there are no conserved domains for this gene. 
Because the majority of these databases indicate that this is a head-to-tail adaptor, the function is most likely a head-to-tail adaptor. /note=Transmembrane domains: TMHMM and TOPCONS do not have any hits for this gene, so it is not likely to contain a transmembrane domain. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the location and function annotation of this gene. I believe the hypothetical protein hits on NCBI BLAST and HHpred should be unchecked because they do not contribute to the function call. Also, a note on your Phamerator section: Gene number doesn't matter for determining synteny, only upstream and downstream gene function calls matter (great job giving detailed descriptions of those in the synteny box!). CDS 6596 - 6736 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="Emotion_8" /note=Original Glimmer call @bp 6596 has strength 16.96; Genemark calls start at 6596 /note=SSC: 6596-6736 CP: yes SCS: both ST: NI BLAST-Start: GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.588, -3.5660483139923818, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF, upstream gene is head to tail stopper, downstream is head to tail adaptor, just like in phage Warda. /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@6736 F) Both GeneMark and Glimmer called the start site of 6596. Start codon is ATG. /note=Coding Potential: The gene does have reasonable coding potential within the ORF and does include the listed start site of 6596. /note=SD (Final) Score: SD (Final) score of -3.566 and a Z score of 2.588 at start site 6596, which is not the best score in comparison to the other potential start site. But start site 6596 was chosen since it has better coding potential within the ORF, in comparison to the other candidate starts. /note=Gap/overlap: The gene has a reasonable 1 bp overlap with the upstream gene. 
The length of the gene is 141 bp given the start and stop site. The candidate start site with the longest ORF was chosen as the best start site. /note=Phamerator: As of 9/27, the gene is found in Pham 48205. No other phage from a different or same cluster/subcluster was found under the same pham #. No function was called for this gene under this pham number. /note=Starterator: As of 9/27, there is no Starterator data for this gene. /note=Location call: Considering the evidence available, I believe this gene is a real gene with the start site at 6596. However, more evidence is needed to make a conclusion as there is no data from Starterator and Phamerator. /note=Function call: No informative hits were produced as other and draft phages showed no known function with extremely high E-values on PhagesDB BLAST and NCBI BLASTp. No informative hits on CDD and HHpred. Therefore, there is not enough data to develop a hypothesis on the function of the gene. /note=Transmembrane domains: No transmembrane domains were predicted in TMHMM or in TOPCONS, therefore it’s not a membrane protein. /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I QC'ed this gene and confirm that it is correct. Correction might be needed under SD score: the other start candidate technically has a better final score and a better Z-score. But candidate 6596 is still better since it matches the ORF on Host-trained GeneMark. 
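The gene lengths quoted in these notes follow from the inclusive coordinates: length = stop - start + 1. The sketch below recomputes the length for three of the genes above and checks that each is a whole number of codons and exceeds the 120 bp figure quoted in the gene 4 notes. Treating 120 bp as a hard cutoff is an assumption made for illustration, and `cds_length` is a hypothetical helper name.

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS given inclusive 1-based coordinates."""
    return stop - start + 1


MIN_LENGTH_BP = 120  # figure quoted in the gene 4 notes; used here as an assumed cutoff

# (start, stop) coordinates copied from the CDS lines above.
genes = {"gene 5": (4555, 5112), "gene 6": (5160, 6125), "gene 8": (6596, 6736)}

report = {}
for name, (start, stop) in genes.items():
    length = cds_length(start, stop)
    report[name] = {
        "length_bp": length,
        "whole_codons": length % 3 == 0,       # stop codon is included in the span
        "long_enough": length > MIN_LENGTH_BP,
    }
    print(name, report[name])
```

For gene 8 this gives 141 bp, matching the length stated in its Gap/overlap note.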
CDS 6723 - 7064 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="Emotion_9" /note=Original Glimmer call @bp 6723 has strength 17.64; Genemark calls start at 6723 /note=SSC: 6723-7064 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage Liebe] ],,NCBI, q1:s1 97.3451% 1.06048E-34 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.945, -2.827683592113848, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Liebe] ],,YP_009817041,69.5652,1.06048E-34 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_F,96.4602,99.7 SIF-Syn: /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 6723. Start codon is TTG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. The ORF has reasonable coding potential and the predicted start site covers all of it. /note=SD (Final) Score: SD final score is -2.828. Z-score at 2.945. These are the best scores on PECAAN. /note=Gap/overlap: There is an overlap of 14 bp, which is less than the recommended 50 bp limit, which is acceptable. /note=Phamerator: Pham 49013. Date 9/29/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Warda (AZ), Tweety19 (AZ), Crewmate (AZ), and more. The function called for this gene was head-to-tail stopper. Function call was consistent and is in the approved functions list. /note=Starterator: Start site 7 was manually annotated in 32/38 non-draft genes in this pham. Start site 7 is 6723, which agrees with the start site predicted by Glimmer and GeneMark. Start site 7 was the most annotated for this pham. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 6723. 
/note=Function call: Predicted function head-to-tail stopper, based on hits from Phagesdb Blast, NCBI BLASTp, and HHpred, all of which had hits with high query coverage (>93%), high % identity (>53.9%), and low E-values (<4.5e-17). CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: VU, TRINITY /note=Secondary Annotator QC: pham number changed to 50087, re-look at starterator as now new most annotated start/start called by emotion since new pham. (The comments have been addressed; 10/17/22) CDS 7073 - 7354 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="Emotion_10" /note=Original Glimmer call @bp 7073 has strength 14.35; Genemark calls start at 7073 /note=SSC: 7073-7354 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_REEDO_11 [Arthrobacter phage Reedo]],,NCBI, q1:s1 96.7742% 2.48289E-22 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.793, -3.7234890726758456, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_REEDO_11 [Arthrobacter phage Reedo]],,UJQ86801,64.3564,2.48289E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 7073 Start Codon: ATG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential. Gene length is reasonable /note=SD (Final) Score: SD score is the best. The final score of -3.723 is the least negative and the Z score of 2.793 is a good z score /note=Gap/overlap: Gap of 8 bp is reasonable. This start site produces the largest ORF. /note=Phamerator: The gene is in pham 51664 as of 10/16/22. The pham is conserved among cluster AZ which is not the cluster Emotion is in. Some phages used for comparison were Adolin, Adumb2043, and Amyev. There is no function commonly called /note=Starterator: There is a reasonable start site choice conserved among the members of the pham. 
Start site 25 corresponds to 7073 bp found in 34 of the 63 non draft phages in the pham and is the most common start site. Start site 25 was called for this phage. /note=Location call: The gene is a real gene as it is conserved in phamerator and has good coding potential. 7073 is a likely potential start site as it is conserved in starterator and has a good RBS as well as Z-score. The start site is likely as it produces the longest ORF with a minimized gap. The gene also shows synteny with phage Warda /note=Function call: Unknown. PhagesDb and NCBI produce results of function unknown or hypothetical protein with low e-values of less than 10^-15 and coverage of greater than 95%. HHPred and CDD produced no results with e-values of less than 10^-3. /note=Transmembrane domains: No transmembrane domains known /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: Need to mention if the gene length is reasonable or not under gap/overlap. The pham # and starterator changed. Didn`t mention any information from TOPCONS. CDS 7354 - 7758 /gene="11" /product="gp11" /function="tail terminator" /locus tag="Emotion_11" /note=Original Glimmer call @bp 7354 has strength 13.34; Genemark calls start at 7354 /note=SSC: 7354-7758 CP: yes SCS: both ST: NI BLAST-Start: [tail terminator [Arthrobacter phage KeAlii]],,NCBI, q1:s1 97.7612% 1.59174E-46 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.093, -4.482213237254482, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage KeAlii]],,UDL14618,73.6842,1.59174E-46 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,97.0149,98.8 SIF-Syn: The upstream to downstream order of phams 49013, 997, 47361 (this gene), 43323, and 27587 (accessed 9/29/22) matches Warda and Tbone in cluster AZ and Cen1621 and Floof in cluster EB. 
/note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: GeneMark and Glimmer auto-annotated an ATG start site at 7354. /note=Coding Potential: Very strong coding potential from ~7350 to ~7650. Coding potential drops off ~100 bp before the stop codon. /note=SD (Final) Score: -4.482. It is the best on PECAAN. /note=Gap/overlap: 1 bp overlap. Reasonable gap length and ORF coverage of coding potential. /note=Phamerator: Pham 47361 (accessed 9/29/22); universally conserved in clusters AZ and EH, matching Adolin, Asa16, Cen1621, and Floof, all with tail terminator function. Strong evidence this is a real gene. /note=Starterator: Not informative. Most annotated start site is not present and auto-annotated start site is unique in the pham. /note=Location call: Coding potential, synteny, overlap and spacing, and gene length indicate this is a real gene. Matching GeneMark and Glimmer start site auto-annotation, along with start codon identity, lack of large gaps before start site, coverage of coding potential and favorable RBS and Z-score indicate 7354 is the correct start site. /note=Function call: The top 3 NCBI BLASTp hits, sorted by E-value, suggested tail terminator function. These hits had high query coverage (>97%), moderate %identity (>54%), and low E-values (68% sequence identity and >97% query coverage. Additionally, the top HHpred hit gives a structure that corresponds to the major tail protein function with a 98.3% probability, 90.76% coverage, and an E-value of 0.000046. The major tail protein is an approved SEA-PHAGES list function. /note=Transmembrane domains: It is not a membrane protein because neither TMHMM nor TOPCONS predicts any TMDs. 
/note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: I agree with all calls CDS 8425 - 8691 /gene="13" /product="gp13" /function="tail assembly chaperone" /locus tag="Emotion_13" /note=Original Glimmer call @bp 8425 has strength 20.35; Genemark calls start at 8425 /note=SSC: 8425-8691 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Liebe] ],,NCBI, q4:s5 96.5909% 1.61392E-28 GAP: 99 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.359, -2.45827804503836, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Liebe] ],,YP_009817046,68.5393,1.61392E-28 SIF-HHPRED: Phage_TAC_10 ; Phage tail assembly chaperone,,,PF10963.11,77.2727,95.8 SIF-Syn: tail assembly chaperone, the upstream gene is major tail protein, downstream gene is tape measure protein, just like Adolin (AZ) /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark both call the start site at 8425. Start codon is ATG. /note=Coding Potential: Yes, coding potential in this sequence is on the forward strand. This coding potential is found in both Host-trained and Self-trained GeneMark. /note=SD (Final) Score: -2.458. This is the best final score listed on PECAAN among the candidate start sites. /note=Gap/overlap: 99 bp gap; the gap is conserved in several other phages. /note=Phamerator: The pham number is 27587; Date: 9/29/2022. This gene is conserved in Adolin, Adumb2043, and Amyev, which are all in cluster AZ. /note=Starterator: There are 40 non-draft members of this pham, and 35 of the 40 call start site #6. It is the most conserved start site among the members of the pham, and it corresponds to the start site at 8425 in Emotion. /note=Location call: Based on all evidence, the most likely start site is 8425. /note=Function call: Tail assembly chaperone. 
The top 3 non-draft Phagesdb BLAST hits have the function of tail assembly chaperone (E-value < 10^-20), and 3 of the top 5 NCBI BLAST hits also have the function of tail assembly chaperone (coverage >96% and E-value < 10^-25). HHpred has two hits for tail assembly chaperone, albeit with weak E-values of 0.28 and 0.81. Considering all hits listed above and the pham conservation of function, the gene has the function of tail assembly chaperone. CDD had no relevant hits. /note=Transmembrane domains: Both TmHmm and TOPCONS display no TMDs. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: Needs attention: check gap length, and consider leaving out HHpred hits with excessive E-values. The case looks weaker for including them because of the poor E-values. Consider using pham conservation of function instead. Also, I don't think synteny should be constructed using NKF. Consider indicating pham identity instead. CDS join(8425..8685,8685..9035) /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="Emotion_14" /note= /note=SSC: 8425-9035 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage Liebe] ],,NCBI, q4:s5 96.0591% 1.94435E-69 GAP: -267 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.359, -2.45827804503836, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Liebe] ],,YP_009817045,69.8492,1.94435E-69 SIF-HHPRED: SIF-Syn: CDS 9044 - 11284 /gene="15" /product="gp15" /function="tape measure protein" /locus tag="Emotion_15" /note=Original Glimmer call @bp 9044 has strength 14.6; Genemark calls start at 9044 /note=SSC: 9044-11284 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 93.5657% 0.0 GAP: 7 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.171, -4.4057176022952405, yes F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage Tweety19]],,QNO12677,69.0382,0.0 SIF-HHPRED: Tape Measure 
Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,80.1609,99.6 SIF-Syn: The region where this gene is located shows good synteny with the genomes of phages Liebe and Maureen. There are several pham lines indicating their alignment, but there isn`t a pham line for this particular gene. Maureen and Liebe seem to have gene slipping upstream of the gene that is also a tape measure protein but does not have a pham line with Gene Stop@11284. /note=PECAAN Notes /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Both GeneMark and Glimmer indicate that the start site is at 9044. The start codon sequence is GTG. /note=Coding Potential: The ORF on Host-Trained GeneMark has the start site at 9044 and the stop site at 11284. There is high coding potential throughout the ORF. /note=SD (Final) Score: Based on evidence from Starterator, the best candidate on PECAAN is the gene with the start site at 9044. This candidate has an SD Final score of -4.406 and a Z- score of 2.171. /note=Gap/overlap: There is a gap of 352 with this gene and the upstream gene, and a spacer of 9 with the downstream gene. Because there is a gap so large, there potentially could be another gene upstream of this gene, and based on synteny with other genomes, there could be gene slipping occurring. /note=Phamerator: As of 9/30/22 this gene was a part of 50449. There are 44 members in this pham and 14 of them are drafts. Many others in this pham are tape measure proteins. /note=Starterator: The start for this gene is Start 1. Start 1 is the most annotated gene and was called by 41 out of the 44 genes in this pham, and has 28 out of the 30 manual annotations. Because of this, Start 1 which matches the start site 9044 for this gene, is the correct start for this gene. /note=Location call: The start site 9044 is matching between Starterator, Glimmer, and GeneMark so this is most likely the correct start site. 
/note=Function call: Tape measure protein. Gene Stop@11284 has several pieces of evidence from Phagesdb BLAST, NCBI BLAST, and HHpred, all indicating that it is a tape measure protein. CDD has one hit with coverage greater than 50% and a low e-value; however, it describes this gene as having an unknown function, so it was not included as evidence. /note=Transmembrane domains: There are 10 hits by TmHmm, all indicating that this gene most likely encodes a protein with transmembrane domains. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: The gap with the upstream gene is fairly large, so I think an explanation is needed to justify why this pick is reasonable. It could be helpful to note in the Phamerator section that most pham members call the function of tape measure protein. HHpred hits have low coverage; more explanation is needed to include them as evidence. CDS 11297 - 12160 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="Emotion_16" /note=Original Glimmer call @bp 11297 has strength 14.65; Genemark calls start at 11297 /note=SSC: 11297-12160 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Brevibacterium phage LuckyBarnes] ],,NCBI, q1:s3 100.0% 1.91707E-48 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.414, -3.8366550365551095, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Brevibacterium phage LuckyBarnes] ],,YP_009792199,50.5085,1.91707E-48 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,98.9547,100.0 SIF-Syn: Minor tail protein, upstream gene is minor tail protein, downstream gene is tape measure protein, just like in phage Adumb2043 /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@12,160 F) Both Glimmer and Genemark agree on the start site of 11297. Start codon is ATG. 
/note=Coding Potential: The gene has reasonable coding potential within the ORF, covering the region between the start site at 11297 and the stop site at 12160. /note=SD (Final) Score: Yes, the SD score for start site 11297 is the best SD score, -3.837, with a Z score of 2.414. /note=Gap/overlap: The gene with the candidate start site 11297 has a length of 864 bp and a reasonable gap of 12 bp relative to the upstream gene. The candidate start site was chosen as the LORF. /note=Phamerator: As of 9/29, the gene was found in pham# 44612. For comparison, I used the non-draft phage Adumb2043, which belongs to cluster AZ and has the same pham#. Emotion phage is unclustered. According to the PhagesDB Pham Database, non-draft phages in cluster AZ and in other clusters/subclusters called the function of minor tail protein, which is consistent with the approved functions list. /note=Starterator: As of 9/29, start site number 10 is at 11297; 9/78 called this start site. Start site 10 is not the most annotated start site. Emotion is an unclustered phage, but other phages in the same pham # in cluster AZ, like Adumb2043 (according to the PhagesDB pham database), were used for comparison. Adumb2043 has start site 3, which is also not the most annotated start site. Based on this logic, the start site 11297 is a reasonable start site for this gene. /note=Location call: Based on the evidence gathered, the gene is a real gene with a potential start site at 11297. Starterator agrees with Glimmer and Genemark, and there is reasonable coding potential between the candidate start site 11297 and the stop site 12160. /note=Function call: Predicted function is minor tail protein, based on multiple hits from NCBI BLASTp and PhagesDB BLAST with the suggested function. Multiple hits have high query coverage of >93.7282%, an identity % of >37.7622%, and e-values of 1.91707e-48 or closer to 0. No relevant hits from CDD, HHPRED, and TOPCONS. 
/note=Transmembrane domains: No TMDs were predicted by TmHMM. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the location and function annotation of this gene based on the evidence provided. The following sections need attention - SD (Final Score) is better when it is less negative. Z score is also better the higher the value. || Pham Starterator: The dropdown box needs to be "suggested start (SS)" because starterator called the correct start site as per manual annotation. You could better support the starterator claim in your PECAAN notes by mentioning that Emotion doesn't contain any of the more frequently annotated start sites (such as 3 and 8) and that its earliest possible start site is start site 10, which is only a few bp away from the most commonly annotated start site of 8. CDS 12170 - 13147 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="Emotion_17" /note=Original Glimmer call @bp 12170 has strength 12.52; Genemark calls start at 12170 /note=SSC: 12170-13147 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein DFO58_3310 [Arthrobacter sp. AG1021]],,NCBI, q1:s1 100.0% 6.64857E-120 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.171, -4.325675514574479, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein DFO58_3310 [Arthrobacter sp. AG1021]],,RKS16753,66.7665,6.64857E-120 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,51.3846,99.3 SIF-Syn: Minor tail protein, upstream gene is minor tail protein, and downstream gene is minor tail protein, just like in phage Warda. /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 12170. Start codon is TTG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. 
The ORF has reasonable coding potential and the predicted start site covers all of it. /note=SD (Final) Score: SD final score is -4.326. Z-score at 2.171. These are the best scores on PECAAN. /note=Gap/overlap: There is a gap of 9 bp, which is less than the recommended 50 bp limit, which is acceptable. /note=Phamerator: Pham 49124. Date 10/4/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Warda (AZ), YesChef (AZ), Elezi (AZ), and more. The function called for this gene was minor tail protein. Function call was consistent and is in the approved functions list. /note=Starterator: Start site 5 was manually annotated in 23/24 non-draft genes in this pham. Start site 5 is 12170, which agrees with the start site predicted by Glimmer and GeneMark. Start site 5 was the most annotated for this pham. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 12170. /note=Function call: Predicted function minor tail protein, based on hits from Phagesdb Blast and HHpred, both of which had hits with high query coverage (>51.4%), high % identity (>54.9%), and low E-values (<1.1e-10). CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I QC`ed this gene and agree with these annotations. There is a typo on the start site under auto-annotation I think. If you want, you can uncheck NCBI BLAST`s evidence for hypothetical protein since HHpred and Phagesdb BLAST confirmed a more specific function. 
(The comments have been addressed; 10/17/22) CDS 13147 - 14427 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="Emotion_18" /note=Original Glimmer call @bp 13147 has strength 21.42; Genemark calls start at 13147 /note=SSC: 13147-14427 CP: yes SCS: both ST: NI BLAST-Start: [Tail protein [Brevibacterium phage Rousseau]],,NCBI, q3:s5 99.2958% 7.71738E-75 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.509, -6.010787323443228, no F: minor tail protein SIF-BLAST: ,,[Tail protein [Brevibacterium phage Rousseau]],,CAH1193710,59.5174,7.71738E-75 SIF-HHPRED: Sipho_Gp37 ; Siphovirus ReqiPepy6 Gp37-like protein,,,PF14594.9,95.0704,99.8 SIF-Syn: Minor tail protein, gene upstream is minor tail protein, gene downstream is a minor tail protein just like in phage Warda /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 13147 Start Codon: ATG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best. The final score of -6.011 is the least negative and the Z score of 1.509 is a somewhat good z score /note=Gap/overlap: An overlap of 1 bp is reasonable. This start site produces the largest ORF. Overlap of 1 suggests an operon. /note=Phamerator: The gene is in pham 50325 as of 10/8/22. The pham is conserved among cluster AZ which is not the cluster Emotion is in. Some phages used for comparison were Asa16, Adumb2043, and Amyev. The function commonly called is minor tail protein /note=Starterator: There is not a reasonable start site choice conserved among the members of the pham. Start site 2 corresponds to 13147 bp and was called for this phage. It was found in 1 of 65 phages in the pham. The most common start site is 22 found in 25 of the 62 non draft phages in the pham. Emotion does not contain this start site. 
/note=Location call: The gene is a real gene as it is conserved in phamerator and has good coding potential. 13147 is a likely potential start site as it has a good RBS as well as Z-score. The start site is likely as it produces the longest ORF with an overlap of 1 bp which is typical of an operon. The gene also shows synteny with phage Warda /note=Function call: Minor tail protein, This is the likely function of the gene since both PhagesDB and NCBI Blast produce hits with low e-values less than 10^-70, relatively high identities of around 40%, and coverages greater than 99.5%. HHPred and CDD both produce hits with low e-values of less than 10^-10 and high probabilities of greater than 99%. This gene also shows synteny with phage Warda. /note=Transmembrane domains: No transmembrane domains were called /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: Glimmer and Genemark call a different start site from what`s mentioned in the notes. The SD and Z score for the start site you checked as evidence is not the best SD & Z score. Need to mention if the length of the gene is reasonable or not. Didn`t mention if TOPCONS was informative or not. CDS 14446 - 17979 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Emotion_19" /note=Original Glimmer call @bp 14446 has strength 14.9; Genemark calls start at 14446 /note=SSC: 14446-17979 CP: yes SCS: both ST: NI BLAST-Start: [minor tail protein [Arthrobacter phage Iter]],,NCBI, q508:s287 56.9244% 1.32829E-77 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.608, -4.352843561762387, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Iter]],,URQ05008,36.478,1.32829E-77 SIF-HHPRED: SIF-Syn: Gene (stop@17979 F) is preceded by 2 minor tail proteins (phams 49124 and 50325) and followed by pham 2534 (all accessed 10/4/22) the same as AZ phages Warda and Tbone and FO phage Aoka. 
/note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: GeneMark and Glimmer auto-annotated an ATG start site at 14446. /note=Coding Potential: Very strong coding potential from ~14470 to ~17950, spotted with several drops to below 25% coding potential. However, these gaps and long gene length appear to be conserved amongst other genes in the same pham 50912 (accessed 10/4/22). /note=SD (Final) Score: -4.353. It is the best on PECAAN. /note=Gap/overlap: 18 bp gap. Reasonable gap length and ORF coverage of coding potential. /note=Phamerator: Pham 50912 (accessed 10/4/22); conserved among a handful of AZ phages and 1 FO phage. All non-draft genes in the pham are minor tail proteins. /note=Starterator: Not informative. only 3 pham members; 2 are drafts. single manually annotated start site is not present in Emotion. /note=Location call: Coding potential, synteny, overlap and spacing, gene length and NCBI BLASTp hits indicate this is a real gene. Matching GeneMark and Glimmer start site auto-annotation, along with start codon identity, lack of large gaps before start site, coverage of coding potential, favorable RBS and Z-score, and high coverage and low e-value (>96% and e-2). /note=Transmembrane domains: TMHMM does not predict any transmembrane domains, so it is unlikely to be a membrane protein. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I agree with this annotation. All the evidence categories have been considered. 
CDS 18061 - 18345 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="Emotion_20" /note=Original Glimmer call @bp 18061 has strength 16.05; Genemark calls start at 18061 /note=SSC: 18061-18345 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter glacialis] ],,NCBI, q3:s6 95.7447% 4.34673E-20 GAP: 81 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.796, -5.608116066865693, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter glacialis] ],,WP_103481348,53.0973,4.34673E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Glimmer and GeneMark Start both call start site 18061 and call the start codon TTG. /note=Coding Potential: There is good coding potential in the putative ORF in the forward direction and only on one frame. There is no upstream coding potential cut off. /note=SD (Final) Score: -5.608. Best score on PECAAN /note=Gap/overlap: There is a 81 bp gap. While this is a somewhat large gap, there is no upstream coding potential to be covered by a more upstream start site and there is only one alternative start site which results in the loss of coding potential. /note=Phamerator: Pham 2534 as of 10/6/22. There are 35 members in the pham and the pham is conserved across multiple clusters including AZ (phages Adumb2043 and Lego). None of the pham members call a function. /note=Starterator: The most conserved/called start site is #11 which is called in 19 of the 26 non-draft phages, 27/35 pham members overall. Emotion does not possess this start site and calls start site #17 which corresponds to bp 18061. Start site #17 is only called in Emotion. /note=Location call: This is likely a real gene as there is good coding potential in the ORF, the gap doesn’t result in loss of coding potential, and the length of 285 bp is reasonable. 
While the start site called by Emotion is not called in any other phage, this is not unexpected as Emotion is cluster unknown and doesn’t contain the most annotated start site. /note=Function call: NKF. The top 3 PhagesDB BLAST hits (E-values of 1e-16) and the top 3 NCBI BLAST hits (E-values of 7.9e-18 or smaller, 40% identity or greater, and 91% coverage or greater) all call either function unknown or hypothetical protein, which corresponds to function unknown. HHPRED and CDD had either no relevant hits or no hits. /note=Transmembrane domains: TOPCONS and TMHMM predict no TMDs. /note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: I agree with all calls CDS 18355 - 19074 /gene="21" /product="gp21" /function="endolysin, N-acetylmuramoyl-L-alanine amidase domain" /locus tag="Emotion_21" /note=Original Glimmer call @bp 18355 has strength 19.04; Genemark calls start at 18355 /note=SSC: 18355-19074 CP: yes SCS: both ST: NA BLAST-Start: [N-acetylmuramoyl-L-alanine amidase [Georgenia thermotolerans]],,NCBI, q1:s1 63.1799% 1.61226E-56 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.359, -2.0111200136961407, yes F: endolysin, N-acetylmuramoyl-L-alanine amidase domain SIF-BLAST: ,,[N-acetylmuramoyl-L-alanine amidase [Georgenia thermotolerans]],,WP_152203647,41.1765,1.61226E-56 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase; amidase, zinc binding, cell wall degradation, endolysine, hydrolase; HET: PO4, GOL; 1.21A {Clostridium intestinale},,,6SSC_A,48.954,99.5 SIF-Syn: this pham is only found in Emotion and VroomVroom. /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark both call the start site at 18355. Start codon is ATG. /note=Coding Potential: There is good coding potential in this ORF. It is found on the forward strand in both GeneMark Self and Host. 
/note=SD (Final) Score: -2.011. This is the best score on PECAAN. /note=Gap/overlap: 9 bp gap with the upstream gene; the gap is below 50 bp and therefore acceptable. /note=Phamerator: The pham number as of 10/3/2022 is 48120. This gene is an orpham and has yet to be found in other phages. No gene function is called for this pham number. /note=Starterator: No other genes of non-draft phages are reported. The Starterator readout is not informative. /note=Location call: Based on all available evidence, the gene is most likely a real gene and the most likely start site is at 18355. /note=Function call: Endolysin, N-acetylmuramoyl-L-alanine amidase domain. The top 4 non-draft Phagesdb BLAST hits have the function of endolysin (e-value < 10^-45). 3 of the top 6 NCBI BLAST hits have the function of N-acetylmuramoyl-L-alanine amidase (coverage >60% and E-value < 10^-50), but 2 of the 6 hits call the function of LysM peptidoglycan-binding domain. Yet, it was reported that the LysM peptidoglycan-binding domain contains a transmembrane domain, and no TMHs are reported by TmHmm. Hence, LysM peptidoglycan-binding domain cannot be its real function. HHpred has 2 hits for N-acetylmuramoyl-L-alanine amidase and 1 hit for endolysin, all having e-values smaller than 10^-5. After checking the approved list of gene functions, the gene most likely has the function of endolysin, N-acetylmuramoyl-L-alanine amidase domain. This is also supported by CDD, as it has two hits with coverage >40% and e-values < 1E-10. /note=Transmembrane domains: Both TmHmm and TOPCONS display no TMDs. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: Needs attention: make sure to use pham number over NKF proteins in synteny box, check gap length. Function call should be rearranged so that the conclusion is last, and "I believe..." can be removed. You may want to also add the e-value to the CDD hit. Also, remove extra line. 
CDS 19071 - 19367 /gene="22" /product="gp22" /function="membrane protein" /locus tag="Emotion_22" /note=Original Glimmer call @bp 19080 has strength 13.55; Genemark calls start at 19080 /note=SSC: 19071-19367 CP: yes SCS: both-cs ST: NI BLAST-Start: [membrane protein [Arthrobacter phage Powerpuff] ],,NCBI, q23:s15 75.5102% 1.64846E-23 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.588, -4.536085090614939, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Powerpuff] ],,QGZ17320,60.6742,1.64846E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Glimmer and GeneMark agree on start@19080. /note=Coding Potential: Strong coding potential in this ORF is apparent on the forward strand only, agreeing with the auto-annotation call. Start site 19080 appears to cover all the coding potential, which is apparent in both GeneMark Self and Host. However, another potential start site is 19071. This yields a 4bp overlap which potentially indicates an operon and as such there is support for calling 19071. /note=SD (Final) Score: Start site 19071 has a final score of -4.536, which is approximately equal to the auto-annotated site (19080)’s final score of -4.393. These are among the top 3 final scores for this gene. The two start sites have the same Z-score (2.588), which is the highest Z-score among the start site candidates. /note=Gap/overlap: Gap for start@19071 is -4bp, which is indicative of an operon. Gap for start@19080 is 5bp. /note=Phamerator: 50484 as of 10/18/2022, which primarily has “function unknown” calls but occasionally is called as a tail needle protein. Pham 49098 is conserved across 30 non-draft members in actinobacteriophage clusters AZ, ED, FP, and EH. /note=Starterator: Start site 14 (19080, auto-annotated site) is called 100% of the time when present in the gene. Start site 13 (19071, purported site) has never been called by other members of the pham. 
However, all members of the pham that call start site 14 have upstream overlaps of 1 or 4bp (indicating an operon). Emotion’s gene’s start site 13 has a 4bp overlap with the upstream gene, which is conserved in BaileyBlu’s gene. DustyDino, Lego, Lizalica, Kaylissa, Powerpuff, Tbone, Warda, Yang, and YesChef have a 1bp overlap with their upstream genes, also indicating an operon. As such, despite the lack of conserved SS sequence, there is strong support for keeping the operon-indicating overlap on this gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 19071. /note=Function call: Potentially a tail needle protein. The only function (not NKF) called in PhagesDB BLAST is tail needle protein, with good e-values such as 2e-16 in BaileyBlu. Zeta1847 and DustyDino also call this protein as a tail needle protein with good e-values. Tail needle proteins are associated with coiled-coil domains, trimers, and a “tail needle/membrane penetration” domain as per HHPred, though with weak e-values and coverage. Called tail needle proteins as seen in Zeta1847 stop@18922 have one transmembrane domain, matching the pattern of Emotion stop@19367’s single TMD. /note=However, overwhelmingly, Phagesdb BLAST and HHPred present this protein as having an unknown function, and thus, in the absence of further convincing evidence, I call this protein as a membrane protein of unknown function. /note=Transmembrane domains: Both TMHMM and TOPCONS predict one TMD, therefore this gene must be a TM protein. 
/note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: note what day you noted pham number as now it`s changed (overall lack of function call + occasional tail needle call still the same as existing notes), i agree with not calling tail needle protein function as there are many hits in phagesdb blast in addition to ncbi blast that call hypothetical protein/unknown function (same thing) in addition to very weak hhpred hits due to very poor e-values. since calling membrane protein, you shouldn`t check the evidence indicating tail needle protein, additionally you can check function unknown as evidence in BLAST, good reasoning for start site call! CDS 19367 - 19633 /gene="23" /product="gp23" /function="membrane protein" /locus tag="Emotion_23" /note=Original Glimmer call @bp 19367 has strength 19.11; Genemark calls start at 19367 /note=SSC: 19367-19633 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Iter]],,NCBI, q3:s2 80.6818% 1.75502E-29 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.986, -4.760712055653865, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Iter]],,URQ05011,78.75,1.75502E-29 SIF-HHPRED: SIF-Syn: The region where gene Stop@19633 has some synteny with the genomes of Iter and Yang but it is very patchy, so overall the synteny is poor overall. Only a few genes on either side of the gene match with Iter and Yang’s genes and they are usually not all in the same order. /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Both Glimmer and GeneMark have 19367 as the start site. The start codon for this gene is ATG. /note=Coding Potential: The coding potential for this gene is high within the ORF. The coding potential doesn’t rise until slightly after the beginning of the ORF and falls slightly before the ORF ends. The ORF seems to match the autoannotated start and the stop site for this gene. 
/note=SD (Final) Score: The Z score is slightly low at 1.986 and the Final Score is slightly low at -4.761. However, because Starterator indicates that the start site is also 19367, and because these scores are the best of the three candidates, this candidate is still the most likely. /note=Gap/overlap: There is an overlap of 1 bp with the upstream gene and a gap of 11 bp with the downstream gene. /note=Phamerator: As of 9/30/22 this gene was a part of pham 42320. This pham has 41 members and 13 of them are drafts. /note=Starterator: Emotion calls start site 5, but this start site has no manual annotations and is not called by any gene other than this one. However, this gene does not have the most annotated and most called start, which is start 6. Start 5 matches the start site provided by both Glimmer and GeneMark. /note=Location call: The location of this gene seems to be accurate. While Start 5 or Start@19367 is not the most annotated or the most called start, there are only a few start sites called for this gene and Start 5 matches the start site provided by GeneMark and Glimmer. So this start site at 19367 should be the right start site. The coding potential of this gene also indicates that this location is correct. /note=Function call: Membrane protein. Phagesdb BLAST indicates that this gene has significant alignment in many other genomes; however, like this gene, those genes also have an unknown function. Phagesdb has unknown function hits that had good e-values, but since NCBI has hits with a more specific function, the Phagesdb hits were removed from the checked evidence. NCBI BLAST has significant hits that only indicate that this gene is for a membrane protein. HHPred did not produce any significant hits; all of them had e-values greater than 80. CDD has no hits. 
While this gene could be designated as NKF, NCBI blast indicates the function of membrane protein and TmHMM/TOPCONS also indicate that it is a membrane protein, so that is most likely its function. /note=Transmembrane domains: There are two hits for TMHMM, so this gene might be for a transmembrane domain. These domains include inside portions and some helices. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: Would be helpful to note that the rise and drop in ORF is only observed in GenemarkS. There is no need to fill out the synteny box if the gene is NKF. Overall, I agree with the annotation! CDS 19774 - 19998 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="Emotion_24" /note=Original Glimmer call @bp 19774 has strength 18.97; Genemark calls start at 19774 /note=SSC: 19774-19998 CP: yes SCS: both ST: NA BLAST-Start: GAP: 140 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.105, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@19998F) Start codon is ATG. Both Glimmer and Genemark agree on the same start site of 19774. /note=Coding Potential: The gene has reasonable coding potential predicted within the ORF and the chosen start site covers all the coding potential in both host & self trained Genemark. /note=SD (Final) Score: Yes, the candidate start site has the best SD of -2.443 and a Z score of 3.105. /note=Gap/overlap: There is a gap of 140 bp with the upstream gene, which is reasonable. The gene itself has a length of 225 bp which is acceptable given the chosen start site of 19774. The chosen start site is not the LORF, the LORF has a SD score of -7.986 and a Z score of 0.766. /note=Phamerator: As of 10/4/2022, the gene is found as pham#48395. The gene is an orpham. Phams database did not have a function called for this gene. /note=Starterator: As of 10/4/2022, there is no starterator data found for my gene. 
/note=Location call: Taken together, the evidence suggests that this gene is a real gene due to reasonable coding potential found within the ORF. But more evidence is needed to confirm this as there is no current data from Starterator or from Phamerator as of 10/3/2022. /note=Function call: The gene has no known function since no informative hits were given from NCBI and PhagesDB BLASTp, CDD, TOPCONS and HHPRED due to extremely high e-values or no information was available. /note=Transmembrane domains: No TMD data was predicted from Transmembrane Domains. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the location and function call of this gene. I would specify that the gene is in an orpham (in the Phamerator section) instead of writing that the gene "is not commonly annotated (conserved) in other phage clusters", because this implies that there are still other phams in the cluster (but not many). CDS 20136 - 20762 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="Emotion_25" /note=Original Glimmer call @bp 20136 has strength 26.28; Genemark calls start at 20136 /note=SSC: 20136-20762 CP: no SCS: both ST: NA BLAST-Start: GAP: 137 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.105, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 20136. Start codon is ATG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. The ORF has reasonable coding potential and the predicted start site covers most but not all coding potential. /note=SD (Final) Score: SD final score is -2.443. Z-score at 3.105. These are the best scores on PECAAN. /note=Gap/overlap: There is a gap of 137 bp, which is not good. However, the evidence for the alternative start sites are poor with very low SD final scores and Z-scores. /note=Phamerator: Pham 48135. Date 10/4/22. 
There are no other phages with this pham and no function calls. /note=Starterator: There is no Starterator report. /note=Location call: Based on the above evidence, though quite sparse, the gene seems to be real and the start site is most likely 20136. /note=Function call: Function unknown. No programs returned any informative results. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I QC`ed this location call and agree with these annotations. CDS 20865 - 21491 /gene="26" /product="gp26" /function="deoxynucleoside monophosphate kinase" /locus tag="Emotion_26" /note=Original Glimmer call @bp 20865 has strength 19.72; Genemark calls start at 20865 /note=SSC: 20865-21491 CP: yes SCS: both ST: SS BLAST-Start: [adenylate kinase [Arthrobacter phage Liebe] ],,NCBI, q6:s5 86.0577% 5.68064E-65 GAP: 102 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.105, -2.523003374675015, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[adenylate kinase [Arthrobacter phage Liebe] ],,YP_009817056,69.0722,5.68064E-65 SIF-HHPRED: c.37.1.1 (A:) Deoxynucleoside monophosphate kinase {Bacteriophage T4 [TaxId: 10665]} | CLASS: Alpha and beta proteins (a/b), FOLD: P-loop containing nucleoside triphosphate hydrolases, SUPFAM: P-loop containing nucleoside triphosphate hydrolases, FAM: Nucleotide and nucleoside kinases,,,SCOP_d1deka_,88.9423,99.8 SIF-Syn: /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 20865 Start Codon: ATG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best score. The final score of -2.523 is the least negative and the Z score of 3.105 is a good z score /note=Gap/overlap: The gap is 102 bp long which is not ideal, but the coding potential suggests there is no flipping and the final score/z scores are both the best. 
/note=Phamerator: The gene is in pham 367 as of 10/4/22. The pham is conserved among cluster BD which is not the cluster Emotion is in. Some phages used for comparison were Alsaber, Alvy, and Amela. The function commonly called is deoxynucleoside monophosphate kinase. /note=Starterator: There is a reasonable start site choice conserved among the members of the pham. Start site 35 corresponds to 20865 bp and was called for this phage. It was found in 44 of 189 phages in the pham. The most common start site is 44 found in 55 of the 165 non draft phages in the pham. Emotion does not contain this start site. /note=Location call: The gene is a real gene as it is conserved in phamerator and has good coding potential. 20865 is a likely potential start site as it is conserved in starterator and has a good RBS as well as Z-score. The start site is likely as it produces the second longest ORF with a minimized gap. The gene also shows synteny with phage Warda /note=Function call: deoxynucleoside monophosphate kinase. This is the likely function of the gene since both PhagesDB and NCBI Blast produce hits with low e-values less than 10^-50, relatively high identities greater than 50%, and probabilities greater than . HHPred and CDD both produce hits with low e-values of less than 10^-10 and high probabilities of greater than 90%. /note=Transmembrane domains: No transmembrane domains called. /note=Secondary Annotator: De Jesus, Jorja /note=Secondary Annotator QC: I QC`ed this location and agree with these annotations. 
CDS 21575 - 22168 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="Emotion_27" /note=Original Glimmer call @bp 21575 has strength 14.8; Genemark calls start at 21575 /note=SSC: 21575-22168 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ELEZI_27 [Arthrobacter phage Elezi] ],,NCBI, q1:s1 93.9086% 3.44761E-56 GAP: 83 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.443, -6.892906635130683, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ELEZI_27 [Arthrobacter phage Elezi] ],,QNJ56527,67.7083,3.44761E-56 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: GeneMark and Glimmer auto-annotated an ATG start site at 21575. /note=Coding Potential: Very strong coding potential, all of which fits well into the ORF. /note=SD (Final) Score: -6.893. It is not the best score, but this start site is better than the alternatives for capturing all coding potential and being conserved among other genes in the pham. /note=Gap/overlap: 83 bp gap. It is rather large, but this neatly fits the CP and there is no CP in the gap. Other possible start sites miss significant sections of CP. /note=Phamerator: Pham 1819 (accessed 10/5/22); conserved among all members of AZ and EH clusters, including Crewmate and Warda from cluster AZ and Cen1621 and Floof from EH. /note=Starterator: Start site 19 was manually annotated in 42 of 55 genes in this pham. Start 19 is 21575 in Emotion. Starterator agrees with Glimmer and GeneMark. /note=Location call: Coding potential, overlap and spacing, and gene length indicate this is a real gene. Matching GeneMark and Glimmer start site auto-annotation, along with start codon identity, lack of coding potential in large gap before start site, coverage of coding potential and conservation of start site within the pham indicate 21575 is the correct start site. /note=Function call: No known function. No PhagesDB BLAST, HHpred or CDD hits. 
NCBI BLASTp yielded 2 phage hits of >93% coverage and < e-55 E-value, both of hypothetical protein. /note=Transmembrane domains: TMHMM found no transmembrane domains, so it is unlikely to be a membrane protein. /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 22403 - 23242 /gene="28" /product="gp28" /function="Cas4 exonuclease" /locus tag="Emotion_28" /note=Original Glimmer call @bp 22403 has strength 20.28; Genemark calls start at 22403 /note=SSC: 22403-23242 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Lizalica]],,NCBI, q1:s1 98.2079% 1.5122E-163 GAP: 234 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.448, -3.8478832731078265, yes F: Cas4 exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Lizalica]],,UIW13510,89.1697,1.5122E-163 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_C,78.4946,99.8 SIF-Syn: Exonuclease, upstream gene is pham 1819 just like phages Lego and Warda. Downstream gene does not exhibit synteny. /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Both Glimmer and GeneMark were used and call the start site at 22403. The start codon is GTG. /note=Coding Potential: Good coding potential within the putative ORF with the chosen start site covering all the coding potential in both the Host- and Self-Trained GeneMark maps in the forward direction. /note=SD (Final) Score: -3.848 (Best in PECAAN) /note=Gap/overlap: There is a 234 basepair gap. Alternatives weren’t chosen as they either greatly increase the gap or they shorten the gap but the Z-score and Final Score are poor. Additionally, the gap is conserved in phages such as Warda and Tbone. The length of the gene is acceptable (>120 basepairs). /note=Phamerator: Pham 48769 as of 9/29/22. 
While Emotion is unclustered currently, the pham is conserved in multiple clusters including AZ (such as phages Adolin, Amyev, and Asa16) and EB (such as phages Abigail, Albright, and Armstrong). These phages call the function exonuclease. /note=Starterator: The most conserved start site is site #38 which is present in 58/152 genes in the pham and called in 39/124 non-draft genes in the pham. This start site is called by Emotion and corresponds to basepair 22403 in Emotion. This site is reasonable as it’s present in 38.2% of the pham. /note=Location call: Likely a real gene as it’s conserved in Phamerator and has good coding potential. Start site #38 is the most likely as it’s the most conserved in Starterator, present in Emotion, and covers all coding potential. /note=Function call: Exonuclease function (on approved SEA-Phages list). The top 2 non-draft Phagesdb BLAST hits (E-value of 1e-131) and the second and third top hits in NCBI blast (>80% identity, >98% coverage, and E-value of <1e-162) call exonuclease function. The top 2 HHPRED hits (99.8% probability, >78% coverage, and E-values <=3e-16) call mitochondrial genome maintenance exonuclease 1. No CDD hits. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 23246 - 23344 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="Emotion_29" /note=Genemark calls start at 23246 /note=SSC: 23246-23344 CP: yes SCS: genemark ST: NA BLAST-Start: GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.054, -4.561336254060166, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: membrane protein, the upstream gene is an exonuclease, just like Amyev, and the downstream gene belongs to Pham 48472 /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: GeneMark calls the start site at 23246, ATG, while Glimmer shows no start site /note=Coding Potential: There is good coding potential in this ORF. It is found on the forward strand in both GeneMark Self and Host. /note=SD (Final) Score: -4.561, this is the only score on Pecaan /note=Gap/overlap: There is 3bp gap that exist with the upstream gene, the gap is small and reasonable /note=Phamerator: The pham number as of 10/6/2022 is 48456. This gene is an orpham and yet to be found in other phages. No gene function is called for by this pham number yet. /note=Starterator: No other genes of non-draft phages are reported. The Starterator readout is not informative. /note=Location call: Based on all available evidence, the gene is most likely a real gene and the most likely start site is at 23246. /note=Function call: Membrane protein predicted by TmHmm and SOSUI. There are no Phagesdb Blast, NCBI Blast, CDD hits, or HHPRED. /note=Transmembrane domains: TmHmm predicts one TMH, and SOSUI predicts one TMH. Hence, we have sufficient information to assume it has real TMD. 
/note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: Has transmembrane domains, so it could be called a membrane protein, but you can check TOPCONS again. CDS 23341 - 23598 /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="Emotion_30" /note=Original Glimmer call @bp 23341 has strength 12.95; Genemark calls start at 23341 /note=SSC: 23341-23598 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Glutamicibacter halophytocola] ],,NCBI, q18:s11 77.6471% 1.64696E-8 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.588, -3.627004739933808, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Glutamicibacter halophytocola] ],,WP_257746139,55.8824,1.64696E-8 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Glimmer and GeneMark agree on start at 23341. /note=Coding Potential: Coding potential in this ORF is primarily on the forward strand, agreeing with the auto-annotation call. While coding potential is consistently >0.5 for the host-trained GeneMark, one can observe a dip in the coding potential around the 23430 bp area. This dip is exaggerated in the self-trained GeneMark, where the coding potential lowers to near (but not reaching) 0 from around 23410-23460. The start site 23341 does cover all the coding potential, which is apparent in both GeneMark Self and Host. /note=SD (Final) Score: -3.627, which is a good score. The Z score is also good at 2.588. /note=Gap/overlap: Gap for start@23341 is -4 bp, which is indicative of an operon. Adjoining genes are also in the forward direction. /note=Phamerator: 49472, an orpham. /note=Starterator: Orpham; N/A /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 23341 (the only possible start site found). /note=Function call: NKF. 
The lipoprotein function is occasionally weakly called by Phagesdb BLAST and HHPRED, but the evidence is not strong enough to declare this function for our gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: All calls look good. CDS 23595 - 23915 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="Emotion_31" /note=Original Glimmer call @bp 23595 has strength 20.91; Genemark calls start at 23595 /note=SSC: 23595-23915 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein [Herbiconiux moechotypicola]],,NCBI, q1:s1 99.0566% 3.71704E-41 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.302, -4.413617268040186, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Herbiconiux moechotypicola]],,WP_259478890,75.4717,3.71704E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Glimmer and GeneMark both indicate that this gene starts at 23595. The start codon sequence for this gene is GTG. /note=Coding Potential: The ORF matches where the start and stop site are, but coding potential does not seem to include the start site. So despite the coding potential on the rest of the gene, this gene does not have all GM coding capacity. /note=SD (Final) Score: The SD (Final) score is -4.414 and the Z-score is 2.302. This is the only SD score for this gene. /note=Gap/overlap: This gene overlaps with the upstream gene by 4 bp and has a gap with the downstream gene of 13 bp. /note=Phamerator: As of 09/30/22 this gene is a part of pham 9051, which has only 6 members, and 2 of them are drafts. The members of this pham are all from different clusters and when each member is compared with Phamerator, they do have this gene, but lack synteny. /note=Starterator: This gene does not have the most called and most manually annotated start, which was start 2. 
Instead, this gene has start 3, which has no manual annotations and is not called by any other gene. This serves as poor justification for this start site. On PECAAN, however, this is the only start site. But there is no coding potential for this start site. Because of this, the start @23595 can be considered noninformative. /note=Location call: There are only 6 members in this pham, all of which seem to be from different clusters. Glimmer and GeneMark match this start for this gene, but the coding potential for this start is low and the start is poorly supported by Starterator. The start for this gene is noninformative, but since this is the only start site, 23595 is the most likely start site for now. /note=Function call: NKF. Phagesdb blast indicates that Emotion has alignment for this gene with genomes such as IAmGroot and GardenState. These hits with the best e-values all call this gene as a function unknown gene. NCBI blast has hits with good coverage and e-values, but all of these were for hypothetical protein. CDD does not have any hits. HHpred does not have any good hits; many of them have too high of an e-value so they were not added as evidence. There is no evidence that gives this gene a specific function, so it must be a gene with no known function. Since NCBI blast provides the only reliable evidence, this gene is a real gene, and can be designated as NKF. /note=Transmembrane domains: TmHmm and Topcons do not have any hits for this gene, so this gene probably is not for a transmembrane protein. /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: Hypothetical protein is the same as NKF; Phagesdb BLAST can be checked. Also, please check with the instructor to confirm, but I believe we mark NKF instead of hypothetical protein. Starterator may be uninformative since it doesn't show start conservation and Emotion only has 1 possible site. Overall, I agree with the call! 
CDS 23912 - 24310 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="Emotion_32" /note=Original Glimmer call @bp 23912 has strength 21.44; Genemark calls start at 23912 /note=SSC: 23912-24310 CP: yes SCS: both ST: SS BLAST-Start: [lipoprotein [Arthrobacter phage SilentRX]],,NCBI, q5:s2 93.9394% 7.14757E-45 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.945, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[lipoprotein [Arthrobacter phage SilentRX]],,QWY82786,72.5926,7.14757E-45 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@24310F) Start codon is GTG. Both Genemark and Glimmer agree with the start site @23912. /note=Coding Potential: The gene has reasonable coding potential within the ORF and includes the chosen start site. /note=SD (Final) Score: The gene has the best SD score of -2.845 and a Z score of 2.945. /note=Gap/overlap: The gene has an overlap of 4 bp with the upstream gene. The gene has a reasonable length of 399bp given the chosen start site @23912. The chosen start site is not the LORF, as the LORF start site has an overlap of 502 bp with the upstream gene and a final score of -6.010 and a Z score of 1.659. Therefore, based on this logic the LORF start site is not an ideal candidate start site. /note=Phamerator: As of 10/6/2022, the gene belongs to pham #50048. Since the phage does not belong to any clusters, the PhagesDB pham database shows that the pham the gene belongs to is also found in mainly cluster A phages. Therefore, for comparison purposes I used cluster A non-draft phages: AbbysRanger and Adzzy. No function was called for from the phams database. /note=Starterator: Among the members of the pham that my gene belongs to, there is a reasonable conserved site @Start site number 102 where 194/322 non-draft genes in this pham were called for. 
However, my gene has start site 97, which was called in 7/342 non-draft genes and corresponds to the start site @23912 that was called by Genemark & Glimmer. /note=Location call: Based on all the evidence gathered, the gene is a real gene and has a start site @23912. Starterator, Genemark, and Glimmer call the same start site and there is reasonable coding potential within the ORF. In addition, the start site has the best final and Z scores. No CDD or TOPCONS hits were informative. /note=Function call: Predicted function is a lipoprotein, based on the hits from NCBI & PhagesDB BLASTp, which give results with e-values of e^-45 and e^-35 respectively, a coverage of >93.9394%, and an identity of >56.2963%. /note=Transmembrane domains: No data from TMHMM. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: Make sure to fill out the Starterator dropdown menu and also include the year alongside the date when you checked Phamerator. Overall, I agree with the annotation. 
CDS 24307 - 24687 /gene="33" /product="gp33" /function="nucleoside deoxyribosyltransferase" /locus tag="Emotion_33" /note=Original Glimmer call @bp 24307 has strength 14.22; Genemark calls start at 24307 /note=SSC: 24307-24687 CP: yes SCS: both ST: SS BLAST-Start: [nucleoside deoxyribosyltransferase [Arthrobacter phage BaileyBlu]],,NCBI, q1:s1 83.3333% 2.35322E-38 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.588, -3.627004739933808, yes F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside deoxyribosyltransferase [Arthrobacter phage BaileyBlu]],,UJQ87166,70.9402,2.35322E-38 SIF-HHPRED: c.23.14.2 (A:1-118) Hypothetical protein PA1492 {Pseudomonas aeruginosa [TaxId: 287]} | CLASS: Alpha and beta proteins (a/b), FOLD: Flavodoxin-like, SUPFAM: N-(deoxy)ribosyltransferase-like, FAM: Hypothetical protein PA1492,,,SCOP_d1t1ja1,88.0952,99.7 SIF-Syn: Nucleoside deoxyribosyltransferase, there is no upstream synteny, but downstream is LAGLIDADG endonuclease, just like in phage Warda. The gene that is upstream of LAGLIDADG endonuclease in phage Warda is not in the same pham as this gene, but they have the same function. /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 24307. Start codon is GTG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. The ORF has reasonable coding potential and the predicted start site covers all of it. /note=SD (Final) Score: SD final score is -3.627. Z-score at 2.588. These are the best scores on PECAAN. /note=Gap/overlap: There is an overlap of 4 bp, which may be indicative of an operon. /note=Phamerator: Pham 49073. Date 10/4/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Chocolat (AR), GrandSlam (DI), Tophat (AR), and more. The functions called for this gene were hydrolase and nucleoside deoxyribosyltransferase. Function calls were consistent and are in the approved functions list. 
/note=Starterator: Start site 24 was not manually annotated. Start 24 is 24307, which agrees with the start site called by Glimmer and GeneMark. This start is not the most annotated start site. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 24307. /note=Function call: Predicted function nucleoside deoxyribosyltransferase, based on hits from Phagesdb Blast, NCBI BLASTp, and HHpred, all of which had hits with high query coverage (>83%), high % identity (>46.7%), and low E-values (<7.4e-14). CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the location and function call of this annotation. I believe that the starterator check box should be "suggested start", because you chose the start site that starterator chose. Starterator was still helpful because it allowed us to see, for example, that the nearby start site 23 had been manually annotated in other phages. I believe "not informative" is only used when you reject the start site. 
CDS 24684 - 25088 /gene="34" /product="gp34" /function="LAGLIDADG endonuclease" /locus tag="Emotion_34" /note=Original Glimmer call @bp 24684 has strength 12.52; Genemark calls start at 24684 /note=SSC: 24684-25088 CP: yes SCS: both ST: NI BLAST-Start: [LAGLIDADG family homing endonuclease [Pseudarthrobacter siccitolerans] ],,NCBI, q4:s5 97.7612% 9.08005E-67 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.638, -3.525068646571426, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG family homing endonuclease [Pseudarthrobacter siccitolerans] ],,WP_050053697,83.7037,9.08005E-67 SIF-HHPRED: DNA endonuclease I-CreI; protein, DNA, HYDROLASE-DNA COMPLEX; 1.6A {Chlamydomonas reinhardtii} SCOP: d.95.2.1,,,1T9I_B,79.8507,99.5 SIF-Syn: LAGLIDADG endonuclease, upstream gene is nucleoside deoxyribosyltransferase, downstream gene is unknown function just like in phage Amyev /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 24684 Start Codon: ATG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best score. The final score of -3.525 is the least negative and the Z score of 2.638 is a good z score /note=Gap/overlap: There is an overlap of 4 which is reasonable. The suggested start site produces the longest ORF. Overlap of 4 suggests this gene is part of an operon. /note=Phamerator: The gene is in pham 1807 as of 10/4/22. The pham is conserved among cluster AZ which is not the cluster Emotion is in. Some phages used for comparison were Adolin, Adumb2043, and Amyev. The function commonly called is endonuclease (either LAGLIDADG or HNH). /note=Starterator: There is no start site choice conserved among the members of the pham. Start site 18 corresponds to 24684 bp and was called for this phage. It was found in 1 of 55 phages in the pham. The most common start site is 19 found in 30 of the 38 non draft phages in the pham. 
Emotion does not contain this start site. /note=Location call: The start site of 24684 is likely the start site as it has a good SD score and covers the entire coding potential. An overlap of 4 bp suggests that it is part of an operon. /note=Function call: LAGLIDADG endonuclease is the most likely function. Although PhagesDB blast suggests HNH endonuclease with good e-values, identity, and coverage, NCBI blast suggests LAGLIDADG endonuclease also with good e-values, identity and coverage. HHPred suggests DNA endonuclease and CDD produces a hit with LAGLIDADG binding domain with e-value of 0.0001. There seems to be more evidence leaning towards LAGLIDADG endonuclease since CDD calls a binding domain. /note=Transmembrane domains: No transmembrane domains called /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: Needs to mention if the gene length is reasonable or not. Needs to mention if the gene is a real gene or not. Needs to provide a brief summary of values that were used to determine the function of the gene from all the programs that provided informative results. Also, the gene does not share synteny with phage Warda. CDS 25237 - 25938 /gene="35" /product="gp35" /function="recombination directionality factor" /locus tag="Emotion_35" /note=Original Glimmer call @bp 25165 has strength 17.9; Genemark calls start at 25165 /note=SSC: 25237-25938 CP: no SCS: both-cs ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Iter]],,NCBI, q1:s1 100.0% 2.09862E-120 GAP: 148 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.597, -4.119075175954406, yes F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Iter]],,URQ05018,83.5443,2.09862E-120 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,87.5536,100.0 SIF-Syn: Moderate synteny, with an upstream LAGLIDADG endonuclease, matching Warda and Tbone. 
However, the following two genes are both orphams and so synteny stops there. /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: GeneMark and Glimmer auto-annotated an ATG start site at 25165. /note=Coding Potential: Very strong coding potential from ~25220 to ~25900, all within the ORF of the auto-annotated start site. /note=SD (Final) Score: -4.119. It is the best score on PECAAN. /note=Gap/overlap: 148 bp gap. It is a rather large gap, and the gap does contain about ~20 bp of CP, but this is the best choice due to the overwhelming number of manual annotation on this start site over the auto-annotated start site, the rather small amount of coding potential lost, and that the gap is conserved in other AZ phages (Warda and Tbone). /note=Phamerator: Pham 848 (accessed 10/5/22); completely conserved in every member of cluster AZ, EH, EB and FP. (Warda, Floof, Swervy, and BaileyBlu, respectively.) Also, every member of the pham with a known function is a recombination directionality factor. /note=Starterator: Start 31 is manually annotated in 31 out of 96 non-draft genes in this pham. It is the best start site over the auto-annotated start site, which has no manual annotations and has lower RBS score and lower Z-score. /note=Location call: Manual annotation of start site 31, conserved gap, small loss of CP, start codon identity, and favorable RBS and Z-score indicate 25237 is the correct start site and that this is a real gene. /note=Function call: NCBI BLASTp yielded 3 phage hits indicating recombination directionality factor function with strong quantitative evidence (%coverage > 99%, identity >70%, and E-value e-96). CDD yielded no hits. HHpred yielded 1 hit for recombination directionality factor with 100% probability, 87% coverage, and 7e-33 e-value. Evidence strongly supports gene being a recombination directionality factor. 
/note=Transmembrane domains: TMHMM and TOPCONS both predict 0 transmembrane domains, and so the protein is likely not a membrane protein. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I do not agree with this annotation yet. Check the QC notes on the spreadsheet. CDS 25935 - 26117 /gene="36" /product="gp36" /function="membrane protein" /locus tag="Emotion_36" /note=Original Glimmer call @bp 25935 has strength 18.8; Genemark calls start at 25935 /note=SSC: 25935-26117 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.176, -4.394706008439538, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Membrane protein. The upstream gene is pham 848 (recombination directionality factor) just like phages Maureen and Liebe. The downstream gene is an orpham. /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Glimmer and GeneMark both call the start at 25935 with start codon ATG. /note=Coding Potential: There is good coding potential in the putative ORF and only on one reading frame in the forward direction on Host-Trained and Self-Trained GeneMark maps. All coding potential is covered. /note=SD (Final) Score: -4.395. Best on PECAAN. While this is a poor RBS, it is acceptable as the gene is likely part of an operon, which does not need a binding site. /note=Gap/overlap: There is a -4 gap which is indicative of an operon. The alternatives result in too large an overlap (196+ base pairs) or too large a gap (65+ base pairs). /note=Phamerator: Pham 48338 as of 10/8/22. Only member in pham. /note=Starterator: This gene is an orpham so n/a. /note=Location call: This is likely a real gene with start site 25935 as, while this gene is an orpham with a poor final score, it has a 4 base pair overlap with the gene upstream and calls start codon ATG, both of which are strong evidence of an operon. Additionally, it has good coding potential. /note=Function call: The likely function is membrane protein. 
Phagesdb BLAST, HHPRED, CDD and NCBI Blast yielded no hits or no relevant hits due to very poor (large) E-values of 10^-2 or greater. This data therefore suggests NKF. However, the TMD hits in TMHMM suggest the function to be membrane protein. /note=Transmembrane domains: TMHMM predicts 2 TMDs and TOPCONS predicts 2 TMDs. /note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: Glimmer and GeneMark call 25935 but your notes say 25925 (QC comment addressed 10/17/22) CDS 26114 - 26233 /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="Emotion_37" /note=Original Glimmer call @bp 26114 has strength 9.06; Genemark calls start at 26114 /note=SSC: 26114-26233 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.553, -6.4773477865966695, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: N/A /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark both call the start site at 26114, GTG /note=Coding Potential: There is good coding potential in this ORF. It is found on the forward strand in both GeneMark Self and Host. /note=SD (Final) Score: The SD final score is -6.477. This is not the best final score, but the other two start sites overlap the upstream gene too much to be likely candidates. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene, which is a reasonable size for an overlap. /note=Phamerator: The pham number as of 10/6/2022 is 48566. This gene is an orpham and yet to be found in other phages. No gene function is called for by this pham number yet. /note=Starterator: No other genes of non-draft phages are reported. The Starterator readout is not informative. /note=Location call: Based on all available evidence, the gene is most likely a real gene and the most likely start site is at 26114. /note=Function call: No known function. There are no Phagesdb Blast, NCBI Blast, or CDD hits. 
HHPRED has some readouts but the E-values are all larger than 10^-3, which should not be considered as evidence. /note=Transmembrane domains: Both TmHmm and Topcons display no TMDs /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: needs attention: technically, all e-values are greater than 0. It may be better to use the 10^-3 threshold instead. CDS 26311 - 26598 /gene="38" /product="gp38" /function="NrdH-like glutaredoxin" /locus tag="Emotion_38" /note=Original Glimmer call @bp 26311 has strength 16.43; Genemark calls start at 26311 /note=SSC: 26311-26598 CP: yes SCS: both ST: SS BLAST-Start: [MULTISPECIES: redoxin NrdH [unclassified Rhodococcus] ],,NCBI, q1:s1 81.0526% 2.14613E-14 GAP: 77 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.005, -3.1725816037952645, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[MULTISPECIES: redoxin NrdH [unclassified Rhodococcus] ],,WP_042576468,65.0,2.14613E-14 SIF-HHPRED: c.47.1.0 (A:) automated matches {Baker`s yeast (Saccharomyces cerevisiae) [TaxId: 559292]} | CLASS: Alpha and beta proteins (a/b), FOLD: Thioredoxin fold, SUPFAM: Thioredoxin-like, FAM: automated matches,,,SCOP_d2m80a_,74.7368,99.1 SIF-Syn: /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Glimmer and GeneMark agree on start at 26311. /note=Coding Potential: On the host-trained GeneMark, significant coding potential in this ORF (>0.5) is on the forward strand only, agreeing with the auto-annotation call. On the Self-trained GeneMark, spikes of coding potential are found on the reverse strand as well, but are not contiguous as the coding potential appears on the first forward reading frame. Start site 26311 covers all the coding potential, which is apparent in both GeneMark Self and Host. /note=SD (Final) Score: -3.173, which is the best final score on PECAAN. The Z score is also the highest at 3.005. /note=Gap/overlap: Gap is somewhat large at 77bp. There is no overlap and adjoining genes are also in the forward direction. 
The other gene candidate, start@26266, has a smaller gap at 32, but spans a large region of the genome that does not have coding potential. /note=Phamerator: 49981, which primarily displays the similar function calls of NrdH-like glutaredoxin, NrdH-like protein, or glutaredoxin. Occasionally, function for a member of this pham is called as thioredoxin, which is a related redox protein that serves many of the same functions as glutaredoxins. Very rarely, generic redoxins are called, which is in line with the specific redoxins called by the majority of the pham members. Pham 49981 is a well conserved pham with 939 members across various actinobacteriophage clusters. /note=Starterator: Start site 104 (26311) is called only 2.7% of the time when present in pham 49981. Other phages that call this site are ExplosioNervosa (A9) and Honk39 (EH). The most-called start site, 110, is much closer to start site 104 than it is to start site 81, which is Emotion’s only other gene candidate. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26311. /note=Function call: NrdH-like glutaredoxin. Phagesdb BLAST hits have low e values (e-9) and solely call “NrdH-like glutaredoxin” or the more general “glutaredoxin” (which has only 3 hits and displays higher e-values). HHPRED shows support for thioredoxin but draws its data from non-phage organisms. NCBI BLAST and CDD show strong support for glutaredoxin. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: i agree with the function call and great job on explaining the varying functions called in phamerator! 
CDS 26591 - 26806 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Emotion_39" /note=Original Glimmer call @bp 26612 has strength 18.37; Genemark calls start at 26591 /note=SSC: 26591-26806 CP: yes SCS: both-gm ST: NI BLAST-Start: GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.207, -4.314571574115886, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Glimmer indicates that this has a start site at 26612 while GeneMark states that the start site is 26591. /note=Coding Potential: There seems to be high coding potential in the region between the stop site 26806 (which matches the end of the ORF) and the region where the two auto annotated start sites are provided. /note=SD (Final) Score: The best candidate is the one with the highest final score at -4.315 and a Z-score of 2.207 /note=Gap/overlap: There is a gap of -8 and a spacer of 11. /note=Phamerator: This gene is an orpham and was given the pham number 48527. /note=Starterator: Both starterator and Phagesdb indicate that this is an orpham so the start sites would only be the /note=Location call: The location of this gene cannot be determined because it is an orpham. /note=Function call: NKF. The only evidence of this gene is coding potential; there is no other evidence or data provided through the other programs. /note=Transmembrane domains: This protein has no predicted transmembrane domains, as there were no hits on TmHmm or TOPCONS. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 26793 - 27239 /gene="40" /product="gp40" /function="Holliday junction resolvase" /locus tag="Emotion_40" /note=Original Glimmer call @bp 26793 has strength 16.71; Genemark calls start at 26793 /note=SSC: 26793-27239 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage Tbone] ],,NCBI, q1:s1 99.3243% 1.06704E-53 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.027, -3.1285997640366707, yes F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage Tbone] ],,QPX62366,71.4286,1.06704E-53 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: MSE, SO4; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_B,75.6757,99.6 SIF-Syn: Holliday junction resolvase, upstream gene is NKF, downstream gene is NrdH-like glutaredoxin, just like in phage Warda. /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@27239) Start codon is ATG. Both Glimmer and Genemark agree with the start site @26793. /note=Coding Potential: The gene has reasonable coding potential within the ORF and includes the chosen start site @26793. /note=SD (Final) Score: The candidate does have the best final score of -3.129 and Z score of 3.027. /note=Gap/overlap: The gene has a large gap of 194 bp. The gene has a reasonable length of 447 bp with the start site @26793 and is the LORF. /note=Phamerator: As of 10/6/2022,the pham my gene belongs to is pham #48667. Since Emotion does not belong to a cluster, the most common cluster that appears from the Pham database and shares the same Pham# that my gene belongs to, is Cluster AZ. Therefore, I used phage Adumb2043 as a comparison. The Phams Database also called for the function of holliday junction resolvase. 
/note=Starterator: Among the members of the pham that my gene belongs to, the most conserved site is start site #57 and 61/277 called for this site. Although, my gene shares start site #63 and 49/277 called this site, which includes the start site @26793 that both Glimmer and Genemark called. /note=Location call: Based on the evidence, the gene is a real gene and has a candidate start site @26793. The start site is conserved in phamerator, there is reasonable coding potential within the ORF including the candidate site, and both Glimmer and Genemark call for the start site. /note=Function call: Predicted function is holliday junction resolvase, based on the hits from NCBI and PhagesDB BLASTp, and HHPRED. With hits from NCBI consisting of low e values of 4.237e^-46, % identity higher than 68.9655%, and % coverage higher than 79.7297%. On PhagesDB BLASTp with low e-values of 7e^-39, and HHPRED having low e values of 1.1e^-14 with a % coverage of 75.67%. No informative hits from CDD and TOPCONS. /note=Transmembrane domains: No data is available from TMHMM. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the location and function call of this gene. The following sections require attention - [GAP/OVERLAP] Back when you annotated the gene, there was another gene right before your gene, leading to the 14bp gap. That gene got deleted so now we have a 194bp gap instead of 14bp overlap! Not your fault but would be good to update for clarity. [STARTERATOR] I think it would lend strength to your argument to mention that Emotion does not contain start site 57, and that start site 63 is called 98% of the time when present. [STARTERATOR BOX] Since you called the same start site as starterator did, "Suggested Start" should be selected instead of "NA". [SYNTENY BOX] Due to the upstream gene being deleted, you may want to alter the synteny box to reflect this. 
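The GAP values in these records are consistent with counting the bases strictly between an upstream stop and the next start on the forward strand, with negative values meaning overlap. A minimal sketch of that arithmetic, assuming this convention (the function name gap_bp is hypothetical and not part of PECAAN), reproduces both figures discussed in the QC note for gene 40:

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between an upstream stop and the next forward-strand start.

    Assumed convention: adjacent genes give 0, and a negative result
    is the size of the overlap in bp.
    """
    return start - upstream_stop - 1


# Coordinates taken from the records in this file.
gene38_stop = 26598   # Emotion_38 stop
gene39_stop = 26806   # Emotion_39 stop
gene40_start = 26793  # Emotion_40 start

# Measured against gene 39: a 14 bp overlap.
print(gap_bp(gene39_stop, gene40_start))  # -14

# Measured against gene 38 (as if gene 39 were removed): a 194 bp gap.
print(gap_bp(gene38_stop, gene40_start))  # 194
```

The same function gives -4 for gene 37 (start 26114 against stop 26117), matching its GAP field.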
CDS complement (27236 - 27364) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Emotion_41" /note=Original Glimmer call @bp 27364 has strength 4.41; Genemark calls start at 27388 /note=SSC: 27364-27236 CP: yes SCS: both-gl ST: NI BLAST-Start: [hypothetical protein SEA_CREWMATE_42 [Arthrobacter phage Crewmate]],,NCBI, q1:s7 95.2381% 0.00355807 GAP: 120 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.199, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_42 [Arthrobacter phage Crewmate]],,UIW13294,56.3636,0.00355807 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer calls 27364 and GeneMark calls 27388. Start codon for both is ATG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. The ORF has reasonable coding potential and the predicted start sites cover all of it. /note=SD (Final) Score: The SD final score for 27364 is -2.253. Z-score at 3.199. These are the best scores on PECAAN. /note=Gap/overlap: For start 27364 there is a gap of 120 bp. This gap is quite large, but this gene is on the reverse frame. /note=Phamerator: Pham 50535. Date 10/6/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Warda (AZ), YesChef (AZ), Tweety19 (AZ), and more. There were no functions called. /note=Starterator: Start site 23 was not manually annotated in this pham. Start site 23 is 27364, which agrees with the start site predicted by Glimmer. Start site 19 was the most annotated for this pham, which is included in this gene, but the other evidence above shows that it might not be the best start site for this gene. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 27364. /note=Function call: Function unknown. No programs returned any informative results. /note=Transmembrane domains: No transmembrane domains predicted. 
/note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I QC`ed this location call and agree with these annotations. CDS 27485 - 29995 /gene="42" /product="gp42" /function="DNA primase/helicase" /locus tag="Emotion_42" /note=Original Glimmer call @bp 27485 has strength 15.59; Genemark calls start at 27485 /note=SSC: 27485-29995 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 0.0 GAP: 120 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.171, -4.387988835334809, no F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage Janeemi]],,UVK63560,85.766,0.0 SIF-HHPRED: DNA primase; Helicase, DNA binding, AMPPNP, REPLICATION; HET: ANP; 3.1A {Staphylococcus aureus},,,7OM0_B,43.4211,100.0 SIF-Syn: /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 27485 Start Codon: TTG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best score. The final score of -4.388 is the least negative and the Z score of 2.171 is a good z score /note=Gap/overlap: There is an gap of 120 which is not the best. The suggested start site does not produce the longest ORF. /note=Phamerator: The gene is in pham 880 as of 10/8/22. The pham is conserved among cluster EB which is not the cluster Emotion is in. Some phages used for comparison were Abigail, Albright, and Armstrong. The function commonly called is DNA primase/helicase /note=Starterator: There is a reasonable start site choice conserved among the members of the pham. Start site 41 corresponds to 27485 bp found in 44 of the 94 non draft phages in the pham and is the most common start site. Start site 41 was called for this phage. /note=Location call: The start site of 27485 is likely the start site as it has a good SD score and covers the entire coding potential. 
Gap of 120 bp is a big gap but 27485 is the most annotated start site for this pham. /note=Function call: DNA primase/helicase is the most likely function. This is the likely function of the gene since both PhagesDB and NCBI Blast produce hits with e-values of 0 and relatively high identities greater than 70%. HHPred and CDD both produce hits with low e-values of less than 10^-31 and 0 (respectively) and high probabilities of greater than 99%. /note=Transmembrane domains: No transmembrane domains called /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I QC`ed this location call and agree with these annotations. CDS 30004 - 30849 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="Emotion_43" /note=Original Glimmer call @bp 30004 has strength 26.11; Genemark calls start at 30004 /note=SSC: 30004-30849 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein HOU70_gp40 [Arthrobacter phage Liebe] ],,NCBI, q145:s74 24.1993% 1.68306E-4 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.171, -4.4057176022952405, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU70_gp40 [Arthrobacter phage Liebe] ],,YP_009817072,15.1751,1.68306E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: Both Glimmer and GeneMark were used to auto-annotate the gene and agreed on a GTG start site at 30004. /note=Coding Potential: Very strong coding potential from ~30030 to ~30800 /note=SD (Final) Score: -4.406. Not the best by final score or Z-score, but it is the only option that does not exclude 25% or more of the gene sequence. /note=Gap/overlap: 8 bp gap. Reasonable gap length, without possibility of a gene in the gap, and ORF covers all the coding potential. /note=Phamerator: Not informative; gene is an orpham. /note=Starterator: Not informative; gene is an orpham. 
/note=Location call: Coverage of coding potential, favorable Z-score, gap length, and auto-annotation indicate that 30004 is the best start site and that this is a real gene. /note=Function call: No PhagesDB hits of known function with sufficient e-value. NCBI BLASTp yielded no hits with sufficient E-value, %coverage, or %identity. CDD yielded no hits. HHpred yielded no hits of approved function with sufficient E-value or probability. Gene has NKF. /note=Transmembrane domains: Not informative. TMHMM predicted no transmembrane domains, indicating this is unlikely to be a membrane protein. /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: Called the incorrect start site in notes under Auto Annotation. Needs to mention if the gene is a real gene or not. Needed to mention if TOPCONS data was informative or not. CDS 30851 - 30973 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="Emotion_44" /note=Original Glimmer call @bp 30851 has strength 18.86; Genemark calls start at 30851 /note=SSC: 30851-30973 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_BAILEYBLU_40 [Arthrobacter phage BaileyBlu]],,NCBI, q11:s10 65.0% 0.00706068 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.945, -3.4175091270247986, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BAILEYBLU_40 [Arthrobacter phage BaileyBlu]],,UJQ87178,48.7179,0.00706068 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Glimmer and GeneMark Start both call the start site at 30851 and call the start codon ATG. /note=Coding Potential: For both the host and self-trained GeneMark maps, while there is good coding potential in the putative ORF in the forward direction, there is good coding potential on the Self-trained map and some coding potential on the Host-Trained map in the reverse direction. 
However, the upstream and downstream genes are also in the forward direction, meaning that the presence of some coding potential in the reverse direction is not significant enough to change the direction. All coding potential is covered. /note=SD (Final) Score: -3.418 (Best in PECAAN). /note=Gap/overlap: There is a 1 base pair gap which is reasonable. The length of 123, while short, could still be reasonable for a gene. The alternative start site was not chosen as it has a worse final score and results in an overlap of 74 base pairs. /note=Phamerator: Pham 48297 as of 10/8/22. Only member in pham. /note=Starterator: This gene is an orpham so n/a /note=Location call: This is likely a real gene with start site 30851 as there is good coding potential in the putative ORF in the forward direction, the alternative results in a large overlap, and while small, the gene length is still reasonable. /note=Function call: NKF. PhagesDB Blast, HHPRED, and NCBI BLAST had no relevant hits due to poor E-values and CDD had no hits. /note=Transmembrane domains: No TMDs predicted by TOPCONS and TMHMM. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 30973 - 31122 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Emotion_45" /note=Original Glimmer call @bp 30973 has strength 7.03 /note=SSC: 30973-31122 CP: yes SCS: glimmer ST: NA BLAST-Start: [hypothetical protein SEA_CREWMATE_45 [Arthrobacter phage Crewmate]],,NCBI, q1:s1 81.6327% 3.29736E-14 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.945, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_45 [Arthrobacter phage Crewmate]],,UIW13297,90.0,3.29736E-14 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer calls the start site at 30973, ATG, while Genemark shows no start site. 
/note=Coding Potential: There is good coding potential in this ORF. It is found on the forward strand in both GeneMark Self and Host. /note=SD (Final) Score: The final SD score is -2.906. This is the best score listed on PECAAN in comparison to other entries. /note=Gap/overlap: There is a 1 bp overlap with the upstream gene, which is a reasonable size for an overlap. /note=Phamerator: The pham number as of 10/6/22 is 10251. The gene is conserved in 3 out of 3 of the non draft genes including Crewmate and Yang from the AZ cluster. /note=Starterator: There are 3 non draft members of this pham, and 3 out of 3 members call start site #3. It is the most conserved start site among the members within the pham, which correlates to the start site at 30973 in Emotion. /note=Location call: Based on all available evidence, the gene is most likely a real gene and the most likely start site is at 30973. /note=Function call: NKF. Crewmate and Yang, the top 2 non draft NCBI blast hits have the function of hypothetical protein (E-value< 10^-10), and 3 out of 3 top Phagesdb Blast listed it as function unknown (Coverage >97%, and E-value< 10^-40 ). HHpred has a hit for scaffold protein, but its E-value of 0.081 is too weak to count as evidence given all the other results. CDD had no relevant hits. 
/note=Transmembrane domains: Both TmHmm and Topcons display no TMDs /note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: need to select starterator drop down CDS 31321 - 33183 /gene="46" /product="gp46" /function="DNA polymerase I" /locus tag="Emotion_46" /note=Original Glimmer call @bp 31321 has strength 17.03; Genemark calls start at 31321 /note=SSC: 31321-33183 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 0.0 GAP: 198 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.348, -2.4811409279978642, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Janeemi]],,UVK63563,85.1852,0.0 SIF-HHPRED: Prex DNA polymerase; DNA polymerase, TRANSFERASE; 2.9A {Plasmodium falciparum (isolate 3D7)},,,5DKU_B,97.2581,100.0 SIF-Syn: /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Glimmer and GeneMark agree on start at 31321. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, agreeing with the auto-annotation call. Start site 31321 covers all the coding potential, which is apparent in both GeneMark Self and Host. /note=SD (Final) Score: -2.481, which is significantly better than other final scores on PECAAN. The Z score is also the highest at 3.348. /note=Gap/overlap: Gap is large at 198 bp, but this large gap is conserved in phage Warda (AZ) which has a gap of 207. There is no overlap and adjoining genes are also in the forward direction. /note=Phamerator: 51379 as of 10/18/2022, which has a function call of DNA Pol I. Pham 49976 is a well conserved pham with 1531 members across many actinobacteriophage clusters, particularly in clusters A and B. /note=Starterator: Start site 211 (start@31321) is called 99% of the time when present in pham 49976. It is the most annotated start site. 778 of 1421 non-draft genes have called this start site. Phages with this called start site span various clusters. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 31321. /note=Function call: DNA Polymerase I. All PECAAN widgets show strong support for this function call. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: needs attention: please include date phamerator call was made (in case pham number changes), and a few examples of clusters that are well represented in the pham. Please list out number of hits from each source (PhagesDB Blast, etc.) and how good they are (e-value and maybe identity/coverage should suffice). CDS 33180 - 33395 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Emotion_47" /note=Original Glimmer call @bp 33177 has strength 5.87; Genemark calls start at 33195 /note=SSC: 33180-33395 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_44 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 80.2817% 3.63033E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.945, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_44 [Arthrobacter phage Janeemi]],,UVK63564,81.6667,3.63033E-25 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Glimmer indicates that the start of this gene is 33177 while GeneMark indicates that the start of this gene is 33195. The start codon for 33177 is ATG. The start codon for the correct start (33180) is GTG. /note=Coding Potential: With the start at 33180 and the stop at 33395, the ORF seems to encompass this and has high coding potential in between. /note=SD (Final) Score: The SD final score is -2.906 and the Z- score is 2.945. /note=Gap/overlap: There is a gap of -4 and a spacer of 12. The overlap may be due to the presence of an operon. 
/note=Phamerator: As of 9/30/22 this gene is a part of pham 2369 which has 40 members and 12 drafts. /note=Starterator: There are 3 starts called for this gene: Start 3 which is 33177, Start 4 which is 33180, and start 7 which is 33195. Start 3 has no manual annotations and was the least called. Start 4 is the most manually annotated start while start 7 is the most called start since it was called by all the genes. Start 7 also has only 4 fewer MAs than start 4. On PECAAN, start 4 seems to be the best candidate due to it having the most MAs; start 7 has a SD final score that is too low, and although start 3 has the highest SD score, it has no MAs. /note=Location call: The correct start for this gene should be 33180. /note=Function call: NKF. HHpred and CDD do not have any informative hits. Phagesdb BLAST has hits with good e-values but only has alignments with genes of unknown function. NCBI blast has some significant hits with the top 2 being hypothetical proteins and one being a DNA helicase. Because the top 2 are hypothetical proteins, the function of this gene should match these only pieces of evidence. /note=Transmembrane domains: There are no hits on TmHmm and Topcons so this is not a membrane protein. /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: function unknown (as long as not draft) can be checked as evidence in BLASTp, hypothetical protein function equivalent to NKF, just something to consider is if the 4 bp overlap is indicative of an operon or not, overall agree with function call (NKF instead tho) and good explanation of why start site was chosen! 
CDS 33359 - 33724 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="Emotion_48" /note=Original Glimmer call @bp 33359 has strength 13.14; Genemark calls start at 33410 /note=SSC: 33359-33724 CP: yes SCS: both-gl ST: NI BLAST-Start: [hypothetical protein SEA_ADUMB2043_41 [Arthrobacter phage Adumb2043]],,NCBI, q25:s8 77.686% 5.63295E-18 GAP: -37 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.105, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADUMB2043_41 [Arthrobacter phage Adumb2043]],,QOP65101,60.396,5.63295E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@33724F) Glimmer calls the start site @33359 with start codon GTG, while Genemark calls for start site @33410 with an uncommon start codon of TTG. /note=Coding Potential: The gene has reasonable coding potential within the ORF and covers the start site @33359. At start site 33410, there is excess coding potential outside of the ORF that pertains to the gene. /note=SD (Final) Score: Start site @33359 has the best SD score of -2.523 and Z score of 3.105. Start site @33410 has a Final score of -5.403 and Z score of 1.636. /note=Gap/overlap: Start site @33359 has an overlap of 37 bp with the upstream gene and a gene length of 366 bp. Start site @33410 has a gap of 14 bp with the upstream gene and a gene length of 315 bp. Neither start site is the LORF; the LORF is not a possible candidate start site due to the lack of coding potential within the ORF and poor SD and Z score. Therefore, the chosen start site is @33359. /note=Phamerator: As of 10/6, the pham that the gene belongs to is pham #965. Since Emotion is a clusterless phage, other clusters/subclusters that are commonly annotated and share the same pham as this gene were phages Adumb2043 (cluster AZ) and Abigail (cluster EB), which were also used for comparison. The phams database did not have a function called for this pham. 
/note=Starterator: The most annotated start site is #35 where 100/104 called for this start site. Both phages Adumb2043 and Abigail include this start site. Emotion calls for start #32 where 1/104 called for this site and includes start site @33359. /note=Location call: Based on the evidence gathered, the gene is a real gene with a start site @33359 being a potential candidate. There is reasonable coding potential within the ORF, and the start site has the best SD & Z score. /note=Function call: No informative data from CDD, HHPRED, and TOPCONS. Predicted function is NKF/hypothetical protein according to PhagesDB and NCBI BLASTp data, with low e-values ranging from 3e^-17 and 5.63e^-18, % coverage of >77.686%, and % identity of >48.5149%. /note=Transmembrane domains: No informative data from TMHMM. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: I would also note that start sites with start codon TTG is less common in general. For NKF, there is no need to fill out the synteny box and the lab manual stated only put NA for orphams or if there is only one possible start site. 
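The gene lengths quoted in the Gap/overlap note for gene 48 follow from inclusive coordinates (stop minus start plus one). A small sketch under that assumption, with hypothetical helper names gene_length and is_whole_codons, also checks that a candidate start stays in frame with the stop, since a valid ORF length is a multiple of 3:

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand gene, counting both end coordinates."""
    return stop - start + 1


def is_whole_codons(start: int, stop: int) -> bool:
    """True if the start-to-stop span is a whole number of codons."""
    return gene_length(start, stop) % 3 == 0


# Candidate starts for Emotion_48, both sharing the stop at 33724.
stop = 33724
for start in (33359, 33410):
    length = gene_length(start, stop)
    print(f"start {start}: {length} bp, in frame: {is_whole_codons(start, stop)}")
```

For start 33359 this gives the 366 bp reported in the note; the same formula gives 447 bp for gene 40 (26793 to 27239).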
CDS 33922 - 34743 /gene="49" /product="gp49" /function="DNA binding protein" /locus tag="Emotion_49" /note=Original Glimmer call @bp 33922 has strength 21.85; Genemark calls start at 33922 /note=SSC: 33922-34743 CP: yes SCS: both ST: NA BLAST-Start: [DNA binding protein [Arthrobacter phage DrSierra]],,NCBI, q5:s1 98.5348% 4.30037E-59 GAP: 197 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.924, -4.884705646117852, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage DrSierra]],,QIG58520,61.194,4.30037E-59 SIF-HHPRED: ECF RNA polymerase sigma factor SigK; sigma factor, transcription initiation, DNA binding, Promoter DNA binding and transcription initiation, anti-sigma factor, DNA BINDING; HET: CD; 2.4A {Mycobacterium tuberculosis},,,4NQW_A,93.4066,99.9 SIF-Syn: /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 33922. Start codon is ATG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. The ORF has reasonable coding potential and the predicted start sites cover all of it. /note=SD (Final) Score: The SD final score for 33922 is -4.885. Z-score at 1.924. This score isn’t the best, but evidence suggests that this would be a decent start site. /note=Gap/overlap: There is a gap of 197 bp, which is quite large, but there is little to no coding potential upstream. /note=Phamerator: Pham 50253. Date 10/6/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Warda (AZ), YesChef (AZ), Tweety19 (AZ), and more. The function DNA binding protein was called. /note=Starterator: Start site 29 was not manually annotated in this pham. Start site 29 is 33922, which agrees with the start site predicted by Glimmer and Genemark. This gene did not have the most annotated gene. /note=Location call: Based on above evidence shown by Glimmer, GeneMark, and Starterator the gene is real and the start site is most likely at 33922. 
Although the SD score and the Z score aren't the best, the start site with the best score cuts off a lot of the coding potential. /note=Function call: Predicted function DNA binding protein, based on hits from Phagesdb Blast, NCBI BLASTp, HHpred, and CDD, all of which had hits with high query coverage (>85.7%), high % identity (>46.6%), and low E-values (<4.9e-7). /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the location and function call for this gene. Make sure to check the GM coding capacity dropdown box. Also, Starterator box should be "Suggested Start" if you agree with their call. Synteny box should include what phage you are comparing your phage to. I ultimately agree with your location call, but I think it would improve your argument by clarifying the evidence you provide. As it is, it feels like Final score, Z-score, and gap (LORF) are all lending credence to a different start site. You mention that the next gene (of the same pham) has a nearby start site that is commonly annotated, but Emotion_50 was actually annotated as start site 22, not 29. I would draw evidence from other genes (like Honk) with long coding portions prior to start site, as this gene has, but ultimately still call their start site at a late site. 
CDS 34867 - 35778 /gene="50" /product="gp50" /function="DNA binding protein" /locus tag="Emotion_50" /note=Original Glimmer call @bp 34867 has strength 18.19; Genemark calls start at 34867 /note=SSC: 34867-35778 CP: yes SCS: both ST: NI BLAST-Start: [RNA polymerase sigma factor [Arthrobacter phage Phives]],,NCBI, q16:s2 91.4191% 1.0077E-39 GAP: 123 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.159, -5.955490053384832, yes F: DNA binding protein SIF-BLAST: ,,[RNA polymerase sigma factor [Arthrobacter phage Phives]],,QOP65175,55.9701,1.0077E-39 SIF-HHPRED: RNA polymerase sigma factor RpoS; transcription initiation, Pseudomonas aeruginosa, RNA polymerase, sigmaS, SutA, RNAP beta lobe, open beta lobe, TRANSCRIPTION; 3.13A {Pseudomonas aeruginosa PAO1},,,7XL3_F,91.4191,100.0 SIF-Syn: DNA binding protein, gene upstream is DNA binding protein and gene downstream is NKF, this is not seen in other phages. Other phages seem to have the upstream gene and this one combined into one. /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 34867 Start Codon: ATG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best score. The final score of -5.955 is the least negative and the Z score of 2.159 is a good z score /note=Gap/overlap: There is an gap of 123 which is not the best. The suggested start site does produce the longest ORF. /note=Phamerator: The gene is in pham 52430 as of 10/16/22. There is only one other phage in the pham which is Pixelle in cluster AZ. No function is called. /note=Starterator: There is not a reasonable start site choice conserved among the members of the pham. Start site 2 corresponds to 34867 bp found in 1 of the 2 phages in the pham. Emotion calls this site. 
There is only one other phage in this pham and no manual annotations /note=Location call: The start site of 34867 is likely the start site as it has a good SD score and covers the entire coding potential. Gap of 123 bp is a big gap but 34867 produces the longest ORF. There is only one other phage in this pham and there are no manual annotations. /note=Function call: DNA binding protein is the most likely function. This is the likely function of the gene since both PhagesDB and NCBI Blast produce hits with e-values of less than 10^-35 and identities greater than 35%. HHPred and CDD both produce hits for RNA polymerase sigma factor which falls under DNA binding protein with low e-values of less than 10^-25 and 10^-7 and high probabilities of 100%. /note=Transmembrane domains: No transmembrane domains called /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: Based on the evidence, I do not agree with the location call. Based on starterator and host-trained GeneMark, I believe the correct start site is start@34903 (start site 31 on starterator, which is the only site for Emotion that has other manual annotations (and it has a large amount)). The final score for this start site is slightly worse than that of the auto-annotated start site, but not significantly so. [DROPDOWN BOXES] Make sure to check the GM coding capacity and Starterator dropdown boxes. [SD (FINAL) SCORE] The final score of -5.955 is not the least negative (that honor goes to the next ORF, with its final score of -5.092, and then to start@35542, with its final score of -5.323). [SYNTENY BOX] Please specify which specific phages you used to determine synteny. 
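The QC comment above hinges on which candidate start has the least negative final score. The comparison can be sketched in a few lines of Python, using only the three final scores quoted in that comment. The dictionary labels are illustrative, not PECAAN identifiers, and the coordinate of "the next ORF" is not given in the notes, so it is left unnamed.

```python
# Final scores quoted in the Secondary Annotator QC for Emotion_50.
# Labels are illustrative only; "next ORF" has no coordinate in the notes.
final_scores = {
    "start@34867 (auto-annotated)": -5.955,
    "next ORF": -5.092,
    "start@35542": -5.323,
}

# "Least negative" means numerically largest, so sort in descending order.
ranked = sorted(final_scores.items(), key=lambda item: item[1], reverse=True)
best_label, best_score = ranked[0]

for label, score in ranked:
    print(f"{label}: {score}")
```

Under these values the auto-annotated start ranks last of the three, which is the point the QC comment makes. The score alone does not settle the location call; coding potential and Starterator evidence still have to be weighed.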
CDS 35819 - 36088 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="Emotion_51" /note=Original Glimmer call @bp 35819 has strength 14.55; Genemark calls start at 35819 /note=SSC: 35819-36088 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Cryptosporangium phraense] ],,NCBI, q8:s6 60.6742% 0.00494667 GAP: 40 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.188, -2.356391881054909, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Cryptosporangium phraense] ],,WP_142705653,9.16031,0.00494667 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: GeneMark and Glimmer auto-annotated an ATG start site at 35819. /note=Coding Potential: Very strong coding potential, all of which fits well into the ORF and after the start site. /note=SD (Final) Score: -2.356. Z-score of 3.188. Both are the best scores of all start sites on PECAAN. /note=Gap/overlap: 40 bp gap. It’s not an ideal gap, but the gap lacks length and coding potential to include another gene, and the existing ORF includes all CP. Gene length of 270 is reasonable. /note=Phamerator: Not informative; gene is an orpham. /note=Starterator: Not informative; gene is an orpham. /note=Location call: Coding potential coverage, RBS and Z-score, start site identity, gene length and gap length indicate 35819 is the best start site and that this is a real gene. /note=Function call: No non-draft hits of sufficient e-values in PhagesDB BLAST, NCBI BLASTp, CDD or HHpred. Gene has NKF. /note=Transmembrane domains: TMHMM and TOPCONS predict no transmembrane domains, indicating this gene is not a membrane protein. /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: Needs to mention if the start site was included within the coding potential. Needs to mention if the gene length was reasonable or not. Needs to mention the Z score. Needs to mention if the gene is a real gene or not under Location Call. 
CDS 36085 - 36282 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Emotion_52" /note=Original Glimmer call @bp 36085 has strength 18.39; Genemark calls start at 36085 /note=SSC: 36085-36282 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.416, -3.9718914447963414, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Glimmer and GeneMark call the start site at 36085 with the start codon ATG. /note=Coding Potential: There’s good coding potential in the putative ORF in the forward direction. There is good coding potential in only 1 frame. No coding potential is lost. /note=SD (Final) Score: -3.972 (best score on PECAAN). While this is a poor RBS, it is acceptable as the gene is likely part of an operon and therefore does not need its own binding site. /note=Gap/overlap: There is a 4 bp overlap indicative of an operon. The length of 198 bp is reasonable. Alternative sites result in large overlaps or gaps which would result in the loss of coding potential. /note=Phamerator: Pham 48304 as of 10/8/22. Only member in pham. /note=Starterator: Gene is an orpham so n/a. /note=Location call: This is likely a real gene because there’s good coding potential in the putative ORF, the length of 198 bp is reasonable, the overlap and start codon are indicative of an operon, and alternative start sites result in greater overlap or gaps that result in loss of coding potential. /note=Function call: NKF. There are no relevant hits in PhagesDB BLAST and HHPRED due to very poor E-values, and NCBI BLAST and CDD had no hits. /note=Transmembrane domains: TOPCONS and TMHMM both predicted no TMDs. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
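The gap and length figures quoted in these notes can be reproduced from the CDS coordinates. The following is a minimal sketch, assuming forward-strand genes with 1-based, inclusive start and stop coordinates. The function names `gap_bp` and `length_bp` are hypothetical helpers, not part of PECAAN or DNA Master.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Bases between the upstream gene's stop and this gene's start.

    A negative value is an overlap, e.g. -4 for a 4 bp overlap.
    """
    return start - prev_stop - 1


def length_bp(start: int, stop: int) -> int:
    """Gene length in bp for 1-based, inclusive coordinates."""
    return stop - start + 1


# Coordinates taken from the CDS lines for Emotion_50 to Emotion_52.
gene50 = (34867, 35778)
gene51 = (35819, 36088)
gene52 = (36085, 36282)

print(gap_bp(gene50[1], gene51[0]), length_bp(*gene51))
print(gap_bp(gene51[1], gene52[0]), length_bp(*gene52))
```

With these coordinates the sketch gives a 40 bp gap and a 270 bp length for gene 51, and a 4 bp overlap and a 198 bp length for gene 52, matching the GAP fields and the Gap/overlap notes.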
CDS 36404 - 36685 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Emotion_53" /note=Original Glimmer call @bp 36404 has strength 17.98; Genemark calls start at 36404 /note=SSC: 36404-36685 CP: yes SCS: both ST: NA BLAST-Start: GAP: 121 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.105, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark, Both call the start site at 36404, ATG /note=Coding Potential: There is good coding potential in this ORF. It is found on the forward strand in both GeneMark Self and Host. /note=SD (Final) Score: The final SD score is -2.505. This is the best score listed on Pecaan in comparison to other entries. /note= /note=Gap/overlap: The gap size is 121bps. The size is fairly big but this is the longest ORF listed. /note=Phamerator: The pham number as of 10/10/22 is 48396. This gene is an orpham and yet to be found in other phages. No gene function is called for by this pham number yet. /note=Starterator: No other genes of non-draft phages are reported. The Starterator readout is not informative. /note=Location call: Based on all available evidence, the gene is most likely a real gene and the most likely start site is at 36404. /note=Function call: No known function. All the Phagesdb blast hits have unknown functions. NCBI Blast, CDD hits, and HHPRED have no relevant hits. 
/note=Transmembrane domains: Both TmHmm and Topcons display no TMDs /note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: I agree with the calls CDS 36798 - 37415 /gene="54" /product="gp54" /function="SprT-like protease" /locus tag="Emotion_54" /note=Original Glimmer call @bp 36798 has strength 21.24; Genemark calls start at 36798 /note=SSC: 36798-37415 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage Reedo]],,NCBI, q10:s2 95.1219% 3.99093E-115 GAP: 112 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.945, -2.8454123590742793, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage Reedo]],,UJQ86836,90.9091,3.99093E-115 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: ADP, MLZ, FLC; 1.5A {Homo sapiens},,,6MDW_A,87.8049,99.6 SIF-Syn: Though directly adjacent genes do not show synteny with other phage genomes, this gene shows general synteny with phage Zeta1847 (EH). The upstream +2 and +3 gene is DNA binding protein, just like the upstream +2 gene in Zeta1847. The downstream gene is pham 48779 (as of 10/11/2022), just like the downstream +3 gene in Zeta1847. /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Glimmer and GeneMark agree on start at 36798. /note=Coding Potential: Consistent coding potential in this ORF is on the forward strand only, agreeing with the auto-annotation call. Start site 36798 covers all the coding potential. However, in both GeneMark Self and Host, the coding potential drops to zero prior to the stop site. In GeneMark Self, the coding potential is not entirely consistent, and drops to zero at around 36960 on the forward strand. /note=SD (Final) Score: -2.845, which is significantly better than other final scores on PECAAN. The Z score is also the highest at 2.945. /note=Gap/overlap: Gap is somewhat large at 112bp. However, other gene calls either have large overlaps (-176 bp) or even larger gaps. 
There is no coding potential in the gap. Adjoining genes are also in the forward direction. /note=Phamerator: Pham 1210 as of 10/18/2022, which has a function call of SprT-like protease. Pham 1210 is a well-conserved pham with 83 members across actinobacteriophage clusters AZ, BH, EB, and EH. /note=Starterator: Start site 37 (start@36798) is unique to Emotion. It is spatially close to site 38, which is called 75% of the time when present, and site 42, which is the most-called start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36798. /note=Function call: SprT-like protease. All PECAAN widgets show strong support for this function call, with synteny found across clusters AZ, EB, EH, and BH (all members of pham 1210). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: needs attention: suggested start (SS) site should not be selected because start site is not called in any other gene in the pham. Please confirm coding potential report matches the chosen start site. For Gap/Overlap, include whether or not coding potential exists in the gap. Please mention PECAAN widgets by name and summarize numerical quality of hits. Good calls on synteny. 
CDS 37527 - 38288 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Emotion_55" /note=Original Glimmer call @bp 37527 has strength 19.83; Genemark calls start at 37527 /note=SSC: 37527-38288 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ARTSIC4J27_556 [Pseudarthrobacter siccitolerans]],,NCBI, q11:s11 96.0474% 7.53372E-55 GAP: 111 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.027, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ARTSIC4J27_556 [Pseudarthrobacter siccitolerans]],,CCQ44629,62.3482,7.53372E-55 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Both Glimmer and GeneMark indicate that the start for this gene is 37527. The start codon is ATG. /note=Coding Potential: The ORF is aligned with the start@ 37,527 and the stop@ 38,288 of this gene in the forwards direction, and coding potential is high over all of the ORF. /note=SD (Final) Score: The SD final score is -2.601 and it has a Z score of 3.027. Although the Z score is not the lowest, the final score is the best of the candidates available. /note=Gap/overlap: There is a gap of 111 and a spacer of 10. /note=Phamerator: As of 9/30/22 this gene is a part of pham 48779 which has 147 members and 12 drafts. /note=Starterator: This gene has start 30 which has only 2 manual annotations out of the 135 total. However, because this gene does not include the most manually annotated start in this pham, start 30 is still the best candidate. /note=Location call: The start for this gene is start 30, or 37527. /note=Function call: NKF. Phagesdb BLAST has hits with good e-values but all of the hits are genes with unknown function. CDD has no hits. NCBI BLAST has top hits with other genes that are hypothetical proteins. HHpred has hits, but the top ones all have e-values that are less than 1 but still a bit high. 
Since synteny is poor in this region, and the genes that share pham lines with this gene do not consistently have the same function, we can go with the NCBI BLAST hits as evidence and designate this gene as NKF. /note=Transmembrane domains: TmHmm and Topcons do not have hits so this is not a transmembrane protein. /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: helpful to specify if coding potential only in 1 direction and if all covered by start, note if final score best in pecaan, should note what start site is most conserved and how many pham members call it, helpful to justify location call with above evidence, change function as hypothetical protein is equivalent to NKF and NKF can be checked as evidence in BLAST (I agree that the function is unknown) CDS 38405 - 38719 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Emotion_56" /note=Original Glimmer call @bp 38405 has strength 19.4; Genemark calls start at 38405 /note=SSC: 38405-38719 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ADUMB2043_43 [Arthrobacter phage Adumb2043]],,NCBI, q1:s1 52.8846% 4.48338E-4 GAP: 116 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.105, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADUMB2043_43 [Arthrobacter phage Adumb2043]],,QOP65103,35.3535,4.48338E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@38719) Both Glimmer and Genemark agree with start site @38405 with start codon ATG. /note=Coding Potential: The gene does have reasonable coding potential within the ORF and includes start site @38405. /note=SD (Final) Score: Start site 38405 has the best SD score of -2.443 and Z score of 3.105. /note=Gap/overlap: The gene with start site 38405 has a reasonable length of 315 bp and a gap of 116 bp with the upstream gene. 
The chosen start site is not the LORF, as the LORF has poor SD and Z scores of -5.972 and 1.393 with poor coding potential. /note=Phamerator: As of 10/6/2022, the gene belongs to pham 4404. Phage Emotion is clusterless; the other phages that share this pham are all in cluster AZ. For comparison purposes, phage Adumb2043 was used. No function was called in the Phams database. /note=Starterator: Start number 9 was the most annotated, with 13/13 non-draft phages calling this site. Phage Emotion calls start number 9, which corresponds to start site @38405, confirming Glimmer and Genemark’s prediction of this start site. /note=Location call: Taken together, the evidence suggests that this gene is a real gene with start site 38405 due to the start site covering all coding potential. /note=Function call: No informative hits from CDD, TOPCONS, and HHPRED. Predicted function is NKF/hypothetical protein according to NCBI BLASTp, with a hit from phage Adumb2043. The hit had a low e-value of 0.00483, a % identity of 25.25%, and a % coverage of 52.86%. /note=Transmembrane domains: No informative hits from TMHMM. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: I agree with the annotation. Just make sure to specify the year for the phamerator section. CDS 38728 - 38826 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Emotion_57" /note= /note=SSC: 38728-38826 CP: yes SCS: neither ST: NA BLAST-Start: [helix-turn-helix DNA-binding domain protein [Arthrobacter phage LittleTokyo]],,NCBI, q1:s1 81.25% 1.26668E-5 GAP: 8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.414, -3.9166971242758706, yes F: hypothetical protein SIF-BLAST: ,,[helix-turn-helix DNA-binding domain protein [Arthrobacter phage LittleTokyo]],,QGZ16963,19.1304,1.26668E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bagdatli, Dila /note=Auto-annotation: This gene was manually added. No start site called by either GeneMark or Glimmer. 
/note=Coding Potential: There is high coding potential between the start site @38728 until the stop site @38826 in self-trained GeneMark. There is almost no coding potential in host-trained GeneMark. The start site includes the entire coding potential. /note=SD (Final) Score: The final score is -3.917 which is the most appropriate out of the 3 options as it is the least negative one. /note=Gap/overlap: There is a gap of 8 bp which is plausible and conforms to the common densely packed phage genomes. /note=Phamerator: This gene was manually added so no pham information is present. /note=Starterator: No starterator information is available. /note=Location call: Given that the 2 other start site candidates have gap of -352 and -67 which shows a significant overlap that is not possible, the start site should be @38728 bp. This includes all of the coding potential and has an appropriate 8 bp gap. It also has the highest final score which supports this location call. The gene is only 99 bp in length, however, which is less than the 120 bp recommended threshold. /note=Function call: NCBI BLASTp shows only 2 hits, one of which is with a hypothetical protein. The other, more significant BLAST hit was with the helix-turn-helix DNA binding domain protein in phage LittleTokyo with 77% identity (high) and an e-value of 1e-05 which is less than the 1e-03 threshold, and therefore significant. PhagesDB BLASTp also showed a significant hit with the helix-turn-helix DNA binding domain protein in phage LittleTokyo with 75% identity (high) and a low e-value of 1e-06. The length of this gene in LittleTokyo was 115 bp which is close to the length of this gene. HHPred shows a significant hit with ribosome biogenesis protein in the bacteria pyrococcus furiosus and also other bacteria with ~98% probability, >90% coverage, and an e-value of 2e-6. It also showed a hit with zinc-ribbon containing domain with 98% probability, >86% coverage, and e-value of 1e-5. 
Because there were multiple significant hits with proteins with different functions, the function call is inconclusive. /note=Transmembrane domains: There are no TM domains reported by either TmHmm or TOPCONS. CDS 39138 - 39497 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="Emotion_58" /note=Original Glimmer call @bp 39138 has strength 23.62; Genemark calls start at 39138 /note=SSC: 39138-39497 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TBONE_51 [Arthrobacter phage Tbone] ],,NCBI, q4:s5 92.437% 3.84772E-10 GAP: 311 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.945, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TBONE_51 [Arthrobacter phage Tbone] ],,QPX62382,58.0952,3.84772E-10 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start site as 39138. Start codon is ATG. /note=Coding Potential: Coding potential is found in both GeneMark Host and Self. The ORF has reasonable coding potential and the predicted start sites cover all of it. /note=SD (Final) Score: SD final score is -2.906. Z-score at 2.945. These are the best scores on PECAAN. /note=Gap/overlap: There is a gap of 418 bp, but there is no coding potential, suggesting that there is no missing gene. /note=Phamerator: Pham 47409. Date 10/6/22. Although this phage’s cluster is unknown, the pham is conserved in other clusters; found in Warda (AZ), YesChef (AZ), Tweety19 (AZ), and more. There was no function called. /note=Starterator: Start site 9 was manually annotated in 28/29 non-draft genes in this pham. Start site 9 is 39138, which agrees with the start site predicted by Glimmer and GeneMark. Start site 9 was the most annotated for this pham. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 39138. /note=Function call: Function unknown. No program returned any informative results. 
/note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I agree with the function call and annotation of this gene. CDS 39494 - 39931 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Emotion_59" /note=Original Glimmer call @bp 39494 has strength 16.14; Genemark calls start at 39494 /note=SSC: 39494-39931 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.945, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and Genemark: 39494 Start Codon: GTG /note=Coding Potential: Reasonable coding potential in putative ORF. Suggested start site covers all of coding potential /note=SD (Final) Score: SD score is the best score. The final score of -2.906 is the least negative and the Z score of 2.945 is the best z score /note=Gap/overlap: There is an overlap of 4 bp which is typical for an operon. The suggested start site does produce the longest ORF. /note=Phamerator: The gene is in pham 48457 as of 10/10/22. The pham is not conserved in any cluster. Emotion is the only phage in this pham. /note=Starterator: There is not a reasonable start site choice conserved among the members of the pham. Starterator suggests that this protein is an orpham /note=Location call: The start site of 39494 is likely the start site as it has a good SD score and covers the entire coding potential. Overlap of 4 bp is typical of an operon /note=Function call: No known function. PhagesDB and NCBI Blast as well as HHPred and CDD produce no hits with significant e-values /note=Transmembrane domains: No transmembrane domains called /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I QC`ed this location call and agree with the annotations. 
CDS 39928 - 40788 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Emotion_60" /note=Original Glimmer call @bp 39928 has strength 20.36; Genemark calls start at 39928 /note=SSC: 39928-40788 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_CREWMATE_58 [Arthrobacter phage Crewmate]],,NCBI, q147:s3 47.5524% 2.5115E-9 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.894, -5.410945357522411, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_58 [Arthrobacter phage Crewmate]],,UIW13310,56.2963,2.5115E-9 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: Both Glimmer and GeneMark were used to auto-annotate the gene and agreed on a GTG start site at 39928. /note=Coding Potential: Strong coding potential from ~39930 to ~40750, all within the ORF. ~100 bp gap at 40300 in the GeneMarkS, but lack of stop site indicates it’s one cohesive gene. /note=SD (Final) Score: -5.411. Not the best in PECAAN, but the 4 bp overlap with 2 genes upstream and downstream indicates this is an operon. Also, even though there are start sites with better Final and Z-scores, these other options leave a gaping 400 bp or more gap filled with coding potential, so these start options are not viable. /note=Gap/overlap: 4 bp overlap indicates operon. All CP is covered by ORF, so no extra genes exist upstream or downstream. /note=Phamerator: Not informative; gene is an orpham. /note=Starterator: Not informative; gene is an orpham. /note=Location call: Coverage of coding potential, gap length indicative of an operon, and auto-annotation indicate that 39928 is the best start site. 2 NCBI BLASTp hits >36% identity and 97% (NCBI BLASTp only) indicate this is a real gene with start site 43267. /note=Function call: PhagesDB BLAST and NCBI BLASTp return hits with no known function, and HHpred and CDD return no hits with sufficiently low E-value. All other members of the pham are NKF. This protein has NKF. 
/note=Transmembrane domains: TMHMM predicts no transmembrane domains, indicating this is not a membrane protein. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 43511 - 43720 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="Emotion_69" /note=Original Glimmer call @bp 43511 has strength 22.84; Genemark calls start at 43511 /note=SSC: 43511-43720 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KEALII_64 [Arthrobacter phage KeAlii]],,NCBI, q1:s1 97.1014% 9.01015E-9 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.099, -2.5969056218148614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KEALII_64 [Arthrobacter phage KeAlii]],,UDL14670,61.3333,9.01015E-9 SIF-HHPRED: SIF-Syn: n/a /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Glimmer and GeneMark both call start site 43511 with start codon ATG. /note=Coding Potential: There is good coding potential in the putative ORF on only one reading frame in the forward direction on both the Host-Trained and Self-Trained GeneMark maps. The ORF contains all the coding potential. /note=SD (Final) Score: -2.597 (only score in PECAAN) /note=Gap/overlap: There is a 4 base pair gap which is reasonable, the 210 bp length is reasonable, and there are no alternative start sites. /note=Phamerator: pham 26597 as of 10/5/22. There is only 1 other pham member, Cluster AZ phage KeAlii, which does not call a function. /note=Starterator: Start site #1 is the most annotated and the only called start site. 2/2 pham members call this start site including Emotion (which only has 1 start site) corresponding to bp 43511. /note=Location call: This is likely a real gene with the start site 43511 as there is good coding potential in the putative ORF, the 4 bp gap is reasonable, and there is no alternative start site. /note=Function call: NKF. 
The top PhagesDB BLAST hit (E-value 8e-9) and the one NCBI BLAST hit (E-value of 8.96e-9 with 46% identity and 97% coverage) call function unknown. HHPRED gave no relevant hits due to poor E-values while CDD gave no hits. /note=Transmembrane domains: Not a membrane protein as TMHMM and TOPCONS both predicted no TMDs. /note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: CDS 43720 - 43983 /gene="70" /product="gp70" /function="HNH endonuclease" /locus tag="Emotion_70" /note=Original Glimmer call @bp 43720 has strength 16.29; Genemark calls start at 43720 /note=SSC: 43720-43983 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Powerpuff] ],,NCBI, q7:s6 85.0575% 3.47448E-18 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.945, -3.6727816321281046, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Powerpuff] ],,QGZ17367,54.023,3.47448E-18 SIF-HHPRED: Restriction endonuclease Hpy99I; ENDONUCLEASE-DNA COMPLEX, RESTRICTION ENZYME, HPY99I, PSEUDOPALINDROME, HYDROLASE-DNA COMPLEX; HET: 1PE; 1.5A {Helicobacter pylori},,,3GOX_A,55.1724,94.0 SIF-Syn: HNH endonuclease, the downstream gene is pham 26597 and the upstream gene is rIIB-like protein /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark both call the start site at 43720, ATG /note=Coding Potential: There is good coding potential in this ORF but the coding potential does not match the reported value. The gene might have a greater length than what is being reported. It is found on the forward strand in both GeneMark Self and Host. /note=SD (Final) Score: The final SD score is -3.673. This is not the best score listed on Pecaan, yet this entry has the smallest gap with the upstream gene /note=Gap/overlap: There is a 1 bp overlap (gap of -1) with the upstream gene; this is a reasonable size for an overlap. /note=Phamerator: The pham number as of 10/10/22 is 2209. 
The gene is conserved in 10 out of 29 of the non-draft genes including Liebe, Adolin, and Maureen from the AZ cluster. /note=Starterator: There are 29 non-draft members of this pham, but no other phages call for start site 26 except for Emotion. It is not the most conserved start site among the members within the pham. /note=Location call: Based on all available evidence, the gene is most likely a real gene and the most likely start site is at 43720. /note=Function call: HNH endonuclease. The top 3 non-draft PhagesDB BLAST hits have the function of HNH endonuclease (E-value < 10^-18), and 6 out of 10 top NCBI BLAST hits also have the function of HNH endonuclease (coverage >75% and E-value < 10^-15). HHpred does not have any hits. CDD had no relevant hits. Considering all evidence listed above, the gene has a function of HNH endonuclease. /note=Transmembrane domains: Both TmHmm and Topcons display no TMDs /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: needs attention: coding potential quality does not match report. SS should not be selected since start site is unique. "HHpred does not have any hit" should be pluralized. Move CDD report to before the function call conclusion. Synteny should not be constructed with NKF, identify pham number of NKF gene instead. tRNA 44209 - 44282 /gene="71" /product="tRNA-Trp(cca)" /locus tag="EMOTION_71" /note=tRNA-Trp(cca) CDS 44321 - 44479 /gene="72" /product="gp72" /function="hypothetical protein" /locus tag="Emotion_72" /note= /note=SSC: 44321-44479 CP: yes SCS: neither ST: SS BLAST-Start: GAP: 337 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.358, -3.948782731199928, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Jinge /note=Auto-annotation: Gene (stop@44479 F) does not have either Glimmer or GeneMark calling for a start site. The auto-annotated start site @44321 contains high coding potential. 
However, it is worth noting that start site @44333 uses the ATG (most used) start codon and also contains all of the coding potential, but this start site is not the LORF, and does not have the best Z- and Final scores. /note=Coding Potential: High coding potential in the second half of the gene, low/medium coding potential in the first half. Start site 44321 covers all the coding potential. /note=SD (Final) Score: Final score is -3.949. Highest and best score. /note=Gap/overlap: 337 bp. Smallest gap. A tRNA is located in this gap. No coding potential is observed in this gap. /note=Phamerator: 12/19/2022. This gene does not belong to a pham and is not observed in any other phages. /note=Starterator: There is no option to run Starterator since this gene does not belong to a pham. /note=Location call: Based on the evidence, start site @44321 is the start site for this real gene as supported by LORF, Z-score, and Final score. /note=Function call: NKF. No known function. PhagesDB BLAST and NCBI BLAST show no hits. /note=Transmembrane domains: Not a transmembrane protein as predicted by TmHmm and TOPCONS. /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 44491 - 44676 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="Emotion_73" /note=Original Glimmer call @bp 44491 has strength 12.97; Genemark calls start at 44491 /note=SSC: 44491-44676 CP: yes SCS: both ST: NA BLAST-Start: GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.425, -3.954790667257792, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Glimmer and GeneMark indicate that the start is 44491. It has a Glimmer score of 12.97. This start is the only candidate for this gene. The start codon is ATG. /note=Coding Potential: The start is 44491 and the stop is 44676; there is an ORF matching this. There is also high coding potential within this region. 
/note=SD (Final) Score: The SD final score is -3.955 and the Z-score is 2.425. /note=Gap/overlap: There is a gap of 507 and a spacer of 12. According to pham maps, there is a tRNA within that gap of 507. /note=Phamerator: This gene has the pham number 48236 but is an orpham. According to Phagesdb, this gene has the most alignment (though e-values are high) with cluster BK1’s Beuffert, Blueeyedbeauty, Limpid, and Annadreamy, so these were used for comparison. Despite their alignment, the region where this gene is does not have good synteny with them. There is better synteny in this region with members from cluster AZ such as Adolin and Adumb, but it is still poor synteny. /note=Starterator: This gene is an orpham so Starterator does not generate a report. /note=Location call: 44491 is the only start site candidate for this gene so it should be the correct one. /note=Function call: rIIB-like protein. NCBI BLAST and CDD do not have hits for this gene. HHpred has hits for this gene but their e-values are too high or they are for domains of unknown function. Phagesdb BLAST had 4 hits, all of which align this gene with genes from members of cluster BK1. The e-values are high but all of these genes have the same function. Since this is the best evidence available, the function is most likely rIIB-like protein. /note=Transmembrane domains: TmHmm and Topcons do not have any hits for this gene /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: The Starterator drop-down menu should be NA if it's an orpham. You can also leave the synteny box empty. 
CDS 44679 - 45029 /gene="74" /product="gp74" /function="HNH endonuclease" /locus tag="Emotion_74" /note=Original Glimmer call @bp 44679 has strength 6.91 /note=SSC: 44679-45029 CP: yes SCS: glimmer ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Adumb2043]],,NCBI, q1:s1 85.3448% 3.69786E-32 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.633, -3.534757715066017, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Adumb2043]],,QOP65128,83.0,3.69786E-32 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,62.069,97.1 SIF-Syn: HNH endonuclease, downstream is NKF, no upstream gene, just like in Adumb2043. /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene(stop@45029F) Only Glimmer calls a start site, at 44679 with start codon GTG. /note=Coding Potential: The gene has reasonable coding potential within the ORF, but does not include the potential start site of 44679. /note=SD (Final) Score: Start site 44679 has the best final score of -3.535 and Z-score of 2.633. /note=Gap/overlap: The gene has the LORF with a gene length of 351 bp and a reasonable gap of 2 bp with the upstream gene. /note=Phamerator: As of 10/8, the gene belongs to pham #49992. Phages from other clusters/subclusters have genes in the same pham, such as cluster AZ (phage Adumb2043) and cluster A (phage Anthony), which were used for comparison. The phams database also called the function HNH endonuclease, which is consistent with the other phages that share the same pham #, and the function is included in the approved functions list. /note=Starterator: Among the members of the pham, start number 110 was conserved and 242/719 called this site. Phage Emotion called start number 108, which 4/719 called and which corresponds to the candidate start site of 44679. 
/note=Location call: Based on the evidence, the gene is a real gene with a potential start site at 44679. Phamerator shows that phages in other clusters/subclusters have genes in the same pham, and this start has the best SD and Z-scores within the LORF; together these indicate that the gene is real. /note=Function call: No informative hits from HHPRED due to weak e-values. No informative hits from CDD due to low % identities. No informative hits on TOPCONS. Predicted function is HNH endonuclease, based on multiple NCBI and PhagesDB BLASTp hits and on the function suggested by the Phams database on PhagesDB. The BLASTp hits have low e-values ranging from e^-44 to e^-32, a high % identity of >75%, and >85% coverage. /note=Transmembrane domains: No informative hits from TMHMM. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: needs attention: [CODING POTENTIAL] "Staggering" means astonishing, aka a very large coding potential, but the coding potential here is not very large. Maybe you mean "staggered"? [ALL GM CODING CAPACITY] I believe this refers to whether the start site encompasses all the coding capacity, not whether the coding capacity encompasses all of the start site. So, in this case, the answer would be "Yes" (based on third track forward sequence). [STARTERATOR] By only looking at this section, it is easy to think, "Why didn't we choose start 110 then?" So, it would make for a stronger argument if you mentioned that Emotion does not contain start site 110 and that start site 108 is located close to start site 110, the most annotated site. Additionally, it would be helpful to indicate that start site 108 is called 100% of the time when present. [STARTERATOR DROPDOWN BOX] This should be Suggested Start since you agreed with the Starterator call. [FUNCTION EVIDENCE] Remember to check boxes for functional evidence (specifically on PhagesDB BLAST and CDD)!