CDS 52 - 471 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="Hum25_1" /note=Original Glimmer call @bp 52 has strength 10.24; Genemark calls start at 130 /note=SSC: 52-471 CP: yes SCS: both-gl ST: SS BLAST-Start: [Terminase small subunit [Gordonia phage Thimann]],,NCBI, q6:s4 66.187% 1.77323E-18 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.701, -3.139452255336754, yes F: terminase, small subunit SIF-BLAST: ,,[Terminase small subunit [Gordonia phage Thimann]],,WNM74266,51.7857,1.77323E-18 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,71.223,99.7 SIF-Syn: terminase small subunit protein, immediately upstream of the terminase big subunit /note=Primary Annotator Name: Nguyen, Mya /note= /note=Auto-annotation: Both Glimmer and GeneMark auto-annotated this gene, but they do not agree on the start site. Glimmer calls the start at 52 while GeneMark calls the start at 130. The start codon called is TTG for both. /note= /note=Coding Potential: There is good coding potential in both the forward and reverse directions. The chosen start site covers all the coding potential. /note= /note=SD (Final) Score: The SD score for start site 52 is -3.139, which is the best score. It also has a high Z score of 2.701. The SD score for start site 130 is -5.531, which is not the best score, and it also has a lower Z score of 1.63. /note= /note=Gap/overlap: There is no gap with upstream genes since this is the first gene. The length of the gene at start site 52 is 420 bp, which is the longest of all the candidate start sites. /note= /note=Phamerator: Date: 4/5/23, Pham number 72338. Since Hum25 is a singleton, it is unknown whether the pham the gene is in is also included for other members of the same cluster. /note= /note=Starterator: The most annotated start site is start 6, which is called by 13 of 22 non-draft genes in the pham. 
However, Hum25 does not have the most annotated start and instead has start 5 @ 52, which corresponds to the Glimmer start site. /note= /note=Location call: The gene is a real gene as it is conserved and has good coding potential. The most likely start site is 52, which is called by Glimmer and in Starterator. It has the longest ORF and the best Z and final score. /note= /note=Function call: Multiple PhagesDB BLAST hits have function calls of terminase small subunit with extremely small e-values of 6e-74 to 3e-18. PhagesDB function frequency shows only terminase small subunit as the function. There are no hits on CDD. HHpred hits show high probability and low e-values, with hits being terminase small subunits as well. /note= /note=Transmembrane domains: No predicted TMDs. The protein is not a transmembrane protein. The terminase small subunit does not interact with the cell membrane; therefore, it is reasonable that there are no TMDs. /note= /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 471 - 2078 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="Hum25_2" /note=Original Glimmer call @bp 471 has strength 7.93; Genemark calls start at 471 /note=SSC: 471-2078 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Actinosynnema pretiosum] ],,NCBI, q7:s3 95.3271% 1.53371E-162 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -4.743128377677594, no F: terminase, large subunit SIF-BLAST: ,,[hypothetical protein [Actinosynnema pretiosum] ],,WP_096495975,63.4545,1.53371E-162 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,93.271,100.0 SIF-Syn: Terminase large subunit, upstream gene is portal protein, just like in phage DatBoi /note=Primary Annotator Name: Robles, Angel /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start at 471 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, with start codon ATG. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -4.743. It has a high Final Score, which indicates a good sequence match. /note=Gap/overlap: Gap of -1 (overlap). This is small and reasonable. /note=Phamerator: pham: 73448. Date 03/24/2023. It is conserved; found in 20ES (A) and 40AC (A). /note=Starterator: The start number called the most often in the published annotations is Start Site 58 in Starterator, which was manually annotated in 609/1172 non-draft genes in this pham. This evidence disagrees with the site predicted by Glimmer and GeneMark. The start number called for Hum25 is Start Site 94 in Starterator, which was manually annotated in 1/1172 non-draft genes in this pham. Start 94 corresponds to a start site of 471 bp in Hum25. /note=Location call: Does not match the most annotated start site. However, start site 94 covers all coding potential, which leads me to believe that this is a real gene. /note=Function call: Terminase, large subunit. The top three PhagesDB BLAST hits have the function of terminase (E-value <10^-71), and 4 out of 5 top NCBI BLAST hits also have the function of terminase protein (95% coverage, 40%+ identity, and E-value <10^-159). HHpred had a hit for terminase protein with 100% probability, 93% coverage, and E-value of 1.79997e-41. CDD had a hit for terminase protein with 20% identity, 30% alignment, 83% coverage, and an E-value of 3.2782e-12. /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gowdy, Griffin /note=Secondary Annotator QC: Looks good. 
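The gene length and gap/overlap values quoted in the records above follow from simple coordinate arithmetic. The sketch below is illustrative only and is not part of the PECAAN export: the helper names `gene_length` and `gap` are hypothetical, and it assumes forward-strand genes with inclusive, 1-based coordinates, which is consistent with the values reported for gp1 through gp3.

```python
# Illustrative helpers (hypothetical names, not from PECAAN or DNA Master).
# Assumes forward-strand genes with inclusive, 1-based coordinates.

def gene_length(start: int, stop: int) -> int:
    """Length in bp of a gene spanning start..stop inclusive."""
    return stop - start + 1


def gap(prev_stop: int, next_start: int) -> int:
    """Bases between two adjacent genes; a negative value is an overlap."""
    return next_start - prev_stop - 1


# Coordinates taken from the CDS lines for gp1, gp2 and gp3.
genes = [("gp1", 52, 471), ("gp2", 471, 2078), ("gp3", 2091, 3464)]

for name, start, stop in genes:
    print(f"{name}: length {gene_length(start, stop)} bp")

for (prev_name, _, prev_stop), (next_name, next_start, _) in zip(genes, genes[1:]):
    print(f"{prev_name} -> {next_name}: gap {gap(prev_stop, next_start)} bp")
```

Under these assumptions the helpers give 420 bp for gp1, a gap of -1 bp (a 1 bp overlap) between gp1 and gp2, and a gap of 12 bp between gp2 and gp3, matching the GAP fields in the corresponding records.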
CDS 2091 - 3464 /gene="3" /product="gp3" /function="portal protein" /locus tag="Hum25_3" /note=Original Glimmer call @bp 2091 has strength 9.54; Genemark calls start at 2091 /note=SSC: 2091-3464 CP: yes SCS: both ST: SS BLAST-Start: [phage portal protein [Paenarthrobacter ureafaciens] ],,NCBI, q1:s1 99.7812% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.959, -4.779799586278476, no F: portal protein SIF-BLAST: ,,[phage portal protein [Paenarthrobacter ureafaciens] ],,WP_166186131,87.2527,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_D,92.3414,100.0 SIF-Syn: synteny with other A. Glob. infecting phages such as ABBA from cluster AO3 which has a large subunit of the terminase before and a capsid maturation protein after. This is also strongly seen in cluster AY with Auxilium and Hestia both showing very similar synteny with pham overlaps as well. /note=Primary Annotator Name: Hamid, Bilal /note=Auto-annotation: Both Glimmer and GeneMark agree on the 2091 start site. /note=Coding Potential: Host-trained GeneMark shows considerable forward coding potential and also some reverse coding potential between the auto-annotated start and stop sites. Results are corroborated by phage-trained GeneMark, which shows similar levels of coding potential for both forward and reverse genes. /note=SD (Final) Score: -4.780 is not the smallest final score but it is correlated with the largest ORF and smallest gap. /note=Gap/overlap: The 12 bp gap provides the largest ORF, with a z-score of 1.959; it also provides enough space for a separate SD sequence, which makes sense for a non-operon gene. /note=Phamerator: 04/04/2023 - pham is 73446 with 1850 members of which 138 are current draft genes. For all phages with designated clusters it is called as a portal protein, and in cases where there is no cluster, it is often called as a putative portal protein. 
/note=Starterator: Start site 129 @2091, the auto-annotated site, is likely the best start site as it minimizes the space between the previous gene and this one. This gene does not contain the most often called start site. Start site 129 is unique to Hum_25, thus there are no manual annotations of this start. /note=Location call: 2091 is most likely the best start site based on Starterator and Phamerator data. /note=Function call: PhagesDB BLASTp indicates function as a portal protein based on Kromp_3, Auxilium_3, and Richie_4 with e-values of 83-70, 5e-66, and 5e-66 respectively. NCBI BLASTp indicates multiple e-values of 0 due to high identity (>60%) to Paenarthrobacter ureafaciens, Arthrobacter agilis, and Arthrobacter roseus phage portal proteins. CDD also indicates a portal protein from the Gp6 superfamily with an e-value of 4.2e-24. Both best PDB and Pfam results from HHpred also indicate a Gp6 portal protein with e-values of 3.5e-35 and 2.9e-34 respectively. The combination of high quality hits across a variety of sources gives strong evidence that the protein is in fact a portal protein. /note=Transmembrane domains: 0 transmembrane domains were noted by DeepTMHMM. As a portal protein, it makes sense that it would not need transmembrane domains due to its specific relevance to phage assembly. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 3436 - 4440 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="Hum25_4" /note=Original Glimmer call @bp 3433 has strength 11.58; Genemark calls start at 3433 /note=SSC: 3436-4440 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein [Paenarthrobacter ureafaciens] ],,NCBI, q1:s3 100.0% 0.0 GAP: -29 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.622, -3.95820260462814, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Paenarthrobacter ureafaciens] ],,WP_166186130,86.747,0.0 SIF-HHPRED: SIF-Syn: When compared to phage Seahorse (cluster AY and infects Arthrobacter bacteria - A. glob), the genes upstream are portal proteins and the genes downstream are a scaffolding protein and minor capsid protein. Seahorse has a capsid decoration protein in between the scaffolding protein and minor capsid protein. /note=Primary Annotator Name: Tran, Krysten /note=Auto-annotation: Glimmer and GeneMark; both agree on the same start site (3433); start codons called - ATG, TTG, and GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF and the start site does cover all the coding potential. /note=SD (Final) Score: -3.368 is the most favorable score on PECAAN. The start codon is ATG. /note=Gap/overlap: There is an overlap of 32 bp, which is relatively large for an overlap. /note=Phamerator: This gene is in pham 73745 as of 03/24/23. This phage is a singleton but the gene is also found in cluster BL and AY, which also calls start site 2. /note=Starterator: The start number 5 was called in 17 of the 23 non-draft genes in the pham, however start number 5 is not in this phage. Start number 2 at position 3433 has the most manual annotations (3) for this gene and is consistent with the start sites called by both Glimmer and GeneMark. The start number 2 was called in 3 of the 23 non-draft genes in the pham. 
/note=Location call: Based on all the evidence gathered, the start site for this gene is likely at 3433. /note=Function call: Many good hits from PhagesDB BLASTp but the first 100 hits from NCBI BLASTp with good e-values were all hypothetical proteins. Two PDB hits from HHpred were strong because they had a high probability of >98.5 and a low E-value of <10e-6; however, the percent coverage is low (~21%) for both hits. No CDD hits. Based on the good PDB hits from HHpred and the hits from PhagesDB BLASTp, the function was close to minor capsid protein but based on the SEA-PHAGES approved function list I hypothesize the function of the gene is a capsid maturation protease. /note=Transmembrane domains: The hypothesized function for the gene is a capsid maturation protease, which generally functions to help assemble the major capsid protein. Therefore, it makes sense that there are no TMDs for this ORF. /note=Secondary Annotator Name: Kim, Cindy /note=Secondary Annotator QC: I agree with the annotation above. Don't forget to fill out the synteny and function box. 
CDS 4509 - 4997 /gene="5" /product="gp5" /function="scaffolding protein" /locus tag="Hum25_5" /note=Original Glimmer call @bp 4509 has strength 14.95; Genemark calls start at 4467 /note=SSC: 4509-4997 CP: yes SCS: both-gl ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage Persistence] ],,NCBI, q42:s58 61.1111% 1.87221E-16 GAP: 68 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.399, -3.773970443115436, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Persistence] ],,YP_010656006,36.7232,1.87221E-16 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,38.8889,98.5 SIF-Syn: scaffolding protein, upstream is portal protein and downstream is major capsid protein, just like in phages Aglet and Aragog of cluster A which are within the gene's pham /note=Primary Annotator Name: Critzer, Nicole /note=Auto-annotation: Glimmer and GeneMark do not agree on the start site; the Glimmer start site listed was 4509 and the GeneMark start site listed was 4467. Although the GeneMark start site produced the smallest gap and the LORF, it did not have the least negative final score and its z-score was <2. The start codon for the 4509 start is ATG, which is common and therefore reasonable. /note=Coding Potential: The region between 4509 and 4997 has strong coding potential, but it should be noted that the start omits a region of coding potential, about 45 bp long, right before it, and coding potential stops about 50 bp before the stop site. /note=SD (Final) Score: -3.774; this is the least negative score and therefore the best sequence match, and the z-score (2.399) is greater than 2. /note=Gap/overlap: 68 bp, which is less than 120 bp, so the gap is too small for another gene to be found in between. In addition to this, if one compares the pham maps of phages in cluster A to the gene, it doesn’t look like another gene is normally inserted in the space. 
/note=Phamerator: Gene is in pham 74106 on 04/10/2023. This pham has 596 members of which 29 are drafts. This is in agreement with Starterator and the gene is conserved in cluster A phages Aglet and Aragog, which are within the pham. /note=Starterator: Analysis was run on 04/09/23 on database version 507. In this version the most annotated start site is 23, which is found in 487 out of 567 non-draft genes in the pham. This gene did not have the "most annotated start", which resulted in the auto-annotated start being 27 @4509. This corresponds to the auto-annotated start called by Glimmer, and this start is called 100% of the time when present. /note=Location call: 4509. Although this start site differs from the one called by GeneMark, it has the best z-score and the least negative final score. Additionally, when present this start is called 100% of the time according to Starterator, and it encompasses strong coding potential. /note=Function call: scaffolding protein - The majority of hits provided by PhagesDB BLAST are for genes that function as scaffolding proteins. The top hits were ZoeJ_8 and Mufasa_8, which both function as scaffolding proteins and have an e-value of 1e-23 and a score of 107. Similarly, the best hits for NCBI BLAST, genes with reasonable e-values and relatively high coverage (>60), also functioned as scaffolding proteins. This agrees with the top HHpred hit (6B0X_b). When one looks at the HHpred hit there is high confidence in predicting helix structures, which agrees with the structure provided for the scaffolding protein. Another thing to note is that there were no hits in CDD. /note=Transmembrane domains: No hits observed in DeepTMHMM; the protein is predicted to be inside, so this is not a transmembrane protein. /note=Secondary Annotator Name: Estampa, Julia /note=Secondary Annotator QC: Great job with annotations! I agree with the majority of your calls and that the best start site location is at 4509 bp based on all the gathered evidence. 
I also agree that the function is a scaffold protein based on the evidence retrieved. However, when checking both the PhagesDB report and Starterator report, the pham number listed is 74106 and not 63536, where this pham has 596 members of which 29 are drafts. Under Starterator, the start number called the most often is 23, which is called in 487/567 of non-draft phages in the pham. Under auto-annotation, I think that it would be helpful to note the start codon used for the selected gene (ATG). /note=Additionally, under function call, you may also briefly note that there were no CDD hits returned. Other than that, amazing work with annotations! CDS 5020 - 5922 /gene="6" /product="gp6" /function="major capsid protein" /locus tag="Hum25_6" /note=Original Glimmer call @bp 5020 has strength 15.25; Genemark calls start at 5020 /note=SSC: 5020-5922 CP: yes SCS: both ST: SS BLAST-Start: [phage major capsid protein [Tetrasphaera sp.]],,NCBI, q1:s1 100.0% 7.80437E-157 GAP: 22 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.38, -3.8137921535956054, no F: major capsid protein SIF-BLAST: ,,[phage major capsid protein [Tetrasphaera sp.]],,MCB1238517,87.3333,7.80437E-157 SIF-HHPRED: Major head protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_G,97.0,100.0 SIF-Syn: This gene is the major capsid protein, and the upstream gene is the scaffolding protein and the second downstream gene is the head-to-tail adapter. This is the same in phages, such as AN3, AN9, AbbysRanger, and Anaysia, among others. /note=Primary Annotator Name: Deal, Milena /note=Auto-annotation: Both Glimmer and GeneMark agree on a start site of 5,020 for this gene with a start codon of ATG, which is a more common start codon. The end for the gene is 5,922. /note=Coding Potential: There is high coding potential through the whole region of the gene in one reading frame in host-trained GeneMark and self-trained GeneMark. 
/note=SD (Final) Score: The final score for the auto-annotated start site at 5020 is -3.814, which is the highest final score of all the suggested start sites. The Z-score is 2.38, which is a good Z-score. /note=Gap/overlap: There is a 22 base pair gap between this gene and the upstream gene using the auto-annotated start site. This is the longest reading frame, and this is not a large gap. /note=Phamerator: On 4/3/23, this gene is in pham 73451 and there are 998 phages with genes in this pham. 74 of these are drafts. Almost all genes in this pham were given the function of major capsid protein. There are a variety of clusters that phages with this gene belong to; the most common is cluster A. /note=Starterator: On 4/3/23, the most commonly called start site was 6 and it was called in 471 out of 924 non-draft genes. Start site 6 is not an option for this gene, and start site 22 is called instead (22, 5020). Start site 22 was called in one other phage, but it was the only other phage (phage ArV2_06) that was given this start site as an option. This phage is also a singleton. /note=Location call: Although other genes in this pham commonly call a different start site, that start site is not present in this gene. The auto-annotated start site at 5020 appears to be the best option because it is the longest open reading frame, it has a good Z-score and final score, and the one other phage with this start site given as an option chose it. /note=Function call: There were many PhagesDB BLASTp hits with very low E-values with the function of major capsid protein. There are many hits on NCBI BLASTp with very low E-values that have the function of phage major capsid protein. On CDD, there are two hits which are both related to phage capsid proteins and have low E-values and good probability and coverage. HHpred also has many hits related to phage capsid proteins with low E-values and high probability. Therefore, the function of this protein is major capsid protein. 
/note=Transmembrane domains: There are no predicted transmembrane domains, which makes sense with the function of major capsid protein. /note=Secondary Annotator Name: Hamid, Bilal /note=Secondary Annotator QC: Include specific hit names/indicators in the PECAAN notes to make it clearer what evidence was used for PhagesDB BLASTp, CDD, HHpred, etc. All other information looks great! CDS 5931 - 6089 /gene="7" /product="gp7" /function="hypothetical protein" /locus tag="Hum25_7" /note=Original Glimmer call @bp 5931 has strength 12.97; Genemark calls start at 5925 /note=SSC: 5931-6089 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: 8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.137, -4.4057176022952405, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kretschmer, Thomas /note=Auto-annotation: Glimmer and GeneMark disagree on the start site. Glimmer says 5931 while GeneMark says 5925. Both start with GTG; 5931 has a z-score of 2.137 while 5925 has a z-score of 0.583, so 5931 seems like the more accurate start site. /note=Coding Potential: Coding potential is seen in both host- and self-trained GeneMark. Coding potential is only on the forward strand, suggesting this is a forward gene. /note=SD (Final) Score: -4.406, this is the best final score on PECAAN. /note=Gap/overlap: 8 bp. This is very reasonable as this start site contains the largest amount of coding potential and has the highest z-score, as well as the shortest gap without overlap. The length of the gene is 159 bp, which is also reasonable. /note=Phamerator: Pham: 23266. Date 4/4/23. Orpham, no data. /note=Starterator: No data available for Starterator due to the gene being in an orpham. Gene has a proper location when compared to other genomes in the pham map, but is smaller than most genes in a similar location. /note=Location call: Based on the above evidence, this is likely a real gene with the start site of 5931. 
/note=Function call: NKF /note=Transmembrane domains: There are 0 transmembrane domains according to TMHMM and TOPCONS. The protein is located inside. /note=Secondary Annotator Name: Chawla, Esha /note=Secondary Annotator QC: Hey Thomas – great job with these notes! The lab manual wants us to include information about the phage's cluster and potential called function in the Phamerator tab so I would add that in. Also, make sure to select if the chosen start site covers all of the coding potential in the box above. Other than that, this looks great! CDS 6105 - 6491 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="Hum25_8" /note=Original Glimmer call @bp 6105 has strength 6.7; Genemark calls start at 6105 /note=SSC: 6105-6491 CP: yes SCS: both ST: SS BLAST-Start: [Gp19/Gp15/Gp42 family protein [Glutamicibacter arilaitensis] ],,NCBI, q1:s1 100.0% 6.9178E-34 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.323, -4.28556153901447, yes F: head-to-tail adaptor SIF-BLAST: ,,[Gp19/Gp15/Gp42 family protein [Glutamicibacter arilaitensis] ],,WP_134780219,62.5,6.9178E-34 SIF-HHPRED: a.229.1.1 (A:) Hypothetical protein YqbG {Bacillus subtilis [TaxId: 1423]} | CLASS: All alpha proteins, FOLD: Hypothetical protein YqbG, SUPFAM: Hypothetical protein YqbG, FAM: Hypothetical protein YqbG,,,SCOP_d1xn8a_,89.0625,98.4 SIF-Syn: upstream of head-to-tail stopper /note=Primary Annotator Name: Vu, Thomas /note=Auto-annotation: Glimmer predicts the start site to be at 9592 bp while GeneMark predicts a start site of 9652. Regardless, these 2 start sites are very close to each other. The start codon is ATG. /note=Coding Potential: There is high gene coding potential over the region believed to be the real gene. Both the self-trained and host-trained GeneMark show evidence of high gene coding potential and the coding potential covers the suggested start site. /note=SD (Final) Score: The score is -4.881 which is the best score of the candidates. 
It also has the highest Z-score at 2.372. /note=Gap/overlap: The gap is +112 bp. This gap is the smallest of all candidates and it means that there is no overlap between this gene and its neighboring gene. /note=Phamerator: The pham number is 74618 and the analysis was run on 04/14/23. This pham was also seen in the clusters AY and FA. Hum25 was compared to phages Anekin and Auxilium. /note=Starterator: Pham 74618 contains 27 members of which 7 are drafts. The most common manually annotated start site was start site 17 (10911 bp) which was called in 16 of the 20 manual annotations of this start site. /note=Location call: This is likely a real gene with a start site of 9592. /note=Function call: The gene function is a tape measure protein. The NCBI BLAST showed multiple hits indicating tape measure protein function where coverage was at least 94% with e-values of 0. In the CDD, 1 of the 3 hits was TIGR01760 which had coverage of 20.34% and an e-value of 0. The hit description was for a phage tail tape measure protein. Despite calling the gene as a tail protein, the HHpred results were less conclusive as there were no hits with e-values less than 0.53 nor coverage higher than 25%. Nonetheless, there are multiple pieces of evidence in favor of tape measure protein function. /note=Transmembrane domains: 3 TMRs were predicted by TMHMM so transmembrane function inferred. This is expected since a tape measure protein affects phage tail function which spans across the membrane. 
CDS 6488 - 6889 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="Hum25_9" /note=Original Glimmer call @bp 6488 has strength 13.64; Genemark calls start at 6476 /note=SSC: 6488-6889 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage Richie] ],,NCBI, q1:s1 100.0% 2.05659E-60 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.048, -4.513079314654054, no F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Richie] ],,YP_010655732,81.8182,2.05659E-60 SIF-HHPRED: Stopper protein Rcc01689; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_E,87.9699,98.4 SIF-Syn: head-to-tail stopper. Synteny with Auxilium, Constance from cluster AY. Both genes are downstream of a head-to-tail adapter and upstream of genes of unknown function with synteny to one another. /note=Primary Annotator Name: Hughes, Audia /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start site at 6488. GeneMark starts at 6476. /note=Coding Potential: Coding potential is within the ORF. It is present on the forward strand, in both host-trained and self-trained GeneMark. /note=SD (Final) Score: -4.513 (goal = closer to 0) Z score: 2.048 (goal = 2 and above) /note=Gap/overlap: Overlap: -4. Chose this option over a candidate with a longer ORF, as the longer ORF candidate had a bigger overlap (-16). /note=Phamerator: pham 3524. Date: 4/3/2023. It is conserved, found in 25 members of pham from clusters AY and FA. /note=Starterator: Start site 9 found in 17/18 non-draft genomes including Hum_25. Start 9 corresponds with start 6488 in Hum_25. /note=Location call: This is a real gene and, according to the above evidence, the start site is at 6488. /note=Function call: head-to-tail stopper. PhagesDB: 3 highest hits are head-to-tail stopper (all with e-value: 9e-48). NCBI: 10 highest hits are head-to-tail stopper (with e-values between 2.05659e-60 and 4.20656e-49). 
Functional call of head-to-tail stopper requires an HHpred hit to SPP1 16 or YqBH. HHpred: SPP1 16 hit with e-value 0.0000024. CDD: no valuable hits. /note=Transmembrane domains: No transmembrane domains predicted /note=Secondary Annotator Name: Wu, Grace /note=Secondary Annotator QC: Good job on your annotations! I think you hit everything, just make sure to add more details about your answers, maybe add numbers as a supportive evidence. CDS 6886 - 7275 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="Hum25_10" /note=Original Glimmer call @bp 6886 has strength 9.76; Genemark calls start at 6886 /note=SSC: 6886-7275 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ASF74_14945 [Arthrobacter sp. Leaf145]],,NCBI, q1:s1 100.0% 6.93676E-72 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.033, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ASF74_14945 [Arthrobacter sp. Leaf145]],,KQQ98027,91.4729,6.93676E-72 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Estampa, Julia /note=Auto-annotation: Glimmer and Genemark both call the gene and indicate that the start site is at 6886 bp. The start codon is GTG. Host-Trained and Self-Trained GeneMark both reflect relatively high coding potential that is consistent with the ORF. The chosen start site covers approximately all the coding potential. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. /note=SD (Final) Score: The SD score is -2.584 and is the best from the list, and this start site includes the LORF. However, the final score is irrelevant in this case because the gene is most likely part of an operon as suggested by the 4bp overlap. /note=Gap/overlap: The overlap is 4 bp (gap of -4bp) long upstream of the gene, which is small and reasonable. This overlap suggests the gene is most likely part of an operon. The length of the gene (390 bp) is acceptable. /note=Phamerator: Pham: 3708. 
Date found: 04/03/23. The majority of the phages in this pham are in cluster AY with gene length of 384 bp. There is no called function. /note=Starterator: Start number 4 was called the most often in published annotations and was manually annotated in 18 of the 18 non-draft genes in this Pham. Start number 4 has been found in 24 of 24 (100%) of genes in the Pham. This evidence agrees with the site predicted by Glimmer and GeneMark. 3 clusters represented in this pham: AY, FA, singleton. Phage genes in this pham have different start sites, but the majority share the same start number 4. /note=Location call: Gathered evidence suggests this is a real gene with good coding potential and that the strongest candidate for the start site is 6886 bp. This start site is consistent with the ORF and the 4 bp overlap is common for genes in an operon. /note=Function call: NKF. The majority of phagesDB BLAST hits revealed “function unknown” for phages with E-values ranging from 1e-13 to 2e-05. NCBI BLAST hits revealed similar results suggesting NKF. No CDD hits returned. In HHPred, the top hit suggested minor capsid protein function with a probability of 99.38% and E-value of 1.3e-12, however, there is overall not much evidence supporting this function, and the majority of gathered evidence strongly suggests NKF. /note=Transmembrane domains: Since there are no predicted TMHs or TMDs returned from PECAAN and DeepTMHMM, it is not a membrane protein. /note=Secondary Annotator Name: Tosasuk, Kaemin /note=Secondary Annotator QC: Great work! Nothing else needs to be changed. 
CDS 7272 - 7703 /gene="11" /product="gp11" /function="tail terminator" /locus tag="Hum25_11" /note=Original Glimmer call @bp 7272 has strength 11.19; Genemark calls start at 7272 /note=SSC: 7272-7703 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage EvePickles]],,NCBI, q17:s6 88.1119% 1.22002E-28 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.483, -3.738562143838384, no F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage EvePickles]],,UYL88300,60.0,1.22002E-28 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,93.7063,99.7 SIF-Syn: There does not seem to be any synteny. /note=Primary Annotator Name: Le, Vivian /note=Auto-annotation: Both Glimmer and GeneMark call the start at 7272 /note=Coding Potential: Reasonable coding potential is found for both GeneMark self and host. The start site also covers all the coding potential. There was no significant overlap either. /note=SD (Final) Score: -3.739. This is the best final score on PECAAN /note=Gap/overlap: -4 bp overlap. This indicates that this gene is part of an operon. /note=Phamerator: The pham number as of 04.03.2023 is 3597. The gene is conserved in GlobiWarming (FA) and Sakai (AY). /note=Starterator: Gene (stop@7703 F) did not have the “Most Annotated” (manual annotation) start like the other genes in the pham, which was start site 6. Instead, Starterator called a start site of 4. It must also be noted that Hum25 is a singleton. Starterator agreed with Glimmer and GeneMark with a start of 7272. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7272. Starterator agrees with Glimmer and GeneMark. /note=Function call: Based on the PhagesDB BLASTp and NCBI BLASTp, there is strong evidence for the function to be a tail terminator. There were strong BLAST hits and e-values. There were no strong CDD hits. 
HHpred showed a strong hit for 6TE9_F, which is a tail terminator protein. There was also a strong hit for 5A21_G, a tail-to-head joining protein. It must be noted that in order to assign the function tail terminator, it must have an HHpred alignment to one of the following: SPP1 17 (5A21 chain G in the macromolecular complex) or Lambda U (3FZ2_chains A through F). In this case, we do have an alignment, 5A21_G. Overall, it seems that the function is a tail terminator. /note=Transmembrane domains: There were 0 predicted TMDs. The topology graph also showed the protein to be entirely on the inside. This makes sense, because we predicted that the gene encodes a tail terminator involved in tail assembly. If this is the case, then it would not need to cross the membrane and would need to be on the inside for tail assembly. /note=Secondary Annotator Name: Smith, Steven /note=Secondary Annotator QC: CDS 7728 - 7901 /gene="12" /product="gp12" /function="hypothetical protein" /locus tag="Hum25_12" /note=Original Glimmer call @bp 7746 has strength 8.71; Genemark calls start at 7758 /note=SSC: 7728-7901 CP: yes SCS: both-cs ST: NA BLAST-Start: [hypothetical protein [Pseudarthrobacter equi] ],,NCBI, q13:s6 50.8772% 1.42358E-9 GAP: 24 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.414, -6.496397267647118, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Pseudarthrobacter equi] ],,WP_261619065,54.0,1.42358E-9 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wang, Xinyi /note=Auto-annotation: /note=Glimmer and GeneMark both called the gene. Glimmer annotates start at 7746 with start codon TTG. GeneMark annotates start at 7758 with start codon ATG. The two auto-annotations disagree with each other. The LORF starts at 7728. /note=Coding Potential: /note=There is coding potential in the forward direction for starts designated by Glimmer and GeneMark and the LORF start, which suggests that this gene is real. 7746, 7758, and 7728 all cover the entire coding potential. 
/note=SD (Final) Score: /note=The SD(Final) score for start 7746 is -8.659, which is the worst (most negative) score of the three candidates. The SD(Final) score for start 7758 is -4.721. The SD(Final) score for start 7728 is -6.496. /note=Gap/overlap: /note=7746 leaves a gap of 42 bp between the previous gene and the current gene. 7758 leaves a gap of 54 bp between the previous gene and the current gene. The start site that gives the longest open reading frame, 7728, leaves a gap of 24 bp between the previous gene and the current gene, so 3 different start site choices are still available. /note=Phamerator: As of 4/3/23, the gene belongs to Pham 24968, which is an orpham. /note=Starterator: The website reports “Page/Report not found, The requested Pham report could not be found.” /note=Location call: Need QC or instructor help for assigning start site. /note=Start 7746 has the worst SD score and the worst z-score of the three (only 0.574), and it also has a rare start codon, TTG. /note=Start 7758 has a good z-score (1.949) and SD score (-4.721), but it leaves a large gap of 54 bp between this gene and the previous one. This start covers most of the coding potential but not all of it. /note=In contrast, start 7728 gives the LORF, has a reasonable z-score (1.414) and SD score (-6.496), covers the entire coding potential, and has a common start codon (GTG). /note=Function call: /note=Overall, the function is unknown given the existing evidence from databases. /note=There is no information about the function on the conserved domain database in NCBI. /note=There is only one “hypothetical protein” entry on the NCBI BlastP website, which has 50% coverage and a 1e-09 E value. Although this hit suggests that the gene is real, the function is unknown. /note=Using PhagesDB BlastP, it shows one entry with a score of 29, an e-value of 3.3, and an identity of 42%. 
Overall, this reference doesn’t give a good match with the target gene sequence, and the function of the reference sequence is unknown. /note=Checking the HHpred outputs, none of the probabilities, e-values, or overall scores suggest that the listed sequences can provide evidence for the target sequence function. /note=Transmembrane domains: /note=There is no transmembrane domain suggested by TMHMM, which suggests that the protein is not a membrane protein. /note=Secondary Annotator Name: Reyimjan, Diana /note=Secondary Annotator QC: I agree with the observations made in this annotation, including the choice of the start site. It might be worth mentioning that start 7746 just barely covers all the coding potential, supporting the choice of 7728. I think that start site 7728 having the lowest final score trumps the small difference (around 0.5) in the z-scores of 7728 and 7758. I agree with the function called. Perhaps it would be good to also mention the lack of CDD hits. CDS 7904 - 8419 /gene="13" /product="gp13" /function="hypothetical protein" /locus tag="Hum25_13" /note=Original Glimmer call @bp 7904 has strength 18.26; Genemark calls start at 7904 /note=SSC: 7904-8419 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ASF74_14955 [Arthrobacter sp. Leaf145]],,NCBI, q3:s2 98.8304% 1.98906E-95 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.033, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ASF74_14955 [Arthrobacter sp. Leaf145]],,KQQ98029,88.2353,1.98906E-95 SIF-HHPRED: SIF-Syn: /note=Gene 13: Possible major tail protein based on synteny, but the HHpred hits in support of this call are very weak, and most genes in the pham do not call a function. 
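The gap values quoted in the gene 12 notes (24, 42, and 54 bp) follow from the coordinates alone. The following is a minimal Python sketch, assuming that the previous gene (gene 11) stops at 7703 as listed in its CDS line and that a gap counts the bases strictly between the previous stop and the candidate start. The helper name gap_bp is hypothetical and is not part of PECAAN or any other annotation tool.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Return the number of bases between the previous stop and this start.

    Coordinates are 1-based and inclusive; a negative value is an overlap.
    """
    return start - prev_stop - 1


# Assumption: gene 11 (CDS 7272 - 7703) is the previous forward gene.
PREV_STOP = 7703
CANDIDATE_STARTS = [7728, 7746, 7758]  # LORF, Glimmer, GeneMark

gaps = {start: gap_bp(PREV_STOP, start) for start in CANDIDATE_STARTS}
for start, gap in gaps.items():
    print(f"start {start}: gap {gap} bp")
```

Under the same convention, a start at 7904 after a stop at 7901 gives the 2 bp gap listed for gene 13.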
CDS 8518 - 9039 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="Hum25_14" /note=Original Glimmer call @bp 8518 has strength 15.98; Genemark calls start at 8518 /note=SSC: 8518-9039 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein M707_02750 [Arthrobacter sp. AK-YN10]],,NCBI, q4:s3 97.6879% 3.9796E-71 GAP: 98 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.173, -6.87831925829631, no F: tail assembly chaperone SIF-BLAST: ,,[hypothetical protein M707_02750 [Arthrobacter sp. AK-YN10]],,ERI39189,77.7778,3.9796E-71 SIF-HHPRED: Uncharacterized protein yqbN; NESG, STRUCTURAL GENOMICS, PSI-2, PROTEIN STRUCTURE INITIATIVE, NORTHEAST STRUCTURAL GENOMICS CONSORTIUM, UNKNOWN FUNCTION; 2.2A {Bacillus subtilis},,,3KLU_A,41.6185,97.8 SIF-Syn: /note=Primary Annotator Name: Smith, Steven /note=Auto-annotation: Both GeneMark and Glimmer called the same start site at 8518. /note=Coding Potential: The Host-trained and the Self-trained GeneMark both show good coding potential in the region before the stop codon. /note=SD (Final) Score: The start site at 8518 has a Final Score of -6.878 and a Z-score of 1.173. This Z-score is not the highest, but I still think this start site is the best, as it has an ATG start codon and gives the longest ORF. Despite the high final score, it is also the only option with a reasonably small gap between genes, as the next smallest gap is nearly 200 bp. /note=Gap/overlap: There is a gap of 98 bp, which means this gene is not part of an operon. This is also not an unusually small or large gap. /note=Phamerator: As of 4/5/23, this gene is in pham 8925, which is found in 4 other phages that are not draft genomes. Faja and EvePickles are members of cluster AY, and TripleJ and LuckyBarnes are other singletons. /note=Starterator: The called start site, start site 1, is the most common with 3/5 being called and 2/4 manual annotations of this start site. 
/note=Location call: Based on the above evidence, this is most likely a functional gene starting at 8518. /note=Function call: NKF. No function call can be made for this gene as only phagesDB BLAST gave hits with low enough e-values to be considered significant. NCBI BLAST only gave hits for hypothetical proteins. CDD gave no hits at all and HHPRED gave no hits with low e-values. /note=Transmembrane domains: There were no TMDs predicted by DeepTMHMM, so this is not predicted to be a transmembrane protein. /note=Secondary Annotator: Lim, Madeleine /note=Secondary Annotator QC: Nothing else needs to be changed. Great Work! CDS join(8518..8994,8994..9374) /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="Hum25_15" /note= /note=SSC: 8518-9374 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein M707_02750 [Arthrobacter sp. AK-YN10]],,NCBI, q4:s3 57.8947% 9.95693E-63 GAP: -522 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.173, -6.87831925829631, no F: tail assembly chaperone SIF-BLAST: ,,[hypothetical protein M707_02750 [Arthrobacter sp. AK-YN10]],,ERI39189,74.269,9.95693E-63 SIF-HHPRED: SIF-Syn: CDS complement (9327 - 9497) /gene="16" /product="gp16" /function="membrane protein" /locus tag="Hum25_16" /note=Original Glimmer call @bp 9542 has strength 5.18 /note=SSC: 9497-9327 CP: yes SCS: glimmer-cs ST: NA BLAST-Start: [hypothetical protein [Pseudarthrobacter equi] ],,NCBI, q1:s16 100.0% 3.08333E-20 GAP: 94 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.722, -5.723716915074367, no F: membrane protein SIF-BLAST: ,,[hypothetical protein [Pseudarthrobacter equi] ],,WP_261619061,67.6056,3.08333E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vanderpool, Lauren /note=Auto-annotation: Only Glimmer called this gene, with a start at 9542. /note=Coding Potential: There seems to be reasonable coding potential in the reverse direction based on the Host-Trained GeneMark map. However, there is some overlap according to the Pham Maps.
/note=SD (Final) Score: -4.455, with the start site at 9479. I would argue that this is the best choice for this particular gene, because it has the best final score, best Z-score, and a larger gap, which is necessary for the directional change. The gene was compared to another orpham in the AY cluster, and it had a similar gene with a gap of 108 bp. /note=Gap/overlap: According to the Pham Map, there is an overlap of 47 base pairs to the left of the gene and a gap of 50 base pairs to the right of the gene. It is the only reverse gene in the area. However, with a changed start site, there would be a gap of 112 bp and an overlap of 47 bp. /note=Phamerator: Pham 22679 (last checked 4/5/23), singleton /note=Starterator: Orpham, N/A /note=Location call: I would call it at 9479, because this start site has the best final score, best Z-score, and a larger gap, which is necessary for the directional change. The gene was compared to another orpham in the AY cluster, and it had a similar gene with a gap of 108 bp. /note=Function call: BLAST returns “function unknown”, with similar genes listed as “tape measure proteins”. NCBI had some results with low e-values, but they were hypothetical proteins. According to HHpred, this is a signal peptide. /note=Transmembrane domains: There are 0 predicted TMDs, and it appears to be an internal protein based on the summary. /note=Secondary Annotator Name: Vajragiri, Shreya /note=Secondary Annotator QC: Y: Great job overall! I agree with you that the gene is real, but there is the issue of the overlap (~50 bp, which is a lot) with the preceding forward gene (to the left), and determining the start site is a little tricky. Other than that, I also noticed that you said there were no NCBI BLAST hits, which isn't quite true, as there are some good hits with low e-values, but they are just to hypothetical proteins. I would probably clarify this! 
Also put n/a or similar in the synteny box, and please check the hits in the BLAST (NCBI). CDS 9592 - 12558 /gene="17" /product="gp17" /function="tape measure protein" /locus tag="Hum25_17" /note=Original Glimmer call @bp 9592 has strength 15.23; Genemark calls start at 9652 /note=SSC: 9592-12558 CP: yes SCS: both-gl ST: SS BLAST-Start: [phage tail tape measure protein [Arthrobacter sp. M4] ],,NCBI, q1:s1 95.9514% 0.0 GAP: 94 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.372, -4.880971795477472, no F: tape measure protein SIF-BLAST: ,,[phage tail tape measure protein [Arthrobacter sp. M4] ],,WP_225498486,70.2191,0.0 SIF-HHPRED: SIF-Syn: downstream of tail assembly chaperones /note=Primary Annotator Name: Vu, Thomas /note=Auto-annotation: Glimmer predicts the start site to be at 9592 BP while GeneMark predicts a start site of 9652. Regardless, these 2 start sites are very close to each other. The start codon is ATG. /note=Coding Potential: There is high gene coding potential over the region believed to be the real gene. Both the self-trained and host-trained GeneMarks show evidence of high gene coding potential and the coding potential covers the suggested start site. /note=SD (Final) Score: The score is -4.881 which is the best score of the candidates. It also has the highest Z-score at 2.372. /note=Gap/overlap: The gap is +49 BP. This gap is the smallest of all candidates and it means that there is no overlap between this gene and its neighboring gene. /note=Phamerator: The Pham number is 72845 and the analysis was run on 03/24/23. This pham is a singleton, meaning there is only 1 cluster that contains this pham. Hum25 was compared against phages Chymera and Attoomi. /note=Starterator: There is only 1 non-draft gene and thus 1 most-annotated start site which is at site 1 (10035 BP). 
While this annotated start site is near the predicted start site of 9592, the Starterator analysis is somewhat uninformative as there is only 1 non-draft gene and annotation to base the analysis on. /note=Location call: This is likely a real gene with a start site of 9592. /note=Function call: The gene function is a tape measure protein. The NCBI BLAST showed multiple hits indicating tape measure protein function where coverage was at least 94% with e-values of 0. In the CDD, 1 of the 3 hits was TIGR01760, which had coverage of 20.34% and an e-value of 0. The hit description was for a phage tail tape measure protein. Despite calling the gene as a tail protein, the HHPRED results were less conclusive, as there were no hits with e-values less than 0.53 nor coverage higher than 25%. Nonetheless, there are multiple pieces of evidence in favor of tape measure protein function. /note=Transmembrane domains: 3 TMRs were predicted by TMHMM, so a transmembrane function is inferred. This is expected since a tape measure protein affects phage tail function which spans across the membrane. CDS 12574 - 13560 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="Hum25_18" /note=Original Glimmer call @bp 12574 has strength 9.89; Genemark calls start at 12820 /note=SSC: 12574-13560 CP: yes SCS: both-gl ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Kumotta] ],,NCBI, q4:s3 99.0854% 2.93039E-108 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.284, -4.0962544390943965, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Kumotta] ],,YP_010649425,65.2308,2.93039E-108 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,94.5122,99.9 SIF-Syn: /note=Primary Annotator Name: Douglas, Katherine /note=Auto-annotation: 12574 by Glimmer and 12820 by Genemark /note=Coding Potential: Covers all the coding potential in the region. 
There are a few areas of small overlap and coding potential on other ORFs in the reverse direction, but not excessively, and the majority of the coding potential is contained within one ORF in the forward direction. There is no synteny with other phages, as Hum25 is a singleton. The gene is present in several other phages but does not share surrounding synteny. There are also some very good z-scores and strong BLAST results. /note=SD (Final) Score: -4.096 (not the least negative but one of the less negative numbers and coupled with a good z-score of 2.284) /note=Gap/overlap: 15 (reasonable gap) /note=Starterator: The Glimmer auto-annotated start site at 12574 is not the most annotated start site, but Hum25 does not contain the most annotated start site. The site (site #82 at 12574) is called 77.8% of the time when present and has 5 MAs. Since Hum25 does not contain the most annotated start site and site #82 at 12574 has several MAs and is usually called when present, it is a reasonable start site for Hum25. The other auto-annotated start site, by GeneMark (site #227 at 12820), had no MAs. /note=Location call: The start site predicted by Glimmer at 12574 is the probable start site. It includes all the coding regions, has a reasonable gap of 15, and good RBS (-4.096) and z-score (2.284). This start site was annotated 77.8% of the time when present. It was not the most annotated site; however, Hum25 did not have the most annotated site. The site at 12574 had the most MAs of the start sites Hum25 did have. The start codon is GTG which has a high probability. /note=Function call: Based on BLASTp results in both Phagesdb and NCBI, this protein is a minor tail protein. Other phages in several other clusters, predominantly the A and K clusters, all called this protein as a minor tail protein with strong E values (less than e^-60). CDD had no hits. HHpred showed strong alignment of several hits to viral proteins, though not specifically within Arthrobacter phages. 
The best hits for viral tail proteins that matched the HHpred results for this gene had very good e-values (highest being e^-20) with probabilities over 99. /note=Transmembrane domains: 0 TMDs detected. Since this is a minor tail protein without a transmembrane domain, this particular tail protein is likely more involved in other steps of host recognition rather than entry through the host membrane. CDS 13554 - 15002 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Hum25_19" /note=Original Glimmer call @bp 13554 has strength 11.12; Genemark calls start at 13542 /note=SSC: 13554-15002 CP: yes SCS: both-gl ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Kumotta] ],,NCBI, q6:s3 59.1286% 2.97103E-111 GAP: -7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.702, -3.1380934916725747, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Kumotta] ],,YP_010649426,55.0802,2.97103E-111 SIF-HHPRED: Protein gp18; NP_465809.1, prophage tail protein gp18, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative; HET: MSE, MLY; 1.7A {Listeria monocytogenes EGD-e},,,3GS9_A,98.3402,99.8 SIF-Syn: Minor Tail Protein. Second minor tail protein downstream from tape measure protein, as well as between two other minor tail proteins as in FB cluster Sarge and Shoya. /note=Primary Annotator Name: Sacristan, Ariana /note=Auto-annotation: Glimmer and GeneMark do not agree on the start site. Glimmer calls the start site at 13554 bp at a GTG start codon and GeneMark calls the start site at 13542 bp at an ATG start codon. /note=Coding Potential: This gene demonstrates reasonable coding potential within the putative ORF, which contains both of the candidate start sites. /note=SD (Final) Score: The Final score for the selected start site is -3.138. This is the best SD score available. The Z-score is 2.7, which is the highest score and a strong indication of the start site. 
/note=Gap/overlap: There is a 7 bp overlap between the gene and the one upstream, which is small enough to be reasonable. The chosen start site does not include the true LORF because that site had a larger overlap and lower SD score. The length of this gene is acceptable as it is 1,449 bp long. /note=Phamerator: As of 03/24/2023 this gene is located in Pham 65652. This pham contains other singleton members such as Success and Ibantik, which were used to compare against Hum25 for synteny. The function consistently called in Phamerator for this gene was minor tail protein. /note=Starterator: There is a reasonable start site choice that is conserved within Pham 65652. It is Start 5, which is called in 127/142 non-draft genes in the pham. However, the start site for this gene is called at Start 2 @13554, which is very close to the "most annotated" start, suggesting an evolutionary change. Additionally, it is the only start with a manual annotation. /note=Location call: Collectively the evidence suggests that this is a real gene with the potential start site most likely located at 13554. /note=Function call: The predicted function of this ORF is a minor tail protein. NCBI and PhagesDB BLASTp provided several strong hits. PhagesDB BLASTp had minor tail protein hits with E-values as low as 2e-90. NCBI had hits with an average of 56% coverage and E-values as low as 2.97e-111. HHPred also had numerous hits for various tail proteins with very high probabilities and low E-values. A top PDB HHPred hit for tail protein has a probability of 99.55% and an E-value of 7.9e-11. /note=Transmembrane domains: There were no predicted TMDs by TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tran, Krysten /note=Secondary Annotator QC: I agree with both your location and function call for this gene and your notes are detailed enough to accurately explain these calls as well. 
I would review the synteny box again and compare this gene in Hum25 to other phages that infect the same host rather than other singletons. CDS 15002 - 15970 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="Hum25_20" /note=Original Glimmer call @bp 15002 has strength 7.54; Genemark calls start at 15002 /note=SSC: 15002-15970 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Pseudarthrobacter sp. NamE5] ],,NCBI, q1:s1 70.4969% 2.52525E-89 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.137, -4.387988835334809, yes F: minor tail protein SIF-BLAST: ,,[hypothetical protein [Pseudarthrobacter sp. NamE5] ],,WP_138607656,48.8235,2.52525E-89 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,48.7578,99.7 SIF-Syn: Minor Tail Protein. Third minor tail protein gene downstream of tape measure protein, same organization and function to phages Bauer (FN), Zucker (FN), & Sarge (FB). /note=Primary Annotator Name: Barden, Sophia /note=Auto-annotation: Glimmer and GeneMark agree; call Start at 15002. /note=Coding Potential: Coding potential in the second ORF on the forward strand for 15002-15970 only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host Trained. /note=SD (Final) Score: The best final score is -4.388 with the highest z score of 2.137. This selection calls Start site 15002 with codon ATG. /note=Gap/overlap: Overlap of -1 bp. Ultimately reasonable overlap. There is coding potential that initiates at the overlap, which likely contributes to the next gene. Hum25 is a singleton, so identifying synteny/conservation across other phage genomes is difficult. Based on Pham 8188, synteny for this gene is identified with phages Bauer (FN), Zucker (FN), & Sarge (FB). /note=Phamerator: Analysis was run 03/24/23 on database version 505. 
Pham number 8188 has 6 members, 2 are drafts. Hum25, Gene20(stop@15970F) does not have the most annotated start. Starterator: Calls start 2: (2, 15002). This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and from 15002 - 15970. Starterator agrees with Glimmer and Genemark. /note=Function call: PhagesDB BLAST top three hits → All functions unknown. (Score range: 237-221, E value range: 3e-62 – 1e-57). NCBI BLAST top three hits → all functions unknown. Two hits selected with unknown function (E value range: 2.5e-89 – 5.7e-74, % coverage: 70–67%, % identity: 43–38%). HHPRED top three hits → Calls Receptor Binding protein (probability: 99+, % coverage: 42%+, E value range: 2.4e-15 – 4.4e-7). This is directly downstream of tape measure protein. Call Minor Tail Protein for this gene. /note=Transmembrane domains: Cannot call membrane protein. Number of predicted TMRs: 0 /note=Secondary Annotator Name: Ryan Hoang /note=Secondary Annotator QC: Great job! Be careful about the synteny- since I know it is a singleton and therefore a lot of genes are probably going to be orphams, but nice job otherwise CDS 15984 - 16310 /gene="21" /product="gp21" /function="membrane protein" /locus tag="Hum25_21" /note=Original Glimmer call @bp 15984 has strength 7.9; Genemark calls start at 15984 /note=SSC: 15984-16310 CP: yes SCS: both ST: NI BLAST-Start: GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.866, -3.1452892479852843, no F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: There is no synteny to report for this gene. /note=Primary Annotator Name: Dawson, Niels /note=Auto-annotation: Glimmer and Genemark agree on the start site at 15984. /note=Coding Potential: There is only coding potential on the forward strand, indicating that this is likely a real gene. This is shown on genemark self and host. /note=SD (Final) Score: SD is -3.145, the best final score on PECAAN. 
/note=Gap/overlap: Gap is 13, which is reasonable considering that the previous gene is also a forward gene. The bacteriophage is a singleton. /note=Phamerator: pham is 21621. Date: 3/31/2023. This gene does not seem to be conserved in other bacteriophages, as e values are nonsignificant. /note=Starterator: Starterator report not available for this bacteriophage, as it is a singleton. /note=Location call: Based on available evidence, this is a real gene and the start site is at 15984. /note=Function call: There is no evidence for function of this gene. Its function is unknown. HHPRED, CDD and NCBI BLAST had no hits for this bacteriophage. /note=Transmembrane domains: There is one transmembrane domain predicted by TMHMM. In TOPCONS, two of the programs, PolyPhobius and Philius, detected two similar transmembrane domains. Other programs did not predict transmembrane domains. /note=Secondary Annotator Name: Sacristan, Ariana /note=Secondary Annotator QC: Notes were very concise. Reported on all the available information and made NKF call very clear. CDS 16282 - 17019 /gene="22" /product="gp22" /function="minor tail protein" /locus tag="Hum25_22" /note=Original Glimmer call @bp 16282 has strength 4.24; Genemark calls start at 16282 /note=SSC: 16282-17019 CP: no SCS: both ST: NA BLAST-Start: [minor tail protein [Arthrobacter phage Elesar] ],,NCBI, q11:s13 95.9184% 1.85477E-49 GAP: -29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.163, -4.4121902292613795, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Elesar] ],,YP_010761188,64.876,1.85477E-49 SIF-HHPRED: SIF-Syn: The minor tail protein is flanked by upstream tape measure proteins and downstream lysin proteins, just like in Phage Eyre, Finckle, Footloose, and Gudmit. 
/note=Primary Annotator Name: Li, Mulin /note=Auto-annotation: Both Glimmer and GeneMark called and agreed on the same start codon position, 16282. /note=Coding Potential: The coding potential predicted by both the self-trained and host-trained models is high throughout the body of the gene. However, no coding potential was predicted at the predicted start site (@16282). Coding potential was predicted for another start site (@16354). /note=SD (Final) Score: Start site @16282 has -4.412; Start site @16354 has -5.329. Though the SD score and z-score (2.163) for start site @16282 are better than those for start site @16354 (z-score = 1.827), the start site @16354 contributes more to a compact and functional phage genome. /note=Gap/overlap: The auto-annotated start site @16282 has a 29 nt overlap with the previous gene, which is far larger than the commonly accepted overlap (4 bp). Another start site @16354 has a 43 nt gap with the previous gene, which is short enough to keep the phage genome compact. /note=Phamerator: This gene product belongs to Pham 67284, which is conserved among 65 phages. The majority of the phages in this Pham belong to cluster BU. There are a few exceptions that belong to FF, AY, EJ, and a few singletons. The majority of the phages are fully annotated, which supports this gene as a real gene. The most frequently called function is minor tail protein. /note=Starterator: The Starterator report was generated on 03/24/23. Hum25 does not have the most annotated start site (start site #2), which is called by 53 other phages in the pham (65 in total). The predicted Hum25_22@16282 is only called by Hum25. Though this start site has a high RBS score, it has too large an overlap (29 nt) with the previous gene and it is not supported by good coding potential. An alternative start site @16354 has a decent RBS score and an acceptable gap with the previous gene (43 nt). It is also supported by good coding potential. /note=Location call: This gene is a real gene. 
It has a start site @16354 and a stop site @17019 /note=Function call: Minor Tail Protein; Both PhagesDB BLAST and NCBI BLAST generated top hits with minor tail proteins from other phages such as Elesar and EvePickles or collagen-like proteins from bacteria with a high percentage of sequence alignment (around 40-50%) and e-values less than 4e-30. CDD generated hits for LPXTG-anchored collagen-like adhesin Scl2/SclB (e-value = 1.95e-11, bit-score = 62.60), and HHpred generated crystal structures similar to many collagen alpha chains (though not associated with a decent e-value; e-value > 10e-3). This gene also shows a high level of synteny with other phage genes annotated as minor tail proteins. Together, this evidence supports that this gene encodes a minor tail protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts a transmembrane domain for this gene. TMHMM predicts this protein to be outside of the cell, which supports its function as a minor tail protein. /note=Secondary Annotator Name: NGUYEN, MYA /note=Secondary Annotator QC: Notes are concise and clear. Explanations on the calls were well supported. Evidence boxes are checked off; however, I think that you may have checked off too many, since the QC checklist says to check no more than 3. The drop-down box and synteny box are filled out. 
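The comparison of the two candidate starts for gene 22 can be tabulated from the numbers in the notes above. This is only an illustrative sketch: the Candidate class and its field names are hypothetical, the previous stop (16310) is taken from the gene 21 CDS line, and the scores are copied from the SD (Final) Score note.

```python
from dataclasses import dataclass

PREV_STOP = 16310  # assumption: stop of gene 21 (CDS 15984 - 16310)
STOP = 17019       # stop of gene 22


@dataclass(frozen=True)
class Candidate:
    start: int
    final_score: float
    z_score: float

    @property
    def gap(self) -> int:
        # Bases between the previous stop and this start; negative = overlap.
        return self.start - PREV_STOP - 1

    @property
    def length(self) -> int:
        # 1-based inclusive length from start to stop.
        return STOP - self.start + 1

    @property
    def in_frame(self) -> bool:
        # An ORF must span a whole number of codons.
        return self.length % 3 == 0


candidates = [
    Candidate(16282, -4.412, 2.163),  # auto-annotated start
    Candidate(16354, -5.329, 1.827),  # alternative start chosen in the notes
]

for c in candidates:
    print(f"start {c.start}: gap {c.gap} bp, length {c.length} bp, "
          f"in frame: {c.in_frame}, final {c.final_score}, z {c.z_score}")
```

Under these assumptions, the auto-annotated start scores better on the RBS metrics but overlaps gene 21 by 29 bp, while 16354 leaves a 43 bp gap. That is the trade-off the Location call resolves in favor of 16354.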
CDS 17016 - 17738 /gene="23" /product="gp23" /function="endolysin" /locus tag="Hum25_23" /note=Original Glimmer call @bp 17016 has strength 6.97; Genemark calls start at 17016 /note=SSC: 17016-17738 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Arthrobacter phage Sarge] ],,NCBI, q1:s1 97.0833% 2.61985E-119 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.37, -3.898968357315439, no F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage Sarge] ],,YP_010649576,80.8333,2.61985E-119 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase; amidase, zinc binding, cell wall degradation, endolysine, hydrolase; HET: PO4, GOL; 1.21A {Clostridium intestinale},,,6SSC_A,50.8333,99.6 SIF-Syn: Function is endolysin, upstream gene is holin, downstream gene is minor tail protein, just like in cluster FB Arthrobacter globiformis phage Sarge. /note=Primary Annotator Name: Reyimjan, Diana /note=Auto-annotation: Glimmer and GeneMark call the same start site of 17016. This is not the longest open reading frame and the start codon is ATG, which agrees with the phage’s start codon usage. /note=Coding Potential: There is coding potential in the third open reading frame for both host-trained and self-trained GeneMark. The predicted start site of 17016 covers all the coding potential. /note=SD (Final) Score: The Final Score is -3.899. This is the lowest of the final scores for all the possible start sites, which indicates 17016 is a better match for the sequence. The z score of 2.37 is not the most positive, but because the overlap of 4 bp places less weight on the start call, the start site of 17016 is still likely. /note=Gap/overlap: There is an overlap of 4 bp. This is the smallest overlap there is. It is also known that a 4 bp overlap is common in phage genomes because of operons, so it’s likely that the start site of 17016 is valid. /note=Phamerator: As of 4/3/23, this gene is in pham 69556. There are 5 genes in this pham total. 
Hum25 is the only singleton in this pham, while there are 3 genes from cluster FD (Anjali_20, Mendel_22, Tillums_Draft_26) and one gene from cluster FB (Sarge_22). For non-draft phages Anjali and Mendel, this gene was called as the amidase domain of lysin A, while for Sarge it was called as an endolysin. The function called is mostly consistent with a lysin protein. /note=Starterator: The most manually annotated start site is 9, which is called in non-draft phages Anjali and Mendel. It is found in 3 out of 5 genes in the pham. This does not have a corresponding start site in Hum25, whose start site of 17016 corresponds to start site 8. Besides Hum25, start site 8 is also called in Sarge_22. This is informative because start site 9 does not have a corresponding site in Hum25, so the fact that a non-draft genome also calls start site 8 is encouraging. /note=Location call: Taken together, the coding potential, low final score, small overlap, and phamerator/starterator results point to the fact that this is a real gene. The potential start site is 17016. Not only does it cover all coding potential, but it has the smallest overlap that also suggests that the gene is part of an operon. This start site is also conserved in phage Sarge. This start site is not conserved in the other non-draft phages in the pham, but it is not too concerning because the other non-draft phage genomes call a start site that has no equivalent in Hum25. /note=Function call: Predicted function is an endolysin. PhagesDB BLASTp gives hits that have very small e-values (<1e-40), with the top three being endolysins. To support this, HHPred has 2 hits that correspond to an N-acetylmuramoyl-L-alanine amidase domain with probabilities of 99.55% and 99.46% and e-values of 7.1e-13 and 5.1e-12. CDD supports HHPred results in that the conserved domain (8.55e-17) is in the peptidoglycan recognition family, which includes N-acetylmuramoyl-L-alanine amidase. 
Since these hits do not fully cover the gene ( % Coverage < 60%), this gene should be called as an endolysin. This gene definitely contains the N-acetylmuramoyl-L-alanine amidase domain of an endolysin protein, and if there are other endolysin proteins discovered for this genome the domain will be added to the function. /note=Transmembrane domains: There are no predicted transmembrane domains. This makes sense if the gene is for an endolysin, since endolysins do not need to have contact with the bacterial cell membrane. /note=Secondary Annotator Name: Nguyen, Mya /note=Secondary Annotator QC: Good explanation and reasoning on your calls. Drop down boxes and evidences are checked off. I agree with your calls. CDS 17738 - 17992 /gene="24" /product="gp24" /function="holin" /locus tag="Hum25_24" /note=Original Glimmer call @bp 17738 has strength 12.37; Genemark calls start at 17738 /note=SSC: 17738-17992 CP: yes SCS: both ST: SS BLAST-Start: [holin [Renibacterium salmoninarum] ],,NCBI, q1:s1 95.2381% 5.214E-14 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.734, -3.2123487306667524, yes F: holin SIF-BLAST: ,,[holin [Renibacterium salmoninarum] ],,WP_012244782,65.4321,5.214E-14 SIF-HHPRED: Phage_r1t_holin ; Putative lactococcus lactis phage r1t holin,,,PF16945.8,83.3333,99.9 SIF-Syn: just downstream of endolysin /note=Primary Annotator Name: Kim, Cindy /note=Auto-annotation: Glimmer and GeneMark both called the same start site at 17738. This start site corresponds to a start codon of ATG. /note=Coding Potential: There is good coding potential in both the Host and Self Trained GeneMark. The chosen start site covers all the coding potential in both. /note=SD (Final) Score: The best SD score is -3.212 which corresponds to the predicted start site at 17738. /note=Gap/overlap: There is a gap of -1 which indicates an overlap with the downstream gene. This is a reasonable overlap and is conserved with other genomes like Footloose. 
The gene length is acceptable given the likely start site of 17738. /note=Phamerator: 4/3/23; Pham 71862. This gene is conserved in other members of this pham such as Attoomi and Footloose. The Phams database called a function of holin for this gene. /note=Starterator: Start site number 67 at 17738 has 289 manual annotations, and start site number 67 was also the most called; 289 of the 340 non-draft genomes in the pham called start site number 67. /note=Location call: This gene is most likely a real gene that starts at 17738. This start site covers good coding potential in both the Host and Self GeneMark and conservation of the gene is observed in other phages from Phamerator. The most called start site number on Starterator also supports the start site of 17738 which was also the most manually annotated for this gene’s pham. /note=Function call: Predicted function is holin. NCBI Blastp had two hits with > 95% coverage with small e-values of 5.839e-13 to 5.214e-14 and reasonable % identities ( > 45%). HHpred had one hit with high % coverage (83.3%), high probability of 99.9%, and a low e-value of 1.12e-20. /note=Transmembrane domains: 2 transmembrane domains are predicted, but both are only 14-15 amino acids long; therefore, according to SEA-PHAGES guidelines, this is likely not a membrane protein. /note=Secondary Annotator Name: Kretschmer, Thomas /note=Secondary Annotator QC: Good job! Your notes are detailed yet brief and include all pertinent evidence. Great job using all the different pieces of evidence and describing your decision making process. CDS 18076 - 18852 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="Hum25_25" /note=Original Glimmer call @bp 18076 has strength 15.38; Genemark calls start at 18076 /note=SSC: 18076-18852 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Arthrobacter sp. 
ISL-69] ],,NCBI, q1:s1 100.0% 2.30681E-47 GAP: 83 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.38, -3.893834241316366, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. ISL-69] ],,WP_214953964,61.5385,2.30681E-47 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chawla, Esha /note=Auto-annotation: Glimmer Start Site: 18076, GeneMark Start Site: 18076. Both Glimmer and GeneMark called (auto-annotated) the same start site for this gene. The start codon at this gene is GTG. /note=Coding Potential: There is high coding potential contained in the putative ORF, in the third forward reading frame in both GeneMark Host and Self. As such, this gene is likely a real gene with a start site at 18076. /note=SD (Final) Score: The best SD (Final) Score is -3.894, which is the SD score associated with the start site at 18076. I think it is worth considering if a new start site needs to be proposed, which would also result in a better SD score. /note=Gap/overlap: While there is an 83 base pair gap, there is no coding potential in this 83 base pair gap, making it unlikely there is another gene in the gap or that the start needs to be moved further upstream. This gene is a relatively large gene – it is 777 base pairs long. /note=Phamerator: On the day of my investigation, 4/3/2023, this gene was found in Pham 24656. /note=This Pham only has 1 member, Hum25, which is a draft. This phage is a singleton, which makes its analysis especially interesting, as there are currently no other phages that are in the same cluster. Hum25 is in cluster AX, and there is no called function for this gene. /note=Starterator: There is no data in the starterator, as this protein is an orpham, that is, a pham with only one member. There is no starterator data generated, so there is nothing to compare it to. /note=Location call: This is likely a real gene, as it has good coding potential. 
From the proposed start site at 18076 to stop site, there is high coding potential, and the entire coding potential is covered. Because this protein is an orpham, it is unknown if the gene is well-conserved, so further analysis is needed on this. /note=Function call: There is no conclusive evidence regarding function. NCBI BLASTp proposes the gene may be a head maturation protease with high confidence, but that it may also have an unknown function. PhagesDB BLAST proposes this gene has an unknown function. HHpred proposes this gene encodes CRISPR-associated helicase Cas3, but the e-value is very high at 190. There are no domain hits using CDD. /note=Transmembrane domains: No transmembrane domains – unable to conclude if this is in-line with the function, as the current function of this gene is NKF. /note=Secondary Annotator Name: Kaemin Tosasuk /note=Secondary Annotator QC: Great work! Your notes are very detailed. No other changes needed. CDS 18853 - 20148 /gene="26" /product="gp26" /function="esterase" /locus tag="Hum25_26" /note=Original Glimmer call @bp 18853 has strength 12.15; Genemark calls start at 18853 /note=SSC: 18853-20148 CP: yes SCS: both ST: NI BLAST-Start: [MULTISPECIES: GDSL-type esterase/lipase family protein [unclassified Rhodococcus (in: high G+C Gram-positive bacteria)] ],,NCBI, q25:s42 92.5754% 4.81848E-49 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.271, -4.043393633469635, no F: esterase SIF-BLAST: ,,[MULTISPECIES: GDSL-type esterase/lipase family protein [unclassified Rhodococcus (in: high G+C Gram-positive bacteria)] ],,WP_094621097,47.973,4.81848E-49 SIF-HHPRED: Acyl-CoA thioesterase I; thioesterase, TesA, fatty acid, octanoic acid, HYDROLASE; HET: OCA; 0.97A {Escherichia coli} SCOP: l.1.1.1, c.23.10.5,,,5TIF_A,47.3318,99.8 SIF-Syn: /note=Primary Annotator Name: Pisipati, Kirthana /note=Auto-annotation: Glimmer and Genemark agree on a start site of 18853. 
The start codon is TTG, which is unlikely relative to codons ATG and GTG, casting some doubt on this start site. /note=Coding Potential: Both host trained and self trained Genemark show high coding potential throughout the gene, and all of the coding potential is covered by the start site. /note=SD (Final) Score: The final score is -4.043; only one other start site has a higher score. However, since this start site indicates that this gene is part of an operon, we may disregard the RBS/final score. /note=Gap/overlap: The gap is 0bp, which may indicate this gene is part of an operon. The ORF is 1296bp. /note=Phamerator: This gene is in pham 7842 as of 4/3/22. There are no non-draft genomes that belong to the same cluster, since this phage is a singleton. /note=Starterator: Start site 12 (18853bp) was found only in Hum25, and thus was not manually annotated in any of the five other genes in this pham. All of the genes have different start sites, so there is not one that is most conserved; starterator info is not particularly significant. /note=Location call: This is a real gene, as it has high coding potential throughout the ORF. The start site is somewhat unclear since the auto-annotated site does not have a very high final score and its start codon has a lower probability than other start codons; however, no other start site is significantly better. Start site 18853, as called by Glimmer and Genemark, is likely correct and is part of an operon, which would explain the gap of 0bp and low final score. It also has the longest ORF. /note=Function call: Predicted function is esterase, based on hits from phagesDB BLAST, NCBI BLASTp, and HHpred, all of which had multiple hits with low e values (e-40) and high probabilities (94%). The first hit on HHpred is a thioesterase with a probability of 99.78% and an e value of 7.5e-17. CDD shows domain hit for family of lipases and esterases. 
/note=Transmembrane domains: There were no transmembrane domains shown on the DeepTMHMM output, which makes sense for the function of esterase. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS complement (20152 - 22092) /gene="27" /product="gp27" /function="O-acetyltransferase" /locus tag="Hum25_27" /note=Original Glimmer call @bp 21576 has strength 6.09; Genemark calls start at 22092 /note=SSC: 22092-20152 CP: yes SCS: both-gm ST: NI BLAST-Start: [acyltransferase [Pseudarthrobacter sp. L1SW] ],,NCBI, q3:s1 54.0248% 1.76608E-151 GAP: 148 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.538, -5.664406167576407, no F: O-acetyltransferase SIF-BLAST: ,,[acyltransferase [Pseudarthrobacter sp. L1SW] ],,WP_228404163,78.8301,1.76608E-151 SIF-HHPRED: OpgC_C ; OpgC protein,,,PF10129.12,51.7028,99.9 SIF-Syn: /note=Primary Annotator #1: Hoang, Ryan /note=Auto-annotation: Glimmer states that the start site is at 21,576. GeneMark, on the other hand, calls the start site at 22,092. /note=Coding Potential: Most of the ORF has high coding potential on both GeneMark self and host, indicating that the gene is a real gene. There is coding potential in the 3rd open reading frame on the reverse strand. The start site at 22092 best captures the majority of the coding potential, with the other start called by Glimmer not capturing the rest of the coding potential present. /note=SD (Final score): The final score of the Glimmer start is not the highest, at a SD score of -4.845. The other start sites have higher scores, of up to -3.768. The GeneMark call at 22092 has a SD of -5.664. There is a very large gap of 1425bp between the Glimmer start, and the next gene, however, there is a smaller gap of 478 between this gene and the following gene. /note=Gap/overlap: Depending on the gap, there is a large gap between this gene and the upstream gene, of either 1425bp or 478bp. The gene downstream of this has an overlap of 4bp, however, this gene is a forward gene. 
This is a bit interesting, given genes that switch between forward and reverse order should typically have gaps of 50bp. /note=Phamerator: This is a singleton, and as a result, it has no other phages that it can compare with. The phage is in Pham 26764 as of April 5, 2023. /note=Starterator: There were no other results for the Starterator call, as there were no other phages in this pham. /note=Location call: I would call it at 22092, simply because it captures more coding potential. However, this is a bit weird as it doesn’t have the highest RBS Final score or Z-score in comparison to the other start site as suggested by Glimmer. /note=Function call: NCBI Blast suggests that it is an acyltransferase, with the top two results in Blast being acyltransferases with very low e-values of 2e-151 and 7e-144, while BlastP on PhagesDB suggests that it is an esterase with low e-values of 2e-29 and a thioesterase of 5e-15. CDD hits actually did come up, suggesting that once again, it was an acyltransferase in the Superfamily of cl21495 from 1 to 375, with an e-value of 1.16e-42. However, from 499 to 634, there was another hit as well which had a higher e-value of 2.55e-9, which was called to be a SGNH hydrolase in the superfamily of cl01053. For HHPred, the top two results were an OpgC protein and an acyltransferase with e-values of 3.9e-23 and 7e-23, respectively. Overall, this suggests that the functional call should be an O-acetyltransferase, as specified by PhagesDB protein list, as the N-terminus has an acyltransferase and the C-terminus has an SGNH hydrolase protein as found by CDD. Furthermore, it had 11 transmembrane domains, which also is consistent with calling it an O-acetyltransferase. /note=Transmembrane domains: DeepTmHmm called that it had 11 transmembrane domains. This supports the functional call of the gene being an O-acetyltransferase as it should have between 9-11 transmembrane domains to be considered an O-acetyltransferase. 
/note=Secondary Annotator Name: Vanderpool, Lauren /note=Secondary Annotator QC: Your notes are super detailed overall! I would like to highlight the point you made in your notes, where you said that the gene downstream "has an overlap of 4bp, however, this gene is a forward gene. This is a bit interesting, given genes that switch between forward and reverse order should typically have gaps of 50bp". I think that is something worth looking into. Otherwise, awesome job! CDS complement (22241 - 22384) /gene="28" /product="gp28" /locus tag="Hum25_28" /note= /note=SSC: 22384-22241 CP: no SCS: neither ST: NI BLAST-Start: GAP: 7 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.189, -6.457100313770254, no F: SIF-BLAST: SIF-HHPRED: SIF-Syn: CDS complement (22392 - 22505) /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="Hum25_29" /note= /note=SSC: 22505-22392 CP: no SCS: neither ST: NI BLAST-Start: GAP: 65 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.684, -4.0047731777977384, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: CDS complement (22571 - 22777) /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="Hum25_30" /note=Original Glimmer call @bp 22777 has strength 5.03; Genemark calls start at 22777 /note=SSC: 22777-22571 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PBI_NANDITA_67 [Arthrobacter phage Nandita]],,NCBI, q5:s55 94.1176% 2.84863E-18 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.772, -5.154811153158988, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_NANDITA_67 [Arthrobacter phage Nandita]],,AYN58686,40.1575,2.84863E-18 SIF-HHPRED: SIF-Syn: This gene has the same NKF call as the following four genes; because the gap between the start site and the following gene is -4 (meaning they overlap), this gene appears to be part of an operon. This operon`s function has yet to be determined. 
The preceding gene, which codes for O-acetyltransferase, has a significant gap between it and this gene, which suggests that they are not connected in function. /note=Primary Annotator Name: Lim, Madeleine /note=Auto-annotation: Glimmer and Genemark denote start of gene at 22777 /note=Coding Potential: High coding potential can be seen throughout 22600 to 22800 bp in the third reading frame; shows potential slight mismatch of auto-annotation and actual gene’s start and stop /note=SD (Final) Score: Best start site was seen at 22735 given Z-score (2.303 > 1.772) and Final Score (-4.328 > -5.155) compared to 22777. /note=Gap/overlap: Start 22735 has a gap of 38, but Start 22777 has a gap of -4, indicating the gene is potentially part of an operon if Start 22777 is the gene’s true start. /note=Phamerator: There is no Phamerator information for Pham 22484, which only contains Hum25; this indicates it is an orpham. /note=Starterator: There is no Starterator report for Pham 22484, which only contains Hum25; this indicates it is an orpham. Therefore, the Starterator is not informative. /note=Location call: Because the gap for Start 22777 indicates the gene is part of an operon, and the RBS can thus be disregarded, the most likely start site for this gene is at 22777. /note=Function call: NKF; both NCBI and PhagesDB BLAST had one result for a hypothetical protein for the 67th protein of phage Nandita of Cluster FF; there was a low e-value of 3e-16, indicating a close match, but the function for that protein is unknown. HHPred’s highest matches are for proteins of unknown function, and there are no results from CDD. /note=Transmembrane domains: There are no hits from DeepTMHMM, indicating that it is an internal protein. 
/note=Secondary Annotator Name: Vu, Thomas CDS complement (22774 - 23124) /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="Hum25_31" /note=Original Glimmer call @bp 23124 has strength 4.77; Genemark calls start at 23124 /note=SSC: 23124-22774 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PP357_gp42 [Arthrobacter phage Sarge] ],,NCBI, q47:s42 53.4483% 5.46485E-25 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.163, -4.4121902292613795, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP357_gp42 [Arthrobacter phage Sarge] ],,YP_010649596,47.6636,5.46485E-25 SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Martin, Kyle /note=Auto-annotation: Genemark and Glimmer were both used in this case and have a start site at 23124. HHPred also agrees with the start site; CDD, as mentioned below, had no evidence for me to look at. /note=Coding Potential: The coding potential as seen is in the reverse direction for this gene and is similar in both the Host and Self-Trained GeneMark. /note=SD (Final) Score: This score is -4.412 which is not the best of all the candidates but has the true LORF. /note=Gap/overlap: The overlap is -4 which indicates that it is overlapping with another gene. We see that the length is also reasonable as well. /note=Phamerator: The pham number is 39562 and was run on 3/31/23. It is an orpham so no other member to compare it to. /note=Starterator: The pham number from the Pham Starterator and PhagesDB is 39562. It is an orpham so no starterator report to check. /note=Location call: This is a real gene with a likely start site of 23124. /note=Function call: NKF. For the function, it says it is an unknown function. The CDD dataset had no data for me to look at. The e values are relatively low showing a real gene with no real function on the HHPred results. /note=Transmembrane domains: After running the DeepTMHMM, we notice that there are zero domains, which means none are found. 
/note=Secondary Annotator Name: Li, Mulin /note=Secondary Annotator QC: evidence needs to be picked for the functional call on PECCAN. At most three pieces of evidence should be picked for start sites, Blast, CDD, HHpred, etc. - The coding potential was only seen in the reverse direction throughout the gene. - While no evidence was shown for its function, please include more reasons why this is a real gene. For example, size of the gene and its transcription orientation. CDS complement (23121 - 23411) /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="Hum25_32" /note=Original Glimmer call @bp 23411 has strength 10.6; Genemark calls start at 23411 /note=SSC: 23411-23121 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.538, -5.646677400615975, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: orpham /note=Primary Annotator Name: Pan, Crystal /note=Auto-annotation start source: Both Glimmer and Genemark agree on the called start site 23411 and the stop site at 23121. The coding potential is covered with this start site. /note=Coding Potential: The start site covers the entire coding potential. There is sufficient evidence to conclude this is a real gene. Host-Trained GeneMark covers the majority of the coding potential. Since it is a reverse gene, the stop at 23121 seems a little bit premature, but covers enough of the coding potential to be reliable. Self-Trained GeneMark concurs with Host-Trained GeneMark, and shows that the start site also covers most of the coding potential. Coding potential is in the reverse strand, agreeing that this is a reverse gene. /note=SD (Final) Score: Final score is -5.647, and z score is 1.538. The RBS score is not very good, but according to the manual, starts with a 4 basepair overlap often have poor RBS scores, and the favorable overlap trumps the RBS score. The z-score is <1.6, so it is a poor score. 
However, the other evidence is very convincing -- this start site covers the entire coding potential best, and the 4 basepair overlap is favorable. This is likely an orpham that is part of an operon (predicted from the -4bp gap). /note=Gap/overlap: -4 bp. Favorable gaps are -1 bp and -4 bp (1 bp and 4 bp overlaps). /note=Phamerator: 19890. Date: 23-04-03. This gene is not conserved in any other bacteriophages. The only gene this shows up in is in the Hum25 phage that is a singleton. Thus, this is an orpham. /note=Starterator: The starterator indicates that it is pham 19890. The starterator does not yield any useable information, likely because this gene is an orpham. /note=Location call: Evidence supports that this gene is real and the start site is 23411 (ATG). /note=Function call: Likely NKF. PhagesDB function frequency lists it as recT-like ssDNA binding protein, but recT-like ssDNA binding protein is no longer part of the approved functions list. PhagesDB BLAST lists no known function, but the e-values are also not great. HHPred shows this is a DBP, but the e-values are very very high and the coverage is not ideal (though still higher than the 35%). CDD shows no hits. With the evidence shown, it seems to make sense that there is no known function for this protein. /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains. It says that it is an `inside` protein. CDS complement (23408 - 23590) /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="Hum25_33" /note=Genemark calls start at 23590 /note=SSC: 23590-23408 CP: yes SCS: genemark ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.919, -4.924073590240254, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vajragiri, Shreya /note=Auto-annotation: There is no Glimmer call for start source. There is a GeneMark auto-annotated start which is 23590, an ATG. 
/note=Coding Potential: Both host-trained and self-trained GeneMark show coding potential for most of the gene (there is a drop near the stop site but this cannot be adjusted) on the reverse strand. /note=SD (Final) Score: -4.924. It’s the best (highest) final score, and corresponds to the smallest gap/overlap. /note=Gap/overlap: -4. This is very small/reasonable, and a small overlap indicates that this gene is likely part of an operon. /note=Phamerator: 24043. There is no Pham report because this gene is an orpham. /note=Starterator: n/a (no Pham report) /note=Location call: Evidence (coding potential) supports that the gene is real. Start site is predicted to be 23590 based on SD final score and coding potential /note=Function call: NKF. No Blastp hits (on phagesdb or ncbi). No CDD hits. No significant HHPRED hits (all have very high e-values) /note=Transmembrane domains: DeepTMHMM predicts it to be a globular protein on the inside of the phage. However, there are no similar hits. /note=Secondary Annotator Name: Hamid, Bilal /note=Secondary Annotator QC: Please add synteny notes CDS complement (23587 - 23949) /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="Hum25_34" /note=Original Glimmer call @bp 23949 has strength 1.26; Genemark calls start at 23949 /note=SSC: 23949-23587 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP639_gp049 [Arthrobacter phage Seahorse] ],,NCBI, q6:s2 94.1667% 7.60941E-26 GAP: 141 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.38, -3.893834241316366, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP639_gp049 [Arthrobacter phage Seahorse] ],,YP_010656235,65.0,7.60941E-26 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Zhu, Yichen /note=Auto-annotation: Gene (stop@23587 R) /note=Coding Potential: high /note=SD (Final) Score: -3.894 /note=Gap/overlap: 141 /note=Phamerator: 73711 (22 members) /note=Starterator: 29 /note=Location call: 23949 /note=Function call: No known function as all hits are 
hypothetical proteins /note=Transmembrane domains: 0 /note=Secondary Annotator Name: Wang, Xinyi /note=Secondary Annotator QC: I agree with the location and function call, but maybe write more detail on phamerator and starterator findings. CDS complement (24091 - 25182) /gene="35" /product="gp35" /function="tyrosine integrase" /locus tag="Hum25_35" /note=Original Glimmer call @bp 25182 has strength 10.34; Genemark calls start at 25206 /note=SSC: 25182-24091 CP: no SCS: both-gl ST: NI BLAST-Start: [tyrosine-type recombinase/integrase [Arthrobacter sp. zg-Y20]],,NCBI, q12:s55 96.1432% 9.41569E-94 GAP: 7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.419, -7.440254293245045, no F: tyrosine integrase SIF-BLAST: ,,[tyrosine-type recombinase/integrase [Arthrobacter sp. zg-Y20]],,MCC3276409,52.3227,9.41569E-94 SIF-HHPRED: Int protein; transposase protein-DNA complex, tyrosine recombinase, Y-transposase, Tn916-like conjugative transposon, antibiotic resistance transfer, RECOMBINATION; 2.5A {Enterococcus faecalis},,,6EMY_A,77.686,100.0 SIF-Syn: /note=Primary Annotator Name: Wu, Grace /note=Auto-annotation: Glimmer starts at 25182, and GeneMark starts at 25206. /note=Coding Potential: In the host-trained GeneMark, there is high coding potential in the reverse direction. In the self-trained coding potential map, we also found almost full coverage from the designated start site and stop site in the reverse direction, indicating a high coding potential. Both host-trained and self-trained maps agree that this gene is a reverse strand. /note=SD (Final) Score: The final score is -7.440, the z-score is 1.419, and this start site is not the LORF. Overall, the RBS score for the start site is not optimistic (final score close to 0, and z-score close to 2). /note=Gap/overlap: Gap of 336 bp, this is not within the reasonable range. There could be potential missing genes before this gene. /note=Phamerator: This gene is in Pham 74089. 
There are in total 958 members in this pham, and a diverse cluster type, including clusters AY, A, AS, L, K, etc. However, Hum25 is shown as a singleton in this Pham. /note=Starterator: This gene is in Pham 74089. The analysis was run on 04/09/2023. There are 78/957 genes in this pham as a draft gene. The most common start number is 187, which is called in 227/897 non draft genes. Hum25 is called at start 146. Since the majority of the genes are not called at the most common start number, the start number does not impact the gene. This start number is only called by Hum25. This information further validated the fact that Hum25_35 is a singleton. /note=Location call: The start codon is TTG, at start site 25182. This start codon is not the most common start codon compared to other genes, so the start site called for this gene is debatable. /note=Function call: In PhagesDB BLASTp, the overall alignment score is high (range 80-200 bp). The best hits indicate the function as tyrosine integrase. For NCBI BLASTp, the top hits with low e-value (~1e-94) and high coverage (~98%) indicate function as site-specific integrase and tyrosine integrase. In HHPred, the best hits have data that are optimistic (high probability ~100, low e-value ~2.8e-31 and high coverage), and the function is integrase for the top hits. The function is most likely to be tyrosine integrase, for site-specific integrase is not in the list of approved functions by SEA PHAGES. /note=Transmembrane domains: TMHMM shows no transmembrane domains; there are only outside and inside domains. /note=Secondary Annotator Name: Tosasuk, Kaemin /note=Secondary Annotator QC: I would re-evaluate the start site of the gene based on the fact that Glimmer and GeneMark do not agree. Furthermore, the chosen start site has a very low Z-score and final score compared to the rest of the start sites. 
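The GAP values in the SSC summary lines appear to follow directly from the CDS coordinates, which gives a quick way to cross-check a gap quoted in a free-text note against the current feature table. The sketch below is a minimal illustration, assuming the gap is counted as the number of bases strictly between two neighboring features (negative values meaning an overlap); the function names `gap_bp` and `length_bp` are hypothetical and are not part of DNA Master or PECAAN.

```python
def gap_bp(lower_feature_right: int, higher_feature_left: int) -> int:
    """Bases strictly between two neighboring features.

    Assumption: coordinates are 1-based and inclusive, as in the CDS lines.
    A negative result is an overlap (for example -4 for a 4 bp overlap).
    """
    return higher_feature_left - lower_feature_right - 1


def length_bp(left: int, right: int) -> int:
    """Length of a feature given its inclusive left and right coordinates."""
    return right - left + 1


# Coordinates copied from the CDS lines of genes 34 to 37 (all on the reverse strand).
print(gap_bp(23949, 24091))  # gene 34 right end vs gene 35 left end
print(gap_bp(25182, 25190))  # gene 35 right end vs gene 36 left end
print(gap_bp(25375, 25372))  # gene 36 right end vs gene 37 left end
print(length_bp(24091, 25182))  # gene 35 length in bp
```

Under this assumption the three gaps match the GAP fields recorded for genes 34, 35, and 36, so a note that quotes a different gap may describe an earlier version of the neighboring calls.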
CDS complement (25190 - 25375) /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="Hum25_36" /note= /note=SSC: 25375-25190 CP: no SCS: neither ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.97, -4.6768946155982105, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, Cindy /note=Auto-annotation: Due to this gene being newly added there are no recorded Glimmer nor GeneMark start sites. /note=Coding Potential: There is good coding potential in the Host and Self GeneMark with the chosen start site covering all of the coding potential. /note=SD (Final) Score: The chosen start site at 25324 bp has the best Final Score of -3.887 and a Z-score of 2.375, which is the second best Z-score. /note=Gap/overlap: There is a 47 bp gap, which appears to be reasonable. Within other singletons, this gap appears to be conserved as compared to Success and Sleepyhead. /note=Phamerator: As this gene is newly added, there is no Phamerator data as of 4/10/23. /note=Starterator: As this gene is newly added, there is no Starterator data as of 4/10/23. /note=Location call: Based on the above evidence, this gene is a real gene and has a start site at 25324 bp. /note=Function call: There were no NCBI, BLASTp or CDD hits. HHPred hits had extremely high E-values from 7.4 to 82 with low probabilities (21% to 69.91%). Based on this evidence, this gene has No Known Function (NKF). /note=Transmembrane domains: DeepTMHMM predicted no transmembrane regions, therefore this gene is most likely not a membrane protein. /note=Secondary Annotator Name: Wang, Xinyi /note=Secondary Annotator QC: The synteny is a bit weird to use as evidence in this location call because the two phages mentioned don't really display synteny with the target gene in my opinion. Also, remember to fill out the synteny box. Otherwise, I agree with the choice of location and function call. 
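Several location calls in these notes weigh the RBS final score and Z-score against a 1 bp or 4 bp overlap with the neighboring gene. The sketch below illustrates that comparison using the two candidate starts recorded in the gene 30 notes. It is only an illustration of the reasoning written in the notes, not an official SEA-PHAGES scoring rule; `StartCandidate`, `rank_by_rbs`, and `has_favorable_overlap` are hypothetical names.

```python
from typing import List, NamedTuple


class StartCandidate(NamedTuple):
    position: int       # start coordinate in bp
    z_score: float      # RBS Z-score (higher is better)
    final_score: float  # RBS final score (closer to 0 is better)
    gap: int            # gap to the neighboring gene in bp (negative means overlap)


def rank_by_rbs(candidates: List[StartCandidate]) -> List[StartCandidate]:
    """Order candidates from best to worst RBS, by final score and then Z-score."""
    return sorted(candidates, key=lambda c: (c.final_score, c.z_score), reverse=True)


def has_favorable_overlap(candidate: StartCandidate) -> bool:
    """True for the 1 bp and 4 bp overlaps that the notes treat as favorable."""
    return candidate.gap in (-1, -4)


# Values copied from the gene 30 notes (candidate starts 22735 and 22777).
gene_30 = [
    StartCandidate(position=22735, z_score=2.303, final_score=-4.328, gap=38),
    StartCandidate(position=22777, z_score=1.772, final_score=-5.155, gap=-4),
]

best_rbs = rank_by_rbs(gene_30)[0]
operon_like = [c for c in gene_30 if has_favorable_overlap(c)]
print(best_rbs.position)                  # candidate with the best RBS scores
print([c.position for c in operon_like])  # candidates with a 1 bp or 4 bp overlap
```

When the two lists disagree, as they do for gene 30, the notes favor the start with the 4 bp overlap and set the RBS score aside; the code only surfaces the disagreement and leaves the final call to the annotator.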
CDS complement (25372 - 25512) /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="Hum25_37" /note= /note=SSC: 25512-25372 CP: yes SCS: neither ST: NA BLAST-Start: GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.469, -4.729704120146823, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chawla, Esha /note=Auto-annotation: No proposed Glimmer start or GeneMark start sites, as this is a new gene we have proposed. /note=Coding Potential: There is a decent amount of coding potential contained in the putative ORF, in the 3rd reverse reading frame in both GeneMark Host and Self. This gene is likely real as there is decently good coding potential throughout the whole gene. /note=SD (Final) Score: The best SD (Final) score is -4.730, which is the SD score associated with the start site at 25512, making this the better start site. This SD score is not very good, but there is not much coding potential upstream of the proposed start site. /note=Gap/overlap: This gene is a reverse gene in the midst of many reverse genes. There is a 48 base pair gap between the stop of the gene and the start of the previous, reverse gene. There is no coding potential in this 48 base pair gap, so it is unlikely there is a gene in the gap or that the start needs to be moved further upstream. This gene is a relatively normal sized gene – it is 141 base pairs long. /note=Phamerator: On the day of my investigation, 4/11/2023, there is no generated Phamerator data, as this gene has been newly proposed. Hum25 is in cluster AX, and there is no called function for this gene. /note=Starterator: There is no data in Starterator, as this protein is an orpham (a pham with only one member). There is no Starterator data generated, so there is nothing to compare it to. /note=Location call: This is likely a real gene, as it has decently good coding potential. 
The start site at 25512 is the most likely start site, as it has the lowest SD (Final) Score and covers all of the coding potential without overlapping too much or having too big of a gap with adjacent genes. Because this protein is an orpham, it is unknown if the gene is well-conserved, so further analysis is needed on this. /note=Function call: There is no conclusive evidence regarding function. PhagesDB BLAST and NCBI BLAST did not call any functions. HHPRED proposes the gene may encode an oxidoreductase, but the e-value is very large (e = 28). There are no domain hits when analysis is performed using CDD. /note=Transmembrane domains: No transmembrane domains – unable to conclude if this is in line with the function, as the current function of this gene is NKF. /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS complement (25519 - 25683) /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="Hum25_38" /note=Original Glimmer call @bp 25683 has strength 9.0; Genemark calls start at 25683 /note=SSC: 25683-25519 CP: yes SCS: both ST: NI BLAST-Start: GAP: 90 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.241, -5.208452476010064, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nguyen, Mya /note= /note=Auto-annotation: Both Glimmer and GeneMark autoannotated this gene. They both agree on the start at 25683. The start codon called is ATG. /note= /note=Coding Potential: There is good coding potential in the reverse direction and some in the forward direction. The chosen start site does cover all this coding potential. /note= /note=SD (Final) Score: The SD score is -5.208, which is the worst score out of all. The Z-score is 2.241, which is a good score that is over the value of 2. /note= /note=Gap/overlap: The gap is 512, which is a relatively large gap. However, the other start sites have an even larger gap than 512. 
The length of the gene is 165, which is the longest out of all the other start sites and seems the most reasonable. /note= /note=Phamerator: Date: 4/10/23, the pham number is 16924. It is the only member of this pham. /note= /note=Starterator: N/A Page/report not found /note= /note=Location call: The gathered evidence suggests that this is a real gene at start site 25683, given the good coding potential that is covered from the start site and the fact that both Glimmer and GeneMark annotated this start site. /note= /note=Function call: PhagesDB function frequency has two entries with tail assembly chaperone as the function, both called 50% of the time. PhagesDB BLAST also shows tail assembly chaperone; however, the entries have extremely high e-values, suggesting weak hits. HHPRED has hits that are all RNA polymerase sigma factors, with lower e-values, high probability, and decent coverage; however, the evidence here is also not that strong, as the e-values are 0.47 and higher. /note= /note=Transmembrane domains: There are no predicted TMDs. This is not a transmembrane protein. 
/note= /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS complement (25774 - 26145) /gene="39" /product="gp39" /function="helix-turn-helix DNA binding domain" /locus tag="Hum25_39" /note=Original Glimmer call @bp 26208 has strength 6.75; Genemark calls start at 26064 /note=SSC: 26145-25774 CP: no SCS: both-cs ST: NI BLAST-Start: [helix-turn-helix domain-containing protein [Gracilibacillus phocaeensis]],,NCBI, q48:s8 46.3415% 2.559E-5 GAP: 50 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.658, -6.160605165723925, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix domain-containing protein [Gracilibacillus phocaeensis]],,WP_130859330,20.7071,2.559E-5 SIF-HHPRED: a.35.1.2 (C:) P22 C2 repressor, DNA-binding domain {Salmonella bacteriophage P22 [TaxId: 10754]} | CLASS: All alpha proteins, FOLD: lambda repressor-like DNA-binding domains, SUPFAM: lambda repressor-like DNA-binding domains, FAM: Phage repressors,,,SCOP_d3jxbc_,49.5935,98.7 SIF-Syn: /note=Primary Annotator Name: Robles, Angel /note=Auto-annotation: Glimmer calls the start at 26208. GeneMark calls the start at 26064. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -6.691. It has a high Final Score which indicates it has a good sequence match. /note=Gap/overlap: 56, which is slightly above the recommended 50bp limit and therefore slightly acceptable. There is no coding potential in the gap. Additionally, this gap makes sense as the following gene is in the forward direction. /note=Phamerator: pham: 16395. Date 04/04/23. It is only found in Hum25, thus is an orpham. /note=Starterator: Not informative /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 26208 bp. /note=Function call: Helix-turn-helix DNA Binding Domain. 
Two of the top three PhagesDB BLAST hits have the function of helix-turn-helix DNA binding protein; however, the E-value is not very good (E-value <0.0001). 4 out of 5 top NCBI BLAST hits also have the function of helix-turn-helix DNA binding protein, but with low coverage (40% coverage, 10%+ identity, and E-value <0.00016714). HHpred had a hit for helix-turn-helix with 98.6% probability, 45% coverage, and E-value of 4.8e-7. CDD had a hit for Helix-turn-helix with 33% identity, 60% alignment, 40% coverage, with an E-value of 9.62631e-9. /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore it is not a membrane protein. CDS 26196 - 26420 /gene="40" /product="gp40" /function="helix-turn-helix DNA binding domain" /locus tag="Hum25_40" /note=Original Glimmer call @bp 26265 has strength 4.94; Genemark calls start at 26196 /note=SSC: 26196-26420 CP: yes SCS: both-gm ST: NI BLAST-Start: [helix-turn-helix transcriptional regulator [Dolichospermum sp. BR01]],,NCBI, q6:s7 85.1351% 1.88888E-4 GAP: 50 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.748, -4.647348953121449, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix transcriptional regulator [Dolichospermum sp. BR01]],,MBS9384268,49.3151,1.88888E-4 SIF-HHPRED: P22 C2 REPRESSOR; TRANSCRIPTION REGULATION; NMR {Enterobacteria phage P22} SCOP: a.35.1.2,,,1ADR_A,79.7297,98.8 SIF-Syn: Not much synteny in this region as this region is quite unique especially compared to cluster AY phages which show synteny in much of the earlier regions. /note=Primary Annotator Name: Hamid, Bilal /note=Auto-annotation: Glimmer and GeneMark provide conflicting start sites. Glimmer indicates a start site at 26265, while GeneMark indicates a start site at 26196. Given that the GeneMark site results in a larger ORF and a smaller gap, I suspect that it is the better site. Start codon is ATG. 
/note=Coding Potential: Host-trained GeneMark coding potential appears at approximately 26200, which indicates a possibility for the earlier start site at 26196 to be the correct site. Phage-trained GeneMark agrees with this start site, which adds more evidence that the gene starts at the earlier site. This earlier start site covers all of the coding potential. /note=SD (Final) Score: With a final score of -4.647, I strongly suspect the true start site is at 26196. /note=Gap/overlap: Selecting start site @26196 causes the gap to be ~500 bp from the previous gene. This is an error with how the gap is presented, as the genes are out of order as of 04/15/23. Once corrected, the gap will be ~80 bp, which is too small for another gene to be present. /note=Phamerator: Phamerator indicates this is a unique pham with only the Hum25 draft in it. /note=Starterator: No Starterator due to a lack of data to compare to (Unique Pham) /note=Location call: Start @26196 is likely the best site, but this depends on where the suspected missing gene ends. /note=Function call: Best PhagesDB BLASTp hits have e-values > 0.01, thus they will not be used to determine function. NCBI BLASTp has slightly better results with e-values of 2e-4 and 3e-4 for helix-turn-helix transcriptional regulator for Dolichospermum sp. BR01 and Dolichospermum sp. FACHB-1091 respectively. Overall, these results are a little weak for confidently calling function. CDD shows slightly better indications of helix-turn-helix XRE (Xenobiotic response element)-family like proteins with an e-value of 2.76e-5. Two best PDB hits indicate competence regulator (ComR) from 6HU8_A and an unclear SaPI repressor from Staphylococcus aureus with e-values of 3.4e-7 and 4.0e-7 respectively. 
While overall the results are weak, given a relatively strong HTH hit, I will call the function as a helix-turn-helix DNA binding domain. /note=Transmembrane domains: DeepTMHMM calls no transmembrane domains. Given that this is likely a DNA-binding protein, this makes sense. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 26509 - 26679 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Hum25_41" /note= /note=SSC: 26509-26679 CP: yes SCS: neither ST: NI BLAST-Start: GAP: 88 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.977, -5.012410989999993, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pisipati, Kirthana /note=Auto-annotation: Neither Glimmer nor GeneMark calls a start site, so there is no auto-annotated site. /note=Coding Potential: There is a decent amount of coding potential throughout the putative ORF. Start site 26497 is the only site that seems to include all of the coding potential, but start site 26509 is also feasible as it includes the vast majority of coding potential. All other start sites would leave out too much. /note=SD (Final) Score: Start site 26497 has a final score of -7.62. Start site 26509 has a final score of -5.012. While neither of these scores is particularly favorable, the score for start site 26509 is higher. /note=Gap/overlap: The gap for start site 26497 is 288bp, and the ORF is 183bp. The gap for start site 26509 is 300bp, and the ORF is 171bp. Both start site possibilities are similar in their size and both have a relatively large gap. /note=Phamerator: As of 4/11/23, this gene is not yet assigned to a pham, and cannot be compared with other genomes. /note=Starterator: Since this gene is not yet in a pham, there is no Starterator output. /note=Location call: This is a real gene. There is little evidence to support either possible start site, but start site 26509 seems to be slightly more favorable because of its higher final score and higher Z score. 
/note=Function call: Neither of the BLASTp databases (phagesDB and NCBI) had any hits. CDD also did not have any hits. HHpred had several hits, but the e-values were very high, so none of the results were useful. Thus, this gene has no known function. /note=Transmembrane domains: There are no transmembrane domains, which cannot be used to confirm the function, since there is no known function. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 26666 - 27178 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="Hum25_42" /note=Original Glimmer call @bp 26666 has strength 8.02; Genemark calls start at 26666 /note=SSC: 26666-27178 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP641_gp077 [Arthrobacter phage SilentRX] ],,NCBI, q1:s1 97.0588% 7.23661E-54 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.967, -2.6623718267059564, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP641_gp077 [Arthrobacter phage SilentRX] ],,YP_010656458,66.8605,7.23661E-54 SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Tran, Krysten /note=Auto-annotation: Glimmer and GeneMark; both agree on the same start site (26666); start codons called - ATG, TTG, and GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF and the start site does cover all the coding potential. /note=SD (Final) Score: -2.662 is the most favorable score on PECAAN. The start codon is ATG. /note=Gap/overlap: There is a gap of 245 bp, which is the shortest gap of the options for the start site. This is a large gap indicating a gene should be added before, which is supported by the coding potential predicted in the coding potential maps. /note=Phamerator: This gene is in pham 6322 as of 03/24/23. Hum25 is the only phage that calls start site 3 and it does not contain the other two start sites called by other genes in the pham (start numbers 2 and 4). 
/note=Starterator: The start number 4 was called in 5 of the 8 non-draft genes in the pham; however, start number 4 is not in this phage. Start number 3 at position 26666 has no manual annotations but is consistent with the start sites called by both Glimmer and GeneMark. The start number 3 was not called in any of the 8 draft or non-draft genes in the pham. Of the other two phages that did not have the most annotated start site, both are in cluster AP1, and Hum25 also does not have the start site called in these genes (start number 2). /note=Location call: Based on all the evidence gathered, the start site for this gene is likely at 26666. Although there are no manual annotations for this start site, it is the most logical as it has the most favorable final score and has the smallest gap (all other start sites result in gaps larger than 280bp). /note=Function call: NKF; Many hits from PhagesDB BLASTp and from NCBI BLASTp with good e-values but all were unknown function/hypothetical proteins. No hits from HHpred were strong because they had very high e-values. One HHpred hit had a probability of 85.76% and a percent coverage of 43.5%; however, the e-value is 16 and it is for a protein of unknown function. One CDD hit for a BAR domain-containing protein which is a dimerization module that functions in membrane dynamics. Although there were significant hits from PhagesDB BLASTp and from NCBI BLASTp, they were all of unknown function. There were no significant HHpred hits and only one hit that was not that strong in CDD. Based on all the evidence gathered, I cannot hypothesize the function of the gene. /note=Transmembrane domains: No TMDs for this ORF but it is unclear if this is logical or not since the proposed function is NKF. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 27175 - 27351 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="Hum25_43" /note=Original Glimmer call @bp 27175 has strength 2.36 /note=SSC: 27175-27351 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein AHiyo6_04080 [Arthrobacter sp. Hiyo6]],,NCBI, q3:s5 93.1034% 1.47561E-5 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.574, -3.5490877672587735, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein AHiyo6_04080 [Arthrobacter sp. Hiyo6]],,GAP53843,53.2258,1.47561E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Critzer, Nicole /note=Auto-annotation: Used both Glimmer and GeneMark for auto-annotation, but only Glimmer called a start site: 27175, with the start codon ATG, which is common and therefore reasonable. /note=Coding Potential: The coding potential is reasonable and is within the start and stop site and its presence on the first ORF confirms it is a forward gene. /note=SD (Final) Score: -2.664, this isn’t that relevant since only one possible start was provided /note=Gap/overlap: -4, this overlap could be indicative of the gene being part of an operon so the overlap is reasonable /note=Phamerator: Pham 55762 observed on 04/06/23 and 04/10/23 (new analysis was run). This pham has 27 members, 9 of which are drafts, and the gene is conserved in phages RadFad and Isolde from cluster AY and Elesar and Cole in cluster FF that are within the pham. /note=Starterator: (Start: 11@27175) analysis was run on 04/09/23 on database version 507. The gene does not contain the “most annotated” start site 10, so Starterator auto-annotated start 11@27175, which agrees with Glimmer and includes all the coding potential. It should be noted though that this start site has no manual annotations and was not found in the other tracks of the phages in this pham. 
/note=Location call: 27175 (stop 27351) - both Glimmer and Starterator agree, it is the only start provided and encompasses all coding potential. It also has a reasonable final score. /note=Function call: NKF - all hits in PhagesDB BLAST were for genes of unknown function, a result which was mirrored in the NCBI BLAST. Although HHpred had hits with functions, all e-values associated with them were greater than 1 with low coverage and no other evidence to support function assignment. There were also no observed hits in CDD. /note=Transmembrane domains: No hits observed in DeepTMHMM, protein is inside the membrane, so this is not a transmembrane protein. /note=Secondary Annotator Name: Li, Mulin /note=Secondary Annotator QC: CDS 27363 - 27629 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="Hum25_44" /note=Original Glimmer call @bp 27363 has strength 3.18; Genemark calls start at 27363 /note=SSC: 27363-27629 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Paeniglutamicibacter antarcticus]],,NCBI, q1:s1 100.0% 1.15655E-31 GAP: 11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.277, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Paeniglutamicibacter antarcticus]],,WP_068733434,77.2727,1.15655E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deal, Milena /note=Auto-annotation: Both Glimmer and GeneMark agree on a start site of 27363. This start site has a start codon of GTG, which is a more commonly used start codon. The stop site for this gene is at 27629. /note=Coding Potential: There is coding potential throughout the whole gene in host-trained GeneMark and self-trained GeneMark. /note=SD (Final) Score: The final score for the auto-annotated start site at 27,363 is good and is the best final score out of all the potential start sites. The Z-score is good as well. /note=Gap/overlap: There is a 12 base pair gap between this gene and the nearest upstream gene. This is not the longest open reading frame. 
The longer ORFs would overlap with the upstream gene too much, so their starts are not likely to be the true start site. /note=Phamerator: As of 4/5/23, there are 21 members in this pham, 17 of which are non-draft genomes. All the phages belong to cluster A and are part of subclusters AZ, AP, and AL. /note=Starterator: As of 4/5/23, the most chosen start site is 11, but this gene does not have that start site as an option. The auto-annotated start site (10, 27363) is not found in any other genes in this pham. Starterator is not very informative in this case. /note=Location call: This is a real gene beginning at the auto-annotated start site of 27363. Both Glimmer and GeneMark agree on this start site and its position makes sense based on the position of the upstream gene. /note=Function call: PhagesDB BLASTp and NCBI BLASTp results are all unknown function. HHpred does not have any significant hits and CDD has no hits. Therefore, the function of this protein is NKF. /note=Transmembrane domains: There are no transmembrane domains, so this does not help us determine the protein function. /note=Secondary Annotator Name: Vanderpool, Lauren /note=Secondary Annotator QC: CDS 27639 - 27827 /gene="45" /product="gp45" /function="helix-turn-helix DNA binding domain" /locus tag="Hum25_45" /note=Original Glimmer call @bp 27639 has strength 1.36 /note=SSC: 27639-27827 CP: yes SCS: glimmer ST: NI BLAST-Start: [helix-turn-helix domain-containing protein [Caulobacter sp. 602-2] ],,NCBI, q2:s10 95.1613% 1.36899E-10 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.169, -2.6835611257758942, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix domain-containing protein [Caulobacter sp. 602-2] ],,WP_165261857,53.6585,1.36899E-10 SIF-HHPRED: Putative DNA-binding protein; BldC, S. 
coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_L,87.0968,98.9 SIF-Syn: /note=Primary Annotator Name: Kretschmer, Thomas /note=Auto-annotation: Only Glimmer has an auto-annotated start site. The start site is likely 27639 with a start codon of ATG. /note=Coding Potential: There is coding potential at this gene in both host- and self-trained GeneMark. There is some small coding potential in the reverse direction, but not enough to be significant. The coding potential is small but completely contained within the gene start and end sites. /note=SD (Final) Score: -2.648 /note=Gap/overlap: 9 /note=Phamerator: This gene is in pham 60100. It is a small pham consisting of mostly helix-turn-helix proteins. All other phages with this pham are in the DS Cluster. /note=Starterator: Start site 2 is the only possible start site for Hum25. There are 0 manual annotations for this start site and it is present, but not called in any other phage. /note=Location call: This is likely a real gene that starts at 27639. /note=Function call: Helix-turn-helix DNA binding protein. There are many NCBI BLAST and HHPRED hits with strong evidence that this is a helix-turn-helix DNA binding protein. /note=Transmembrane domains: 0 predicted TMDs; this is likely an outside protein. /note=Secondary Annotator Name: Barden, Sophia /note=Secondary Annotator QC: Agree with start site and coding potential conclusions. For the SD and Gap sections, be sure to include the significance of what these indicate for your gene (ex: z score? best option? reasonable gap?). For Phamerator, be sure to include database version and date of analysis. I would also include some phages by name for pham map comparison if we wanted to look at potential matches for synteny. Why did you select Not informative for Starterator? Be a little more specific with Function call hits, and explain why you did not select any PhagesDB hits for this gene. 
Fill out Synteny box!! (Even if you don't locate compelling synteny, explain why not. Also look at DS cluster on pham maps.) CDS 27824 - 28003 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="Hum25_46" /note=Genemark calls start at 27824 /note=SSC: 27824-28003 CP: yes SCS: genemark ST: SS BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.773, -5.169097109949875, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gowdy, Griffin /note=Auto-annotation: Only called by GeneMark, start 27824, GTG. /note=Coding Potential: Reasonable coding potential exists in this region, and the entire stretch of coding potential is included with the auto-annotated start. /note=SD (Final) Score: This start does have the highest Z-score and final score. They aren’t striking; however, this is reasonable for a polycistronic operon. /note=Gap/overlap: This start has an overlap of -4, which is a characteristic of polycistronic operons. This gives the gene a final length of 180, which is short but realistic. /note=Phamerator: As of 4/5/23, orpham 24348. /note=Starterator: N/A, orpham. /note=Location call: With the available evidence, it is reasonable to conclude stop@28003 F is real and starts at position 27824. Given that it is an orpham, wet lab confirmation and characterization would be ideal. /note=Function call: NKF - No programs returned any significant hits. /note=Transmembrane domains: No transmembrane domains predicted by DeepTMHMM. /note= /note=Secondary Annotator Name: Sacristan, Ariana /note=Secondary Annotator QC: Notes were very concise. Reported on all the available information and made NKF call very clear. 
CDS 28000 - 28131 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Hum25_47" /note=Original Glimmer call @bp 28000 has strength 1.68 /note=SSC: 28000-28131 CP: yes SCS: glimmer ST: SS BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.277, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Estampa, Julia /note=Auto-annotation: Glimmer called the gene and noted that the start site is at 28000 bp. GeneMark did not call any start site. The start codon is GTG. Host-Trained and Self-Trained GeneMark both reflect reasonable coding potential that is consistent with the ORF. The chosen start site covers all the coding potential. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF for both Host-trained and Self-trained GeneMark. However, for Self-trained GeneMark, there is some overlap between the forward and reverse direction. /note=SD (Final) Score: The SD score is -2.011 and is the best from the list, and this start site includes the LORF. However, the final score is irrelevant in this case because the gene is most likely part of an operon as suggested by the 4bp overlap. /note=Gap/overlap: The overlap is 4 bp (gap of -4bp) long upstream of the gene, which is small and reasonable. This overlap suggests the gene is most likely part of an operon. The length of the gene (132 bp) is acceptable, but requires a bit of careful examination. /note=Phamerator: Pham: 25937. Date found: 04/05/23. Hum25_42 is the only gene in this pham (orpham). /note=Starterator: No Starterator report. Since it’s the only gene in the pham, it is classified as an orpham as of 04/05/23. /note=Location call: Gathered evidence suggests this is a real gene with relatively good coding potential and that the strongest candidate for the start site is at 28000 bp. This start site is consistent with the ORF and the 4 bp overlap is common for genes in an operon. 
However, further examination may be needed to strongly confirm this as GeneMark did not report a start site and there is coding potential overlap under Self-trained GeneMark. /note=Function call: NKF. PhagesDB BLAST and NCBI BLAST hits revealed unknown function with poor E-value scores. No CDD hits returned. HHPred did not have any significant hits and the top hit suggests unknown function. /note=Transmembrane domains: Since there are no predicted TMHs or TMDs returned from PECAAN and DeepTMHMM, it is not a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 28128 - 28283 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="Hum25_48" /note=Original Glimmer call @bp 28128 has strength 3.48 /note=SSC: 28128-28283 CP: yes SCS: glimmer ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.135, -6.509217224627325, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: There does not seem to be synteny. /note=Primary Annotator Name: Le, Vivian /note=Auto-annotation: Glimmer called a start of 28128. GeneMark did not call a start. /note=Coding Potential: Reasonable coding potential is found for both GeneMark self and host. /note=SD (Final) Score: -6.509. This was not the best final score. /note=Gap/overlap: -4 bp overlap. This indicates that this gene is part of an operon. /note=Phamerator: The pham number as of 04.05.2023 is 26577. Hum25_44 is the only gene in that pham. /note=Starterator: Starterator report was not found. The gene is most likely an orpham, since it is also the only gene in the pham currently. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 28128. Although the start site may not have the best final score, it is the most reasonable. The other start sites would create unreasonable gaps/overlaps. 
/note=Function call: Based on the PhagesDB BLASTp and NCBI BLASTp, there is currently no known function, because there were no strong hits. HHpred and CDD also did not have any significant hits. /note=Transmembrane domains: There were 0 predicted TMDs. The topology graph also showed the gene/protein to only be on the inside. Not much can be said right now, because there is no known function. /note=Secondary Annotator Name: Deal, Milena /note=Secondary Annotator QC: Very clear notes! You do not need to write anything about synteny because the gene is NKF. CDS 28280 - 28900 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="Hum25_49" /note=Original Glimmer call @bp 28427 has strength 4.7; Genemark calls start at 28280 /note=SSC: 28280-28900 CP: yes SCS: both-gm ST: NA BLAST-Start: [ERF family protein [Arthrobacter sp. 9V] ],,NCBI, q1:s1 99.5146% 2.2156E-85 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.027, -3.5059859724959925, yes F: hypothetical protein SIF-BLAST: ,,[ERF family protein [Arthrobacter sp. 9V] ],,WP_159734117,77.2277,2.2156E-85 SIF-HHPRED: ERF ; ERF superfamily,,,PF04404.16,64.0777,99.9 SIF-Syn: No synteny displayed. /note=Primary Annotator Name: Wang, Xinyi /note=Auto-annotation: Glimmer and GeneMark both called the gene. Glimmer annotates start at 28427 with start codon ATG. GeneMark annotates start at 28280 with start codon ATG, which is also the LORF. The two auto-annotations disagree with each other. /note=Coding Potential: There is coding potential in the forward direction for starts designated by Glimmer and GeneMark, which suggests that this gene is real. While 28280 covers the entire coding potential region, 28427 covers part of the coding potential and leaves a gap of 143 bp after the previous gene. /note=SD (Final) Score: /note=The SD(Final) score for start 28280 is -3.506, which is the best score on PECAAN. The SD(Final) score for start 28427 is -5.464. Start 28280 has a better SD score than start 28427. 
/note=Gap/overlap: /note=28427 leaves a gap of 143 bp between the previous gene and the current gene. 28280 overlaps for 4 bp with the previous gene. 28280 generates a more reasonable distance between this gene and the previous one. /note=Phamerator: As of 4/10/23, the gene belongs to Pham 22680, which is an orpham. /note=Starterator: The website reports “Page/Report not found, The requested Pham report could not be found.” /note=Location call: /note=Start 28280 has the best SD score, and its z-score is 3.027, which is the best among all choices. It also allows LORF and has a reasonable start codon ATG. /note=On the other hand, start 28427 doesn’t have a good z-score (1.595) or SD score (-5.464), and it leaves a huge gap of 143 bp between this gene and the previous one. /note=Function call: Overall, the function is ERF family protein. There is one entry showing that the gene is in the ERF superfamily with an E-value of 9.00e-20 on the conserved domain database in NCBI. There is one “ERF family protein” entry on the NCBI BlastP website which has a 99% coverage and a 2e-85 E value. This entry indicates that the target sequence is probably an ERF family protein. /note=Using PhagesDB BlastP, multiple entries show that the sequence is similar to ERF family protein. /note=Checking the HHpred outputs, the first entry shows a probability of 99.83, an E-value of 2.6e-24, and a score of 161.78 with a function of the ERF superfamily. /note=Transmembrane domains: /note=There is no transmembrane domain suggested by TMHMM, which shows that the protein is not a membrane protein. 
CDS 28897 - 29445 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="Hum25_50" /note=Original Glimmer call @bp 28897 has strength 8.61; Genemark calls start at 28897 /note=SSC: 28897-29445 CP: yes SCS: both ST: SS BLAST-Start: [Ku-like dsDNA break binding protein [Arthrobacter phage Shoya] ],,NCBI, q3:s4 98.3516% 2.11074E-71 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.526, -3.5707972674952195, no F: hypothetical protein SIF-BLAST: ,,[Ku-like dsDNA break binding protein [Arthrobacter phage Shoya] ],,YP_010649674,73.3696,2.11074E-71 SIF-HHPRED: SIF-Syn: /note=stop @ 29445: Might be a Ku-like dsDNA break-binding protein based on good hits to Gam proteins, analogues of Ku. But example gene for Ku-like (Omega_206) definitely has strong hits to actual Ku protein. Left as NKF for now. See this forum post proposal. https://seaphages.org/forums/topic/4766/ CDS 29442 - 29834 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="Hum25_51" /note=Original Glimmer call @bp 29442 has strength 5.48; Genemark calls start at 29442 /note=SSC: 29442-29834 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.899, -5.3524728754355015, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Smith, Steven /note=Auto-annotation: Both GeneMark and Glimmer called the same start site at 29442 /note=Coding Potential: The Host-trained and the Self-trained GeneMark both show good coding potential in the first half of the region before the stop codon, but much less closer to the stop codon. /note=SD (Final) Score: The start site at 29442 has a Final Score of -5.352 and a Z-Score of 1.899. /note=Gap/overlap: There is an overlap of 4 bp which is a good indication of this gene being part of an operon. /note=Phamerator: 4/5/23: this gene is in pham 20281 but it is an orpham. /note=Starterator: No data, gene is an orpham. 
/note=Location call: Looking at the above evidence, this call is most likely a functional gene starting at 29442. Both Glimmer and Genemark agree on the start site, which is a good sign that our start site is correct, and both GeneMarks also showed a large amount of coding potential in the region before the called stop codon. /note=Function call: We cannot call a function for this gene. There were no significant hits in BLASTp, CDD or HHpred, which is most likely because this gene is an orpham. /note=Transmembrane domains: There were no TMDs predicted by DeepTMHMM, so this is not predicted to be a transmembrane protein. /note=Secondary Annotator Name: Critzer, Nicole /note=Secondary Annotator QC: Good job on the notes, just don't forget to put NA for the starterator dropdown box since the gene is an orpham. I think you could also elaborate as to why you think the gene is functional around that start in the location call section. Just cite and elaborate on one or two pieces of evidence since the final score is not the least negative and the z-score is less than 2. This will give us more insight into the logic behind the chosen start site. CDS 29831 - 29980 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Hum25_52" /note= /note=SSC: 29831-29980 CP: yes SCS: neither ST: NI BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.692, -3.2388291398924656, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: There was no synteny as it is a singleton phage. /note=Gene (stop@#29,980F) /note=PECAAN Notes /note=Primary Annotator #1: Hoang, Ryan /note=Auto-annotation: This was a manually annotated gene, and as a result, there was no auto-annotation. /note=Coding Potential: There is low coding potential in the 2nd ORF of the forward reading frame. /note=SD (Final score): The final score of the suggested start is -3.239, which is a good final score. 
/note=Gap/overlap: There is an overlap of 4 bp, suggesting that this gene may be part of an operon in the phage. The gene fills the gap that would otherwise be left between the upstream and downstream genes, and as a result it was determined that it could be a real gene. /note=Phamerator: This is a singleton, and as a result, it has no other phages that it can compare with. There was no pham located. /note=Starterator: There were no other results for the Starterator call, as there were no other phages in this pham. /note=Location call: I would call it at 29831, simply because it captures more coding potential, and it has a smaller overlap of 4 bp, compared to the other potential start site. As there is no synteny due to it being a singleton, I cannot use synteny to help. /note=Function call: NCBI BLAST did not return any results, and PhagesDB BLAST likewise found nothing conclusive; the hits it did return had high e-values. There were additionally no CDD hits. For HHPred, the hits were also inconclusive, with very high E-values (>20). /note=Transmembrane domains: There were no transmembrane domains, with DeepTMHMM finding that it was most likely on the interior. /note=Secondary Annotator Name: Tosakuk, Kaemin /note=Secondary Annotator QC: Great work! I agree with your conclusion that this might not be a real gene due to the lack of coding potential overall in the forward direction of the site. CDS 29977 - 30147 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Hum25_53" /note= /note=SSC: 29977-30147 CP: yes SCS: neither ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.527, -3.586741874491512, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lim, Madeleine /note=Auto-annotation: Glimmer and Genemark do not have an auto-annotated start for this gene. 
/note=Coding Potential: High coding potential can be seen throughout 30000 to 30100 bp in the first reading frame; shows potential slight mismatch of coding potential in relation to the stop codon. /note=SD (Final) Score: Best start site was seen at 29977 given Z-score (2.527) and Final Score (-3.587). /note=Gap/overlap: Start 29962, the designated LORF site, has an overlap of -19, but Start 29977 has a gap of -4, indicating the gene is potentially part of an operon if Start 29977 is the gene’s true start. /note=Phamerator: There is no Phamerator information as there is no pham associated with this gene. (4/11/2023) /note=Starterator: There is no Phamerator information as there is no pham associated with this gene. Therefore, the Starterator is not informative. (4/11/2023) /note=Location call: Because the gap for Start 29977 indicates the gene is part of an operon, and the LORF call can thus be disregarded, the most likely start site for this gene is at 29977. /note=Function call: NKF; both NCBI and PhagesDB BLAST (4/10/2023) had no results. HHPred’s highest matches (4/10/2023) have high e-values (>20), and there are no results from CDD. /note=Transmembrane domains: There are no hits from DeepTMHMM (4/10/2023), indicating that it is an internal protein. /note=Secondary Annotator Name: Vanderpool, Lauren /note=Secondary Annotator QC: Very detailed notes! Agree with your conclusions. 
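The comparison of the two candidate starts for gene 53 (29962, the LORF, and 29977) can be illustrated the same way. The sketch below is not part of the PECAAN export. It uses only coordinates that appear in these notes (gene 52 stop at 29980, gene 53 stop at 30147), and the helper name `describe` is hypothetical. It reports the ORF length, whether that length is a whole number of codons, and the overlap with the upstream gene.

```python
PREV_STOP = 29980  # stop of gene 52, from its CDS line
STOP = 30147       # stop of gene 53, from its CDS line


def describe(start: int) -> tuple[int, bool, int]:
    """Return (ORF length in bp, length divisible by 3, overlap in bp)
    for a forward-strand candidate start."""
    length = STOP - start + 1
    overlap = PREV_STOP - start + 1
    return length, length % 3 == 0, overlap


for candidate in (29962, 29977):
    length, in_frame, overlap = describe(candidate)
    print(f"start {candidate}: {length} bp, in frame: {in_frame}, "
          f"overlap with upstream gene: {overlap} bp")
```

Both candidates sit in the same reading frame. The LORF start overlaps the upstream gene by 19 bp and start 29977 overlaps it by 4 bp, which agrees with the gap/overlap note above.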
CDS 30144 - 30515 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Hum25_54" /note=Original Glimmer call @bp 30144 has strength 10.89; Genemark calls start at 30144 /note=SSC: 30144-30515 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Rothia koreensis] ],,NCBI, q9:s5 93.4959% 7.71283E-18 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.769, -2.9966005857560156, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Rothia koreensis] ],,WP_129315507,58.8235,7.71283E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tosasuk, Kaemin /note=Auto-annotation: Both GLIMMER and GENEMARK agree that the start site is at 30144. The start codon is expected to be ATG. /note=Coding Potential: The gene has reasonable coding potential on both the host-trained and self-trained gene mark. /note=SD (Final) Score: This start site has a score of -2.997, which is the highest possible final score. /note=Gap/overlap: There is a gap upstream of the gene which is 10 bp long, which is within the recommended less than 50 bp threshold. /note=Phamerator: 4/5/2023: 14471. This Pham is also conserved in the one other phage, ValentiniPuff_99. The function of genes in this pham is unknown. /note=Starterator: There is a reasonable start site but it is not conserved within other phages in the cluster. The start site is 30144 and corresponds to ATG. Start 2: 1 of 2 call this start site. /note=Location call: This suggests the gene is real, and starts at 30144. /note=Function call: No known function. All of the hits obtained from BLASTp, CDD hits, and HHpred, indicate that there is no known function for this gene/protein. /note=Transmembrane domains: There were no transmembrane domains indicating that this is not a membrane protein. /note= Secondary Annotator Name: DEAL, MILENA /note=Secondary Annotator QC: Great notes! Remember to update your notes because I believe another gene has been added upstream of yours so your gap is not accurate anymore. 
Also, I recommend being more specific for the function call, but I do know that the gene is NKF so that might not be necessary. CDS 30512 - 30661 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Hum25_55" /note=Original Glimmer call @bp 30512 has strength 5.74; Genemark calls start at 30512 /note=SSC: 30512-30661 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.105, -4.45540384879315, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vanderpool, Lauren /note=Auto-annotation: Both GlimmerStart and GeneMark Start, at 30512 /note=Coding Potential: There seems to be reasonable coding potential for this gene based on the Host-Trained GeneMark map. However, in the Pham Map there seems to be an overlap in either direction of about 3 base pairs. /note=SD (Final) Score: -4.455, which is not the best score, but it is pretty good compared to the other scores listed. /note=Gap/overlap: As stated above, there is an overlap of about 3 base pairs in both directions. /note=Phamerator: Pham 26021 (last checked 4/9/2023) /note=Starterator: Orpham, N/A /note=Location call: N/A, see above /note=Function call: BLAST returns as “function unknown”, and NCBI had no results. /note=Transmembrane domains: There are 0 predicted TMD’s, and it appears to be an external protein based off of the summary. 
/note=Secondary Annotator Name: Dawson, Niels /note=Secondary Annotator QC: CDS 30658 - 30984 /gene="56" /product="gp56" /function="HNH endonuclease" /locus tag="Hum25_56" /note=Original Glimmer call @bp 30658 has strength 1.34; Genemark calls start at 30658 /note=SSC: 30658-30984 CP: yes SCS: both ST: NA BLAST-Start: [HNH endonuclease [Caulobacteraceae bacterium]],,NCBI, q1:s2 92.5926% 3.74006E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.066, -4.473611913377223, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Caulobacteraceae bacterium]],,NBW17079,55.7522,3.74006E-25 SIF-HHPRED: SIF-Syn: /note=Stop @ 30984: Possible "atypical" endonuclease with HNK and HNN motifs (per this forum post: https://seaphages.org/forums/topic/5505/). Has good hits to HNH proteins in HHpred. /note=Primary Annotator Name: Vu, Thomas /note=Auto-annotation: Both Glimmer and GeneMark predict the start site to be at 30658 BP. The start codon is GTG. /note=Coding Potential: There is high gene coding potential over the region believed to be the real gene. Both the self-trained and host-trained GeneMarks show evidence of high gene coding potential and coding potential covers the suggested start site of 30658 BP. /note=SD (Final) Score: -4.474 (best score of all candidates). This candidate also had the 2nd highest Z score at 2.066. /note=Gap/overlap: -4 BP (overlap of 4 BP). This overlap is the smallest of all candidates, which, together with the best SD score, suggests that this start candidate is reasonable. /note=Phamerator: The pham number is 73969 and the analysis was run on 03/24/23. The pham is seen in Hum25 (which is a singleton cluster) and the AS3 cluster, although the phage found in the AS3 cluster is a draft. /note=Starterator: There are only 2 draft candidates with no manual annotations produced. Thus, the Starterator analysis is uninformative. /note=Location call: This is a real gene with a likely stop site of 30984 BP and start site of 30658. 
/note=Function call: The gene function is an HNH endonuclease. The NCBI BLAST showed multiple hits indicating HNH endonuclease function where coverage was at 96% with e-values ranging from e-25 to e-22. For example, NBW17079 called for an HNH endonuclease at 93% coverage and an e-value of e-25. The CDD was not informative as there was only 1 hit with poor coverage (32%). There were multiple hits in the HHPRED calling for endonuclease function such as SCOP_d4ogca2 with an e-value of 0.04, coverage of 56%, and a probability of 95.9. /note=Transmembrane domains: /note=No TMDs were predicted by TMHMM or TOPCONS, so no transmembrane function is inferred. /note=Secondary Annotator Name: Barden, Sophia /note=Secondary Annotator QC: Agree with start site, good analysis of coding potential! Why did you choose to compare with Attoomi and Cantare? Be sure to include that the phage found in cluster AS3 on Phamerator is still a draft. You made a typo in the location call! Make sure you include the start site that you called in addition to the stop bp. Blast hit list looks good! Overall great analysis! 
CDS 30981 - 32510 /gene="57" /product="gp57" /function="DNA methyltransferase" /locus tag="Hum25_57" /note=Original Glimmer call @bp 30981 has strength 7.57; Genemark calls start at 30981 /note=SSC: 30981-32510 CP: yes SCS: both ST: NI BLAST-Start: [methyltransferase [Arthrobacter phage Vibaki] ],,NCBI, q12:s5 67.78% 3.62022E-158 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.958, -2.9525085049809894, no F: DNA methyltransferase SIF-BLAST: ,,[methyltransferase [Arthrobacter phage Vibaki] ],,YP_009883981,66.0241,3.62022E-158 SIF-HHPRED: Modification methylase HhaI; CG-SPECIFICITY, CPG SEQUENCE, C5-METHYLCYTOSINE, NUCLEOTIDE FLIPPING, S-ADENOSYL-L-HOMOCYSTEINE, COMPLEX (METHYLTRANSFERASE- DNA), transferase-DNA complex; HET: SO4, 3DR, SAH; 1.594A {Haemophilus parahaemolyticus},,,5CIY_A,39.2927,100.0 SIF-Syn: /note=Primary Annotator Name: Douglas, Katherine /note=Auto-annotation: Both Glimmer and GeneMark predicted a start site at 30981. /note=Coding Potential: Good coding potential in a single ORF in the forward direction on both the self-trained and host-trained GeneMark. There was little other coding potential on other ORFs and an appropriate gap between the end of the gene and the beginning of the adjacent gene. While this gene was found in some other phages, there was no extended synteny. Since Hum25 is a singleton this makes sense. Additionally, there are several high results in the BLAST hits within the same pham with strong e-values. The chosen start site covers all coding potential. /note=SD (Final) Score: -2.953 (least negative value) /note=Gap/overlap: -4 (very small overlap) (since this is an overlap of 4, it is possible this gene is part of an operon) /note=Phamerator: Pham 74651 as of 4/11/2023. Hum25 is a singleton but the pham is found in several phages across multiple other clusters such as BD and ED, among others. It is called as a methyltransferase or DNA methyltransferase in most cases. 
/note=Starterator: The most annotated start site #28 was found in 8/13 nondraft genomes but Hum25 did not contain this start site. Hum25 contained start site #21 which was very close to #25. This site was not found in any other phages but since Hum25 is a singleton this is not unusual. Site #21 is near the area of the gene where most other phages within the pham called a start site. /note=Location call: This is a real gene with a probable start site at 30981. There is coding potential in the region which is covered by this start site. Although this was not the most annotated start site among the phages with this gene, it was near the most annotated site. Hum25 did not contain the most annotated start site. There is an overlap of 4 bp (gap of -4), which is acceptable. The start codon is ATG, which is a common start codon. /note=Function call: DNA methyltransferase. There were several hits in both PhagesDB and NCBI with strong e-values across several other Arthrobacter phages. All these hits called the protein as DNA methyltransferase. CDD also had several hits corresponding to DNA methyltransferase proteins within other bacteriophages. HHpred had many very strong hits with several results that were aligned almost exactly for the entire gene. These hits had e-values below e-30 and probabilities of 99-100. These are extremely good values and mean that this gene product likely shares a function with these proteins, which were identified as DNA methyltransferases. /note=Transmembrane domains: 0 predicted domains. This makes sense as DNA methyltransferase is involved with the addition of methyl groups to DNA which is located inside the cell and not in the cell membrane. /note=Secondary Annotator Name: Estampa, Julia /note=Secondary Annotator QC: Great job with annotations! I agree with your start site and function calls! A few suggestions: Under your coding potential notes, I think it would be helpful if you specify which other phages in other clusters demonstrated some synteny. 
Additionally, because there's an overlap of 4bp, I think it's important to make note that the gene is most likely part of an operon. Other than that, amazing work with annotations!! CDS 32602 - 33336 /gene="58" /product="gp58" /function="RepA-like replication initiator" /locus tag="Hum25_58" /note=Original Glimmer call @bp 32602 has strength 8.31; Genemark calls start at 32602 /note=SSC: 32602-33336 CP: yes SCS: both ST: NI BLAST-Start: [replication initiation protein [Gordonia phage Gsput1] ],,NCBI, q23:s7 88.5246% 2.24807E-22 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.701, -3.219494343057515, yes F: RepA-like replication initiator SIF-BLAST: ,,[replication initiation protein [Gordonia phage Gsput1] ],,YP_009275731,48.8789,2.24807E-22 SIF-HHPRED: Replication initiator protein; initiation, replication, multidrug resistance, PROTEIN BINDING; 2.6003A {Staphylococcus aureus subsp. aureus},,,4PTA_D,38.5246,98.1 SIF-Syn: Synteny was not observed for this gene amongst other phage genomes. /note=Primary Annotator Name: Sacristan, Ariana /note=Auto-annotation: Glimmer and GeneMark agree on start site 32602 (ATG) /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF that encompasses the chosen start site. /note=SD (Final) Score: This site has the best Final Score, -3.219 and is supported by a high Z-score of 2.701. /note=Gap/overlap: The 91 bp gap is reasonable as it is too small to indicate a missing gene and does not contain other possible start sequences that contain coding potential or ORFs. This start site also exhibits the true LORF. The length of the gene is also acceptable as it is 735 bp in length. /note=Phamerator: As of 04/09/2023 this gene is located in Pham 72489. This Pham consists mostly of Cluster F members, such as Beakin and ByChance, which were used to compare against Hum25. 
Three different functions were called in Phamerator: RepA-like replication initiator, helix-turn-helix DNA-binding domain protein, and DNA binding protein; each was called once. All other pham members had no function called. /note=Starterator: There is a reasonable start site choice that is conserved among the Pham 72489 members. The start site number conserved in this pham is 5, which begins at 8366, and is called by 9/9 non-draft genes. This gene does not contain the “Most Annotated” start. However, it does call start site 4, which is very close to the "most annotated" start, suggesting an evolutionary change. /note=Location call: The gathered evidence suggests that this is a real gene that most likely starts at 32602. /note=Function call: The predicted function of this ORF is a RepA-like replication initiator. NCBI and PhagesDB BLASTp provided a few strong hits. PhagesDB BLASTp had RepA-like replication initiator hits with E-values as low as 2e-13. NCBI had one strong hit for a known protein function, which was a replication initiation protein with 88% coverage and E-value of 2.97e-22. HHPred also had a couple of hits for various replication initiator proteins with high probabilities and decent E-values. A top PDB HHPred hit for replication initiator protein had a probability of 98.1%, the highest coverage of all hits at 38%, and an E-value of 7.5e-5. /note=Transmembrane domains: There were no predicted TMDs by TMHMM, therefore it is not a membrane protein. This agrees with the called function, as the RepA-like replication initiator would not be located within a membrane. /note=Secondary Annotator Name: Chawla, Esha /note=Secondary Annotator QC: Hey, great job with your notes, they're very thorough and complete! 
I would include which reading frame has good coding potential in coding potential section, and I would include whether it is reasonable that there are not TMDs by TMHMM based on the proposed function. CDS 33333 - 34556 /gene="59" /product="gp59" /function="DnaB-like dsDNA helicase" /locus tag="Hum25_59" /note=Original Glimmer call @bp 33333 has strength 8.96; Genemark calls start at 33333 /note=SSC: 33333-34556 CP: yes SCS: both ST: SS BLAST-Start: [AAA family ATPase [Pseudarthrobacter albicanus]],,NCBI, q1:s1 98.2801% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.567, -5.521679486415972, no F: DnaB-like dsDNA helicase SIF-BLAST: ,,[AAA family ATPase [Pseudarthrobacter albicanus]],,WP_211881275,90.2913,0.0 SIF-HHPRED: DNAB-Like Replicative Helicase; ATPase, REPLICATION; 3.91A {Bacillus phage SPP1},,,3BGW_A,89.9263,100.0 SIF-Syn: DnaB-like dsDNA helicase – Waiting for identification of upstream gene functions - similar domain of genes (proteins involved in nuclease functions and DNA binding) to phages Banquo (CU1), DinoDaryn (CU1), and Aleemily(DZ). /note=Primary Annotator Name: Barden, Sophia /note=Auto-annotation: Glimmer and GeneMark agree, call Start at 33333. /note=Coding Potential: Coding potential in the third ORF on the forward strand for 33333-34556 only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host Trained. /note=SD (Final) Score: The best final score is -5.522 with the highest z score of 1.567. This selection calls Start site 33333 with codon ATG. /note=Gap/overlap: Overlap of -4. Ultimately reasonable overlap. Coding potential decreases upstream of the stop site and initiates overlap in other ORFs that indicates the start of another gene. Hum25 is a singleton, so identifying synteny/conservation across other phage genomes is difficult. Based on Pham 4302, synteny for this gene is identified with phages Banquo (CU1), DinoDaryn (CU1), and Aleemily(DZ). 
/note=Phamerator: Analysis was run 03/24/23 on database version 505. Pham number 4302 has 19 members, 5 are drafts. Hum25 Gene(stop@34556F) does not have the most annotated start. /note=Starterator: Calls start 13: (13, 33333). This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and from 33333- 34556. Starterator agrees with Glimmer and Genemark. /note=Function call: Algorithms run 04/07/2023. PhagesDB BLAST top two hits → functions called DNA Helicase. (Score range: 193–191, E value range: 5e-49 – 3e-48). NCBI BLAST top three hits → Functions called AAA family ATPase (E value range: 0–7e-150, % coverage: 98%+, % identity: 84-53%). HHPRED top three hits → Calls functions, Dnab-like Replicative Helicase, Dnab replication for helicase, Replicative helicase (probability: 100, % coverage: 90–92% , E value range: 3.7e-33–1.2e-32). This is downstream of methyltransferase and HNH endonuclease. Call DnaB-like dsDNA helicase for this gene. Most weight is given to the HHpred and CDD hits, focusing heavily on the specific helicase specified by HHpred. /note=Transmembrane domains: Cannot call membrane protein. Number of predicted TMRs: 0 /note=Secondary Annotator Name: Douglas, Katherine /note=Secondary Annotator QC: Annotations look pretty good overall! A couple things: In your SD final score notes you state that 33333 has the best z-score and RBS. I don’t think this is true. Z-score should be above 2 and RBS should be the most negative score. It`s fine to call this site anyways based on other evidence but just be careful with what you write as your reasoning in the notes. For starterator (I think you put this info in the phamerator box by accident), you mention that Hum25 does not have the most annotated start site. Is there any other information to be gained for this information, such as if there are any MAs for Hum25 start sites? 
Or if the called start site is near the most annotated one? Also, consider including more information about what other members of the pham were called as in phamerator. Good work! CDS 34553 - 34777 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Hum25_60" /note=Original Glimmer call @bp 34553 has strength 5.46; Genemark calls start at 34553 /note=SSC: 34553-34777 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.033, -3.3503726477288405, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: No synteny to report for this gene. This bacteriophage is a singleton. /note=Primary Annotator Name: Dawson, Niels /note=Auto-annotation: Both glimmer and genemark agree on start site 34553. /note=Coding Potential: There was coding potential on GenemarkS and Genemark host that existed across the entire proposed gene. It is important to note that both programs reported no coding potential on the reverse strand that would suggest an alternate gene. /note=SD (Final) Score: Final score is -3.350 and z score is 3.033, which are significantly supportive scores for the start site at 34553. /note=Gap/overlap: There is an overlap of -4 for this start site, which indicates that this gene is part of an operon. This particular piece of evidence suggests that start site 34553 is the strongest candidate, as other strong candidates have good final scores and z scores, but have large gaps that are suboptimal compared to -4. /note=Phamerator: The pham is 24657. This is a singleton pham and there are no other bacteriophages. /note=Starterator: There is no synteny to report for this gene and no starterator report to view. /note=Location call: Based on the above evidence, the most likely start site is 34553 and this is a real gene. /note=Function call: No significant hits on NCBI nucleotide blast and BLASTp for this gene. No hits on CDD. Therefore, this gene has no known function. 
/note=Transmembrane domains: No transmembrane domain hits from TmHMM and topcons. /note=Secondary Annotator Name: Douglas, Katherine /note=Secondary Annotator QC: Good job overall! Couple things to check is that you say there is only evidence of coding potential in the self-trained Genemark and not the host-trained. I would double check this as it seemed to me there was potential in the other. Also, I think you made a typo in your location call for the start site where you said it was 24553 instead of 34553. In regards to the synteny box I think maybe you`re just supposed to leave it blank if there is no synteny but maybe double check with the annotation guidelines. Other than that everything looks good! CDS 34774 - 35166 /gene="61" /product="gp61" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="Hum25_61" /note=Original Glimmer call @bp 34774 has strength 9.02; Genemark calls start at 34774 /note=SSC: 34774-35166 CP: yes SCS: both ST: SS BLAST-Start: [pyrophosphatase [Arthrobacter phage Faja] ],,NCBI, q1:s8 100.0% 2.96318E-43 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.016, -2.4787699911121788, yes F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[pyrophosphatase [Arthrobacter phage Faja] ],,YP_010656372,67.9688,2.96318E-43 SIF-HHPRED: a.204.1.2 (A:) automated matches {Mouse (Mus musculus) [TaxId: 10090]} | CLASS: All alpha proteins, FOLD: all-alpha NTP pyrophosphatases, SUPFAM: all-alpha NTP pyrophosphatases, FAM: MazG-like,,,SCOP_d6sqwa_,96.1538,99.2 SIF-Syn: Primary Annotator: Mulin Li This gene is usually upstream of ssDNA-binding proteins and downstream of DNA methylase and DnaB-like dsDNA helicase, similar to genome architecture of Dardnus, Catfish, and Banquo. /note=Primary Annotator Name: Li, Mulin /note=Auto-annotation: Both GeneMark and Glimmer called on the same start site @34774. 
/note=Coding Potential: Both self-trained and host-trained GeneMark models show high coding potential at the auto-annotated start site @34774 and across the whole gene. Only one lane in the forward gene direction shows coding potential, while none of the lanes in the reverse direction shows any coding potential. /note=SD (Final) Score: The auto-annotated start site @34774 has a z-score of 3.016 and RBS score of -2.479. The predicted start site is covered with high coding potential. /note=Gap/overlap: The start of this gene lies in an ATGA sequence that overlaps the previous gene's stop codon by 4 bp, which is typical of an operon. /note=Phamerator: This gene product belongs to the Pham 74274, which is conserved among 156 phages from a variety of clusters (GD, AE, CS, AE, EG, etc.) A majority of the phages in this Pham are manually annotated. The most commonly annotated function for this Pham is MazG-like nucleotide pyrophosphohydrolase. /note=Starterator: The Starterator was generated on 04/09/23. Hum25 does not have the most annotated start site, which is #52 found in 54 of 156 phage genomes in the pham and called by 48/125 manual annotations. The start site @34774 predicted by Glimmer and GeneMark is only found in Hum25. This start site is supported by high RBS score and z-score and its gap with regard to the previous gene fits the compact phage genome architecture. /note=Location call: This gene is a real gene in the forward direction. It starts @34774 and ends @35166. /note=Function call: MazG-like nucleotide pyrophosphohydrolase; Both PhagesDB BLAST and NCBI BLAST generated top hits with low e-values and high percent identities for MazG-like nucleotide pyrophosphohydrolase from phage RadFad, EvePickles, and Gorpy (e-value < 3e-40). CDD generated hits for nucleoside triphosphate Pyrophosphohydrolase (EC 3.6.1.8) MazG-like domain superfamily with e-value = 8.53e-25. 
HHpred found crystal structures for Nucleotide pyrophosphohydrolase MazG with high probability (>95%), score (>70), and e-value (<1e-9). This gene also shows synteny with many other phages. It’s usually upstream of ssDNA-binding proteins and downstream of DNA methylase and DnaB-like dsDNA helicase. Together, this evidence supports calling its function as MazG-like nucleotide pyrophosphohydrolase. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts a transmembrane domain for this protein product. It was predicted to be an intracellular protein, which fits its functional annotation of MazG-like nucleotide pyrophosphohydrolase. /note=Secondary Annotator Name: LE, VIVIAN /note=Secondary Annotator QC: After reviewing, I agree with the primary annotator's annotations/observations. CDS 35235 - 35735 /gene="62" /product="gp62" /function="SSB protein" /locus tag="Hum25_62" /note=Original Glimmer call @bp 35235 has strength 12.51; Genemark calls start at 35235 /note=SSC: 35235-35735 CP: yes SCS: both ST: SS BLAST-Start: [single-stranded DNA-binding protein [Paenarthrobacter sp. A20] ],,NCBI, q1:s1 100.0% 1.11549E-80 GAP: 68 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.973, -5.020155606601113, no F: SSB protein SIF-BLAST: ,,[single-stranded DNA-binding protein [Paenarthrobacter sp. A20] ],,WP_253468357,87.9747,1.11549E-80 SIF-HHPRED: Single-stranded DNA-binding protein; DNA binding, quaternary structure, plasticity, inhibitor development, DNA BINDING PROTEIN; HET: FMT; 1.92A {Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)},,,7F5Y_A,72.2892,100.0 SIF-Syn: Function was called as SSB protein. Upstream is a gene in pham 11583 (as of 4/5/23) and downstream is a gene in pham 14481 (as of 4/5/23), just like in cluster FB Arthrobacter globiformis phage Shoya. /note=Primary Annotator Name: Reyimjan, Diana /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 35235. 
This corresponds to the longest open reading frame with start codon ATG, which agrees with the host preference. /note=Coding Potential: The start site of 35235 covers all coding potential in both host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score is -5.020, which is not the lowest final score out of the possible start sites. The z-score of 1.973 is also not the highest of the possible start sites. /note=Gap/overlap: There is a gap of 68 bp, which is not long enough to insert a gene. This is also the smallest gap with the start site of 35235. /note=Phamerator: This gene, as of 4/5/23, is in pham 73472. There are 461 phages in this pham, with the majority of non-draft genomes being called as ssDNA-binding protein. The members of this pham belong to a wide range of clusters, and it appears that Cluster E is the most prevalent in this pham. /note=Starterator: The most manually annotated start site is 89, which was called in 183 out of 389 non-draft genes in the pham and is called 99.1% of the time when present. This corresponds to a start site at 35235. /note=Location call: Based on the evidence from auto-annotation, Starterator, and BLAST, this is a real gene and the start site of this gene is most likely 35235. Not only is it called by both Glimmer and GeneMark, but it covers all coding potential and has the smallest gap out of all the start sites. Although the final score and z-score are not the best, it is more convincing that the genes in the pham manually annotate this start site the most. Since the genes in this pham are from a wide range of clusters and this start site is still the most manually-annotated site, start site 89 must be highly conserved. Thus, it is very likely that the start site is 35235. /note=Function call: Based on hits from PhagesDB BLAST, NCBI BLAST, CDD, and HHPred, this is a gene for a ssDNA-binding protein. 
PhagesDB BLAST yielded hits from other Arthrobacter globiformis phages with very low e-values (<1e-60) that corresponded to ssDNA-binding protein, and NCBI BLAST yielded similarly robust results from Arthrobacter sp. phages. The CDD identified almost the entire length of the single-stranded DNA binding protein with a very low e-value of 1.59e-81. HHPred also had hits with high probabilities (>99%) that were for ssDNA-binding proteins (e-value < 1e-20). Therefore, this gene is for a ssDNA-binding protein. According to SEA-PHAGES guidelines, the function will be called as SSB protein. /note=Transmembrane domains: There are no transmembrane domains. DeepTMHMM predicts that the topology of this protein is mostly inside, which makes sense if the protein is a ssDNA-binding protein; the protein would have no reason to have a transmembrane domain since it only interacts with DNA inside the bacterial cell. /note=Secondary Annotator Name: Douglas, Katherine /note=Secondary Annotator QC: Really good work! I don’t have any other notes and I agree with all your calls. CDS 35736 - 36095 /gene="63" /product="gp63" /function="RusA-like resolvase" /locus tag="Hum25_63" /note=Original Glimmer call @bp 35775 has strength 6.76; Genemark calls start at 35736 /note=SSC: 35736-36095 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp. zg-Y20] ],,NCBI, q6:s3 94.958% 1.16904E-36 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.086, -5.534405718628961, no F: RusA-like resolvase SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. 
zg-Y20] ],,WP_227914501,65.5462,1.16904E-36 SIF-HHPRED: d.79.6.1 (A:) automated matches {Escherichia coli [TaxId: 562]} | CLASS: Alpha and beta proteins (a+b), FOLD: Bacillus chorismate mutase-like, SUPFAM: Holliday junction resolvase RusA, FAM: Holliday junction resolvase RusA,,,SCOP_d2h8ea_,98.3193,99.9 SIF-Syn: no synteny found /note=Primary Annotator Name: Kim, Cindy /note=Auto-annotation: Glimmer called a start site at 35775 and GeneMark called a start site at 35736. /note=Coding Potential: There is good coding potential in the forward strand in both the Host and Self GeneMark. The chosen start site includes all of the coding potential. /note=SD (Final) Score: The chosen start site at 35796 has the second best final score of 1.955 and the second best Z-score of 1.955. /note=Gap/overlap: Gap of 60 bp. While not ideal, this start site at 35796 has a more ideal final and Z-score. /note=Phamerator: Pham 73953 on 4/5/23. It is conserved in Bubbles123 (F1) and Chuckly (F1). PhagesDB indicates that the only non-draft phage in the pham has a function of RusA-like resolvase. /note=Starterator: Start site 9 was manually annotated in 1/1 non-draft genes in the pham. Start site 9 corresponds to 35796 in Hum25, which does not agree with Glimmer and GeneMark. /note=Location call: Considering all the evidence above, this gene is a real gene and has a start site at 35796 bp. /note=Function call: RusA-like resolvase. The top three PhagesDB BLAST hits have the function of RusA-like resolvase with low e-values from 1e-9 to 9e-9. There were also 2 significant HHpred hits for endodeoxyribonuclease RusA with low E-values (2.5e-16 to 1.7e-13), high probabilities of 99.7%, and high coverage (92-95%). /note=Transmembrane domains: DeepTMHMM did not predict any transmembrane regions, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kretschmer, Thomas /note=Secondary Annotator QC: Really good job. 
I think you should try to include more evidence for your chosen start site. Everything else is really detailed and you did a really good job. CDS complement (36191 - 36400) /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="Hum25_64" /note=Original Glimmer call @bp 36400 has strength 2.52; Genemark calls start at 36556 /note=SSC: 36400-36191 CP: no SCS: both-gl ST: NA BLAST-Start: GAP: 37 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.683, -3.318900340397143, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chawla, Esha /note=Auto-annotation: Glimmer Start Site: 36400, GeneMark Start Site: 36556. The auto-annotated called start sites are approximately 156 base pairs apart. The start codon is the same for both: ATG. /note=Coding Potential: There is a decent amount of coding potential contained in the putative ORF, in the 1st reverse reading frame in both GeneMark Host and Self. It is more likely the start site is at 36556, it better captures all of the coding potential. This gene is likely real as there is decently good coding potential throughout the whole gene. /note=SD (Final) Score: The best SD (Final) score is -3.319, which is the SD score associated with the Glimmer called start site at 36400, making this the better start site. The SD (Final) score for the GeneMark called start site at 36556 is -6.097, which is significantly worse than the SD (Final) score for the Glimmer called start site. /note=Gap/overlap: This gene is a reverse gene in the midst of many forward genes. There is a 96 base pair gap between the stop of this gene and the stop of the previous, forward gene. There is no coding potential in this 96 base pair gap, so unlikely there is gene in the gap or that the start needs to be moved further upstream. This gene is a relatively normal sized gene – it is 210 base pairs long. /note=Phamerator: On the day of my investigation, 4/6/2023, this gene was found in Pham 26990. 
/note=This Pham only has 1 member, Hum25, which is a draft. This phage is a singleton, which makes its analysis especially interesting, as there are currently no other phages that are in the same cluster. Hum25 is in cluster AX, and there is no called function for this gene. /note=Starterator: There is no data in the starterator, as this protein is an orpham, or phams with only one member in them. There is no starterator data generated, so there is nothing to compare it to. /note=Location call: This is likely a real gene, as it has decently good coding potential. The start site at 36400 is the more likely start site, as it has the lower SD (Final) Score and has less overlap with previous genes, compared to start site at 36556. Because this protein is an orpham, it is unknown if the gene is well-conserved, so further analysis is needed on this. /note=Function call: There is no conclusive evidence regarding function. NCBI Blast and PhagesDB BLAST did not show any genes with a significant degree of alignment. Most of the results generated in HHPRED were “Domain of unknown function.” There are no domain hits when analysis is performed using CDD. /note=Transmembrane domains: No transmembrane domains – unable to conclude if this is in-line with the function, as the current function of this gene is NKF. /note=Secondary Annotator Name: Vanderpool, Lauren /note=Secondary Annotator QC: Y. Looks good! Very detailed notes CDS 36438 - 36614 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="Hum25_65" /note=Original Glimmer call @bp 36438 has strength 4.09 /note=SSC: 36438-36614 CP: yes SCS: glimmer ST: NA BLAST-Start: GAP: 37 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.312, -4.038464273694425, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pisipati, Kirthana /note=Auto-annotation: Glimmer calls a start site of 36438, while there is no start site called by Genemark. 
The start codon is ATG, which has a relatively high probability. /note=Coding Potential: Both host-trained and self-trained GeneMark show decent coding potential throughout the putative ORF. However, start site 36438 does not cover all of the coding potential. /note=SD (Final) Score: The final score is -4.038; several other start sites have better final scores but would result in either an ORF that is too short or a significant overlap with the previous gene. /note=Gap/overlap: The gap is 37 bp, which is not ideal but not large enough to add a new gene. The ORF is 177 bp long. /note=Phamerator: This gene is in pham 22932 as of 4/9/23, which is an orpham. As such, there are no genomes that can be used to compare to Hum25. /note=Starterator: Since this gene is in an orpham, there are no genomes with which to compare start sites; Starterator info is not applicable. /note=Location call: This is a real gene, since there is decent coding potential throughout the ORF. While the auto-annotated start site (36438 bp) does not cover all of the coding potential and does not have the best final score, it is still likely to be the start site since all other potential start sites would either result in a very short ORF or overlap too much with the previous gene. /note=Function call: Neither of the BLASTp databases (PhagesDB and NCBI) had any hits. CDD also did not have any hits. HHpred had several hits, but the e-values were very high, so none of the results were useful. Thus, this gene has no known function. /note=Transmembrane domains: There are no transmembrane domains, which cannot be used to confirm the function, since there is no known function. /note=Secondary Annotator Name: Wang, Xinyi /note=Secondary Annotator QC: I agree with both location and function calls. While there is a gap between the previous gene and this one, it is not a gap larger than 50 bp. Since the gene is an orpham, it is reasonable that there isn't any reference for function call in the databases. 
CDS 36611 - 36898 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="Hum25_66" /note=Original Glimmer call @bp 36731 has strength 5.8; Genemark calls start at 36611 /note=SSC: 36611-36898 CP: yes SCS: both-gm ST: NA BLAST-Start: [MAG TPA: hypothetical protein [Caudoviricetes sp.]],,NCBI, q2:s9 40.0% 0.00124155 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.409, -5.916673162539987, no F: hypothetical protein SIF-BLAST: ,,[MAG TPA: hypothetical protein [Caudoviricetes sp.]],,DAS47138,23.7113,0.00124155 SIF-HHPRED: SIF-Syn: /note=Gene (stop@#36,898F) /note=PECAAN Notes /note=Primary Annotator #1: Hoang, Ryan /note=Auto-annotation: Glimmer states that the start site is at 36,731. GeneMark, on the other hand, calls the start site at 36,611. /note=Coding Potential: Most of the ORF has high coding potential on both GeneMark self and host, indicating that the gene is a real gene, on the 2nd ORF of the forward strand. Both start sites capture the majority of the coding potential; however, it does appear that the GeneMark start captures more coding potential. /note=SD (Final score): The final score of the GeneMark start is not the highest, at an SD score of -5.917. The other start sites have higher scores, of up to -4.698. The Glimmer call at 36731 comparatively has an SD score of -5.849. /note=Gap/overlap: Depending on the start site of the gene, there is either an overlap of 4 bp (gap of -4, at start 36611) or a gap of 116 bp (at start 36731). The gene downstream of this has a gap of 22 bp. /note=Phamerator: This is a singleton, and as a result, it has no other phages that it can compare with. The gene is in Pham 24044 as of April 12, 2023. /note=Starterator: There were no other results for the Starterator call, as there were no other phages in this pham. /note=Location call: I would call it at 36,611, simply because it captures more coding potential, and it has a smaller gap than the Glimmer start. 
However, this start site doesn’t have the highest RBS Final score or Z-score in comparison to the other start site as suggested by Glimmer. As there is no synteny due to it being a singleton, I cannot use synteny to help. /note=Function call: NCBI Blast did not come up with any conclusive results, and PhagesDB added onto that, also not finding any conclusive results, with the other results having high e-values and being genes of unknown function. There were additionally no CDD hits. For HHPred, there were also inconclusive hits, with very high E-values >20. /note=Transmembrane domains: There were no transmembrane domains, with DeepTMHMM finding that it was most likely on the interior. /note=Secondary Annotator Name: Martin, Kyle /note=Secondary Annotator QC: Good Job, I agree with your evidence and like how its very detailed. /note=QC - Fadi Al Banaa: changed start site to 36611 CDS 36895 - 37020 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="Hum25_67" /note=Original Glimmer call @bp 36895 has strength 2.71; Genemark calls start at 36895 /note=SSC: 36895-37020 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP357_gp60 [Arthrobacter phage Sarge] ],,NCBI, q2:s3 95.1219% 5.10999E-6 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.798, -5.03807843799195, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP357_gp60 [Arthrobacter phage Sarge] ],,YP_010649614,69.0476,5.10999E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lim, Madeleine /note=Auto-annotation: Glimmer and Genemark denote start of gene at 36895 /note=Coding Potential: High coding potential can be seen throughout 36900 to 37000 bp in the first reading frame; there is also high coding potential seen in the complementary sequence, as shown by the last reading frame. 
/note=SD (Final) Score: Best start site was seen at 36895 given Z-score (1.798), which is lower than the ideal threshold of 2 but is higher than the one other possible start site, and Final Score (-5.038). /note=Gap/overlap: The auto-annotated Start 36895 has a gap of -4, indicating the gene is potentially part of an operon if Start 36895 is the gene’s true start. /note=Phamerator: This gene is shared with Sarge (Gene 60) and Shoya (Gene 63) from Cluster FB; it is denoted as a “hypothetical protein” for both phages. /note=Starterator: This gene is shared with Sarge and Shoya from Cluster FB; it is denoted as a “hypothetical protein” for both phages. All three phages share the same start #2, which has two manual annotations. This corresponds to Start 36895 (auto-annotated) for Hum25. (4/9/23) /note=Location call: Because of consensus between auto-annotations, the overlap of 4 bp between this gene and the previous gene, and the presence of manual annotations for this sequence, Start 36895 is the most likely start for this gene. /note=Function call: NKF; both NCBI and PhagesDB BLAST (3/31/2023) had results for “hypothetical protein”s only. HHPred’s highest matches (3/31/2023) have high e-values (>5) but also has high coverage and high probability (~80-82). The top six hits all call for some form of glycosyltransferase, but the example for a phage with that protein, named Spud, is not available on PECAAN and does not have a similar sequence to Hum25`s. /note=Transmembrane domains: There are no hits from DeepTMHMM (4/11/2023), indicating that it is an internal protein. /note=Secondary Annotator Name: Reyimjan, Diana /note=Secondary Annotator QC: Agree with start site call. Not sure that the function call is accurate; ran HHPred analysis and the e-values are not very good. Need to consult instructor. Also select option under GM coding capacity and Starterator! 
CDS 37030 - 37182 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="Hum25_68" /note= /note=SSC: 37030-37182 CP: yes SCS: neither ST: SS BLAST-Start: GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.544, -4.297631822681255, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Martin, Kyle /note=Auto-annotation: Neither GeneMark nor Glimmer auto-annotated this gene. /note=Coding Potential: Coding potential is seen in both the forward and reverse directions for this gene and is similar in both the Host-Trained and Self-Trained GeneMark. /note=SD (Final) Score: This score is -4.298, which is the best of all candidates. /note=Gap/overlap: The gap is 9 bp, a small gap between this gene and the upstream gene. The length is also reasonable. /note=Phamerator: Since it was not auto-annotated, we do not have a Pham number yet. It was run on 4/10/23. /note=Starterator: We do not have a Starterator report since this gene was not auto-annotated. It is an orpham, so there is no Starterator report to check. /note=Location call: This is a real gene with a likely start site of 37030. /note=Function call: NKF. For the function, it says it is an unknown function. The CDD dataset had no data for me to look at. /note=Transmembrane domains: After running DeepTMHMM, we found zero transmembrane domains. 
/note=Secondary Annotator Name: Vu, Thomas /note=Secondary Annotator QC: Y: Everything looks good and I agree with your conclusion, just make sure to fill out the "Pham" box CDS 37145 - 37354 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="Hum25_69" /note=Original Glimmer call @bp 37145 has strength 1.4; Genemark calls start at 37136 /note=SSC: 37145-37354 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: -38 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.266, -2.606079664606164, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Martin, Kyle /note=Auto-annotation: GeneMark and Glimmer both auto-annotated this gene. Glimmer has a start site at 37145 and GeneMark has a start site at 37136. The Glimmer start is chosen here based on its Final Score and gap. /note=Coding Potential: Coding potential is seen in both the forward and reverse directions for this gene and is similar in both the Host-Trained and Self-Trained GeneMark. /note=SD (Final) Score: This score is -2.606, which is the best of all candidates. /note=Gap/overlap: The gap is 124, which indicates a gap between this gene and another gene. The length is also reasonable. /note=Phamerator: The Pham number is 17985 and was run on 3/31/23. /note=Starterator: The start site from the Pham Starterator and PhagesDB is 17985. It is an orpham, so there is no Starterator report to check. /note=Location call: This is a real gene with a likely start site of 37145. /note=Function call: NKF. For the function, it says it is an unknown function. The CDD dataset had no data for me to look at. /note=Transmembrane domains: After running DeepTMHMM, we found zero transmembrane domains. /note=Secondary Annotator Name: Hoang, Ryan /note=Secondary Annotator QC: Careful: it looks like Glimmer and GeneMark actually have different start sites. 
I`d love to see a little bit of discussion on why you decided to go for Glimmer`s start (which I agree with). For the gap- there is a gap between the downstream gene, yes, however, there are overlaps that is a bit high at -38bp overlap between the upstream gene and this gene. CDS 37354 - 37650 /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="Hum25_70" /note=Genemark calls start at 37354 /note=SSC: 37354-37650 CP: yes SCS: genemark ST: NA BLAST-Start: [hypothetical protein [Arthrobacter sp. StoSoilB13] ],,NCBI, q19:s11 79.5918% 9.20125E-9 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.527, -3.586741874491512, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. StoSoilB13] ],,WP_224089731,48.3871,9.20125E-9 SIF-HHPRED: SIF-Syn: orpham /note=Primary Annotator Name: Pan, Crystal /note=Auto-annotation: Glimmer does not have a start site. GeneMark says there is a gene, and the start site is called at 37354. /note=Coding Potential: Host-trained GeneMark shows good coding potential except for a big dip at 38420, and near the stop site where there is less coding potential. Self-trained GeneMark shows the same good coding potential. The dips in the coding potential still show that there is coding potential throughout the start site. Coding potential is for the most part only in the forward strand. Where there is a dip at approximately 38420, there is coding potential in the reverse strand (hypothesized that because it`s part of an orpham, there may be ribosomal slippage?? seems to be normal for there to be some noise). There is likely a forward gene here, and the start site 37354 covers the entire coding potential. /note=SD (Final) Score: -3.567. This is the best RBS score, and covers all of the coding potential. The z-score is 2.527 which is higher than 2, so it is good (more than 2 standard deviations away). /note=Gap/overlap: -1 bp. This is a favorable gap, as it is a 1 bp overlap. 
This indicates that it is likely part of an operon. /note=Phamerator: 16925. Date 2023-04-05. This gene is not conserved in any other bacteriophages. The only phage this gene shows up in is Hum25, which is a singleton. Thus, this is an orpham. /note=Starterator: The Starterator indicates that it is pham 16925. The Starterator does not yield any usable information, likely because this gene is an orpham. /note=Location call: Evidence supports that this gene is real and the start site is 37354 (ATG). /note=Function call: NKF. HHPred says phage shock protein B, but the e-value is very bad, and the coverage is very small (19%, which is lower than the 35% threshold). CDD does not show any hits and PhagesDB function frequency also does not show any hits. Because it is also an orpham, it makes sense to conclude there is no known function. /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains. It says that it is an 'outside' protein. /note=Secondary Annotator Name: Le, Vivian /note=Secondary Annotator QC: After reviewing, I agree with the primary annotator's annotations/observations. CDS 37651 - 37791 /gene="71" /product="gp71" /function="hypothetical protein" /locus tag="Hum25_71" /note=Original Glimmer call @bp 37651 has strength 6.31; Genemark calls start at 37651 /note=SSC: 37651-37791 CP: yes SCS: both ST: NI BLAST-Start: GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.449, -3.670670413150579, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: n/a /note=Primary Annotator Name: Vajragiri, Shreya /note=Auto-annotation: Both Glimmer and GeneMark agree that there is a gene and the start site is 37651. /note=Coding Potential: Both self-trained and host-trained GeneMark show coding potential on the forward strand for this gene; there is a small gap (~20 bp) at the beginning of the predicted start site where there isn’t coding potential, but this is reasonable. /note=SD (Final) Score: -3.671. 
It’s the best score, and the only reasonable option as the other possible start sites created genes that were too small (<50bp), missing a lot of coding potential. /note=Gap/overlap: 0, this is a highly likely start site, as there is no gap or overlap. /note=Phamerator: 16396. Date: 03/24/23. The gene is conserved; it is found additionally in Anekin. However, there are only 2 members, and all are drafts. Therefore there is no good comparison for Hum25. /note=Starterator: There are no manual annotations for any of the genes in the pham. The auto-annotated start site for Anekin is different from Hum25, but each start site is not present in the other phage (it is unique). /note=Location call: Evidence supports that the gene is real, and that the start site is 37651 based on coding potential and final score. /note=Function call: unknown function: there are no blast hits (within PhagesDB and on NCBI, for both nucleotide and protein sequences). There are no CDD hits. Any HHPred hits are insignificant, with too high e-values. /note=Transmembrane domains: No similar hits, but the protein is predicted to be an inside protein (globular). 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 37788 - 38012 /gene="72" /product="gp72" /function="hypothetical protein" /locus tag="Hum25_72" /note=Original Glimmer call @bp 37788 has strength 8.39; Genemark calls start at 37788 /note=SSC: 37788-38012 CP: yes SCS: both ST: SS BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.201, -4.331668843916977, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: No synteny found /note=Primary Annotator Name: Zhu, Yichen /note=Auto-annotation: Gene (stop@38012 F) /note=Coding Potential: high /note=SD (Final) Score: -4.332 /note=Gap/overlap: -4 /note=Phamerator: 25224 (1 member) /note=Starterator: NA /note=Location call: 37788 /note=Function call: No known function as all hits are hypothetical proteins /note=Transmembrane domains: 0 /note= /note=Secondary Annotator Name: Barden, Sophia /note=Secondary Annotator QC: Good identification for each section, but an explanation of the significance of all these results is needed! I agree with start site 37788 based on the coding potential that supports this site. Explain the significance of the SD score and gap. For Phamerator, explain that this gene in pham 25224 is an orpham. Explain why Starterator is N/A. What is your location call? Denote the start and stop sites. Explain why you didn’t select any hits even if NKF (Low coverage? Bad score?) 
CDS 38012 - 38176 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="Hum25_73" /note=Original Glimmer call @bp 38012 has strength 10.47; Genemark calls start at 38012 /note=SSC: 38012-38176 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_BAUER_54 [Arthrobacter phage Bauer]],,NCBI, q3:s11 94.4444% 1.7605E-4 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.28, -4.165057350296871, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BAUER_54 [Arthrobacter phage Bauer]],,UYM26603,63.4921,1.7605E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Grace /note=Auto-annotation: Glimmer and GeneMark both start at 38012. /note=Coding Potential: In the host-trained GeneMark, there is coding potential in the forward direction. In the self-trained coding potential map, we also found almost full coverage from the designated start site and stop site in the reverse direction, indicating a high coding potential. Both host-trained and self-trained maps agree that this gene is a forward strand. /note=SD (Final) Score: The final score is -4.165, the z-score is 2.28, and this start site is the LORF. Overall, the RBS score for the start site is optimistic (final score close to 0, and z-score close to 2 compared to other start sites). /note=Gap/overlap: overlap of 1 bp, this is within the reasonable range. /note=Phamerator: This gene is in Pham 9433. There are in total 2 members in this pham, there is cluster FN in this pham. However, Hum25_66 is shown as a singleton in this Pham. /note=Starterator: Ran on 04/09/2023. This gene is in Pham 24160. There are 1/4 genes in this pham as draft genes. The most often called start is 1 (3/3 non-draft gene), but Hum25_66 was called at start 3. Start 3 for Hum25_66 is at the same location where there is a gap for the other three nondraft genes. This evidence further suggests this gene as a singleton, and this gene is mostly likely to be an orpham. 
/note=Location call: The start codon is ATG, at start site 38012. This start codon is one of the most common start codons. The RBS and common start codon support the start at 38012 and stop at 38176. /note=Function call: In PhageDB BLASTp, the top hits with low e-values suggest an unknown function. For NCBI BLASTp, the top hits do not have high alignment scores (~40-50), and the functions were hypothetical proteins. In HHPred, the top hit shows an unknown function. The e-values for all the hits are extremely high, and the probability scores and coverages are low. The other suggested functions are not within the list of known functions by SEA PHAGES. The most likely function for this gene is NKF. /note=Transmembrane domains:TMHMM show no transmembrane domains, there are only outside and inside domains. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 38173 - 38556 /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="Hum25_74" /note=Original Glimmer call @bp 38173 has strength 3.63; Genemark calls start at 38173 /note=SSC: 38173-38556 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP635_gp44 [Arthrobacter phage Auxilium] ],,NCBI, q27:s86 77.1654% 1.25504E-57 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.563, -3.5719506502182776, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP635_gp44 [Arthrobacter phage Auxilium] ],,YP_010655863,50.0,1.25504E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nguyen, Mya /note=Auto-annotation: Both Glimmer and GeneMark autoannotated this gene. They both agree on the start at 38173. The start codon called is GTG. /note= /note=Coding Potential: Coding potential is mostly in the forward direction, suggesting that this is a forward gene. Coding potential is found in both Genemark self and host. /note= /note=SD (Final) Score: The final score is -3.572, which is the best score out of the gene candidates. 
The z-score is 2.563, which is a good value since it’s over 2. /note= /note=Gap/overlap: The gap is -4, which is a bit concerning, however, this start site gives the longest ORF of 384. /note= /note=Phamerator: Date: 04/10/23. The pham number is 24160 and there are two members of this pham, this gene and another draft gene that is part of cluster AY. /note= /note=Starterator: This pham only has the two members. The start site of this gene is not the same as the start site of the other gene in this pham. There are no manual annotations and both genes are draft genes. This gene calls start 4, corresponding to start @ 38173. /note= /note=Location call: Given that Genemark and glimmer autoannotated this start site to be 38173 and Starterator calls the same start site, this is a gene at start 38173. Furthermore, this start site has the best SD score and z-score out of all the candidate start sites. /note= /note=Function call: Phagesdb function frequency only has one entry that is major tail protein. All Phagesdb BLAST entries have entries that all have functions labeled “function unknown”, with low e-values from e-51 to e-9. HHPRED shows varying function calls, but with low probability, low coverage, and high e-values, signifying weak evidence and weak hits. There are no CDD hits. From this evidence, there is no known function. /note= /note=Transmembrane domains: There are no predicted TMDs. This is not a transmembrane protein. 
/note= /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 38553 - 38753 /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="Hum25_75" /note=Original Glimmer call @bp 38568 has strength 14.2; Genemark calls start at 38568 /note=SSC: 38553-38753 CP: yes SCS: both-cs ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.399, -3.8540125308361968, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Robles, Angel /note=Auto-annotation: Glimmer calls the start at 38568.GeneMark calls the start at 38568. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host /note=SD (Final) Score: -3.056. It has a high Final Score which indicates it has a good sequence match /note=Gap/overlap: Gap of 11. This is small and reasonable /note=Phamerator: pham: 27211. Date 04/09/2023. It is only found in Hum25, thus is an orpham /note=Starterator: Report not found /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 38568 bp. /note=Function call: Function unknown. No NCBI BLAST hits. HHpred had hits with high E-values greater than 10. CDD had no hits /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Vajragiri, Shreya /note=Secondary Annotator QC: Y: Great job on the annotations! I agree with all of your calls. However, I would adjust a few things. One, I don`t think you have to include the PhagesDB Blast result Hum25_draft, because that is the same phage so it`s definitely going to be the same, so you don`t have to talk about that. Two, for TMHMM, I don`t think a membrane protein necessarily has to have a TMD. The software will tell you whether its an inside or outside protein so maybe I would double check that! 
Also, maybe put N/A or something similar in the synteny box? CDS 38750 - 39133 /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="Hum25_76" /note=Original Glimmer call @bp 38750 has strength 4.14 /note=SSC: 38750-39133 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein PP635_gp62 [Arthrobacter phage Auxilium] ],,NCBI, q4:s3 85.0394% 6.70283E-31 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.025, -4.641378341780927, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP635_gp62 [Arthrobacter phage Auxilium] ],,YP_010655881,68.2243,6.70283E-31 SIF-HHPRED: SIF-Syn: Synteny results are not relevant to this region given all neighboring proteins are all no known function. /note=Primary Annotator Name: Hamid, Bilal /note=Auto-annotation: Glimmer gives a start site of 38750; no GeneMark start site is given. /note=Coding Potential: Some coding potential is indicated by the host-trained GeneMark, though much less than in the regions just before and after. Much stronger coding potential is indicated by the phage-trained GeneMark. /note=SD (Final) Score: -4.641 at start site @38750. This is not the best final score, but the only site with a better final score has a gap of -46, which is too much overlap for individual genes. /note=Gap/overlap: A gap of -4 would make the most sense if this gene is part of a polycistronic operon, as this is a sufficient but not excessive overlap. /note=Phamerator: 04/10/23 - pham 9004 with 6 members, two of which are drafts. All members are Arthrobacter phages represented in clusters AY, AP, and Fl. No functions are given for the proteins in this pham. /note=Starterator: Auto-annotation gives start site 8 as the most likely site. This site was manually annotated 100% (2/2) of the times it appears in this pham in non-draft genes. It is the auto-annotated site in the two draft genes it appears in. /note=Location call: Start @38750 is the most likely start site based on Phamerator and Starterator data.
/note=Function call: Phagesdb BLASTp indicates low e-values with unknown function genes Vibaki_56 (e-value of 2e-30) and Auxilium_62 (e-value of 9e-30). NCBI BLASTp corroborates these hits with high similarity to hypothetical proteins PP635_gp62 from Arthrobacter phage Auxilium and HYP95_gp56 from Arthrobacter phage Vibaki, with e-values 7e-31 and 2e-30 respectively. CDD data does not show any hits for known protein domains. HHpred gives 1 hit with a >100 e-value for a protein of unknown function, thus no strong hits exist to compare function against. The final function call is no known function. /note=Transmembrane domains: DeepTMHMM calls no transmembrane domains; given this protein has no known function, this doesn’t alter any function call. /note=Secondary Annotator Name: Pisipati, Kirthana /note=Secondary Annotator QC: Great job with the notes - there`s just a few things you might want to add. In the auto-annotation section, make sure to mention what the start codon is and its relative probability. Also mention if the auto-annotated start site covers all of the coding potential for the putative ORF. In the gap/overlap section, include the length of the ORF so we know it`s reasonable. There could also be a little more evidence in your location call since Starterator is not sufficient (could be human error); maybe mention the final score or coding potential. Also, be careful with checking DUF proteins as evidence for HHpred! CDS 39130 - 39198 /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="Hum25_77" /note=Genemark calls start at 39064 /note=SSC: 39130-39198 CP: no SCS: genemark-cs ST: NI BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.305, -4.033958934919761, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Glania - Changed start site to 39130. Though GeneMark calls 39064, the gap of -70 is quite large and has no coding potential. Start site 39130 only has an overlap of -4, which is evidence for an operon.
/note= /note=Primary Annotator Name: Tran, Krysten /note=Auto-annotation: GeneMark only; calls start site (39064); start codons called - ATG and GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF and the start site does cover all the coding potential. /note=SD (Final) Score: -2.523 is the most favorable score on PECAAN. The start codon is ATG. /note=Gap/overlap: There is an overlap of 70 bp; Slightly large but the overlapped area has no coding potential predicted; The other reasonable start site has an overlap of 4 (could be part of an operon) but the length is only 69 bp which is too short for a gene /note=Phamerator: This gene is in pham 15311 as of 04/09/23; Hum25 has start site 6 /note=Starterator: The start site number 6 was called in 1 of the 1 non-draft genes in the pham; There is only one other phage that has a gene in this pham; Both the non-draft phage and Hum25 call start site number 2 and both have 1 manual annotation for this start site number /note=Location call: Based on all the evidence gathered, the start site for this gene is likely at 39064. /note=Function call: No significant hits from PhagesDB BLASTp (3 hits total and all had high e-values). No hits from NCBI BLASTp. No HHpred hits were strong because they had low probabilities of >60 and high e-values of <15. The highest percent coverage is ~80%, however it has a low probability of ~52 and high e-value of 14. No CDD hits. Based on the lack of evidence gathered, I cannot hypothesize the function of the gene. /note=Transmembrane domains: No TMDs for this ORF but it is unclear if this is logical or not since the proposed function is NKF. /note=Secondary Annotator Name: Martin, Kyle /note=Secondary Annotator QC: I agree with your conclusion and based on your evidence everything looks good. 
CDS 39195 - 39458 /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="Hum25_78" /note=Original Glimmer call @bp 39195 has strength 3.2; Genemark calls start at 39195 /note=SSC: 39195-39458 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Arthrobacter sp. StoSoilB13]],,NCBI, q9:s6 54.023% 8.35373E-6 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.612, -3.389881189012514, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. StoSoilB13]],,WP_224089769,34.7368,8.35373E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Critzer, Nicole /note=Auto-annotation:Glimmer and Genemark agree giving the auto-annotated start 39195 which has the common start codon ATG. /note=Coding Potential: The coding potential is strong and within the auto-annotated start and stop site. Should note that potential stops about 20 bp before the stop and does not start until about 20 bp after the start. /note=SD (Final) Score: -3.390, this is the least negative final score and therefore supports the auto-annotated start as the best start site since it indicates the best alignment. /note=Gap/overlap:-4, this overlap could be indicative of the gene being part of an operon so the overlap is considered reasonable. /note=Phamerator: Pham 25349 on 04/07/23 and 04/10/23, the gene is an orpham so this is not very useful information /note=Starterator: N/A, since gene is an orpham there was no starterator data to analyze /note=Location call: 39195, both Glimmer and GeneMark agreed on this auto-annotated start and the start contains all the coding potential. Additionally this start has the only z-score greater than 2 (2.612), has the least negative final score so best alignment, and produces the LORF. /note=Function call: NKF - the majority of hits in PhageDB BLAST were for genes of unknown function; the first hit with a function (terminase large subunit) had a high e-value and no other data to support the function assignment. 
NCBI BLASTp produced only two hits both of which were for hypothetical proteins and in HHpred all hits had e-values greater than 20. There were also no hits in CDD. So there is not enough evidence to support a function assignment. /note=Transmembrane domains:No hits, protein is inside the membrane, so this is not a transmembrane protein. /note=Secondary Annotator Name: Pisipati, Kirthana /note=Secondary Annotator QC: Great job with the notes! make sure you include the date that phamerator was accessed, but otherwise everything looks good CDS 39455 - 40183 /gene="79" /product="gp79" /function="helix-turn-helix DNA binding domain" /locus tag="Hum25_79" /note=Original Glimmer call @bp 39455 has strength 5.46; Genemark calls start at 39455 /note=SSC: 39455-40183 CP: no SCS: both ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Sonali] ],,NCBI, q2:s3 99.1736% 2.46443E-18 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.772, -5.154811153158988, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Sonali] ],,YP_009819753,40.0722,2.46443E-18 SIF-HHPRED: a.6.1.5 (B:1-69) automated matches {Escherichia phage [TaxId: 10710]} | CLASS: All alpha proteins, FOLD: Putative DNA-binding domain, SUPFAM: Putative DNA-binding domain, FAM: Terminase gpNU1 subunit domain,,,SCOP_d6hn7b1,22.314,97.9 SIF-Syn: There does not appear to be any synteny with other phages with this gene and the upstream/downstream genes. /note=Primary Annotator Name: Deal, Milena /note=Auto-annotation: Both Glimmer and GeneMark agree on a start site of 39455. This start has a start codon of ATG, which is a more common start codon. /note=Coding Potential: On host-trained and self-trained GeneMark, there appears to be coding potential throughout the whole gene. 
/note=SD (Final) Score: The final score for the auto-annotated start site is not great, but is relatively good compared to other start sites for this gene. The final score is not important though if this gene is part of an operon. The Z-score is not great either, but is better than some other start sites too. /note=Gap/overlap: There is a 4 base pair overlap with this gene and the upstream gene when using the auto-annotated start site, which suggests an operon. This is not the longest ORF, but the LORF overlaps with the upstream gene a substantial amount. /note=Phamerator: As of 4/5/23, this gene belongs to pham 74182. Many genes in the pham have the function of helix-turn-helix DNA binding domain protein. There are a variety of different clusters that have phages with this genes in this pham, including N, F, C, D, and singletons. There are 258 members in this pham and 21 are drafts. /note=Starterator: As of 4/10/23, the start number called the most often in the published annotations is 135 and was called in 134 of the 237 non-draft genes in the pham. The most called start site is not present in this gene, and the auto-annotated start site was (145, 39455), and this start site is present in no other genes in this pham. Therefore, starterator is not that informative in this case. /note=Location call: This is a real gene with a start site of 39455. This start site has the best final score and one of the best Z-scores. Both Glimmer and GeneMark use this start site in the autoannotation as well. Finally, this start site has an overlap with the upstream gene of 4 base pairs, which is strong evidence that this gene is part of an operon and that this is the real start site. /note=Function call: CDD found a domain of unknown function with an E-value of 7.25E-3; members of this family have a beta-sheet region followed by an alpha-helix and an unstructured C-terminus. 
There are hits on HHpred with low E-values and high probability and coverage that are DNA binding proteins, helix-turn-helix domains, and alpha proteins. Therefore, this protein has the function of helix-turn-helix DNA binding domain. /note=Transmembrane domains: DeepTMHMM predicted no transmembrane domains. This makes sense for a DNA binding protein because I would not expect a protein to be a transmembrane protein and bind DNA. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 40180 - 40368 /gene="80" /product="gp80" /function="hypothetical protein" /locus tag="Hum25_80" /note=Original Glimmer call @bp 40180 has strength 7.63; Genemark calls start at 40180 /note=SSC: 40180-40368 CP: no SCS: both ST: NI BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.845, -2.979545903895001, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kretschmer, Thomas /note=Auto-annotation: Glimmer and Genemark agree that the start site is likely 40180. This has a z-score of 2.845 and is the start site with the longest open reading frame. /note=Coding Potential: There is solid coding potential in the forward direction in both host and self trained. The predicted start site contains the entire coding potential, and there is no contradictory coding potential in the reverse direction. /note=SD (Final) Score: -2.980 /note=Gap/overlap: -4 /note=Phamerator: There is one other phage with a gene in this pham and it is also a draft. /note=Starterator: Start 1 was also called in the auto-annotation of the other phage with a gene in this pham. There are 0 manual annotations. /note=Location call: Due to the above evidence, this is likely a real gene that starts at 40180. /note=Function call: NKF. There is very little evidence about the function of this gene. /note=Transmembrane domains: This gene is likely outside. There are 0 predicted TMDs.
/note=Secondary Annotator Name: Robles, Angel /note=Secondary Annotator QC: CDS complement (40484 - 40726) /gene="81" /product="gp81" /function="hypothetical protein" /locus tag="Hum25_81" /note= /note=SSC: 40726-40484 CP: no SCS: neither ST: NA BLAST-Start: GAP: 68 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.115, -4.434549782840516, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pan, Crystal /note=Auto-annotation: Glimmer and Genemark both don`t call a start site -- the computer did not do a good job at predicting all the genes. /note=Coding Potential: There is some coding potential in the fourth ORF, though not very strong. /note=SD (Final) Score: -4.435. This is the least negative score and the z-score is >2, so this is the most likely start site. With the lack of other data, this is the strongest evidence we have that this is a gene. /note=Gap/overlap: 68 bp. This is above the recommended 50bp gap. However, since this is the only gene that is reverse amongst a group of forward genes, it is possible this gene is transcribed in the opposite direction such that there is space between them for transcription promoters in both directions. /note=Phamerator: There is no information which pham this is. 23-04-11. /note=Starterator: The starterator does not have any information about this gene. /note=Location call: Though there is very little evidence, the evidence does support that this could possibly be a real gene with the start site 40726 (GTG). /note=Functional call: NKF. All the blast hits were unknown function, but all were very bad e-values. There is not enough information for us to determine what the function of this gene is. /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains. It says that it is an `outside` protein. 
CDS 40795 - 41019 /gene="82" /product="gp82" /function="hypothetical protein" /locus tag="Hum25_82" /note=Original Glimmer call @bp 40795 has strength 5.06; Genemark calls start at 40795 /note=SSC: 40795-41019 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH55_gp58 [Arthrobacter phage Bennie] ],,NCBI, q1:s1 74.3243% 5.96547E-15 GAP: 68 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.547, -3.5445441946970573, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH55_gp58 [Arthrobacter phage Bennie] ],,YP_009602493,52.6316,5.96547E-15 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gowdy, Griffin /note=Auto-annotation: Called by both Glimmer and GeneMark at position 40795, ATG. /note=Coding Potential: This region, fully encapsulated by the called start, has strong coding potential. /note=SD (Final) Score: Both the final score and Z-score are the best, and are reasonable for a real gene. /note=Gap/overlap: There is a large gap of 426, however, this region lacks coding potential. Further, this is the best start and the final length of 225 is reasonable. /note=Phamerator: As of 4/5/23, pham 5639.This pham is found in only cluster AK and Hum25. No members of this pham call a function. /note=Starterator: Start 2 is found in all eight published genomes containing this pham, and called 100% of the time. Start 2 is at position 40795 in Hum25. /note=Location call: Given all available evidence, despite the gap, this is the best start location for stop at 41019. Conservation within the pham and strong scores support this conclusion. /note=Function call: NKF - No significant hits in HHpred or CD search. Numerous significant hits to cluster AK phages in phagesDB BLASTp (e-value 4e-13), and three hits to hypothetical proteins in Arthrobacter phages (e-values from 5.97e-15 to 1.99e-14). /note=Transmembrane domains: No transmembrane domains predicted by DeepTMHMM. 
/note= /note=Secondary Annotator Name: Robles, Angel /note=Secondary Annotator QC: CDS 41016 - 41543 /gene="83" /product="gp83" /function="protease" /locus tag="Hum25_83" /note=Original Glimmer call @bp 41016 has strength 8.84; Genemark calls start at 41016 /note=SSC: 41016-41543 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Arthrobacter koreensis] ],,NCBI, q1:s1 96.0% 2.8267E-43 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.272, -6.223337771675792, no F: protease SIF-BLAST: ,,[hypothetical protein [Arthrobacter koreensis] ],,WP_263127499,64.0719,2.8267E-43 SIF-HHPRED: d.153.1.4 (A:) HslV (ClpQ) protease {Thermotoga maritima [TaxId: 2336]} | CLASS: Alpha and beta proteins (a+b), FOLD: Ntn hydrolase-like, SUPFAM: N-terminal nucleophile aminohydrolases (Ntn hydrolases), FAM: Proteasome subunits,,,SCOP_d1m4ya_,95.4286,99.9 SIF-Syn: /note=Primary Annotator Name: Hughes, Audia /note=Auto-annotation: Glimmer and Genemark call start. Both call the same start: 41016 /note=Coding Potential: There is coding potential within ORF. Is present on Host-Trained and Self gene marks. /note=SD (Final) Score: -6.223. Best Final score on PECAAN. /note=Gap/overlap:-4 suggests operon. The start site with the best gap/overlap is the same start site with the longest ORF and best Final score. /note=Phamerator: Pham: 8123. Date 4/5/23. Found in Five members of cluster GB and 2 singletons. /note=Starterator: The most commonly called start was start site 1. For Hum25 start site 1 was 41016. /note=Location call: Start: Based on evidence is a real gene and starts at 41016 /note=Function call: Protease. Phagesdb: 3 Highest hits with function calls, calls protease. Including phage Pumpernickel, Leeroy Jenkins, and WaterT. HHPred: 10+ protease/proteasome hits, many with strong e-values (5.6e-20, 7.6e-20,7.3e-20). 
CDD: 1 hit for a protease with slightly significant e-values (0.0000714539) /note=Transmembrane domains: No TMDs, not a transmembrane protein /note=Secondary Annotator Name: Critzer, Nicole /note=Secondary Annotator QC: I agree with the start site call and the function call seems to have a lot of evidence to back it up. Also you say there is evidence to support the location call but don`t specify what it is, are you using the fact that Glimmer and GeneMark agree and that starterator says its the `most annotated` and has 3 MAs. For the function call be more specific which hits correspond to a protease function I think you can cite the strong hits observed in PhagesDB and NCBI BLAST as well as one of the HHpred hits. Additionally for the phamerator section I would mention which five members by name or just name two so people can check the evidence you are trying to cite as support. CDS 41552 - 41713 /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="Hum25_84" /note=Original Glimmer call @bp 41552 has strength 4.08; Genemark calls start at 41552 /note=SSC: 41552-41713 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP637_gp88 [Arthrobacter phage Persistence] ],,NCBI, q1:s1 100.0% 1.3036E-18 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.783, -4.0178693334748665, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP637_gp88 [Arthrobacter phage Persistence] ],,YP_010656089,79.2453,1.3036E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Estampa, Julia /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree that the start site is at 41552 bp. The start codon for both is ATG. Host-Trained and Self-Trained GeneMark both reflect relatively high coding potential that is consistent with the ORF. The chosen start site covers all the coding potential. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. 
Host-Trained and Self-Trained GeneMark both reflect similar high coding potential that is consistent with the ORF in the forward direction. The chosen start site covers approximately all the coding potential. However, there is a bit of coding potential in the reverse direction under Self-Trained GeneMark. /note=SD (Final) Score: The SD score is -4.018, which is the best on the list, and this start site includes the LORF. /note=Gap/overlap: The gap is 8 bp long upstream from the gene, and is reasonable. The length of the gene is acceptable (162 bp). /note=Phamerator: Pham: 4758. Date found: 04/03/23. The majority of the phages in this pham of similar gene length are in cluster AY, such as phage EvePickles_94, phage Faja_94, and phage Gorpy_91. /note=Starterator: Start number 15 was called the most often in the published annotations, and it was called in 9 of the 12 non-draft genes in the pham. Hum25_76 called this the “most annotated” start. Start number 15 is found in 15 of 19 (78.9%) of genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Gathered evidence suggests this is a real gene with good coding potential and that the strongest candidate for the start site is 41552 bp as supported by Glimmer and GeneMark. /note=Function call: NKF. The majority of phagesDB BLAST hits revealed “function unknown” with the top hit having an E-value of 3e-16. NCBI BLAST hits revealed similar results suggesting NKF. No CDD hits returned. HHPred did not have any significant hits and the top hit suggests unknown function with an E-value of 0.82 and a probability of 88.77%. /note=Transmembrane domains: Since there are no predicted TMHs or TMDs returned from PECAAN and DeepTMHMM, it is not a membrane protein. /note=Secondary Annotator Name: Kim, Cindy /note=Secondary Annotator QC: I agree with the annotation above. 
CDS 41713 - 41844 /gene="85" /product="gp85" /function="hypothetical protein" /locus tag="Hum25_85" /note= /note=SSC: 41713-41844 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_GORPY_90 [Arthrobacter phage Gorpy] ],,NCBI, q1:s3 100.0% 2.13026E-16 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.285, -4.154841807616074, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GORPY_90 [Arthrobacter phage Gorpy] ],,UVF61052,77.7778,2.13026E-16 SIF-HHPRED: SIF-Syn: n/a /note=Primary Annotator Name: Vajragiri, Shreya /note=Auto-annotation: There is no auto-annotated start site for either Glimmer or GeneMark, because this gene was manually added. /note=Coding Potential: Both self-trained and host-trained GeneMark show weak coding potential for this gene in both forward and reverse strands. /note=SD (Final) Score: -4.155. It’s the best score. /note=Gap/overlap: -1. This is a very small overlap, and is very reasonable (it could be an operon). /note=Phamerator: There is no Pham. /note=Starterator: There is no Pham. /note=Location call: Evidence supports that the gene is real because there is weak coding potential, and the gene is very similar to other genes in PhagesDB blast. The start site is 41713 based on coding potential and final score. /note=Function call: NKF. There are BLAST hits (within PhagesDB and on NCBI, for both nucleotide and protein sequences) with low e-values, such as in Gorpy or Sakai, but these hits are all also of no known function. There are no CDD hits. Any HHPred hits are insignificant, with too high e-values. /note=Transmembrane domains: No similar hits, but the protein is predicted to be an outside protein. /note=Secondary Annotator Name: Sacristan, Ariana /note=Secondary Annotator QC: Notes were very concise. Reported on all the available information and made NKF call very clear. 
CDS 41844 - 42065 /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="Hum25_86" /note= /note=SSC: 41844-42065 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_ATUIN_233 [Arthrobacter phage Atuin]],,NCBI, q3:s8 84.9315% 1.09554E-22 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.161, -4.927720537609461, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ATUIN_233 [Arthrobacter phage Atuin]],,UDL16805,55.4348,1.09554E-22 SIF-HHPRED: SIF-Syn: NA /note=Primary Annotator Name: Zhu, Yichen /note=Auto-annotation: Gene (stop@42065 F) /note=Coding Potential: high /note=SD (Final) Score: -4.928 /note=Gap/overlap: -1 /note=Phamerator: NA /note=Starterator: NA /note=Location call: 41844 /note=Function call: No known function as all hits are function unknown. /note=Transmembrane domains: 2 /note=Secondary Annotator Name: Barden, Sophia /note=Secondary Annotator QC: This gene looks scary. Because there are no Glimmer or GeneMark starts, be sure to explain why you chose the start site that you did! (best final score or z score? best fit to coding potential? smallest gap?) I don’t know what else to add for Phamerator or Starterator so maybe ask Dr. Freise about that. I would explain the significance of the TMDs that you identified (can this be a clue to the gene function?) CDS 42055 - 42285 /gene="87" /product="gp87" /function="hypothetical protein" /locus tag="Hum25_87" /note=Original Glimmer call @bp 42055 has strength 3.75; Genemark calls start at 42055 /note=SSC: 42055-42285 CP: yes SCS: both ST: NA BLAST-Start: GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.305, -3.9716456141594323, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: There does not seem to be any synteny. /note=Primary Annotator Name: Le, Vivian /note=Auto-annotation: Both Glimmer and GeneMark call the start of 42055. /note=Coding Potential: Reasonable coding potential is found for both GeneMark self and host.
The start site does not cover all the coding potential. However, this start site covers the most coding potential. It is the best out of the ones suggested. There was no significant overlap either. /note=SD (Final) Score: -3.972. This is the best final score. /note=Gap/overlap: There is a 341 bp gap. When looking at GeneMark, there seems to be some coding potential before the gene. This could mean that another gene should be added before. Based on the coding potential, it looks like 2 potential genes could be added before the one. /note=Phamerator: The pham number as of 04.05.2023 is 25994. Hum25_77 is the only gene in that pham. /note=Starterator: Starterator report was not found. The gene is most likely an orpham, since it is also the only gene in the pham currently. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 42055. /note=Function call: Based on the PhagesDB BLASTp and NCBI BLASTp, there is currently no known function, because there were no strong hits. HHpred and CDD also did not have any significant hits. /note=Transmembrane domains: There were 0 predicted TMD`s. The topology graph also showed the gene/protein to only be on the inside. Not much can be said right now, because there is no known function. CDS 42285 - 42626 /gene="88" /product="gp88" /function="HNH endonuclease" /locus tag="Hum25_88" /note= /note=SSC: 42285-42626 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein [Arthrobacter roseus] ],,NCBI, q1:s1 94.6903% 2.28068E-34 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.493, -7.84036675130217, no F: HNH endonuclease SIF-BLAST: ,,[hypothetical protein [Arthrobacter roseus] ],,WP_205161922,66.055,2.28068E-34 SIF-HHPRED: SIF-Syn: