CDS 56 - 520 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="RavenCo17_1" /note=Original Glimmer call @bp 56 has strength 5.86 /note=SSC: 56-520 CP: yes SCS: glimmer ST: SS BLAST-Start: [terminase small subunit [Gordonia phage Mcklovin]],,NCBI, q1:s1 100.0% 2.67583E-105 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.957, -3.4175091270247986, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Gordonia phage Mcklovin]],,QFP96788,100.0,2.67583E-105 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,64.9351,99.8 SIF-Syn: Terminase small subunit, downstream gene is terminase large subunit which belongs to pham 55975. Phages Kita and Polly also show this order of gene functions. /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer calls the start at 56; however, GeneMark did not call a start. /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand, which shows that this is a forward gene. There is coding potential found in both Host GeneMark and Self GeneMark. Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer start site. /note=SD (Final) Score: The final score is the best option at -3.418 since it is the least negative and the Z-score is the highest at 2.957. /note=Gap/overlap: There is no gap/overlap upstream of the gene since this is the first gene in the phage genome. Moreover, downstream of the gene there is a 23 bp overlap with the following gene (stop at 520, next start at 498). /note=Phamerator: The pham number as of March 31, 2022 is 96329. The gene is conserved in phages like Kita and Polly that also belong to the CZ cluster. /note=Starterator: Start site 3 in Starterator was manually annotated in 26/27 non-draft genes in this pham. Start 3 is 56 in RavenCo17. This evidence agrees with the start site predicted by Glimmer. 
/note=Location call: Based on the information above, this is a real gene and the most likely start is 56. /note=Function call: Terminase small subunit. The top two phagesdb BLAST hits have the function of terminase small subunit and have an e-value of 9e-86. Most of the NCBI BLAST hits also have the function of terminase small subunit; the two selected have 100% coverage, 99.3506% identity, and e-values of 2.67583e-105 and 7.84433e-105. HHpred had two hits for terminase small subunit, which had 99.8% and 99.6% probability, 64.9351% and 60.3896% coverage, and e-values of 8.2e-19 and 1e-14. CDD also had two hits that had 37.931% and 50% identity, 75.3247% and 58.4416% coverage, and e-values of 6.5711e-33 and 3.06148e-32. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. CDS 498 - 2222 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="RavenCo17_2" /note=Original Glimmer call @bp 498 has strength 10.46; Genemark calls start at 498 /note=SSC: 498-2222 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Gordonia phage Madeline]],,NCBI, q1:s1 100.0% 0.0 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.127, -4.460666092178802, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Gordonia phage Madeline]],,QDH47606,99.8258,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,87.8049,100.0 SIF-Syn: Terminase, large subunit. The upstream gene is terminase, small subunit and the downstream gene is NKF. 
There is synteny for this gene in BunnyBear (CZ) and TimTam (CZ1), which both have the same gene layout and the same length of overlap with the upstream gene. BunnyBear has the same overlap with the downstream gene, and TimTam's overlap is slightly different, but overall very similar. /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation start source: Glimmer and GeneMark both call the start site @ 498. /note=Coding Potential: There is significant coding potential found for this gene in the third forward reading frame. This is seen in both GeneMark self and host. /note=SD (Final) Score: -4.461. This is the highest final score among the potential start sites. This corresponds with the highest Z-Score as well, at 2.127. /note=Gap/overlap: -23bp. This is a large overlap with the upstream gene. This overlap region, however, does not have coding potential for the upstream gene's reading frame. Additionally, this start site has the best SD and Z-Scores of potential sites. This 23bp overlap with the upstream gene is conserved in other phages such as BunnyBear (CZ) and TimTam (CZ1). /note=Phamerator: As of 3/31/2022 the pham is 55975. This is conserved in BunnyBear (CZ) and TimTam (CZ1), members of the same cluster, but different subclusters. /note=Starterator: Start site 19 is manually annotated in 29 of 149 phages, and is called in 97.1% of phages when present. This start site corresponds to a start site of 498. /note=Location call: There is strong evidence this is a real gene with a start site of 498. /note=Function call: Terminase, large subunit. There are numerous phageDB BLAST hits with the function of terminase, large subunit with an e-value of 0. There are additional NCBI BLAST hits with this same function and e-values of 0. In HHpred there are hits for terminase, large subunit as well with 100% probability, good coverage and low e-values. /note=Transmembrane domains: 0. Neither TMHMM nor TOPCONS predicts any TMDs. 
/note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: Based on evidence of coding potential, and conservation of the start number in phages of the cluster with the start site, I agree that the gene starts at position 498. Given the synteny and the strong evidence towards this gene`s function as a large terminase subunit, I agree with the function call. CDS 2219 - 2380 /gene="3" /product="gp3" /function="membrane protein" /locus tag="RavenCo17_3" /note=Original Glimmer call @bp 2264 has strength 5.65; Genemark calls start at 2219 /note=SSC: 2219-2380 CP: yes SCS: both-gm ST: SS BLAST-Start: [membrane protein [Gordonia phage Bunnybear]],,NCBI, q1:s1 98.1132% 1.21395E-13 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.206, -5.905440708311495, no F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Bunnybear]],,QNJ58059,90.566,1.21395E-13 SIF-HHPRED: SIF-Syn: NKF, upstream gene is terminase, downstream is portal protein. Synteny supported in Phage BoyNamedSue. /note=Primary Annotator Name: Nguyen, Calvin /note=Auto-annotation: Although Glimmer calls the start site at 2264, GeneMark calls the start site at 2219. There is ample evidence from coding potential, starterator, phamerator, and the starting codon to suggest that the true start site is located at 2219. /note=Coding Potential: Coding potential in both GeneMark Self and Host is prominent and contained between the GeneMark chosen start site of 2219 and the stop site of 2380. /note=SD (Final) Score: The final score of the Glimmer suggested start is -6.423, and the Z-score is 1.706. These are not the best scores among the available start sites; the GeneMark suggested start site has a higher Z-score of 2.206 with a final RBS score of -5.905. Because GeneMark’s SS is likely an operon, its final RBS score is not as determinant. /note=Gap/overlap: Glimmer’s suggested start of 2264 has a gap of 41, which is a reasonably sized gap. 
However, this start site leaves the gene at a length of 117 bp, making it on the shorter side. GeneMark’s suggested start of 2219 has an overlap of -4 and has a start codon of ATG, suggesting that it may be part of an operon; it also has a longer length of 162 bp. The gene appears to be conserved within other phages of subcluster CZ1, such as Eviarto. /note=Phamerator: As of 3/31/22, the pham number is 21202. According to the Glimmer SS, the gene length is not conserved in RavenCo17 among other members of its cluster; RavenCo17 has a gene length of 117 bp, while other members of cluster CZ have lengths of 162 bp. The GeneMark SS provides more solid evidence of gene conservation, as RavenCo17 would have a gene length of 162 bp that is more commonly conserved in the pham. /note=Starterator: According to Starterator, Start Site 5 was manually annotated in 20 out of 26 non-draft genes. Start Site 5 is equivalent to 2219 in RavenCo17, which agrees with GeneMark’s prediction. /note=Location call: According to the previous evidence provided (minimum gene coding length, coding potential contained in stop/start sites, no reversals), this gene is most likely real. Additionally, the start site for the gene is most likely located at 2219 according to evidence from Starterator. /note=Function call: Membrane protein. While the top three results of the Phagesdb BLAST and HHPRED yielded no known functions for the gene in question, NCBI Blast recorded several high strength hits for membrane proteins with the best e-value being 1.2e-13. This membrane protein call is corroborated by the presence of TMDs, allowing us to conclude that this is likely to be a membrane protein. /note=Transmembrane domains: TMHMM predicts a single TMD. Additionally, TOPCONS also provides a significant prediction for a TMD. Therefore, we can conclude that this is a membrane protein. 
/note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: I agree with the start site prediction, however believe this could be a membrane protein as both TmHmm and TOPCONS predict a TM domain. CDS 2377 - 3798 /gene="4" /product="gp4" /function="portal protein" /locus tag="RavenCo17_4" /note=Original Glimmer call @bp 2377 has strength 9.24; Genemark calls start at 2377 /note=SSC: 2377-3798 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Gordonia phage WelcomeAyanna] ],,NCBI, q1:s1 99.3658% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.161, -4.4536196888290105, no F: portal protein SIF-BLAST: ,,[portal protein [Gordonia phage WelcomeAyanna] ],,QCW22190,97.2516,0.0 SIF-HHPRED: Phage portal protein, HK97 family; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_A,76.1099,100.0 SIF-Syn: Portal protein, upstream gene is terminase, downstream gene is capsid maturation protease, just like in phages Agueybana, AlumE, and Antonio in cluster CZ. /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the same start site at 2377 bp. /note=Coding Potential: Coding potential in the ORF is very high on the forward strand, indicating that this is a forward gene. Coding potential is found in GeneMark Self and Host. /note=SD (Final) Score: -4.454. This is not the best Final Score, but it does not matter too much because the -4 bp overlap suggests that this gene is part of an operon. This overlap is also conserved in other phages in cluster CZ (Agueybana and AlumE). /note=Gap/overlap: -4 bp. This overlap suggests that this gene is part of an operon. The overlap is also conserved in other phages in cluster CZ (Agueybana and AlumE). /note=Phamerator: The pham number as of March 31, 2022 is 13986. The gene is conserved in phages Agueybana, AlumE, Antonio, and BatStarr, all of which are in cluster CZ with a conserved length of 1419 bp. 
The function call for all of these genes is a portal protein and it is consistent between Phamerator and the phams database. /note=Starterator: There are 38 members in this pham, 4 of which are drafts. Start 2 is found in 37/38 of genes in the pham, which correlates to a start site of 2377 bp for RavenCo17. /note=Location call: Considering the above evidence, this gene is a real gene and has a start site at 2377 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Portal protein. Multiple phagesDB BLAST has hits with the suggested function portal protein with very small e values of 0 to 3e-08. A majority of NCBI BLAST also has hits with the function portal protein with small e values ranging from 0 to 9e-143 (Top 6 hits: 95% query cover, ~95% percent identity, E-value of 0.0). CDD had one hit, which was for a phage portal protein, with an E-value of 1.45e-40 for specific hits, placing it in the Phage_portal superfamily. HHpred shows multiple hits for phage portal protein. The top hit from HHpred has 100% probability and an e-value of 4.7e-32. /note=Transmembrane domains: TmHmm predicts no TMDs. Topcons also predicts no TMDS. This makes sense because the function for the gene is a portal protein, which turns into a hole for the phage to eject its genomic information into its host, so it is not a transmembrane protein. /note=Secondary Annotator Name: Mascareno, Greta /note=Secondary Annotator QC: I agree with the start site called and function call. I also agree that there is evidence for an operon. 
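The gap/overlap and gene-length figures quoted in the records above can be re-derived from the CDS coordinates alone. The following is a minimal sketch, assuming inclusive 1-based coordinates and the convention gap = start - previous stop - 1 (negative values mean overlap); that convention is inferred from the GAP values reported in these notes, not taken from PECAAN documentation, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene with inclusive coordinates."""
    return stop - start + 1

def gap_to_upstream(prev_stop: int, start: int) -> int:
    """Gap in bp to the upstream gene; a negative value is an overlap."""
    return start - prev_stop - 1

# (gene number, start, stop) copied from the CDS lines of genes 1-4 above.
GENES: List[Tuple[int, int, int]] = [
    (1, 56, 520),
    (2, 498, 2222),
    (3, 2219, 2380),
    (4, 2377, 3798),
]

def report(genes: List[Tuple[int, int, int]]) -> List[Tuple[int, int, Optional[int]]]:
    """Return (gene number, length, gap to upstream gene) for each gene.

    The first gene has no upstream neighbor, so its gap is None here
    (the notes record it as a 0 bp gap).
    """
    rows = []
    prev_stop: Optional[int] = None
    for number, start, stop in genes:
        gap = None if prev_stop is None else gap_to_upstream(prev_stop, start)
        rows.append((number, gene_length(start, stop), gap))
        prev_stop = stop
    return rows

if __name__ == "__main__":
    for number, length, gap in report(GENES):
        print(f"gene {number}: length {length} bp, gap {gap}")
```

The same two functions reproduce the gene 3 comparison: the Glimmer start at 2264 gives a 41 bp gap and a 117 bp gene, while the GeneMark start at 2219 gives a -4 bp gap and a 162 bp gene.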
CDS 3776 - 4705 /gene="5" /product="gp5" /function="capsid maturation protease" /locus tag="RavenCo17_5" /note=Original Glimmer call @bp 3815 has strength 10.69; Genemark calls start at 3815 /note=SSC: 3776-4705 CP: yes SCS: both-cs ST: SS BLAST-Start: [capsid maturation protease [Gordonia phage Mcklovin]],,NCBI, q5:s1 98.7055% 0.0 GAP: -23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.196, -2.338663114094478, yes F: capsid maturation protease SIF-BLAST: ,,[capsid maturation protease [Gordonia phage Mcklovin]],,QFP96792,97.3684,0.0 SIF-HHPRED: Peptidase_S78 ; Caudovirus prohead serine protease,,,PF04586.20,62.4595,99.9 SIF-Syn: The function call for this gene is capsid maturation protease. Compared to Faith5x5, the genes have the same call. The gene upstream Faith5x5 and RavenCo17 are both portal proteins, and downstream they are both major capsid proteins. This synteny is also shown with phage Moosehead, with the gene upstream being portal protein and downstream being major capsid protein. The overlap is very similar, showing that the gene has synteny. /note=Updated location call 7-13-22 to 3776 (start site #8). Manually annotated 15 times. Creates larger overlap with upstream gene, but fits starterator data much better. RBS score also better, and creates LORF. /note=Start site 10 (3788) is another option. it is never annotated in other genes in the pham, but is also not FOUND in any other genes in the pham. 3788 produces a gene length (918bp) closer to that of many other genes in the pham. -Amanda Freise /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Both Genemark and Glimmer called the start site 3815. The start codon is ATG which is very common. Because they both called the same site, it is highly likely this is the correct start site. /note=Coding Potential: The self and host graphs show very good coding potential through the start site 3815 and the stop site 4705, showing this is a real gene. 
/note=SD (Final) Score: The final score is -5.020, which is the fourth best final score. This is evidence that it could be the correct start site/is a real gene because it is a good score compared to the others. /note=Gap/overlap: The gap shown is 16bp, which is a good indicator that a gene cannot fit upstream, thus the start site must be correct. /note=Phamerator: The pham number for RavenCo17 is 21479 (called 3/31/2022) and there are 37 phages with the same pham number. Most of these members are in cluster CZ. The base pair length is highly conserved between 890-920 base pairs, which is a good indicator that it is a real gene. RavenCo17 was compared to phages Trumpet and Polly and shows similarities in base pair length and cluster. The main difference is that these two phages have 912bp and RavenCo17 has 891bp. /note=Starterator: Starterator also called the pham to be 21479 and shows 37 members in this pham with 3 drafts. Starterator says RavenCo17 does not have the most annotated start site, and the start site is only shown in 1 of 37 phages in the cluster, but it is called 100% of the time when present, which shows this is likely the best start site. /note=Location call: Based on the guiding principles met, such as a final score of -5.020 and a start site called 100% of the time, the correct start site for this gene is 3815. /note=Function call: The function of this gene is likely a capsid maturation protease. This was called by BLAST, and is compared to phage Mcklovin, which has an e-value of 1e-161. HHPRED calls the function to be a prohead serine protease as compared to Madeline, with a probability of 99.9 and an e-value of 4.7e-21. Based on the approved functions list, capsid maturation protease and serine protease are similar functions. /note=Transmembrane domains: There are no transmembrane domains as called by TmHmm and Topcons. 
/note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree with the start site called for this gene due to the coding potential, small gap, SD score, and auto-annotated start sites. I also agree that the function of this gene is a capsid maturation protease based on evidence from PhagesDB and NCBI BLAST. CDS 4702 - 6351 /gene="6" /product="gp6" /function="major capsid protein" /locus tag="RavenCo17_6" /note=Original Glimmer call @bp 4702 has strength 11.02; Genemark calls start at 4702 /note=SSC: 4702-6351 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Gordonia phage Bunnybear]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.015, -3.5527928455068705, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Gordonia phage Bunnybear]],,QNJ58062,99.0893,0.0 SIF-HHPRED: Major capsid protein; Virus Procapsid particles, VIRUS; 5.2A {Enterobacteria phage HK97},,,3QPR_D,91.0747,100.0 SIF-Syn: Major capsid protein; upstream gene is capsid maturation protease (pham 21479) and downstream gene is NKF (pham 21500), just like in cluster CZ1 phages like Antonio, Agueybana, and AlumE. /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call an ATG start site at 4702. /note=Coding Potential: The gene has high coding potential predicted within the putative ORF. In the first panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe coding potential throughout the entirety of the gene sequence, and the chosen start site captures all the coding potential. /note=SD (Final) Score: The chosen start site of 4702 has the best final score (-3.553) and z-score (3.015) compared to other potential start sites on PECAAN. /note=Gap/overlap: 4bp overlap with the upstream gene, indicating that these genes are part of an operon. All the other potential start sites have gap sizes >= 20bp. 
/note=Phamerator: Pham 14117 has 38 members with 34 non-draft members (3/30/22). Gene is conserved and present in many cluster CZ1 phages like Agueybana, AlumE, and Antonio. Note that there are no other cluster CZ8 phages to compare this gene to. /note=Starterator: Start number 12, corresponding to the basepair coordinate 4702 in RavenCo17, is the most annotated start in Starterator. This site has manual annotations in 28/34 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 4702. /note=Function call: Major capsid protein. The top Phagesdb BLAST hits (score > 1070 and e-value = 0) come from cluster CZ1 phages and have a function of major capsid protein assigned to them. These genes have the same pham of 14117, sequence length of 549, and gene number of 6 like RavenCo17's gene (stop @6351 F). The top NCBI BLAST hits (e-value = 0, 100% coverage, 98%+ identity, score 540+) also come from cluster CZ1 phages and have major capsid protein function associated with them. Both databases have hits with the best possible e-value of 0, indicating a high likelihood of this gene encoding a major capsid protein. HHpred also had strong hits for major capsid protein. Likewise, CDD had strong hits for phage major capsid protein families. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. 
/note=Secondary Annotator Name: Hu, Yixiao (Sherry) /note=Secondary Annotator QC:Hu, Yixiao (Sherry), Yes CDS 6437 - 6775 /gene="7" /product="gp7" /function="hypothetical protein" /locus tag="RavenCo17_7" /note=Original Glimmer call @bp 6437 has strength 6.4; Genemark calls start at 6437 /note=SSC: 6437-6775 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MADELINE_7 [Gordonia phage Madeline]],,NCBI, q1:s1 100.0% 4.09079E-57 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.957, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MADELINE_7 [Gordonia phage Madeline]],,QDH47611,87.8261,4.09079E-57 SIF-HHPRED: SIF-Syn: Unknown function gene (pham 21500), upstream gene is major capsid protein (pham 14117), downstream gene is a head to tail adaptor (pham 100884), just like in phages Bosnia and Antonio. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Glimmer and GeneMark both call the gene at a start site of 6437. /note=Coding Potential: The coding potential covers the entire gene for both GeneMark Self and Host and is very high. /note=SD (Final) Score: The final score is -2.845 and the z score is 2.957, these are good values and the only other option for a start site has very poor values for both. /note=Gap/overlap: The gap is 85 which is relatively high but the other option has a much larger gap. I don’t think the gap of 85 warrants a gene in between this and the upstream gene since there is no coding potential there and the gap is not big enough. This gap is also seen in other phages in the CZ cluster (examples: Antonio and Bosnia). /note=Phamerator: pham 21500 as of 03/31/22, this pham has 2 main clusters present: CZ (the same cluster as our phage and includes phages Antonio and Bunnybear) and I as well as a singleton phage and one without a cluster. 
The only function call present for two of the genes is a head-to-tail connector protein /note=Starterator: The most called start site is #2 (which is present in our phage) and it was called 34/34 times for the non-draft genomes. This corresponds to start site 6437 and was also predicted by Glimmer and GeneMark. /note=Location call: Because Glimmer, GeneMark, and Starterator all agree and because of the rest of the evidence provided, this gene is real and its start site is 6437. /note=Function call: Unknown function, the top hits on phagesDB all have unknown function (they have an e-value of 4e-48). HHPRED did not have any relevant hits, CDD did not have any good hits since the e-value was not good and neither was the coverage (the one hit had 56.25% coverage and an e-value of 0.00114338), and NCBI blast only had hits for hypothetical proteins which had good values with 100% coverage, 83+% identity, and an e-value of 4.09e-57. While there do seem to be a few hits for "head-to-tail adaptor", calling this function requires very specific hits on HHPRED according to SEA-Phages and these are not seen here. /note=Transmembrane domains: There are no transmembrane domains and so this is not a membrane protein. 
/note=Secondary Annotator Name: Hu, Yixiao (Sherry) /note=Secondary Annotator QC: Hu, Yixiao (Sherry), Yes CDS 6772 - 7320 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="RavenCo17_8" /note=Original Glimmer call @bp 6772 has strength 8.29; Genemark calls start at 6772 /note=SSC: 6772-7320 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail connector protein [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 1.33998E-110 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.199, -5.222075281493363, no F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail connector protein [Gordonia phage Eyre] ],,YP_009292399,92.8571,1.33998E-110 SIF-HHPRED: Adaptor protein Rcc01688; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_D,99.4505,99.9 SIF-Syn: My gene is a head-to-tail adapter. Upstream gene is NKF while downstream head to tail stopper. Same as in Faith5x5. /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: Both Glimmer and Genemark agree with each other and say that the start site is 6772. Since they both agree with each other, this is likely the start site. The start codon that is called is GTG. /note=Coding Potential: Both self and host genemark graphs have very good coding potential predicted within the potential open reading frame. The start site is seen in the coding potential in both sequences too. When I compared the gene sequence to other sequences, there were many similarities which helps me decide if this is a real gene. The start site covers the entire coding potential. /note=SD (Final) Score: -5.222 is the final score. Out of all the scores, this is the second least negative. The least negative value has a really large gap which makes me think that the final score of -5.222 is the best one. /note=Gap/overlap: -4. The overlap is -4 which means there could be a potential operon present here. 
I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is the LORF. The length of the gene is acceptable at 549 base pairs. /note=Phamerator: Pham 100884. 03/31/2022. I can see that the function shown is a head to tail adaptor and that the length of the gene is roughly 500+ which is conserved throughout each sequence. There are a variety of clusters present within this pham. The gene is conserved within other members of the pham and I compared them to Agueybana_8 and Andrew_9. Both of these had similar base pair lengths to my gene and also the same function. /note=Starterator: Pham has 34 members, 5 are drafts. Start 2 (6772) called 29/29 times. /note=Location call: I believe this gene is a real gene given all the evidence presented above. The best start site is 6772 for this gene. /note=Function call: Phamerator helped me find the function of the gene because the head to tail adaptor was seen across many genes in this pham. Looking at blast, I can see that my gene sequence function is a head to tail adaptor. I used Eyre which is head to tail connector protein with an e value of 6e-91. On HHPRED, the percent coverage, probability, and e values were good for numerous hits, and the alignments to SPP1 proteins required to call the head to tail adaptor function were met in 6TE9_D and 5A21_C. For NCBI BLAST, accession YP_009292399 had the best e value and the function called was also head-to-tail connector protein. This is very convincing evidence that this gene sequence is also a head-to-tail connector protein. /note=Transmembrane domains: No transmembrane domains. Neither TMHMM nor TOPCONS predicted any TMDs, which did not help me figure out the function of my gene. 
/note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I agree with the called start site however on starterator it says that it is start number 27 and it has 33/300 MA`s. Moreover, the pham of this gene is 102427 and it has 335 members. Also make sure to select Yes for the "All GM Coding Capacity" as well as check in the evidence for the Phagesdb BLAST. Other than that everything else looks good. CDS 7317 - 7679 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="RavenCo17_9" /note=Original Glimmer call @bp 7317 has strength 6.95; Genemark calls start at 7317 /note=SSC: 7317-7679 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Gordonia phage Faith5x5]],,NCBI, q1:s1 100.0% 1.51533E-29 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.118, -6.524946451624424, no F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Gordonia phage Faith5x5]],,QGJ87563,71.1712,1.51533E-29 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,95.8333,98.7 SIF-Syn: This gene at Stop 7679 has a function as a head-to-tail stopper. The genes immediately upstream and downstream have functions as head-to-tail adapters. In Faith5x5 (CZ6), the genes upstream and downstream of the gene of the same pham are also of the same pham of their respective genes in RavenCo17. /note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Both Glimmer and GeneMark call the gene start at position 7317 with an ATG as the start codon. /note=Coding Potential: There is apparent coding potential in both the host trained and self trained genemark maps, with the auto-called start site encapsulating all predicted coding potential. There is an oddity in the presence of coding potential within both strands of DNA over the region the gene is predicted to occupy. 
On the reverse strand there is coding potential approximately 100bp long centered around position 7500, and in the rest of the region covered by the gene there is only potential in one reading frame of the forward strand. However, despite the coding potential on both strands it is most reasonable to discard the result of the reverse strand, as the presence of coding potential on the forward strand both immediately upstream and downstream of the potential gene is evidence against a gene on the reverse strand, as there would not be enough space for a switch in gene direction. /note=SD (Final) Score: The final score for this start site is the 4th highest out of 5 at -6.525, with the 4th highest Z-score out of 5 at 1.118. While these values are relatively poor, the candidate start sites with higher scores than this start site have gaps between 76-299 bp long, and while the scores are relatively low, the predicted start overlap of 4 bp is indicative of belonging to an operon, thus RBS score is irrelevant. /note=Gap/overlap: The overlap between this gene and the upstream gene is 4 bp, indicative of an operon. This overlap is conserved in other members of the pham with the same start site in sub-cluster CZ6 and in singletons, however, of note is that members of the pham with the same start site in sub-cluster CZ4 do not preserve this overlap. /note=Phamerator: The gene is of pham 100505 as of 03/31/2022 at 1406 hrs PST. This gene is conserved in several members of cluster CZ such as in BoyNamedSue (CZ1), BaxterFox (CZ3), and Sidious (CZ7). There is no gene function auto-called, however other members of the cluster have called this gene as a head-to-tail stopper gene. /note=Starterator: As of 03/31/2022 at 1415 hrs PST, the start number with the most number of manual annotations in the pham is start 10, which is called in 23 of 71 non-draft genomes. RavenCo17 does not have this start site. 
The start site that is auto-called is start site 14, which is present in 12 of 84 genes in the pham and is manually annotated in 6/12 of the genes it is present. In RavenCo17, start site 14 corresponds to position 7317. /note=Location call: Given the coding potential for the gene in both the host-trained and self-trained genemark, the relatively high number of annotations with this start site in genes where it is present, the presence of the overlap in genomes where this start site for the gene is present, and the presence of this pham in genomes outside this gene’s sub-cluster, this gene is real and starts at bp 7317. Starterator, Glimmer, and GeneMark agree on this start site. /note=Function call: Head-to-Tail Stopper. This gene is highly conserved within cluster CZ and amongst a few singletons, indicative of a protein potentially necessary for the structure or function of Gordonia phages. In manually annotated genomes of both within the cluster and in singletons, the function of the gene is that of head-to-tail stopper. There were no hits in the CDD, but there were hits in PhagesDB BLAST, NCBI Blast, and HHPRED. Of the hits in the above, a hit matching the head-to-tail adapter in phage Eyre had the lowest e-value and highest score, but as the gene was lacking the prerequisite domain for a head-to-tail adapter and was possessing of the 5A21 chain E in the macromolecular complex, the gene could be identified as a head-to-tail stopper. /note=Transmembrane domains: No Transmembrane Domains, which makes sense. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I have QCed this gene, and agree with the location call and the function call. 
CDS 7679 - 8008 /gene="10" /product="gp10" /function="minor capsid protein" /locus tag="RavenCo17_10" /note=Original Glimmer call @bp 7679 has strength 10.49; Genemark calls start at 7679 /note=SSC: 7679-8008 CP: yes SCS: both ST: SS BLAST-Start: [HK97 gp10 family protein [Gordonia phage SoilAssassin] ],,NCBI, q1:s1 99.0826% 3.08964E-41 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.711, -3.329294830936593, yes F: minor capsid protein SIF-BLAST: ,,[HK97 gp10 family protein [Gordonia phage SoilAssassin] ],,YP_009303005,74.0741,3.08964E-41 SIF-HHPRED: Minor_capsid_2 ; Minor capsid protein,,,PF11114.11,85.3211,99.6 SIF-Syn: /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Glimmer and GeneMark both call a start site at position 7679. /note=Coding Potential: Both the host-trained and self-trained GeneMark indicate a significant amount of coding potential that falls completely within the auto-annotated ORF in the forward direction. /note=SD (Final) Score: The auto-annotated start site received an RBS final score of -3.329, which is the closest to zero of all potential ORFs. /note=Gap/overlap: The auto-annotated gap is -1, indicating a 1 bp overlap rather than a gap and suggesting that this gene may be part of an operon. /note=Phamerator: The pham called is 53136 as of 03/30/22. Pham 53136 is highly conserved and includes 124 members, two of which are phage Bosnia and phage BunnyBear. /note=Starterator: Start site 23 was manually annotated the most often in this pham, as it was called 100% of the time when present and in 45/101 non-draft genomes. However, this start site is missing from the genome of RavenCo17, which instead shows a start number of 30. This track is fairly conserved, as it has been manually annotated in 31/101 non-draft genomes and 96.9% of the time when present. Further, it is highly conserved in cluster CZ2 and RavenCo17 is a member of CZ8. 
/note=Location call: Given the above evidence, this is a real gene and the auto-annotated start site at position 7679 is the best option listed. /note=Function call: Minor capsid protein. Does not have required evidence to call adaptor. /note=Transmembrane domains: There are no TMDs predicted by TOPCONS or TMHMM, suggesting that this gene product is not membrane associated or bound. /note=Secondary Annotator Name: Nguyen, Calvin /note=Secondary Annotator QC: According to the evidence provided by PECAAN, Starterator, and Phamerator, I agree with this gene annotation and the function call. Please make sure to fill out the Synteny Box. CDS 8008 - 8472 /gene="11" /product="gp11" /function="tail terminator" /locus tag="RavenCo17_11" /note=Original Glimmer call @bp 8008 has strength 11.61; Genemark calls start at 8008 /note=SSC: 8008-8472 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Gordonia phage Vasanti]],,NCBI, q1:s1 98.7013% 6.54153E-75 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.083, -4.692527399714444, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Gordonia phage Vasanti]],,QAY05749,84.3137,6.54153E-75 SIF-HHPRED: Minor_capsid_3 ; Bacteriophage minor capsid protein,,,PF12691.10,87.013,99.8 SIF-Syn: /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 8008. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF encompasses the gene from 8008 bp to around 8300 bp, however the proposed stop site is at 8472 bp. /note=SD (Final) Score: The best final score on PECAAN is -4.693. /note=Gap/overlap: There is an overlap of 1 bp which is favorable, may be evidence of an operon, and is under the 30 bp limit for gaps. /note=Phamerator: The pham number as of 03/31/2022 is 101771. The gene is conserved; it is found in phage Sidious from cluster CZ7, Moosehead from cluster CZ6, and Denise from cluster CZ5. 
/note=Starterator: Start site 12 is called in 30/113 (26.5%) non-draft members of the Pham. Start site two is located at 8008 bp as called by Glimmer and GeneMark. /note=Location call: Considering the evidence, this is a real gene with the start site of 8008 bp. /note=Function call: Tail terminator. NCBI BLAST and Phagesdb BLAST give good evidence for this function, as e-values are very low and percent identities are high. HHPRED gives evidence for a tail terminator protein. /note=Transmembrane domains: TmHmm and Topcons provided no matches for TMHs so this gene is not a membrane protein. /note=Secondary Annotator Name: Reyes, Glania /note=Secondary Annotator QC: I agree with this annotation. I noticed you forgot to fill the synteny box! CDS 8475 - 9215 /gene="12" /product="gp12" /function="major tail protein" /locus tag="RavenCo17_12" /note=Original Glimmer call @bp 8475 has strength 12.4; Genemark calls start at 8475 /note=SSC: 8475-9215 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 2.48401E-174 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.662, -5.4744737427355155, no F: major tail protein SIF-BLAST: ,,[major tail protein [Gordonia phage Eyre] ],,YP_009292403,100.0,2.48401E-174 SIF-HHPRED: Major tail protein V; gpV, Bacteriophage Lambda, Major tail protein, VIRAL PROTEIN; NMR {Enterobacteria phage lambda},,,2K4Q_A,61.7886,99.7 SIF-Syn: This gene displays synteny with phage Attis which belongs to cluster CZ and the same pham (84968). Both phages RavenCo17 and Attis have the same gene in this position. /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: Both the Glimmer start site and the GeneMark start site are at 8475 with an ATG start codon for this gene. /note=Coding Potential: This gene has reasonable coding potential which can be seen in the host-trained and self-trained GeneMark. The chosen start site covers the coding potential. 
/note=SD (Final) Score: The SD (final) score for this gene is -5.474, which is very high and reasonable. This score is given for start site 8475. /note=Gap/overlap: There is a gap of 2 base pairs, which is very small and is the smallest gap out of all of the possible start sites, so it is very likely that the start site for this gene is 8475. /note=Phamerator: According to an analysis run on March 31, 2022, the pham for this gene is 84968. Phage RavenCo17 belongs to cluster CZ, and there are several other phages from this cluster that also belong to this pham. /note=Starterator: Start number 21 is the most highly conserved, with 45/92 non-draft genes in the pham calling this site. This site is not present in this gene. The start site for this gene is start 4, which is found in 30/114 genes in the pham. Start 4@8475 has 12 manual annotations. /note=Location call: It is very likely that the start site for this gene is 8475. /note=Function call: Major tail protein, based on evidence from PhagesDB BLAST, with an e-value of 10^-137, and NCBI BLAST, with an e-value of 10^-174. /note=Transmembrane domains: Neither TmHmm nor Topcons predict any transmembrane domains for this gene, which is consistent with a major tail protein. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: I agree with the primary annotator, and see fit that the function of this gene is a major tail protein. 
CDS 9219 - 9959 /gene="13" /product="gp13" /function="tail assembly chaperone" /locus tag="RavenCo17_13" /note=Original Glimmer call @bp 9219 has strength 9.26; Genemark calls start at 9219 /note=SSC: 9219-9959 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 1.27541E-172 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.414, -6.825770155834725, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Eyre] ],,YP_009292404,98.374,1.27541E-172 SIF-HHPRED: Phage_Gp15 ; Bacteriophage Gp15 protein,,,PF06854.14,22.7642,91.2 SIF-Syn: The upstream gene has a function of major tail protein and its pham number is 84968. The downstream gene has a function of tail assembly chaperone and its pham number is 13194. Comparing to phages Attis and Beans 7 from Cluster CZ2, we find that all three of them have the same genes and the same pham numbers in the same order. /note=Primary Annotator Name: Hu, Yixiao /note=Auto-annotation: Both the Glimmer start and the GeneMark start are at 9219. /note=Coding Potential: The coding potential of this gene is seen in the forward strand so it is a forward gene. Also, the coding potential can be seen in both Host-Trained GeneMark and Self-Trained GeneMark. /note=SD (Final) Score: The best SD score is for start 9549, with a score of -3.949 and a Z-score of 2.55. The Z-score for start 9219 is 1.41 and the corresponding final score is -6.826. /note=Gap/overlap: The gap upstream of this gene is 3 bp and there is a 20 bp overlap downstream, which is a small overlap. /note=Phamerator: The pham number of the gene is 93057. There are 131 members in total, and all members belong to cluster CZ, as do Faith5x5 and Moosehead in the pham map. /note=Starterator: In the Starterator report for pham 93057, this start is called in 25 of the 25 non-draft members, which favors the start site of 9219 for the gene. 
/note=Location call: This gene is a real gene and its start site should be 9219. /note=Function call: In PhagesDB BLAST, except for the first hit RavenCo17_Draft with unknown function, all following hits (like Eyre and Faith5x5) have the function of tail assembly chaperone, and these two have really low e-values of 1e-138 and 4e-89. In NCBI BLAST the first hit suggests that the gene has a function of tail assembly, with a 97.561% identity, 100% coverage, and an e-value of 1.27541e-172. Although in HHpred the third hit, with the function tail assembly chaperone, has a relatively low probability (43.4%) and CDD has no hits, we still think the gene has a function of tail assembly chaperone. /note=Transmembrane domains: From both TmHmm and Topcons, we can see no transmembrane domains. /note=Secondary Annotator Name: Cini, Victoria /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. Remember to select an option for the drop-down menus for Pham Starterator and GM coding capacity. Also remember to update the synteny section. Lastly, make sure to check the evidence you used for function call (CDD, NCBI BLAST, PhagesDB BLAST, and HHpred). Other than that, great work!:) CDS 9940 - 10368 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="RavenCo17_14" /note=Original Glimmer call @bp 9940 has strength 10.79; Genemark calls start at 9940 /note=SSC: 9940-10368 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Gordonia phage Moosehead]],,NCBI, q1:s1 98.5916% 8.95919E-78 GAP: -20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.317, -4.134629096801125, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Moosehead]],,QLF83825,84.7222,8.95919E-78 SIF-HHPRED: SIF-Syn: Tail assembly chaperone, upstream gene is also a tail assembly chaperone which belongs to pham 93057. 
The downstream gene is a tape measure protein which belongs to pham 14620. Phages Attis and Clap also show this order of gene function. /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer and GeneMark both call the start at 9940 /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand which shows that this is a forward gene. There is coding potential found in both Host GeneMark and Self GeneMark. Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer and GeneMark start site. /note=SD (Final) Score: The most reasonable final score is -4.135 since it is among the least negative, and the most reasonable Z-score is among the highest at 2.317. /note=Gap/overlap: Upstream of the gene there is a 20 base pair overlap while downstream there is a 1 base pair overlap. This strongly suggests that this gene is part of an operon. In phages Mocha12 and Savage there is also an overlap, but it is 11 base pairs. /note=Phamerator: The pham number as of April 5, 2022 is 13194. The gene is conserved in phages like Mocha12 and Savage that also belong to the CZ cluster. /note=Starterator: Start site 4 in Starterator was manually annotated in 25/25 non-draft genes in this pham. Start 4 is 9940 in RavenCo17. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the information above this is a real gene and the most likely start is 9940. /note=Function call: Tail assembly chaperone. The top two phagesdb BLAST hits have the function of tail assembly chaperone and have e-values of 4e-63 and 7e-63. Most of the NCBI BLAST hits also have the function of tail assembly chaperone; the two selected have 98.5916% coverage, 76.3889% identity, and e-values of 9.99017e-79 and 8.95919e-78. HHpred and CDD had no relevant hits. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: De Schutter, Elena /note=Secondary Annotator QC: Really great notes, thank you for all the details! My only 2 comments would be to see if this 20bp overlap is seen in other phages as well (so looking at the pham maps and synteny) because that could help to show that this is not an issue. My second comment is just for the synteny box, make sure you add 2 examples of other phages that show synteny. However, I agree with the location and function call! CDS 10368 - 14717 /gene="15" /product="gp15" /function="tape measure protein" /locus tag="RavenCo17_15" /note=Original Glimmer call @bp 10368 has strength 10.23; Genemark calls start at 10368 /note=SSC: 10368-14717 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.677, -6.28787341001706, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Gordonia phage Eyre] ],,YP_009292406,99.2409,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,45.8247,98.8 SIF-Syn: Tape measure protein. The upstream gene is tail assembly chaperone, the downstream gene is minor tail protein. This pattern is also seen in phages Ebert (CZ2) and Moosehead (CZ6) /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation start source: Both Glimmer and GeneMark call for the start site @ 10368. /note=Coding Potential: There is coding potential seen in the third forward reading frame for this gene. Both GeneMark self and host show this potential. /note=SD (Final) Score: -6.288. This is not the highest score, nor is the Z-Score of 1.677 the highest. 
However, this start site has an overlap of 1 bp, which indicates that it is part of an operon, which makes the SD and Z-Scores not as relevant for the call. /note=Gap/overlap: -1bp. There is a 1bp overlap with the upstream gene, which indicates that the gene is part of an operon. /note=Phamerator: As of 3/31/2022 the pham is 14620. This is conserved in phages Ebert (CZ2) and Moosehead (CZ6). The phamerator database calls for tape measure protein as the function. /note=Starterator: Start site 1 is conserved amongst all members of this pham. 25 of 25 non-draft phages manually annotate for this start site. This corresponds to a start site of 10368 for RavenCo17, and is in line with Glimmer and GeneMark. /note=Location call: Taken together there is strong evidence this is a real gene, with a start site of 10368. /note=Function call: Tape measure protein. There are numerous hits on phagesDB BLAST with this function and an e-value of 0. This function is additionally seen in NCBI BLAST, with e-values of 0. /note=Transmembrane domains: 0. Neither TMHMM nor TOPCONS predict any TMDs. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. CDS 14717 - 15550 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="RavenCo17_16" /note=Original Glimmer call @bp 14717 has strength 11.53; Genemark calls start at 14717 /note=SSC: 14717-15550 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.692, -3.3060637489568596, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Eyre] ],,YP_009292407,99.278,0.0 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,88.4477,99.9 SIF-Syn: Minor Tail Protein. 
Upstream gene is tape measure protein, downstream is minor tail protein, just like in phage Bosnia and Kita in Cluster CZ1. /note=Primary Annotator Name: Nguyen, Calvin /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 14717. /note=Coding Potential: Both GeneMark Host and Self show substantial coding potential in the region between the suggested start site 14717 and the stop site 15550. All coding potential is contained within those sites. /note=SD (Final) Score: The Final Score is -3.306, and the Z-Score is 2.692. These are the highest/best scores of all available start sites in the gene. /note=Gap/overlap: Overlap is 1 bp, with a common start codon of GTG. This is a reasonable overlap size, and is also supported by synteny in phages BoyNamedSue, Eviarto, and Bosnia of Cluster CZ1. /note=Phamerator: As of 3/31/22, the pham of this gene is 78270. Although the exact bp length is not identical, it is very commonly conserved within cluster CZ, as can be seen with phages Adora, AlumE, and Bunnybear. It shares identical gene length conservation with phage Eyre, a singleton. /note=Starterator: In this Pham, there are 71 Non-Draft members. RavenCo17 does not have the most commonly manually annotated Start Site of 4, which was manually annotated 36/71 times. RavenCo17 shares Start 5 with phage Eyre, which corresponds to the start site 14717. /note=Location call: According to all evidence provided, this gene is likely to be real with a start site of 14717. /note=Function call: Minor tail protein. According to Phagesdb BLAST, there were strong matches with phage Eyre, Denise, and Lamberg with high scores and low e-values that indicated the function of the gene to be a minor tail protein. HHpred adds additional evidence with a match for a distal tail protein with coverage 88.4477% and E-value of 6.9e-25. /note=Transmembrane domains: TmHmm provided no predicted TMHs, and Topcons also provided no matches for TMHs. 
Therefore, we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: Due to overall evidence I agree with the primary annotator on the location call and the function call of the gene. CDS 15550 - 17259 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="RavenCo17_17" /note=Original Glimmer call @bp 15550 has strength 13.61; Genemark calls start at 15550 /note=SSC: 15550-17259 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.918, -5.239132038859023, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Eyre] ],,YP_009292408,100.0,0.0 SIF-HHPRED: Sipho_Gp37 ; Siphovirus ReqiPepy6 Gp37-like protein,,,PF14594.9,88.225,99.8 SIF-Syn: Minor tail protein, upstream gene is minor tail protein, downstream gene is lysin A, just like in phages Adora and Agueybana in cluster CZ. /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the same start site at 15550 bp. /note=Coding Potential: Coding potential in the ORF is very high on the forward strand, indicating that this is a forward gene. Coding potential is found in GeneMark Self and Host. /note=SD (Final) Score: -5.239. This is not the best Final Score, but it does not matter too much because the -1 bp overlap suggests that this gene is part of an operon. Overlap is also conserved in other phages in cluster CZ (Adora and Agueybana). /note=Gap/overlap: -1 bp. This overlap suggests that this gene is part of an operon. Overlap is also conserved in other phages in cluster CZ (Adora and Agueybana). /note=Phamerator: The pham number as of April 1, 2022 is 83510. The gene is conserved in phages Adora, Agueybana, AlumE, Antonio, and BatStarr, all of which are in cluster CZ with a conserved length of ~1599 bp. 
The function call for all of these genes is minor tail protein and it is consistent between Phamerator and the phams database. /note=Starterator: There are 94 members in this pham, 14 of which are drafts. Start 20 is found in 30/94 of genes in the pham, which correlates to a start site of 15550 bp for RavenCo17. There are also 25 of 80 manual annotations of this start site. This evidence agrees with Glimmer and GeneMark calling 15550 bp as the start site. /note=Location call: Considering the above evidence, this gene is a real gene and has a start site at 15550 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Minor tail protein. Multiple phagesDB BLAST hits have the suggested function minor tail protein with very small e-values of 0 to 1e-07. A majority of NCBI BLAST hits also have the function minor tail protein with small e-values ranging from 0 to 6e-54 (Top 6 hits: 100% query cover, ~99-86% percent identity, E-value of 0.0). HHpred shows multiple hits with the function tail protein, not specifying if it is the minor tail protein. The top hit from HHpred has 99.82% probability and an e-value of 7.5e-17. /note=Transmembrane domains: TmHmm predicts no TMDs. Topcons also predicts no TMDs. This makes sense because the function for the gene is a minor tail protein, which helps the phage identify the correct host, so it is not a transmembrane protein. /note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: I agree with the predicted start site and functional annotation. 
CDS 17341 - 17994 /gene="18" /product="gp18" /function="lysin A, protease C39 domain" /locus tag="RavenCo17_18" /note=Original Glimmer call @bp 17341 has strength 6.22; Genemark calls start at 17341 /note=SSC: 17341-17994 CP: yes SCS: both ST: SS BLAST-Start: [lysin A protease C39 domain [Gordonia phage Doggs]],,NCBI, q1:s6 100.0% 3.60161E-150 GAP: 81 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.287, -2.230514797657003, yes F: lysin A, protease C39 domain SIF-BLAST: ,,[lysin A protease C39 domain [Gordonia phage Doggs]],,QKY80021,95.045,3.60161E-150 SIF-HHPRED: ALR0975 PROTEIN; PHYTOCHELATIN SYNTHASE, PCS, ALR0975, ACYL-ENZYME INTERMEDIATE, NOSTOC, GLUTATHIONE METABOLISM, CYSTEINE PROTEASE, TRANSFERASE; HET: 3GC, MSE; 1.4A {ANABAENA SP.} SCOP: d.3.1.14,,,2BU3_B,81.5668,99.7 SIF-Syn: Synteny with Bunnybear_CZ. Same upstream gene (minor tail protein). Downstream gene is different, though. /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Both Genemark and Glimmer called the start site of 17341 with a score of 6.22, which is high. The start codon is ATG which is a common start codon, so this is good evidence this is the correct start site. /note=Coding Potential: The Genemark Self graph shows coding potential throughout the entire ORF. The Genemark host graph also shows extremely high coding potential throughout the entire ORF. The start site and stop site are both located in the correct locations, so this is evidence this is the correct start site. /note=SD (Final) Score: The final score is -2.231, which is the least negative value out of 8 different start sites. The least negative value is the best candidate for the start site, thus 17341 is the best start site. /note=Gap/overlap: There is an 81 base pair gap which is large, but not large enough for a gene to be added. There is no coding potential present in the gap. 
/note=Phamerator: RavenCo17 is in pham 53671 (called 4/5/2022) which has 106 members. The clusters are very diverse and CR was the most annotated, and RavenCo17 is in CZ which is the second most annotated. The majority of sequence lengths are about 600-700, and RavenCo17 is 654bp, which is a good indicator this is a real gene. RavenCo17 was compared to SpeedDemon, which is in cluster DL and Patio in CR, which though they are in different clusters have similar sequence length and are in the same pham. /note=Starterator: Starterator called pham 53671 to have 106 members and 23 drafts. RavenCo17 does not have the most annotated start site, and is found in 38 of 106 phages in the pham. However, upon further investigation, start site 17341 is called in 25 MA’s. /note=Location call: Based on a final score of -2.231 and a Z-score of 3.287, the correct start site call is 17341. /note=Function call: The proposed function of this RavenCo17 gene is a lysin A protease c39 domain. Multiple hits in BLAST and HHpred to this specific function. /note=Transmembrane domains: There are no transmembrane proteins predicted by either TmHmm or Topcons, thus this is not a transmembrane protein and the function call of lysin A protease stands. /note=Secondary Annotator Name: Mascareno, Greta /note=Secondary Annotator QC: I agree with the start site and function called. I think there should be more explanation for the gap since 80 bp is a bit large. Possibly look at the coding potential in that gap and verify that there is no evidence of that space being part of the gene. 
CDS 17994 - 18929 /gene="19" /product="gp19" /function="lysin A, glycosyl hydrolase domain" /locus tag="RavenCo17_19" /note=Original Glimmer call @bp 17994 has strength 6.94; Genemark calls start at 17994 /note=SSC: 17994-18929 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, glycosyl hydrolase domain [Gordonia phage DirtyBoi]],,NCBI, q1:s1 96.1415% 1.40066E-178 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.135, -6.551191350318669, no F: lysin A, glycosyl hydrolase domain SIF-BLAST: ,,[lysin A, glycosyl hydrolase domain [Gordonia phage DirtyBoi]],,QOC55885,87.9085,1.40066E-178 SIF-HHPRED: d.2.1.0 (A:) automated matches {Bryum coronatum [TaxId: 216087]},,,d3wh1a_,58.5209,99.3 SIF-Syn: Lysin A, glycosyl hydrolase domain; upstream gene is lysin A, protease C39 domain (pham 53671), downstream gene is holin (pham 97087). Cluster CZ phages Cynthia, Sahara, and Whiteclaw also have a downstream gene with holin function and of pham 97087. Cluster CZ phage Bunnybear also has an upstream gene with function of lysin A protease C39 domain and pham 53671, just like RavenCo17_19. However, these genes are not adjacent as they are in RavenCo17. /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call an ATG start site at 17994. /note=Coding Potential: The gene has high coding potential predicted within the putative ORF. In the third panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe coding potential throughout the entirety of the gene sequence, and the chosen start site captures all the coding potential. /note=SD (Final) Score: The chosen start site of 17994 has one of the worst final scores (-6.551) and z-scores (1.135) compared to other potential start sites on PECAAN. 
However, it has the smallest gap/overlap size of 1 compared to the other start sites, which have gap/overlap sizes >= 26bp. /note=Gap/overlap: 1bp overlap with the upstream gene, indicating that this gene may be part of an operon. All the other potential start sites have larger gap/overlap sizes of >= 26bp. /note=Phamerator: Pham 102782 has 70 members with 54 non-draft members (04/01/22). Within this pham, gene lengths and cluster vary considerably, although function is mostly lysin A. This pham is present mainly in cluster CR, DB, DI, and DN phages; RavenCo17 is the only CZ phage to have this pham. As such, gene (stop @18929, F) does not exhibit conservation with other cluster CZ phages. /note=Starterator: Start number 15, corresponding to the basepair coordinate 17994 in RavenCo17, is the most annotated start in Starterator. This site has manual annotations in 35/54 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 17994. /note=Function call: lysin A, glycosyl hydrolase domain. The top Phagesdb BLAST hits (e-values < 1e-147) come from cluster DB phages and have a function of lysin A assigned to them. These genes have the same pham of 17994, similar sequence length of 306 (vs 311 for RavenCo17), and similar protein numbers to RavenCo17's gene (stop 6351 F). The top NCBI BLAST hits (e-value <= 1e-177, 96%+ coverage, 80%+ identity) also come from cluster DB phages and have lysin A, glycosyl hydrolase domain function associated with them. HHpred and CDD had strong hits for chitinase structure. This makes sense because chitinase is an enzyme that degrades glycosidic bonds in chitin, which is a component in the cell walls. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree that the start site for this gene is 17994 based on the coding potential, final score, single basepair overlap, and evidence from starterator. I also agree that the function of this gene is lysin A, based on evidence from PhagesDB BLAST and NCBI BLAST. CDS 18932 - 19447 /gene="20" /product="gp20" /function="membrane protein" /locus tag="RavenCo17_20" /note=Original Glimmer call @bp 18932 has strength 14.8; Genemark calls start at 18932 /note=SSC: 18932-19447 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Ebert] ],,NCBI, q1:s1 100.0% 8.91322E-96 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.692, -3.4470622626190464, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Ebert] ],,AWY04689,96.4912,8.91322E-96 SIF-HHPRED: SIF-Syn: Holin gene (pham 21500), upstream gene is lysin A, glycosyl hydrolase domain (pham 102782), downstream gene is a membrane protein (pham 99978), just like in phages Cynthia and Sahara apart from the fact that their lysin genes are not the same pham and not directly adjacent for Cynthia. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Glimmer and GeneMark both call the gene at a start site of 18932. /note=Coding Potential: The coding potential covers the entire gene for both GeneMark Self and Host and is very high. /note=SD (Final) Score: The final score is -3.447 and the z score is 2.692, these are good values, but it is worth mentioning that there is another start site (18926) that is also the longest ORF which has good values as well (SD: -4.356 & the same z score). This one is not called by Glimmer or GeneMark. /note=Gap/overlap: The gap for 18932 is 2 which is very good, and for 18926 the gap is -4 which is also possible. 
/note=Phamerator: pham 97087 as of 03/31/22. This pham has many clusters present: CZ (the same cluster as our phage and includes phages Attis and Bjanes), BH, CT, DG, DS, etc. as well as 3 singleton phages. The only function call present for a few phages is holin (this function is called in a few phages from the CZ cluster). /note=Starterator: The most called start site is #35 (which is present in our phage) and it was called 30/58 times for the non-draft genomes. This corresponds to the start site 18932 and was also predicted by Glimmer and GeneMark. /note=Location call: While there was a question between two different start sites, Genemark, Glimmer, and Starterator all point towards 18932, so this is the start site called for this gene. /note=Function call: POSSIBLE HOLIN but not confirmed. Genes 20, 21, 22: All have 4 TMDs. Gene 19 is lysin A, so one of these is possibly holin. Some genomes call gene 20 (stop 19447) the holin gene, but more members of that pham do NOT call a function. Gene 20 has some HHpred hits for holin, but they are all very poor (I suspect these hits are what led other people to call this gene as holin). Calling "membrane protein" for now on all three, since it is not clear which one is the holin. -AF /note=Initially called as just a membrane protein but the function of the surrounding genes helped determine the holin function; the top hits on phagesDB have holin function (4e-82). /note=Transmembrane domains: There are 4 TMHs called by TmHmm and 4 transmembrane domains seen in TOPCONS. /note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree that the start site for this gene is 18932 based on the auto-annotation, coding potential, Starterator, and 2 basepair gap. 
CDS 19444 - 19869 /gene="21" /product="gp21" /function="membrane protein" /locus tag="RavenCo17_21" /note=Original Glimmer call @bp 19444 has strength 4.81; Genemark calls start at 19444 /note=SSC: 19444-19869 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Sahara]],,NCBI, q1:s1 97.8723% 2.48866E-77 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.544, -5.216381181846213, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Sahara]],,QYW00751,88.5714,2.48866E-77 SIF-HHPRED: SIF-Syn: My gene is a membrane protein. The upstream and downstream genes are also membrane proteins. When comparing my sequence to Attis in Cluster CZ2, the same gene is NKF, and the upstream and downstream genes are NKF as well. Despite the lack of function evidence in pham maps, I feel that there is enough proof that my gene sequence is real and a membrane protein. The pham number for both RavenCo17 and Attis is 99978. When comparing my sequence to Bjanes7, the same gene is NKF, and the upstream and downstream genes are NKF as well. This proves to me that my gene sequence is real and a membrane protein. The pham number for both RavenCo17 and Bjanes7 is 99978. /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: Glimmer and GeneMark agree on a start site of 19444, with a GTG start codon. /note=Coding Potential: Both self and host GeneMark graphs show very good coding potential within the potential open reading frame. The start site is seen in the coding potential in both sequences too. When I compared the gene sequence to other sequences, there were many similarities, which supports this being a real gene. The start site covers the entire coding potential. /note=SD (Final) Score: -5.216 is the final score. Out of all the scores, this is the second least negative. 
The least negative value has a really large gap of 47 nucleotides which makes me think that the final score of -5.216 is the best one. /note=Gap/overlap: -4. The overlap is -4 which means there could be a potential operon present here. I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is the LORF. The length of the gene is acceptable being around 426 base pairs long. /note=Phamerator: Pham 99978. 04/05/2022. I can see that the function shown is unknown and that the length of the gene is roughly 500+ which is conserved throughout each sequence. There are a variety of clusters present within this pham. The gene is conserved within other members of the pham and I compared them to Argie_13 and Audrey_18. /note=Starterator: Pham number 99978 has 199 members, 36 are drafts. This includes Genes that do not have the "Most Annotated" start for RavenCo17. Start: 40 has 19 MA`s which is strong evidence that 19444 is the start site. Called 100.0% of time when present which is strong evidence for this being a real gene. /note=Location call: I believe this gene is a real gene given all the evidence presented above. The best start site is 19444 for this gene. /note=Function call: In Phagesdb BLAST, there are other phages with high E values but also have no known function. In HHPRED, there are many phages shown with high percent coverage and probability, but e values are horrible which means I did not consider this for function. In NCBI BLAST, the E values are very good and all of them tell me that the function of this gene is a membrane protein. Percent coverage is also really high. /note=Transmembrane domains: In TmHmm, there are 4 hits telling me that the function of my gene is a membrane protein. Topcons had no hits but there were none needed since TmHmm had a high number of hits. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 19866 - 20312 /gene="22" /product="gp22" /function="membrane protein" /locus tag="RavenCo17_22" /note=Original Glimmer call @bp 19866 has strength 9.71; Genemark calls start at 19866 /note=SSC: 19866-20312 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage DirtyBoi]],,NCBI, q1:s1 100.0% 7.65058E-100 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.824, -5.160958035523474, no F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage DirtyBoi]],,QOC55888,98.6487,7.65058E-100 SIF-HHPRED: SIF-Syn: This gene has synteny with other genes, as the phams of this gene and the upstream and downstream genes are conserved, such as in Attis, Bjanes7, and Cynthia, all of which are members of subcluster CZ2. /note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Glimmer and GeneMark both call the start site at position 19866 with a start codon of ATG. /note=Coding Potential: There exists coding potential within the range of the auto-called start site and the stop site on both Host-Trained and Self-Trained GeneMark in one reading frame of the forward strand. There is an extremely thin spike of coding potential centered around position 20270 in one reading frame from the reverse strand, but evidence that precludes the possibility of a gene flip renders this coding potential moot. /note=SD (Final) Score: The final score for this start site is the 8th highest at -5.161 and the 10th highest Z score of 1.824. However, this start site has the highest scores with the most reasonable amount of gap or overlap, as the start sites with higher final scores have gaps or overlaps in the hundreds, while start sites with higher z scores have gaps in the hundreds and the smallest overlap being 70 bp. Furthermore, this start site seems to have an overlap of 4 bp, indicating that it is part of an operon and thus final score and z score are irrelevant. 
/note=Gap/overlap: There is an overlap of 4 base pairs between this gene and the upstream gene. While the overlap of 4 base pairs is not conserved in members of the same cluster, an overlap in general is conserved within the cluster, as the members of subcluster CZ2 that have this gene have an overlap of 1 bp. /note=Phamerator: The gene is of pham 100740 as of 04/05/2022 at 1415 hours PST. This gene is conserved within several members of Cluster CZ, mostly in subcluster CZ2 in phages such as Attis; however, the gene also exists in members outside the cluster such as Suzy (DG1) and DirtyBoi (DB). There is no function auto-called, and none of the phages with this gene at time of recording have the function called. /note=Starterator: As of 04/01/2022, the start number with the most annotations in this pham is start number 11, which is found in 21/22 genes in the pham, annotated for 19 of 20 non-draft genomes, and is called 100% of the time when present. This gene has this start site, which corresponds to position 19866. /note=Location call: Given the existence of significant coding potential in only one reading frame, the presence of the most conserved and annotated start number in the pham, and the conservation of the overlap in genomes where this pham is present, this gene is real and starts at position 19866. Starterator, Glimmer, and GeneMark agree with this call. /note=Function call: Membrane Protein. The lack of conservation across all members of cluster CZ is indicative of a non-critical role in phage life cycle. There were no hits for domains in CDD or HHPRED or for similar proteins in NCBI Blastp. However, there were hits in Phages DB blast for other proteins, although they were listed as unknown function. The presence of 4 predicted transmembrane helices by TmHmm is sufficient evidence to assign this gene as one for a membrane protein. /note=Transmembrane domains: TmHmm predicts 4 transmembrane helices, although TOPCONS produces no hits. 
Given the proposed function as a membrane protein, this makes sense as it would require sufficient anchoring to remain bound. /note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I agree with the predicted start site and functional annotation. CDS 20316 - 20693 /gene="23" /product="gp23" /function="membrane protein" /locus tag="RavenCo17_23" /note=Original Glimmer call @bp 20316 has strength 9.24; Genemark calls start at 20316 /note=SSC: 20316-20693 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage SoilAssassin] ],,NCBI, q1:s1 100.0% 4.76421E-61 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.957, -3.1164791313608173, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage SoilAssassin] ],,YP_009303018,98.4,4.76421E-61 SIF-HHPRED: SIF-Syn: The pham called is 15784 as of 04/04/22. Pham 15784 is highly conserved and includes 28 members, and two that display a conserved genome architecture are phage Attis23 and phage Haley23. /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Glimmer and GeneMark both call a start site at position 20316. /note=Coding Potential: Both the host-trained and self-trained GeneMark indicate a significant amount of coding potential that falls completely within the auto-annotated ORF in the forward direction. /note=SD (Final) Score: The auto-annotated start site received an RBS final score of -3.116, which is the closest to zero of all potential ORFs. /note=Gap/overlap: The auto-annotated gap is 3, which falls within the probable range. All other potential ORFs would contain a gap of over two hundred nucleotides. /note=Phamerator: The pham called is 15784 as of 04/04/22. Pham 15784 is highly conserved and includes 28 members, two of which are phage Attis23 and phage Haley23. /note=Starterator: Start site 3 was manually annotated the most often in this pham, as it was called in 20/23 non-draft phage genomes in this pham and 100% of the time when present. 
This start site matches that which is called in RavenCo17, supporting the auto-annotated start site at position 20316. /note=Location call: Given the above evidence, this is a real gene and the auto-annotated start site at position 20316 is the best option listed. /note=Function call: Membrane protein. Although there is no strong evidence that supports a more specific function, as many unknown and hypothetical functions are predicted using HHpred and a PhagesDB blast, this gene does contain a transmembrane domain. Furthermore, an NCBI blast hit with acceptable evidence (high probability, coverage, and low e-values) is also predicted to be a membrane protein. /note=Transmembrane domains: There is one TMD predicted by TOPCONS or TMHMM, suggesting that this gene product is membrane associated. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I have QCed this gene, and agree with the location and function call. CDS 20690 - 21409 /gene="24" /product="gp24" /function="lysin B" /locus tag="RavenCo17_24" /note=Original Glimmer call @bp 20690 has strength 7.07; Genemark calls start at 20690 /note=SSC: 20690-21409 CP: yes SCS: both ST: SS BLAST-Start: [lysin B [Gordonia phage Eyre] ],,NCBI, q2:s4 99.5816% 2.46585E-159 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.534, -3.769471247018311, no F: lysin B SIF-BLAST: ,,[lysin B [Gordonia phage Eyre] ],,YP_009292414,94.6281,2.46585E-159 SIF-HHPRED: Gene 12 protein; alpha/beta sandwich, CELL ADHESION; 2.0A {Mycobacterium phage D29},,,3HC7_A,94.1423,99.9 SIF-Syn: The upstream gene has a function of membrane protein and its pham number is 15784. The downstream gene has a function of minor tail protein and its pham number is 99784. /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 20690. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF encompasses the gene from start site to stop. 
/note=SD (Final) Score: The best final score on PECAAN is -3.769, which corresponds with the second best Z score of 2.534. /note=Gap/overlap: There is an overlap of 4 bp, which is favorable, may be evidence of an operon, and is under the 30 bp limit for gaps. /note=Phamerator: The pham number as of 04/05/2022 is 101201. The gene is conserved; it is found in phage Lord Farquaad from cluster CZ, and in Moosehead and Faith5x5 from cluster CZ6. /note=Starterator: Start site 52 is called in 52/78 (66.7%) non-draft members of the pham. Start site 52 is the most annotated start site and it is located at 20690 bp as called by Glimmer and GeneMark. /note=Location call: Considering the evidence, this is a real gene with the start site of 20690 bp. /note=Function call: Lysin B. NCBI BLAST gives good hits with high coverage, alignment, and probability. Phagesdb Function Frequency also indicated the recurring function of Lysin B. /note=Transmembrane domains: TmHmm and Topcons provided no matches for TMHs, so this gene is not a membrane protein. /note=Secondary Annotator Name: Nguyen, Calvin /note=Secondary Annotator QC: Agree with call and annotation according to all evidence provided. Additional specific information could be provided regarding the function call (e.g. e-values, Phagesdb BLAST). Remember to fill in synteny, GM Coding Capacity, and Phamerator dropdown list. 
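The GAP values recorded for genes 21 through 24 can be reproduced from the CDS coordinates alone. The sketch below assumes the convention these notes appear to follow (gap = start - upstream stop - 1, so a negative value is an overlap in bp). The helper name `gap_bp` is hypothetical and is not part of PECAAN or DNA Master.

```python
# Forward-strand CDS coordinates (start, stop) copied from the records above.
cds = {
    20: (18932, 19447),
    21: (19444, 19869),
    22: (19866, 20312),
    23: (20316, 20693),
    24: (20690, 21409),
}


def gap_bp(upstream_stop, start):
    """Bases between two forward genes; negative means they overlap.

    Assumed convention: gap = start - upstream_stop - 1.
    """
    return start - upstream_stop - 1


# Gap of each gene relative to the gene immediately upstream of it.
gaps = {
    gene: gap_bp(cds[gene - 1][1], cds[gene][0])
    for gene in sorted(cds)
    if gene - 1 in cds
}

for gene, gap in gaps.items():
    kind = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: {gap} bp ({kind})")
```

Under this assumption the computed values agree with the GAP fields listed for genes 21 to 24 (-4, -4, 3, and -4 bp).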
CDS 21406 - 23838 /gene="25" /product="gp25" /function="minor tail protein" /locus tag="RavenCo17_25" /note=Original Glimmer call @bp 21406 has strength 8.95; Genemark calls start at 21406 /note=SSC: 21406-23838 CP: yes SCS: both ST: NI BLAST-Start: [minor tail protein [Gordonia phage Ewald]],,NCBI, q46:s105 94.4444% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.92, -4.963787326180191, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Ewald]],,QCG77038,83.6639,0.0 SIF-HHPRED: SIF-Syn: This gene displays synteny with Eviarto upstream but the genomic architecture downstream of this gene is different. This seems to be the case with several other phages in the CZ cluster. /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: Both Glimmer and GeneMark have called the start site for this gene at position 21406. /note=Coding Potential: There is coding potential present on both the Host-Trained GeneMark and Self-Trained GeneMark for this gene. /note=SD (Final) Score: The SD (final) score for this gene at start site 21406 is -4.964, which is very high and reasonable, with a GTG start codon. /note=Gap/overlap: There is an overlap of 4 base pairs, which is the smallest overlap out of the different possible start sites, so it is likely that the start site for this gene is at position 21406. /note=Phamerator: According to an analysis run on Mar 31, 2022, the pham for this gene is 99784. This phage belongs to the CZ cluster and there are other phages in this cluster (Agueybana_27, Antonio_28) that are also part of this pham. /note=Starterator: The most conserved start site is 73, which was called in 45/153 non-draft genes in this pham. However, this gene did not have that start site. The start number for Ravenco17_25 was start 23, and is the only gene in the pham with this start site. /note=Location call: The start site for this gene is likely 21406. 
/note=Function call: This gene is a minor tail protein based on evidence from PhagesDB BLAST and NCBI BLAST. /note=Transmembrane domains: Neither TmHmm nor Topcons predict any transmembrane domains for this gene, which makes sense if this gene codes for a minor tail protein. /note=Secondary Annotator Name: Reyes, Glania /note=Secondary Annotator QC: I agree with the annotation. I noticed you forgot to fill the synteny box! CDS 23835 - 24911 /gene="26" /product="gp26" /function="minor tail protein" /locus tag="RavenCo17_26" /note=Original Glimmer call @bp 23973 has strength 5.68; Genemark calls start at 23973 /note=SSC: 23835-24911 CP: yes SCS: both-cs ST: SS BLAST-Start: [minor tail protein [Gordonia phage Dogfish]],,NCBI, q3:s2 99.4413% 1.28266E-136 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.456, -5.9139150158552, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Dogfish]],,QBI97719,80.8333,1.28266E-136 SIF-HHPRED: SIF-Syn: Function of minor tail protein. The upstream gene has a function of major tail protein and its pham number is 107022. The downstream gene has a function of and its pham number is 108037. Comparing to phages DumpsterDude from Cluster DW and Meyran from Cluster DT, we find that all three of them have the same genes and the same pham numbers in the same order. /note=Primary Annotator Name: Hu,Yixiao /note=Auto-annotation: Both Glimmer and GeneMark call the start at 23973. /note=Coding Potential: The coding potential of this gene is seen in the forward strand, so it is a forward gene. The coding potential can be seen in both the Host-Trained GeneMark and Self-Trained GeneMark. /note=SD (Final) Score: -5.786 is the best final score and the corresponding z-score is 1.513, which is okay. /note=Phamerator: The pham number of the gene is 102988. The pham has 681 members in total, which belong to different clusters such as Z, B, and L. /note=Starterator: RavenCo17 does not have the most annotated start. 
Start 4 (23835) is called 4 times. /note=Location call: This gene is a real gene and its start site should be 23973. /note=Function call: In PhagesDB BLAST, the second hit, DumpsterDude, suggests a minor tail function, with a low e-value of 1e-145. In NCBI BLAST, the first four hits suggest that the gene may have a function of minor tail protein, with 64.4444%, 70%, 65.651%, and 61.5169% identity and 99.6795%, 98.3974%, 98.3974%, and 99.6795% coverage, respectively. All of them have low e-values: 2.50127e-123, 4.44102e-106, 4.68068e-95, and 1.11975e-88. In HHpred, the first hit has the function of phage_tail_collar, which corresponds to the function of minor tail protein, with a low probability of 4.2%, and CDD has no hits. Since all data related to function is relatively low, we think this gene has the function of minor tail protein. /note=Transmembrane domains: From both TMHMM and TOPCONS, we can see no transmembrane domains. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: CDS 24931 - 25293 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="RavenCo17_27" /note=Original Glimmer call @bp 24931 has strength 9.92; Genemark calls start at 24931 /note=SSC: 24931-25293 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HWC68_gp35 [Gordonia phage Gibbin] ],,NCBI, q2:s3 97.5% 1.41373E-66 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.832, -3.370572328722819, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HWC68_gp35 [Gordonia phage Gibbin] ],,YP_009852388,88.4297,1.41373E-66 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer and GeneMark both call the start at 24931. /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand, which shows that this is a forward gene. There is coding potential found in both Host GeneMark and Self GeneMark. 
Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer and GeneMark start site. /note=SD (Final) Score: The final score is the best option at -3.371 since it is the least negative and the Z-score is the highest at 2.832. /note=Gap/overlap: Upstream of the gene there is a 19 base pair gap while downstream there is a 37 base pair gap. /note=Phamerator: The pham number as of April 5, 2022 is 101660. The gene is conserved in phage Suerte, which also belongs to the CZ cluster. /note=Starterator: Start site 94 in Starterator was manually annotated in 135/529 non-draft genes in this pham. Start 94 is 24931 in RavenCo17. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the information above this is a real gene and the most likely start is 24931. /note=Function call: No known function. The top two phagesdb BLAST hits have function unknown and have an e-value of 5e-54. Most of the NCBI BLAST hits called the function as a hypothetical protein; the two selected hits have 97.5% coverage, 85.124% identity, and e-values of 1.41373e-66 and 4.52343e-66. HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Cini, Victoria /note=Secondary Annotator QC: I agree with this annotation. 
All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score.Great work!:) CDS 25331 - 27253 /gene="28" /product="gp28" /function="minor tail protein" /locus tag="RavenCo17_28" /note=Original Glimmer call @bp 25331 has strength 8.89; Genemark calls start at 25331 /note=SSC: 25331-27253 CP: yes SCS: both ST: NI BLAST-Start: [minor tail protein [Gordonia phage MagicMan]],,NCBI, q131:s124 79.6875% 0.0 GAP: 37 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.957, -2.8454123590742793, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage MagicMan]],,QDM55846,79.4629,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein. Although this gene is a member of a pham and seen in other phages; for example Fenry (CV) and DirtyBoi (DB), it does not share synteny with the upstream and downstream genes with any of the other 4 phages of this pham. For RavenCo17, both upstream and downstream genes are NKF. /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation start source: Both Glimmer and GeneMark call for the start site at 25331 /note=Coding Potential: There is evidence of coding potential for this gene in the second reading frame in the forward direction. This is seen in both GeneMark self and host. /note=SD (Final) Score: -2.845. This is the best SD score amongst start sites. It additionally has the highest Z-score at 2.957. /note=Gap/overlap: 37bp gap. This gap is reasonably sized, other start sites either incur significant overhang or an excessively large gap. /note=Phamerator: As of 3/31/2022 the pham corresponding to this gene is 102028. This is conserved in phage Fenry (CV) and phage DirtyBoi (DB) both from different clusters, but the length of these proteins is very similar- minorly different- corresponding with different start sites. /note=Starterator: Start site 13 is only found in RavenCo17, and has not been manually annotated before. 
There are only 4 other non-draft phages in this pham, and all are from different clusters. There are no manual annotations for any of the start sites present for the gene, but looking at the Z-Score, SD and gap information we can narrow down start site 13 as the most likely start site, which corresponds to a start @25331. This additionally agrees with the start sites predicted by Glimmer and GeneMark /note=Location call: Based on the evidence, this is a real gene with a most likely start site of 25331. /note=Function call: Minor tail protein. There are numerous hits on PhagesDB with good e-values for minor tail proteins. The top two have e-values of 0, and the third has an e-value of 1e-74. From NCBI Blast there are two hits for minor tail protein with an e-value of 0 as well. /note=Transmembrane domains: 0. Neither TMHMM or TOPCONS predict any TMDs. /note=Secondary Annotator Name: De Schutter, Elena /note=Secondary Annotator QC: I agree with your start site call and also function call! Just make sure you select in the drop down menu what start site you have picked with starterator (I think it should be the "Not Informative" choice for you since you didn`t have the most annotated start site), and also don`t forget to select that there was coding potential in that drop down menu. You probably didn`t fill this out because other genes weren`t done around you, but don`t forget to do the synteny box. Great job!! 
CDS 27390 - 28007 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="RavenCo17_29" /note=Original Glimmer call @bp 27390 has strength 14.2; Genemark calls start at 27390 /note=SSC: 27390-28007 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MAGICMAN_30 [Gordonia phage MagicMan]],,NCBI, q1:s1 88.2927% 4.84387E-41 GAP: 136 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.764, -3.810925907842985, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MAGICMAN_30 [Gordonia phage MagicMan]],,QDM55847,52.4752,4.84387E-41 SIF-HHPRED: SIF-Syn: NKF. No known synteny is observed in comparison with phages Attis, Bjanes, and Cynthia of Cluster CZ2. /note=Primary Annotator Name: Nguyen, Calvin /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 27390. /note=Coding Potential: According to GeneMark Host and Self, there is considerable coding potential in the region between the suggested start at 27390 and the stop site of 28007. All coding potential is contained between these two sites. /note=SD (Final) Score: The suggested start site has a final score of -3.811 and Z-score of 2.764; these are the best values of all available start sites in the gene. /note=Gap/overlap: There is a gap of 136 bp between this gene and its upstream counterpart, which is rather large. Coding potential does appear to be present in the *reverse* side of the gap; however, this is present only in GeneMark Self and is unlikely to be an actual gene due to the extreme proximity with Gene 29`s start site. The stop site for the potential ORF in the reverse direction also completely overlaps with the upstream gene. The large gap does not appear to be conserved among members of the CZ cluster, but RavenCo17 appears to be quite mosaic so this is not surprising. The start codon is ATG, which is a common start codon. /note=Phamerator: As of 3/31/22, this gene is located in Pham 2648. 
There does not seem to be many cluster members with the conserved gene. The only other phage with the same pham is MagicMan, which is in cluster DB. /note=Starterator: The only other non-draft phage called start site 2 (27390). /note=Location call: According to all available information and evidence, it appears that this gene is real and is likely to start at site 27390. /note=Function call: NKF. Only Phage MagicMan provided any significant evidence for NKF in Phagesdb BLAST with an e-value of 1e-34. HHpred was uninformative and did not yield significant results. /note=Transmembrane domains: None. According to TmHmm, there are 0 predicted TMHs, and there are no recorded hits on Topcons as well. As a result, we can conclude that there are no transmembrane domains. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. CDS 28004 - 28459 /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="RavenCo17_30" /note=Original Glimmer call @bp 28004 has strength 5.94; Genemark calls start at 28004 /note=SSC: 28004-28459 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Mycobacterium attenuatum] ],,NCBI, q30:s28 80.7947% 1.32695E-27 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.979, -4.843798539558957, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Mycobacterium attenuatum] ],,WP_122497956,53.0612,1.32695E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the same start site at 28004 bp. /note=Coding Potential: Coding potential in the ORF is substantial on the forward strand, indicating that this is a forward gene. Coding potential is found in GeneMark Self and Host, but is weaker in GeneMark Host. /note=SD (Final) Score: -4.844. 
This is not the best Final Score, but it does not matter too much because the -4 bp overlap suggests that this gene is part of an operon. Overlap is not conserved in other phage clusters. /note=Gap/overlap: -4 bp. This gene is possibly part of an operon. /note=Phamerator: The pham number as of April 7, 2022 is 2106. This gene from RavenCo17 is the only member of this pham. /note=Starterator: N/A. Orpham. /note=Location call: 28004. Based on the evidence above. Also several moderate NCBI BLAST hits to "bacterial" proteins which may be prophage genes. /note=Function call: NKF. PhagesDB BLAST has no hits. A majority of NCBI BLAST also has hits with no function, just “hypothetical protein” with small e values ranging from 1.3e-27 to 2.2e-13. HHpred shows no relevant hits (with high e-values of ~93). CDD also had no hits. /note=Transmembrane domains: TmHmm predicts no TMDs. Topcons also predicts no TMDS. This makes sense because the function for the gene has no known function. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: Given that there is coding potential for the region and the selected start site is the only start that preserves a high gene concentration by having the smallest gap, I agree with the primary annotator on the location call. Given the lack of evidence towards protein function or even similar domains in other proteins, I also agree with the primary annotator that there is No Known Function. 
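Gene lengths quoted in these notes follow from the CDS coordinates in the same way as the gaps. A minimal sketch, assuming inclusive coordinates (length = stop - start + 1) and that a complete ORF, stop codon included, is a multiple of three bases long. The helper name `orf_length` is hypothetical.

```python
# Forward-strand CDS coordinates (start, stop) for genes 27-30, from the records above.
cds = {
    27: (24931, 25293),
    28: (25331, 27253),
    29: (27390, 28007),
    30: (28004, 28459),
}


def orf_length(start, stop):
    """Length in bp of a forward ORF with inclusive coordinates (assumed convention)."""
    return stop - start + 1


lengths = {gene: orf_length(start, stop) for gene, (start, stop) in cds.items()}

for gene, length in lengths.items():
    # A complete ORF should contain a whole number of codons.
    status = "whole codons" if length % 3 == 0 else "NOT a multiple of 3"
    print(f"gene {gene}: {length} bp, {length // 3} codons incl. stop ({status})")
```

A length that is not divisible by three would usually point to a mistyped coordinate rather than a biological feature, so a check like this may be useful before submitting the final file.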
CDS 28473 - 29000 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="RavenCo17_31" /note=Original Glimmer call @bp 28590 has strength 7.09; Genemark calls start at 28590 /note=SSC: 28473-29000 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein [Mycobacterium sp.]],,NCBI, q1:s1 100.0% 2.5151E-55 GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.687, -6.966359958041252, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Mycobacterium sp.]],,MBV8178792,69.3182,2.5151E-55 SIF-HHPRED: SIF-Syn: /note=Brief notes summary/final call by instructor: /note=CP present on GM host and self (more on self). Glimmer/Genemark both call 28590, which leaves large gap, but SD score is slightly better with 28590 than 28473. However, 28473 gives longer ORF and encapsulates more coding potential as seen on GM-self. Chose 28473. /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Both Genemark and Glimmer call 28590 as the start site, with a Glimmer score of 7.09, which is high. The start codon is ATG, which is a common start codon and good evidence that this is the correct start site. /note=Coding Potential: The Genemark self graph shows coding potential in the region of this gene, as shown by the line and the dark, bold line at the bottom axis showing coding potential. In the host trained graph, there is limited coding potential in the second half of the gene. However, with the self trained graph showing a good output, this is evidence of a real gene. /note=SD (Final) Score: The final score is -6.352, which is the fourth best final score of all the possible start sites, thus this is a good call for the start site. /note=Gap/overlap: The gap of the gene is 130, which is large but typically not large enough for a gene to be filled in, thus this is a good call for the start site to be 28590. A reason this could be a large gap is a switch in orientation from forward to reverse downstream of the gene. 
/note=Phamerator: The pham number for RavenCo17 is 7720. It is the only member in this pham, making it an orpham. Because of this, this RavenCo17 gene cannot be compared to any other phage. It is in Cluster CZ and has a length of 411bp. /note=Starterator: There is no starterator report for this gene. /note=Location call: Based on the final score of -6.352 and a gap of 130bp, the correct start site is 28590. /note=Function call: The proposed function of this RavenCo17 gene is a baseplate J protein. This is called by the Phagesdb Function Frequency, and it is always called as this but for a different cluster (AO). This can be compared to phage Brent, which also has this function but has an e value of 2.3, which is not a good e value. Another proposed function as called by HHPRED is an endoribonuclease, but this call also has a very high e value of 2.8. Because of these poor e-values, the function cannot be confidently called; it may be a baseplate J protein, but is most likely NKF. /note=Transmembrane domains: There are no transmembrane domains called by TmHmm or Topcons. /note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: I agree with the predicted start site and manually annotated functional call. A factor that could be contributing to the large gap and final score could be the switch from F-R immediately downstream of this gene. CDS complement (29012 - 29179) /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="RavenCo17_32" /note=Original Glimmer call @bp 29179 has strength 7.33; Genemark calls start at 29179 /note=SSC: 29179-29012 CP: no SCS: both ST: NA BLAST-Start: GAP: 121 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.265, -6.574698744401119, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF; upstream gene is an orpham with NKF (pham 54438) and downstream gene is also an orpham with NKF (pham 7720). 
There is no synteny observed with other cluster CZ phages, however, it's important to note that according to SEAPHAGES, "Cluster CZ phages are very plastic in the middle of the genome, near the integration cassette." This means that it's not abnormal to observe orphams or wild gene patterns mid-genome for phages in this cluster like RavenCo17. /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call a GTG start site at 29179. /note=Coding Potential: The gene has coding potential predicted within the putative ORF. In the fourth panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a reverse gene. We observe coding potential throughout most of the gene sequence, however, the chosen start site does not capture all the coding potential. /note=SD (Final) Score: The start site of 29179 has one of the worst final scores (-6.575) and z-scores (1.265) compared to the other potential start sites on PECAAN. However, it has the smallest gap size of 121bp, compared to the other start sites which have gap sizes >= 157bp. /note=Gap/overlap: 121bp gap with the upstream gene. This gap size is relatively large, however, all the other potential start sites have larger gap sizes of >= 157bp. /note=Phamerator: Pham 722 has 1 member with 0 non-draft members (04/01/22). Gene is an orpham. /note=Starterator: Not applicable because gene is an orpham (reports are not generated for phams with only one member as there is nothing to compare it to). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 29179. /note=Function call: NKF. No relevant Phagesdb Blast or HHpred hits (all e-values > 1e-7). No CDD or NCBI Blast hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Mascareno, Greta /note=Secondary Annotator QC: I agree with the start site and function call of this gene. I think it would be beneficial to look into the coding potential of this large gap and include the findings in these notes. Because this gene is in an orpham, we cannot compare to see if this gap is conserved so focus on the coding potential and add that to the Gap/Overlap section. CDS complement (29301 - 29396) /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="RavenCo17_33" /note=Genemark calls start at 29396 /note=SSC: 29396-29301 CP: yes SCS: genemark ST: NA BLAST-Start: [hypothetical protein SEA_BEGONIA_36 [Gordonia phage Begonia]],,NCBI, q1:s45 100.0% 2.21797E-11 GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.604, -3.83711508627892, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEGONIA_36 [Gordonia phage Begonia]],,QDF16209,38.961,2.21797E-11 SIF-HHPRED: SIF-Syn: There is no relevant synteny to speak of since this gene is part of an orpham. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Only GeneMark calls the gene at a start site of 29396. Glimmer does not have an annotation. /note=Coding Potential: Coding potential is a little bit strange in this situation because there is some present but it definitely does not cover the whole gene. It only slightly covers the start site. /note=SD (Final) Score: The final score is -3.837 and the z score is 2.604. These are good values and the best option, as only one z score is slightly better (2.774) but our initial call has the longest open reading frame. The gene is only 96bp long but Dr. Freise has confirmed that it is still possible to have a gene this small. /note=Gap/overlap: The gap is 180bp. While this is a very large gap, it lies between a forward and a reverse gene, where a large gap makes sense (and is expected). 
/note=Phamerator: This is an orpham (54438 as of 04/05/22) /note=Starterator: This gene is an orpham so there is no starterator. /note=Location call: The start site of this gene is called to be 29396 as this seems to be the best option available. Calling this a real gene was tricky since looking at pham maps does not help us determine whether this gene is real or not since it is not conserved at all throughout the CZ cluster. After seeking help from Professor Ana, Dr. Freise, and Michelle, we are calling this a real gene. In addition to that, according to SeaPhages “Cluster CZ phages are very plastic in the middle of the genome, near the integration cassette.” (thank you Dr. Freise for that research!). /note=Function call: Unknown function, the top hits on phagesDB all have unknown function (2e-11). HHPRED did not have any relevant hits, CDD did not have any hits at all. NCBI did have good hits with a good e-value (lower than 10e-4) and 100% coverage, but all of these hits were for hypothetical proteins. /note=Transmembrane domains: There were no transmembrane domains and thus it is not a membrane protein. /note=Secondary Annotator Name: Mascareno, Greta /note=Secondary Annotator QC: I agree with this start site and function call. Although the evidence is lacking, I agree with our professors that the best calls have been made for this gene. CDS complement (29440 - 29529) /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="RavenCo17_34" /note= /note=SSC: 29529-29440 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_JALAMMAH_36 [Gordonia phage Jalammah]],,NCBI, q1:s1 72.4138% 3.61542E-4 GAP: 47 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.738, -6.163879819553072, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JALAMMAH_36 [Gordonia phage Jalammah]],,QWY84303,28.0,3.61542E-4 SIF-HHPRED: SIF-Syn: /note=Manually added gene to fill gap. There is a small amount of CP on both genemark host and genemark self. 
Unsure if real, but CP seemed convincing enough and 90bp length is unusually small but still sufficient. Leaning toward not real, but would like QC to take a second look. - Amanda Freise CDS 29577 - 29801 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="RavenCo17_35" /note=Original Glimmer call @bp 29577 has strength 5.98; Genemark calls start at 29577 /note=SSC: 29577-29801 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OPIE_36 [Gordonia phage Opie]],,NCBI, q1:s42 100.0% 6.22569E-33 GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.357, -4.071022498715144, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OPIE_36 [Gordonia phage Opie]],,QHB37916,53.0435,6.22569E-33 SIF-HHPRED: SIF-Syn: My gene is a NKF. Upstream gene is NKF and downstream is helix turn helix DNA binding domain. When comparing my sequence to Opie in Cluster DB and Marteena in Cluster CY1, the same gene is a NKF. Upstream and downstream is NKF. /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: Glimmer and Genemark call the start site as 29577 and they both agree. The start codon is ATG that is called. /note=Coding Potential: The coding potential is very weak for this open reading frame. The graph goes up and down quite a bit within the open reading frame. /note=SD (Final) Score: -4.071 is the final score. Out of all the scores, this is the second least negative. The least negative value has a really large gap of 297 nucleotides which makes me think that the final score of -4.071 is the best one. /note=Gap/overlap: 180 is the gap. However, even within this open reading frame, there is a huge gap of 180 nucleotides but this can be explained by the fact that the previous open reading frame is in reverse while this one is forward. I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is the LORF. 
The length of the gene is acceptable being around 225 base pairs long. /note=Phamerator: Pham 10854. 04/05/2022. I can see that the function shown is unknown and that the length of the gene is roughly 230 which is conserved throughout each sequence. There are a variety of clusters present within this pham. The gene is conserved within other members of the pham and I compared them to Malibo_31 and Opie_36. /note=Starterator: Pham number 10854 has 5 members, 1 is a draft. This includes Genes that do not have the "Most Annotated" start for RavenCo17. Start: 14 and has no MA's which is enough evidence that 29577 is the start site. Called 50.0% of time when present which is enough evidence for this being a real gene. /note=Location call: I believe this gene is a real gene given all the evidence presented above. The best start site is 29577 for this gene. /note=Function call: In Phagesdb BLAST, there are other phages with high E values but also have no known function. In HHPRED, there are many phages showing hypothetical functions, but e values are horrible which means I did not consider this for function. In NCBI BLAST, the E values are decent but all of them are hypothetical proteins. This did not help me determine function. /note=Transmembrane domains: No transmembrane domains. Neither TMHMM nor TOPCONS predicted any TMDs, which did not help me figure out the function of my gene. /note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: While there is a large gap for this gene, I agree that it is real and that the start site is at 29577. I also agree that there is no known function for this gene. 
CDS complement (29944 - 30180) /gene="36" /product="gp36" /function="helix-turn-helix DNA binding domain" /locus tag="RavenCo17_36" /note=Original Glimmer call @bp 30138 has strength 5.3; Genemark calls start at 30180 /note=SSC: 30180-29944 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_NYMPHADORA_34 [Gordonia phage Nymphadora] ],,NCBI, q1:s1 100.0% 2.91532E-47 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.821, -5.738304291908741, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[hypothetical protein SEA_NYMPHADORA_34 [Gordonia phage Nymphadora] ],,YP_009286079,98.7179,2.91532E-47 SIF-HHPRED: Putative DNA-binding protein; BldC, S. coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_D,85.8974,98.8 SIF-Syn: While the genes on either side of Stop29944R have no functions annotated as of 04/05/2022 at 2220 hours PST, in all non-draft genomes of subcluster CZ1 that have this gene, genes belonging to the phams 20963 and 5975 are also present upstream in that order from this gene. Additionally, in the case of at least phages BatStarr, Bosnia, and Nymphadora, this set of phams (102108, 20963, and 5975) is flanked by directional gene switches. It should be noted that while this gene does appear occasionally in Subcluster CZ2, such as in phage Cynthia, it does not maintain synteny when present, and the gene is not found at all within Cluster CZ outside of subclusters CZ1, CZ2, and CZ8. /note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Glimmer and GeneMark disagree on the start site. Glimmer predicts the start to be at 30138 and GeneMark predicts the start to be at 30180. Of the two, GeneMark predicts the start to be at the Longest Open Reading Frame. /note=Coding Potential: According to the host-trained GeneMark, only the start site at 30180 encompasses the entirety of the coding potential. 
According to the Self-Trained GeneMark, both start sites have coding potential. In either case, there is only one reading frame on either strand to have coding potential in the region bounded by the stop site and the start site candidates. /note=SD (Final) Score: Start site 30180 has the highest final score and z-score at -5.738 and 1.821 respectively. However, this start site also implies an operon so these scores are irrelevant. /note=Gap/overlap: For the start site at position 30180, there is an overlap of 14 base pairs. This overlap is found in all members of Cluster CZ with this start site manually annotated (such as Antonio, BatStarr, and Nymphadora), and in several manual annotations of members outside Cluster CZ such as Asapag (DN1), BeeGee (CY), Sleepyhead (singleton), and Whack (singleton). /note=Phamerator: The pham for this gene as of 04/04/2022 at 2025 hours PST is 102108. This pham is conserved in several clusters, principally clusters B, CZ, DN, and DR, with a presence in a member of several other clusters. This gene can be found in Colbert (B), Cynthia (CZ), BENtherdunthat (DN), and CaiB (DR). There was no function auto-called, but with the exception of Cynthia, all of the listed phages above have this gene annotated as a “Helix-turn-helix DNA binding protein”, whereas Cynthia has this gene annotated as an excise protein, presumably due to more evidence in phage Cynthia of this specific function. /note=Starterator: As of 04/01/2022, the start number with the most annotations was start 45, which is found in 258 of 481 genes in the pham, has 234 manual annotations with this start number, and is called in 99.6% of the genes it is found in. However, this gene does not have this start number. Starterator auto-calls the start number to be start 77, which corresponds to position 30138 in RavenCo17, and has 3 manual annotations, is found in 91 of 481 genes in the pham, and is called 8.8% of the time when present. 
However, the 3 genes in the pham that have start number 77 manually annotated are of Cluster DN such as Apricot, BENtherdunthat, and Phistory, and the other 4 genes of the pham that call this start site are draft genomes. All the other phages of Cluster CZ that call this start number are draft genomes (Eudoria, Manasvini). The more likely start number candidate is start number 53, which is found in 41 of 481 genes in the pham, has 27 of 423 manual annotations, and is called 75.6% of the time when present. Furthermore, all members of Cluster CZ, such as Antonio, BatStarr, and Nymphadora, that call this start site are manually annotated. In RavenCo17, start site 53 corresponds to position 30180. /note=Location call: Based on the coverage of coding potential in the host-trained and self-trained GeneMarks and the conservation of the overlap in genes in the pham within and without the cluster, this gene is real and starts at position 30180. GeneMark agrees with this location call, whereas Glimmer and Starterator disagree. The start codon using this start site is an ATG. /note=Function Call: helix-turn-helix DNA binding domain. Blast Results from Phages DB Blast and NCBI Blast indicate high similarity to other phage proteins with functions listed as helix-turn-helix, while a hit from HHPRED gives evidence that the protein is DNA binding. Very weak evidence (accession cd00093) from CDD likens the protein to Helix-Turn-Helix XRE family of proteins, but with poor identity, alignment, coverage, and e-value. /note=Transmembrane domains: No transmembrane domains predicted by TmHmm or TOPCONS. 
/note=Secondary Annotator Name: Hu, Yixiao (Sherry) /note=Secondary Annotator QC:Hu, Yixiao (Sherry), Yes CDS complement (30167 - 30415) /gene="37" /product="gp37" /function="membrane protein" /locus tag="RavenCo17_37" /note=Original Glimmer call @bp 30337 has strength 5.84; Genemark calls start at 30337 /note=SSC: 30415-30167 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_KITA_37 [Gordonia phage Kita] ],,NCBI, q1:s1 100.0% 8.52558E-48 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.612, -3.469401707954294, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_KITA_37 [Gordonia phage Kita] ],,YP_009301398,98.7805,8.52558E-48 SIF-HHPRED: SIF-Syn: The pham called is 20963 as of 04/06/22. Pham 20963 is highly conserved and includes 36 members, and two that contain the same architecture are phage Kita and phage Polly. /note=Note: tricky start site, but chosen start site (30415) is also chosen by CZ1 phages Kita and Maridalia. 30415 also results in likely operon with another membrane protein. /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Glimmer and GeneMark both call a start site at position 30337. /note=Coding Potential: Both the host and self trained GeneMark indicate a significant amount of coding potential within the auto-annotated ORF, but there also appears to be coding potential in the self trained GeneMark (not in host trained) that extends slightly past position 30400. /note=SD (Final) Score: The auto-annotated start site received an RBS final score of -6.075, which is not the closest to zero of all potential ORFs. One potential ORF with a start site of 30415 has a final score of -3.469, which is favorable compared to the auto-annotated start. /note=Gap/overlap: The auto-annotated gap is 74, which does not fall within the probable range if the gene that is transcribed before it is indeed a reverse gene. 
However, the previously mentioned ORF with a start site of 30415 has a gap (overlap) of -4 nucleotides with the previous gene, a probable feature of operon contained genes. /note=Phamerator: The pham called is 20963 as of 04/06/22. Pham 20963 is highly conserved and includes 36 members, two of which are phage Kita and phage Polly. /note=Starterator: Start site 3 was manually annotated the most often in this pham, as it was called in 36/36 (100%) non-draft phage genomes in this pham and manually annotated 85.1% of the time when present. This start site matches that which is called in RavenCo17, supporting the auto-annotated start site at position 30337. Start site 30415 corresponds to start site 1, which is only found in 13/36 (36.1%) genes in this pham and manually annotated 23.1% of the time when present. /note=Location call: Given the above evidence, this is a real reverse gene. However, there is conflicting evidence supporting both start site 30337 and start site 30415. Because the auto-annotated start site does not include all coding potential indicated in the self-trained GeneMark and because start site 30415 has a -4 nucleotide overlap with the previous gene, I believe the start site needs to be changed. /note=Function call: Membrane Protein. There is not much evidence that supports a more specific function, as many unknown and hypothetical functions are predicted using HHpred and a PhagesDB blast. However, this gene does contain two transmembrane domains predicted by TMHMM and TOPCONS. Furthermore, an NCBI blast hit with acceptable evidence (high probability, coverage, and low e-values) is also predicted to be a membrane protein. /note=Transmembrane domains: There are two TMDs predicted by TOPCONS and TMHMM, suggesting that this gene product is membrane associated. 
/note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I agree with the functional annotation CDS complement (30412 - 30576) /gene="38" /product="gp38" /function="membrane protein" /locus tag="RavenCo17_38" /note=Original Glimmer call @bp 30546 has strength 5.83; Genemark calls start at 30501 /note=SSC: 30576-30412 CP: yes SCS: both-cs ST: NI BLAST-Start: [membrane protein [Gordonia phage Neobush]],,NCBI, q1:s1 100.0% 6.85697E-25 GAP: 69 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.887, -5.604302664956637, no F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Neobush]],,QUE26322,88.8889,6.85697E-25 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer calls the start site at 30546 and GeneMark calls the start at 30501. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF encompasses the gene from start site to stop. /note=SD (Final) Score: The best final score on PECAAN is -3.491 which corresponds to the best Z score of 2.774, however the gap at this start is 144 bp. Although a gap of at least 50 bp is required since the next gene switches to the forward direction, this gap is too large. The next best start site is at 30576 which has a final score of -5.604 and Z score of 1.887. /note=Gap/overlap: The start site of 30576 has a gap of 69 bp which aligns with the need for at least 50 bp between the next gene that switches to the forward direction. /note=Phamerator: The pham number as of 04/05/2022 is 5975. The gene is conserved; it is found in phage Antonio, BatStarr and Bosnia from cluster CZ1. /note=Starterator: Start site 19 is called as the start site for this gene however it only has two manual annotations and it has a bigger gap and worse scores than start site 18 at 30576 bp. This start site has been manually annotated 6 times. Start site 20 has also been manually annotated 6 times but it is not available for this gene. 
The start site chosen based on the evidence is start site 18 at 30576 bp. This start site was not called by Glimmer or GeneMark. /note=Location call: Considering the evidence, this is a real gene with the start site of 30576 bp. /note=Function call: Although most evidence on HHPRED and Phagesdb BLAST indicates that the function of this gene is unknown, NCBI BLAST gives evidence for membrane protein with 87% identity, 88.8% alignment and 100% coverage. There are also hits in TmHmm and Topcons so this may be a membrane protein. /note=Transmembrane domains: TmHmm provides 1 hit and Topcons also showed evidence that this gene may be a membrane protein. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I have QCed this gene and agree with the location and function calls. For transmembrane domains, TmHmm and Topcons don't provide "hits" per se, they predict TMDs. Both predict this protein has 1 TMD, sufficient evidence to suggest this is a membrane protein. For location call, also mention the gene length, real genes are almost always 120bp or longer, so that eliminates most potential start sites, and gives good evidence that the selected start site is the best call. CDS 30646 - 30777 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="RavenCo17_39" /note=Original Glimmer call @bp 30646 has strength 2.17 /note=SSC: 30646-30777 CP: no SCS: glimmer ST: NI BLAST-Start: [hypothetical protein SEA_EMSQUAREDA_34 [Gordonia phage EMsquaredA] ],,NCBI, q7:s21 86.0465% 3.14538E-15 GAP: 69 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.229, -5.080948913705638, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMSQUAREDA_34 [Gordonia phage EMsquaredA] ],,QAY17637,61.4035,3.14538E-15 SIF-HHPRED: SIF-Syn: /note=Instructor note: tricky gene. Does have other real members in pham, but is placed between two reverse genes which leaves limited room for changes in transcription direction. 
Included as real gene based on other pham members. Chose start 30646 based on SD score + gap of 69 to leave more room for direction change. /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: The auto-annotated start site for this gene is 30646 according to Glimmer. There is no start site predicted by GeneMark. /note=Coding Potential: There is no coding potential for this gene at start site 30646 according to the host-trained GeneMark. There is a bit of coding potential according to the self-trained GeneMark. /note=SD (Final) Score: The SD (final) score for this start site is -5.081, which is the second highest score. This start site has an ATG start codon. /note=Gap/overlap: There is a gap of 99 base pairs at this start site. /note=Phamerator: According to an analysis run on Mar 31, 2022, this gene belongs to pham 6422. There are 7 members in this pham, and RavenCo17 is the only phage from the CZ cluster that is part of this pham. /note=Starterator: Pham 6422 has 7 members, two of which are drafts. The most annotated start site is 8, called in 3 of 5 non-draft genes for this pham. The start number for this gene is start 9, and is the only gene in this pham with this start site. There are no manual annotations for this gene at this start site. The information provided by starterator was not informative. /note=Location call: Based on the evidence above, this is not a real gene. /note=Function call: There is no known function for this gene according to PhagesDB BLAST, which makes sense since it does not seem like this gene is real. /note=Transmembrane domains: There are no transmembrane domains predicted for this gene since this gene does not appear to be real. /note=Secondary Annotator Name: Nguyen, Calvin /note=Secondary Annotator QC: Agree with call that this gene does not appear to be real, can include further evidence and reasoning of NR Gene such as length, gap, etc. 
Mark GM coding capacity dropdown and make note of the non-real gene in the synteny box. CDS complement (30841 - 31404) /gene="40" /product="gp40" /function="lipoprotein" /locus tag="RavenCo17_40" /note=Original Glimmer call @bp 31404 has strength 12.96; Genemark calls start at 31404 /note=SSC: 31404-30841 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BIZ73_gp34 [Gordonia phage Eyre] ],,NCBI, q6:s5 97.3262% 2.55131E-125 GAP: 388 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.955, -6.857740919865087, no F: lipoprotein SIF-BLAST: ,,[hypothetical protein BIZ73_gp34 [Gordonia phage Eyre] ],,YP_009292425,95.6989,2.55131E-125 SIF-HHPRED: Uncharacterized lipoprotein yjhA; YJHA_BACSU, yjhA, lipoprotein, SR562, NESG, Structural Genomics, PSI-2, Protein Structure Initiative, Northeast Structural Genomics Consortium, Membrane; HET: MSE; 2.4A {Bacillus subtilis},,,3CFU_B,72.7273,99.0 SIF-Syn: The downstream gene has a function of tyrosine integrase. The upstream gene has no known function. Comparing to phages Angelique from Cluster CY1 and Cynthia from Cluster CZ2, we find that all three of them have the same genes and same Pham number within the same order. /note=Primary Annotator Name: Hu, Yixiao /note=Auto-annotation: Both the Glimmer start and GeneMark start are 31404. /note=Coding Potential: Coding potential can be seen in both the Host-Trained GeneMark and the Self-Trained GeneMark. /note=SD (Final) Score: -3.595 is the best final score and the corresponding z-score is 2.722, which is also the best z-score. /note=Gap/overlap: There’s a gap of 13 upstream (left) and a huge gap downstream (to the right); however, there is a change in direction to forward, and this gap is maintained in phage Bjanes7. /note=Starterator: Start 2 (31404) is called 22/22 times including in RavenCo17. /note=Location call: This gene is a real gene and its start site should be 31404. /note=Function call: In phages BLAST, the first hit Eyre has function of lipoprotein, with a low e-value of 1e-100. 
In NCBI Blast, the first hit suggests that the gene is a hypothetical protein (or might be a lipoprotein as well), with a 94.6237% identity, 100% coverage, and a low e-value of 2.55131e-125. The second NCBI hit also suggests that the gene may be a lipoprotein, with an 88.2353% identity, 99% coverage, and a low e-value of 3.60711e-100. In HHpred, the sixth hit states the gene is lipoprotein, with a high probability of 99% and low e-value of 1.3e-7. CDD has no hits. /note=Transmembrane domains: From both TMHMM and TOPCONS, we can see no transmembrane domains. /note=Secondary Annotator Name: Reyes, Glania /note=Secondary Annotator QC: I agree with this annotation. CDS 31793 - 32899 /gene="41" /product="gp41" /function="tyrosine integrase" /locus tag="RavenCo17_41" /note=Original Glimmer call @bp 31793 has strength 9.53; Genemark calls start at 31793 /note=SSC: 31793-32899 CP: yes SCS: both ST: SS BLAST-Start: [tyrosine integrase [Gordonia phage BeeGee]],,NCBI, q1:s1 100.0% 0.0 GAP: 388 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.425, -4.759808411204065, no F: tyrosine integrase SIF-BLAST: ,,[tyrosine integrase [Gordonia phage BeeGee]],,QTF81745,98.913,0.0 SIF-HHPRED: Integrase; Integrase, tyrosine recombinase, integration, site-specific recombination, hydrolase; 1.9A {Enterobacteria phage P2} SCOP: d.163.1.1,,,5C6K_B,81.7935,100.0 SIF-Syn: /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer and GeneMark both call the start at 31793 /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand which shows that this is a forward gene. There is coding potential found in both Host GeneMark and Self GeneMark. Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer and GeneMark start site. /note=SD (Final) Score: The most reasonable final score is -4.760 since it's among the least negative and the most reasonable Z-score is among the highest at 2.425. 
/note=Gap/overlap: Upstream of the gene there is a 388 base pair gap while downstream there is a 2,238 base pair gap since the gene following this one was removed and therefore caused a bigger gap. /note=Phamerator: The pham number as of April 6, 2022 is 102798. The gene is conserved in phages Hortense and Howe that also belong to cluster CZ. /note=Starterator: Start site 100 in Starterator was manually annotated in 221/362 non-draft genes in this pham. Start 100 is 31793 in RavenCo17. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the information above this is a real gene and the most likely start is 31793. /note=Function call: Tyrosine integrase. The top two phagesdb BLAST hits have the function of tyrosine integrase and have an e-value of 0 which is great. Some of the NCBI BLAST hits called the function as a tyrosine integrase, the two selected hits have 100% coverage, 97.8261% identity, and an e-value of 0. HHpred had one hit for tyrosine integrase which had a 100% probability, 81.7935% coverage, and an e-value of 4e-30. CDD also had two hits that had 34.6939% and 26.0355% identity, 45.5543% and 43.75% coverage and e-values of 2.70207e-38 and 4.01057e-22. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: I agree with the primary annotator that the function of this gene is a tyrosine integrase. CDS complement (33591 - 34553) /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="RavenCo17_42" /note=Original Glimmer call @bp 34553 has strength 13.95; Genemark calls start at 34553 /note=SSC: 34553-33591 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Rhodococcus sp. 
B50] ],,NCBI, q1:s1 97.5% 6.85317E-161 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.957, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Rhodococcus sp. B50] ],,WP_213930472,87.1795,6.85317E-161 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nguyen, Calvin /note=Auto-annotation: Both Glimmer and GeneMark call the start site of the reverse-coding gene at 34553. /note=Coding Potential: Both GeneMark Host and GeneMark Self show prominent coding potential at the suggested start site of 34553 and the stop site of 33591. GeneMark Self appears to have greater amounts of coding potential than the GeneMark Host, but both are visible and substantial. /note=SD (Final) Score: The Final Score of the start site is -2.845, and the Z-score of the start site is 2.957. The Z-score is the highest out of all available start sites, and the final RBS score is also the highest out of all available sites. /note=Gap/overlap: Overlap of suggested start site is -1 bp, which is reasonable. The starting codon is ATG, which is common and could also indicate the presence of an operon. Synteny does not appear to be present in this gene, and genome architecture is not shared amongst phages of different subclusters (e.g. phage Polly, Suscepit, and TimTam of CZ1). /note=Phamerator: As of 4/7/22, this gene is located in Pham 15205. This gene appears to be an orpham as it is currently the only gene in its pham; thus, the gene does not appear to be conserved within other phages. /note=Starterator: Starterator is not usable in this gene, as it is the only one in its pham. Thus, it is not informative and not available. /note=Location call: According to all available evidence (large gene, substantial coding potential), this gene appears to be real with a start site of 34553 as predicted by Glimmer and GeneMark. It should be noted that there is a large gap downstream before the transition from forward coding to reverse coding. /note=Function call: NKF. 
PhagesDB BLAST did not result in any significant hits; the highest result was Powerball, which was a tape measure protein with a weak (non-significant) e-value of 0.27. HHPred additionally provided no evidence of a gene function, leading the gene to be considered NKF as of the time of writing. /note=Transmembrane domains: TmHmm did not predict any TMHs in this gene, and Topcons provided no hits for TMHs as well; thus, there is no evidence that this gene has any transmembrane domains. /note=Secondary Annotator Name: De Schutter, Elena /note=Secondary Annotator QC: I agree with both the location and function call. I also think the large gap is probably due to the switch from forward to reverse so its presence would make sense! Great notes. CDS complement (34553 - 35137) /gene="43" /product="gp43" /function="immunity repressor" /locus tag="RavenCo17_43" /note=Original Glimmer call @bp 35137 has strength 11.67; Genemark calls start at 35137 /note=SSC: 35137-34553 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMSQUAREDA_43 [Gordonia phage EMsquaredA] ],,NCBI, q1:s1 94.8454% 1.57225E-126 GAP: 221 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.229, -5.353489640146256, no F: immunity repressor SIF-BLAST: ,,[hypothetical protein SEA_EMSQUAREDA_43 [Gordonia phage EMsquaredA] ],,QAY17646,93.8462,1.57225E-126 SIF-HHPRED: a.35.1.3 (A:1-68) SinR repressor, DNA-binding domain {Bacillus subtilis [TaxId: 1423]},,,d1b0na2,35.0515,98.3 SIF-Syn: /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the same start site at 35137 bp. /note=Coding Potential: Coding potential in the ORF is substantial on the reverse strand, indicating that this is a reverse gene. Coding potential is found in GeneMark Self and Host. /note=SD (Final) Score: -5.353. This is not the best final score, but the z-score is 2.229, which is a good score. /note=Gap/overlap: 221. This is a large gap, larger than the recommended 50 bp. 
The pham maps show no missing gene. /note=Phamerator: The pham number as of April 8, 2022 is 92936. The gene is conserved in phages Marteena, Horus, and Easley. The function call for all these genes is an immunity repressor. /note=Starterator: There are 7 members in this pham, 2 of which are drafts. Start 6 is found in 7/7 genes in the pham, which correlates to a start site of 35137 bp for RavenCo17. There are also 4 manual annotations of this start site. This evidence agrees with Glimmer and GeneMark calling 35137 bp as the start site. /note=Location call: Considering the above evidence, this gene is a real gene and has a start site at 35137 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Immunity Repressor. Multiple top phagesDB BLAST hits have the suggested function immunity repressor, with very small e-values of ~2e-98. Multiple NCBI BLAST hits also have the function immunity repressor, with small e-values ranging from 9e-29 to 9e-20. HHpred shows multiple hits to repressors (immunity repressor not specified), with small e-values of ~0.00002. CDD had no hits. /note=Transmembrane domains: TmHmm predicts no TMDs. Topcons also predicts no TMDs. This makes sense because the function for the gene is an immunity repressor. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I agree with this annotation. 
All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. Maybe add more to the synteny. CDS 35359 - 35661 /gene="44" /product="gp44" /function="helix-turn-helix DNA binding domain" /locus tag="RavenCo17_44" /note=Original Glimmer call @bp 35359 has strength 8.41; Genemark calls start at 35353 /note=SSC: 35359-35661 CP: yes SCS: both-gl ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Horus] ],,NCBI, q1:s3 95.0% 1.56176E-53 GAP: 221 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.415, -3.9531326986493704, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Horus] ],,YP_009808292,85.5769,1.56176E-53 SIF-HHPRED: Endothelial differentiation-related factor 1; EDF1, HMBF1alpha, helix-turn-helix, Structural Genomics, NPPSFA, National Project on Protein Structural and Functional Analyses, RIKEN Structural; NMR {Homo sapiens} SCOP: l.1.1.1, a.35.1.12,,,1X57_A,73.0,99.1 SIF-Syn: /note=Notes on start site: Diverse pham, and I ended up disagreeing with the Starterator most-annotated call, 35395. Start @35395 has 13/38 manual annotations, but start 35359 has much better RBS score + closes gap (and is manually annotated 3/38 times). Ended up choosing 35359. -AF /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Genemark and Glimmer called different start sites, 35353 and 35359, respectively. The Glimmer score is 8.41 and the start codon is ATG, which is common, so the likely start site is 35359. /note=Coding Potential: There is very strong coding potential on the host trained graph with the start site being 35359 and the stop site being 35661. The self trained graph also has very strong coding potential throughout the gene, but starts to drop near the end of the gene. This is evidence that the start site is likely 35359. 
/note=SD (Final) Score: The final score is -3.953, which is the least negative value for all of the potential start sites, which is evidence that 35359 is the best start site. /note=Gap/overlap: The gap for this gene is 221bp, which is large and could indicate that there is a gene missing upstream. However, when shown on Pham maps, though the genes do not overlap with phage Sidious, they both have the same layout of genes, and there is also a gap with very similar length in Sidious, thus showing this could be a part of the normal genome. Also, the gene preceding this one is on the reverse strand, so significant space would be needed between it and this gene on the forward strand. There is no coding potential up or downstream. /note=Phamerator: This pham: majority of members in cluster CY, while RavenCo17 is in CZ. The phages in this pham all have genes of length around 303bp, including RavenCo17. The RavenCo17 gene was compared to genes in phages Che9d and ThetaBob, both in cluster F, which have the same gene length and the function helix-turn-helix DNA binding protein. /note=Location call: Based on the guiding principles met, including the least negative final score and Starterator giving strong evidence of the start site, the start site must be 35359. /note=Function call: PhagesDB Function Frequency called the function to be helix-turn-helix DNA binding protein. This function is found in the pham 14011, along with many other phams that are similar to RavenCo17. BLAST compared RavenCo17 to ThetaBob which also showed a helix-turn-helix domain, with an e-value of 6e-23. HHPRED also showed a hit with helix-turn-helix DNA binding protein with a 99.1% probability and an e-value of 3.7e-8. Based on the evidence provided, the function call of this gene is a helix-turn-helix DNA binding protein. /note=Transmembrane domains: TmHmm and Topcons both do not call the gene to be a transmembrane protein. 
/note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: I agree with the function and location call. CDS 35658 - 35978 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="RavenCo17_45" /note=Original Glimmer call @bp 35658 has strength 6.64; Genemark calls start at 35658 /note=SSC: 35658-35978 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMSQUAREDA_45 [Gordonia phage EMsquaredA] ],,NCBI, q1:s1 100.0% 7.72239E-61 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.114, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMSQUAREDA_45 [Gordonia phage EMsquaredA] ],,QAY17648,89.6226,7.72239E-61 SIF-HHPRED: SIF-Syn: NKF; upstream gene is helix-turn-helix DNA binding domain (pham 14011) and downstream gene is NKF (pham 102382). We observe some synteny with cluster CY phages EMsquaredA and Marteena, which have upstream gene with same pham of 14011, although it has excise function, and downstream gene with NKF and pham of 102382. /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call a GTG start site at 35658. /note=Coding Potential: The gene has high coding potential predicted within the putative ORF. In the third panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe coding potential throughout the entirety of the gene sequence, and the chosen start site captures all the coding potential. /note=SD (Final) Score: The chosen start site of 35658 has the best final score (-2.584) and z-score (3.114) compared to other potential start sites on PECAAN. /note=Gap/overlap: 4bp overlap with the upstream gene, indicating that this gene may be part of an operon. All the other potential start sites have larger gap sizes of 8bp and >= 143bp and worse final scores and z-scores. /note=Phamerator: Pham 97441 has 49 members with 33 non-draft members (04/01/22). 
Within this pham, gene length, gene number, and phage cluster vary considerably, although function is all NKF. This pham is present mainly in cluster CY, CZ, and DN phages, and gene (stop @35978, F) shows synteny with other cluster CZ phages like Cynthia, Denise, and Albert. /note=Starterator: Start number 14, corresponding to the basepair coordinate 35658 in RavenCo17, is the most annotated start in Starterator. This site has manual annotations in 21/33 non-draft genes in the pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 35658. /note=Function call: NKF. The top Phagesdb BLAST hits (e-values < 1e-46) come from cluster CY and CZ phages and have NKF assigned to them. These genes have the same pham of 97441, same sequence length of 106, and similar protein numbers to RavenCo17's gene (stop@6351 F). The top NCBI BLAST hits (e-value <= 1e-46, 100% coverage, 73%+ identity) also come from cluster CY and CZ phages and have hypothetical protein function associated with them. HHpred had no relevant hits (e-value = 53 >> 1e-7), and CDD had no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: I agree with the predicted start site and manually annotated functional call. 
CDS 35992 - 36366 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="RavenCo17_46" /note=Original Glimmer call @bp 35992 has strength 9.42; Genemark calls start at 36034 /note=SSC: 35992-36366 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_MALIBO_49 [Gordonia phage Malibo]],,NCBI, q1:s1 100.0% 2.67747E-70 GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.574, -6.499514260064533, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MALIBO_49 [Gordonia phage Malibo]],,UAJ16225,92.7419,2.67747E-70 SIF-HHPRED: SIF-Syn: Unknown function gene (pham 102382), upstream gene is NKF, (pham 97441), downstream gene is a DNA polymerase III sliding clamp (Beta) (pham 102408), in phage Suerte this gene and the one upstream is conserved. For phage Vasanti, this gene and the one downstream is conserved. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Glimmer calls the start site at 35992 and GeneMark at 36034. /note=Coding Potential: There is coding potential for both of these start sites and it does not cover those sites but covers the rest of the gene. /note=SD (Final) Score: For 35992, the Z score is 1.574 and the final score is -6.500. For 36034, the values are a little bit better with the Z score being 2.433 and the final score being -3.917. /note=Gap/overlap: This part really helps with starting to choose our start site because the gap for 35992 is 13 but for 36034 it is 55 which is too large and does not really make sense when looking at pham maps. There is also no reason to make the gap this large. /note=Phamerator: pham 102382 as of 04/5/22, this pham contains genes from many different clusters (CY, CH, DW, CZ, and 2 singletons). There is no function associated with these genes. /note=Starterator: This gene has the most annotated start site which is #7 and was called in 4 of the 11 non-draft genes. It corresponds to the site called by Glimmer, 35992. 
The other site was found more often when looking at the tracks, but it was most often called when the other start site was not present. It was called in 3 of the 11 non-draft genomes. /note=Location call: Due to the results seen in Starterator, I would call the start site to be the one predicted by Glimmer (which generally takes precedence over GeneMark as well). /note=Function call: Unknown function, the top hits on phagesDB all have unknown function (5e-58). HHPRED did not have any relevant hits, CDD did not have any hits, and NCBI blast only had hits for hypothetical proteins which had good values (100% coverage, 92.7% aligned, 2.677e-70 e-values). /note=Transmembrane domains: There were no transmembrane domains and thus it is not a membrane protein. /note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: I agree with the predicted start site and manually annotated functional call. CDS 36363 - 36593 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="RavenCo17_47" /note=Original Glimmer call @bp 36363 has strength 10.16; Genemark calls start at 36363 /note=SSC: 36363-36593 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase III beta subunit [Gordonia phage Bjanes7] ],,NCBI, q1:s1 90.7895% 4.91676E-34 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.594, -4.0338999058337315, yes F: hypothetical protein SIF-BLAST: ,,[DNA polymerase III beta subunit [Gordonia phage Bjanes7] ],,ATW60741,85.1351,4.91676E-34 SIF-HHPRED: SIF-Syn: /note=Functional call: a couple hits for DNA polymerase III sliding clamp (Beta), but unclear what those hits were based on. Calling NKF for now. -AF /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: Both Glimmer and Genemark agree with each other and say that the start site is 36363. Since they both agree with each other, this is likely the start site. The start codon that is called is ATG. 
/note=Coding Potential: The host genemark graph has pretty good coding potential predicted within the open reading frame, however, the self genemark graph is mediocre throughout the open reading frame. The start site is seen in the coding potential in both sequences. When I compared the gene sequence to other sequences, there were many similarities which helps me decide if this is a real gene. The start site covers the entire coding potential. /note=SD (Final) Score: -4.034 is the final score. Out of all the scores, this is the best score. The Z score is also greater than 2 which is the best score out of all of them. /note=Gap/overlap: -4. The overlap is -4 which means there could be a potential operon present here. I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is not the LORF, however, the length of the gene is acceptable being around 231 base pairs long. /note=Phamerator: Pham 102408. 04/06/2022. I can see that the function shown is DNA polymerase III sliding clamp for some of the phages and that the length of the gene is roughly 225+ which is conserved throughout each sequence. There are a variety of clusters present within this pham but mainly CZ and CY. The gene is conserved within other members of the pham and I compared them to Attis and Antonio. Both of these had similar base pair lengths to my gene and also the same function. /note=Starterator: Pham number 102408 has 73 members, 9 are drafts. This includes genes that call this "Most Annotated" start for RavenCo17. Start: 20 @ 36363 has 64 MAs which is strong evidence that 36363 is the start site. /note=Location call: I believe this gene is a real gene given all the evidence presented above. The best start site is 36363 for this gene. /note=Function call: Looking at PhagesDb, some of the functions were DNA polymerase III sliding clamp which makes me think my gene is also DNA polymerase III sliding clamp. 
In HHPRED, there are many phages shown with high percent coverage and probability, but e values are horrible which means I did not consider this for function. For NCBI BLAST, I can see that some genes call this a DNA polymerase III sliding clamp and some are hypothetical. They have very good e values making me think that this is a DNA polymerase III sliding clamp. /note=Transmembrane domains: No transmembrane domains. Neither TMHMM nor TOPCONS predicted any TMDs, which did not help me figure out the function of my gene. /note=Secondary Annotator Name: Mascareno, Greta /note=Secondary Annotator QC: I agree with the start site and function calls of this gene. You could also add the Z score since it is greater than 2 and the best of all scores as support in the SD (Final) Score section. Make sure to fill out all drop down menus. CDS 36590 - 36871 /gene="48" /product="gp48" /function="excise" /locus tag="RavenCo17_48" /note=Original Glimmer call @bp 36590 has strength 3.79 /note=SSC: 36590-36871 CP: yes SCS: glimmer ST: SS BLAST-Start: [excise [Gordonia phage EMsquaredA] ],,NCBI, q1:s1 100.0% 5.60163E-54 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.534, -3.6907860541164537, yes F: excise SIF-BLAST: ,,[excise [Gordonia phage EMsquaredA] ],,QAY17651,97.8495,5.60163E-54 SIF-HHPRED: Putative DNA-binding protein; BldC, S. coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_A,66.6667,98.7 SIF-Syn: In the phages in the cluster such as Clark, Dolores, and MichaelScott, both this gene and the downstream gene are conserved, while the upstream gene is replaced by a pham with members of similar function to that of the upstream pham present in RavenCo17. 
Furthermore, in phages in the cluster missing this gene, such as AlumE and Bjanes, the phase of the upstream and downstream genes are conserved as with RavenCo17, and a gene of another pham with members of similar function to members of this gene`s pham replaces the gene within the implied operon. /note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Glimmer predicts the gene start to be at position 36590, while GeneMark makes no prediction as to start location. The Auto-Called start codon for this gene is an ATG. /note=Coding Potential: Self and Host-Trained GeneMarks both indicate coding potential in an ORF for the region the gene is predicted to occupy. In the Host-Trained GeneMark where coding potential for this region is mainly on a single ORF, the coding potential is completely encompassed by the predicted start site by Glimmer. /note=SD (Final) Score: This gene has the highest Z Score and Final Score of the potential start sites on this ORF, with scores of 2.534 and -3.691 respectively, and is also the LORF. Even if the scores were not the best of the start site candidates, the predicted start site implies this gene’s participation in an operon, and so final and z score would be irrelevant. /note=Gap/overlap: There is an overlap of 4 base pairs, which is indicative of this gene’s presence in an operon. /note=Phamerator: This gene belongs to pham 11205 as of 04/07/2022 at 1858 hours PST. The other genes in this pham are conserved principally in Cluster CZ, with some relevance in Cluster CY and a singleton. In Cluster CZ, this gene is conserved in phages such as Clark, Floral, Sekhmet and is annotated as an excise gene. This gene is also annotated in phage BeeGee (CY) as a helix-turn-helix DNA binding domain. /note=Starterator: As of 04/01/2022, the most called start number for genes in this pham was start number 6, it was called in all 20 of the non-draft genes in the pham, such as in Clark (CZ4), Howe (CZ4), and Pollux (CY1). 
In RavenCo17, start number 6 corresponds to position 36590. /note=Location call: Based on the presence of coding potential for the region between the stop of 36871 and the predicted start of 36590 by Glimmer and Starterator, as well as this gene’s conservation with its cluster, this gene is real and starts at position 36590, with its start codon being an ATG. Glimmer and Starterator agree on the start location, whereas GeneMark does not predict a start site. /note=Function call: Excise protein. Several hits on PhagesDB Blast return this protein as belonging to an excise protein or otherwise as a helix-turn-helix (HTH) DNA binding protein. Given that HTH DNA binding is the safer call for proteins with less evidence, excise is a possibility. Hits on HHPRED primarily return as HTH and putative excise proteins, while on NCBI Blastp, several hits return as that of excise proteins for phages within the same cluster as this gene’s virus. Additionally, many of the genes in this gene’s pham (11205) have been called as excise proteins. Even in phages in the cluster missing this gene or a gene in the pham such as in Bosnia (CZ) and GemG (CZ), the phams of the immediate upstream and downstream genes of this gene are present (102408 and 12689 respectively) and maintain synteny as another pham with members called as excise proteins (102261) replaces this gene. /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm or TOPCONS /note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree that the start site for this gene is 36590 based on the presence of coding potential and Glimmer. 
I agree that the function of this protein is excise based on PhagesDB BLAST and NCBI BLAST. CDS 36868 - 37074 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="RavenCo17_49" /note=Original Glimmer call @bp 36868 has strength 14.84; Genemark calls start at 36868 /note=SSC: 36868-37074 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_NYMPHADORA_52 [Gordonia phage Nymphadora] ],,NCBI, q1:s1 100.0% 1.80993E-38 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.218, -4.4157344874810125, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NYMPHADORA_52 [Gordonia phage Nymphadora] ],,YP_009286097,98.5294,1.80993E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Glimmer and GeneMark both call a start site at position 36868. /note=Coding Potential: Both the host and self trained GeneMark indicate a significant amount of coding potential within the auto-annotated ORF and all coding potential is covered. /note=SD (Final) Score: The auto-annotated start site received an RBS final score of -4.416, which is the closest to zero of all potential start sites. This indicates the auto-annotated start position is favorable compared to the other options. /note=Gap/overlap: The auto-annotated gap is -4 nucleotides with the previous gene, a probable feature of an operon contained gene. This supports the auto-annotated start position. /note=Phamerator: The pham called is 12689 as of 04/06/22. Pham 12689 is highly conserved and includes 83 non-draft members, two of which are phage Adora and phage Ruthy. /note=Starterator: Start site 22 was manually annotated the most often in this pham, as it was manually annotated in 83/83 (100%) non-draft phage genomes. This start site matches that which is called in RavenCo17, supporting the auto-annotated start site at position 36868. /note=Location call: Given the above evidence, this is a real forward gene with the auto-annotated start position of 36868. 
/note=Function call: Hypothetical Protein. There is not much evidence that supports a more specific function, as many unknown and hypothetical functions are predicted using HHpred and a PhagesDB blast. There are two potential NCBI blast hits that contain a membrane protein annotation, however these annotations also predict hypothetical function as well. Further, there is only 1 TMHMM and no TOPCONS data available, so the membrane protein call cannot be made. /note=Transmembrane domains: There is one TMD predicted by TMHMM and no TOPCONS prediction available. We cannot determine if this protein is membrane associated. /note=Secondary Annotator Name: Hu, Yixiao (Sherry) /note=Secondary Annotator QC:Hu, Yixiao (Sherry), Yes CDS 37160 - 37501 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="RavenCo17_50" /note=Original Glimmer call @bp 37160 has strength 13.06; Genemark calls start at 37160 /note=SSC: 37160-37501 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ENALISNAILO_49 [Gordonia phage EnalisNailo]],,NCBI, q1:s1 100.0% 8.00155E-76 GAP: 85 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.593, -3.50848394673489, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ENALISNAILO_49 [Gordonia phage EnalisNailo]],,QDB74401,99.115,8.00155E-76 SIF-HHPRED: SIF-Syn: /note=Start site notes: Start site 46 is called as the start site for this gene which corresponds to Genemark and Glimmer, 37160. This start site is called most often when present (37 of 68 non-draft genes). However, start site 33 at 37100 has a smaller gap of 25, with a decent final and z score and it has been manually annotated 8 times. /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 37160. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF encompasses the gene from start site to stop. 
/note=SD (Final) Score: The best final score on PECAAN is -3.508, which corresponds to the second best Z score of 2.593, however the gap at this start is 85 bp. /note=Gap/overlap: Gap of 85 bp. Although a gap limit of 50 bp is exceeded, this start site is called the most often when present and it is conserved in genes like BaxterFox and Yeezy. /note=Phamerator: The pham number as of 04/09/2022 is 13252. The gene is conserved; it is found in phage BaxterFox and Yeezy from cluster CZ3. /note=Location call: Considering the evidence, this is a real gene and the chosen start is 46 at 37160 bp because it was called twice and has been conserved, however I would appreciate feedback on this. /note=Function call: NKF. Evidence on HHPRED indicates that the function of this gene is DNA mismatch repair protein with only 70% probability and 43% coverage, however Phagesdb BLAST states function unknown and NCBI BLAST gives evidence for hypothetical protein at 98% identity, 99% alignment and 100% coverage. /note=Transmembrane domains: TmHmm and Topcons provided no matches for TMHs so this gene is not a membrane protein. /note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I agree with all of the evidence that is provided but it is very conflicting since start site #46 has more manual annotations however start site #33 has a more reasonable gap. Moreover I agree with the function call. Also make sure to fill in synteny box. 
CDS 37505 - 38374 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="RavenCo17_51" /note=Original Glimmer call @bp 37505 has strength 12.51; Genemark calls start at 37505 /note=SSC: 37505-38374 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_NYMPHADORA_54 [Gordonia phage Nymphadora] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.979, -4.904754965500383, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NYMPHADORA_54 [Gordonia phage Nymphadora] ],,YP_009286099,99.654,0.0 SIF-HHPRED: SIF-Syn: This gene displays synteny with phage Beenie, as both phages share the same genomic architecture of upstream and downstream genes; /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: Both Glimmer and GeneMark call the start site for this gene at position 37505. /note=Coding Potential: There is coding potential present on both the Host-Trained and Self-Trained GeneMark /note=SD (Final) Score: The SD (final) score for this gene at start site 37505 is -4.905 with a start codon of ATG. /note=Gap/overlap: There is a gap of 3 base pairs at this start site which is very small, so it is likely that 37505 is the start site for this gene. /note=Phamerator: According to Mar 31, 2022, the pham that this gene belongs to is pham 10149. There are 84 members of this pham, and several of the members are also part of cluster CZ. /note=Starterator: The most conserved start site for genes in this pham is 4, which is called in 16/70 non-draft genes in the pham. RavenCo17_51 does not have this start site. The start site called for RavenCo17_51 is start 8 (37505), which is found in 15/84 genes in the pham, with 14 manual annotations (called 100% of time when present). /note=Location call: The start site for this gene is 37505 based on the evidence above. 
/note=Function call: NKF according to NCBI BLAST and PhagesDB BLAST /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm nor Topcons. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I have QCed this gene and agree with location and function calls. Remember to compare start site SD to other potential sites, mention that it is the highest. Mention specific phages for phamerator. Need to mention that start site 8 is called 100% of the time when present. CDS 38371 - 38733 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="RavenCo17_52" /note=Original Glimmer call @bp 38371 has strength 10.95; Genemark calls start at 38371 /note=SSC: 38371-38733 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ENALISNAILO_51 [Gordonia phage EnalisNailo]],,NCBI, q1:s1 100.0% 1.76076E-72 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.287, -4.135088716821063, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ENALISNAILO_51 [Gordonia phage EnalisNailo]],,QDB74403,96.6667,1.76076E-72 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hu, Yixiao /note=Auto-annotation: Both the Glimmer start and GeneMark start are 38371 /note=Coding Potential: The coding potential of this gene is seen in the forward strand so it is a forward gene. Also, the coding potential can be seen in both the Host-Trained GeneMark and Self-Trained GeneMark /note=SD (Final) Score: -4.135 is the best final score and the corresponding z-score is 2.287, which is also the best z-score. /note=Gap/overlap: There’s a gap of 10 upstream and a 3 overlap downstream, which are both very reasonable. /note=Phamerator: The pham number of the gene is 56793. There are 107 members in total, and genes are from different clusters, like CZ, CY, and CV. /note=all parts of it belong to the Cluster CZ8 /note=Starterator: Start site 16 most annotated (39/91 non-draft calls). 16 is site 38371. Called 100% of time when present. 
/note=Location call: This gene is a real gene and its start site should be 38371. /note=Function call: In PhagesDB BLAST, the first hit RavenCo17_Draft has unknown function, with a low e-value of 4e-63. In NCBI Blast, the first hit suggests that the gene is a hypothetical protein, with a 93.3333% identity, 100% coverage, and a low e-value of 1.76076e-72. In HHpred, the first hit states that the gene is Flagellar motor switch protein FliM, with a probability of 90.3, low coverage of 44.1667%, and a really high e-value of 4.5; the second hit suggests that the gene is domain of unknown function, with a probability of 49.9%, 37.5% coverage and also a high e-value of 54. CDD has no hits. Therefore, although HHpred shows some possibility of function, we still call this gene as having no known function (as all other evidence supports the unknown function) /note=Transmembrane domains: From both TMHMM and TOPCONS, we can see no transmembrane domains. /note=Secondary Annotator Name: Nguyen, Calvin /note=Secondary Annotator QC: Agree with call and function according to the evidence. CDS 38730 - 39557 /gene="53" /product="gp53" /function="RecE-like exonuclease" /locus tag="RavenCo17_53" /note=Original Glimmer call @bp 38730 has strength 12.13; Genemark calls start at 38730 /note=SSC: 38730-39557 CP: yes SCS: both ST: SS BLAST-Start: [RecE-like exonuclease [Gordonia phage Bosnia]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.127, -4.811774952186101, no F: RecE-like exonuclease SIF-BLAST: ,,[RecE-like exonuclease [Gordonia phage Bosnia]],,QOI66884,97.0909,0.0 SIF-HHPRED: Exodeoxyribonuclease 8; Exonuclease, Recombination, Hydrolase, Nuclease; 2.8A {Escherichia coli},,,3H4R_A,90.9091,99.9 SIF-Syn: RecE-like exonuclease. downstream gene is a RecT-like ssDNA binding protein which belongs to pham 102067. RecE genes are most likely to be called just upstream of RecT. 
/note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer and GeneMark both call the start at 38730. /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand, which shows that this is a forward gene. There is coding potential found in both Host GeneMark and Self GeneMark. Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer and GeneMark start site. /note=SD (Final) Score: The most reasonable final score is -4.812 since it is among the least negative, and its Z-score of 2.127 is among the highest. Moreover, this is the most reasonable option since it has the smallest gap compared to the others. /note=Gap/overlap: Upstream of the gene there is a 4 base pair overlap, and downstream there is also a 4 base pair overlap. This strongly suggests that this gene is part of an operon. /note=Phamerator: The pham number as of April 8, 2022 is 14263. The gene is conserved in phages Bunnybear and Polly, which also belong to cluster CZ. /note=Starterator: Start site 95 in Starterator was manually annotated in 1/233 non-draft genes in this pham. Start 95 is 38730 in RavenCo17. This evidence agrees with the start site predicted by Glimmer and GeneMark; however, the evidence is weak since it is only one manual annotation. /note=Location call: Based on the information above this is a real gene and the most likely start is 38730. /note=Function call: RecE-like exonuclease. The top two phagesdb BLAST hits have the function of RecE-like exonuclease and have an e-value of 0. Some of the NCBI BLAST hits also have the function of RecE-like exonuclease; the two selected have 100% and 99.6364% coverage, 94.5455% and 92.029% identity, and e-values of 0. HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Reyes, Glania /note=Secondary Annotator QC: I agree with this annotation! CDS 39554 - 40645 /gene="54" /product="gp54" /function="RecT-like ssDNA binding protein" /locus tag="RavenCo17_54" /note=Original Glimmer call @bp 39554 has strength 11.13; Genemark calls start at 39554 /note=SSC: 39554-40645 CP: yes SCS: both ST: SS BLAST-Start: [RecT-like ssDNA binding protein [Gordonia phage Bosnia]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.218, -4.274735973818826, no F: RecT-like ssDNA binding protein SIF-BLAST: ,,[RecT-like ssDNA binding protein [Gordonia phage Bosnia]],,QOI66885,98.0769,0.0 SIF-HHPRED: RecT ; RecT family,,,PF03837.17,41.5978,99.5 SIF-Syn: RecT-like ssDNA binding protein. For RavenCo17, the upstream gene is RecE-like exonuclease, and the downstream gene is NKF. There is synteny with the upstream gene with phage Bosnia (CZ1) and Yeezy (CZ3), however, both have different downstream genes. /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation: Both Glimmer and GeneMark call this gene, with start site at 39554. /note=Coding Potential: There is significant coding potential seen for this gene in the second forward reading frame, seen both in HostTrained and SelfTrained reports. /note=SD (Final) Score: -4.275. This is the best final score amongst potential start sites, its Z-Score is 2.218, the second highest. /note=Gap/overlap: -4bp. There is a 4bp overlap with the upstream gene, this indicates that the gene is part of an operon. /note=Phamerator: As of 4/5/2022 the pham is 102067. It is conserved, seen in Bosnia (CZ1) and Yeezy (CZ3) /note=Starterator: Start site 51 is called for in 73 of 180 non-draft phages, and all phages which have this most annotated start site call for it. 
This start site corresponds to 39554 in RavenCo17 and agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene with a start site @ 39554. /note=Function call: RecT-like ssDNA binding protein. There are over 50 very strong hits on PhagesDB BLAST with e-values of 1e-100 or less for this protein function. There are additional strong hits with e-values of 0 on NCBI BLAST for this function as well. /note=Transmembrane domains: 0. Neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: I agree with the primary annotator that the correct function of this gene is a RecT-like ssDNA binding protein. CDS 40642 - 41484 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="RavenCo17_55" /note=Original Glimmer call @bp 40642 has strength 6.94; Genemark calls start at 40642 /note=SSC: 40642-41484 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_POSH_61 [Gordonia phage Posh]],,NCBI, q1:s1 95.7143% 3.04135E-161 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.239, -4.760068129112225, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_POSH_61 [Gordonia phage Posh]],,QXN74969,89.5911,3.04135E-161 SIF-HHPRED: SIF-Syn: NKF. Upstream is Pham 14363 with function RecE-like exonuclease, while downstream is Pham 56567. Downstream synteny is shared with phage Cynthia, but upstream synteny is not conserved. Conversely, upstream synteny is shared with phage GemG but downstream synteny is not conserved. /note=Primary Annotator Name: Nguyen, Calvin /note=Auto-annotation: Both Glimmer and GeneMark call the start position of the gene at 40642. /note=Coding Potential: Both GeneMark Host and Self appear to show substantial coding potential between stop site 41484 and the suggested start site 40642. All coding potential appears to be contained within the interval. 
/note=SD (Final) Score: The suggested start site has a Z-score of 2.239 and a final RBS score of -4.760, which is not the highest, but is still relatively high amongst the available start sites. /note=Gap/overlap: The gene has a reasonable overlap of 4 bp and has a start codon of ATG, suggesting the presence of an operon. This overlap is supported by synteny via phage Cynthia and GemG of Cluster CZ. /note=Phamerator: As of 4/8/22, this gene is located in Pham 56567. The gene appears to be heavily conserved within various phages of cluster CZ and CY, such as phages AlumE, Antonio, and Attis. /note=Starterator: In Starterator, RavenCo17 does not have the most annotated start site of 44, which was called in 18 out of 50 non-draft genes. It instead uses start site 13, which was called 100% of the time (8/8) when present. /note=Location call: According to all available evidence, it appears that this is a real gene. It likely starts at start site 40642 as was predicted. /note=Function call: NKF. Phagesdb BLAST’s top three hits of unknown function had scores of 456, 456, and 438, and e-values of 1e-128, 1e-128 and 1e-123 respectively. HHpred provided no further valuable information as to the function of the gene. NCBI BLAST hits have very strong e-values, and provide additional credence to the NKF call. /note=Transmembrane domains: According to TmHmm, there are no predicted TMHs in the gene. There are also no hits provided by Topcons; thus, we can conclude that this gene is not a transmembrane protein. /note=Secondary Annotator Name: Cini, Victoria /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. 
Great work!:) CDS 41481 - 41663 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="RavenCo17_56" /note=Original Glimmer call @bp 41493 has strength 9.66; Genemark calls start at 41481 /note=SSC: 41481-41663 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_BJANES7_54 [Gordonia phage Bjanes7] ],,NCBI, q1:s1 100.0% 6.01143E-35 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.161, -4.532304881730868, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BJANES7_54 [Gordonia phage Bjanes7] ],,ATW60749,98.3333,6.01143E-35 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream gene is chaperonin, just like in phages Antonio, AlumE, and Attis. /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Glimmer and GeneMark call the gene but do not agree on the same start site. Glimmer calls the start site at 41493 bp and GeneMark calls the start site at 41481 bp. /note=Coding Potential: Coding potential in the ORF is substantial on the forward strand, indicating that this is a forward gene. Coding potential is found in GeneMark Self and Host. Both suggest the start site to be closer to 41481 bp. /note=SD (Final) Score: -4.532 at 41481 bp start. This is the best final score, with a z-score of 2.161, which is also the best score. /note=Gap/overlap: -4. This indicates that this gene is possibly part of an operon. /note=Phamerator: As of April 8, 2022, the pham number is 101749. The gene is conserved in phages Agueybana, AlumE, and Antonio, all of which are in cluster CZ with a conserved length of ~180 bp. The function call for all of these genes is unknown and it is consistent between Phamerator and the phams database. /note=Starterator: There are 101 members in this pham, 14 of which are drafts. Start 37, which correlates to a start site of 41481 bp has 81 MAs. Start 38, which correlates to a start site of 41493 bp has only 3 MAs. This evidence agrees with GeneMark calling 41481 bp as the start site. 
/note=Location call: 41481 bp. Based on the above evidence, this gene is a real gene and has a start site at 41481 bp. Starterator agrees with GeneMark. /note=Function call: NKF. Multiple phagesDB BLAST hits come up with functions unknown, with very low e-values ~5e-29. Multiple NCBI BLAST hits come up with hypothetical proteins, suggesting no known function, with very low e-values ~1.4e-34. HHPred also comes up with hypothetical proteins. CDD had no hits. /note=Transmembrane domains: TmHmm predicts no TMDs. Topcons also predicts no TMDs. /note=Secondary Annotator Name: De Schutter, Elena /note=Secondary Annotator QC: I agree with location and function call! You probably weren't able to finish it before because other genes around you weren't done but don't forget the synteny box. One more small thing would just be for starterator, because the start site was called in 81 phages but out of how many? But that's only a small detail, great job! CDS 41660 - 42298 /gene="57" /product="gp57" /function="chaperonin, DnaJ-like" /locus tag="RavenCo17_57" /note=Original Glimmer call @bp 41660 has strength 7.6; Genemark calls start at 41660 /note=SSC: 41660-42298 CP: yes SCS: both ST: SS BLAST-Start: [DnaJ-like chaperonin [Gordonia phage WelcomeAyanna] ],,NCBI, q1:s1 100.0% 2.41916E-118 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.015, -2.7254235724530456, yes F: chaperonin, DnaJ-like SIF-BLAST: ,,[DnaJ-like chaperonin [Gordonia phage WelcomeAyanna] ],,QCW22248,92.4883,2.41916E-118 SIF-HHPRED: a.2.3.1 (A:7-117) Large T antigen, the N-terminal J domain {Simian virus 40, Sv40 [TaxId: 10633]},,,d1gh6a1,25.9434,98.2 SIF-Syn: There is synteny when compared to phage Bunnybear, and the gene overlapping with this gene in RavenCo17 is also a DNA-J chaperonin. The gene upstream and downstream in both phages have NKF, so there are similarities there as well. /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Genemark and Glimmer both called the start site 41660. 
The Glimmer score is 7.6 and has a start codon of ATG, which is common, so this is likely the correct start site. /note=Coding Potential: The self trained graph shows great coding potential with the called start and stop sites and a high graph line over this area. The host trained graph also shows a very high coding potential in the correct region, showing this is the correct ORF and thus the correct start site is 41660. /note=SD (Final) Score: The final score is -2.725, which is the least negative final score of all the possible start sites. It is also the longest open reading frame (LORF) and has the highest Z-score, thus this is evidence that 41660 is the best start site call. /note=Gap/overlap: The gap/overlap is -4, which is close to being indicative of a codon, but instead is a simple gene overlap. This overlap is normal and shown in other phage genes, thus 41660 is the correct start site. /note=Phamerator: The pham number for this RavenCo17 gene is 2749. This pham has 54 members with the majority of clusters being EG while RavenCo17 is CZ. The gene lengths are all in the mid-600s, including the RavenCo17 gene, which is 639bp. The RavenCo17 gene was compared to Bunnybear and Maridalia, which are both also in cluster CZ and have a length of 639bp with a function of DnaJ-like chaperonin. /note=Starterator: In Starterator, RavenCo17 has the most annotated start site, at 41660; the pham has 54 members, 14 of which are drafts. This start site is found in 49 of 54 genes in the pham and is called 81.6% of the time when present. Other phages with this same start site and call are Bunnybear and Maridalia, as explained previously. This is good evidence of the correct start site being called. /note=Location call: The start site of this gene is 41660 based on the guiding principles met, such as having the most annotated start site, the LORF, and the least negative final score. 
/note=Function call: The proposed function call based on Phagesdb Function Frequency is a DnaJ-like chaperonin. This is also shown on BLAST when compared to Maridalia and Bunnybear, which have this same function, are in the same cluster, and the e-value is 1e-109. NCBI BLAST also showed this function in Gordonia phage WelcomeAyanna with an identity of 87.8% and an e-value of 2.4e-118. Based on this evidence, the function of the gene is a DnaJ-like chaperonin. /note=Transmembrane domains: TmHmm and Topcons do not call any transmembrane domains for this gene. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. CDS 42392 - 42841 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="RavenCo17_58" /note=Original Glimmer call @bp 42392 has strength 8.01; Genemark calls start at 42392 /note=SSC: 42392-42841 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_MATTEO_51 [Gordonia phage Matteo]],,NCBI, q1:s1 82.5503% 9.65984E-27 GAP: 93 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.514, -4.197870532213559, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MATTEO_51 [Gordonia phage Matteo]],,QOC55981,48.7805,9.65984E-27 SIF-HHPRED: SIF-Syn: NKF; upstream gene is DnaJ-like chaperonin (pham 2749) and downstream gene is NKF (pham 56567). We observe synteny with other cluster CZ phages like Agueybana, BatStarr, and Eviarto, which have the same functions and phams present in surrounding genes. These phages have a downstream gene with NKF (pham 56567) and an upstream gene with DnaJ-like chaperonin function (pham 2749) just like RavenCo17, although these genes are not directly adjacent like in RavenCo17. /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call an ATG start site at 42392. 
/note=Coding Potential: The gene has high coding potential predicted within the putative ORF. In the second panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe coding potential throughout the entirety of the gene sequence, and the chosen start site captures all the coding potential. /note=SD (Final) Score: The chosen start site of 42392 has the best final score (-4.198) and z-score (2.514) compared to other potential start sites on PECAAN. /note=Gap/overlap: Relatively large gap size of 93bp with the upstream gene. However, this is reasonable considering that all the other potential start sites in PECAAN have larger gap sizes of >= 105bp and worse z scores and final scores. /note=Phamerator: Pham 102350 has 15 members with 12 non-draft members (04/05/22). Within this pham, gene length, gene number, and phage cluster vary considerably, although function is all NKF. This pham is present mainly in cluster CY, CZ, and D phages and shows synteny with other cluster CZ phages like BaxterFox, BoyNamedSue, and Easley. /note=Starterator: Start number 17 is the most annotated start in Starterator with manual annotations in 21/33 non-draft genes in the pham. However, this start site is not present in RavenCo17, so Starterator output was not informative. Instead, RavenCo17 calls start site 11 corresponding to basepair coordinate 42392. Although start site 11 has 0 manual annotations, it agrees with the start site predicted by Glimmer and GeneMark and it captures all the coding potential in the gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 42392. /note=Function call: NKF. The top Phagesdb BLAST hits (e-values < 1e^-18) come from cluster CZ phages and have NKF assigned to them. 
The top NCBI BLAST hits (e-value <= 1e-26, 79%+ coverage, 37%+ identity) also come from cluster CZ phages and have hypothetical protein function associated with them. HHpred and CDD had no relevant hits (all e-values > 1e-7). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: I agree with the primary annotator on the location and function calls. CDS 42838 - 42981 /gene="59" /product="gp59" /function="membrane protein" /locus tag="RavenCo17_59" /note=Original Glimmer call @bp 42838 has strength 10.86; Genemark calls start at 42838 /note=SSC: 42838-42981 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Bosnia] ],,NCBI, q1:s1 100.0% 2.05262E-23 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.008, -4.767597290511617, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Bosnia] ],,QOI66892,95.7447,2.05262E-23 SIF-HHPRED: SIF-Syn: Membrane protein gene (pham 102014); the upstream gene is NKF (pham 102350) and the downstream gene is a DNA methyltransferase (pham 19968). This order is conserved in phages AlumE and BoyNamedSue. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Glimmer and GeneMark call the start site at 42838. /note=Coding Potential: The coding potential covers the entire gene for both GeneMark Self and Host and is very high. /note=SD (Final) Score: The final score is -4.768 and the z score is 2.008 for start site 42838 (also the LORF). These are good values, and the only other option for a start site has very poor values for both. /note=Gap/overlap: The overlap is 4bp, which is indicative of an operon and thus is a reasonable overlap. /note=Phamerator: pham 102014 as of 04/08/22. This pham has many clusters present: CZ (the same cluster as our phage and includes phages Antonio and Attis), CY, DN, DC, CQ, DW, a singleton, and DH. 
There is no function called for any of the genes in this pham. /note=Starterator: The most called start site is #50 (which is present in our phage) and it was called 79/113 times for the non-draft genomes. This corresponds to start site 42838 and was also predicted by Glimmer and GeneMark. /note=Location call: Because Glimmer, GeneMark, and Starterator all agree and because of the rest of the evidence provided, this gene is real and its start site is 42838. /note=Function call: Unknown function. The top hits on phagesDB all have unknown function (2e-20). HHPRED did not have any relevant hits, CDD did not have any hits, and NCBI BLAST only had hits for hypothetical proteins, which had good values (100% coverage, 97.87% aligned, 6.62e-24 e-values). /note=Transmembrane domains: There was one TMHMM membrane domain and a hit in Topcons so we can call this a membrane protein. /note=Secondary Annotator Name: Carreon, Justin /note=Secondary Annotator QC: According to available evidence, I agree with the location and function call of the primary annotator. CDS 42978 - 44555 /gene="60" /product="gp60" /function="DNA methyltransferase" /locus tag="RavenCo17_60" /note=Original Glimmer call @bp 42978 has strength 8.28; Genemark calls start at 42978 /note=SSC: 42978-44555 CP: yes SCS: both ST: SS BLAST-Start: [DNA methylase [Gordonia phage MrBates]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.276, -2.17469248771465, yes F: DNA methyltransferase SIF-BLAST: ,,[DNA methylase [Gordonia phage MrBates]],,UJD21080,98.8571,0.0 SIF-HHPRED: c.66.1.26 (A:1137-1600) Methyltransferase domain from DNA methyltransferase 1 (DNMT1) {Mouse (Mus musculus) [TaxId: 10090]},,,d3pt9a3,36.5714,100.0 SIF-Syn: My gene is a DNA methyltransferase. The upstream gene is a membrane protein and the downstream gene is a tRNA methyltransferase. When comparing my sequence to AlumE, the same gene is a DNA methyltransferase. Its upstream and downstream genes are NKF. 
This supports that my gene is real and a DNA methyltransferase. The pham number for both RavenCo17 and AlumE is 19968. The same holds for Antonio: the same gene is a DNA methyltransferase in pham 19968, and its upstream and downstream genes are NKF. /note=Primary Annotator Name: Sharma, Devshi /note=Auto-annotation: Both Glimmer and Genemark agree with each other and say that the start site is 42978. Since they both agree with each other, this is likely the start site. The start codon that is called is ATG. /note=Coding Potential: Both self and host genemark graphs have very good coding potential predicted within the potential open reading frame. The start site is seen in the coding potential in both graphs too. When I compared the gene sequence to other sequences, there were many similarities, which supports that this is a real gene. The start site covers the entire coding potential. /note=SD (Final) Score: -2.175 is the final score. Out of all the scores, this is the least negative score. /note=Gap/overlap: -4. The 4 bp overlap means there could be an operon here. I do not believe that there are alternative start candidates because of the evidence given from Glimmer and Genemark. This reading frame is not the LORF, but the LORF has a very poor Final Score. The length of the gene is acceptable, being around 1578 base pairs long. /note=Phamerator: Pham 19968. 04/07/2022. I can see that the function shown is DNA methyltransferase and that the length of the gene is roughly 1550+, which is conserved throughout each sequence. There are a variety of clusters present within this pham. The gene is conserved within other members of the pham and I compared them to AlumE and Angelique. 
Both of these had similar base pair lengths to my gene and also the same function. /note=Starterator: Pham number 19968 has 119 members, 16 are drafts. This includes genes that call this "Most Annotated" start for RavenCo17. This is strong evidence that 42978 is the start site. /note=Location call: I believe this gene is a real gene given all the evidence presented above. The best start site is 42978 for this gene. /note=Function call: Phamerator helped me find the function of the gene because DNA methyltransferase was seen across many genes in this pham. Looking at BLAST, I can see that my gene sequence function is a DNA methyltransferase. I compared against MrBates and CarolAnn, whose genes are DNA methyltransferases, with an e-value of 0. On HHPRED, the percent coverage, probability, and e-values were not that great, so this did not help me find the function. On BLAST, accession number UJD21080 had an e-value of 0 and great percent coverage. This is very convincing evidence that this gene sequence is also a DNA methyltransferase. /note=Transmembrane domains: No transmembrane domains. Neither TMHMM nor TOPCONS predicted any TMDs, so this did not help me figure out the function of my gene. /note=Secondary Annotator Name: Koetters, Owen /note=Secondary Annotator QC: I agree with the predicted start site and manually annotated functional call. There could be additional HHpred evidence supporting this function. 
CDS 44552 - 45028 /gene="61" /product="gp61" /function="tRNA-methyltransferase" /locus tag="RavenCo17_61" /note=Original Glimmer call @bp 44552 has strength 12.19; Genemark calls start at 44588 /note=SSC: 44552-45028 CP: yes SCS: both-gl ST: SS BLAST-Start: [methyltransferase [Gordonia phage MrBates]],,NCBI, q1:s1 100.0% 1.10219E-112 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.182, -4.429150775994987, no F: tRNA-methyltransferase SIF-BLAST: ,,[methyltransferase [Gordonia phage MrBates]],,UJD21081,100.0,1.10219E-112 SIF-HHPRED: N2, N2-dimethylguanosine tRNA methyltransferase; tRNA methyltransferase, RFM domain, NFLD, THUMP domain, TRANSFERASE; 1.7A {Thermococcus kodakarensis (strain ATCC BAA-918 / JCM 12380 / KOD1)},,,5E71_A,96.8354,99.7 SIF-Syn: In every member of cluster CZ with a gene from this gene's pham, this gene is preceded by a gene from pham 19968, which is mostly comprised of DNA methyltransferases, and this gene and the upstream gene are conserved as an operon, with an overlap of this gene and the upstream gene of 4 basepairs. While not part of an operon in phage Bran (EP), the gene of this pham is immediately downstream of another pham consisting of DNA methyltransferases. /note=Primary Annotator Name: Carreon, Justin /note=Auto-annotation: Glimmer calls the start at position 44552, while GeneMark calls the start at 44588. Both potential start candidates are ATG codons. /note=Coding Potential: There is coding potential in only one ORF and on the forward strand in the Host-Trained GeneMark; however, there is overlapping coding potential in this region on the Self-Trained GeneMark. Given the annotation of this gene as a methyltransferase and the synteny provided by the overlap with the forward upstream gene, a DNA methylase, it is unlikely that this gene would be coded on the reverse strand. /note=SD (Final) Score: This gene has the 3rd highest Z and Final scores, of 2.182 and -4.429 respectively. 
The participation of this gene in an operon renders these points moot however. /note=Gap/overlap: There is an overlap of 4 base pairs, indicating participation of this gene in an operon. This overlap is conserved in every member of the cluster with a gene from this pham. /note=Phamerator: This gene is of pham 19182 as of 4/07/2022 at 2048 hours PST. This gene is present in a few members of Cluster CZ, and while no function is auto-called, in every phage this pham is found in (RavenCo17 is the only draft genome with this pham at the time), the function is called as a methyltransferase. /note=Starterator: As of 04/01/2022, the most annotated and conserved start number in the pham was start number 9, being found in 7 of 10 genomes in the pham and called in 6 of 9 non-draft genomes. In cluster CZ, start number 9 is the only start number with manual annotations, and RavenCo17 does not have any of the other manually annotated starts. In RavenCo17, start number 9 corresponds to position 44552. /note=Location call: This gene starts at position 44552. This start position encompasses all coding potential predicted in the host-trained and self-trained GeneMark, and in every member of Cluster CZ with this pham, start number 9 is called, which corresponds to this position. Additionally, every member of this cluster with this gene also conserves a 4 base pair overlap to form an operon with the upstream DNA methylase gene, and so while this gene has the 3rd highest Z and Final Scores, the scores are irrelevant. /note=Function call: tRNA Methyltransferase. Every genome in the pham with a manual annotation lists this gene as a methyltransferase. Additionally, very strong hits from NCBI Blastp return as the methyltransferase protein of other members of RavenCo17’s cluster, such as MrBates, which has 100% Identity, Alignment, and Coverage, and an e-value of 1.10219e-112. 
Returns from HHPRED are strongly of methyltransferase or tRNA methyltransferase domains, with 5E71_A (N2, N2-dimethylguanosine tRNA methyltransferase; tRNA methyltransferase, RFM domain, NFLD, THUMP domain, TRANSFERASE; 1) returning with a probability of 99.73%, a score of 108.85, and an e-value of 1.3e-16. /note=Transmembrane domains: No transmembrane domains are predicted by TmHmm or TOPCONS. /note=Secondary Annotator Name: Mascareno, Greta /note=Secondary Annotator QC: I agree with the start site and function call of this gene. Although the start site called does not have the best Z score or Final score, it is the most conserved, so this outweighs the scores. CDS 45028 - 45279 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="RavenCo17_62" /note=Original Glimmer call @bp 45028 has strength 3.39 /note=SSC: 45028-45279 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_MRBATES_58 [Gordonia phage MrBates]],,NCBI, q1:s1 100.0% 1.53286E-52 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.594, -3.5690131075310805, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MRBATES_58 [Gordonia phage MrBates]],,UJD21082,100.0,1.53286E-52 SIF-HHPRED: SIF-Syn: The pham called is 55893 as of 04/06/22. Pham 55893 is highly conserved and includes 21 non-draft members, and two with a similar genomic architecture are phage Marteena and phage Sahara. /note=Primary Annotator Name: Koetters, Owen /note=Auto-annotation: Only Glimmer predicts a start site, which is located at position 45028. /note=Coding Potential: Both the host and self trained Gene Mark indicate a significant amount of coding potential within the auto-annotated ORF and all coding potential is covered. /note=SD (Final) Score: The auto-annotated start site received an RBS final score of -3.569, which is the closest to zero of all potential ORFs. Two other options at positions 45025 (-4.034) and 45082 (-3.774) received a close final score. 
/note=Gap/overlap: The auto-annotated gap is -1 nucleotides with the previous gene, a probable overlap that supports the auto-annotated start position. The start option at position 45025 contains a -4 gap, which is also strong evidence supporting that site. Start position 45082 would have a fairly large 52 nucleotide gap. /note=Phamerator: The pham called is 55893 as of 04/06/22. Pham 55893 is highly conserved and includes 21 non-draft members, two of which are phage Marteena and phage Sahara. /note=Starterator: Start site 20 was manually annotated the most often in this pham, as it was found in 21/21 non-draft genomes and manually annotated in 15/21 (71.4%) non-draft phage genomes. This start site matches that which is called in RavenCo17, supporting the auto-annotated start site at position 45028. However, start site 19, corresponding to start position 45025 in the RavenCo17 genome, is found in 14/21 non-draft genomes and manually annotated in 6/14 (42.9%) of them. /note=Location call: Given the above evidence, this is a real forward gene with the auto-annotated start position of 45028. There does not appear to be any evidence that would warrant changing the auto-annotated start position. Further, start site 20 was manually annotated in all other CZ cluster phages. /note=Function call: Hypothetical Protein. There is no evidence that supports a function, as no hits with a known function were found using HHpred, PhagesDB BLAST, or NCBI BLAST. Further, there are no TMDs predicted by TMHMM or TOPCONS. /note=Transmembrane domains: There are no TMDs predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Shaikh, Iman /note=Secondary Annotator QC: I agree that the start site for this gene is 45028 based on auto-annotation and an overlap of only one basepair. I also agree that the function of this gene is a hypothetical protein based on BLAST results. 
CDS 45276 - 45527 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="RavenCo17_63" /note=Original Glimmer call @bp 45360 has strength 4.63; Genemark calls start at 45360 /note=SSC: 45276-45527 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_KITA_66 [Gordonia phage Kita] ],,NCBI, q1:s1 100.0% 2.41197E-48 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.682, -3.389881189012514, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KITA_66 [Gordonia phage Kita] ],,YP_009301427,96.3855,2.41197E-48 SIF-HHPRED: SIF-Syn: The upstream gene has no known function and its pham number is 55893. The downstream gene has no known function and its pham number is 100480. /note=Primary Annotator Name: Mascareno, Greta /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 45360. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF encompasses the gene from start site to stop. /note=SD (Final) Score: The best final score on PECAAN is -3.390, which corresponds to the best Z score of 2.682 (this is the only score over 2); however, this score belongs to the start site at 45276, which was not auto-annotated. /note=Gap/overlap: The called start site leaves a gap of 80 bp, whereas the start site at 45276 gives an overlap of 4 bp. The 4 bp overlap could be an indication of an operon. /note=Phamerator: The pham number as of 04/09/2022 is 10679. The gene is conserved; it is found in phage Sidious from cluster CZ7 and Suerte from cluster CZ4. /note=Starterator: The most annotated start site is present in this gene but it is not called. Start site 24, at 45276 in RavenCo17, is called in 26 of the 59 non-draft genes. This start site has been manually annotated 26 times. Start site 32 at 45360, which was called, has been manually annotated 9 times. /note=Location call: Considering the evidence, this is a real gene with its start at 45276, the site that was not auto-annotated. /note=Function call: NKF. 
HHPRED suggests a YaiA protein function with only 60% probability and 78% coverage; however, Phagesdb BLAST states function unknown and NCBI BLAST gives evidence for hypothetical protein at 96% identity, 96% alignment and 100% coverage. /note=Transmembrane domains: TmHmm and Topcons provided no matches for TMHs so this gene is not a membrane protein. /note=Secondary Annotator Name: Hu, Yixiao (Sherry) /note=Secondary Annotator QC: Hu, Yixiao (Sherry), Yes CDS 45624 - 46025 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="RavenCo17_64" /note=Original Glimmer call @bp 45816 has strength 7.98; Genemark calls start at 45816 /note=SSC: 45624-46025 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_ANTONIO_66 [Gordonia phage Antonio]],,NCBI, q1:s1 100.0% 1.68009E-85 GAP: 96 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.833, -3.07999262951171, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANTONIO_66 [Gordonia phage Antonio]],,QCG77485,96.2406,1.68009E-85 SIF-HHPRED: SIF-Syn: The corresponding gene in phage Beenie is farther upstream than it is in phage RavenCo17. However, there is synteny of genomic architecture downstream of this gene for these phages. /note=Primary Annotator Name: Shaikh, Iman /note=Auto-annotation: Both Glimmer and GeneMark predict the start site for this gene at 45816. /note=Coding Potential: There is coding potential at this start site as well as start site 45624. /note=SD (Final) Score: Start site 45816 has an SD (final) score of -3.831, which is reasonable. The start codon for this start site is ATG. Start site 45624 has an SD (final) score of -3.080 which is also reasonable and has a start codon of TTG. /note=Gap/overlap: There is a gap of 288 base pairs at start site 45816, which is very large. There is a gap of 96 base pairs at start site 45624 which is more reasonable. 
/note=Phamerator: As of Apr 5, 2022, the pham that this gene belongs to is 100480. There are 69 genes in this pham, and several of them are also from cluster CZ. /note=Starterator: The most annotated start site for this pham is start site 17, which was called in 32/63 non-draft genomes. The auto-annotation of RavenCo17 calls this most annotated start site (45816), which has 32 manual annotations. However, start site 1 (45624) is also present in RavenCo17, is called 31.2% of the time when present, and has 5 manual annotations. /note=Location call: The start site for this gene is likely 45624 since it has a similar SD score to other start sites and the gap at this start site is much more reasonable than that of the other start candidates. This start site has 5 manual annotations according to Starterator and is called 31.2% of the time when present. /note=Function call: NKF according to evidence from PhagesDB BLAST /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm or Topcons for this gene, which is consistent with there being no known function for this gene. /note=Secondary Annotator Name: Abana, Juana /note=Secondary Annotator QC: I agree with the predicted start site and functional annotation; however, try to add some more information for the function call. Try to mention how there was no relevant evidence/ bad e-values. Make sure to select Yes for the drop down menu above and fill in the synteny box. 
CDS 46154 - 47035 /gene="65" /product="gp65" /function="helix-turn-helix DNA binding domain" /locus tag="RavenCo17_65" /note=Original Glimmer call @bp 46154 has strength 5.42; Genemark calls start at 46154 /note=SSC: 46154-47035 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Madeline]],,NCBI, q1:s1 100.0% 0.0 GAP: 128 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.594, -4.414111147545337, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Madeline]],,QDH47668,97.6109,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Abana, Juana /note=Auto-annotation: Glimmer and GeneMark both call the start at 46154 /note=Coding Potential: The coding potential in this ORF is predominantly seen in the forward strand which shows that this is a forward gene. There is coding potential found in both Host GeneMark and Self GeneMark. Moreover, both the Host and Self GeneMark include all of the coding potential from the chosen Glimmer and GeneMark start site. /note=SD (Final) Score: /note=Gap/overlap: Upstream of the gene there is a 128 base pair gap (matching the GAP field above and the upstream stop at 46025) while downstream there is a 4 base pair overlap. /note=Phamerator: The pham number as of April 8, 2022 is 57418. The gene is conserved in phages Bunnybear and Madeline that also belong to cluster CZ. /note=Starterator: Start site 3 was manually annotated in 12/14 non-draft genes, but RavenCo17 doesn't have site 3. Site 4 (46154) is present and called 1/1 times (phage Madeline_65, CZ1). /note=Location call: Based on the information above this is a real gene and the most likely start is 46154. /note=Function call: Helix-turn-helix DNA binding domain. The top two phagesdb BLAST hits have the function of helix-turn-helix DNA binding domain and have e-values of 1e-162 and 5e-86. 
Some of the NCBI BLAST hits also have the function of helix-turn-helix DNA binding domain; the two selected have 100% and 98.2935% coverage, 96.5487% and 57.4534% identity, and e-values of 0 and 1.84378e-102. HHpred and CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Nguyen, Calvin /note=Secondary Annotator QC: Great work! According to available evidence, I agree with all aspects and ultimate call of the gene function; starterator may need to be changed to NI instead of NA. CDS 47032 - 47364 /gene="66" /product="gp66" /function="helicase loader" /locus tag="RavenCo17_66" /note=Original Glimmer call @bp 47032 has strength 8.95; Genemark calls start at 47032 /note=SSC: 47032-47364 CP: yes SCS: both ST: SS BLAST-Start: [helicase loader [Gordonia phage Maridalia] ],,NCBI, q1:s1 100.0% 2.51622E-69 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.472, -3.8183153089559627, yes F: helicase loader SIF-BLAST: ,,[helicase loader [Gordonia phage Maridalia] ],,AZF98811,97.2727,2.51622E-69 SIF-HHPRED: replisome organizer; helical bipartite natively unfolded domain, replication; HET: MSE; 2.4A {Bacillus phage SPP1} SCOP: l.1.1.1, a.179.1.1,,,1NO1_C,96.3636,99.7 SIF-Syn: Helicase loader. For RavenCo17, the upstream gene is helix-turn-helix DNA binding domain, and the downstream gene is RusA-like resolvase (endonuclease). There is synteny with the upstream gene in phages BunnyBear (CZ) and Madeline (CZ1). These phages however have a different gene inserted downstream. /note=Primary Annotator Name: Charton, Chris /note=Auto-annotation: Both Glimmer and GeneMark call this gene, with a start site at 47032 /note=Coding Potential: There is significant coding potential for this gene seen in the first forward reading frame. This is seen in both the SelfTrained and HostTrained reports. /note=SD (Final) Score: -3.818. This is the highest final score. 
The Z-Score is 2.472, also the highest amongst potential start sites. /note=Gap/overlap: -4 bp. This site has a 4 bp overlap with the preceding gene. This is a common overlap for genes structured in an operon. /note=Phamerator: As of 4/7/2022 the pham is 97098. It is conserved in Bunnybear (CZ) and Madeline (CZ1) /note=Starterator: Start site 7 is manually annotated in 12 of 13 non-draft phages. All phages with this start site call it. For RavenCo17 this corresponds to a start @ 47032 /note=Location call: This is a real gene with a start site @ 47032 /note=Function call: Helicase loader. There are numerous good hits on PhagesDB BLAST with e-values <1e-50 for this function. There are 3 additional strong hits on NCBI BLAST with e-values <1e-66 with the same given function. /note=Transmembrane domains: 0. Neither TMHMM nor TOPCONS predict any TMDs. /note=Secondary Annotator Name: Reyes, Glania /note=Secondary Annotator QC: I agree with this annotation. All evidence categories have been considered. CDS 47361 - 47729 /gene="67" /product="gp67" /function="RusA-like resolvase (endonuclease)" /locus tag="RavenCo17_67" /note=Original Glimmer call @bp 47361 has strength 8.85; Genemark calls start at 47361 /note=SSC: 47361-47729 CP: yes SCS: both ST: SS BLAST-Start: [RusA-like resolvase [Gordonia phage Matteo] ],,NCBI, q1:s1 100.0% 3.41825E-77 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.259, -6.315560378745417, no F: RusA-like resolvase (endonuclease) SIF-BLAST: ,,[RusA-like resolvase [Gordonia phage Matteo] ],,QOC55992,96.7213,3.41825E-77 SIF-HHPRED: Crossover junction endodeoxyribonuclease rusA; Homologous recombination, DNA repair, resolvase, HYDROLASE; 1.2A {Escherichia coli} SCOP: d.79.6.1,,,2H8E_A,99.1803,99.9 SIF-Syn: RusA-like resolvase (endonuclease). Upstream is helicase loader, and downstream is Pham 95451. Exact synteny is not displayed with phages within cluster CZ or its subclusters. 
Phage Maridalia of cluster CZ1 also has helicase two genes upstream from its corresponding gene. /note=Primary Annotator Name: Nguyen, Calvin /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 47361. /note=Coding Potential: Both GeneMark Host and Self demonstrate substantial coding potential between the suggested start of 47361 and the stop site of 47729. Coding potential is contained within the start and stop genome interval. /note=SD (Final) Score: The suggested start site has a Z-score of 1.259 and a final RBS score of -6.316. The Z-score is the lowest out of all available sites, but the final RBS score is the highest out of all reasonable starting sites without a major gap or overlap. /note=Gap/overlap: There is an overlap of 4 bp (gap of -4), which is reasonably sized for a gene. It also has the start codon GTG, which is very common in genes. /note=Phamerator: This gene is located in Pham 102367 as of 4/8/22. There are 244 members of the pham, and it appears that this gene is heavily conserved across a variety of different clusters (e.g. Phage Acquire from cluster L, Adora from cluster CZ, and Apricot from cluster DN). /note=Starterator: Although RavenCo17 did not call the most annotated start site as it was not present in its genome, it used start site 89 which was called 100% of the time when it was present (92/92 non draft genes). This corresponds to the suggested start site at 47361. /note=Location call: According to all available evidence, it appears that this gene is real and is located at start site 47361. /note=Function call: RusA-like resolvase. The top three hits in Phagesdb blast had a low e-value of 5e-64 and a score of 241 and indicated the function of RusA-like resolvase. HHpred has a hit with 2H8E_A for Crossover junction endodeoxyribonuclease rusA with probability of 99.9%, coverage of 99.1803%, and E-value of 3.1e-20. /note=Transmembrane domains: TmHmm predicted no TMHs, and Topcons similarly did not provide any hits for TMHs. 
Therefore, we can conclude that this gene is not a transmembrane protein. /note=Secondary Annotator Name: Bovee, Alyson /note=Secondary Annotator QC: I agree with the primary annotator that the function of this gene is a RusA-like resolvase. CDS 47726 - 48397 /gene="68" /product="gp68" /function="helix-turn-helix DNA binding domain" /locus tag="RavenCo17_68" /note=Original Glimmer call @bp 47726 has strength 8.04; Genemark calls start at 47726 /note=SSC: 47726-48397 CP: yes SCS: both ST: SS BLAST-Start: [HTH DNA-binding domain protein [Gordonia phage SoilAssassin] ],,NCBI, q1:s1 100.0% 4.46604E-149 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.957, -2.9063687850157054, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[HTH DNA-binding domain protein [Gordonia phage SoilAssassin] ],,YP_009303064,95.9641,4.46604E-149 SIF-HHPRED: Putative DNA-binding protein; BldC, S. coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_A,26.4574,97.1 SIF-Syn: Helix-turn-helix DNA binding domain protein, upstream gene is RusA-like resolvase, downstream gene is helix-turn-helix DNA binding domain protein, just like in phages Gizermo, Lamberg, and Matteo in cluster CZ. /note=Primary Annotator Name: Reyes, Glania /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the same start site at 47726 bp. /note=Coding Potential: Coding potential in the ORF is very high on the forward strand, indicating that this is a forward gene. Coding potential is found in GeneMark Self and Host. /note=SD (Final) Score: -2.906. This is the best final score, with a z-score of 2.957 which is also the best z-score. /note=Gap/overlap: -4. This overlap suggests that this gene is part of an operon. /note=Phamerator: The pham number as of April 8, 2022 is 95451. The gene is conserved in phages Attis, GemG, and Cynthia, all of which are in cluster CZ with a conserved length of ~680 bp. 
The function call for all of these genes is a helix-turn-helix DNA binding protein and it is consistent between Phamerator and the phams database. /note=Starterator: There are 96 members in this pham, 10 of which are drafts. Start 32 correlates to a start site of 47726 bp for RavenCo17, which has 84 MAs. Glimmer and GeneMark agree with this start site. /note=Location call: Considering the above evidence, this gene is a real gene and has a start site at 47726 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Helix-turn-helix DNA binding domain protein. PhagesDB BLAST has multiple hits with the suggested function helix-turn-helix DNA binding domain protein, with e-values ranging from 1e-114 to 0.002. A majority of NCBI BLAST hits also have the function helix-turn-helix DNA binding domain protein, with small e-values ranging from 4.5e-149 to 6.2e-140. CDD had no hits. HHpred shows multiple hits, but the e-values are too high to consider as evidence. /note=Transmembrane domains: TmHmm predicts no TMDs. Topcons also predicts no TMDs. This makes sense because the function for the gene is a helix-turn-helix DNA binding domain protein, which does not contact the membrane. /note=Secondary Annotator Name: Cini, Victoria /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. Great work!:) Just make sure to update synteny box. 
CDS 48487 - 48933 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="RavenCo17_69" /note=Original Glimmer call @bp 48487 has strength 12.76; Genemark calls start at 48487 /note=SSC: 48487-48933 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BIZ73_gp69 [Gordonia phage Eyre] ],,NCBI, q1:s1 81.7568% 1.39564E-77 GAP: 89 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.722, -3.3064319479770883, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BIZ73_gp69 [Gordonia phage Eyre] ],,YP_009292460,95.1613,1.39564E-77 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bovee, Alyson /note=Auto-annotation: Both Genemark and Glimmer call the start site to be 48487. The Glimmer score is 12.76 which is very high, and the start codon for this start site is GTG, which is common. Based on this evidence, the correct start site should be 48487. /note=Coding Potential: The coding potential shown on the self trained graph is very high, indicating good coding potential in the given ORF. The host trained graph is also high in the given ORF with the start site and stop site in the correct locations (48487 through 48933). This is a good indication that the start site called is correct. /note=SD (Final) Score: The final score is -3.306, which is the least negative value of all the potential start sites. This is a good indication that the correct start site is 48487. /note=Gap/overlap: The gap shown is 89, which is not large enough for a valid gene to fit upstream. The pham map also shows only a very small open region, in which a gene could not be added. This is evidence that the correct start site is called and is 48487. /note=Phamerator: The pham number for this RavenCo17 gene is 99256. There are 74 members in this pham and all (besides two singletons) are in cluster CZ along with RavenCo17. All of the gene lengths are in the mid-300s, unlike the RavenCo17 gene, which is 447. 
/note=Starterator: On Starterator, pham 99256 has 74 members, 10 of which are drafts. The start site for this RavenCo17 gene is not the most annotated start site. The start site 48487 is found in 29 of 74 genes, but is called 93.1% of the time when present. Other phages in this category are Faith5x5 and Whiteclaw, which have been compared to RavenCo17 previously and consistently show similarities. Based on Starterator, the best called start site is 48487, which is the proposed start site, supporting it as the correct start site. /note=Location call: The start site for this gene in RavenCo17 is 48487, as shown by the guiding principles met, such as the least negative final score and the Starterator report calling 48487 as the best and most called start site. /note=Function call: According to Phagesdb Function Frequency, the proposed function for this gene is a helix-turn-helix DNA-binding domain protein, which is called in cluster A15 100% of the time. However, BLAST and HHPRED call the potential function to be a hypothetical protein or oxidoreductase. Because the scores supporting any function are low and the results are inconsistent, this protein is called as having no known function. /note=Transmembrane domains: There are no transmembrane domains called by TmHmm or Topcons. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 48930 - 49265 /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="RavenCo17_70" /note=Original Glimmer call @bp 48930 has strength 10.99; Genemark calls start at 48930 /note=SSC: 48930-49265 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BIZ73_gp71 [Gordonia phage Eyre] ],,NCBI, q1:s1 100.0% 9.78377E-73 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.692, -3.368377069717189, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BIZ73_gp71 [Gordonia phage Eyre] ],,YP_009292462,99.0991,9.78377E-73 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Cini, Victoria /note=Auto-annotation: Glimmer and GeneMark. Both call a GTG start site at 48930. /note=Coding Potential: The gene has high coding potential predicted within the putative ORF. In the third panel for both H-GeneMark and S-GeneMark maps, we observe the highest coding potential, indicating that this is a forward gene. We observe strong coding potential throughout the entirety of the gene sequence, and the chosen start site captures all the coding potential. /note=SD (Final) Score: The chosen start site of 48930 has the best final score (-3.368) and z-score (2.692) compared to other potential start sites on PECAAN. /note=Gap/overlap: 4bp overlap with the upstream gene, indicating that this gene may be part of an operon. All the other potential start sites have larger gap/overlap sizes of >= 38bp. /note=Phamerator: Pham 5666 has 80 members with 70 non-draft members (04/05/22). This pham is mainly present in cluster CZ phages, and NKF is assigned for all instances. This pham shows synteny and appears towards the end of the right arm like many other cluster CZ phages including: Antonio, Agueybana, AlumE, and others. /note=Starterator: Start number 7 is the most annotated start in Starterator with manual annotations in 44/70 non-draft genes in the pham. 
However, this start site is not present in RavenCo17, so Starterator output was not informative. Instead, RavenCo17 calls start site 6 corresponding to basepair coordinate 48930. Start site 6 has manual annotations in 23/70 non-draft genomes, so although this start site isn't the most manually annotated, it agrees with the start site predicted by Glimmer and GeneMark and it captures all the coding potential in the gene. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 48930. /note=Function call: NKF. The top Phagesdb BLAST hits (e-values < 1e-45) come from a singleton phage and other cluster CZ phages, and they all have NKF assigned to them. The top NCBI BLAST hits (e-value <= 1e-57, 98%+ coverage, 75%+ identity) also come from cluster CZ phages and have hypothetical protein function associated with them. CDD and HHpred had no relevant hits (all e-values > 1e-7). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sharma, Devshi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. 
CDS 49279 - 49611 /gene="71" /product="gp71" /function="HNH endonuclease" /locus tag="RavenCo17_71" /note=Original Glimmer call @bp 49411 has strength 1.99 /note=SSC: 49279-49611 CP: yes SCS: glimmer-cs ST: SS BLAST-Start: [HNH endonuclease [Gordonia phage Madeline]],,NCBI, q1:s1 100.0% 3.05906E-58 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.796, -5.200997464701116, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage Madeline]],,QDH47679,88.1818,3.05906E-58 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,90.9091,98.7 SIF-Syn: HNH endonuclease gene (pham 102338), upstream gene is NKF, (pham 102350), seen in phages AlumE and Bunnybear as well. /note=Primary Annotator Name: De Schutter, Elena /note=Auto-annotation: Only Glimmer calls a start site at 49411 /note=Coding Potential: There is coding potential present in the gene, and it does go fairly high; however, it does not cover the start site or the whole rest of the gene. I do still think this gene is real because it is found in other phages as well (ex: BoyNamedSue and Bialota). /note=SD (Final) Score: The final score is -4.724 and the z score is 2.029 for start site 49411; these are good values. The other option for a start site (49279) has a final score of -5.201 and a z score of 1.796. /note=Gap/overlap: The gap is 145 bp for the start site recommended by Glimmer; the other start site only has a gap of 13 bp. The 145 bp gap is somewhat large since the recommended gap is only 50 bp, so the 13 bp gap would be better. /note=Phamerator: pham 102338 as of 04/08/22, this pham has many clusters present including but not limited to: CZ (the same cluster as our phage and includes phages Antonio and AlumE), F, A, EE, DN, singletons, etc. The HNH endonuclease function is called very often within this cluster. /note=Starterator: (AS OF 7/15/22, this gene is an orpham). The most called start site is not present in our gene. 
The one with the most manual annotations that is present in our gene is start site 49279 (#110), which was annotated 24 times. The start site called by Glimmer has only ever been present in this specific gene. /note=Location call: This is a rather tricky location call; while this would go against what is called by Glimmer, due to the manual annotations seen in Starterator and also the much smaller gap, I would call 49279 as the start site, especially because the gap of 145 bp was not seen in other phages that have this gene (when looking at pham maps, ex: BoyNamedSue and AlumE) and this start site also gives the gene the LORF. /note=Function call: HNH endonuclease. The top hits on phagesDB have HNH endonuclease function (4e-52), HHPRED had a good hit for HNH endonuclease (98.7% probability, 90.9% coverage, 4.7e-8 e-value), NCBI BLAST had good hits for HNH endonuclease (88.2% aligned, 100% coverage, e-value of 3.06e-58), and CDD had one relatively good hit as well for this function (42.1% aligned, 50.9% coverage, and an e-value of 0.00040). /note=Transmembrane domains: Neither TOPCONS nor TMHMM show any transmembrane domain and thus it is not a membrane protein. /note=Secondary Annotator Name: Charton, Chris /note=Secondary Annotator QC: I have QCed this gene and agree with the location and function calls. Starterator and GM drop downs need to be selected. For Transmembrane domains, might want to just mention that specifically TmHmm and Topcons call for 0 TMDs. Also specifically name a few phages that share the gap structure in the location call. Looks good.
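The gap values quoted in the Gap/overlap notes for genes 63, 64 and 71 can be cross-checked with a short calculation. The sketch below is illustrative only and is not part of the PECAAN output: it assumes the convention gap = start - upstream stop - 1 (negative values meaning overlap), which happens to reproduce the numbers quoted in those notes; the function name `gap_bp` and the labels are hypothetical.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between the upstream stop and a candidate start.

    Assumed convention: start - upstream_stop - 1, so a negative
    result is an overlap of that many bases.
    """
    return start - upstream_stop - 1


# (label, upstream stop, candidate start), coordinates copied from the records above
candidates = [
    ("gp63 auto-annotated start", 45279, 45360),
    ("gp63 selected start", 45279, 45276),
    ("gp64 auto-annotated start", 45527, 45816),
    ("gp64 selected start", 45527, 45624),
    ("gp71 auto-annotated start", 49265, 49411),
    ("gp71 selected start", 49265, 49279),
]

for label, upstream_stop, start in candidates:
    gap = gap_bp(upstream_stop, start)
    kind = "overlap" if gap < 0 else "gap"
    print(f"{label}: {abs(gap)} bp {kind}")
```

In each of these three genes the selected start is the one with the smaller gap or the 4 bp overlap, in line with the reasoning given in the location calls.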