CDS 1 - 372 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="ModicumRichard_1" /note=Original Glimmer call @bp 1 has strength 11.92; Genemark calls start at 1 /note=SSC: 1-372 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 8.62027E-82 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.759, -3.170856472917156, no F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Gordonia phage Cafasso] ],,QXN74216,100.0,8.62027E-82 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_A,74.7967,99.5 SIF-Syn: Terminase small subunit, no upstream gene, downstream gene is helix-turn-helix DNA binding domain, just like in phages Morgana, ObLaDi, Aleemily, and Cafasso. /note=Primary Annotator Name: Baugh, Alex /note=Auto-annotation: Gene (stop@372 F). Both Glimmer and GeneMark call the start site at 1. /note=Coding Potential: There is coding potential only on the forward strand, showing that this is a forward gene. Coding potential is present in both GeneMark Self and Host. /note=SD (Final) Score: -3.171. This is the second-highest final score, but I chose this start because both Glimmer and GeneMark called it at 1 and it gives the longest ORF. It also has the second-highest z-score (2.759). /note=Gap/overlap: No gap, since this is the first gene. /note=Phamerator: As of 10/29/24, the gene is in pham 158482. It is conserved in cluster DZ and is found in 4 other phages: Aleemily, Cafasso, Morgana, and ObLaDi. /note=Starterator: There are 5 non-draft members in this pham. 4 of the 5 non-draft members call start site 1, which corresponds to position 1 bp in ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the start site is called at 1. /note=Function call: Terminase small subunit. 
The top 2 PhagesDB BLAST hits and top 2 NCBI BLAST hits have the function of terminase small subunit, with E-values between 10^-44 and 10^-82. There are also strong domain alignments from CDD and HHpred that align the ORF with known phage terminase small subunits. Given these consistent matches across multiple tools and datasets, the function terminase small subunit is assigned. It also meets the requirements listed on the Approved Functions List. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Lee, Kyle /note=Secondary Annotator QC: I agree with this annotation, including the location and function calls. However, please make sure to fill out the transmembrane domains section of the PECAAN notes. Also, I believe you switched the meanings of upstream/downstream in your synteny notes. CDS 374 - 967 /gene="2" /product="gp2" /function="helix-turn-helix DNA binding domain" /locus tag="ModicumRichard_2" /note=Original Glimmer call @bp 431 has strength 16.62; Genemark calls start at 431 /note=SSC: 374-967 CP: yes SCS: both-cs ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.63816E-140 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.443, -3.8970005018578204, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,QXN74217,100.0,1.63816E-140 SIF-HHPRED: HTH_58 ; Helix-turn-helix domain,,,PF19575.2,31.4721,96.2 SIF-Syn: Helix-turn-helix DNA binding domain; upstream gene is terminase small subunit, downstream gene is terminase large subunit, just like in phages Cafasso, Morgana, Aleemily and ObLaDi. /note=Primary Annotator Name: Iasi, Matilde /note=Auto-annotation: Gene 1 (stop@967 F). Glimmer and GeneMark both call the start at 431, which is good evidence for a real gene, and the start codon is GTG. 
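The SSC, GAP, and RBS values at the start of each record are machine-generated, so they can be pulled out programmatically when comparing candidate starts across genes. The following is a minimal Python sketch, assuming the field layout seen in these records (RBS is taken to list two method names, then the z-score, then the final score). The function name `parse_summary` and the regular expressions are illustrative and not part of PECAAN.

```python
import re

def parse_summary(note: str) -> dict:
    """Extract start/stop, gap, z-score and final score from a PECAAN-style summary.

    The layout is assumed from the records in this file, not from PECAAN documentation.
    """
    ssc = re.search(r"SSC: (\d+)-(\d+)", note)
    gap = re.search(r"GAP: (-?\d+) bp gap", note)
    # RBS: <method>, <method>, <z-score>, <final score>, <yes/no>
    rbs = re.search(r"RBS: [^,]+, [^,]+, (-?[\d.]+), (-?[\d.]+)", note)
    if not (ssc and gap and rbs):
        raise ValueError("summary note does not match the expected layout")
    return {
        "start": int(ssc.group(1)),
        "stop": int(ssc.group(2)),
        "gap": int(gap.group(1)),
        "z_score": float(rbs.group(1)),
        "final_score": float(rbs.group(2)),
    }

# Summary text copied from the gp2 record above.
summary = (
    "SSC: 374-967 CP: yes SCS: both-cs ST: SS BLAST-Start: [helix-turn-helix DNA "
    "binding domain protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.63816E-140 "
    "GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.443, -3.8970005018578204, yes "
    "F: helix-turn-helix DNA binding domain"
)
fields = parse_summary(summary)
print(fields["start"], fields["stop"], fields["gap"], fields["z_score"], round(fields["final_score"], 3))
# 374 967 1 2.443 -3.897
```

Anchoring each pattern on its label ("SSC:", "GAP:", "RBS:") keeps the commas inside the BLAST-Start description from confusing the RBS parse.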
/note=Coding Potential: There is good coding potential in both the Host- and Self-trained GeneMark. The coding potential extends upstream of the start site at 431, suggesting that 374 is a better start site because it covers all of the coding potential. /note=SD (Final) Score: -3.897, the best final score given on PECAAN. This final score is for the start site at 374 rather than 431; 374 has better overall values and is conserved in other genes in the pham. /note=Gap/overlap: The gap is 1 bp, the smallest gap among the candidate starts, indicating that the gene is well positioned relative to the closest upstream gene. This start site (374) also gives the longest reading frame, with a gene length of 594 bp, which is again conserved among other genes in the pham. /note=Phamerator: Pham 183917. Date 10/18/24. It is conserved, found in Morgana_2 (DZ), Cafasso_2 (DZ), Aleemily_2 (DZ), and ObLaDi_2 (DZ). /note=Starterator: Start site 3 (374) was manually annotated in 4/5 non-draft genes in this pham. Start 3 is 374 in the ModicumRichard draft, but this evidence does not agree with the start site predicted by Glimmer and GeneMark (431). /note=Location call: This is most likely a real gene, and the start site is called at 374. This call is supported by good z-scores and final scores, a shorter gap, the longest ORF (LORF), and conservation in other genomes. /note=Function call: helix-turn-helix DNA binding domain. The top four PhagesDB hits have the function of helix-turn-helix DNA binding domain protein (E-value on the order of 10^-108), and the top 10 NCBI BLAST hits also have the same function (100% coverage, 100% and 94.4% identity, and E-value 2e-140). CDD returned no hits, while HHpred had a hit to a helix-turn-helix domain with a high probability (96.2%) but only 31% coverage, although this was the highest coverage among its hits. The E-value of this hit is 2.4 × 10^-4. This agrees with the function from the BLAST searches. 
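The 1 bp gap and 594 bp length quoted for the 374 start follow from the coordinates alone, as do the gaps reported elsewhere in this section (for example -38 for gp4 and 137 for gp6). A minimal sketch, assuming both genes are on the forward strand and that coordinates are 1-based and inclusive of the stop codon, as in the CDS lines of this file; the helper names are illustrative:

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """Gap between two forward-strand genes; a negative value is an overlap."""
    return next_start - prev_stop - 1

def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand CDS, counting both end coordinates."""
    return stop - start + 1

# Gene 1 ends at 372; compare the two candidate starts for gene 2.
print(gap_between(372, 374), gene_length(374, 967))  # start at 374 -> 1 594
print(gap_between(372, 431), gene_length(431, 967))  # start at 431 -> 58 537
```

Moving the start upstream always shrinks the gap and lengthens the gene by the same number of bases, which is why the LORF start at 374 also has the smallest gap.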
/note=Transmembrane domains: No TMDs detected. /note=Secondary Annotator Name: Khuu, Jacob /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator (start at 374). I also agree with the function call based on the evidence: the top hits from PhagesDB BLAST, HHpred, and NCBI BLAST all support the function helix-turn-helix DNA binding domain. CDS 964 - 2316 /gene="3" /product="gp3" /function="terminase, large subunit" /locus tag="ModicumRichard_3" /note=Original Glimmer call @bp 964 has strength 13.69; Genemark calls start at 964 /note=SSC: 964-2316 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.506, -4.535884094130711, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Gordonia phage Cafasso] ],,QXN74218,100.0,0.0 SIF-HHPRED: Large subunit terminase; large terminase, VIRAL PROTEIN; 2.2A {Deep-sea thermophilic phage D6E},,,5OE8_A,93.5556,100.0 SIF-Syn: Large subunit terminase. Upstream gene is a helix-turn-helix gene, downstream is a portal protein, just like in phage Morgana. /note=Primary Annotator Name: Mathew, Taniya /note=Auto-annotation: Gene 1 (stop@2316 F). Glimmer and GeneMark both call the start site at 964. /note=Coding Potential: Because the coding potential for this ORF is only on the forward strand, this is a forward gene, and the chosen start site covers all of the coding potential. /note=SD (Final) Score: The SD final score for this start is -4.536, the second-best final score on PECAAN. The best score (-4.102) belongs to a start with a lower z-score and a larger gap, so that start is not preferred. /note=Gap/overlap: The gap is -4, meaning a 4 bp overlap with the upstream gene. Because the overlap is smaller than 7 bp, it is not a concern, and a -4 overlap may indicate that the genes are in an operon. /note=Phamerator: Pham 1724. 
Date 10/29/24. The gene is conserved in phages Aleemily, Cafasso, and ObLaDi, which are part of the same cluster, DZ, as ModicumRichard. /note=Starterator: The start site was called at 964 by both auto-annotation and Starterator, and this gene has the most-annotated start site. /note=Location call: This gene is most likely a real gene, and the start site is called at 964. /note=Function call: Large subunit terminase. There are several high-confidence BLASTp hits on PhagesDB and NCBI (several E-values of 0, i.e. <10^-44), and the top HHpred hits also had this function with high probability, good E-values (all <10^-44), and good coverage. CDD also returned hits. /note=Transmembrane domains: There are no TMDs detected within this gene. /note=Secondary Annotator Name: Xu, Sasha /note=Secondary Annotator QC: I agree with the call. For the PhagesDB BLAST, be careful about using draft genomes as evidence. Also, be sure to update the synteny box. CDS 2279 - 3949 /gene="4" /product="gp4" /function="portal protein" /locus tag="ModicumRichard_4" /note=Original Glimmer call @bp 2279 has strength 12.33; Genemark calls start at 2279 /note=SSC: 2279-3949 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 0.0 GAP: -38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.653, -3.4685663819143713, no F: portal protein SIF-BLAST: ,,[portal protein [Gordonia phage Aleemily] ],,UVK59744,100.0,0.0 SIF-HHPRED: Portal protein; complex, phage, sapi, portal, STRUCTURAL PROTEIN; 2.4A {Staphylococcus phage 80alpha},,,8V8B_A,86.1511,99.9 SIF-Syn: This gene belongs to Pham 1736 and exhibits strong synteny with other phages, including Aleemily, Cafasso, Morgana, and ObLaDi. Within these phages, the upstream gene belongs to Pham 1724 and the downstream gene belongs to Pham 187115. This conserved syntenic pattern across multiple non-draft phages in cluster DZ supports the functional consistency of the gene as a portal protein. 
The placement and orientation within the genomic context are further supported by Phamerator, which shows that gene order is highly conserved within the cluster. /note=Primary Annotator Name: Reynolds, Noah /note=Auto-annotation: Gene stop @3949 F. Both GeneMark and Glimmer call the start site at 2279, with a Glimmer score of 12.33. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that it is a forward gene. /note=SD (Final) Score: -3.469, the best final score given on PECAAN. /note=Gap/overlap: The gap is -38 (a 38 bp overlap). This overlap is concerning because it is large. However, the pham map and coding potential are normal, and the gene has good synteny with similar phages. /note=Phamerator: Pham 1736, 10/29/2024. This gene is conserved in 7/64 genes. It is found in Aleemily_4, Cafasso_4, Morgana_4, and ObLaDi_4, all of which are part of the same cluster. /note=Starterator: Start site 19 is conserved in 7 of the 63 genes, and most of the genes that have this start site belong to cluster DZ. This start site corresponds to the auto-annotated start for this gene. /note=Location call: This is a real gene, and the start site is called at 2279. /note=Function call: The top three PhagesDB BLAST hits for this gene have the function of portal protein (E-value = 0), with 100% identity and a consistent length of 556 amino acids. Similarly, the top 3 of 5 NCBI BLAST hits also confirm the portal protein function (E-value = 0), with 100% coverage and at least 99% identity. HHpred returned a hit for the 80alpha portal protein with 100% probability, 86.15% coverage, and an E-value of 5.1e-32, further supporting this function. HHpred also returned a match to 7Z4W_J, a known portal protein structure, with 98.5% probability, 80.4% coverage, and an E-value of 2.4e-20, indicating a strong structural alignment consistent with a portal protein. 
Additionally, Pfam analysis identified PF05133.17, a phage portal protein family domain, with a probability of 97.8%, coverage of 82%, and an E-value of 3.2e-12, providing further support for the function call. The CDD database also identified the "Phage_prot_Gp6" domain with a significant E-value of 7.90e-08, consistent with a phage portal protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Su, Erin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the primary annotator. However, you have incorrectly stated the gap as -36 instead of -38. Please also proofread for clarity and concision. I also agree with this function call, as there is extremely strong evidence that this is a portal protein. However, I see that you've checked multiple HHpred hits as evidence; be sure to address each (even just collectively, as in "the top three hits..."). Also, be sure to mention the functions of the upstream and downstream genes in the synteny section. 
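The 556-amino-acid length cited for the portal protein hits can be checked against this gene's own coordinates: 2279-3949 spans 1671 bp, or 557 codons including the stop codon. A small sketch, assuming the CDS coordinates include the stop codon (as they do in this table); `protein_length` is an illustrative helper, not a PECAAN function:

```python
def protein_length(start: int, stop: int) -> int:
    """Amino acids encoded by a forward CDS whose coordinates include the stop codon."""
    nt = stop - start + 1
    if nt % 3 != 0:
        raise ValueError(f"{start}-{stop} is {nt} bp, not a whole number of codons")
    return nt // 3 - 1  # the stop codon does not encode an amino acid

print(protein_length(2279, 3949))  # gp4, portal protein -> 556
```

A length that is not a multiple of 3 usually means a coordinate was mistyped, so the check doubles as a quick sanity test on the CDS line.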
CDS 3924 - 4982 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="ModicumRichard_5" /note=Original Glimmer call @bp 3924 has strength 9.68; Genemark calls start at 3924 /note=SSC: 3924-4982 CP: yes SCS: both ST: SS BLAST-Start: [MuF-like minor capsid protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 0.0 GAP: -26 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.592, -3.513874779476501, no F: hypothetical protein SIF-BLAST: ,,[MuF-like minor capsid protein [Gordonia phage Cafasso] ],,QXN74220,100.0,0.0 SIF-HHPRED: Phage_min_cap2 ; Phage minor capsid protein 2,,,PF06152.14,94.8864,100.0 SIF-Syn: /note=Primary Annotator Name: Bonver, Sarah /note=Auto-annotation: Both Glimmer and GeneMark call the start at 3924. /note=Coding Potential: The start and stop sites of this gene capture all of the coding potential indicated by the black and red dotted lines in the self-trained GeneMark, as well as all of the coding potential in the host-trained GeneMark. /note=SD (Final) Score: The final score is -3.514, the least negative, and the z-score is 2.592, the highest. This is good evidence that this is the start site. /note=Gap/overlap: There is a 26 bp overlap (gap -26) with the upstream gene, and there is a gap downstream of this gene that has coding potential. /note=Phamerator: Pham 187115. Date 11/06/2024. It is conserved; found in Aleemily (DZ), Cafasso (DZ), and ObLaDi (DZ). /note=Starterator: Start 11 @3924 bp is the auto-annotated start site, which is manually annotated in 4 phages. It is found in 5/29 genes in the pham. /note=Location call: There is sufficient evidence that this is a real gene, because the start site has a good z-score and the least negative RBS score. The E-value, synteny, length, and coding potential also support it being real. An ATG start codon is very likely. /note=Function call: NFK. Possibly a minor capsid protein. No strong evidence for function from HHpred or CDD. 
NFK is also called for this gene in other phages, such as ObLaDi and Morgana. /note=Transmembrane domains: No TMDs were predicted; it cannot be determined whether their absence makes sense, because the gene is NFK. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with the location and function calls. I would just make sure to complete the functional hypothesis in Module 7 of your notebook, stating whether the CDD and HHpred results were informative. I would also explain why you called NFK in the function call of your PECAAN notes. CDS 5120 - 5893 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="ModicumRichard_6" /note=Original Glimmer call @bp 5117 has strength 12.44; Genemark calls start at 5120 /note=SSC: 5120-5893 CP: no SCS: both-gm ST: NI BLAST-Start: [scaffolding protein [Gordonia phage ObLaDi]],,NCBI, q1:s2 100.0% 1.08547E-180 GAP: 137 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.565, -3.6476983004329377, no F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Gordonia phage ObLaDi]],,UXE03729,99.6124,1.08547E-180 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,49.0272,23.5 SIF-Syn: "scaffolding protein, upstream gene is (likely) minor capsid protein, downstream is major capsid protein, just like in phage Aleemily" /note=PECAAN Notes /note=Primary Annotator Name: Graham, Noah /note=Auto-annotation: Called by both Glimmer and GeneMark, which disagree on the start site by 3 bp (5117 and 5120, respectively); start codon GTG. /note=Coding Potential: There is reasonable coding potential within the called ORF. The called start site misses some coding potential just upstream, but it is unclear whether it should be included in the gene // forward coding potential is present /note=SD (Final) Score: -3.587; not the best, but the second least negative, so it is still a reasonable RBS // not an operon // z-score is 2.565 /note=Gap/overlap: 134 bp with the start at 5117 (137 bp with the chosen start at 5120), a very large gap (it misses some earlier coding potential in the Self-Trained GeneMark). This may be reasonable because no other called start would create a longer ORF, as 134 bp is the smallest gap, and the gap is also conserved in other phages with similar genes. The gene length is reasonable, as it is much greater than 120 bp. /note=Phamerator: Pham 10247 as of 10/29/24 // the other members of the pham are in the same cluster (DZ) as ModicumRichard or are singletons (VanLee); I used ObLaDi and Morgana for comparison // PhagesDB assigns this gene a scaffolding protein function, and it is shared with 4 other genes from phages in the same cluster /note=Starterator: There are two reasonable start sites that are conserved in similar genes in other phages of the cluster // start 1 @5117 bp was called in 4/5 phages, while start 2 @5120 was called in 3/5 phages (final) // the pham has 5 other members; 4 of them called both sites, and the two sites were manually annotated (MA'd) at the same rate, which makes Starterator uninformative because the data are split /note=Location call: The evidence suggests that this gene is real (synteny, called by both Glimmer and GeneMark, GTG start codon, most of the coding potential covered) // it is a real gene because it is conserved and displays good coding potential // start sites @5117 and @5120 both seem reasonable, but 5120 was chosen because of the preference for an ATG start codon /note=Function call: The predicted function is scaffolding protein, based on PhagesDB BLASTp and NCBI BLASTp; both returned at least 2 hits with high coverage (100%), high identity (100%), and low E-values (on the order of 10^-142 and 10^-180). /note=Transmembrane domains: No TMDs were predicted. Phage scaffolding proteins bind capsid proteins to direct procapsid assembly rather than associating with membranes, so the absence of TMDs makes sense. The predicted topology is also globular, consistent with this role. 
/note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. The start site at 5120 seems the most reasonable, as it covers the most coding potential and the gap is conserved across phages of the cluster. I also agree with the function call, as the top PhagesDB BLAST, NCBI BLAST, and HHpred hits all support the scaffolding protein function. CDS 5963 - 6880 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="ModicumRichard_7" /note=Original Glimmer call @bp 5963 has strength 17.31; Genemark calls start at 5963 /note=SSC: 5963-6880 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 0.0 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.007, -2.7423981586358774, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Gordonia phage Cafasso] ],,QXN74222,100.0,0.0 SIF-HHPRED: P22_CoatProtein ; P22 coat protein - gene protein 5,,,PF11651.11,98.3607,99.9 SIF-Syn: Major capsid protein. Upstream gene is scaffolding protein, downstream is unknown, just like in phage Cafasso. /note=Primary Annotator Name: Kang, Hannah /note=Auto-annotation: Glimmer and GeneMark both call the gene with a start at 5963 bp and an ATG start codon. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF. The chosen start site covers all the coding potential found in GeneMark Self and Host. The coding potential of the gene is found in only one ORF on the forward strand. /note=SD (Final) Score: -2.742, the best final score on PECAAN. It also has the highest z-score on PECAAN (3.007). /note=Gap/overlap: 69 bp. This large gap is conserved in other phages (Cafasso, Aleemily), and there is no coding potential in the gap that might indicate another gene. /note=Phamerator: Pham: 228. Date 10/29/2024. 
The gene is conserved in phages Aleemily, Cafasso, and ObLaDi, which are within cluster DZ, the same as ModicumRichard. /note=Starterator: Start site 11 in ModicumRichard was manually annotated in 96 of 284 genes in this pham. Start 11 is 5963 in ModicumRichard, which agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene with a start site at 5963 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Major capsid protein. The top three PhagesDB BLAST hits have the function of major capsid protein (E-value < 10^-169). The top NCBI BLAST hit has the function of major capsid protein (100% coverage, 100% identity, and E-value = 0.0). The second NCBI BLAST hit also has the function of major capsid protein (100% coverage, 95% identity, and E-value = 0.0). CDD has a hit with 98.6885% coverage and an E-value of 1.585e-37, which HHpred also returned. HHpred had two significant hits, to a coat protein and a major capsid protein, both with 99.9% probability, 98%+ coverage, and E-values less than 9e-27. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so we can conclude this gene does not encode a membrane protein. /note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the primary annotator. 
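The notes in this section quote E-values in several notations (8.62027E-82, 1.585e-37, 0.0, 10^-169, 2.4 × 10^-4), which makes side-by-side comparison error-prone. A minimal sketch that normalizes these forms to floats and compares them with a cutoff; the 1e-44 default is only taken from the "<10^-44" comparisons in these notes and should be treated as an adjustable assumption, and `parse_evalue` and `passes` are illustrative helper names. A BLAST-reported "0.0" is treated here as zero.

```python
import re

def parse_evalue(text: str) -> float:
    """Convert E-value notations used in these notes to a float."""
    text = text.strip().replace("×", "x")
    match = re.fullmatch(r"(?:([\d.]+)\s*x\s*)?10\^(-?\d+)", text)
    if match:  # forms like "10^-169" or "2.4 x 10^-4"
        mantissa = float(match.group(1)) if match.group(1) else 1.0
        return mantissa * 10.0 ** int(match.group(2))
    return float(text)  # forms like "8.62027E-82", "1.585e-37", "0.0"

def passes(evalue: str, cutoff: float = 1e-44) -> bool:
    """True if the E-value is below the cutoff (the cutoff is an assumption)."""
    return parse_evalue(evalue) < cutoff

for e in ["0.0", "10^-169", "1.585e-37", "2.4 × 10^-4"]:
    print(e, passes(e))
```

The cutoff is a parameter rather than a constant so that a different threshold can be tried without touching the parser.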
CDS 6917 - 7312 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="ModicumRichard_8" /note=Original Glimmer call @bp 6917 has strength 13.71; Genemark calls start at 6917 /note=SSC: 6917-7312 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_8 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.94695E-86 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.84, -5.071299467497814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_8 [Gordonia phage Cafasso] ],,QXN74223,100.0,1.94695E-86 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nuckowski, Sophie /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 6917. /note=Coding Potential: Coding potential is high, present only in the second frame on the forward strand, and entirely covered by the chosen start site. /note=SD (Final) Score: -5.071; not the best final score, but close to it. /note=Gap/overlap: 36 bp gap, which is acceptable, as it is less than 50 bp. /note=Phamerator: Pham 10693 /note=Starterator: All phages in the same pham have the same manually annotated start, which corresponds to 6917 in ModicumRichard. /note=Location call: Start @6917, based on the evidence above and the unambiguous Starterator data. /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: I agree with this annotation. However, it would be a good idea to add more detail to your PECAAN notes and then fill out the drop-down menus for Starterator and GeneMark coding capacity. Please add more to your Phamerator note (the date the pham analysis was run, how many genes it is conserved in, and the clusters), and add more detail to your Starterator note. Please address the last QC notes and complete Modules 6-8 in your lab notebook as soon as possible so you can fill out these PECAAN notes and determine a function for your gene. 
After running PhagesDB BLAST, NCBI BLAST, CDD, HHpred, and DeepTMHMM analyses, I have determined that the gene has no known function (NFK). PhagesDB BLAST produced four positive hits: ObLaDi_8, Cafasso_8, Morgana_8, and Aleemily_8. These hits had strong E-values, were all in cluster DZ, had close or similar scores, and showed good identity. NCBI BLAST also showed strong matches to Morgana and Cafasso. However, the CDD, HHpred, and DeepTMHMM analyses did not return any significant hits or relevant domains. Based on this review, the gene is called NFK. Make sure to include enough detail when you add your own notes, including specific values and data sets as evidence. Also, when you are done, make sure to check the evidence boxes for the information used to support the NFK call. CDS 7322 - 7741 /gene="9" /product="gp9" /function="head-to-tail adaptor" /locus tag="ModicumRichard_9" /note=Original Glimmer call @bp 7331 has strength 12.15; Genemark calls start at 7331 /note=SSC: 7322-7741 CP: yes SCS: both-cs ST: NI BLAST-Start: [head-to-tail adaptor [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 4.95891E-91 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.251, -5.757584294088564, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Gordonia phage Cafasso] ],,QXN74224,100.0,4.95891E-91 SIF-HHPRED: Hypothetical protein yqbG; ALPHA, GFT NMR, STRUCTURAL GENOMICS, PROTEIN STRUCTURE INITIATIVE, PSI, NESG, SR215, NORTHEAST STRUCTURAL GENOMICS CONSORTIUM, CYANA2.0; NMR {Bacillus subtilis} SCOP: a.229.1.1, l.1.1.1,,,1ZTS_A,95.6835,98.8 SIF-Syn: Head-to-tail adaptor, upstream gene is NFK from pham 10693, downstream is head-to-tail stopper, just like in phage Cafasso. /note=Primary Annotator Name: Burgess, Brooklyn /note=Auto-annotation: Both Glimmer and GeneMark call the gene with a start at 7331. 
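The SIF-HHPRED values in these records end with an accession and two numbers, and the descriptions themselves can contain commas (as in the gp9 hit above). Judging from the gp2 record, whose note reads "PF19575.2,31.4721,96.2" as 31% coverage and 96.2 probability, the two numbers appear to be coverage and then probability; that field order is an assumption, not documented PECAAN behavior. A minimal sketch that splits the field from the right so commas in the description are left alone:

```python
from typing import NamedTuple

class HHpredHit(NamedTuple):
    description: str
    accession: str
    coverage: float     # percent of the query covered (field order assumed)
    probability: float  # HHpred probability, percent

def parse_sif_hhpred(field: str) -> HHpredHit:
    """Split a SIF-HHPRED value from the right, since descriptions may contain commas."""
    head, accession, coverage, probability = field.rsplit(",", 3)
    return HHpredHit(head.rstrip(",").strip(), accession, float(coverage), float(probability))

# SIF-HHPRED text copied from the gp9 record above.
hit = parse_sif_hhpred(
    "Hypothetical protein yqbG; ALPHA, GFT NMR, STRUCTURAL GENOMICS, PROTEIN STRUCTURE "
    "INITIATIVE, PSI, NESG, SR215, NORTHEAST STRUCTURAL GENOMICS CONSORTIUM, CYANA2.0; "
    "NMR {Bacillus subtilis} SCOP: a.229.1.1, l.1.1.1,,,1ZTS_A,95.6835,98.8"
)
print(hit.accession, hit.coverage, hit.probability)  # 1ZTS_A 95.6835 98.8
```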
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: For start site 7322, -5.758, the best final score on PECAAN. The z-score is also the highest, at 2.251. /note=Gap/overlap: For start site 7322, the gap is 9 bp. This is very reasonable, especially since the gap is conserved in other phages (ObLaDi, Morgana, Cafasso, Aleemily). /note=Phamerator: Pham 188297 as of Nov 11, 2024. Conserved; gene 9 displays synteny with ObLaDi (DZ), Morgana (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: As of November 11, 2024, there are 33 non-draft members of this pham. 24/33 non-draft members call start site 13; however, ModicumRichard does not have this most-annotated start. Start 17 was manually annotated 4 times in cluster DZ and corresponds to position 7322 in ModicumRichard. This evidence does not agree with the start at 7331 (start 19) predicted by Glimmer and GeneMark. /note=Location call: This is a real gene, and there is good evidence for the start site at 7322 bp. Although GeneMark and Glimmer both call the start at 7331, the start more commonly annotated on Starterator is at 7322, and start 7322 also has the best z-score, final score, and gap. Together, this is strong evidence that the start site is at 7322, not 7331. /note=Function call: Head-to-tail adaptor. The top three PhagesDB BLAST hits have the function of head-to-tail adaptor (E-value <1e-72), and 3 of the top 5 NCBI BLAST hits also have the function of head-to-tail adaptor (100% coverage, up to 100% identity, and E-value <5e-91). HHpred had a hit for head-to-tail adaptor with 99.41% probability, 85.57% coverage, and an E-value of 3.4e-12. CDD also had a hit for a head-tail connector protein with an E-value of 5.85e-05. 
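For 7322 and 7331 to be alternative starts of the same gene, they must lie in the same reading frame as the stop at 7741, that is, differ by a multiple of 3. A quick sketch, assuming forward-strand, 1-based coordinates, that checks this and counts how many codons the upstream start adds; the other start pairs debated in this section are included for comparison, and the helper names are illustrative:

```python
def same_frame(start_a: int, start_b: int) -> bool:
    """Two forward-strand starts share a reading frame if they differ by a multiple of 3."""
    return (start_a - start_b) % 3 == 0

def codons_between(upstream: int, downstream: int) -> int:
    """Codons added to the N-terminus by choosing the upstream start."""
    if not same_frame(upstream, downstream):
        raise ValueError("starts are in different reading frames")
    return (downstream - upstream) // 3

# (upstream, downstream) candidate starts for gp9, gp2, gp10 and gp6
for up, down in [(7322, 7331), (374, 431), (7738, 7783), (5117, 5120)]:
    print(up, down, same_frame(up, down), codons_between(up, down))
```

All four pairs are in frame, so in each case the choice only changes how far the protein extends at its N-terminus (3, 19, 15, and 1 codons, respectively).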
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so it is not a transmembrane protein. This makes sense because it is a structural component connecting the head and tail within the phage's protein-based architecture and does not interact with a lipid bilayer. /note=Secondary Annotator Name: Bonver, Sarah /note=Secondary Annotator QC: Almost all sections are complete and of good quality. There is a lot of evidence supporting this gene's location and function calls. I agree that this is a head-to-tail adaptor. The one thing I recommend is to revisit the transmembrane domains section, because I think it's possible that some head-to-tail adaptors have transmembrane domains. The explanation in this section needs more detail and clarification: what about the gene being a head-to-tail adaptor explains why it has no transmembrane domains? A quick Google search should help clarify. Also, the synteny box should contain info in the following format: "Head-to-tail adaptor, upstream gene is NFK, downstream is head-to-tail stopper, just like in phage XXX." Make sure to find a phage for which this statement is true. Also, the head-to-tail stopper is downstream, not upstream. 
CDS 7738 - 8097 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="ModicumRichard_10" /note=Original Glimmer call @bp 7738 has strength 8.8; Genemark calls start at 7783 /note=SSC: 7738-8097 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail stopper [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 8.56147E-80 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.966271270034708, no F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Gordonia phage Cafasso] ],,QXN74225,100.0,8.56147E-80 SIF-HHPRED: Head completion protein gp16; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_3,94.1176,98.9 SIF-Syn: Head-to-tail stopper, upstream gene is head-to-tail adaptor, downstream is a gene from Pham 9788, just like in phage Cafasso. /note=Primary Annotator Name: Piao, Sara /note=Auto-annotation: Both Glimmer and GeneMark call this gene; Glimmer calls the start at 7738 and GeneMark at 7783. /note=Coding Potential: There is strong evidence that this ORF has high coding potential. Both the host- and self-trained GeneMark coding potential maps show strong coding potential in the first forward reading frame, with the coding potential peak covering most of the gene. /note=SD (Final) Score: -4.966, the second-best final score on PECAAN. This start was still chosen because its gap/overlap most strongly indicated a start site. /note=Gap/overlap: Overlap of 4 bp (gap -4). This suggests that the gene is likely part of an operon with the upstream gene. /note=Phamerator: Pham 9596. Date: 10/30/2024. It is conserved; found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), ObLaDi (DZ), PSonyx (singleton), and VanLee (singleton). /note=Starterator: Start site 3 in Starterator was manually annotated in 5/6 non-draft genes in this pham. Start 3 is @7738 in ModicumRichard. This evidence agrees with the site predicted by Glimmer. 
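Which forward frame a gene sits in follows from its start coordinate. A small sketch, assuming frame 1 begins at genome position 1 (GeneMark plots may number frames differently, so this convention is an assumption). It reproduces the "first forward reading frame" noted for gp10 and the "second frame" noted for gp8, and confirms that each stop codon (whose first base is at stop - 2) falls in the same frame as its start:

```python
def forward_frame(position: int) -> int:
    """Forward reading frame (1, 2, or 3) of a 1-based position, counting frame 1 from base 1."""
    return (position - 1) % 3 + 1

for name, start, stop in [("gp10", 7738, 8097), ("gp8", 6917, 7312)]:
    # stop - 2 is the first base of the stop codon, which must share the start's frame
    print(name, forward_frame(start), forward_frame(stop - 2))
```

Expected output: "gp10 1 1" and "gp8 2 2".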
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7738. /note=Function call: The predicted function is head-to-tail stopper, based on hits from PhagesDB BLASTp, NCBI BLASTp, and HHpred. PhagesDB BLASTp had 2 hits with 100% identity and low E-values of 2e-66. NCBI BLASTp also had 2 hits with 100% coverage, high identity (≥99%), and low E-values of ≤3e-79. CDD had no hits. HHpred had two relevant hits to gp16 proteins, which act as head-to-tail stoppers by keeping the viral DNA inside the capsid. The PDB hit was a gp16 head completion protein with 99.4% probability, 94.1176% coverage, and a low E-value of 4.6e-12. The Pfam hit was a head-to-tail joining protein with 99.17% probability, 84.8739% coverage, and a low E-value of 6.7e-10. When the amino acid sequence of this gene was aligned to the SEA-PHAGES example sequence for a head-to-tail stopper, there was a high probability of 99.75% and a low E-value of 1.3e-21. All of this evidence supports that the function of this gene is most likely a head-to-tail stopper. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Graham, Noah /note=Secondary Annotator QC: After reviewing the evidence, I agree with the primary annotator's decision. 
CDS 8108 - 8416 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="ModicumRichard_11" /note=Original Glimmer call @bp 8108 has strength 16.68; Genemark calls start at 8108 /note=SSC: 8108-8416 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_11 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.00372E-68 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.404, -3.898968357315439, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_11 [Gordonia phage Cafasso] ],,QXN74226,100.0,1.00372E-68 SIF-HHPRED: Minor_capsid_2 ; Minor capsid protein,,,PF11114.11,98.0392,99.2 SIF-Syn: /note=Primary Annotator Name: Sahagun, Diego /note=Auto-annotation: Glimmer and Genemark both call the start site at 8108. /note=Coding Potential: Both GeneMark Self and Host show good coding potential covering the proposed gene. /note=SD (Final) Score: -3.899, the score closest to 0, with the highest z-score of 2.404. /note=Gap/overlap: 10 bp gap. The gap is too large to suggest an operon overlap and too small to contain another gene. /note=Phamerator: Pham 9788. Aleemily, Cafasso, Morgana, and ObLaDi all conserve this gene, with an unknown function. /note=Starterator: Start site 6, which is called in 5/5 non-draft phages and is 8108 in ModicumRichard. /note=Location call: 8108 /note=Function call: NKF. Minor capsid protein was considered, because HHpred returned a hit to Minor_capsid_2 with very high probability and coverage. However, calling this function also requires a condition about the surrounding genes that has not been met, so the gene cannot be called a minor capsid protein. The BLAST results, in which all the other phages in the cluster that conserve this gene have no known function, are the best-supported evidence, so the gene is assigned an unknown function. 
/note=Transmembrane domains: No transmembrane domains were predicted (outside regions only), which aligns with the function call. /note=Secondary Annotator Name: Kang, Hannah /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the first annotator. CDS 8416 - 8820 /gene="12" /product="gp12" /function="Tail Terminator" /locus tag="ModicumRichard_12" /note=Original Glimmer call @bp 8416 has strength 14.26; Genemark calls start at 8416 /note=SSC: 8416-8820 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 6.70338E-94 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.212, -4.291708691156382, no F: Tail Terminator SIF-BLAST: ,,[tail terminator [Gordonia phage Cafasso] ],,QXN74227,100.0,6.70338E-94 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,97.7612,99.1 SIF-Syn: Tail terminator; upstream gene has an unknown function (Pham 9788), downstream gene is major tail protein, just like in phage Cafasso /note=Primary Annotator Name: Zhang, Kevin /note=Auto-annotation: Glimmer and Genemark both call the start at 8416. /note=Coding Potential: Coding potential is only present in the forward ORF. In GeneMark Self and Host, coding potential covers the region between the start and stop sites (8416-8820). /note=SD (Final) Score: -4.292, the highest (best) final score on the PECAAN list. The z-score is 2.212 (highest score). /note=Gap/overlap: 1 bp overlap (-1). The same run of tightly spaced consecutive genes is observed in Morgana and ObLaDi. /note=Phamerator: Pham: 144818. Date: 10/29/24. It is conserved; found in Cafasso, ObLaDi, Morgana, and Aleemily (all DZ). /note=Starterator: Start site 2 in Starterator was manually annotated in 4/5 non-draft genes in this pham. Start 2 is 8416 in ModicumRichard. 
This is the same start site that Glimmer and GeneMark call. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 8416. /note=Function call: Tail terminator protein. Phages from the same pham, identified with phagesDB BLAST (e-values for 4/4 cluster members on phagesDB are <6e-70) and NCBI BLAST (top two results were from the same cluster, Cafasso and Aleemily, with e-values <2e-87), call this protein a tail terminator. CDD had no hits for this protein. HHpred hit 5A21_G with 99.2% probability, 97.612% coverage, and an e-value of 1e-13. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Nuckowski, Sophie /note=Secondary Annotator QC: I agree with this annotation; it contains a good amount of detail supporting that the gene is real and that 8416 is the correct location call. There is only one reading frame with coding potential, there are no excessive gaps, the 1 bp overlap strongly indicates that the gene may be part of an operon, and there is a high degree of synteny with other related phages. /note=Tertiary Annotator Name: Bouklas, Tejas /note=Tertiary Annotator QC: I agree with your function call. Please include only 2 pieces of evidence and complete the synteny boxes. 
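The Gap/overlap values in these records can be checked directly from the CDS coordinates. The sketch below assumes 1-based, inclusive coordinates for two adjacent forward-strand genes; the function name intergenic_gap is illustrative and not a PECAAN feature. A positive result is a gap and a negative result is an overlap.

```python
def intergenic_gap(prev_stop: int, next_start: int) -> int:
    """Return the spacing in bp between two adjacent forward-strand genes.

    Coordinates are 1-based and inclusive, as in the CDS lines.
    Positive values are gaps; negative values are overlaps.
    """
    return next_start - prev_stop - 1


# (stop of the previous gene, start of this gene), copied from the CDS lines
examples = {
    "gene 11": (8097, 8108),  # gene 10 ends at 8097
    "gene 12": (8416, 8416),  # gene 11 ends at 8416
    "gene 13": (8820, 8905),  # gene 12 ends at 8820
}

for gene, (prev_stop, next_start) in examples.items():
    print(gene, intergenic_gap(prev_stop, next_start))
```

With these coordinates the sketch prints 10, -1, and 84, matching the GAP values recorded for genes 11, 12, and 13.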
CDS 8905 - 9573 /gene="13" /product="gp13" /function="major tail protein" /locus tag="ModicumRichard_13" /note=Original Glimmer call @bp 8905 has strength 17.51; Genemark calls start at 8905 /note=SSC: 8905-9573 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.05237E-160 GAP: 84 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.334, -2.0720764396375664, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Gordonia phage Cafasso] ],,QXN74228,100.0,1.05237E-160 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_I,55.8559,93.7 SIF-Syn: Major tail protein, upstream gene is tail terminator, downstream gene is NKF, just like in phage Aleemily. /note=Primary Annotator Name: Echeverria, Alondra /note=Auto-annotation: Glimmer and Genemark both call the start at 8905. /note=Coding Potential: Coding potential in this ORF is on the forward strand and includes all coding potential in that region in both GeneMark Host and Self. /note=SD (Final) Score: The final score is the best option at -2.072, and the z-score is the highest at 3.334. /note=Gap/overlap: 84 bp. This is somewhat larger than 50 bp, but it is reasonable because the gap is conserved in other phages. /note=Phamerator: Pham 192477. Date 10/24/2024. It is conserved; found in Aleemily and Cafasso. /note=Starterator: Start site 5 in Starterator was manually annotated in 77 of 82 genes in this pham. Start 5 is 8905 in ModicumRichard. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the information above, this is a real gene and the most likely start site is 8905. /note=Function call: Major tail protein. The top two phagesdb BLAST hits have the function of major tail protein (E-value <10^-44), and 4 out of 5 top NCBI BLAST hits also have the function of major tail protein (100% coverage, 222/222 (100%), and E-values <10^-160). 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a transmembrane protein. /note=Secondary Annotator Name: Burgess, Brooklyn /note=Secondary Annotator QC: I agree with the annotations. All of the evidence categories have been considered. Note: put location call on the next line below so that it can be formatted better. CDS 9666 - 9929 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="ModicumRichard_14" /note=Original Glimmer call @bp 9666 has strength 17.87; Genemark calls start at 9666 /note=SSC: 9666-9929 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_14 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 8.53733E-50 GAP: 92 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.927, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_14 [Gordonia phage Cafasso]],,QXN74229,100.0,8.53733E-50 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jana, Pranava /note=Auto-annotation: Glimmer and Genemark agree that the start site is @ 9666. /note=Coding Potential: Most of the coding potential of the ORF is in the 3rd forward reading frame, with a small reverse coding potential spike in the 2nd reading frame that is likely inaccurate. Glimmer and Genemark both agree with very similar coding potentials. /note=SD (Final) Score: Best (closest to zero) final score of -2.828, with a good z-score of 2.927. /note=Gap/overlap: 92 bp. Slightly large, but ultimately reasonable since the gap is conserved in other phages (ObLaDi, Morgana, Cafasso, and Aleemily), and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 11022 as of Nov 11, 2024. Conserved; Gene 14 displays synteny with ObLaDi (DZ), Morgana (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: As of Nov 11, 2024, there are only 4 non-draft members of this pham. 4/4 non-draft members call start site 2, which corresponds to a start site @ 9666 for ModicumRichard. 
/note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 9666 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. PhagesDB BLAST has multiple hits with no suggested function and small e-values of 1e-44 to 4e-32, and 4/4 top NCBI BLAST hits are NKF (100% probability, 100% coverage, and e-value of 8.35415e-50). HHPRED and CDD have no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein; instead, it is predicted to be an intracellular “globular protein.” /note=Secondary Annotator Name: Piao, Sara /note=Secondary Annotator QC: I have QC’ed this location and function call (NKF) and agree with the first annotator. Make sure to also select the two top hits for PhagesDB BLASTp in PECAAN even if there is no known function. CDS 9969 - 10481 /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="ModicumRichard_15" /note=Original Glimmer call @bp 9969 has strength 12.91; Genemark calls start at 9969 /note=SSC: 9969-10481 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 9.22324E-118 GAP: 39 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.62, -4.443838065747148, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Cafasso] ],,QXN74230,100.0,9.22324E-118 SIF-HHPRED: GP24_25 ; Mycobacteriophage tail assembly protein,,,PF17388.5,70.5882,71.0 SIF-Syn: Upstream is a gene with no known function (Pham 11022). Downstream is a gene encoding a tail assembly chaperone (Pham 154333, only phage in pham). Phage ObLaDi, also in cluster DZ, also has a gene with no known function (Pham 11022) upstream and a tail assembly chaperone downstream (different pham). /note=Primary Annotator Name: Mastro, David /note=Auto-annotation: Glimmer and GeneMark. Both call at 9969. 
/note=Coding Potential: Coding potential is seen across the entire range (9969-10481) in the appropriate direction (forward). This is shown in both GeneMark Self and Host. /note=SD (Final) Score: The SD Final score is -4.444. It is the 2nd best final score, but has a much more reasonable gap. /note=Gap/overlap: Gap is 39 bp. This is a good size and space for a promoter. /note=Phamerator: The pham is 185743 (10/30/24). This pham is found in every other phage in its cluster. /note=Starterator: Start #1 was found in 12 other non-draft phages. This phage does not have the start site most common to the phages in its Starterator group, so start #1 (9969) makes the most sense. /note=Location call: Based on the evidence, this is likely a real gene that starts at 9969. There is a reasonable gap, good scoring, Glimmer and GeneMark agree, and coding potential covers the gene. /note=Function call: Tail assembly chaperone. Top 20 hits (excluding self) on PhagesDB BLASTp suggest tail assembly chaperone (e-value < 4e-7). Top 7 hits on NCBI BLASTp suggest tail assembly chaperone (e-value < 1e-95). The top PhagesDB hit and the top NCBI hit both had 100% identity. This gives strong proof. While CDD showed no hits, HHpred had a strong hit with 100% coverage and 93.66% probability, and this hit suggested tail assembly chaperone. This gives further strong support for tail assembly chaperone. Also, the protein is predicted to be intracellular rather than membrane-bound, which is consistent with the predicted function. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains and instead predicts that the protein lies entirely on the inside (cytoplasmic side). Therefore, this is likely not a transmembrane protein. /note=Secondary Annotator Name: Sahagun, Diego /note=Secondary Annotator QC: Evidence, including synteny, confirms that the gene is real. The start site is confirmed by the Phamerator annotation and matching auto-annotation. Evidence from BLAST, HHPRED, and NCBI all supports tail assembly chaperone. 
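Several notes quote lengths, such as the 222/222 residue alignment for gene 13 and the 165 bp length of gene 23. A minimal sketch for checking such numbers, assuming 1-based inclusive CDS coordinates that include the stop codon; the helper names cds_length and protein_length are hypothetical, not part of PECAAN.

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS with 1-based inclusive coordinates.

    abs() lets the same helper handle complement genes written as start > stop.
    """
    return abs(stop - start) + 1


def protein_length(start: int, stop: int) -> int:
    """Number of amino acids encoded, excluding the stop codon."""
    length = cds_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} bp is not a multiple of 3")
    return length // 3 - 1


print(cds_length(8905, 9573), protein_length(8905, 9573))    # gene 13
print(cds_length(9969, 10481), protein_length(9969, 10481))  # gene 15
```

For gene 13 this gives 669 bp and 222 amino acids, consistent with the 222/222 alignment quoted in its function call.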
CDS 10346 - 10921 /gene="16" /product="gp16" /function="tail assembly chaperone" /locus tag="ModicumRichard_16" /note=Original Glimmer call @bp 10538 has strength 14.59; Genemark calls start at 10538 /note=SSC: 10346-10921 CP: yes SCS: both-cs ST: NI BLAST-Start: [tail assembly chaperone [Gordonia phage Cafasso]],,NCBI, q35:s161 82.199% 3.30955E-101 GAP: -136 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.7, -5.342105208121439, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Cafasso]],,QXN74231,48.265,3.30955E-101 SIF-HHPRED: SIF-Syn: Tail assembly chaperone protein, upstream gene is also tail assembly chaperone protein, downstream is tape measure protein /note=Primary Annotator Name: Namburu, Ushaswini /note=Auto-annotation: Glimmer and Genemark both call the start @10538. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Good coding potential is observed in Genemark Self and Host. /note=SD (Final) Score: The final score is the best option at -5.342, and the z-score is the highest at 1.7 for start @10346. /note=Gap/overlap: There is a large 136 bp overlap, but this overlap is observed in the Pham maps of final annotated genes, and genes with this function are known to overlap with other genes. /note=Phamerator: Pham 154333. Date 10/31/24. This is the only gene in the pham. /note=Starterator: Starterator is not available for this pham. /note=Location call: Considering all the evidence, this gene is a real gene and has a start site at 10346, as it has a higher final score and a more significant z-score. This does not agree with Glimmer and Genemark, which call the start site at 10538. /note=Function call: Tail assembly chaperone protein. 
The top two phagesdb BLAST hits have the function of tail assembly chaperone protein (E-value <10^-65), and 2 out of 5 top NCBI BLAST hits also have the function of tail assembly chaperone protein (100% coverage, 65%+ identity, and E-value <10^-81). CDD and HHpred did not have any significant hits because coverage was low. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Zhang, Kevin /note=Secondary Annotator QC: Annotation looks good and location call looks good. The only issue is that the gap/overlap does not match between PECAAN and the PECAAN notes. /note=Function call looks great. The evidence provided through phagesDB and NCBI BLAST shows the function. Could possibly elaborate on why HHpred had no significant hits (i.e. low coverage). CDS 10946 - 15646 /gene="17" /product="gp17" /function="tape measure protein" /locus tag="ModicumRichard_17" /note=Original Glimmer call @bp 10946 has strength 13.59; Genemark calls start at 10946 /note=SSC: 10946-15646 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 24 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.404, -4.48879389222639, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Gordonia phage Cafasso]],,QXN74232,99.9361,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BF,3.57599,98.6 SIF-Syn: Upstream from this gene is a tail assembly chaperone protein and downstream is a minor tail protein. In ObLaDi and Cafasso the upstream gene is in pham 193606, so neither shares synteny with gene 16. /note=Primary Annotator Name: Aguilar, Xeleste /note=Auto-annotation: Glimmer and GeneMark both call the start at 10946 /note=Coding Potential: Yes, the gene has coding potential within the ORF on the forward strand. 
Both the GeneMark Self and Host demonstrate the same forward strand coding potential; there are no visible discrepancies. /note=SD (Final) Score: -4.489 /note=Gap/overlap: gap 24 /note=Phamerator: pham: 162522, Date: 10/30/24, Conserved and found in phages Cafasso and Aleemily. /note=Starterator: Start site 1 in Starterator was manually annotated in 5/5 non-draft genes. Start site 1 is 10946. /note=Location call: Based on the evidence collected, the start site of this gene is 10946. /note=Function call: Tape measure protein (all hits show the same function, e-values on the hits are low, and there is similarity across all of them) /note=Transmembrane domains: DeepTMHMM predicts 1 TMD, so it is a membrane protein (9 aa). /note=Secondary Annotator Name: Echeverria, Alondra /note=Secondary Annotator QC: I agree with the annotations. All of the categories have been considered. Note: the 3 bp gap is okay; it is most likely an operon. CDS 15643 - 16551 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="ModicumRichard_18" /note=Original Glimmer call @bp 15643 has strength 11.28; Genemark calls start at 15643 /note=SSC: 15643-16551 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.737, -3.296037457437639, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Aleemily]],,UVK59758,100.0,0.0 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,85.4305,99.8 SIF-Syn: Minor tail protein, upstream gene is tape measure protein, downstream gene is minor tail protein, just like phage ObLaDi /note=Primary Annotator Name: Jasso, Sarahi /note=Auto-annotation: Glimmer and GeneMark both call the start site at 15643. /note=Coding Potential: Yes /note=SD (Final) Score: The final score was -3.296, and the z-score was 2.737, which is above 2. 
/note=Gap/overlap: -4 bp (4 bp overlap) /note=Phamerator: Found in pham 192168 on 10/30/2024. /note=Starterator: Shows start site 93, which is not the most conserved start in the pham. /note=Location call: The evidence suggests that the original start site at 15643 is correct. Although it is not the most conserved start site in the pham, the 4 bp overlap (-4) suggests that this gene is part of an operon. /note=Function call: Most likely a minor tail protein, since there were good BLASTp hits with very low e-values (0.0). /note=Transmembrane domains: 0, there are no transmembrane domains /note=Secondary Annotator Name: Jana, Pranava /note=Secondary Annotator QC: Try to include z-score. Also specify if coding potential is in the forward or reverse strand. I have QC’ed this location and function call (minor tail protein) and agree with the first annotator. Make sure to fill out synteny since the function is listed as minor tail protein. CDS 16548 - 18212 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="ModicumRichard_19" /note=Original Glimmer call @bp 16548 has strength 16.57; Genemark calls start at 16548 /note=SSC: 16548-18212 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.074, -4.864188174661353, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage ObLaDi]],,UXE03742,100.0,0.0 SIF-HHPRED: Sipho_Gp37 ; Siphovirus ReqiPepy6 Gp37-like protein,,,PF14594.9,87.0036,99.6 SIF-Syn: Minor tail protein, upstream gene is minor tail protein, downstream is HNH endonuclease, just like in phages Cafasso and ObLaDi /note=Primary Annotator Name: Mathew, Mallika /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 16548 /note=Coding Potential: Coding potential in this ORF is in the forward strand only, indicating that it is a forward gene. Coding potential is found both in GeneMark Self and Host /note=SD (Final) Score: -4.864. 
It is the second highest score on PECAAN. /note=Gap/overlap: Overlap of 4 bp (-4), indicating it is probably part of an operon. This start creates the longest ORF and is a very strong start site. /note=Phamerator: Pham 192166 Date 10/29/2024. It is conserved, found in ObLaDi, Morgana, Aleemily /note=Starterator: Start position 19 in Starterator (start site 16548) was manually annotated in non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: It seems to be a real gene due to good coding potential and synteny with non-draft phages in cluster DZ. 16548 is the correct start site, and Starterator agrees with Glimmer and GeneMark. /note=Function call: Minor tail protein. The top 2 BLASTp hits and NCBI hits were minor tail proteins, it is in the syntenic region for minor tail proteins, and PDB gave a strong hit for a minor tail protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mastro, David /note=Secondary Annotator QC: Good - the evidence is strong and it is good that both Glimmer and GeneMark agree. Function looks good, I agree with minor tail protein. Great support from PhagesDB and NCBI blast, and PDB hit is promising. Need to mark evidence in phagesDB. 
CDS complement (18225 - 18713) /gene="20" /product="gp20" /function="HNH endonuclease" /locus tag="ModicumRichard_20" /note=Original Glimmer call @bp 18713 has strength 5.44; Genemark calls start at 18713 /note=SSC: 18713-18225 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 3.89734E-118 GAP: 243 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.828, -3.108788982793736, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage ObLaDi]],,UXE03743,100.0,3.89734E-118 SIF-HHPRED: d.4.1.3 (M:1-105) Intron-encoded homing endonuclease I-HmuI {Bacteriophage SP01 [TaxId: 10685]},,,d1u3em1,64.8148,99.9 SIF-Syn: HNH endonuclease, upstream gene is NKF (pham 173580), downstream gene is minor tail protein, just like in phage Cafasso /note=Primary Annotator Name: Sridharan, Ananya /note=Auto-annotation: Glimmer and GeneMark both call the start at 18713. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.109 is the best score on PECAAN /note=Gap/overlap: There is a 243 bp gap, but there is no coding potential in the gap to suggest that it contains another gene. /note=Phamerator: Pham 192232 Date 10/18/2024. It is conserved; found in ObLaDi (DZ) and Cafasso (DZ). /note=Starterator: Start site 28 in Starterator was manually annotated in non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The gene is real and starts at 18713 /note=Function call: HNH endonuclease. The top three phagesdb BLAST hits have the function of HNH endonuclease (E-value 1e-100), and NCBI BLAST hits also have this function (100% coverage). HHpred had a hit for HNH endonuclease. CDD has 2 significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Namburu, Ushaswini /note=Secondary Annotator QC: Everything looks good; maybe you could include the z-score in the SD section. The HHpred and CDD hits are good, but maybe you can include coverage values for CDD. CDS 18957 - 19691 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="ModicumRichard_21" /note=Original Glimmer call @bp 18957 has strength 11.0; Genemark calls start at 18957 /note=SSC: 18957-19691 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_21 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 4.06443E-179 GAP: 243 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.564, -3.649482460397077, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_21 [Gordonia phage Cafasso] ],,QXN74236,100.0,4.06443E-179 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carrillo, Daniel /note=Auto-annotation: Glimmer and Genemark both call the start at 18957. /note=Coding Potential: Coding potential is found in the third reading frame of the forward strand, indicating this is a forward gene. Both self-trained and host-trained GeneMark demonstrate sufficient coding potential. /note=SD (Final) Score: The suggested final score is the best score at -3.649, with the highest z-score of 2.564. /note=Gap/overlap: There is a 243 bp gap upstream of the Glimmer and GeneMark called start site. However, this is reasonable as the gap is conserved in other phages such as ObLaDi and Cafasso. There is no coding potential in the gap that may code for a new gene. /note=Phamerator: This gene is found in Pham: 173580 as of 10/29/2024. This gene is conserved in phages ObLaDi, Cafasso, and Aleemily. /note=Starterator: Start site 4 is conserved across 3 other phages. It was manually annotated in 3/5 final phages (ObLaDi, Cafasso, and Aleemily). Start site 4 is 18957 in ModicumRichard. This agrees with Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene with a start site of 18957, agreeing with Glimmer and GeneMark. /note=Function call: Unknown function. The majority of top hits in BLASTp and NCBI BLAST do not have a function assigned to them, although two phages, ObLaDi and Morgana, have a minor tail protein call (E-values of 1e-144 and 1e-127, respectively). The majority of top hits have no function while having some of the best E-values. Furthermore, CDD only had one hit, with an E-value of 7.32e-14. HHPRED had one significant hit, an alignment with PF10910.13 with 100% probability, 54.5082% coverage, and an E-value of 1.4e-34. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore the protein is not a membrane protein. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: Everything looks great in the notes and PECAAN is up to date! The only thing that needs to be addressed is the gap number. I do not see the 264 bp gap listed in the chart above; this needs attention. Also make sure to select your coding capacity and Starterator. I agree with the first annotator on gene location. /note=Secondary Annotator QC 11/15: Everything looks great in the notes and PECAAN is up to date! The only thing that needs to be addressed is the synteny box below. The annotation module says the box should be left blank if the function of the gene is unknown; this is the only correction that should be made. 
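Genes 20 and 21 both report a 243 bp gap because they share the same intergenic region (18714 to 18956), which lies on the start side of both the reverse-strand gene 20 and the forward-strand gene 21. The sketch below assumes features are compared in left-to-right coordinate order regardless of strand; the CDS tuple and spacing helper are illustrative, not part of PECAAN.

```python
from typing import Iterator, NamedTuple


class CDS(NamedTuple):
    gene: str
    left: int    # lower coordinate (1-based, inclusive)
    right: int   # higher coordinate (1-based, inclusive)
    strand: str  # "+" for forward, "-" for complement


# Coordinates copied from the CDS lines for genes 19-22
features = [
    CDS("19", 16548, 18212, "+"),
    CDS("20", 18225, 18713, "-"),  # complement (18225 - 18713)
    CDS("21", 18957, 19691, "+"),
    CDS("22", 19702, 21441, "+"),
]


def spacing(cds_list: list[CDS]) -> Iterator[tuple[str, str, int]]:
    """Yield (left gene, right gene, bp between them) for coordinate-adjacent CDSs.

    Negative values mean the two features overlap.
    """
    ordered = sorted(cds_list, key=lambda f: f.left)
    for a, b in zip(ordered, ordered[1:]):
        yield a.gene, b.gene, b.left - a.right - 1


for left, right, bp in spacing(features):
    print(f"gene {left} -> gene {right}: {bp} bp")
```

This prints 12, 243, and 10 bp for the three intervals; the 243 bp interval is the one recorded for both gene 20 and gene 21.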
CDS 19702 - 21441 /gene="22" /product="gp22" /function="minor tail protein" /locus tag="ModicumRichard_22" /note=Original Glimmer call @bp 19702 has strength 15.76; Genemark calls start at 19702 /note=SSC: 19702-21441 CP: yes SCS: both ST: NI BLAST-Start: [minor tail protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.177, -2.253486910374644, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Cafasso]],,QXN74237,99.3092,0.0 SIF-HHPRED: Exosporium protein; ExsFA, BxpB, Bacillus cereus, exosporium, endospore, STRUCTURAL PROTEIN; HET: GOL; 1.08A {Bacillus cereus},,,7R5I_B,19.3437,69.8 SIF-Syn: Minor tail protein; there are more minor tail proteins downstream, along with some upstream, as well as tail assembly chaperones. Cafasso and ObLaDi had great synteny with the gene. /note=Primary Annotator Name: Gutierrez, Michelle /note=Auto-annotation: Both the GeneMark start and Glimmer start recorded the start codon at 19702 /note=Coding Potential: The forward strand has coding potential across the ORF, from the start codon at 19702 to the stop codon at 21441. /note=SD (Final) Score: -2.253 /note=Gap/overlap: 10 /note=Starterator: The start site is reasonable and conserved across all phages in the pham. Start site called is 3. 4/4 final phages called this start site. Start site 3 in Starterator is 19702 bp in ModicumRichard. The evidence agrees with Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 19702 bp. /note=Function call: Minor tail protein. The top three phagesdb BLAST hits have the function of minor tail protein (E-value 0), and NCBI BLAST hits also have this function (100% coverage). HHpred had a hit for a structural protein. CDD had 0 significant hits. /note=Transmembrane domains: There was 1 TMD all on the outside. 
/note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Update PECAAN notes in the notebook as well. For the location call, explain the actual start site and how it was supported by the evidence. Also, for Starterator, include that it had start site number 3. CDS 21455 - 21619 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="ModicumRichard_23" /note=Original Glimmer call @bp 21455 has strength 16.22; Genemark calls start at 21455 /note=SSC: 21455-21619 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_22 [Gordonia phage Aleemily] ],,NCBI, q1:s1 98.1481% 5.28182E-29 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.797, -5.909146062529675, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_22 [Gordonia phage Aleemily] ],,UVK59762,100.0,5.28182E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Scheithauer, Julia /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 21455. The start codon is ATG. /note=Coding Potential: Coding potential is found on the forward strand only, suggesting this is a forward gene. Coding potential was found in both GeneMark Self and Host. /note=SD (Final) Score: -5.909. This is the best SD score that PECAAN reported, as it was the only start site suggested. It is still reasonable, however. /note=Gap/overlap: 13. This is a reasonable gap, as it is not too big. The gene is 165 bp long. /note=Phamerator: Pham 194012, 11/15/2024. This Pham does appear to be conserved in the other phages in the cluster. I used ObLaDi and Aleemily as comparison. The other phages in this Pham cannot call a function for the gene. /note=Starterator: Yes, the start site is 21455 bp. This start site does appear to be the most conserved with other members of the cluster, even though it is not the most called start site overall for the gene (29/63 phages in the pham call it). 
Start number 10, position: 21455 bp /note=Location call: This seems to be a real gene, due to the good coding potential and the synteny observed in other finalized phages. Starterator also agrees with this start site, and it is the same as the other phages in the cluster DZ. The start site appears to be at 21455. /note=Function call: NKF because of the lack of quality hits from HHpred and CDD (very high e-values, low coverage, and low probability), and the other phages in the genome from PhagesDB also have genes here with NKF with very high e-values. /note=Transmembrane domains: none. /note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I have QC'ed this annotation and agree with the location call, as the data from Starterator and Phamerator as well as the final score support it. Add which non-draft phages it is conserved in and the Starterator start site evidence to the PECAAN notes. I agree with the function call and the evidence from PhagesDB, CDD, and HHpred. CDS 21653 - 21991 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="ModicumRichard_24" /note=Original Glimmer call @bp 21653 has strength 15.98; Genemark calls start at 21653 /note=SSC: 21653-21991 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_23 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 3.14287E-76 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.007, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_23 [Gordonia phage Aleemily] ],,UVK59763,100.0,3.14287E-76 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Grunski, Lilly /note=Auto-annotation: Glimmer and Genemark agree on start site: 21653 /note=Coding Potential: Yes, both the self-trained and host-trained GeneMark showed the same coding potential across the same regions. The ORF has 1 stop codon in the expected region and includes all of the coding potential. 
/note=SD (Final) Score: -2.742, which is the best score, with a z-score of 3.007. /note=Gap/overlap: 33 bp gap, which is small and most likely does not contain an additional gene, as there is no coding potential in host-trained or self-trained GeneMark. The gap is also conserved in the syntenic phages ObLaDi and Aleemily. /note=Phamerator: Pham 192433 (10/29/2024). It is conserved; found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: The start number called most often in the published annotations is 29; it was called in 44 of the 106 non-draft genes in the pham (Start: 29 @21653 has 44 MAs). This agrees with the auto-annotations of both Glimmer and GeneMark. /note=Location call: Start 29 at 21653 (44 MAs). /note=Function call: NKF. Both PhagesDB and NCBI BLAST return hits from phages in the same cluster (DZ) with the suggested function NKF and small e-values of 1e-61 to 2e-51. HHpred had hits with very low e-values, but with non-ideal coverage. CDD returned no significant hits. 
/note=Transmembrane domains: None. /note=Secondary Annotator Name: Sridharan, Ananya /note=Secondary Annotator QC: Good descriptions and results. CDS 21995 - 23920 /gene="25" /product="gp25" /function="minor tail protein" /locus tag="ModicumRichard_25" /note=Original Glimmer call @bp 21995 has strength 17.37; Genemark calls start at 21995 /note=SSC: 21995-23920 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.435, -3.773970443115436, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Cafasso] ],,QXN74240,100.0,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein, downstream is another minor tail protein /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: GeneMark and Glimmer both call the start at 21,995, an ATG start codon. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF, found in both host-trained and self-trained GeneMark. Coding potential is found on the forward strand only. /note=SD (Final) Score: -3.774, the best (highest) final score; the z-score is above 2 (2.435). /note=Gap/overlap: 3 bp gap; this is reasonable given that the surrounding genes are very close to each other. The gene is 1926 bp long. /note=Phamerator: On 10/31/24, the gene was in pham 191508. The gene is highly conserved in the DZ cluster (Aleemily_24, Cafasso_25, ObLaDi_25) and appears to be a minor tail protein. /note=Starterator: Starterator indicates that the most conserved start site for the pham is start 8, but the DZ cluster members share a different start from genes in other clusters: start 14, at 21,995, with a few bp of variance relative to other genes that use start 14. The gene length is also conserved throughout the DZ cluster at 1926 bp. 
/note=Location call: This is a real gene with good coding potential, and its start is start 14, at 21,995 bp. Starterator was helpful in showing a conserved start for subcluster DZ that differs from the rest of the pham (start 14, at around 21,995 bp), which supports the 21,995 start for ModicumRichard_25. This start site covers the coding potential for this gene. /note=Function call: The predicted function of minor tail protein is supported by strong BLAST results from PhagesDB and NCBI BLASTp, which show 100% identity with minor tail proteins from phages ObLaDi_25 and Cafasso_25, both with 100% query coverage and E-values of 0, indicating reliable matches. The sequence also aligns well with similar proteins in other Gordonia phages, such as Cafasso and Mollymur, with 78% identity. However, CDD and HHpred did not support this call, as neither returned significant hits. The downstream gene also appears to be a minor tail protein. /note=Transmembrane domains: No TMDs were detected; the entire protein is predicted to lie outside the membrane, which is consistent with the location of a minor tail protein, given that it facilitates entry into the host cell. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: Start is sufficient. /note=For Starterator, I would add comments on the start position in addition to the start number (14). I would also mention other manual annotations and phages that share the start site. /note=For Phamerator, mention the date and the other phages in the same subcluster that you compared to. 11/16 update: I do not agree with this call. It is based solely on BLASTp results with e-values of 0, which are not that significantly low, meaning that the alignment is likely random. Make sure to complete the transmembrane domain analysis; even if the protein is likely NKF, it may still be a membrane protein. 
CDS 23920 - 24630 /gene="26" /product="gp26" /function="minor tail protein" /locus tag="ModicumRichard_26" /note=Original Glimmer call @bp 23920 has strength 7.82; Genemark calls start at 23920 /note=SSC: 23920-24630 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_26 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 3.72215E-161 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.002, -4.741344217713455, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_26 [Gordonia phage Cafasso] ],,QXN74241,100.0,3.72215E-161 SIF-HHPRED: d.51.1.1 (G:195-274) Ribosomal RNA-processing protein 40, RRP40 {Human (Homo sapiens) [TaxId: 9606]},,,d2nn6g3,4.66102,11.8 SIF-Syn: The gene is a minor tail protein, which assists in the assembly of the phage’s tail and allows it to adsorb to a bacterial cell and penetrate its surface in order to inject its genomic DNA and begin infection. This is supported by its presence in Gordonia phage Aleemily and Gordonia phage ObLaDi, both with 99% identity. Upstream of this gene is a minor tail protein, and downstream is a lysin A protease. /note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Glimmer and GeneMark both call the gene to start at 23920. /note=Coding Potential: There is coding potential on both frames of the forward strand. This coding potential is entirely covered by the start site and is high in both self-trained and host-trained GeneMark. /note=SD (Final) Score: -4.741. This is the best start, as the other candidate starts have z-scores below the 2.000 threshold and lower (more negative) final scores. /note=Gap/overlap: The gap is -1 (a 1 bp overlap), which suggests that this gene is part of an operon. /note=Phamerator: As of 11/03/2024, the pham is 193630. This pham has 581 members, 51 of which are drafts, spanning many different clusters. 
There are 36 clusters represented in this pham: singleton, DJ, DL, DZ, DV, DR, DQ, A15, A14, A17, A16, A11, A10, CU1, A12, CW1, A19, A18, A3, A2, A5, A4, A6, A9, DU1, A, A8, CR, C2, CT, A20, A21, CR1, A13, DE3, B3. My gene uses start 23, which is found in 11 of the 581 genes (1.9%) in the pham. /note=Starterator: The auto-annotated start site is start 38, which corresponds to position 23920. This is also the most-annotated start site. /note=Location call: Starts at 23920. Based on the evidence, my gene is a real gene, as it is conserved in Starterator and has good coding potential. 23920 is the most likely start site, as it has good synteny and the most optimal z-score. /note=Function call: Minor tail protein; it assists in the assembly of the phage’s tail and allows it to adsorb to a bacterial cell and penetrate its surface in order to inject its genomic DNA and begin infection. This is supported by its presence in Gordonia phage Aleemily and Gordonia phage ObLaDi, both with 99% identity. /note=Transmembrane domains: There are 0 predicted TMDs. The program predicts with 100% probability that residues beyond position 40 are outside the membrane, whereas only residues below position 25 form a signal sequence. /note=Secondary Annotator Name: Gutierrez, Michelle /note=Secondary Annotator QC: 11/19: Everything looks good, and there is significant detail in your work. In my opinion, nothing needs to be fixed. 
CDS 24665 - 25339 /gene="27" /product="gp27" /function="lysin A, protease C39 domain" /locus tag="ModicumRichard_27" /note=Original Glimmer call @bp 24665 has strength 13.52; Genemark calls start at 24665 /note=SSC: 24665-25339 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, protease C39 domain [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 3.48415E-166 GAP: 34 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.521, -3.659876950936526, no F: lysin A, protease C39 domain SIF-BLAST: ,,[lysin A, protease C39 domain [Gordonia phage Cafasso] ],,QXN74242,100.0,3.48415E-166 SIF-HHPRED: SIF-Syn: This gene exhibits synteny with other genes in its cluster, DZ. More specifically, it shares function calls with phages Morgana, Aleemily, and Cafasso, but shares its start site only with Aleemily and Cafasso. For Morgana, the alignment is slightly altered, with the gene located a few base pairs upstream. With Cafasso, however, all the genes are directly aligned, showing synteny. /note=Primary Annotator Name: Khachaturov, Allyson /note=Auto-annotation: Glimmer and GeneMark both call the start at 24665. /note=Coding Potential: Glimmer and GeneMark both list a start site; both list GTG as the start codon and predict the start at 24665. Since both programs identified the same start location and the same codon, I consider it accurate. This gene has good coding potential, as the first direct sequence shows an abundance of uninterrupted ORFs. Furthermore, host-trained GeneMark indicates coding potential up to stop 25339 with no gaps. This also shows that the gene has high coding potential. Compared to other genes in the non-draft phage genomes, this gene displays more synteny, since its GeneMark and Glimmer scores align more closely than those of the genes below. 
Furthermore, this gene is likely real and conserved, since it shares its start with many non-draft genes in other phages within its cluster, including Cafasso and ObLaDi. /note=SD (Final) Score: -3.660. This SD score falls within the range needed to support this as a viable start site. /note=Gap/overlap: 34 bp gap; a small gap with no overlap with neighboring genes, which supports this start. This start is also shared with other phages in ModicumRichard's cluster, including Cafasso and ObLaDi. Based on the GeneMark coding potential graphs, this is the longest reasonable ORF: the gene is long and should have a limited upstream gap, which is consistent with this value. The small upstream gap cannot be changed, since it precedes the start of the gene, and it contains no coding potential; within the gene, there are no gaps in coding potential in host-trained GeneMark or GeneMarkS. Moreover, the gene length (675 bp) is acceptable, and the data above show that this start site is the most likely. /note=Phamerator: The gene is found in pham 898 and is conserved in many phages (123). Some of these phages are in the same cluster as ModicumRichard; this is a commonly annotated pham with conserved genes in phages such as Cafasso and Aleemily. Phamerator did not list a function. /note=Starterator: Starterator also places the gene in pham 898. For this gene (start 24665), the called start is not the most annotated start site: the most annotated start site is 67, and the called one is 32. 
Because many of the other phages in ModicumRichard’s cluster have a similar start site, I believe it is the correct start site. /note=Location call: Start at 24665 (stop at 25339). Based on the above information, I agree with the start site called by Glimmer and GeneMark. /note=Function call: Lysin A, protease C39 domain /note=Summary: Based on the above data, I believe there is enough evidence to call the function of this gene, given the synteny it displays with other phages in its cluster, including Aleemily and Cafasso, and the consistency of the similar hits PhagesDB provided, which has held across other modules. Furthermore, other non-draft phages in the data show similar function calls (a lysin protease). Lastly, all of the above phages have high BLAST scores and appropriately low e-values, further supporting this function call. Of the domain searches, only CDD was helpful, as it gave consistent hits for a peptidase function; HHpred was not informative because its few hits were inconsistent. I hypothesize that this ORF encodes a protease or lysing agent of some sort, since many of the databases above link it to antimicrobial activity that stops bacterial replication. Lysin A, protease C39 domain is on the SEA-PHAGES function list. Here, lysin A is split across two genes: this one and the downstream lysin A, N-acetylmuramoyl-L-alanine amidase domain gene. /note=Starterator drop down: Indicates that the start is ATG. /note=Phamerator: Shows high coding potential based on the data above. /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: I agree! 
CDS 25336 - 26043 /gene="28" /product="gp28" /function="lysin A, N-acetylmuramoyl-L-alanine amidase domain" /locus tag="ModicumRichard_28" /note=Original Glimmer call @bp 25336 has strength 12.55; Genemark calls start at 25336 /note=SSC: 25336-26043 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, N-acetylmuramoyl-L-alanine amidase domain [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.66528E-172 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.154, -4.429150775994987, yes F: lysin A, N-acetylmuramoyl-L-alanine amidase domain SIF-BLAST: ,,[lysin A, N-acetylmuramoyl-L-alanine amidase domain [Gordonia phage Cafasso] ],,QXN74243,100.0,2.66528E-172 SIF-HHPRED: d.118.1.1 (A:) AmpD protein {Citrobacter freundii [TaxId: 546]},,,d1j3ga_,77.8723,99.6 SIF-Syn: lysin A, N-acetylmuramoyl-L-alanine amidase domain; upstream gene is lysin A, protease C39 domain, downstream gene is Pham #5718, just like in phage Cafasso /note=Primary Annotator Name: Lieu, Jadelien /note=Auto-annotation: Glimmer and GeneMark both call the start site at 25336. /note=Coding Potential: Consistent coding potential on the forward strand in both host-trained and self-trained GeneMark. /note=SD (Final) Score: -4.429. It is the best final score in the list on PECAAN. /note=Gap/overlap: Overlap of 4 bp (gap of -4). This indicates that the gene is likely part of an operon. /note=Phamerator: The pham number is 87495 as of 10/31/24. The gene is conserved in all other phages in the pham, such as BlueNGold and Aleemily. The function call is mostly lysin A, N-acetylmuramoyl-L-alanine amidase domain, with GMA2 calling the function a putative lysin. /note=Starterator: There are 9 non-draft members of this pham. All 9 call start 5, which corresponds to position 25336 in ModicumRichard. /note=Location call: This is likely a real gene with a start site of 25336. Auto and manual annotation agree on the start site, which is supported by the evidence above. 
/note=Function call: Lysin A, N-acetylmuramoyl-L-alanine amidase domain. All evidence supports this call, including PhagesDB BLAST (the first 5 hits call the same function, as do most other top hits with a few exceptions; lowest e-value of e-137), NCBI BLAST (lowest e-value of 3e-172), and CDD and HHpred hits (probability 99.48%). /note=Transmembrane domains: DeepTMHMM predicts no TMDs, only one outside domain. /note=Secondary Annotator Name: Grunski, Lilly /note=Secondary Annotator QC: I agree with this annotation, as all of the evidence categories have been considered. CDS 26066 - 26449 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="ModicumRichard_29" /note=Original Glimmer call @bp 26066 has strength 12.38; Genemark calls start at 26066 /note=SSC: 26066-26449 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, glycosyl hydrolase domain [Gordonia phage Pleakley] ],,NCBI, q11:s217 89.7638% 2.3236E-45 GAP: 22 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.257, -2.0895162839948163, yes F: hypothetical protein SIF-BLAST: ,,[lysin A, glycosyl hydrolase domain [Gordonia phage Pleakley] ],,YP_010097450,27.2997,2.3236E-45 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ornelas, Jessica /note=Auto-annotation: Glimmer and GeneMark both provided an auto-annotation and agree on the same start site, 26066, which is associated with an ATG start codon. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF, meeting all of the guiding principles. Coding potential is on the forward strand only, indicating a forward gene, and is found in both host-trained and self-trained GeneMark. The chosen start site covers all of this coding potential. /note=SD (Final) Score: The final score is -2.090, the best of the 6 entries since it is the least negative. The z-score was also good at 3.257, which is above 2. 
/note=Gap/overlap: The upstream gap of 22 bp is reasonable. There are no alternative start candidates. The gene is 384 bp long, which is an acceptable length. /note=Phamerator: The gene is part of pham 5718 as of 10/29/24 and is conserved in other members of the DZ cluster. I used Aleemily_28, Cafasso_29, Morgana_29, and ObLaDi_29 for comparison. The phams database did not have a function called for this gene. /note=Starterator: Starterator shows a highly conserved start site, start 2, among the members of the pham; it corresponds to base pair 26066. There are 12 members in total according to PhamDB, and Starterator lists 2 of them as drafts. In Starterator, 10/10 non-draft genes call this site. Start 2 is 26066 in ModicumRichard, which agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, my gene is a real gene, as it is conserved in Starterator and has good coding potential. 26066 is the most likely start site, as it is conserved in Starterator and covers all coding potential. /note=Function call: NKF. Multiple PhagesDB BLAST hits have the suggested function lysin A, glycosyl hydrolase domain, with small e-values of 2e-39 and 6e-24. Sequence identity in PhagesDB BLAST was 52% or higher, meeting the requirement of more than 35%. Multiple NCBI BLASTp hits were also endolysins, with high query coverage (>80%) and low E-values of 2e-45 and 2e-25. Sequence identity in NCBI BLASTp was 55% or higher, also meeting the requirement of more than 35%. However, there were no CDD hits, and there were no significant HHpred hits, as they had high e-values and low probabilities. Additionally, none of the HHpred hits listed lysin as their function. Lastly, there was no synteny when comparing the genome maps of my gene and the BLAST hits in Phamerator. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Gonzalez Hernandez, Yalit /note=Secondary Annotator QC: I agree that the gene should remain NKF because the BLAST, CDD, and HHpred results did not have strong hits for the function. Further analysis is needed to determine its function. /note=Tertiary Annotator Name: Bouklas, Tejas /note=Tertiary Annotator QC: Good work, Jessica. You may move on to the next module. Minor comment, for your information only: a z-score of 2 or above is fine, and a score of 3 does not carry more weight than a score of 2. CDS 26446 - 27123 /gene="30" /product="gp30" /function="cysteine protease" /locus tag="ModicumRichard_30" /note=Original Glimmer call @bp 26452 has strength 13.66; Genemark calls start at 26452 /note=SSC: 26446-27123 CP: yes SCS: both-cs ST: NI BLAST-Start: [cysteine protease [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 3.34983E-162 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.468, -6.8551355900001, no F: cysteine protease SIF-BLAST: ,,[cysteine protease [Gordonia phage Cafasso] ],,QXN74245,100.0,3.34983E-162 SIF-HHPRED: d.3.1.1 (A:) Cruzain {Trypanosoma cruzi [TaxId: 5693]},,,5Z5O_A,97.7778,99.9 SIF-Syn: Cysteine protease; the upstream gene has no known function (pham 5718 in all of these phages), and the downstream gene is a holin, just like in phages Morgana, Aleemily, Cafasso, and ObLaDi. /note=Primary Annotator Name: McFarlane, Kelly /note=Auto-annotation: Glimmer and GeneMark both call the gene to start at 26452. /note=Coding Potential: Coding potential in both self-trained and host-trained GeneMark is high, only in the forward direction, and entirely included by the chosen start site. /note=SD (Final) Score: The final score of the chosen start site is -6.855, and the z-score is 1.468. 
While this is not the best final score among all of the possible start sites, all other suggested start sites with better final scores have extremely large gaps or overlaps that would not show good synteny with other non-draft phages. The non-optimal final score is also likely because the gene is within an operon. /note=Gap/overlap: The overlap is 4 bp, which makes efficient use of the genome and indicates the presence of an operon. /note=Phamerator: As of 10/29/24, the pham is 2871. This pham has 31 members and is found in 4 other non-draft phages in the same cluster (DZ). /note=Starterator: The auto-annotated start site is 15, which corresponds to position 26452 (the same as Glimmer and GeneMark). This is not the most annotated site, but the most annotated site is not a possible start for this gene. The other phages in this cluster call start 12, which is at position 26446 in this phage. This evidence is compelling enough to disagree with the auto-annotated start and change it to 26446. /note=Location call: Given the above evidence, this gene is real and most likely has a start site at 26446. /note=Function call: Cysteine protease. The top four PhagesDB BLAST hits call the function a cysteine protease (e-values <10^-120), and the top two NCBI BLAST hits also call it a cysteine protease (e-values <10^-142). There are three CDD hits (only two of which show in PECAAN) that call it a cysteine protease (e-values <10^-4), and the top four HHpred hits (only three of which show in PECAAN) mention cysteine protease (all with probabilities of 99.93%). Also, when aligned with phage Saguaro (listed on the SEA-PHAGES function list as containing a cysteine protease), there was a 99.95% probability and an e-value of 6e-30. /note=Transmembrane domains: DeepTMHMM predicts, with 100% probability, that there are no transmembrane domains in this gene, meaning this is not a membrane protein. 
/note=Secondary Annotator Name: Pimentel, Patricia /note=Secondary Annotator QC: This is a great analysis of the evidence from HHpred, NCBI BLAST, and PhagesDB BLAST, which all point to the conclusion that the sequence represents a cysteine protease. These calls are well supported by data such as the low e-values and high probability and identity values. CDS 27120 - 27479 /gene="31" /product="gp31" /function="holin" /locus tag="ModicumRichard_31" /note=Original Glimmer call @bp 27126 has strength 15.18; Genemark calls start at 27120 /note=SSC: 27120-27479 CP: yes SCS: both-gm ST: SS BLAST-Start: [holin [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 5.7432E-75 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.822, -7.90556050301914, no F: holin SIF-BLAST: ,,[holin [Gordonia phage Cafasso] ],,QXN74246,100.0,5.7432E-75 SIF-HHPRED: SIF-Syn: Holin; upstream gene is a cysteine protease, downstream is NKF (in the other phages, it is a membrane protein in every case), just like in phages Morgana, ObLaDi, Aleemily, and Cafasso. /note=Primary Annotator Name: Lawson, Matthew /note=Auto-annotation: Glimmer calls the start at 27,126, but GeneMark calls the start at 27,120. /note=Coding Potential: Coding potential is found only on the forward strand in both host-trained and self-trained GeneMark, indicating that this is a forward gene. /note=SD (Final) Score: -2.523 /note=Gap/overlap: -4 bp (a 4 bp overlap). /note=Phamerator: Pham 11626, 10/29/24. It is conserved; found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: Start 2 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 2 is 27,120 in ModicumRichard. This evidence agrees with the site predicted by GeneMark but contradicts the site predicted by Glimmer (27,126). 
Despite the better z-score, final score, and start codon usage probability of the 27,126 start site, the 4 bp overlap produced by the 27,120 start site makes 27,120 seem more likely. Additionally, the 27,120 start results in a gene length of 360 bp, the same length as all 4 non-draft genes in this pham, with which ModicumRichard also shares synteny. Therefore, the 27,120 start site seems most likely. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 27,120. /note=Function call: Holin. The top 4 PhagesDB BLAST hits have the function holin (E-value <10^-59), and 2 of the 3 total NCBI BLAST hits are also holins (100% coverage, 99%+ identity, and a best E-value of 5.7 × 10^-75). HHpred and CDD have no relevant hits. /note=Transmembrane domains: TMHMM predicts 4 transmembrane domains, which is consistent with the hypothesized holin function, as holins typically have at least 2 transmembrane domains. /note=Function Call QC: Overall, I agree with your function call. The TMD prediction and the PhagesDB and NCBI BLAST hits provide enough relevant evidence to call that function. /note=Secondary Annotator Name: Khachaturov, Allyson /note=Secondary Annotator QC: Based on the data that you collected, I agree with this call despite Glimmer and GeneMark not agreeing on the start site. Since this start is shared by genes in other phages, that is further evidence that it is a valid start site. My only suggestion is to elaborate a little more on the synteny, for example by incorporating details from the PhagesDB BLAST results. 
CDS 27476 - 27847 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="ModicumRichard_32" /note=Original Glimmer call @bp 27476 has strength 17.18; Genemark calls start at 27476 /note=SSC: 27476-27847 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 7.63976E-82 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.953, -7.910617765408869, no F: hypothetical protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso] ],,QXN74247,100.0,7.63976E-82 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Muff, Margaret /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree that the start site is at 27476 bp. /note=Coding Potential: Coding potential is on the forward strand; both self-trained and host-trained GeneMark found this coding potential, so it is likely a real, forward gene. /note=SD (Final) Score: -7.911; very negative, but other evidence supports this as the start site. /note=Gap/overlap: -4 bp (a 4 bp overlap), indicative of an operon at the start site; a similar overlap is found in the finalized genomes of phages Morgana and Cafasso. /note=Phamerator: Pham 187490 as of 10/30/2024. This is conserved in Morgana, Cafasso, Aleemily, and ObLaDi (DZ). /note=Starterator: Start 3 is conserved in four of four finalized phage genomes (DZ). Start 3 is at 27476 bp in ModicumRichard, and this is the start that both Glimmer and GeneMark called. /note=Location call: Given the above evidence, this appears to be a real gene, and I agree with the start site at 27476 bp. /note=Function call: NKF. There are some good hits in PhagesDB BLAST as well as the PBDe hit, but they are DUF (domain of unknown function) proteins, meaning the protein has no defined function, although it is conserved across several Gordonia phages. All calls had high e-values (> e-47), further supporting that there is likely no known function. 
/note=Transmembrane domains: DeepTMHMM predicts 1 TMD; the protein crosses the membrane once, but the majority of the protein lies on the inside of the membrane. /note=Secondary Annotator Name: Lieu, Jadelien /note=Secondary Annotator QC: 1) Glimmer and GeneMark agree on the start site. Coding potential lines up in host-trained and self-trained GeneMark. 2) SS on Starterator; conserved in other pham members and displays synteny. 3) GTG start site and -4 overlap. 4) The z-score and final score are very low (0.953 and -7.911, respectively), but all other evidence supports the call. 5) Function call, evidence boxes, and drop-down menus quality check completed. Overall: Agree with the primary annotation. CDS 27844 - 28596 /gene="33" /product="gp33" /function="lysin B" /locus tag="ModicumRichard_33" /note=Original Glimmer call @bp 27844 has strength 12.58; Genemark calls start at 27844 /note=SSC: 27844-28596 CP: yes SCS: both ST: SS BLAST-Start: [lysin B [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.68901E-179 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.435, -4.125079303122735, no F: lysin B SIF-BLAST: ,,[lysin B [Gordonia phage Cafasso] ],,QXN74248,100.0,2.68901E-179 SIF-HHPRED: Gene 12 protein; alpha/beta sandwich, CELL ADHESION; 2.0A {Mycobacterium phage D29},,,3HC7_A,90.0,99.8 SIF-Syn: Lysin B; upstream gene is of unknown function (pham 187490), downstream is an immunity repressor (similar to phage Morgana for the downstream gene) /note=Primary Annotator Name: Choudhary, Khushi /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site, 27844. /note=Coding Potential: Reasonable coding potential in the predicted ORF, with the start site covering all of the coding potential according to both the typical and alternate coding potential. /note=SD (Final) Score: This is the best SD score among the start candidates: the final score is -4.125 and the z-score is 2.435, both of which meet the criteria for a good score. 
/note=Gap/overlap: 4 bp overlap, which is reasonable, as it could indicate an operon. The gene length (753 bp) is acceptable with this start site, and given the scores, this start is better than an alternative, even one with a longer ORF. /note=Phamerator: The pham is 192181 as of 29 October 2024. It is conserved in this phage and in ObLaDi, Aleemily, and Cafasso, all of which are in the same cluster as ModicumRichard (DZ). /note=Starterator: Start 186 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 186 is at 27844 in ModicumRichard, the same start site that Glimmer and GeneMark suggested. /note=Location call: This is a real gene with good coding potential, and the original start site (27844) is the most likely, as it covers all coding potential, has a favorable RBS score, and leaves no large coding gap before the gene. /note=Function call: The predicted function is lysin B, based on hits from NCBI BLASTp and PhagesDB BLASTp, each of which had 2 hits with high query coverage (>98.0%), high identity (>98.0%), and low E-values (~e-175). HHpred supports this with an e-value of 8.7e-23, meets the SEA-PHAGES requirements for this function, and aligns with 3HC7 chain A (in the macromolecular complex) with 99.8% probability and 90% coverage. /note=Transmembrane domains: Not a membrane protein, as there are no predicted TMDs. /note=Secondary Annotator Name: Ornelas, Jessica /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I have QC'ed the function call and agree with the first annotator. 
CDS complement (28989 - 29276) /gene="34" /product="gp34" /function="immunity repressor" /locus tag="ModicumRichard_34" /note=Original Glimmer call @bp 29276 has strength 11.24; Genemark calls start at 29276 /note=SSC: 29276-28989 CP: yes SCS: both ST: SS BLAST-Start: [immunity repressor [Gordonia phage Cafasso] ],,NCBI, q1:s1 98.9474% 7.89302E-63 GAP: 16 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.246, -2.253377680616507, yes F: immunity repressor SIF-BLAST: ,,[immunity repressor [Gordonia phage Cafasso] ],,QXN74249,98.9474,7.89302E-63 SIF-HHPRED: ComR; Streptococcus, Competence, Quorum sensing, ComR, TRANSCRIPTION REGULATOR; 2.9A {Streptococcus suis (strain 05ZYH33)},,,5FD4_B,82.1053,97.2 SIF-Syn: Immunity repressor; upstream gene is lysin B and downstream is tyrosine integrase, as in DZ cluster phages such as Morgana. /note=Primary Annotator Name: Adamian, Andrew /note=Auto-annotation: Both Glimmer and GeneMark call and agree on the same start site: 29276. The Glimmer score is 11.24. /note=Coding Potential: The gene appears to have reasonable coding potential, with a steep rise and fall in coding potential at the start and stop sites. The start site also appears to cover all of the coding potential. /note=SD (Final) Score: The final score is -2.253, which is reasonable and the least negative (best) of all the candidate start sites. /note=Gap/overlap: The gap is 16 bp, which is also reasonable. /note=Phamerator: The gene is in pham 193249 as of 10/25/2024. This pham is well conserved among other members of the cluster, and the gene has nearly identical locations in ObLaDi and Cafasso. The database lists the function as “immunity repressor”. /note=Starterator: The start site at 29276 seems reasonable, as it is conserved among other members of the pham. There are four other members of this pham (all non-draft), and all 5 (including ModicumRichard) call start site 3, the most conserved site. 
/note=Location call: The gathered evidence suggests that the gene is real, as it is conserved in Phamerator and has good coding potential. Additionally, Starterator suggests the start site of this gene is reasonable. /note=Function call: PhagesDB BLASTp returned hits to Morgana and Cafasso, each with a score of 196 and an E-value of 2e-50; both list immunity repressor as their function. NCBI BLASTp returned similar hits to Cafasso and ObLaDi. CDD returned one hit, in the cl48150 superfamily, with an E-value of 1.41e-03. The evidence shows that this gene encodes the described protein, and the function is on the approved list. All of the programs above returned high-confidence hits for an immunity repressor. /note=Transmembrane domains: This gene has no TMDs, which makes sense in the context of its function. /note=Secondary Annotator Name: McFarlane, Kelly /note=Secondary Annotator QC: I agree with the called start site and the called function. The location call section of the PECAAN notes should state the exact start coordinate chosen, and the function call section should state the function name. The function call section could also be more detailed, including specific numbers (E-values/probabilities) and specific programs (e.g., the number of CDD hits). Also, no domains are checked on PECAAN for HHpred. Since there is a function for this gene, the Synteny box should also be filled out. 
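The GAP values in the SSC lines can be checked against the CDS coordinates. The Python sketch below is a minimal illustration, assuming coordinates are 1-based and inclusive and that a gap counts the bases strictly between two neighboring features in genome order, with negative values meaning overlap. This convention is inferred from the values reported for gp34 (16 bp) and gp35 (-1 bp), not taken from PECAAN documentation, and `gap_between` is a hypothetical helper.

```python
def gap_between(left_end: int, right_start: int) -> int:
    """Count the bases strictly between two neighboring features.

    Coordinates are 1-based and inclusive, as in the CDS lines.
    left_end: highest coordinate of the feature lying first in genome order.
    right_start: lowest coordinate of the next feature.
    A negative result is an overlap of that many bases.
    """
    return right_start - left_end - 1


# gp34 is complement(28989..29276); gp35 is complement(29293..30420).
print(gap_between(29276, 29293))  # 16

# gp35 (complement(29293..30420)) and gp36 (complement(30420..31148)) share base 30420.
print(gap_between(30420, 30420))  # -1
```

For reverse-strand genes, the neighbor used for the gap appears to be the gene at higher coordinates (the one transcribed first), which is why gp34's gap is measured against gp35.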
CDS complement (29293 - 30420) /gene="35" /product="gp35" /function="tyrosine integrase" /locus tag="ModicumRichard_35" /note=Original Glimmer call @bp 30420 has strength 11.18; Genemark calls start at 30420 /note=SSC: 30420-29293 CP: no SCS: both ST: NI BLAST-Start: [tyrosine integrase [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -4.322809268821859, yes F: tyrosine integrase SIF-BLAST: ,,[tyrosine integrase [Gordonia phage ObLaDi]],,UXE03758,100.0,0.0 SIF-HHPRED: Integrase; PROTEIN-DNA COMPLEX, DNA BINDING PROTEIN-DNA COMPLEX; HET: PTR; 3.8A {Enterobacteria phage lambda} SCOP: d.163.1.1, d.10.1.4,,,1Z1B_B,96.2667,99.9 SIF-Syn: This is a tyrosine integrase; upstream is an immunity repressor and downstream is an IrrE-like protein. /note=Primary Annotator Name: Anebere, Stephanie /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site of this gene. /note=Coding Potential: The gene shows good coding potential in both host-trained and self-trained GeneMark. /note=SD (Final) Score: The Z-score is above 2, which is good, and the best final score is -7.398, which is good. The second best is -7.365, which is also good. /note=Gap/overlap: The gap is -1 bp (a 1 bp overlap); the other candidate start sites have larger, less favorable gaps. /note=Phamerator: As of 10/31/24, the gene is in pham 192233, which has 381 members, 40 of them drafts. The gene is conserved in Aleemy, Cafasso, Morgana, and ObLaDi. /note=Starterator: As of 10/31/24 (pham 192233), start 70 is a reasonable start site and was called by 340 non-draft genes. /note=Location call: The evidence shows that this is a real gene with good coding potential. Based on the data, 30420 is the correct start site. /note=Function call: Tyrosine integrase. The PhagesDB and NCBI BLAST results provide strong evidence: both have high % identity and low E-values. The CDD and HHpred data are also informative and provide strong evidence that this is a tyrosine integrase. 
There are also no TMDs, which makes sense for the function of this gene, since it works inside the cell. /note=Transmembrane domains: no TMD /note=Secondary Annotator Name: LAWSON, MATTHEW /note=Secondary Annotator QC: Based on the data collected, the start location call of 30,420 seems most likely, but some of the information you provided for this gene is incorrect and needs to be fixed to provide better evidence for this start location. You indicated that both GeneMark and Glimmer call this start site and showed that it has good coding potential in both self-trained and host-trained GeneMark, though I would change the screenshots in your annotation notebook to make sure they encompass the stop site. You also showed that this gene has synteny with other related phages, and there are strong PhagesDB BLAST hits. Additionally, the start codon at 30,420 has high usage probability and covers all of the coding potential in the GeneMark maps. (You did not correctly indicate the gap between this gene's suggested start and the previous gene's stop, or whether this start site gives the longest reasonable ORF for this gene call, so I would fix this in your annotation notebook to provide stronger evidence for this location call.) /note=This location call of 30,420 also has the best Z-score (2.485) and final score (-4.323) of any candidate start. Additionally, the pham number you indicated for this gene (192233) does not appear to be the pham this gene is in. I believe pham numbers can change, but the pham that comes up now on PECAAN (193894) doesn't seem to be the one described in your annotation notebook, so Module 4 in your notebook and your PECAAN notes relating to Starterator must be fixed for better evidence. /note=Also, I agree with your function call of it being a tyrosine integrase. 
In your PECAAN notes, it could help to comment on the results of HHpred and CDD, as you only discuss the NCBI and PhagesDB BLAST hits, but I think HHpred and CDD provide more valuable evidence. Additionally, make sure to fill out the function box on PECAAN (above the notes box) and the synteny box as well. Also, I think there are a few more hits in the BLAST, HHpred, and CDD sections on PECAAN that you could select as evidence. Lastly, in your notebook you missed a few questions in the TMD section, so make sure you answer all of them so that you don't lose any points. CDS complement (30420 - 31148) /gene="36" /product="gp36" /function="IrrE-like protein" /locus tag="ModicumRichard_36" /note=Original Glimmer call @bp 31148 has strength 13.35; Genemark calls start at 30989 /note=SSC: 31148-30420 CP: yes SCS: both-gl ST: NI BLAST-Start: [peptidase [Gordonia phage Cafasso] ],,NCBI, q54:s1 78.0992% 1.69199E-133 GAP: 126 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.66, -6.665838791837839, no F: IrrE-like protein SIF-BLAST: ,,[peptidase [Gordonia phage Cafasso] ],,QXN74251,100.0,1.69199E-133 SIF-HHPRED: CapP toxin; Zinc metallopeptidase, IrrE, cysteine switch, HYDROLASE; HET: SO4, NHE; 1.35A {Thauera sp. K11},,,7T5T_A,57.438,99.3 SIF-Syn: IrrE-like protein, shows synteny with DZ cluster, specifically Morgana and ObLaDi. Upstream these phages have a tyrosine integrase and downstream these phages have a Cro protein. /note=Primary Annotator Name: BURKE, KAIDEN /note=Auto-annotation: Glimmer and GeneMark both call this gene but disagree slightly on the start site. Glimmer's start at 31148 is used in the auto-annotation. This start codon is a “GTG”. /note=Coding Potential: Host-trained GeneMark gives the 2nd ORF of the reverse strand very high coding potential. I believe the start site fits well within this coding potential. 
The Glimmer start site at 31148 covers all of the coding potential, while the GeneMark site at 30989 misses a small amount of coding potential in its ORF; this is evidence for the Glimmer site. /note=SD (Final) Score: The score is -5.594. This is not the highest, but it is still the most valid, given that the higher-scoring options have massive gaps of 500+ base pairs. This appears to be the start of a potential operon, not a start in the middle of the operon. /note=Gap/overlap: The gap is 126 bp, which is large, but the other members of cluster DZ have gaps of this size or larger at this site as well, which supports this start site. No alternative site provides a longer ORF, and the gene length of 729 bp is good. The GeneMark-suggested site of 30989 gives a larger gap of 285 bp, which is evidence against it. /note=Phamerator: As of 10/29/24 the pham is 11370. There are 4 other members in the pham, such as ObLaDi and Cafasso. The function is contested within the pham, mainly between IrrE-like protein and peptidase. /note=Starterator: Yes, there is a conserved site among 5/5 phages in the pham that is also by far the most called site. This is start 4 at position 30989, which is called by GeneMark. This differs from start 1 at 31148, which is called by Glimmer. I believe that only 4 non-draft phages is not enough evidence to change the start when coding potential is lost and the gap between genes increases dramatically. /note=Location call: Altogether, this suggests this is a real gene on the reverse strand, with a start site most likely at 31148. /note=Function call: The function is IrrE-like protein, as the top 4 NCBI BLASTp hits are consistent with that function (E-value < 1e-100), and CDD shows matching domains (zinc metallopeptidase and IrrE structure) with a low E-value (3.1e-5). HHpred also shows IrrE-like function, with 99.67% probability and an E-value of 2.5e-15. 
/note=Transmembrane domains: No transmembrane domains were predicted by DeepTMHMM. This makes sense with the endopeptidase functionality. /note=Secondary Annotator Name: MUFF, MARGARET /note=Secondary Annotator QC: I agree with this annotation, as the evidence supports the Glimmer auto annotated start site and all coding information is included within the gene. I also agree that this is likely an IrrE-like protein, as it has a strong e-value and is supported by the DeepTMHMM and CDD calls. Synteny box complete. No notes. CDS complement (31275 - 31880) /gene="37" /product="gp37" /function="Cro (control of repressor's operator)" /locus tag="ModicumRichard_37" /note=Original Glimmer call @bp 31880 has strength 9.5; Genemark calls start at 31733 /note=SSC: 31880-31275 CP: yes SCS: both-gl ST: SS BLAST-Start: [Cro protein [Gordonia phage Aleemily] ],,NCBI, q26:s1 87.5622% 2.25011E-122 GAP: 105 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.99, -5.989984804479136, no F: Cro (control of repressor`s operator) SIF-BLAST: ,,[Cro protein [Gordonia phage Aleemily] ],,UVK59776,100.0,2.25011E-122 SIF-HHPRED: ComR; Streptococcus, Competence, Quorum sensing, ComR, TRANSCRIPTION REGULATOR; 2.9A {Streptococcus suis (strain 05ZYH33)},,,5FD4_B,75.6219,97.2 SIF-Syn: Cro (control of repressor`s operator) Protein, upstream gene is in pham 194024 and downstream gene is in pham 11370, just like in phage ObLaDi. /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@31275 R), Glimmer start called at 31880 bp, GeneMark start called at 31733 bp. The start at 31880 has a start codon of TTG. /note=Coding Potential: For the start site at 31880, the gene has reasonable coding potential in the Self-Trained and Host-Trained GeneMark systems, and the site covers all of the coding potential. The longest start in Host-Trained and Self-Trained GeneMark appears to be between 31800 and 31900 in the reverse. 
The gene is 606 bp long, shows a switch in gene orientation from forward to reverse with an appropriate gap of 105 bp, and shows coding potential in only one frame of the reverse strand in GeneMark. /note=SD (Final) Score: -5.990. This final score is in the middle of the listed final scores for suggested start sites. The Z-score is 1.99, which is also in the middle of the listed Z-scores for suggested start sites. /note=Gap/overlap: For the start site at 31880, the gap is 105 bp. This is the smallest gap among the listed start sites, and it reflects the switch between forward and reverse genes at this position. This is the longest ORF for the gene, and there is no coding potential suggesting another gene before the start site. /note=Phamerator: Pham 88544, run on 10/29/2024. It is conserved, displaying synteny with non-draft genes in Aleemily (DZ), ObLaDi (DZ), Morgana (DZ), and Cafasso (DZ). The function is called as Cro protein. /note=Starterator: 2/5 non-draft members call start site 5, which does not correspond to the chosen start site 1 at 31880 for this phage. Glimmer's auto-annotated start at 31880 still seems the most likely start because of the smaller gap and its coverage of the coding potential, even though start site 5 at 31805 has a better Z-score and final score. /note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 31880, based on GeneMark's tendency to skip TTG starts, the smaller gap, and a Z-score close to 2. /note=Function call: Cro protein. PhagesDB BLASTp returned Cro protein, a helix-turn-helix DNA binding protein, as the top 5 hits (E-values < 1e-30). NCBI BLASTp returned Cro protein as the top 4 hits, found in DZ cluster phages Aleemily, ObLaDi, Cafasso, and Morgana and in singleton VanLee (E-values < 5e-33, coverage > 70%, identity 49-99.5%). 
CDD returned a helix-turn-helix protein as the top 4 hits (E-values < 2e-4). HHpred returned DNA binding proteins or transcriptional regulators as the top 4 PDB hits (E-values < 6e-5). /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains (TMDs), so this is not a transmembrane protein. /note=Secondary Annotator Name: Choudhary, Khushi /note=Secondary Annotator QC: I agree with the first annotation on the likely start site and the function of this gene, and have QC'd this call. All of the evidence categories have been considered. CDS 31986 - 32303 /gene="38" /product="gp38" /function="helix-turn-helix DNA binding domain" /locus tag="ModicumRichard_38" /note=Original Glimmer call @bp 31986 has strength 6.15; Genemark calls start at 31986 /note=SSC: 31986-32303 CP: yes SCS: both ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 8.93193E-68 GAP: 105 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.485, -4.197870532213559, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,QXN74253,100.0,8.93193E-68 SIF-HHPRED: ComR; Streptococcus, Competence, Quorum sensing, ComR, TRANSCRIPTION REGULATOR; 2.9A {Streptococcus suis (strain 05ZYH33)},,,5FD4_B,90.4762,96.0 SIF-Syn: helix-turn-helix DNA binding domain, upstream is a Cro protein, downstream is a RecE-like exonuclease, like in Aleemily, Cafasso, and ObLaDi. /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer and GeneMark both have the start site at 31,986 with a start codon of ATG. /note=Coding Potential: Coding potential is found in both host-trained and self-trained GeneMark in the forward direction, indicating that this is a forward gene. The start site covers all of the coding potential. 
/note=SD (Final) Score: The final score is -4.198 and the Z-score is 2.485. /note=Phamerator: This gene is in pham 192560 as of 10/29/2024. This pham has 65 members and 2 draft phages on PhagesDB. The gene is conserved, as seen by comparison with phages Aleemily (DZ) and Morgana (DZ). No function called. /note=Starterator: Starterator lists 63 members and 4 drafts in pham 192560 as of 10/18/24. Start number 15 was manually annotated in 4/59 (6.8%) non-draft genes and agrees with the auto-annotated calls from Glimmer and GeneMark at 31,986. This is not the most annotated start: 23/59 (39.0%) non-draft genes call start site 39. The most highly conserved start is start 45, found in 40/63 (63.5%) genes, with no manual annotations. Start 15 is not highly conserved, but it is called 100% of the time when present. Since ModicumRichard does not contain start 39, the most annotated start, start 15 is reasonable: it agrees with Glimmer and GeneMark and is called in 4 other non-draft genes. /note=Location call: Based on the listed evidence, this gene is real and starts at 31,986. GeneMark shows good coding potential, all of which is covered by this start site. Glimmer, GeneMark, and Starterator all call the same start site. /note=Function call: Based on both the PhagesDB and NCBI BLAST results for this sequence, I believe there is enough data to show that this gene functions as a helix-turn-helix DNA binding domain protein. 
The results are found in other Gordonia phages (ObLaDi, Cafasso, etc.), have extremely high BLAST scores (at or close to the maximum score), E-values no larger than 1e-54 (so all E-values are well below 1e-7), 100% query coverage, and 99.05-100% identity. All of them list the same function, indicating that this gene is a helix-turn-helix DNA binding domain protein. It satisfies the Approved Functions List requirement of having at least 2-3 alpha helices separated by a short turn region of 3-4 amino acids, which can be seen in the HHpred query sequence. HHpred's top PDB hit, with a probability of 98.09%, coverage of 49.29%, and E-value of 0.00014, is classified as a transcription regulator, with its function noted as transcription factor. Given the PhagesDB and NCBI BLAST function calls of helix-turn-helix DNA binding domain, this HHpred hit supports the call and gives it a specific function. Other hits did not satisfy the cut-offs for all values, but SCOPe had a hit (probability: 98.03%, coverage: 34.68%, E-value: 0.00046; only coverage is insufficient) to the DNA-binding domain of the P22 C2 repressor. This function could be related to the transcription regulation listed for the PDB hit, and it also has the structure required to be considered a helix-turn-helix DNA binding domain. Overall, this gene can be called a helix-turn-helix DNA binding domain, since HHpred hits from PDB and SCOPe display functions relating to and within the helix-turn-helix DNA binding domain. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains in this protein, so it is not a membrane protein. TMHMM lists this protein as inside (on the cytoplasmic side), which makes sense since that is where transcription factors (and the associated helix-turn-helix DNA binding domain) would be found. /note=Secondary Annotator Name: Adamian, Andrew /note=Secondary Annotator QC: I agree with the called start site of 31986. 
The function call seems to be properly supported by the evidence shown, and I agree with the called function. Additionally, I agree that this is not a transmembrane protein. CDS 32352 - 33365 /gene="39" /product="gp39" /function="RecE-like exonuclease" /locus tag="ModicumRichard_39" /note=Original Glimmer call @bp 32352 has strength 11.94; Genemark calls start at 32352 /note=SSC: 32352-33365 CP: yes SCS: both ST: SS BLAST-Start: [RecE-like exonuclease [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.067, -2.8298788511194775, yes F: RecE-like exonuclease SIF-BLAST: ,,[RecE-like exonuclease [Gordonia phage Cafasso]],,QXN74254,100.0,0.0 SIF-HHPRED: Exonuclease; alkaline exonuclease, digest double stranded DNA, strict 5-3-polarity, HYDROLASE; 1.9A {Laribacter hongkongensis},,,3SYY_A,58.1602,99.8 SIF-Syn: RecE-like exonuclease, upstream gene is helix-turn-helix DNA binding domain and downstream gene is RecT-like DNA pairing protein, just like in Cafasso, Aleemily, Morgana, and ObLaDi /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Both Glimmer and GeneMark agree on start site 32352, with start codon ATG. /note=Coding Potential: Good coding potential in the predicted ORF in both host-trained and self-trained GeneMark. The chosen start site covers the entire coding potential region. /note=SD (Final) Score: The final score for the chosen start is the best, as it is the least negative of all the options. It also has a very good Z-score, above 2. /note=Gap/overlap: There is a 48 bp gap with the upstream gene. This is reasonable, as there are no alternative start candidates to choose from, and the gap is mirrored in other non-draft annotated phages, indicating synteny. The length of the gene is 1014 bp, which is acceptable. /note=Phamerator: As of 10/31/24, the gene is in pham 192351. The pham has 4 other non-draft phages of the same cluster, including Aleemily, Cafasso, Morgana, and ObLaDi. 
/note=Starterator: The reasonable start site conserved among phages in the pham is Start 80, called by 40 of 159 non-draft phage annotations, but this start site is not present in ModicumRichard. In this phage, the start corresponds to Start 73 at 32352, which has 5 manual annotations (MAs). Start 73 correlates with all the other phages in cluster DZ, and 100% of phages that have this start site called it, providing sufficient support for this start. /note=Location call: Based on the current evidence, this gene is a real gene with good coding potential, a good final score and Z-score, and synteny with other phages. The auto-annotated start site at 32352 seems correct based on this same evidence, as Phamerator and Starterator provide good evidence for it. /note=Functional call: Multiple PhagesDB BLAST and NCBI BLASTp hits suggest a function of RecE-like exonuclease, with strong E-values of 0, the same length, and high percent identity (99%). Two PDB hits on HHpred provide strong evidence for exonuclease, with high probability (>99.9%), coverage greater than 60%, and low E-values (<10e-20). A comparison with the RecE-like exonuclease of phage Che9c (Che9c_60) gave a moderately strong E-value on HHpred. Although the comparison with a phage Cas4 exonuclease had a higher E-value than that with Che9c, combined with BLAST and other evidence, I think RecE-like exonuclease is more likely. The strong E-values on BLAST, along with the currently hypothesized function of RecT-like DNA pairing protein for the immediately downstream gene, further support the RecE-like exonuclease call. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Anebere, Stephanie /note=Secondary Annotator QC: I agree with the start site proposed. The evidence is reasonable and suggests that this is the correct start site. I agree with the function call of this gene. I have QC'ed this gene. 
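As a quick consistency check on the gp39 notes, the gene length and the gap to gp38 can be computed directly from the CDS coordinates. This is a minimal Python sketch, assuming 1-based, inclusive coordinates; the helper names `gene_length` and `gap_between` are hypothetical.

```python
def gene_length(low: int, high: int) -> int:
    """Length in bp of a feature with 1-based, inclusive coordinates."""
    return high - low + 1


def gap_between(left_end: int, right_start: int) -> int:
    """Bases strictly between two neighboring features; negative means overlap."""
    return right_start - left_end - 1


gp38 = (31986, 32303)  # CDS 31986 - 32303
gp39 = (32352, 33365)  # CDS 32352 - 33365

length = gene_length(*gp39)
print(length, length % 3 == 0)        # 1014 True (a whole number of codons)
print(gap_between(gp38[1], gp39[0]))  # 48, matching "GAP: 48 bp gap" in the SSC line
```

Because both endpoints are included in each feature, the length needs a +1 and the gap a -1; dropping either correction gives off-by-one values.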
CDS 33410 - 34294 /gene="40" /product="gp40" /function="RecT-like DNA pairing protein" /locus tag="ModicumRichard_40" /note=Original Glimmer call @bp 33410 has strength 13.52; Genemark calls start at 33410 /note=SSC: 33410-34294 CP: no SCS: both ST: NI BLAST-Start: [RecT-like DNA pairing protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 44 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.166, -2.2763497933341483, yes F: RecT-like DNA pairing protein SIF-BLAST: ,,[RecT-like DNA pairing protein [Gordonia phage Cafasso]],,QXN74255,100.0,0.0 SIF-HHPRED: RecT; DNA Recombination, DNA Annealing, DNA BINDING PROTEIN; 4.5A {Listeria innocua Clip11262},,,7UBB_G,78.5714,100.0 SIF-Syn: RecT-like DNA pairing protein; the upstream gene is RecE-like exonuclease and the downstream gene has NKF. This is the same for other DZ cluster phages Cafasso, Morgana, ObLaDi, and Aleemily. /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Gene (stop@34294 F). Glimmer calls the start at 33410, and GeneMark agrees. The start codon is "ATG". /note=Coding Potential: This gene has high coding potential in the Self-Trained and Host-Trained GeneMark systems, all of which is covered by the auto-annotated start site. /note=SD (Final) Score: -2.276. This is a great final score and the highest available. /note=Gap/overlap: 44 bp gap. This is the smallest gap of the options and gives the longest ORF (LORF). /note=Phamerator: As of 11/11/24 the pham is 192759. There are 34 other members in the pham, such as ObLaDi and Cafasso. All members agree on the function as a RecT DNA binding protein. /note=Starterator: No, the 34 phages in the pham do not share a single conserved start site. This gene uses start 12 at position 33410, which is called by GeneMark and Glimmer. This start is called 100% of the time in the 5 phages in which it is found, while the other pham members also call their respective start sites 100% of the time within their clusters. 
It may not be the most common start, but this does not appear to be evidence against it. /note=Location call: This is a real gene that shows great synteny with the surrounding region in other cluster members. It likely starts at 33410 bp, as this start gives the LORF and the shortest gap, covers all of the coding potential, and has a good final score. /note=Function call: The function is RecT-like DNA pairing protein, as the top 2 NCBI BLASTp hits give nearly identical results with this function (E-value < 1e-100), and CDD shows the matching domain (RecT) with a low E-value (3.63e-62). HHpred also shows RecT protein, with 100% probability and an E-value of 1.7e-41. /note=Transmembrane domains: DeepTMHMM predicts no TMDs; only one inside domain. /note=Secondary Annotator Name: BURKE, KAIDEN /note=Secondary Annotator QC: Looks good. I fixed up and added detail to all the other sections from auto-annotation to location call. I agree with the final function call due to strong HHpred evidence backed by BLASTp for the RecT DNA pairing function. No additional notes. CDS 34287 - 34541 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="ModicumRichard_41" /note=Original Glimmer call @bp 34287 has strength 15.57; Genemark calls start at 34287 /note=SSC: 34287-34541 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_41 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.58649E-49 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.759, -5.220380545373514, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_41 [Gordonia phage Cafasso] ],,QXN74256,100.0,1.58649E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@34541 F). Glimmer and GeneMark both call the start at 34287, and the start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
The ORF is not the longest (255 bp compared to 441 bp), but it has reasonable coding potential, and the chosen start site includes all of the coding potential. Coding potential is found in both self-trained and host-trained GeneMark. /note=SD (Final) Score: -5.220. This is not the best final score on PECAAN, but it is the second best (the best is -3.774, for start site 34494). This is reasonable, since this gene may be in an operon, where final scores carry less weight. The Z-score is the second highest at 1.759 (the highest is 2.435, for start site 34494). /note=Gap/overlap: 8 bp overlap. This overlap is somewhat larger than the conventional 4 bp overlap for genes in an operon. However, it is the best overlap among the candidates and is reasonable because it is conserved in other phages (Cafasso and Aleemily). There is also only one coding potential in the overlap. /note=Phamerator: The pham number as of 10/28/2024 is 12210. The gene is conserved in phages Aleemily, Morgana, and ObLaDi, all in the same cluster as ModicumRichard (DZ). The function of the gene is unknown. /note=Starterator: There are only 4 non-draft members of this pham. 4/4 non-draft members call start site 5, which corresponds to a start site of 34287 bp for ModicumRichard. This agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene with a start site at 34287 bp. Starterator agrees with Glimmer and GeneMark. /note=Functional call: NKF. Although PhagesDB BLAST returned some hits with known function, these have very large E-values (2.2) and low scores (<30). The best E-values are for hits with unknown function, in agreement with NCBI BLAST, which only returned hits with unknown function (100% coverage, 88%+ identities, and E-value < 1e-43), and thus no relevant hits. 
There were no hits on CDD and no significant hits on HHpred (all hits were blue/green, with high E-values of >42, low probability of <71.02%, and low coverage of <73.8095%). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Laughton, Alette /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's location call and function call. CDS 34538 - 34795 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="ModicumRichard_42" /note=Original Glimmer call @bp 34538 has strength 7.66; Genemark calls start at 34538 /note=SSC: 34538-34795 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_42 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.88061E-52 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.48, -5.792175110375287, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_42 [Gordonia phage Cafasso] ],,QXN74257,98.8235,1.88061E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@34795 F). Both Glimmer and GeneMark call the gene. They agree on the same start site, 34538 bp, with a start codon of ATG. /note=Coding Potential: [start site 34538] The gene has reasonable coding potential (very good coverage) within the putative ORF, and the chosen start site covers all of it. The gene displays synteny with at least two other non-draft phage genomes. There are four PhagesDB BLAST hits to non-draft phage genomes with E-values less than 10^-6. There is coding potential predicted by Glimmer, GeneMark Self, and GeneMark Host. The gene is at least 120 bp long (258 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 34538] The SD score -5.792 is the best. The Z-score of 1.48 is the second best. 
The best is 1.586. Since the gene may be part of an operon, the RBS score carries little weight for the start call. /note=Gap/overlap: [start site 34538] A 4 bp overlap with the upstream gene is reasonable and suggests this gene may be part of an operon. The length of the gene is acceptable (258 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 10993, as of 10/27/24, is conserved in Cafasso and ObLaDi, both in the same cluster (DZ) as ModicumRichard. No function is called. /note=Starterator: There is a reasonable start site choice that is conserved: start site #1 @34538 in ModicumRichard. 4/4 non-draft genes call site #1. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: [start site 34538] The gathered evidence suggests that this is a real gene; it is conserved in Phamerator and has good coding potential. The most likely start site is 34538, which is conserved in Starterator and covers all of the coding potential. /note=Function call: NKF. No program returned any informative results. PhagesDB BLASTp: function unknown. NCBI BLASTp: hypothetical protein. CDD: no hits. HHpred: no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Ea, Emily /note=Secondary Annotator QC: I agree with this location call; everything seems to have been considered. This function call seems reasonable, since no functions were called by BLAST hits, there were no significant hits from CDD or HHpred, and no TMDs were predicted that would make it a membrane protein.
CDS 34792 - 34995 /gene="43" /product="gp43" /function="excise" /locus tag="ModicumRichard_43" /note=Original Glimmer call @bp 34783 has strength 15.13; Genemark calls start at 34810 /note=SSC: 34792-34995 CP: yes SCS: both-cs ST: SS BLAST-Start: [excise [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.40772E-40 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.141, -4.515700363413541, no F: excise SIF-BLAST: ,,[excise [Gordonia phage Cafasso] ],,QXN74258,100.0,2.40772E-40 SIF-HHPRED: AlpA family phage regulatory protein; excisionase, mobile genetic elements, recombination, DNA BINDING PROTEIN; 2.11A {Achromobacter xylosoxidans NBRC 15126 = ATCC 27061},,,8C3T_A,89.5522,98.1 SIF-Syn: Excise, upstream gene has NKF and downstream also has NKF, just like in DZ cluster phages Aleemily and ObLaDi. /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation start source: Glimmer calls the start at 34783 bp, whereas GeneMark calls the start at 34810 bp, a discrepancy of 27 bp, with start codon ATG /note=Coding Potential: Coding potential is found in both GeneMark Host and GeneMark Self. Host shows high coding potential on the forward strand only, indicating a forward gene, and Self also shows good typical and atypical coding potential. Although some typical coding potential is found on the reverse strand, it is too small to be considered. All coding potential is included in the ORF with start site 34792. /note=SD (Final) Score: The final score for start site 34792 is -4.516, which is not the best score listed on PECAAN, but it is the second best. Since this gene has a minimal overlap and may be in an operon, that takes precedence over choosing the best final score. /note=Gap/overlap: -4 indicates a 4 bp overlap, the smallest listed on PECAAN, suggesting a potential operon with good synteny compared to other phages such as Morgana and ObLaDi. /note=Phamerator: The pham number is 10246, as of 10/29/24.
It is conserved, found in Aleemily, Cafasso, Morgana, and ObLaDi (all DZ cluster). /note=Starterator: Start site 6 was manually annotated in 4/5 of the non-draft genes in the pham. The start was auto-annotated as site 5 but manually annotated as site 6 (34792 bp) because of the previously accumulated evidence and how conserved start site 6 is in the pham. /note=Location call: Based on the evidence, this is a real gene with a start at 34792. Starterator agrees with previous findings. /note=Function call: excise. The top 3 phagesDB BLAST hits (Aleemily, Cafasso, ObLaDi) have a low E-value (99%), low E-values (e-6). Also, the Transmembrane domain section is not filled out. Other than that, I agree and believe the primary annotator has compiled enough evidence that calls for the function to be RepA-like replication initiator! CDS 37063 - 37524 /gene="48" /product="gp48" /function="helicase loader" /locus tag="ModicumRichard_48" /note=Original Glimmer call @bp 37063 has strength 14.24; Genemark calls start at 37063 /note=SSC: 37063-37524 CP: yes SCS: both ST: SS BLAST-Start: [helicase loader [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 2.4231E-104 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.452, -5.786784277633676, no F: helicase loader SIF-BLAST: ,,[helicase loader [Gordonia phage Morgana]],,XAO35482,98.6928,2.4231E-104 SIF-HHPRED: replisome organizer; helical bipartite natively unfolded domain, replication; HET: MSE; 2.4A {Bacillus phage SPP1} SCOP: a.179.1.1, l.1.1.1,,,1NO1_C,47.0588,98.4 SIF-Syn: helicase loader. Upstream: RepA-like replication initiator. Downstream: DnaB-like dsDNA helicase. This is similar to phages Aleemily, Morgana, Cafasso, and ObLaDi, the other 4 phages in cluster DZ, which all have a RepA-like replication initiator upstream of this gene and a DnaB-like dsDNA helicase downstream. /note=Primary Annotator Name: Su, Erin /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree on a start site of 37063.
/note=Coding Potential: This gene has coding potential on the forward strand in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: The final score is -5.787 and the Z-value is 1.452. These are not the best Z-value and final score present, but they are among the better ones. A strong RBS score may not be required because the gene may be in an operon (see Gap/overlap). The start codon is GTG, which is more likely than the TTG start that would create the LORF. /note=Gap/overlap: The overlap is 4 bp (-4), indicating that the gene may be in an operon and therefore does not need a high RBS score. /note=Phamerator: The pham number as of 11/17/2024 is 194195. The gene is conserved in each of the four other phages (ObLaDi, Cafasso, Aleemily, Morgana) in Cluster DZ. The gene is 462 bp long in each of these 5 phages, but its length differs in other genes in the pham. /note=Starterator: Each of the phages in cluster DZ (ObLaDi, Cafasso, Aleemily, Morgana) has start site 14 called as the start for this gene. This start site is 37063 for ModicumRichard and is the same as the auto-annotation suggestion. Start site 14 was called in 5 of 29 manual annotations, including all of the cluster DZ genes. The most common start site, however, was start site 12, called in 13 of 29 manual annotations; ModicumRichard does not have this start site. The Starterator analysis was last run on 11/02/24 on database version 579. /note=Location call: Based on the above evidence, this is a real gene and the start site is most likely 37063. /note=Function call: Helicase loader. PhagesDB BLAST's top 4 hits have E-values below 10^-80, and three of them are called as helicase loaders (1 is unknown). On NCBI BLAST as well, 3 of the top 4 hits are helicase loaders (1 is unknown) (92%+ identity, 100% coverage on all, E-values below 10^-100).
HHpred's top hit (PDB 1NO1) is a helicase loader (98.4% probability, 47.0588% coverage, E-value of 6.4e-10), and its second-best hit is Pfam PF11417, also a helicase loader (98.2% probability, 43.1373% coverage, E-value of 3.2e-9). CDD had no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so this is not a transmembrane protein. /note=Secondary Annotator Name: Torosian, Isabella /note=Secondary Annotator QC: I agree with the annotation above. There is strong evidence to support the location call and the functional call. CDS 37521 - 38840 /gene="49" /product="gp49" /function="DnaB-like dsDNA helicase" /locus tag="ModicumRichard_49" /note=Original Glimmer call @bp 37521 has strength 17.25; Genemark calls start at 37521 /note=SSC: 37521-38840 CP: yes SCS: both ST: SS BLAST-Start: [DnaB-like dsDNA helicase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.485, -3.75071250087134, no F: DnaB-like dsDNA helicase SIF-BLAST: ,,[DnaB-like dsDNA helicase [Gordonia phage Cafasso]],,QXN74264,100.0,0.0 SIF-HHPRED: Replicative helicase; Helicase, Primase, Replication, dnaB, dnaG; HET: MSE, SO4; 2.9A {Geobacillus stearothermophilus},,,2R6A_B,92.4829,100.0 SIF-Syn: DnaB-like dsDNA helicase, downstream gene is NKF, upstream gene is helicase loader, just like in phages Morgana, ObLaDi, Aleemily, and Cafasso. /note=Primary Annotator Name: Baugh, Alex /note=Auto-annotation: Gene (stop@38840 F) Both Glimmer and GeneMark call the start at 37521. /note=Coding Potential: There is coding potential only on the forward strand, showing that this is a forward gene. There is coding potential in both GeneMark Self and Host. /note=SD (Final) Score: -3.751. It is the highest final score on PECAAN. The Z-score of 2.485 is also the highest. /note=Gap/overlap: -4 bp. This overlap is smaller than 7 bp, and a 4 bp overlap is ideal. This could mean the gene is part of an operon.
/note=Phamerator: As of 10/29/24, the gene is in pham 4302. It is conserved in cluster DZ and is found in 4 other phages: Aleemily, Cafasso, Morgana, and ObLaDi. /note=Starterator: There are 19 non-draft members in this pham. 5 of the 19 non-draft members call start site 4, which corresponds to a start site of 37521 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the likely start site is 37521. /note=Function call: DnaB-like dsDNA helicase. The top 2 phagesDB BLAST hits and top 2 NCBI BLAST hits have the function DnaB-like dsDNA helicase, with E-values of 0. CDD and HHpred results also support this function. CDD shows strong domain alignments of the ORF with known ASCE ATPases in phages, of which helicases are a subgroup. HHpred results show the best PDB hit as a replicative helicase and the best Pfam hit as divDNAB. With these supporting matches across multiple tools and datasets, I am confident in assigning a function of DnaB-like dsDNA helicase. It also meets the requirements of the SEA-PHAGES approved function list. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Lee, Kyle /note=Secondary Annotator QC: I agree with this annotation, including the location, but it seems like the functional call is unfinished. From Module 7, it seems like you still need to check HHpred, or at least mention it in your PECAAN notes. Also mention CDD in your PECAAN notes, and update the function call to ASCE ATPase (based on your Module 7). Check the evidence boxes in PECAAN and add synteny notes. Add the function call as well. Lastly, please update the transmembrane domain portion of your PECAAN notes.
CDS 38851 - 39069 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="ModicumRichard_50" /note=Original Glimmer call @bp 38851 has strength 14.04; Genemark calls start at 38851 /note=SSC: 38851-39069 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_50 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 3.79094E-43 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.625, -4.2914814936838255, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_50 [Gordonia phage Cafasso] ],,QXN74265,100.0,3.79094E-43 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Iasi, Matilde /note=Auto-annotation: Glimmer and GeneMark both agree on the start site at 38851, with a start codon of ATG (good evidence for a real gene). /note=Coding Potential: The coding potential of this ORF is on the forward strand only. It is found in GeneMark Self and Host. All of the coding potential is contained within the ORF from the chosen start site. /note=SD (Final) Score: -4.291, and it is the best final score on PECAAN, providing the LORF. /note=Gap/overlap: 10 bp gap, which is not perfect but still very reasonable, and there is no coding potential in this gap. /note=Phamerator: Pham 10571. Date 10/18/24. It is conserved, found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: Start site 4 was manually annotated in 5/5 non-draft genes in this pham. Start 4 is 38851 in the ModicumRichard draft, and this evidence agrees with the auto-annotated start site. /note=Location call: Based on the evidence, this is a real gene with a start site at 38851, as indicated by both Glimmer and GeneMark. /note=Function call: unknown function. All the phagesDB BLAST results have an unknown function, and the top 2 have an E-value of 5e-36. All 3 NCBI BLAST results also have an unknown function (100% coverage, 100% and around 83% identity, and E-value of 4e-43).
HHpred and CDD did not yield any informative results, as there were no hits with significant values. Therefore, the function of the protein is unknown. /note=Transmembrane domains: No TMDs detected. /note=Secondary Annotator Name: Khuu, Jacob /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I agree with the functional call given the evidence: the top hits on PhagesDB call for an unknown function, the HHpred and CDD hits were uninformative, and NCBI BLAST yielded hypothetical protein hits. CDS 39066 - 39314 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="ModicumRichard_51" /note=Original Glimmer call @bp 39066 has strength 11.18; Genemark calls start at 39066 /note=SSC: 39066-39314 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_51 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.48273E-53 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.739, -5.340269015678895, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_51 [Gordonia phage Cafasso]],,QXN74266,100.0,3.48273E-53 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mathew, Taniya /note=Auto-annotation: Gene 2 (stop@39314 F), Glimmer and GeneMark both indicate the start site is 39066. /note=Coding Potential: Since the coding potential for the ORF containing this gene is only on the forward strand, this gene is a forward gene, and the chosen start site contains all of the coding potential. /note=SD (Final) Score: The SD final score for this gene is -5.340, which is the best final score on PECAAN. /note=Gap/overlap: The overlap is 4 bp (-4), which is minor. Since the overlap is not larger than 7 bp, it is acceptable. A 4 bp overlap might indicate an operon. /note=Phamerator: Pham 88709. Date 10/29/24. The gene is conserved in phages Aleemily, Cafasso, and ObLaDi, which are part of the same cluster, DZ, as ModicumRichard.
/note=Starterator: The start site was called at 39066 by both auto-annotation and Starterator, and my gene contains this most-annotated start site. /note=Location call: This gene is most likely a real gene and the start site is called at 39066. /note=Function call: There is not enough information to confidently call a function for this gene. The BLASTp hits on PhagesDB and NCBI came back as “unknown function” and had good values (E-values < 10^-44). HHpred and CDD hits were not statistically significant and came back with differing results. Due to these results, I cannot confidently conclude the gene’s function. /note=Transmembrane domains: There are no TMDs detected within this gene. /note=Secondary Annotator Name: Xu, Sasha /note=Secondary Annotator QC: I agree with the function call as well. For PhagesDB BLAST, make sure not to check draft genomes as evidence. CDS 39311 - 39784 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="ModicumRichard_52" /note=Original Glimmer call @bp 39293 has strength 13.41; Genemark calls start at 39311 /note=SSC: 39311-39784 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_52 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.88625E-111 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.831, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_52 [Gordonia phage Cafasso]],,QXN74267,99.3631,4.88625E-111 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Reynolds, Noah /note=Auto-annotation: Gene stop@ 39784 F. Glimmer gives the start site as 39293 and GeneMark reports it at 39311. This is only an 18 bp difference, and we will default to GeneMark because its start has a better Z-score, a better final score, and a smaller overlap. The Glimmer score is 13.41. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that it is a forward gene. /note=SD (Final) Score: There are 2 final scores:
-3.490 (for start codon 39311) and -5.711 (for start codon 39293). We will use start codon 39311, which has a final score of -3.490. This is the best final score on PECAAN. /note=Gap/overlap: -4. This suggests the gene may be part of an operon and thus may be transcribed as polycistronic mRNA. This overlap is minimal and is conserved in other phages (Cafasso & Morgana), and because it is an overlap, there is no gap that could contain coding potential. /note=Phamerator: Pham 12304, 10/29/2024. This gene is conserved in 4/4 non-draft genomes: Aleemily_52 (DZ), Cafasso_52 (DZ), Morgana_52 (DZ), and ObLaDi_52 (DZ), all of which are in the same cluster. /note=Starterator: Start site 7 was manually annotated in 0 non-draft genes in this pham. Start 7 is 39293 in the ModicumRichard draft; this evidence does not support the start site predicted by Glimmer. Instead, start 8 at 39311, predicted by GeneMark, is a better start. It gives a gene length of 474 bp, which is conserved in the other 4 genes. It also gives a better final score (-3.490 compared to -5.711) and a better z-score (2.831 compared to 1.558). Start 8 is also conserved among the other 4 genes in the pham. /note=Location call: This is a real gene, and the start site is called at 39311. /note=Function call: The function of this gene is designated as "No Known Function" (NKF) due to a lack of definitive functional evidence despite strong sequence alignment data. PhagesDB BLAST results show high-confidence matches to Gordonia phages, such as Cafasso_52 (E-value of 2e-87 with 99% identity) and Aleemily_51 (E-value of 1e-78 with 92% identity), both annotated with unknown functions. Additionally, the PhagesDB BLAST result for Morgana_52 shows an E-value of 3e-75 with identities of 136/157 (86%) across 278 matches, and it is classified under cluster DZ with a function still listed as unknown.
NCBI BLAST similarly identifies hits with strong alignment, including Cafasso_52 with an E-value of 5e-111, but these matches are also functionally uncharacterized. Additionally, neither CDD nor HHpred provided informative hits, as no relevant domains or structural predictions were found. The HHpred analysis, for example, returned matches with low probabilities, such as a top hit (5TMP_A) with a probability of 52.6% and similarly low values for other hits (e.g., d1s20a1 at 47% and d1ak2a1 at 43.1%), coupled with high E-values, such as 16 or higher, indicating weak structural alignment confidence. While the gene is conserved within cluster DZ and displays syntenic patterns consistent with related phages, the absence of clear functional characterization results in an NKF designation. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Su, Erin /note=Secondary Annotator QC: I have QC'd this location call and agree with the primary annotator. Be sure to distinguish overlaps and gaps (i.e., if it is an overlap, there is no gap that could contain coding potential). Please proofread for clarity and concision, especially looking for sentence fragments. I also agree with the functional call, as there is not sufficient evidence to call a function. I suggest mentioning Morgana's gene in the PhagesDB BLAST section, since it is checked as evidence on PECAAN. It would also strengthen your argument to provide some example HHpred values, such as mentioning that the top probabilities are low or the top E-values are very high.
CDS 39819 - 39971 /gene="53" /product="gp53" /function="DNA binding protein" /locus tag="ModicumRichard_53" /note=Original Glimmer call @bp 39819 has strength 8.22; Genemark calls start at 39819 /note=SSC: 39819-39971 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q2:s1 98.0% 5.11614E-26 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.662, -5.4987073736983305, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74268,100.0,5.11614E-26 SIF-HHPRED: SIF-Syn: DNA binding protein, upstream gene is unknown and downstream gene is helix-turn-helix DNA binding domain, just like in phage Cafasso /note=Primary Annotator Name: Bonver, Sarah /note=Auto-annotation: Both Glimmer and GeneMark call the start at 39819. /note=Coding Potential: The start and stop sites of this gene capture all of the coding potential indicated by the black and red dotted lines in the self-trained GeneMark. They also capture all of the coding potential in the host-trained GeneMark. /note=SD (Final) Score: The final score is -5.499, the second least negative among the start sites. The Z score for this start site is 1.662, which is below 1.8, so it is not great. There’s another start site (@39828 F) that has a better Z score of 2.303 and a less negative final score of -4.872. /note=Gap/overlap: For the start site @39819 F, which is the one that Glimmer and GeneMark called, there is a gap of 34 bp before the gene. For the start site @39828 F, there is a gap of 43 bp. /note=Phamerator: Pham 12084. Date 11/06/2024. It is conserved; found in Morgana (DZ), Aleemily (DZ), ObLaDi (DZ), and Cafasso (DZ). /note=Starterator: Start #1 @39819 bp is the auto-annotated start site, which is manually annotated in 2 genes. It is found in 5/5 genes in the pham.
/note=Location call: There’s sufficient evidence to say this is a real gene, because the E-value, synteny, length, and coding potential all support it. However, the start site that Glimmer and GeneMark called is not the best one. A better start site is @39828 F, which has a 2.303 Z-value and a less negative final score (-4.872). Its start codon, GTG, is very likely, but it leaves a gap of 43 bp. /note=Function call: DNA binding protein. NCBI BLASTp and PhagesDB BLASTp provide strong evidence for this function. All the hits have good scores and values, and they all have the same function: DNA binding protein. /note=Transmembrane domains: The absence of TMDs makes sense, since DNA binding proteins act on DNA in the cytoplasm rather than being embedded in the cell membrane. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with the location call but not with the function call. I would say this is NKF, since you only have BLAST results and no CDD or HHpred results (Video Module 7). You should fix your functional call and Transmembrane domain PECAAN notes accordingly.
CDS 39964 - 40224 /gene="54" /product="gp54" /function="helix-turn-helix DNA binding domain" /locus tag="ModicumRichard_54" /note=Original Glimmer call @bp 39964 has strength 8.8; Genemark calls start at 39964 /note=SSC: 39964-40224 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 8.28292E-57 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.778, -3.5985503360675457, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,QXN74269,100.0,8.28292E-57 SIF-HHPRED: a.4.1.2 (C:) HIN recombinase (DNA-binding domain) {Synthetic},,,d1ijwc_,51.1628,98.4 SIF-Syn: "helix-turn-helix DNA binding domain protein, upstream gene is (likely) DNA binding protein, downstream is PnuC-like nicotinamide riboside transporter, just like in phage Cafasso" /note=Primary Annotator Name: Graham, Noah /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on a start site of 39964, with start codon ATG. /note=Coding Potential: There is good coding potential supporting the gene. The gene is 261 bp long, which matches the extent of coverage on the map, and the selected start site covers all of the coding potential. However, there also appears to be coding potential for a reverse gene in the same sequence.
/note=SD (Final) Score: -3.599, which is the best score on PECAAN. /note=Gap/overlap: -8 bp overlap. This is reasonable even though it covers the end of the previous gene, because the overlap is conserved in similar phages. In addition, no coding potential is missed with this start site, which gives the LORF. /note=Phamerator: Pham 9836 as of 10/29/24. The other genes in the pham are in phages that belong to the same cluster (DZ), along with one singleton; Morgana and ObLaDi were used for comparison. PhagesDB calls the function of this gene a helix-turn-helix DNA binding domain protein, and the other phages in the same cluster with this gene consistently call this function. /note=Starterator: There is a reasonable start conserved in the pham: start #7, position 39964. There are 5 other members, and the 4/5 that contain this start site call it. /note=Location call: Based on the evidence, this gene is real due to good coding potential covered by the start site, conserved start sites, synteny, and sufficient Z and final scores. The auto-annotated start site (#7, 39964 bp) is the most likely start site, as it is conserved, the most annotated, and uses a normal start codon. /note=Function call: The predicted function is helix-turn-helix DNA binding domain protein, based on hits from NCBI BLASTp and phagesDB BLASTp, both of which had at least 2 hits with 100% query coverage, high identity (>95%), and very low E-values (<1e-49). The matches with genes of known function in several other phages, along with good BLASTp scores, E-values, identity, and alignment, all support the hypothesized function. The Approved Functions List also contains this function. /note=Transmembrane domains: /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: I have QC'd this location call and agree with the primary annotator.
This start site at 39964 provides the LORF and is well conserved across phages of the same cluster, and its Z score and final score also provide good evidence for it. Additionally, I agree with the functional call, as the PhagesDB BLAST, NCBI BLAST, HHpred, and CDD top hits all call for this protein being a helix-turn-helix DNA binding domain protein, and it is conserved across other phages. CDS 40221 - 40475 /gene="55" /product="gp55" /function="PnuC-like Nicotinamide riboside transporter" /locus tag="ModicumRichard_55" /note=Original Glimmer call @bp 40221 has strength 14.74; Genemark calls start at 40221 /note=SSC: 40221-40475 CP: yes SCS: both ST: SS BLAST-Start: [PnuC-like nicotinamide riboside transporter [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.64614E-51 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.246, -2.1924212546750814, yes F: PnuC-like Nicotinamide riboside transporter SIF-BLAST: ,,[PnuC-like nicotinamide riboside transporter [Gordonia phage Cafasso]],,QXN74270,100.0,1.64614E-51 SIF-HHPRED: NMN_transporter ; Nicotinamide mononucleotide transporter,,,PF04973.15,83.3333,99.1 SIF-Syn: PnuC-like Nicotinamide riboside transporter, upstream gene is helix-turn-helix DNA binding domain, downstream gene is not defined as seen in phage ObLaDi. /note=Primary Annotator Name: Kang, Hannah /note=Auto-annotation: Glimmer and GeneMark both call the gene and call a start at 40221 bp, with the start codon GTG. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all of the coding potential found in GeneMark Self and Host. The coding potential of the gene is found only in one ORF on the forward strand. /note=SD (Final) Score: -2.192. It is the best final score on PECAAN, and this start also has the highest Z score (3.246) on PECAAN. /note=Gap/overlap: There is a 4 bp overlap (-4) with the previous gene. /note=Phamerator: Pham: 191004. Date: 10/30/2024. The gene is conserved in phages Aleemily and Cafasso.
/note=Starterator: Start site 23 in ModicumRichard was manually annotated in 4 of 254 genes in this pham. Start 23 is 40221 in ModicumRichard, which agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene with a start site of 40221. /note=Function Call: PnuC-like nicotinamide riboside transporter. The top four phagesDB BLAST hits have the function PnuC-like nicotinamide riboside transporter (E-value < 1e-40). The top NCBI BLAST hit has the function PnuC-like nicotinamide riboside transporter (100% coverage, 100% identity, and E-value = 2e-51). The second top NCBI BLAST hit also has the function PnuC-like nicotinamide riboside transporter (100% coverage, 96% identity, and E-value = 9e-50). HHpred has hits for PnuC-like nicotinamide riboside transporter with probabilities of 98%+, 83%+ coverage, and E-values less than 1e-12. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts three TMDs. Therefore, we can infer that the gene encodes a transmembrane protein. /note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the primary annotator. CDS 40478 - 40774 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="ModicumRichard_56" /note=Original Glimmer call @bp 40478 has strength 8.14; Genemark calls start at 40478 /note=SSC: 40478-40774 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_56 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.74379E-66 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.257, -2.1695583717155773, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_56 [Gordonia phage Cafasso]],,QXN74271,100.0,5.74379E-66 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nuckowski, Sophie /note=Auto-annotation: Both Glimmer and GeneMark call the start at 40478. /note=Coding Potential: There is high coding potential, which begins at the predicted start, 40478.
/note=SD (Final) Score: -2.170, the best SD final score among the available start sites. /note=Gap/overlap: There is a 2 bp gap, well under 50 bp. /note=Phamerator: Pham 88755. /note=Starterator: All other genes in the pham have the same start, which agrees with the auto-annotation. /note=Location call: The start is likely 40478, based on Starterator showing that all other genes in the pham have the same start. /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: I agree with this annotation. However, it would be a good idea to add more detail to your PECAAN notes and then fill out the drop-down menus for Starterator and GM coding capacity. It would also be a good idea to add more to your Phamerator note (including the date the pham analysis was run, how many genes it is conserved in, and the clusters). Try to also add more information and detail to your Starterator note. Please try to fix the issues from the last QC notes and complete Modules 6-8 in your Lab NB as soon as possible so you can fill out these PECAAN notes and determine a function for your gene. After running PhagesDB BLAST, NCBI BLAST, CDD, HHpred, and DeepTMHMM analyses, I have determined the gene to have NKF (No Known Function). The PhagesDB BLAST produced four positive hits, including ObLaDi_8 and Cafasso_8. These hits exhibited strong E-values, were part of the DZ cluster, had similar scores, and demonstrated good identity. The NCBI BLAST results showed strong matches for Morgana, ObLaDi, Aleemily, and Cafasso. ObLaDi_8 and Cafasso_8 should be considered the primary evidence, since they showed strong results in both PhagesDB and NCBI BLAST analyses. However, the CDD, HHpred, and DeepTMHMM analyses did not provide any significant hits or relevant domains. As a result of this review, I have determined the gene to have NKF. Make sure to include enough detail when you add your own notes.
You should include specific values/data sets to provide good evidence. Also, when you are done, make sure to click the evidence box for the information used to make the claim of NKF. CDS complement (40750 - 41151) /gene="57" /product="gp57" /function="lipoprotein" /locus tag="ModicumRichard_57" /note=Original Glimmer call @bp 41118 has strength 13.32; Genemark calls start at 41118 /note=SSC: 41151-40750 CP: yes SCS: both-cs ST: NI BLAST-Start: [lipoprotein [Gordonia phage Cafasso]],,NCBI, q12:s1 91.7293% 7.23913E-78 GAP: 107 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.765, -3.750255311428014, yes F: lipoprotein SIF-BLAST: ,,[lipoprotein [Gordonia phage Cafasso]],,QXN74272,99.1803,7.23913E-78 SIF-HHPRED: LPS-assembly lipoprotein LptE; Lipopolysaccharide, Lipoprotein, LptDE, MEMBRANE PROTEIN; HET: PCJ; 2.98A {Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)},,,8H1R_E,15.0376,91.1 SIF-Syn: Lipoprotein, upstream gene is NKF from pham 88755, downstream is NKF from pham 11440, just like in phage ObLaDi. /note=Primary Annotator Name: Burgess, Brooklyn /note=Auto-annotation: Glimmer and GeneMark agree and both call the start at 41118. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: For start site 41151, the final score is -3.750, which is the best final score on PECAAN. Also, the z-score is the highest at 2.765. /note=Gap/overlap: For start site 41151, there is a large 107 bp gap, but it is conserved in phages ObLaDi, Morgana, and Cafasso. Also, there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: Pham 12291 as of Nov 11, 2024. Conserved; Gene 57 displays synteny with ObLaDi (DZ), Morgana (DZ), Cafasso (DZ), and Aleemily (DZ).
/note=Starterator: As of November 11, 2024, there are only 4 non-draft members of this pham. 4/4 non-draft members call start site 2, which corresponds to a start site @ 41118 for ModicumRichard. /note=Location call: Considering all the evidence, this gene is a real gene and has a start site at 41151, as it has a higher final score and a more significant z-score. This, however, does not agree with Glimmer and GeneMark, which call the start site at 41118. /note=Functional call: Multiple phagesDB BLAST hits have the suggested function lipoprotein, with small E-values of 1e-63 and 6e-41. HHpred had a hit for lipoprotein with 97.39% probability, 39.92% coverage, and an E-value of 0.0015. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a transmembrane protein. This makes sense because a lipoprotein is anchored to a membrane via lipid modifications rather than spanning the membrane with hydrophobic regions. /note=Secondary Annotator Name: Bonver, Sarah /note=Secondary Annotator QC: Almost all sections are complete and of good quality. I agree that this is a lipoprotein and with the final location call. Make sure to revisit the transmembrane domains section: what about the gene being a lipoprotein explains why there are no domains in the membrane? The DeepTMHMM graph is interesting; I'm not sure how to interpret it, but I would maybe ask an instructor about it. Also, the synteny box should contain info in the following format: "Lipoprotein, upstream gene is NKF (pham #), downstream is NKF (pham #), just like in phage XXX". Make sure to find a phage for which this statement is true. Also, report the pham number for genes that are NKF.
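The GAP values in these records (for example, 107 bp between gp57 and gp58, and -8 bp between gp65 and gp66) follow directly from the CDS coordinates. The sketch below shows that arithmetic in Python, assuming 1-based inclusive coordinates as printed in this feature table. The helper names `cds_bounds` and `gap_between` are hypothetical illustrations, not part of PECAAN or any annotation tool.

```python
# Hypothetical helpers (not part of PECAAN): reproduce the GAP arithmetic
# from feature-table coordinates, assumed to be 1-based and inclusive.


def cds_bounds(start: int, stop: int) -> tuple[int, int]:
    """Return (low, high) genome coordinates regardless of strand.

    For a complement (reverse-strand) CDS the start codon sits at the
    higher coordinate, e.g. complement(40750 - 41151) starts at 41151.
    """
    return min(start, stop), max(start, stop)


def gap_between(left_high: int, right_low: int) -> int:
    """Bases between two adjacent features.

    left_high: highest coordinate of the leftmost feature.
    right_low: lowest coordinate of the next feature to the right.
    A negative result is an overlap of that many bases.
    """
    return right_low - left_high - 1


if __name__ == "__main__":
    # gp57 is complement(40750 - 41151); gp58 is 41259 - 41651.
    _, gp57_high = cds_bounds(41151, 40750)
    gp58_low, _ = cds_bounds(41259, 41651)
    print(gap_between(gp57_high, gp58_low))  # prints 107

    # gp65 is 43901 - 44479; gp66 is 44472 - 44900.
    print(gap_between(44479, 44472))  # prints -8 (an 8 bp overlap)
```

The same calculation gives the 549 bp gap between gp60 (start 42185) and gp61 (start 42735). Treating strand only through `cds_bounds` keeps the gap formula identical for forward and complement genes.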
CDS 41259 - 41651 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="ModicumRichard_58" /note=Original Glimmer call @bp 41259 has strength 3.49; Genemark calls start at 41244 /note=SSC: 41259-41651 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_58 [Gordonia phage ObLaDi]],,NCBI, q1:s11 100.0% 1.53993E-87 GAP: 107 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.375, -5.200721809673198, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_58 [Gordonia phage ObLaDi]],,UXE03781,92.1429,1.53993E-87 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Piao, Sara /note=Auto-annotation: Both Glimmer and GeneMark called a start. Glimmer called 41259; GeneMark called 41244. /note=Coding Potential: There is strong evidence that this ORF has high coding potential. Although the host-trained GeneMark map does not show good coding potential in any of the frames, the self-trained GeneMark map shows good coding potential in the third forward frame, as the solid black line spans the entire gene. /note=SD (Final) Score: -5.201. This was the second best final score on PECAAN. /note=Gap/overlap: Gap: 107 bp. This is a moderately large gap, but there is no coding potential in it that could indicate a new gene. This gap was also seen in phages Morgana and ObLaDi, so the gap is conserved and there is no indication of a missing gene. /note=Phamerator: Pham 11440. Date 10/30/2024. It is conserved; found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: Start site 5 in Starterator was only called by ModicumRichard. Start site 4 was the most conserved and was called by 2 of the 4 non-draft genes in the pham. Start site 4 corresponds to 41244 in ModicumRichard. This evidence agrees with the start site predicted by GeneMark.
Due to the higher z-score and final score at start site 5 (41259), the evidence more strongly supports the auto-annotated start site, even though some of the other phages in the cluster call start site 4 more often. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 41259. /note=Function call: No known function, based on hits from PhagesDB BLASTp and NCBI BLASTp. PhagesDB BLASTp had 2 hits with high identity (≥97%) and low E-values of 7e-73. NCBI BLASTp also had 2 hits with 100% coverage, high identity (≥98%), and low E-values of ≤3e-87. CDD had no hits. HHpred had a couple of hits, but none of them met the minimum cutoff values for a relevant hit. Since there is a lack of evidence to support a specific gene function, this gene has been labeled as no known function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Graham, Noah /note=Secondary Annotator QC: I agree with the primary annotator; the scores and evidence show the selected start as the most likely. CDS complement (41624 - 41797) /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="ModicumRichard_59" /note=Original Glimmer call @bp 41797 has strength 7.06; Genemark calls start at 41797 /note=SSC: 41797-41624 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_59 [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 9.25975E-28 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.791, -5.999909193173244, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_59 [Gordonia phage Morgana]],,XAO35493,89.4737,9.25975E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sahagun, Diego /note=Auto-annotation: Glimmer and GeneMark both have 41797 as the start site. /note=Coding Potential: Decent coding potential on both self and host.
/note=SD (Final) Score: -6.000, the lowest of the candidates; its z-score, 1.791, isn't too low. /note=Gap/overlap: A 4 bp gap, consistent with the gene being part of an operon. /note=Phamerator: Pham 10732 /note=Starterator: Start site 2, called in 5 non-draft phages, corresponds to start site 41797. /note=Location call: 41797, based on Starterator and good coverage of the coding potential. /note=Function call: NKF, based on BLAST results matching other hypothetical proteins; no CDD or HHpred data were available. /note=Transmembrane domains: No TMDs; their absence can't be explained because the gene's function is unknown. /note=Secondary Annotator Name: Kang, Hannah /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the first annotator. There could be more information on whether the chosen start site covers all coding potential, the date the pham was found, which other phages the pham includes, and the phages used for comparison. CDS complement (41802 - 42185) /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="ModicumRichard_60" /note=Original Glimmer call @bp 42185 has strength 10.52; Genemark calls start at 42185 /note=SSC: 42185-41802 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_61 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.8106E-65 GAP: 549 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.009, -4.7084971843946395, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_61 [Gordonia phage Cafasso]],,QXN74276,88.0,1.8106E-65 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Zhang, Kevin /note=Auto-annotation: Glimmer and GeneMark both call the start at 42185. /note=Coding Potential: There is good coding potential between the start and stop sites in the reverse direction only. There is coding potential in only one reverse ORF. Both self- and host-trained GeneMark have similar results. /note=SD (Final) Score: The SD score is -4.709, tied with another start site at 42029. The Z-value is 2.009, also tied with 42029.
/note=Gap/overlap: There is a 549 bp gap between the current gene and the next gene. /note=Phamerator: Pham: 10858. Date: 10/29/24. It is conserved; found in Cafasso, Aleemily, Morgana, and ObLaDi (all DZ). /note=Starterator: Start site 1 in Starterator was manually annotated in 4/4 non-draft genes. Start 1 in ModicumRichard is 42185. It is the same call site as Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 42185. /note=Function call: Hypothetical protein. The top 4 phagesDB BLAST hits are hypothetical proteins (E-value < 5e-42), and the top 3 NCBI BLAST hits are hypothetical proteins. HHpred had no significant hits, as the lowest E-value was 1.9. CDD had no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Nuckowski, Sophie /note=Secondary Annotator QC: I agree with this annotation. Although there is a large gap (over 500 bp), there is no viable alternative start site that could be selected to reduce this gap, and other factors, such as the coding potential covered by the start, low E-values, and high degree of synteny, provide evidence for the current location call. The primary annotator should include their location call in PECAAN, though. /note=Tertiary Annotator Name: Bouklas, Tejas /note=Tertiary Annotator QC: I agree with your function call.
CDS 42735 - 42998 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="ModicumRichard_61" /note=Original Glimmer call @bp 42735 has strength 11.0; Genemark calls start at 42735 /note=SSC: 42735-42998 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_61 [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 3.50038E-54 GAP: 549 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.078, -4.856184980221285, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_61 [Gordonia phage Morgana]],,XAO35495,100.0,3.50038E-54 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Echeverria, Alondra /note=Auto-annotation: Glimmer and GeneMark both call the start at 42735. /note=Coding Potential: Coding potential in this ORF is on the forward strand, and the ORF includes all coding potential in that region in both host and self. /note=SD (Final) Score: The final score is the best option at -4.856, and the z-score is the highest at 2.663. /note=Gap/overlap: 549 bp. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (Aleemily, Cafasso) and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: Pham 192979. Date 10/24/24. It is conserved in both Cafasso and Aleemily. /note=Starterator: Start site 14 in Starterator was manually annotated in 4/12 non-draft genes in this pham. Start 14 is 42735 in ModicumRichard_61. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 42735. /note=Function call: No known function. HHpred and CDD have no hits for this gene, and there is little to no evidence supporting a specific function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Burgess, Brooklyn /note=Secondary Annotator QC: I agree with the annotations. All of the evidence categories have been considered.
Note: More details could be included, such as in the final score and gap sections. CDS 42995 - 43303 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="ModicumRichard_62" /note=Original Glimmer call @bp 42995 has strength 11.95; Genemark calls start at 42995 /note=SSC: 42995-43303 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_62 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 1.42647E-64 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.966271270034708, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_62 [Gordonia phage ObLaDi]],,UXE03785,98.0583,1.42647E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jana, Pranava /note=Auto-annotation: Glimmer and GeneMark agree that the start site is @ 42995. /note=Coding Potential: All of the coding potential of the ORF is in the 2nd forward reading frame, with none in any other reading frame. Glimmer and GeneMark both agree, with nearly identical coding potentials. /note=SD (Final) Score: 2nd lowest (best) SD of -4.966, with a good z-score of 1.921. /note=Gap/overlap: -4 bp (a 4 bp overlap). Very reasonable, especially since the overlap is conserved in other phages (ObLaDi, Morgana, Cafasso, Aleemily). /note=Phamerator: Pham 10865 as of Nov 11, 2024. Conserved; Gene 62 displays very strong synteny with ObLaDi (DZ), Morgana (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: As of Nov 11, 2024, there are only 4 non-draft members of this pham. 4/4 non-draft members call start site 2, which corresponds to a start site @ 42995 for ModicumRichard. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 42995 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Multiple phagesDB BLAST hits have no suggested function, with small E-values of 4e-50 to 1e-45, and the 3/3 top NCBI BLAST hits are NKF (98.0583% identity, 100% coverage, and E-value of 1.42647e-64).
HHpred and CDD have no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein, and the protein is predicted to be intracellular and a “globular protein.” /note=Secondary Annotator Name: Piao, Sara /note=Secondary Annotator QC: I have QC’ed this location and function call (NKF) and agree with the first annotator. Make sure to delete the text in the synteny box since your gene is NKF. CDS 43300 - 43671 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="ModicumRichard_63" /note=Original Glimmer call @bp 43300 has strength 12.61; Genemark calls start at 43300 /note=SSC: 43300-43671 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_63 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 8.64428E-85 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.053, -4.6353190821692705, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_63 [Gordonia phage ObLaDi]],,UXE03786,99.187,8.64428E-85 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mastro, David /note=Auto-annotation: Glimmer and GeneMark both call the start at 43300. /note=Coding Potential: Coding potential is seen across the entire range (43300-43671) in the appropriate direction (forward) only. This is shown in both GeneMark Self and Host. /note=SD (Final) Score: The final score was -4.635 (z-score: 2.053). This is the second highest score, but much more reasonable than the other. /note=Gap/overlap: The overlap is 4 bp. This is reasonable in the presence of an operon. /note=Phamerator: The pham is 1305 (10/30/24). This pham also includes a gene from one other member of its cluster (ObLaDi), but from none of the others. /note=Starterator: It calls start number 16 at 43300 bp. This is not the most common start site (site 20) within its pham, but ModicumRichard does not have start site 20, making a call difficult. /note=Location call: Based on the above evidence, the start site is likely 43300.
There is reasonable overlap, good scoring, agreement between Glimmer and GeneMark, and proper coverage of the coding potential. /note=Function call: 10/11 of the top NCBI BLASTp hits are proteins of unknown function (E-value < 2e-10). 10/10 of the phagesDB BLASTp hits are proteins of unknown function (E-value < 2e-11). CDD gave no hits, and HHpred's top hit was also a protein of unknown function (E-value = 6.5). For this reason, we cannot make an inference about the function of the protein and instead declare it as No Known Function (NKF). /note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains and instead predicts that the protein is located inside the cell. Therefore, this is likely not a transmembrane protein. /note=Secondary Annotator Name: Sahagun, Diego /note=Secondary Annotator QC: There is ample evidence that this is the most likely start site. The auto-annotations and Starterator agree with this reasoning. BLAST results from PhagesDB and NCBI support no known function; the HHpred data are not strong, but they are also consistent with this call. CDS 43668 - 43904 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="ModicumRichard_64" /note=Original Glimmer call @bp 43668 has strength 13.91; Genemark calls start at 43668 /note=SSC: 43668-43904 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_64 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 4.25886E-49 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -3.75071250087134, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_64 [Gordonia phage ObLaDi]],,UXE03787,98.7179,4.25886E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Namburu, Ushaswini /note=Auto-annotation: Glimmer and GeneMark both call the start @43668. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Good coding potential is observed in GeneMark Self and Host.
/note=SD (Final) Score: The final score is the best option at -3.751, and the z-score is 2.485, which is significant. /note=Gap/overlap: The overlap is small at 4 bp, which is typical of genes that are part of operons. The overlapping sequence is ATGA, which is also characteristic of operons. /note=Phamerator: Pham 9321. Date 10/31/24. It is conserved, found in Aleemily_63, Cafasso_65, ModicumRichard_64, Morgana_64, ObLaDi_64, and VanLee_96. /note=Starterator: Start site 4 was manually annotated in 5/5 non-draft genes in this pham. Start 4 is 43668 in ModicumRichard. /note=Location call: Considering all the evidence, this gene is a real gene and has a start site at 43668. This agrees with Glimmer and GeneMark, which call the start site at the same place. /note=Function call: NKF. The top two phagesDB BLAST hits are NKF (E-value < 10^-38), and 2 of the top 5 NCBI BLAST hits are also NKF (100% coverage, 89%+ identity, and E-value < 10^-39). There were no significant hits for CDD or HHpred; the HHpred hits have very low coverage and high E-values, making them insignificant. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Zhang, Kevin /note=Secondary Annotator QC: The annotation and location call look good. The evidence from Glimmer, GeneMark, Starterator, and Phamerator looks good. /note=For the function call, please check the phagesDB BLAST results and untick any draft genomes. NCBI evidence looks good. You could expand on why HHpred had no significant hits.
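The gp64 note links a 4 bp overlap to the sequence ATGA. On the forward strand, a 4 bp overlap means the upstream gene's stop codon begins at the second base of the downstream start codon, so an ATG start and a TGA stop together read ATGA. The Python sketch below checks this on a sequence string, assuming 1-based coordinates and a forward-strand pair of genes. The function name and the toy sequences are hypothetical; the ModicumRichard genome sequence is not reproduced in these notes.

```python
# Hypothetical check (not part of PECAAN): does a forward-strand gene pair
# overlap by exactly 4 bp with a valid stop and start codon, as in "ATGA"?

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def is_four_base_overlap(seq: str, upstream_stop: int, downstream_start: int) -> bool:
    """Return True for a 4 bp stop/start overlap between two forward genes.

    seq: genome sequence as a plain string.
    upstream_stop: last base (1-based) of the upstream gene's stop codon.
    downstream_start: first base (1-based) of the downstream gene's start codon.
    """
    # Overlap length for inclusive coordinates: downstream_start..upstream_stop.
    if upstream_stop - downstream_start + 1 != 4:
        return False
    # Convert 1-based inclusive codon positions to 0-based Python slices.
    stop = seq[upstream_stop - 3:upstream_stop].upper()
    start = seq[downstream_start - 1:downstream_start + 2].upper()
    return stop in STOP_CODONS and start in START_CODONS


if __name__ == "__main__":
    toy = "CCATGACC"  # hypothetical toy sequence: ATG at 3-5, TGA at 4-6
    print(is_four_base_overlap(toy, 6, 3))  # prints True
    print(is_four_base_overlap(toy, 7, 3))  # prints False (5 bp overlap)
```

With an ATG, GTG, or TTG start, the two shared bases are always TG, so a 4 bp overlap can only involve a TGA stop. For gp63 and gp64 the call would be `is_four_base_overlap(genome, 43671, 43668)`, where `genome` stands in for the full phage sequence.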
CDS 43901 - 44479 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="ModicumRichard_65" /note=Original Glimmer call @bp 43901 has strength 13.93; Genemark calls start at 43901 /note=SSC: 43901-44479 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_65 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 4.7413E-133 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.201, -4.252258253355557, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_65 [Gordonia phage ObLaDi]],,UXE03788,99.4792,4.7413E-133 SIF-HHPRED: SIF-Syn: Protein with no known function; upstream pham number for ObLaDi is 10885 and downstream is 51366. Upstream for Cafasso is 187938 (no synteny downstream). /note=Primary Annotator Name: Aguilar, Xeleste /note=Auto-annotation: Glimmer and GeneMark both call the start at 43901. /note=Coding Potential: Yes, the gene has coding potential within the ORF on the forward strand. GeneMark Self and Host both show the same forward-strand coding potential, with no visible discrepancies. /note=SD (Final) Score: -4.252 /note=Gap/overlap: -4 bp (a 4 bp overlap) /note=Phamerator: Pham 9132, Date: 10/30/24. Conserved and found in phages Cafasso and Aleemily. /note=Starterator: Start site 5 in Starterator was manually annotated in 5/5 non-draft genes. Start site 5 is 43901. /note=Location call: Based on the evidence collected, the start site of this gene is 43901. /note=Function call: NKF (the function could not be determined from comparisons with other phages, BLAST results, or HHpred; the data were not within a range that would be considered reliable for an informed decision) /note=Transmembrane domains: DeepTMHMM predicted 0 TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Echeverria, Alondra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 44472 - 44900 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="ModicumRichard_66" /note=Original Glimmer call @bp 44472 has strength 9.93; Genemark calls start at 44472 /note=SSC: 44472-44900 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_66 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 2.69312E-98 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.602, -5.8313867178037215, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_66 [Gordonia phage ObLaDi]],,UXE03789,100.0,2.69312E-98 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jasso, Sarahi /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site being 44472. /note=Coding Potential: Yes, there is an abundance of coding potential on the forward strand. /note=SD (Final) Score: -5.831 /note=Gap/overlap: -8 bp (an 8 bp overlap) /note=Phamerator: The gene is part of pham 111876 as of 10/30/24. /note=Starterator: Shows the start at the most conserved start site, site 1. /note=Location call: The start site matches the Glimmer and GeneMark call at 44472 bp. /note=Function call: No Known Function /note=Transmembrane domains: 0 transmembrane domains /note=Secondary Annotator Name: Jana, Pranava /note=Secondary Annotator QC: Try to include the z-score. Also, specify whether the coding potential is on the forward or reverse strand. I have QC’ed this location and function call (NKF) and agree with the first annotator. Make sure to fill out the NCBI BLAST hits.
CDS 44897 - 45385 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="ModicumRichard_67" /note=Original Glimmer call @bp 44897 has strength 13.41; Genemark calls start at 44897 /note=SSC: 44897-45385 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_67 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 5.30202E-113 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.891, -3.1896562502401133, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_67 [Gordonia phage ObLaDi]],,UXE03790,100.0,5.30202E-113 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mathew, Mallika /note=Auto-annotation: Glimmer and GeneMark call start position 44897. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.190, which is the best final score. /note=Gap/overlap: -4 bp (a 4 bp overlap), suggesting that the gene is part of an operon. This start creates the longest ORF, making it a strong choice for the start site. /note=Phamerator: Pham 109564 Date 10/30/2024. It is conserved, found in ObLaDi and Cafasso. /note=Starterator: Start site 67 (position 44897) in Starterator was manually annotated in 1/2 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: It seems to be a real gene due to good coding potential and synteny with non-draft phages in cluster DZ. 44897 is the correct start site, and both Glimmer and GeneMark call it. /note=Function call: The function is probably NKF, since the BLASTp and NCBI results showed alignment to genes in non-draft phages that have no known function, and HHpred shows the same. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Mastro, David /note=Secondary Annotator QC: Mention what bp is called in the "auto-annotation" field.
Function looks good; I agree with NKF due to poor hits, but you could elaborate more on why it should be NKF. Could you also elaborate on why the HHpred FAD assembly protein hit does not provide support? Evidence is properly checked off. CDS 45382 - 45570 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="ModicumRichard_68" /note=Original Glimmer call @bp 45382 has strength 6.02 /note=SSC: 45382-45570 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_68 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 2.568E-35 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.224, -2.2186743274732437, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_68 [Gordonia phage ObLaDi]],,UXE03791,100.0,2.568E-35 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sridharan, Ananya /note=Auto-annotation: Glimmer calls the start at 45382. There is no call from GeneMark. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.219 /note=Gap/overlap: -4 bp (a 4 bp overlap) /note=Phamerator: Pham 51366 Date 10/18/2024. It is conserved; found in ObLaDi (DZ) and Morgana (DZ). /note=Starterator: Start site 13 in Starterator was manually annotated in non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer. /note=Location call: The gene is real and starts at 45382 because it is conserved, there is no gap, and there is good coding potential. /note=Function call: No known function /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Namburu, Ushaswini /note=Secondary Annotator QC: Everything looks good; you could expand more on the function call and reference the BLAST data used to finalize NKF. Maybe word everything more like your first gene?
It's closer to the "good" example in the lab manual and more likely to be what they're looking for. CDS 45567 - 45761 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="ModicumRichard_69" /note=Original Glimmer call @bp 45567 has strength 17.42; Genemark calls start at 45567 /note=SSC: 45567-45761 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_67 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 2.12999E-34 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.986, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_67 [Gordonia phage Aleemily]],,UVK59807,93.75,2.12999E-34 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carrillo, Daniel /note=Auto-annotation: Glimmer and GeneMark both call the start at 45567. /note=Coding Potential: Coding potential is found in the third reading frame of the forward strand, indicating this is a forward gene. The host-trained map has narrower coding potential lines than expected, while the self-trained map sufficiently covers the gene’s start and stop. Ultimately, both provide passing, consistent coding potential for the gene. /note=SD (Final) Score: The final score is the best at -2.708, and the Z-value of 2.986 is the highest. /note=Gap/overlap: There is a 4 bp overlap, which indicates that this gene is part of an operon. This is consistent with the minor overlap of coding potentials we see on GeneMark between our gene’s start and the previous gene's apparent stop. We also see the same overlap with phage ObLaDi when making comparisons on the pham map. /note=Phamerator: This gene is found in Pham 12089 as of 10/29/2024. This gene is conserved in phages such as ObLaDi and Cafasso. /note=Starterator: Start site 10 is conserved across 4 other phages. It was manually annotated in 4/4 final phages (Aleemily, ObLaDi, Cafasso, and Morgana). Start site 10 is position 45567 in ModicumRichard. This is in agreement with Glimmer and GeneMark.
/note=Location call: Based on the evidence, this gene is real and has a start site of 45567, agreeing with Glimmer and GeneMark. /note=Functional call: Unknown function. The majority of top hits in PhagesDB BLASTp and NCBI BLASTp do not have a function assigned to them, although one hit, from phage Cafasso, has a DNA Binding Protein function assigned. The majority of top hits just do not have a function and have significantly lower E-values (69%), high % identity (>69%), and low E-values (~3e-93). No informative results on function from HHpred or CDD. /note=Transmembrane domains: Not a membrane protein, as there are no predicted TMDs. /note=Secondary Annotator Name: Ornelas, Jessica /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I have QC'ed this function call and agree with the first annotator. CDS 51079 - 51501 /gene="82" /product="gp82" /function="DNA binding protein" /locus tag="ModicumRichard_82" /note=Original Glimmer call @bp 51079 has strength 10.73; Genemark calls start at 51079 /note=SSC: 51079-51501 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_81 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 5.65372E-92 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.435, -4.681381803890022, yes F: DNA binding protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_81 [Gordonia phage Aleemily] ],,UVK59821,97.1429,5.65372E-92 SIF-HHPRED: Small CPxCG-related zinc finger protein; zinc-finger, METAL BINDING PROTEIN; NMR {Haloferax volcanii DS2},,,8Q5B_A,37.8571,96.2 SIF-Syn: DNA binding protein; both upstream and downstream genes are NKF, like all other members of phage cluster DZ. Cafasso and ObLaDi also have synteny with this gene and have the same function, while other members of this cluster are NKF. /note=Primary Annotator Name: Adamian, Andrew /note=Auto-annotation: Both GeneMark and Glimmer call the start site at 51079.
/note=Coding Potential: Both host- and self-trained GeneMark indicate strong coding potential, with both a strong upward hash and a steep downward hash at the start and stop. /note=SD (Final) Score: The overall final score is -4.681, which is the best of all the candidates. However, it is not the LORF. /note=Gap/overlap: There is a 4 bp overlap between this gene and the previous gene. /note=Phamerator: This gene was found in Pham 7080 as of 10/31/2024, and this pham is found in all members of the cluster and several members of the DL cluster. There are 9 members of this pham, all of which (except ModicumRichard) are non-drafts. /note=Starterator: The most frequently called start number was 13. It was called in 6 of the 8 non-draft genomes. The start coordinate that aligns with this start site is 51079 and has 6 MAs. /note=Location call: I would call the start at 51079, as supported by the auto-annotation and the lack of a gap. /note=Function call: The likely function of this gene is DNA binding protein, which was called by HHpred and BLASTp. HHpred called two good hits with above 95% probability and E-values less than 0.0007. Additionally, PhagesDB BLASTp calls hits for ObLaDi (291 score and 4e-79 E-value with 134/140 identities) and Cafasso (279 score and 2e-75 E-value with 132/156 identities). NCBI also calls Cafasso (score 264, E-value 4e-88, identities 132/156), as well as Aleemily (score 273, E-value 6e-92, identities 134/156). Although CDD did not call any hits, the function is still likely DNA binding protein. /note=Transmembrane domains: There are no transmembrane domains, as the protein this gene codes for is likely used within the cell. /note=Secondary Annotator Name: McFarlane, Kelly /note=Secondary Annotator QC: I agree with the called start site and the called function. The function call section of the PECAAN notes could be more detailed and could include specific numbers (E-values/probabilities).
Since there is a function for this gene, the Synteny box should also be filled out. CDS 51502 - 51675 /gene="83" /product="gp83" /function="hypothetical protein" /locus tag="ModicumRichard_83" /note=Original Glimmer call @bp 51502 has strength 14.92; Genemark calls start at 51502 /note=SSC: 51502-51675 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_MORGANA_90 [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 1.55724E-31 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.407, -4.483031786805435, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_90 [Gordonia phage Morgana]],,XAO35524,96.4912,1.55724E-31 SIF-HHPRED: SIF-Syn: This gene has NKF. The upstream gene is a DNA binding protein and the downstream gene is NKF. /note=Primary Annotator Name: Anebere, Stephanie /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site of this gene. /note=Coding Potential: The gene shows good coding potential in both host-trained and self-trained GeneMark. /note=SD (Final) Score: The Z-score is above 2 and the final score is -4.483; this appears to be the only candidate start, and both scores are good. /note=Gap/overlap: There is no gap (0 bp). /note=Phamerator: As of 10/31/24, the gene is in pham 192233, which has 381 members, 40 of which are drafts. It is conserved in Aleemily, Cafasso, Morgana, and ObLaDi. /note=Starterator: As of 10/31/24 (pham 192233), a reasonable start site is start 70, which was called in 340 non-draft genes. /note=Location call: The evidence shows that this is a real gene with good coding potential. Based on the data, 51502 is the correct start site. /note=Function call: NKF. There are decent hits on PhagesDB and NCBI, but the evidence for any function is not strong. It might be an antitoxin. 
/note=Transmembrane domains: There are no TMDs. /note=Secondary Annotator Name: Lawson, Matthew /note=Secondary Annotator QC: Start 83 /note=Based on the data collected, I agree that the start location call of 51,502 seems most likely, though I would fix some of the information in your PECAAN notes and annotation notebook to provide better evidence for this start location. Both GeneMark and Glimmer call this start site, and it has good coding potential in both self-trained and host-trained GeneMark. This gene also has synteny with other related phages, and there are strong PhagesDB BLAST hits. Additionally, the start codon at 51,502 has high usage probability and covers all of the coding potential in the GeneMark maps. You did not clearly indicate the gap/overlap between this gene's suggested start and the previous gene's stop (PECAAN reports a 0 bp gap), or whether this start site gives the longest reasonable ORF for this gene call, so I would fix this in your annotation notebook to provide stronger evidence for this location call. This location call of 51,502 also has a good Z-score (2.407) and a decent final score (-4.483). Additionally, the Starterator data strongly agrees with this location call. Also, start 1 corresponds to 51,502 for our phage, not 51285 (that is for Aleemily). I also agree that there is not enough information to determine the function of this protein. Again, just make sure to fill out the function box on PECAAN. 
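The gap/overlap values in these notes (for example, the 0 bp gap PECAAN reports for gp83, or the 74 bp gap versus the 4 bp overlap weighed for gp84) follow directly from the CDS coordinates. Below is a minimal Python sketch of that arithmetic, assuming both genes are on the forward strand and that coordinates are 1-based and inclusive, as in the CDS lines of this record. `gap_between` is a hypothetical helper for illustration, not a PECAAN function; positive results are gaps and negative results are overlaps.

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """Return the gap (positive) or overlap (negative) in bp between two
    forward-strand genes, using 1-based inclusive coordinates."""
    return next_start - prev_stop - 1


# Stop and start coordinates copied from the CDS lines in this record.
examples = [
    ("gp82 -> gp83", 51501, 51502),
    ("gp83 -> gp84, Glimmer/GeneMark start", 51675, 51750),
    ("gp83 -> gp84, chosen start", 51675, 51672),
    ("gp84 -> gp85", 52052, 52045),
]

for label, prev_stop, next_start in examples:
    print(f"{label}: {gap_between(prev_stop, next_start)} bp")
```

These reproduce the 0, 74, -4, and -8 bp values reported in the gp83, gp84, and gp85 notes.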
CDS 51672 - 52052 /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="ModicumRichard_84" /note=Original Glimmer call @bp 51750 has strength 8.51; Genemark calls start at 51750 /note=SSC: 51672-52052 CP: yes SCS: both-cs ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso] ],,NCBI, q2:s3 99.2064% 3.34705E-61 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.487, -5.778251918042438, no F: hypothetical protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso] ],,QXN74299,91.9355,3.34705E-61 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: BURKE, KAIDEN /note=Auto-annotation: Glimmer and GeneMark both call this gene and agree on the start codon at position 51750. This start codon is a “GTG”. /note=Coding Potential: The host-trained and self-trained GeneMark give the 3rd ORF of the forward strand very high coding potential. I believe the start site works well within this coding potential. /note=SD (Final) Score: -4.919 is the highest final score, but this start is perhaps still not valid, given that the lower-scoring options have smaller gaps or no gaps. This start appears to fall in the middle of a potential operon, which does not support Glimmer's start. /note=Gap/overlap: The gap is 74 bp, which is large, and the other members of cluster DZ have no gap at this site, giving more evidence against this start site. An alternative site at 51672 provides a longer ORF and continues the operon without a gap. /note=Phamerator: 88954 is the pham as of 10/29/24. There are 4 non-draft phages in this pham, plus 1 draft, the phage we are currently annotating. All 4 non-draft phages use this start site and have been manually annotated. The function is unknown. /note=Starterator: This start site is conserved in all members of the cluster: Start 4 @51672 is called by all 4 phages in the cluster and pham. 
/note=Location call: Altogether, this suggests this is a real gene with a start site most likely at 51672, rather than at 51750, the start suggested by Glimmer and GeneMark, which leaves a large gap and breaks what appears to be an operon. All the other members of the cluster also share this different site rather than the auto-annotated one. /note=Function call: There is no known function. The NCBI and PhagesDB BLASTp data show low e-values (< 1e-47) and high identity (>80%). There were no CDD hits, and all the HHpred hits had high e-values (> 0.35) and gave little insight into the role of the protein. /note=Transmembrane domains: DeepTMHMM predicts 2 transmembrane helices. The function remains unknown, but this is consistent with the membrane protein referenced in the NCBI BLASTp. /note=Secondary Annotator Name: MUFF, MARGARET /note=Secondary Annotator QC: I agree with this annotation, as it reflects the conserved start across the pham, and the new start at 51672 includes the operon. There is no significant evidence indicating a known function, given the lack of CDD hits and the high e-values of the HHpred hits. Synteny box complete. No notes. CDS 52045 - 52392 /gene="85" /product="gp85" /function="membrane protein" /locus tag="ModicumRichard_85" /note=Original Glimmer call @bp 52045 has strength 13.04; Genemark calls start at 52045 /note=SSC: 52045-52392 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 7.2387E-77 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.986, -2.707694805492614, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Morgana]],,XAO35526,98.2609,7.2387E-77 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is in pham 88954 and downstream gene is in pham 11257, just like in phage ObLaDi. /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@52392 F), Glimmer and GeneMark start called at 52045 bp. This start has a start codon of ATG. 
/note=Coding Potential: For the start site at 52045, the gene has reasonable coding potential in the Self-Trained and Host-Trained GeneMark systems, and the site covers all of the coding potential. The gene is 348 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the forward GeneMark system. /note=SD (Final) Score: -2.708. This final score is the highest of the listed final scores for suggested start sites. The Z-score is 2.986, also the highest of the listed Z-scores for suggested start sites. /note=Gap/overlap: For the start site at 52045, there is an 8 bp overlap, the smallest of the gaps/overlaps for the listed start sites. This is not the longest ORF, and there is no coding potential suggesting the presence of another gene before the start site. /note=Phamerator: Pham 192230, run on 10/29/2024. It is conserved, displaying synteny with non-draft genes in Aleemily (DZ), ObLaDi (DZ), Morgana (DZ), and Cafasso (DZ). The function is called as unknown for all except ObLaDi. /note=Starterator: 89/342 non-draft members call start site 49, which does not correspond to the chosen start site (start 40, at 52045) for this phage. Glimmer and GeneMark's auto-annotated start at 52045 still seems to be the most likely start because ModicumRichard does not contain start 49. /note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 52045, because both Glimmer and GeneMark call it and it has the best final score, Z-score, and overlap. /note=Function call: Membrane protein. PhagesDB BLASTp returned the top 4 hits of No Known Function (E values < 1e-60). NCBI BLASTp returned the top 2 hits of a Membrane Protein, found in DZ cluster phages Cafasso and Morgana (E values < 5e-75, coverage = 100%, identity 94-98%). CDD returned no hits. 
HHpred returned the top Pfam hit as Protein of Unknown Function (E value < 1e-24) and the top PDB hit as Uncharacterized lipoprotein (E value = 3). ObLaDi calls this protein a membrane protein. /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains (TMDs), so this is not a transmembrane protein. However, ObLaDi's homologous gene is annotated as a membrane protein, and this appears accurate: although the ModicumRichard run shows no predicted TMDs, the signal and outside portions of the topology graph show that this is a membrane protein of some kind. Since no other specifics are provided, the generalized “membrane protein” function on the approved functions list will be used. /note=Secondary Annotator Name: Choudhary, Khushi /note=Secondary Annotator QC: I agree with the first annotation on the likely start site for this gene and with the function, and have QC'd this call. All of the evidence categories have been considered. CDS 52392 - 52772 /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="ModicumRichard_86" /note=Original Glimmer call @bp 52392 has strength 9.8; Genemark calls start at 52392 /note=SSC: 52392-52772 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_86 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.49522E-86 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.401, -5.972387497942108, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_86 [Gordonia phage Cafasso] ],,QXN74301,100.0,1.49522E-86 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer and GeneMark both call the start site at 52392 with a start codon of ATG. /note=Coding Potential: There is good coding potential in both the self-trained and host-trained GeneMark in the forward direction. The start site covers all of the coding potential. The neighboring genes are also in the forward direction. 
/note=SD (Final) Score: The final score is -5.972, in the bottom half of the final scores of all the start sites. The Z-score is 1.401, lower than the desired score of 2. These are not the best scores on PECAAN, but the lower scores may be acceptable because this start site is needed for the operon. /note=Gap/overlap: The overlap of this gene and the prior gene is 1 bp. This overlap is evidence that the gene may be in an operon. /note=Phamerator: This gene is in pham 11257 on 10/31/24 and is conserved, since it is found in other members of the same cluster (DZ) such as Morgana, Aleemily, and Cafasso. /note=Starterator: On 10/18/24, the most conserved start site number is 3 @52392 with 4 MA's. It is called in 4/4 (100%) non-draft phages. This start site agrees with the auto-annotated sites from Glimmer and GeneMark and supports the location call @52392. /note=Location call: This gene appears to be a real gene, and the most likely start site is the original (auto-annotated) one at 52392. This start site is highly conserved, is called 100% (4/4) of the time, and is agreed upon by Glimmer, GeneMark, and Starterator. /note=Function call: This sequence has no known function. PhagesDB BLAST and NCBI BLAST results all list either no known function (NKF) or hypothetical protein (predicted to be expressed but not verified), and all hits (except 1 outlier, phage LilMcDreamy) have low E-values (below 1e-68), high percent identity (98-100%), 100% query coverage, and very high BLAST scores that are only 0 or 4 points apart. These hits are found in other Gordonia phages in cluster DZ, which makes them very plausible, and they are also the same length. There were no hits from CDD, and the HHpred hits were too weak to support any function call (all E-values are above 70, all probabilities are below 29%, and all coverage is below 26%). 
/note=Transmembrane domains: DeepTMHMM finds no TMDs in this gene; this can neither support nor rule out a function, since no function was called from the BLAST, CDD, and HHpred hits. This gene is not a membrane protein. /note=Secondary Annotator Name: Adamian, Andrew /note=Secondary Annotator QC: I agree with the start site location call and also feel the gene may be part of an operon. I agree with the function as well, since there is not enough evidence to support a specific function call. Additionally, I agree with the transmembrane call: with no predicted TMDs, it is unlikely that this protein is a membrane protein. CDS 52769 - 52909 /gene="87" /product="gp87" /function="DNA binding protein" /locus tag="ModicumRichard_87" /note=Original Glimmer call @bp 52769 has strength 9.86; Genemark calls start at 52769 /note=SSC: 52769-52909 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.60584E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.828, -4.0178693334748665, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso] ],,QXN74302,100.0,1.60584E-25 SIF-HHPRED: SIF-Syn: DNA binding protein, upstream gene is NKF (pham 11257) and downstream gene is helix-turn-helix DNA binding domain protein. ObLaDi/Morgana/Cafasso/Aleemily: DNA binding protein, upstream NKF (pham 11257 for all), downstream is helix-turn-helix DNA binding domain protein /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation start source: Both Glimmer and GeneMark agree with start site 52769; the start codon is GTG. /note=Coding Potential: Good coding potential in the predicted ORF in both host-trained and self-trained GeneMark. The predicted start site covers all of this coding potential. /note=SD (Final) Score: The final score is best for the chosen start site 52769, as it is the highest (least negative). 
The RBS score carries less weight for this start call because the gene has a 4 bp overlap and is part of an operon, but it is the best RBS score regardless. The Z-score is good and above 2. /note=Gap/overlap: The 4 bp overlap with the upstream gene is reasonable, as the gene is part of an operon. Other start candidates have unfavorable gaps/overlaps and lower RBS/Z-scores. The chosen start has synteny with other phage genomes. /note=Phamerator: 10/31, part of Pham 11441. The pham is conserved with the other 4 non-draft phages in the same cluster: Aleemily, Cafasso, Morgana, ObLaDi. /note=Starterator: The start number called most often in the published annotations is 2; it was called in 4 of the 4 non-draft genes in the pham. Start: 2 @52769 has 4 MA's. /note=Location call: Based on the current evidence, this gene is a real gene with good coding potential, good final/Z-scores, and synteny with other phages. The auto-annotated 52769 start site seems correct based on this same evidence. Because it was also called in the other 4 non-draft phage genomes, Starterator evidence supports this start site too. /note=Function call: DNA binding protein, based on NCBI and PhagesDB BLASTp, both of which had strong identities/alignment and significant e-values. CDD and HHpred results were initially inconclusive, but comparison with MintFen_gp52, which is listed as DNA binding protein in the SEA-PHAGES approved function list, yielded an E-value of 0.00091, which is significant evidence for this function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: ANEBERE, STEPHANIE /note=Secondary Annotator QC: I agree with the start site proposed. The evidence is reasonable and suggests that this is the correct start site. I agree with the function call of this gene. I have QC'ed this gene. 
CDS 52973 - 53191 /gene="88" /product="gp88" /function="helix-turn-helix DNA binding domain" /locus tag="ModicumRichard_88" /note=Original Glimmer call @bp 52973 has strength 7.24; Genemark calls start at 53036 /note=SSC: 52973-53191 CP: yes SCS: both-gl ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.82346E-42 GAP: 63 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.927, -2.9063687850157054, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso] ],,QXN74303,100.0,1.82346E-42 SIF-HHPRED: ComR; Streptococcus, Competence, Quorum sensing, ComR, TRANSCRIPTION REGULATOR; 2.9A {Streptococcus suis (strain 05ZYH33)},,,5FD4_B,97.2222,98.0 SIF-Syn: helix-turn-helix DNA binding domain, upstream gene is DNA binding protein, downstream gene is NKF. This is the same for other DZ cluster phages Cafasso, Morgana, ObLaDi, and Aleemily. /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene but disagree on the start codon: Glimmer calls a “GTG” at 52973, and GeneMark calls an "ATG" at 53036. /note=Coding Potential: The host-trained and self-trained GeneMark give the 2nd ORF of the forward strand very high coding potential. I believe the start site at 52973 works better than 53036 because it captures alternate coding potential in self-trained GeneMark that the later start misses. /note=SD (Final) Score: -2.906 is the final score and is by far the best of the start sites with reasonable gaps. The rest have gaps of 100+ base pairs and are less reasonable. /note=Gap/overlap: The start codon at 52973 has the most reasonable, minimal gap, at only 63 base pairs, while the rest show gaps of over 100 base pairs, making them unreasonable given the coding potential they miss. /note=Phamerator: This gene is very conserved in the pham. 
All 4 other phages in the cluster share the gene length given by start site 52973. /note=Starterator: As of 11/11/24, this gene is in pham 88799. The start number called most often in the published annotations is 1; it was called in 4 of the 4 non-draft genes in the pham. This is the only site considered by the other members of the pham and cluster. The function is agreed to be a helix-turn-helix DNA-binding protein. /note=Location call: The gene is a real gene that very likely starts at the auto-annotated Glimmer start codon 52973, because of its small gap, high RBS value, and lack of reasonable alternatives. The other members of the pham and cluster also follow this start site. /note=Functional call: The function is helix-turn-helix DNA binding domain, as shown by the top 2 NCBI BLASTp hits, which give nearly identical results with this function (top hit e-value 1.8e-42), and by CDD, which shows a matching domain (HipB) with a decent e-value (9.94e-13). HHpred returns hits with several different functions, one of them being methylphosphonate synthase, but we decided to call it a helix-turn-helix DNA binding domain: that hit has 99.04% probability and an e-value of 5.4e-8, and 3 of the 5 InterPro hits also came back as helix-turn-helix, further confirming the call. /note=Transmembrane domains: DeepTMHMM predicts no TMDs; only one inside domain. /note=Secondary Annotator Name: BURKE, KAIDEN /note=Secondary Annotator QC: I wrote the entire PECAAN from Auto-annotations to location call. I agree with the location call. I agree with the function call due to clear evidence from BLAST and CDD showing helix-turn-helix DNA binding function. 
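The gp88 start choice trades gap size against ORF length. A minimal Python sketch of that arithmetic, assuming forward-strand, 1-based inclusive coordinates; the candidate list uses only the two starts named in the gp88 notes (Glimmer at 52973, GeneMark at 53036) and the gp87 stop at 52909. Ranking by absolute gap is a simplification for illustration, not how PECAAN weighs the evidence, and `summarize` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class StartCandidate:
    source: str  # which tool proposed the start
    start: int   # 1-based start coordinate


def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand ORF, 1-based inclusive coordinates."""
    return stop - start + 1


def gap(prev_stop: int, start: int) -> int:
    """Positive = gap, negative = overlap with the upstream gene."""
    return start - prev_stop - 1


def summarize(candidates, prev_stop, stop):
    """Return (source, start, length, gap) rows, smallest absolute gap first."""
    rows = [
        (c.source, c.start, orf_length(c.start, stop), gap(prev_stop, c.start))
        for c in candidates
    ]
    return sorted(rows, key=lambda row: abs(row[3]))


PREV_STOP = 52909  # stop of gp87
STOP = 53191       # stop of gp88
CANDIDATES = [StartCandidate("GeneMark", 53036), StartCandidate("Glimmer", 52973)]

for source, start, length, g in summarize(CANDIDATES, PREV_STOP, STOP):
    print(f"{source} start {start}: {length} bp ({length // 3} codons incl. stop), gap {g} bp")
```

For gp88 this gives a 63 bp gap and a 219 bp ORF for 52973 versus a 126 bp gap and a 156 bp ORF for 53036, consistent with the 63 bp gap in the record and the 100+ bp gaps described for the alternatives.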
CDS 53188 - 53622 /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="ModicumRichard_89" /note=Original Glimmer call @bp 53188 has strength 10.82; Genemark calls start at 53188 /note=SSC: 53188-53622 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_89 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 7.08079E-101 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -4.197870532213559, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_89 [Gordonia phage Cafasso] ],,QXN74304,100.0,7.08079E-101 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@53622). Both Glimmer and GeneMark call the start site at 53188, and the start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating it is a forward gene. The ORF is not the longest (435 bp compared to 483 bp), but it does have reasonable coding potential, which is found in both GeneMark Self and Host in the first frame. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.198, the best final score on PECAAN. The Z-score, 2.485, is also the highest. /note=Gap/overlap: 4 bp overlap. This matches the conventional 4 bp overlap for genes in an operon. The overlap is reasonable because it is conserved in other phages (Cafasso, Aleemily, and ObLaDi), and coding potential appears in only one frame in the overlap. /note=Phamerator: The pham number as of 10/29/2024 is 11444. The gene is conserved in phages Aleemily, Morgana, ObLaDi, and Cafasso, all in the same cluster as ModicumRichard (DZ). The function call for the gene is unknown. /note=Starterator: There are only 4 non-draft members of this pham. 4/4 non-draft members call start site 2, which corresponds to a start site of 53188 bp for ModicumRichard. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 53188 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The PhagesDB BLAST hits with e-values below 1e-6 all call unknown function (8e-78 to 6e-33), and NCBI BLAST returns two hits, both with unknown function (>96.5278% coverage). HHpred called no good hits, as all of them are blue (low probability of < 42.12%, high e-value of > 66, and low coverage of < 41.6667%), and CDD called no domain hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Laughton, Alette /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's location call and function call. CDS 53674 - 53922 /gene="90" /product="gp90" /function="hypothetical protein" /locus tag="ModicumRichard_90" /note=Original Glimmer call @bp 53674 has strength 15.84; Genemark calls start at 53674 /note=SSC: 53674-53922 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_90 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 1.04734E-51 GAP: 51 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.564, -4.609715333525589, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_90 [Gordonia phage ObLaDi]],,UXE03813,100.0,1.04734E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@53922 F) Both Glimmer and GeneMark call the gene. They agree on the same start site, which was called at 53674 bp with a start codon of ATG. /note=Coding Potential: [start site 53674] The gene has reasonable coding potential (very good coverage) within the putative ORF. The chosen start site covers all of this coding potential. The gene displays synteny with at least two other non-draft phage genomes. 
There are four PhagesDB BLAST hits to non-draft phage genomes with e-values less than 10^-6. There is coding potential predicted by Glimmer, GeneMark Self, and GeneMark Host. The gene is at least 120 bp long (249 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 53674] The final score of -4.610 is the best. The Z-score of 2.564 is the best. /note=Gap/overlap: [start site 53674] The 51 bp gap with the upstream gene is a little large but reasonable. The gene and gap are conserved in other phages (Cafasso and ObLaDi). There is no coding potential in the gap that might indicate a potential gene. The length of the gene is acceptable (249 bp). It is the longest reasonable ORF for this gene call. /note=Phamerator: Pham 11643, as of 10/27/24, is conserved in Cafasso and ObLaDi, both in the same cluster (DZ) as ModicumRichard. No function called. /note=Starterator: There is a reasonable start site choice that is conserved: start site #1 @53674 in ModicumRichard. 4/4 non-draft genes call site #1. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: [start site 53674] The gathered evidence suggests that this is a real gene, conserved in Phamerator, with good coding potential. The most likely start site is 53674, which is conserved in Starterator and covers all coding potential. /note=Function call: NKF. No program returned any informative results. PhagesDB BLASTp: function unknown. NCBI BLASTp: hypothetical protein. CDD: no hits. HHpred: no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Ea, Emily /note=Secondary Annotator QC: I agree with this location call, as it is highly conserved and called by annotators. 
I agree with this function call as well, since no database provides evidence of a known function with sufficient statistics. CDS 53919 - 54545 /gene="91" /product="gp91" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="ModicumRichard_91" /note=Original Glimmer call @bp 53919 has strength 12.94; Genemark calls start at 53919 /note=SSC: 53919-54545 CP: yes SCS: both ST: SS BLAST-Start: [MazG-like nucleotide pyrophosphohydrolase [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 8.88928E-149 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.478, -6.085678266381749, no F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[MazG-like nucleotide pyrophosphohydrolase [Gordonia phage ObLaDi]],,UXE03814,99.5192,8.88928E-149 SIF-HHPRED: putative NTP pyrophosphohydrolase; Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-2, HYDROLASE; HET: MSE; 1.78A {Exiguobacterium sibiricum 255-15},,,3NL9_A,97.1154,99.9 SIF-Syn: MazG-like nucleotide pyrophosphohydrolase, upstream gene is NKF, downstream gene is DnaQ-like (DNA polymerase III subunit). This is the same for other DZ cluster phages Cafasso, Morgana, ObLaDi, and Aleemily. /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation start source: Both Glimmer and GeneMark call the start at 53919 bp with codon GTG. /note=Coding Potential: High coding potential is found on the forward strand in both host-trained and self-trained GeneMark. 
In host-trained GeneMark, high coding potential spans from the agreed start site (53919) to the stop site (54545), and in self-trained GeneMark the typical and atypical coding potential are also high, with only an insignificant amount of typical coding potential on the reverse strand. /note=SD (Final) Score: -6.086. Although this is not the highest final score, this start has the 2nd-best Z-score (1.478) and a gap of -4 bp (a 4 bp overlap), and it is the only start site that includes all of the coding potential. /note=Gap/overlap: -4 bp, which is a 4 bp overlap, so the gene is most likely part of an operon, consistent with the general arrangement of phage genomes. /note=Phamerator: The pham number is 192632 (run on 10/29/24). It is conserved, found in 51 genes total (across different clusters), including 4 in other DZ phages (ObLaDi, Morgana, Cafasso, and Aleemily). /note=Starterator: Start site 19 was the most frequently manually annotated start, called in 31 of the non-draft genes, but it is not present in ModicumRichard's track. Because of this, auto-annotated start site 7 (53919) was kept, since both Glimmer and GeneMark call it and it is supported by the gap and final/Z-score evidence. /note=Location call: Based on the evidence, this is a real gene with a start at 53919. Starterator is uninformative here, so the final decision was based on all the other evidence above. /note=Function call: MazG-like nucleotide pyrophosphohydrolase. The top 4 BLAST hits from PhagesDB (ObLaDi, Aleemily, Cafasso, Morgana) call this function and have very good e-values, so there is sufficient evidence to call this function rather than NKF. 
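Several function calls in these notes treat BLAST hits as strong when their e-value is below 10^-6 (see the gp89 and gp90 notes). A minimal Python sketch of that comparison, assuming e-values are available as strings in the same scientific notation used in this record; the cutoff is the one quoted in these notes, and `is_significant` is a hypothetical helper, not part of PECAAN or BLAST.

```python
# Cutoff quoted in the gp89/gp90 notes: a BLAST hit counts as strong below 1e-6.
E_VALUE_CUTOFF = 1e-6


def is_significant(e_value: str, cutoff: float = E_VALUE_CUTOFF) -> bool:
    """Return True if an e-value string such as '8.88928E-149' is below the cutoff."""
    return float(e_value) < cutoff


# E-values copied from this record.
hits = {
    "gp91 top NCBI hit (ObLaDi)": "8.88928E-149",
    "gp89 PhagesDB hits, strongest": "8e-78",
    "gp89 PhagesDB hits, weakest": "6e-33",
    "gp89 HHpred hits, lower bound": "66",
}

for label, e_value in hits.items():
    print(f"{label}: {e_value} -> significant: {is_significant(e_value)}")
```

Smaller e-values are stronger, so "below the cutoff" is the direction that counts as evidence; the HHpred values above 66 quoted for gp89 fall far outside it.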
CDS 56195 - 56431 /gene="96" /product="gp96" /function="hypothetical protein" /locus tag="ModicumRichard_96" /note=Original Glimmer call @bp 56195 has strength 15.25; Genemark calls start at 56195 /note=SSC: 56195-56431 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_95 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 6.25155E-33 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.089, -4.482213237254482, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_95 [Gordonia phage Aleemily]],,UVK59835,86.25,6.25155E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Su, Erin /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree on a start site of 56195. /note=Coding Potential: This gene has coding potential on the forward strand in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: The final score is -4.482 with a Z-score of 2.089, which is favorable. This is also the best score on PECAAN. This start also uses ATG, the more common start codon, and gives the LORF. /note=Gap/overlap: The overlap is 1 bp, indicating that the gene could be in an operon. The other start site options have gaps of 50+ bp, which are not favorable. /note=Phamerator: The pham number as of 10/29/2024 is 11746. The gene is conserved in each of the four other phages (ObLaDi, Cafasso, Aleemily, and Morgana) in Cluster DZ. /note=Starterator: The four other non-draft genes in this pham belong to the four other phages in Cluster DZ, the same cluster as ModicumRichard. Each of the four non-draft genes calls start site 1, which corresponds to 56195 for ModicumRichard and matches the auto-annotation. The Starterator analysis was last run on 10/18/2024 on database version 578. /note=Location call: Due to the above evidence, this is a real gene and the start site is most likely 56195. /note=Function call: Unknown function. 
Each of the top 4 PhagesDB BLAST hits (E-values less than 10^-28 each) has no known function. Each of the top 4 NCBI BLAST hits (identity >70%, 100% coverage each, E-values below 10^-30) is a hypothetical protein. There were no CDD hits. There are no significant hits on HHpred: the top E-values were above 10^1, and the highest probability is less than 60%. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains. This is not a transmembrane protein. /note=Secondary Annotator Name: Torosian, Isabella /note=Secondary Annotator QC: I agree with the primary annotator's location call. There is strong evidence to support start site 56195. I also agree with the function call. CDS 56428 - 57018 /gene="97" /product="gp97" /function="metallophosphatase" /locus tag="ModicumRichard_97" /note=Original Glimmer call @bp 56428 has strength 13.76; Genemark calls start at 56428 /note=SSC: 56428-57018 CP: yes SCS: both ST: SS BLAST-Start: [metallophosphoesterase [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.31931E-136 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.663, -3.368377069717189, yes F: metallophosphatase SIF-BLAST: ,,[metallophosphoesterase [Gordonia phage Cafasso] ],,QXN74313,98.4694,1.31931E-136 SIF-HHPRED: d.159.1.8 (A:) Hypothetical protein aq_1666 {Aquifex aeolicus [TaxId: 63363]},,,d1xm7a_,94.898,99.8 SIF-Syn: Metallophosphatase, downstream gene is NKF, upstream gene is NKF, just like in phages Morgana, ObLaDi, Aleemily, and Cafasso. /note=Primary Annotator Name: Baugh, Alex /note=Auto-annotation: Gene (stop@57018 F) Both Glimmer and GeneMark call the start site at 56428. /note=Coding Potential: There is coding potential only on the forward strand, showing that this is a forward-strand gene. There is coding potential in both GeneMark Self and Host. /note=SD (Final) Score: -3.368. The Z-score is 2.663, and both are the highest. /note=Gap/overlap: -4 bp (a 4 bp overlap). This overlap is smaller than 7 bp, and a 4 bp overlap is ideal. 
This could mean the gene is part of an operon. /note=Phamerator: On 10/29/24, the gene is in pham 192373. The gene is conserved in cluster DZ and is found in 4 other phages: Aleemily, Cafasso, Morgana, and ObLaDi. /note=Starterator: There are 136 non-draft members in this pham. 36 of the 136 non-draft members call start site 58, but ModicumRichard has start site 59, which 32 of the 136 non-draft members call. Start site 59 corresponds to 56428 bp in ModicumRichard. /note=Location call: With the above evidence, this is a real gene and the start site is called at 56428. Although more members of the pham call start site 58, start site 59 was chosen because it is conserved with the other phages in its cluster and matches the auto-annotation start site. /note=Function call: Metallophosphatase. Of the top 2 PhagesDB BLAST hits, one has the function metallophosphoesterase and the other metallophosphatase (which are equivalent), and the top 2 NCBI BLAST hits have the function metallophosphoesterase with E-values of 1e-136 to 3e-136. Many other hits with good scores in both PhagesDB and NCBI also have the function metallophosphoesterase. There are also strong domain alignments in CDD and HHpred that match the ORF to known metallophosphatases and metallophosphoesterases in phages. With these consistent matches across multiple tools and datasets, the function of metallophosphatase is assigned. It also meets the requirements listed on the Approved Functions List. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Lee, Kyle /note=Secondary Annotator QC: I agree with this annotation, including the location and function call, although it would help to add a little more detail to your PECAAN notes. For example, please mention CDD and HHpred findings, whether they contributed to your function call or not. 
Also, you need to update your PECAAN notes on PECAAN, add a function call, and add a synteny call. If you used HHpred and CDD as evidence, check the boxes on PECAAN as well. Also, you still need to fill out the transmembrane domain portion of the PECAAN notes. Lastly, metallophosphoesterase is not a SEA-PHAGES approved function, so update it to metallophosphatase. CDS 57019 - 57315 /gene="98" /product="gp98" /function="hypothetical protein" /locus tag="ModicumRichard_98" /note=Original Glimmer call @bp 57019 has strength 18.87; Genemark calls start at 57019 /note=SSC: 57019-57315 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_105 [Gordonia phage Morgana]],,NCBI, q1:s1 97.9592% 6.30395E-61 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.744, -3.2017655760970833, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_105 [Gordonia phage Morgana]],,XAO35539,97.9592,6.30395E-61 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Iasi, Matilde /note=Auto-annotation: Glimmer and GeneMark both call the start at 57019, with a GTG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.202, which is the best final score on PECAAN, and this start provides the LORF. /note=Gap/overlap: It has a perfect gap of 0, which indicates continuity with neighboring genes. /note=Phamerator: Pham 10845. Date 10/18/24. It is conserved; found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: Start site 1 was manually annotated in 5/5 non-draft genes in this pham. Start 1 is at position 57019 in the ModicumRichard draft, and it agrees with both the GeneMark and Glimmer auto-annotated start sites. /note=Location call: Based on the evidence, this is a real gene with the start at 57019. /note=Function call: unknown function. 
The top 4 PhagesDB BLAST results have an unknown function (E-value 3e-49 for the top one and 2e-47 for the next one). All 3 NCBI BLAST hits also have an unknown function (97% coverage, 100% identity, and E-value 6e-61). CDD and HHpred searches were not informative, as neither found any hits. Therefore, the protein has no known function. /note=Transmembrane domains: No TMDs detected /note=Secondary Annotator Name: Khuu, Jacob /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I agree with the functional call given the evidence; the top hits on PhagesDB call for an unknown function, the HHpred and CDD hits were uninformative, and NCBI BLAST yielded hypothetical protein hits. CDS 57312 - 57572 /gene="99" /product="gp99" /function="hypothetical protein" /locus tag="ModicumRichard_99" /note=Original Glimmer call @bp 57312 has strength 8.52; Genemark calls start at 57312 /note=SSC: 57312-57572 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_106 [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 1.47704E-54 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.43, -5.893810279580483, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_106 [Gordonia phage Morgana]],,XAO35540,98.8372,1.47704E-54 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mathew, Taniya /note=Auto-annotation: Gene 3 (stop@57572 F), Glimmer and GeneMark both indicate the start site is 57312. /note=Coding Potential: Since the coding potential for this gene's ORF is only on the forward strand, this is a forward gene, and the chosen start site contains all of the coding potential. /note=SD (Final) Score: The SD final score for this gene is -5.894, which is the second best final score on PECAAN. The best score (-5.778) is only slightly higher and belongs to a start with a much larger gap, so the chosen start is still preferred. /note=Gap/overlap: The gap/overlap is -4, which means there is a minor overlap. 
Since the overlap is not larger than 7 bp, it is acceptable. An overlap of 4 bp might indicate that the gene is part of an operon. /note=Phamerator: Pham 11207. Date 10/29/24. There is only one cluster represented in this pham, and the gene is conserved in phages Aleemily, Cafasso, and ObLaDi, which are part of DZ, just like ModicumRichard. /note=Starterator: The start site was called at 57312 by both auto-annotation and Starterator, and my gene contains this most-annotated start site. /note=Location call: This gene is most likely a real gene and the start site is called at 57312. /note=Function call: There is not enough information to confidently call a function for this gene. The BLASTp hits on PhagesDB and NCBI came back as “unknown function” and had good values (E-value <10^-44). HHpred and CDD hits were not quite statistically significant and came back with differing results. Due to these results, I cannot confidently conclude the gene’s function. /note=Transmembrane domains: There are no TMDs detected within this gene. /note=Secondary Annotator Name: Xu, Sasha /note=Secondary Annotator QC: I agree with the function call. For PhagesDB BLAST results, be careful citing draft genomes as evidence. Additionally, for the HHpred results, should they be checked as evidence if the results are not significant? CDS 57569 - 57982 /gene="100" /product="gp100" /function="hypothetical protein" /locus tag="ModicumRichard_100" /note=Original Glimmer call @bp 57569 has strength 15.74; Genemark calls start at 57569 /note=SSC: 57569-57982 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_101 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.36305E-83 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.007, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_101 [Gordonia phage Cafasso]],,QXN74316,92.7007,5.36305E-83 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Reynolds, Noah /note=Auto-annotation: Gene (stop@57982 F). 
Both GeneMark and Glimmer gave the start site as 57569, with a Glimmer score of 15.74. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating it is a forward gene. /note=SD (Final) Score: -2.742, which is the best final score on PECAAN /note=Gap/overlap: -4. This suggests the gene may be part of an operon and thus transcribed on a polycistronic mRNA. This overlap is minimal and is conserved in other phages (Cafasso), and there is no gap that could contain coding potential for a new gene. /note=Phamerator: Pham 88981, 10/29/2024. It is conserved in 4/4 of the non-draft genes, all of which are in the same cluster, DZ. It was found in Aleemily_99, Cafasso_101, Morgana_107, and ObLaDi_100. /note=Starterator: There are only 4 non-draft members of this pham. 4/4 non-draft members call start site 3, which corresponds to a start site of 57569 bp for ModicumRichard. /note=Location call: This is a real gene, and the start site is called at 57569 /note=Function call: The gene is classified as having "No Known Function" due to the lack of sufficient functional evidence, despite strong sequence conservation. PhagesDB BLAST results revealed high-confidence matches to hypothetical proteins in related Gordonia phages, such as ObLaDi_100 and Aleemily_99, with E-values of 6e-67 and 8e-67 and identities exceeding 90%. Additionally, a match to Cafasso_101 was found with a length of 137 amino acids, a score of 249, and an E-value of 2e-66 within cluster DZ (Pham 88981). NCBI BLAST similarly confirmed strong alignments, including hits to Morgana_107 (E-value of 3.008e-81, 89.78% identity), ObLaDi_100 (E-value of 8.86365e-77, 91.97% identity), and Aleemily_99 (E-value of 1.50071e-76, 91.24% identity), all annotated as hypothetical proteins from Gordonia phages. Despite this strong conservation, these matches remain functionally uncharacterized. Furthermore, both CDD and HHpred analyses produced no significant hits or relevant domains. 
HHpred was particularly uninformative, returning hits such as 4C18_A with a low probability of 31% and a very high E-value of 61, as well as similarly weak matches (e.g., d2c9wc1 with a probability of 27.6% and an E-value of 76). The gene's conserved synteny within cluster DZ further supports its validity, yet the complete lack of specific functional evidence justifies its NKF classification. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Su, Erin /note=Secondary Annotator QC: I have QC’d this location call, and I agree with the primary annotator. However, you have incorrectly listed the gap/overlap as “4” when it is actually “-4.” Be careful describing a “gap” where there may be coding potential, because this is an overlap, not a gap. You have also incorrectly stated the start site on Starterator as number “100” when it is actually “3”. I agree that the function cannot be determined. Make sure to address each piece of evidence that you've selected on PECAAN, because they should support your argument, at least mentioning the number of hits (there were 4 hits with so and so identity, or something similar). It could also help to provide example numbers for why HHpred was unhelpful (low probability, high E-value, etc.). 
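The nucleotide and protein lengths cited in these notes follow directly from the CDS coordinates. The sketch below is illustrative only: it assumes 1-based, inclusive coordinates (as in the CDS lines here) and that the last codon of each CDS is the stop codon; the function names are hypothetical. For gp100 (57569 - 57982) it gives 414 bp, or 138 codons, which is 137 amino acids once the stop codon is excluded, consistent with the 137-residue Cafasso_101 hit. For gp102 (58926 - 59072) it gives 147 bp, above the 120 bp threshold mentioned in that gene's notes.

```python
def cds_length_bp(start: int, stop: int) -> int:
    """Nucleotide length of a CDS given 1-based, inclusive coordinates."""
    return stop - start + 1


def protein_length_aa(start: int, stop: int) -> int:
    """Residues encoded by a CDS, assuming its last codon is the stop codon."""
    length = cds_length_bp(start, stop)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} bp is not a multiple of 3")
    return length // 3 - 1


# gp100 (CDS 57569 - 57982) and gp102 (CDS 58926 - 59072)
print(cds_length_bp(57569, 57982), protein_length_aa(57569, 57982))  # 414 137
print(cds_length_bp(58926, 59072), protein_length_aa(58926, 59072))  # 147 48
```

Raising an error for lengths that are not a multiple of 3 is a design choice: such a length usually means a coordinate was copied incorrectly.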
CDS 57979 - 58929 /gene="101" /product="gp101" /function="ribonucleotide reductase" /locus tag="ModicumRichard_101" /note=Original Glimmer call @bp 57979 has strength 15.88; Genemark calls start at 57979 /note=SSC: 57979-58929 CP: no SCS: both ST: NI BLAST-Start: [ribonucleotide reductase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.393, -4.2111970702621555, no F: ribonucleotide reductase SIF-BLAST: ,,[ribonucleotide reductase [Gordonia phage Cafasso]],,QXN74317,100.0,0.0 SIF-HHPRED: Ribonucleoside-diphosphate reductase subunit beta; OXIDATION-REDUCTION, FLAVIN MONONUCLEOTIDE, MANGANESE, OXIDOREDUCTASE; HET: MN; 2.65A {Streptococcus sanguinis} SCOP: a.25.1.0,,,4N83_C,99.3671,100.0 SIF-Syn: ribonucleotide reductase, upstream gene is unknown (pham 88981) and downstream gene is unknown (pham 8108), just like in phage ObLaDi. /note=Primary Annotator Name: Bonver, Sarah /note=Auto-annotation: Both Glimmer and GeneMark call the start at 57979 /note=Coding Potential: The start and stop sites of this gene capture all of the coding potential in Host-trained GeneMark and the red dotted line in Self-trained GeneMark, but the black line in Self-trained GeneMark isn’t perfectly aligned /note=SD (Final) Score: The final score is the fourth least negative, with a z-score of 2.393. There are alternative start sites with less negative (better) final scores and good Z-scores, but these have bigger gaps /note=Gap/overlap: There is a -4 bp overlap, which suggests it may be part of an operon. /note=Phamerator: Pham 191282. Date 11/06/2024. It is conserved; found in Aleemily (DZ), Cafasso (DZ), ObLaDi (DZ), Morgana (DZ) /note=Starterator: Start #32 @57979 has 6 MAs: 4 MAs in cluster DZ and 2 in cluster CQ2. Found in 8/68 genes in the pham. /note=Location call: The site that GeneMark and Glimmer called is the most likely start site. There is good evidence for it being a real gene because of the E-value, length, and coding potential. 
Also, the -4 bp overlap suggests it’s part of an operon. “GTG” is a commonly found start codon. /note=Function call: This gene most likely encodes the beta subunit of class Ib ribonucleotide reductase (RNR), based on strong matches to domains like RNR_1b_NrdF (TIGR04171), NrdB (COG0208), and Ribonuc_red_sm (pfam00268), which are specific to this subunit. Structural alignment with PDB 4N83 further confirms the function, along with the absence of TMDs. /note=Transmembrane domains: The absence of TMDs makes sense, since ribonucleotide reductase acts on ribonucleotides, which are found intracellularly. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with the location call and functional call. However, I believe there is a Pfam result as evidence (PF00268.26 on HHpred 11/22/24), so you should add that as evidence in PECAAN and in your notebook in Module 7. CDS 58926 - 59072 /gene="102" /product="gp102" /function="hypothetical protein" /locus tag="ModicumRichard_102" /note=Original Glimmer call @bp 58926 has strength 11.46; Genemark calls start at 58926 /note=SSC: 58926-59072 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_102 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 4.3157E-25 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.435, -3.914968956777623, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_102 [Gordonia phage ObLaDi]],,UXE03825,97.9167,4.3157E-25 SIF-HHPRED: Zn-ribbon_TFIIS; domain III/zinc ribbon domain of Transcription Factor IIS. 
TFIIS is a zinc-containing transcription factor.,,,cd13749,66.6667,76.9 SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Graham, Noah /note=Auto-annotation: Glimmer and GeneMark both call the start at 58926 (good evidence for a real gene); start codon GTG /note=Coding Potential: There is good coding potential in both Host- and Self-trained GeneMark; the coding potential covers more bp than the start site listed, which should mean that the gene is real /note=SD (Final) Score: -3.915, the best score called by PECAAN, which is reasonable but may also be less relevant (according to the lab manual) since this gene is likely part of an operon /note=Gap/overlap: -4 bp, a reasonable overlap suggesting this gene is part of an operon. The Z-score is greater than 2, and this GTG start codon creates the largest ORF and puts the gene over the 120 bp threshold for a real coding gene; the other start sites would be unreasonable, as coding potential would be missed and the gene would not be in the LORF /note=Phamerator: Pham 8108 on 10/29/24 // yes, the pham my gene belongs to is also present in members of the same DZ cluster as well as the CQ and AA clusters, using Cafasso, ObLaDi, and Aleemily as examples // no function was called by Phamerator or the phams database /note=Starterator: yes, there is a reasonable start site number, #2 @58926 bp // there are 7 other members in the pham, and all 4 that contain the start site call it /note=Location call: The gene is real, as seen by valid scores, coding potential, and reasonable length. The start site 58926 is the most reasonable, as it creates the LORF, has the smallest gap/overlap, and is conserved in an operon /note=Function call: I do not have enough data to hypothesize a function. The closest relative phages and the hits on their genes are not assigned functions; the matches have good scores showing relative similarity, but since they do not provide functions, it is likely my gene does not have a known one either. 
NCBI and PhagesDB BLASTp had good hits based on scores, but none had a known function. Both showed strong matching hits with 100% and 95% identities and strong E-values. HHpred gives weaker matches but suggests a transcription factor function, which is on the list of functions. /note=Transmembrane domains: Since the function is unknown, it’s hard to confirm whether the absence of TMDs (0 predicted) makes sense. However, it seems likely, considering that far fewer proteins interact with multiple regions of the membrane than are found inside a viral particle /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I agree on the start site, as the start at 58926 provides good scores and covers more of the coding potential. It has a reasonable gap length and gene length, and the start site is conserved in 4/7 genes in the pham. CDS 59144 - 59476 /gene="103" /product="gp103" /function="hypothetical protein" /locus tag="ModicumRichard_103" /note=Original Glimmer call @bp 59144 has strength 15.19; Genemark calls start at 59144 /note=SSC: 59144-59476 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_110 [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 4.2982E-44 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.574, -3.5676719685030167, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_110 [Gordonia phage Morgana]],,XAO35544,82.8571,4.2982E-44 SIF-HHPRED: NS6_deltaCoV; deltacoronavirus accessory protein NS6. 
This family includes the accessory protein NS6 from deltacoronaviruses such as porcine coronavirus HKU15, and several avian coronaviruses found in sparrow, pigeon, quail and falcon, among others.,,,cd21629,12.7273,40.3 SIF-Syn: /note=Primary Annotator Name: Kang, Hannah /note=Auto-annotation: Glimmer and GeneMark both call the gene and call a start at 59144 bp, with the start codon TTG. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential found in GeneMark Self and Host. The coding potential of the gene is only found in one ORF on the forward strand. /note=SD (Final) Score: -3.568. It is the best final score on PECAAN. 2.574 is the highest Z-score on PECAAN. /note=Gap/overlap: 71 bp. It is a large gap that is conserved in other phages (Cafasso, Aleemily). There is no coding potential in that gap that might indicate the presence of another gene. /note=Phamerator: Pham: 88794. DATE: 10/30/2024. The gene is conserved in phages Aleemily, Morgana, Cafasso, and ObLaDi, which are within cluster DZ, the same as ModicumRichard. /note=Starterator: Start site 1, auto-annotated in ModicumRichard, was manually annotated in 4 of the 4 non-draft genes in the pham. Start number 1 is 59144 bp in ModicumRichard, which aligns with the site predicted by Glimmer and GeneMark. /note=Location call: With the evidence above, this is a real gene and has a start site at 59144 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Function unknown. The top PhagesDB BLAST hits have unknown functions (E-Value < 10^-7). Hits with E-Values > 1 call the function as a tape measure protein, capsid maturation protease, and MuF-like fusion protein. The top NCBI BLAST hits have unknown protein functions (100% coverage, 75%+ identity, and E-Value < 10^-9). CDD and HHpred had no relevant hits. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so we can conclude this gene does not encode a membrane protein. /note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I agree with this location and function call. All of the evidence categories have been considered. Note: I believe you put NFK instead of NKF, which is why it hasn't registered as complete in PECAAN! CDS 59473 - 59886 /gene="104" /product="gp104" /function="hypothetical protein" /locus tag="ModicumRichard_104" /note=Original Glimmer call @bp 59473 has strength 13.78; Genemark calls start at 59473 /note=SSC: 59473-59886 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_105 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.81097E-92 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.663, -3.3861058366776207, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_105 [Gordonia phage Cafasso]],,QXN74320,100.0,5.81097E-92 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nuckowski, Sophie /note=Auto-annotation: Both Glimmer and GeneMark call the gene with a start at 59473. /note=Coding Potential: There is high coding potential, seen in the second reading frame on GeneMark, which starts right after the predicted start site and ends at the stop. /note=SD (Final) Score: -3.386, the least negative (highest) score among the possible starts. /note=Gap/overlap: There is a gap of -4 (a 4 bp overlap), which is acceptable and most likely indicates that the gene is part of an operon /note=Phamerator: Pham 180 /note=Starterator: Start 11 is the predicted start from auto-annotation and is likely correct, as very similar genes, including those in ObLaDi and Aleemily, have that same start. /note=Location call: The start is likely at 59473, given the small overlap indicative of an operon, the synteny present, and the high SD score. /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: I agree with this annotation. 
However, it would be a good idea to add more detail to your PECAAN notes and then fill out the drop-down menus for Starterator and GeneMark coding capacity. It would also help to add more to your Phamerator note (including the date the pham analysis was run, how many genes it is conserved in, and the clusters). Please try to get the last QC notes fixed, as well as complete Modules 6-8 in your Lab NB, as soon as possible so you can fill out these PECAAN notes and determine a function for your gene. After running PhagesDB BLAST, NCBI BLAST, CDD, HHpred, and DeepTMHMM analyses, I have determined the gene to have NKF. The PhagesDB BLAST produced four positive hits with ObLaDi_104, Cafasso_105, Morgana_137, and Aleemily_103. These hits exhibited strong E-values, were part of the DZ cluster, had similar scores, and demonstrated good identity. The NCBI BLAST results showed excellent matches for Aleemily and Cafasso. However, the CDD, HHpred, and DeepTMHMM analyses did not provide any significant hits or relevant domains. You should include specific values/data sets to provide good evidence. Also, when you are done, make sure to click the evidence box for the information used to make the claim of NKF. CDS 59886 - 60146 /gene="105" /product="gp105" /function="hypothetical protein" /locus tag="ModicumRichard_105" /note=Original Glimmer call @bp 59886 has strength 13.71; Genemark calls start at 59886 /note=SSC: 59886-60146 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_112 [Gordonia phage Morgana]],,NCBI, q1:s1 98.8372% 7.88297E-40 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.586, -3.8156109669835954, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_112 [Gordonia phage Morgana]],,XAO35546,89.6552,7.88297E-40 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF from pham 180, downstream gene is HNH endonuclease, just like in phage Cafasso. 
/note=Primary Annotator Name: Burgess, Brooklyn /note=Auto-annotation: Both Glimmer and GeneMark agree and call the start at 59886. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: For start site 59886, the final score is -3.816, which is the best final score on PECAAN. Also, the z-score is the highest at 2.586. /note=Gap/overlap: For start site 59886, the gap is -1 bp, indicating that it could potentially be in an operon. The other start site has a gap of 248 bp, which is not favorable. /note=Phamerator: Pham 11380 as of Nov 11, 2024. Conserved; Gene 105 displays synteny with ObLaDi (DZ), Morgana (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: As of November 11, 2024, there are only 5 non-draft members of this pham. Start site 2 in Starterator was manually annotated in 5/5 non-draft genes in this pham. This corresponds to start site @ 59886 on ModicumRichard. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all the evidence, this gene is a real gene and has a start site at 59886, which both Glimmer and GeneMark agreed on. /note=Function call: NKF; multiple PhagesDB BLAST hits have unknown function, with E-values of 1e-31 and 3e-27. HHpred and CDD both had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bonver, Sarah /note=Secondary Annotator QC: Just make sure to select an option from the Starterator drop-down. Also explain that a gap of -1 could indicate that this gene is part of an operon. 
Complete Modules 6-8 and check off evidence on PECAAN. CDS 60143 - 60772 /gene="106" /product="gp106" /function="HNH endonuclease" /locus tag="ModicumRichard_106" /note=Original Glimmer call @bp 60143 has strength 2.55; Genemark calls start at 60440 /note=SSC: 60143-60772 CP: yes SCS: both-gl ST: SS BLAST-Start: [Uncharacterised protein [Mycobacteroides abscessus subsp. massiliense] ],,NCBI, q5:s18 95.6938% 5.04907E-45 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.927, -3.6727816321281046, yes F: HNH endonuclease SIF-BLAST: ,,[Uncharacterised protein [Mycobacteroides abscessus subsp. massiliense] ],,SKU70185,51.5695,5.04907E-45 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,57.8947,97.3 SIF-Syn: /note=Primary Annotator Name: Piao, Sara /note=Auto-annotation: Both Glimmer and GeneMark call the gene, but with different starts: Glimmer calls 60143 and GeneMark calls 60440. /note=Coding Potential: There doesn’t appear to be evidence that this gene has strong coding potential, but there were some significant peaks in the self-trained GeneMark coding potential map in the second forward reading frame, which could indicate some atypical coding potential. /note=SD (Final) Score: -3.673. This is the second best final score on PECAAN. This start was still chosen because its gap/overlap was the most indicative of a start site. /note=Gap/overlap: Overlap: 4 bp. This suggests that the gene is most likely part of an operon. /note=Phamerator: Pham 107934. Date 10/30/2024. It is conserved among phages in cluster A; found in DreamCatcher (A), Abrogate (A), and Ruotula (A). /note=Starterator: Start site 5 in Starterator was manually annotated in 3/3 non-draft genes in this pham. There is no corresponding start site 5 in ModicumRichard. ModicumRichard called start site 7 instead (at 60143). This evidence supports the site predicted by Glimmer. 
ModicumRichard is the only phage in cluster DZ that has this gene, which distinguishes it from the other DZ phages. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 60143. /note=Function call: The predicted function is an HNH endonuclease, based on hits from PhagesDB BLASTp and HHpred. PhagesDB BLASTp had 2 hits with low identity (37%) but very low E-values of 2e-33. NCBI BLASTp gave 2 hits of no known function. CDD had no hits. HHpred had hits in PDB and Pfam. The PDB hit was for an HNH endonuclease with 99.26% probability, 47.89% coverage, and a low E-value of 3e-6. The Pfam hit was for a protein involved in recombination (one of the functions of an HNH endonuclease), with a 97.12% probability, a coverage of 32.53%, and a low E-value of 1.6e-3. Aligning the amino acid sequence of this gene to the SEA-PHAGES example sequence for an HNH endonuclease also gave a high probability of 95.19% and a low E-value of 4.2e-7. All of this evidence supports that the function of this gene is most likely an HNH endonuclease. Alignment with the first gene among the conserved cluster A phages found in Phamerator further confirmed that this gene is an HNH endonuclease (not synteny specifically, as this gene seems to have been acquired at a random position, but it is conserved with the other phages in cluster A). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Graham, Noah /note=Secondary Annotator QC: I agree with the primary annotator; the gap and scores suggest this is the accurate start site. I also agree with the function call. 
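The GAP values in the SSC lines can be reproduced from the CDS coordinates. This is a minimal sketch, assuming every gene here is on the forward strand and uses 1-based, inclusive coordinates: the gap between consecutive genes is the next gene's start minus the previous gene's stop minus 1, so a negative value is an overlap. The coordinate table is copied from the CDS lines in this section; the function name `upstream_gaps` is hypothetical. For gp105 and gp106 it gives 60143 - 60146 - 1 = -4, and for gp102 and gp103 it gives a 71 bp gap.

```python
# Forward-strand CDS coordinates (start, stop) for gp96-gp110, copied from
# the CDS lines in this section (1-based, inclusive).
CDS = {
    96: (56195, 56431), 97: (56428, 57018), 98: (57019, 57315),
    99: (57312, 57572), 100: (57569, 57982), 101: (57979, 58929),
    102: (58926, 59072), 103: (59144, 59476), 104: (59473, 59886),
    105: (59886, 60146), 106: (60143, 60772), 107: (60769, 61155),
    108: (61152, 61511), 109: (61508, 61732), 110: (61729, 62094),
}


def upstream_gaps(cds: dict[int, tuple[int, int]]) -> dict[int, int]:
    """Gap (positive) or overlap (negative) between each gene and the gene before it."""
    genes = sorted(cds)
    return {
        gene: cds[gene][0] - cds[prev][1] - 1
        for prev, gene in zip(genes, genes[1:])
    }


for gene, gap in upstream_gaps(CDS).items():
    print(f"gp{gene}: {gap} bp")
```

Every computed value matches the GAP field reported for gp97 through gp110 in this section, including the 0 bp gap before gp98 and the -1 bp overlap before gp105.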
CDS 60769 - 61155 /gene="107" /product="gp107" /function="HNH endonuclease" /locus tag="ModicumRichard_107" /note=Original Glimmer call @bp 60796 has strength 7.11; Genemark calls start at 60910 /note=SSC: 60769-61155 CP: no SCS: both-cs ST: SS BLAST-Start: [HNH endonuclease [Gordonia phage Morgana]],,NCBI, q4:s5 97.6562% 1.12323E-66 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.154, -4.429150775994987, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage Morgana]],,XAO35548,82.1705,1.12323E-66 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sahagun, Diego /note=Auto-annotation: Glimmer start: 60796; GeneMark start: 60910. Glimmer and GeneMark call different starts. /note=Coding Potential: Low coverage in Host, decent coverage in Self; coverage is low near the start site. /note=SD (Final) Score: -4.429, with a z-score of 2.154; not the best, but the second best. /note=Gap/overlap: 4 bp overlap; a reasonable size, not too big, consistent with an operon. /note=Phamerator: Pham 192191. The pham map with Morgana and Aleemily shows a function of HNH endonuclease, but with a different start site. /note=Starterator: (Start 129 @60769 has 15 MAs), (Start 142 @60796 has 1 MA); many other clusters; 541 non-draft members. /note=Location call: 60769, which has support from Starterator and Glimmer, and the gap size is in line. /note=Function call: HNH endonuclease. BLAST supports it, with a different cluster showing the same conservation of the gene; conserved domains also support it, and HHpred also has supporting data. /note=Transmembrane domains: 0; the absence aligns with the protein function. /note=Secondary Annotator Name: Kang, Hannah /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the first annotator. You could include the date the pham investigation was run, whether a start site is conserved among other members of the pham, which start site is the conserved one, which start it is in this phage, how many pham members there are, and how many call the conserved site. 
Also check HHpred and NCBI evidence. CDS 61152 - 61511 /gene="108" /product="gp108" /function="hypothetical protein" /locus tag="ModicumRichard_108" /note=Original Glimmer call @bp 61140 has strength 15.02; Genemark calls start at 61140 /note=SSC: 61152-61511 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_108 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.72543E-80 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.653, -3.3275678682521845, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_108 [Gordonia phage Cafasso]],,QXN74323,97.479,2.72543E-80 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Zhang, Kevin /note=Auto-annotation: Glimmer and GeneMark both call the start at 61140 /note=Coding Potential: There is good coverage of the gene in both Self- and Host-trained GeneMark. There is coding potential in only a single forward ORF; therefore, the gene is a forward gene /note=SD (Final) Score: The SD score is -3.673. It is the second best compared to start 61152. The Z-score is 2.927, the best Z-score /note=Gap/overlap: The overlap is 4 bp, so the gene is possibly part of an operon /note=Phamerator: Pham: 11836. Date: 10/29/24. The gene is conserved; found in Aleemily, Cafasso, Morgana, and ObLaDi (all DZ). /note=Starterator: Start site 9 in Starterator was manually annotated in 4 out of 4 non-draft genes in this pham. Start site 9 corresponds to 61152 in ModicumRichard, which differs from the start Glimmer and GeneMark called /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 61152. Although Glimmer and GeneMark call 61140 as the start site, that start gives too large an overlap, and Starterator shows 4 out of 4 non-draft genes calling the start at 61152. /note=Function call: hypothetical protein. The top 4 PhagesDB BLAST hits have unknown function (E-value < 3e-57). The top 4 NCBI BLAST results are hypothetical proteins (E-value < 7.32256e-71). 
No conclusive results from HHpred, as the lowest e-value is 11. No CDD results. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Nuckowski, Sophie /note=Secondary Annotator QC: I agree with this annotation, and with the decision to manually annotate the start site at 61152 rather than keep it at 61140; besides reducing the overlap, the new start site aligns with the manually annotated phages, as seen in Starterator. /note=Tertiary Annotator Name: Bouklas, Tejas /note=Tertiary Annotator QC: I agree with your function call. Please revise supporting evidence checks to only 2 pieces for this and any other primary genes. CDS 61508 - 61732 /gene="109" /product="gp109" /function="hypothetical protein" /locus tag="ModicumRichard_109" /note=Original Glimmer call @bp 61508 has strength 13.87; Genemark calls start at 61508 /note=SSC: 61508-61732 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_108 [Gordonia phage ObLaDi]],,NCBI, q1:s1 97.2973% 6.07559E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.739, -5.340269015678895, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_108 [Gordonia phage ObLaDi]],,UXE03831,100.0,6.07559E-45 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Echeverria, Alondra /note=Auto-annotation: Glimmer and GeneMark both call 61508. /note=Coding Potential: The ORF has reasonable coding potential, and the chosen start site includes all of it. /note=SD (Final) Score: The final score is the best option at -3.752, and the z-score is the highest at 2.514. /note=Gap/overlap: 41bp. Somewhat small; the gap is conserved in other phages (Aleemily, Cafasso), and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: 11059. Date 10/02/2019.
It is conserved; found in Aleemily and Cafasso. /note=Starterator: Start site 4 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 4 is 61508 in ModicumRichard. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 61508. /note=Function call: No function called. The top two BLAST results have no assigned function, so the function is currently unknown. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Burgess, Brooklyn /note=Secondary Annotator QC: I agree with the annotations. All of the evidence categories have been considered. CDS 61729 - 62094 /gene="110" /product="gp110" /function="hypothetical protein" /locus tag="ModicumRichard_110" /note=Original Glimmer call @bp 61729 has strength 13.25; Genemark calls start at 61729 /note=SSC: 61729-62094 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_109 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 1.36414E-75 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.927, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_109 [Gordonia phage ObLaDi]],,UXE03832,99.1667,1.36414E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jana, Pranava /note=Auto-annotation: Glimmer and GeneMark agree that the start site is at 61729. /note=Coding Potential: All of the coding potential of the ORF is in the first forward reading frame, with none in any other reading frame. Glimmer and GeneMark agree, with nearly identical coding potential. /note=SD (Final) Score: Highest (best) final score of -2.906, with a good z-score of 2.927. /note=Gap/overlap: -4 bp (4 bp overlap). Very reasonable, especially since this overlap is conserved in other phages (ObLaDi, Morgana, Cafasso, Aleemily). /note=Phamerator: Pham 11358 as of Nov 11, 2024.
Conserved; gene 110 displays very strong synteny with ObLaDi (DZ), Morgana (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: As of Nov 11, 2024, there are only 4 non-draft members of this pham. 3/4 non-draft members call start site 2, which corresponds to a start site at 61729 for ModicumRichard. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 61729 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Multiple phagesDB BLAST hits have no suggested function, with small e-values of 1e-60 to 2e-58, and 4/4 top NCBI BLAST hits are NKF (98.3333% probability, 100% coverage, and e-value of 1.36414e-75). HHpred and CDD have no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein; it instead predicts an intracellular “globular protein.” /note=Secondary Annotator Name: Piao, Sara /note=Secondary Annotator QC: I have QC’ed this location and function call (NKF) and agree with the first annotator. Make sure to delete the text in the synteny box since your gene is NKF. CDS 62112 - 62306 /gene="111" /product="gp111" /function="hypothetical protein" /locus tag="ModicumRichard_111" /note=Original Glimmer call @bp 62112 has strength 12.88; Genemark calls start at 62091 /note=SSC: 62112-62306 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_111 [Gordonia phage Cafasso] ],,NCBI, q1:s8 100.0% 6.46337E-32 GAP: 17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.199, -5.085091778133397, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_111 [Gordonia phage Cafasso] ],,QXN74326,80.2817,6.46337E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mastro, David /note=Auto-annotation: Glimmer calls the start at 62112. GeneMark calls it at 62091. /note=Coding Potential: Coding potential is seen across the entire range (62091-62306) in the appropriate (forward) direction only.
This is shown in both GeneMark Self and Host. /note=SD (Final) Score: Final score: -5.085 (z-score: 2.199). This is the best final score, providing strong evidence for this start site. /note=Gap/overlap: Gap is 17 bp. This is small and very reasonable. /note=Phamerator: The pham is 11195 (10/30/24). All other phages within its subcluster have a gene in this pham, giving a strong answer. /note=Starterator: The current call (62112 bp) corresponds to start site 3. The most common start site is 2 (at 62109 bp in this phage); start site 2 is called in 4/4 other phages in its subcluster, providing strong evidence for start site 2. /note=Location call: While Glimmer and GeneMark differ, 62112 is the best location call because it has the only z-score greater than 2, as well as the strongest final score. However, 62109 has strong support from the Starterator evidence, which makes this call somewhat difficult. /note=Function call: All three NCBI BLASTP hits called a protein with an unknown function (e-value < 9e-23). 5/5 of the phagesDB BLASTP hits called a protein with unknown function (e-value < 9e-23). CDD gave no hits, and HHpred’s top hit was unclear, as it did not give a clearly defined protein function (e-value = 48). For this reason, we cannot make an inference about the function of the protein and instead declare it as No Known Function (NKF). /note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains and instead predicts the protein is located inside the cell. Therefore, this is likely not a transmembrane protein. /note=Secondary Annotator Name: Sahagun, Diego /note=Secondary Annotator QC: 62112 has the best potential for being the start site, as the z-score and final score both support it. Starterator data does not refute this site but suggests another start site; however, keeping 62112 is the best choice because it has more evidence. Since only BLAST returned significant results, the function call of No Known Function has the best chance of being correct.
CDS 62299 - 62802 /gene="112" /product="gp112" /function="hypothetical protein" /locus tag="ModicumRichard_112" /note=Original Glimmer call @bp 62299 has strength 11.85; Genemark calls start at 62299 /note=SSC: 62299-62802 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_111 [Gordonia phage ObLaDi]],,NCBI, q1:s1 95.8084% 2.30568E-102 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.154, -4.411422009034556, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_111 [Gordonia phage ObLaDi]],,UXE03834,92.2156,2.30568E-102 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Namburu, Ushaswini /note=Auto-annotation: Glimmer and GeneMark both call the start at 62299. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Good coding potential is observed in GeneMark Self and Host. /note=SD (Final) Score: The final score is one of the best options at -4.411, and the z-score is 2.154, which is significant. /note=Gap/overlap: The overlap is small, at 8 bp. /note=Phamerator: Pham 10691. Date 10/29/24. It is conserved, found in Aleemily_110, Cafasso_112, ModicumRichard_112, Morgana_119, ObLaDi_111 (DZ). /note=Starterator: Start site 1 was manually annotated for 4/4 non-draft genes in this pham. Start 1 is 62299 for this gene. /note=Location call: Considering all the evidence, this gene is a real gene and has a start site at 62299. This agrees with Glimmer and GeneMark, which call the start site at the same place. /note=Function call: The top two phagesDB BLAST hits are NKF (E-value < 10^-38), and 2 out of 5 top NCBI BLAST hits are also NKF (100% coverage, 89%+ identity, and E-value < 10^-78). There were no significant hits for CDD and HHpred; the HHpred hits had low coverage and high (nonsignificant) E-values. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is likely not a membrane protein.
/note=Secondary Annotator Name: Zhang, Kevin /note=Secondary Annotator QC: The annotation and location call look good. The evidence from Glimmer, GeneMark, Starterator, and Phamerator looks good. /note=Function call: phagesDB and NCBI BLAST look good. Maybe expand on why the HHpred results were not significant. CDS 62803 - 62973 /gene="113" /product="gp113" /function="hypothetical protein" /locus tag="ModicumRichard_113" /note=Original Glimmer call @bp 62803 has strength 9.83; Genemark calls start at 62803 /note=SSC: 62803-62973 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_113 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.26119E-31 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.267, -6.166667396375102, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_113 [Gordonia phage Cafasso] ],,QXN74328,100.0,1.26119E-31 SIF-HHPRED: SIF-Syn: Protein with no known function; the upstream pham number for ObLaDi is 7332 and the downstream is 10691. Upstream for Morgana is 7332 and downstream is 11195. /note=Primary Annotator Name: Aguilar, Xeleste /note=Auto-annotation: Glimmer and GeneMark both call the start at 62803. /note=Coding Potential: Yes, the gene has coding potential within the ORF on the forward strand. GeneMark Self and Host both show the same forward-strand coding potential, with no visible discrepancies. /note=SD (Final) Score: -6.167 /note=Gap/overlap: 0 /note=Phamerator: pham: 9819, Date: 10/30/24, Conserved and found in phages Cafasso and Aleemily. /note=Starterator: Start site 2 in Starterator was manually annotated in 4/5 non-draft genes. Start site 2 is 62803. /note=Location call: Based on the evidence collected, the start site of this gene is 62803.
/note=Function call: NKF (all BLAST hits with high similarity to this gene have unknown functions, there were no NCBI hits, and all HHpred hits had probabilities under 70% with high E-values) /note=Transmembrane domains: DeepTMHMM predicted 0 TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Echeverria, Alondra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS complement (62970 - 63365) /gene="114" /product="gp114" /function="hypothetical protein" /locus tag="ModicumRichard_114" /note=Original Glimmer call @bp 63365 has strength 11.49; Genemark calls start at 63365 /note=SSC: 63365-62970 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_114 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.52554E-87 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.48, -7.77805343054184, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_114 [Gordonia phage Cafasso] ],,QXN74329,99.2366,1.52554E-87 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Jasso, Sarahi /note=Auto-annotation: Glimmer and GeneMark both called the same start site, 63365. /note=Coding Potential: Yes, both displayed good coding potential. /note=SD (Final) Score: -7.778 /note=Gap/overlap: -4 /note=Phamerator: Pham 7332. /note=Starterator: Start site 3, which was the most conserved among the members of the pham. /note=Location call: Glimmer, GeneMark, and Starterator all agree on the start site of 63365 bp. /note=Function call: No known function /note=Transmembrane domains: 0 transmembrane domains /note=Secondary Annotator Name: Jana, Pranava /note=Secondary Annotator QC: Try to include the z-score. Also specify whether the coding potential is on the forward or reverse strand. I have QC’ed this location and function call (NKF) and agree with the first annotator.
Make sure to check phagesDB BLAST and NCBI BLAST hits. CDS complement (63362 - 63607) /gene="115" /product="gp115" /function="hypothetical protein" /locus tag="ModicumRichard_115" /note=Original Glimmer call @bp 63607 has strength 7.24; Genemark calls start at 63610 /note=SSC: 63607-63362 CP: no SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_115 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.15913E-46 GAP: 66 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.927, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_115 [Gordonia phage Cafasso] ],,QXN74330,95.0617,1.15913E-46 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mathew, Mallika /note=Auto-annotation: Glimmer and GeneMark called different start sites: Glimmer called 63607, while GeneMark called 63610. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.828 for the 63607 start, which is the best final score. /note=Gap/overlap: 66 bp gap for 63607 vs. 63 bp gap for 63610, a one-codon difference between the two start sites. /note=Phamerator: Pham 109332 Date 10/30/2024. It is conserved, found in ObLaDi and Cafasso. /note=Starterator: Start site 115 (position 63607) in Starterator was manually annotated in 2/2 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer. /note=Location call: The gene is real and starts at 63607, since this site is called by Glimmer, which is more reliable than GeneMark; it has a better final score and a good z-score over 2; and it is the most annotated site, conserved in non-draft genes. /note=Function call: The BLASTp data was poorly aligned, CDD did not give any hits, and the HHpred hits were not very strong, so I would conclude that this ORF has NKF.
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mastro, David /note=Secondary Annotator QC: Mention which bp is called in the "auto-annotation" field. Function looks good; I agree with NKF due to poor hits, but could elaborate more on why it should be NKF. Evidence checked off. CDS complement (63674 - 64867) /gene="116" /product="gp116" /function="ADP-ribosyl glycohydrolase" /locus tag="ModicumRichard_116" /note=Original Glimmer call @bp 64867 has strength 14.21; Genemark calls start at 64867 /note=SSC: 64867-63674 CP: yes SCS: both ST: SS BLAST-Start: [ADP-ribosylglycohydrolase [Gordonia phage ObLaDi]],,NCBI, q1:s1 98.9924% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.759, -3.7606820078281067, no F: ADP-ribosyl glycohydrolase SIF-BLAST: ,,[ADP-ribosylglycohydrolase [Gordonia phage ObLaDi]],,UXE03838,97.2637,0.0 SIF-HHPRED: ADP-ribosyl-(Dinitrogen reductase) hydrolase; Toxin, Immunity, Type VI secretion, ADP ribosyl transferase; HET: MSE; 1.8A {Serratia proteamaculans},,,6DRE_A,84.3829,99.9 SIF-Syn: ADP-ribosyl glycohydrolase, upstream gene is NKF (pham 193785), downstream gene is NKF (pham 109332), just like in phage ObLaDi /note=Primary Annotator Name: Sridharan, Ananya /note=Auto-annotation: Glimmer and GeneMark both call the start at 64867. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.761 /note=Gap/overlap: -1 /note=Phamerator: Pham 192951 Date 10/18/2024. It is conserved; found in ObLaDi (DZ) and Cafasso (DZ). /note=Starterator: Start site 11 in Starterator was manually annotated in non-draft genes in this pham.
This evidence agrees with the site predicted by Glimmer. /note=Location call: The gene is real and starts at 64867. /note=Function call: ADP-ribosyl glycohydrolase /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Namburu, Ushaswini /note=Secondary Annotator QC: Could expand on the significance of the -1 gap. Expand on the function call and reference the BLAST and HHpred data, since you have them checked off correctly. CDS complement (64867 - 65073) /gene="117" /product="gp117" /function="hypothetical protein" /locus tag="ModicumRichard_117" /note=Original Glimmer call @bp 65073 has strength 10.29; Genemark calls start at 65073 /note=SSC: 65073-64867 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORGANA_124 [Gordonia phage Morgana]],,NCBI, q1:s1 100.0% 5.34633E-38 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.272, -5.208452476010064, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORGANA_124 [Gordonia phage Morgana]],,XAO35558,94.1176,5.34633E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carrillo, Daniel /note=Auto-annotation: Glimmer and GeneMark both call the start at 65073. /note=Coding Potential: Sufficient coding potential is found in the third ORF of the reverse strand, indicating this is a reverse gene. Both self-trained and host-trained GeneMark demonstrated adequate coding potential. /note=SD (Final) Score: The suggested final score of -5.208 is the best, and its z-score of 2.272 is the highest. /note=Gap/overlap: There is a 1 bp overlap, which suggests that this gene is part of an operon. /note=Phamerator: This gene is found in pham 193785 as of 11/12/2024. This gene is conserved in phages ObLaDi, Cafasso, Aleemily, and Morgana. /note=Starterator: Start site 58 is conserved across, and was manually annotated in, 4 other phages of the same subcluster (DZ).
Start site 58 corresponds to position 65073 in ModicumRichard. This agrees with Glimmer and GeneMark. /note=Location call: Based on the evidence, this gene is real and has a start site of 65073, agreeing with Glimmer and GeneMark. /note=Function call: Unknown function. All top hits in phagesDB BLASTp and NCBI BLASTp have no assigned function while displaying significant E-values between 2e-27 and 5e-38. Furthermore, there were no CDD hits, and the HHpred hits were likely random given their higher E-values (>0.018) and lower coverage (<45.5882%). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore the protein is not a membrane protein. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: Everything appears to be correct; please make sure to review the information above, since someone else accidentally annotated your gene. Also, please make sure that the coding potential and Starterator are selected correctly, as reflected in your own analysis. I agree with the first annotator on gene location. /note=Secondary Annotator QC 11/15: Everything looks great in the notes and PECAAN is up to date! The only thing that needs to be addressed is the synteny box below. The annotation module says the box should be left blank if the function of the gene is unknown; this is the only correction that should be made. There also appears to be nothing checked off for the BLASTs, suggesting that the PhagesDB BLAST and NCBI BLAST hits were not used as evidence. I believe that selecting Cafasso and ObLaDi for PhagesDB BLAST serves as evidence for the NKF. I agree that there is no need to select any HHpred hits.
CDS complement (65073 - 65198) /gene="118" /product="gp118" /function="hypothetical protein" /locus tag="ModicumRichard_118" /note=Original Glimmer call @bp 65198 has strength 17.18; Genemark calls start at 65198 /note=SSC: 65198-65073 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_118 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.61538E-19 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.013, -6.827750065480549, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_118 [Gordonia phage Cafasso] ],,QXN74333,100.0,1.61538E-19 SIF-HHPRED: d.159.1.6 (A:) Hypothetical protein TT1561 {Thermus thermophilus [TaxId: 274]},,,d1uf3a_,41.4634,79.8 SIF-Syn: /note=Primary Annotator Name: Gutierrez, Michelle /note=Auto-annotation: Both GeneMark and Glimmer call the start codon at 65198. /note=Coding Potential: There is coding potential on the reverse strand, and the GeneMark plots demonstrate this. /note=SD (Final) Score: -6.828 /note=Gap/overlap: -4 /note=Phamerator: The gene was found in pham 192261 on 10/29/2024. The pham has 307 members, none of which are drafts. Yes, all the members in this pham are part of the same cluster/subcluster. I used ObLaDi and Cafasso as comparison phages. /note=Starterator: The start site is reasonable and conserved across all phages in the pham. The start site called is 3. Pham number 11255 has 5 members, and 1 is a draft. Start site 3 was found in 5/5 of the genes in the pham. Start 3 in ModicumRichard is conserved. The evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 65198 bp. /note=Function call: NKF. There were no hits in CDD, and the NCBI BLAST results were all hypothetical proteins without determined functions.
PhagesDB had hits, but all were noted as unknown functions as well (very high e-values, low coverage, and low probability). This also goes for any of the phages that are in the same pham and cluster, i.e., Cafasso and ObLaDi. /note=Transmembrane domains: none. /note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Update the PECAAN notes in the notebook as well. For the location call, explain the actual start site and how the evidence supports it. Also, for Starterator, include that it had start site number 3. CDS complement (65195 - 65407) /gene="119" /product="gp119" /function="hypothetical protein" /locus tag="ModicumRichard_119" /note=Original Glimmer call @bp 65407 has strength 10.5; Genemark calls start at 65407 /note=SSC: 65407-65195 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_119 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.21704E-43 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.993, -4.820269098574683, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_119 [Gordonia phage Cafasso] ],,QXN74334,100.0,2.21704E-43 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Scheithauer, Julia /note=Auto-annotation: Both GeneMark and Glimmer agree that the gene starts at 65407. ATG is the start codon. /note=Coding Potential: Coding potential is found on the reverse strand only, suggesting this is a reverse gene. Coding potential was found in both GeneMark Self and Host. /note=SD (Final) Score: -4.82. This is not the best SD score that PECAAN reported among the start sites, but it is still reasonable. /note=Gap/overlap: -4. This suggests that it is part of an operon. The gene is 213 bp long. /note=Phamerator: Pham 11438, 10/29/2024. This pham is conserved among the other phages in cluster DZ, and I used Aleemily and Cafasso for comparison. The other phages in this pham also have NKF as the function. /note=Starterator: Yes, the start site is 65407 bp. This is the most called start site.
It is start site #3, found in 5/5 phages in the pham. It is conserved among the members of the pham and in the cluster. /note=Location call: This seems to be a real gene, given the good coding potential and the synteny observed in other finalized phages. Starterator also agrees with this start site, and it is the same as in the other phages in cluster DZ. The start site appears to be at 65407. /note=Function call: I believe that at this time, NKF is the best call because of the lack of quality hits from HHpred and CDD (very high e-values, low coverage, and low probability), and the genes at this position in other phages in PhagesDB are also NKF, with very high e-values. /note=Transmembrane domains: none. /note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I agree with the location call, as the data from Starterator and Phamerator as well as the final score support it. Add evidence for Starterator and Phamerator. I agree with the function call and supporting evidence. CDS complement (65404 - 65715) /gene="120" /product="gp120" /function="hypothetical protein" /locus tag="ModicumRichard_120" /note=Original Glimmer call @bp 65718 has strength 13.98; Genemark calls start at 65718 /note=SSC: 65715-65404 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_120 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 8.99391E-67 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.177, -3.16089827114923, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_120 [Gordonia phage Cafasso] ],,QXN74335,100.0,8.99391E-67 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Grunski, Lilly /note=Auto-annotation: Glimmer and GeneMark agree on the start site: 65718. /note=Coding Potential: Yes, both the self-trained and host-trained GeneMark showed the same coding potential across the same regions. The ORF has 1 stop codon in the expected region, and it includes all of the coding potential.
/note=SD (Final) Score: The second-best option at -3.161, with a z-score of 3.177, identical to that of the best SD score option. /note=Gap/overlap: -1, which is characteristic of an operon. /note=Phamerator: 10497 (10/29/24). It is conserved; found in Aleemy (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: Start site 2 in Starterator was manually annotated in 4/4 non-draft genes in the pham. This does not agree with the auto-annotated start site, which is Start: 1 @65718; the "Most Annotated" start is Start: 2 @65715, which has 4 MA`s. /note=Location call: 65715 (Start: 2 @65715 has 4 MA`s). /note=Function call: NKF. This amino acid sequence has no known function, as neither CDD nor HHpred returned any feasible function calls. NCBI BLAST and phagesDB BLAST also did not yield any hits with known function, only high homology with other NKF sequences from phages in the same cluster. /note=Transmembrane domains: None /note=Secondary Annotator Name: Sridharan, Ananya /note=Secondary Annotator QC: Good descriptions and results. CDS complement (65715 - 66224) /gene="121" /product="gp121" /function="hypothetical protein" /locus tag="ModicumRichard_121" /note=Original Glimmer call @bp 66224 has strength 11.58; Genemark calls start at 66200 /note=SSC: 66224-65715 CP: yes SCS: both-gl ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 9.84387E-113 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.565, -3.586741874491512, yes F: hypothetical protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso] ],,QXN74336,100.0,9.84387E-113 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: The GeneMark and Glimmer starts are different. Glimmer calls the start at 66,224, while GeneMark calls it at 66,200. The difference is 24 bp (8 codons), which is small. The start codon is ATG if the start is at 66,224 and the stop at 65,715.
/note=Coding Potential: The gene has good coding potential in the Host- and Self-trained GeneMark. In both GeneMark plots, there is a downward tick around the 65,700 region and an upward tick around the 66,200 region, corresponding to the reverse orientation of the gene. No switch or interference is observed in the forward gene section (first three rows). The start and stop encompass all of the coding potential. /note=SD (Final) Score: -3.587 (best, highest score); the z-score (2.565) is above 2, which is ideal. /note=Gap/overlap: -1 is ideal and is evidence that the gene is organized into an operon and could be transcribed as part of a polycistronic mRNA. The gene is 510 base pairs long, which is appropriate for the region the start and stop sites encompass. /note=Phamerator: On 11/2/24, the gene was found to be in pham 192211. The gene seems to be highly conserved in the CA and A clusters, in addition to 4 other phages in the DZ cluster. In the DZ cluster, all share the same start site of 49 (Aleemily_119, Cafasso_121, ObLaDi_120, and Morgana_128). The function is unknown in the DZ cluster, but is labeled as an excise protein in other cluster groups. For example, Edison31_35 in cluster A10 is labeled as an excise protein and has the same start site of 49, but a different start bp number. /note=Starterator: Starterator indicates that the most conserved start site for the gene is 48, but the DZ cluster members share start 49, which differs from the most conserved start used across other clusters. Also, the gene length is conserved throughout the DZ cluster at 510 bp. /note=Location call: Yes, my notes show that this is a real gene with good coding potential at start site 49, at 66,224 bp.
Starterator was helpful in determining the start site, showing that start site 49 is conserved in the DZ subcluster and manually annotated in 55 other phage genomes. The start and stop encompass all of the coding potential, as seen in the Host- and Self-trained GeneMark, which show evidence of the reverse orientation of the gene. No switch or interference is observed in the forward gene section (first three rows), and no genes need to be inserted or deleted around this gene. /note=Function call: The gene shows 100% identity with two genes from DZ cluster phages of Gordonia rubripertincta (Gordonia phage ObLaDi_120 and Cafasso_121), both annotated with unknown functions. These hits have e-values of 7e-92, indicating a highly significant match. The gene seems to be conserved across multiple phages in the same and different clusters, which may suggest an important role in the phage life cycle. A secondary hit to an excisionase from Mycobacterium phage UnionJack (YP_009198804.1) further supports a potential function, although the identity is lower (46%) and the e-value is much less significant (6e-23). HHpred results show the presence of transmembrane domains and signal peptides that add complexity to the function call. The gene is located near a DNA binding protein, which aligns with the typical role of excisionase proteins working with DNA binding proteins in genome recombination. Despite this, the presence of transmembrane domains complicates the prediction, and the gene remains classified as No Known Function (NKF). /note=Transmembrane domains: The DeepTMHMM analysis predicts 4 transmembrane domains (TMDs) in the gene’s protein sequence, spanning amino acids 9-24, 44-64, 69-89, and 97-116. This suggests the gene encodes a membrane-associated protein, potentially involved in interactions at the host membrane. The TMDs meet the SEA-PHAGES criteria for membrane proteins, which require at least one TMD around 17-22 amino acids in length.
BLAST results also indicate that both Cafasso_121 and ObLaDi_120, the genes with 100% identity to this gene, are predicted to be membrane proteins, which further supports a membrane association. However, this prediction conflicts with the excisionase hypothesis. The presence of TMDs suggests the gene may have a dual role, potentially involved in both genome excision and membrane-associated processes. Given the conflicting evidence from TMD prediction and the excisionase-like hits, the gene remains classified as NKF until further functional validation can clarify its role. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: Update the NB, Phamerator, and Starterator. These annotations are for the wrong gene. 11/16 update: Phamerator and Starterator. Also update your function call and assess transmembrane domains. CDS complement (66224 - 66364) /gene="122" /product="gp122" /function="DNA binding protein" /locus tag="ModicumRichard_122" /note=Original Glimmer call @bp 66364 has strength 8.63; Genemark calls start at 66328 /note=SSC: 66364-66224 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 4.41181E-23 GAP: 81 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.177, -2.394485424036831, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso] ],,QXN74337,100.0,4.41181E-23 SIF-HHPRED: Adventurous gliding motility protein GltJ; protein, motility, GYF domain, myxococcus xanthus, adventurous motility, gliding, focal adhesion complex, adhesion, regulatory domain, myxobacteria; HET: ZN; NMR {Myxococcus xanthus},,,7ZOK_A,86.9565,94.0 SIF-Syn: This gene is likely a DNA binding protein. The genes both upstream and downstream are reverse-strand NKF genes. However, there is significant synteny with the corresponding ObLaDi gene, with 96% identity and a low e-value of 4e-20.
/note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Glimmer and GeneMark call different start sites. Glimmer calls the start at 66364 and GeneMark calls it at 66328. /note=Coding Potential: This gene has coding potential in the ORF on the reverse strand only, which suggests that it is a reverse gene. The coding potential in both the Self-Trained and Host-Trained GeneMarks is high and lies entirely within the region between the suggested start and stop sites. /note=SD (Final) Score: The final score of the best start site is -2.394. This is the best start, as the other candidate start sites have Z-scores below the 2.000 threshold and lower (more negative) final scores. /note=Gap/overlap: The gap is 81 bp. /note=Phamerator: As of 11/03/2024, the pham is 11453. This pham has 5 members, one of which is a draft, and all are in the same cluster (DZ). /note=Starterator: The auto-annotated start site is site 1, which corresponds to position 66364 (the Glimmer call). This is also the most-annotated start site. /note=Location call: Given the above evidence, this gene is real and likely has a start site at 66364. /note=Function call: DNA binding protein. This call is based on the consistent, high-scoring hits from PhagesDB BLAST, HHpred, and NCBI BLAST, which all point toward this being a DNA binding protein. /note=Transmembrane domains: 0. This is consistent with the function call, since some DNA binding proteins are transmembrane while others are not. /note=Secondary Annotator Name: Guiterrez, Michelle /note=Secondary Annotator QC: It all looks good and well supported, and I think it is really cool that you were able to get a function. The evidence that really helps for me is how consistently the same function appears across the board.
11/19 CDS complement (66446 - 66841) /gene="123" /product="gp123" /function="hypothetical protein" /locus tag="ModicumRichard_123" /note=Original Glimmer call @bp 66841 has strength 19.87; Genemark calls start at 66841 /note=SSC: 66841-66446 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_123 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.53456E-89 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.067, -2.4787699911121788, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_123 [Gordonia phage Cafasso] ],,QXN74338,100.0,1.53456E-89 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Khachaturov, Allyson /note=Auto-annotation: Glimmer and GeneMark agree on the start site (66841), which supports this call. /note=Coding Potential: Between the start and the stop, GeneMark shows some regions of the gene with higher coding potential than others, as indicated by dips in the GeneMark plot. This could mean that coding potential varies along the gene or that the start site is incorrect. However, the GeneMarkS output on PhagesDB.org shows high coding potential directly in the regions where the GeneMark plot dips, indicating that the entire gene has high coding potential. This gene does not display any synteny with other phages. Several explanations are possible: this could be a gene unique to the phage, perhaps acquired from a bacterial host during its evolution. Overall, it is a real gene and does not need to be deleted. Its start codon is ATG, which PECAAN statistics indicate has a high probability of being the start. The gene shares start 26 with other genes, including those of other cluster members such as Cafasso and ObLaDi. It does not share the most annotated start, but it shares its start with many other phages within its cluster.
Additionally, this gene does not display synteny with other members of its cluster, which either raises doubts about its legitimacy as a gene or suggests a novel gene unique to this phage. /note=SD (Final) Score: -2.479. Both the SD final score and the Z-score fall within the ranges that support this start site. /note=Gap/overlap: 75 bp gap. A gap of this size suggests the start could be reconsidered; however, given the high coding potential, it does not need to be changed. /note=Phamerator: Found in pham 904. According to the pham, it is not the most conserved. It was compared with the non-draft phages ObLaDi and Cafasso. /note=Starterator: According to Starterator, ModicumRichard has a start at position 26, which is found in 5 of the 145 phages that contain this gene. All of the phages sharing this start are also the closest BLAST hits to ModicumRichard. This indicates high synteny despite the pham maps. /note=Location call: Overall, I agree with the location call at 66,841 due to the strong conservation of the start site and the other statistics discussed above. /note=Function call: This gene has no function call according to PhagesDB.org. /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: If you haven't already, I would ask the instructional team whether that is enough evidence to call the function. Maybe go to the SEA-PHAGES approved functions list, take the example phage for the function, and see if there is any synteny or HHpred match when you compare the two. Also add the bp coordinate of the start site for Starterator, as per the annotation guideline notes! /note=Based on PhagesDB and NCBI, there is not enough evidence to call the function. Both list the gene as either a hypothetical protein or a protein of unknown function. However, the top hits are consistent among the phages that have the most synteny with ModicumRichard.
In addition, the e-values displayed by both tools are low enough that this could be the correct function. /note=Function call: No known function /note=CDD: No function hit found, which is further evidence that the gene has no known function. /note=HHpred: The HHpred Pfam hit suggests that this gene could be a Mycobacterium membrane protein. The e-value (3e-16) is low enough to be plausible; however, considering the evidence from the other bioinformatic tools, I would not consider this enough evidence to assign a function to the gene. /note=TMHMM: According to TMHMM, this protein could be a signal/outside protein, as displayed in its graphs. The protein sequence generated only one hit, which reported 0 TMRs. This is not conclusive enough to justify a specific function and further supports that the function of this gene is unknown. /note=Function Call: NKF /note=Transmembrane Domains: 0 CDS complement (66917 - 67321) /gene="124" /product="gp124" /function="hypothetical protein" /locus tag="ModicumRichard_124" /note=Original Glimmer call @bp 67321 has strength 9.67; Genemark calls start at 67321 /note=SSC: 67321-66917 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_124 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.33418E-90 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.022, -5.271243305869519, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_124 [Gordonia phage Cafasso] ],,QXN74339,100.0,1.33418E-90 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: LIEU, JADELIEN /note=Auto-annotation: Glimmer and GeneMark both call 67321 as the “start” site. /note=Coding Potential: Consistent coding potential on the reverse strand in both the host-trained and self-trained GeneMarks. /note=SD (Final) Score: -5.271. It is the fifth-best final score in the PECAAN list. /note=Gap/overlap: Overlap of -4.
This indicates that the gene is likely part of an operon. /note=Phamerator: The pham number is 12194 as of 10/31/24. The gene is conserved in other phages in the pham, such as Aleemily, Cafasso, and Morgana. A possible function call is Photosystem I accessory protein E. /note=Starterator: There are 4 non-draft members of this pham. All 4 call start site 2, which corresponds to 67321 in ModicumRichard. /note=Location call: This is likely a real gene with a start site of 67321. Auto and manual annotation agree on the start site, which is supported by the evidence above. /note=Function call: The function is unknown, with no hits in CDD and no significant hits in HHpred (probabilities of 77.15 and lower, e-values of 12 and higher). /note=Transmembrane domains: DeepTMHMM predicts no TMDs; the protein is predicted as a single inside region. /note=Secondary Annotator Name: Lilly Grunski /note=Secondary Annotator QC: I agree with this annotation, as all of the relevant evidence categories have been addressed. CDS complement (67318 - 67704) /gene="125" /product="gp125" /function="hypothetical protein" /locus tag="ModicumRichard_125" /note=Original Glimmer call @bp 67704 has strength 12.23; Genemark calls start at 67704 /note=SSC: 67704-67318 CP: yes SCS: both ST: SS BLAST-Start: [DUF3846 domain-containing protein [Agrococcus casei] ],,NCBI, q2:s3 79.6875% 2.6886E-12 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.66, -5.501485936053402, no F: hypothetical protein SIF-BLAST: ,,[DUF3846 domain-containing protein [Agrococcus casei] ],,WP_159456897,46.7626,2.6886E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ornelas, Jessica /note=Auto-annotation: Glimmer and GeneMark both agreed on a start site of 67704, which has the start codon ATG. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF. The chosen start site covers all of this coding potential, per the guiding principles.
Coding potential is on the reverse strand only, showing that this is a reverse gene, and it appears in both the Self-Trained and Host-Trained GeneMarks. /note=SD (Final) Score: -5.501. This is the second-best (second least negative) final score of the 4 entries on PECAAN. However, the gene is most likely part of an operon. The Z-score of 1.66 is also not the best; only one entry has a Z-score of 2 or higher. /note=Gap/overlap: -4, an overlap of 4 bp. The gene has an acceptable length of 387 bp. /note=Phamerator: The gene is part of pham 11725 as of 10/30/24. The gene is conserved in other members of the DZ cluster; I used Aleemily_CDS_123, Cafasso_CDS_125, Morgana_CDS_123, and ObLaDi_CDS_124 for comparison. The pham database did not have a function called for this gene. /note=Starterator: Starterator shows a highly conserved start site, number 1, among the members of the pham. There are 5 members in total according to PhamDB, 1 of which (ModicumRichard) is a draft, and all 4 non-draft genes call this site. Start 1 corresponds to bp 67704 in ModicumRichard, which agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene, as it is conserved in Starterator and has good coding potential. 67704 is the most likely start site, as it is conserved in Starterator and covers all coding potential. /note=Function call: NKF. Only one of the PhagesDB BLAST hits had a function (major capsid protein), but this hit had a high e-value of 6.4 and a sequence identity of 28%, which does not meet the requirement of more than 35%. The other PhagesDB BLAST hit had a low e-value of 9e-68 and a sequence identity of 100%, but its function is unknown. Almost all of the NCBI BLASTp hits were domain-containing proteins.
These hits had high query coverage (>79%) and low E-values of 4e-13 and 3e-12. Their sequence identity was 36.59% or higher, meeting the requirement of more than 35%. There were no significant CDD or HHpred hits; the only hits were DUFs (domains of unknown function). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Gonzalez Hernandez, Yalit /note=Secondary Annotator QC: I agree with the call to remain NKF, since there are no significant BLAST, CDD, or HHpred results from which to conclude a function. /note=Tertiary Annotator Name: Bouklas, Tejas /note=Tertiary Annotator QC: Good work, Jessica. You may move on to the next module. Minor comment: when saying that something is not the best score (e.g., final score), say that it is the "x" best score out of "y". CDS complement (67701 - 67904) /gene="126" /product="gp126" /function="hypothetical protein" /locus tag="ModicumRichard_126" /note=Original Glimmer call @bp 67904 has strength 12.27; Genemark calls start at 67904 /note=SSC: 67904-67701 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_124 [Gordonia phage Aleemily]],,NCBI, q1:s20 100.0% 1.91261E-40 GAP: 167 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.166, -2.356391881054909, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_124 [Gordonia phage Aleemily]],,UVK59864,77.907,1.91261E-40 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: McFarlane, Kelly /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 67904. /note=Coding Potential: Coding potential in both the Self-Trained and Host-Trained GeneMarks is high, only in the reverse direction, and is entirely covered by the suggested start site. /note=SD (Final) Score: The final score for the 67904 start site is -2.356 and the Z-score is 3.166, which are the best values presented on PECAAN. /note=Gap/overlap: The gap between this gene and the previous gene is 167 bp.
This gap is large, but it occurs because this gene is reverse and the previous gene is forward. A similar gap is also conserved among other non-draft phages. /note=Phamerator: As of 10/30/24, the pham is 10543. This pham has 5 members and is found in 4 other non-draft phages in the same cluster (DZ). /note=Starterator: The auto-annotated start site is 11, which corresponds to position 67904 (same as Glimmer and GeneMark). This is the most annotated start site, found in 3 other non-draft phages. /note=Location call: Based on this evidence, this is a real gene and the start site is most likely 67904. /note=Function call: NKF. PhagesDB has three hits with unknown functions and e-values of 3e-32, and NCBI has three hypothetical protein hits with e-values less than 10^-37. There are no hits in CDD, and none of the top HHpred hits has an e-value below 1. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane domains in this gene, with 100% probability, meaning this is not a membrane protein. /note=Secondary Annotator Name: Pimentel, Patricia /note=Secondary Annotator QC: Given the limited evidence generated from CDD, HHpred, and NCBI, the NKF function call is reasonable. Great notes regardless! CDS 68072 - 68473 /gene="127" /product="gp127" /function="hypothetical protein" /locus tag="ModicumRichard_127" /note=Original Glimmer call @bp 68072 has strength 11.69; Genemark calls start at 68072 /note=SSC: 68072-68473 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_125 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 7.8719E-92 GAP: 167 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.067, -2.5410833118725082, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_125 [Gordonia phage Aleemily]],,UVK59865,100.0,7.8719E-92 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lawson, Matthew /note=Auto-annotation: Glimmer and GeneMark both call the start position at 68,072.
/note=Coding Potential: Coding potential is found only on the forward strand in both the host-trained and self-trained GeneMarks, indicating that this is a forward gene. /note=SD (Final) Score: -2.541. This is the best (least negative) final score of any proposed start site, making this site more likely. /note=Gap/overlap: 167 bp, which is large but still reasonable, because the gap is conserved in other phages (Aleemily) and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: Pham 10857 as of 10/29/24. It is conserved; found in Aleemily (DZ), Cafasso (DZ), Morgana (DZ), and ObLaDi (DZ). /note=Starterator: Start site 2 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 2 is 68,072 in ModicumRichard. This evidence agrees with the site predicted by GeneMark and Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 68,072. /note=Function call: The function cannot be called yet, as all good hits are of unknown function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Khachaturov, Allyson /note=Secondary Annotator QC: Overall, the data reasonably support your conclusion. You addressed the large gap and showed that it is consistent with other phages in ModicumRichard's cluster, which shows synteny. I would elaborate a little more on the SD score, since you did not explain whether it is plausible and how it contributes to your conclusion. Adding these points will strengthen the evidence for your conclusion. /note=Function Call QC: Overall, based on the hits you received and the evidence you collected, I agree with the function call: the hits were all consistent with the gene not having a known function.
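The GAP values in the SSC lines can be reproduced from the CDS coordinates. The following is a hypothetical Python helper, not part of PECAAN. It assumes that the reported gap is measured to the neighbouring gene on the 5' (upstream) side: the previous gene in coordinate order for forward genes, and the next one for complement genes. That convention is inferred from the coordinates in these notes rather than taken from PECAAN documentation.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    """A CDS with its lowest and highest coordinates and its strand."""
    name: str
    left: int    # lowest coordinate, regardless of strand
    right: int   # highest coordinate, regardless of strand
    strand: str  # "+" for forward, "-" for complement


# Coordinates copied from the CDS lines for genes 122-127, in coordinate order.
GENES = [
    Feature("122", 66224, 66364, "-"),
    Feature("123", 66446, 66841, "-"),
    Feature("124", 66917, 67321, "-"),
    Feature("125", 67318, 67704, "-"),
    Feature("126", 67701, 67904, "-"),
    Feature("127", 68072, 68473, "+"),
]


def spacing(left_gene: Feature, right_gene: Feature) -> int:
    """Bases between two neighbouring features; negative values mean overlap."""
    return right_gene.left - left_gene.right - 1


def upstream_gap(genes: List[Feature], i: int) -> Optional[int]:
    """Gap between gene i and its neighbour on the 5' (upstream) side.

    Forward genes are compared with the previous feature in coordinate order,
    complement genes with the next one. Returns None when there is no neighbour.
    """
    gene = genes[i]
    if gene.strand == "+":
        return spacing(genes[i - 1], gene) if i > 0 else None
    return spacing(gene, genes[i + 1]) if i + 1 < len(genes) else None


def length_bp(gene: Feature) -> int:
    """Inclusive length of the feature in base pairs."""
    return gene.right - gene.left + 1


for i, gene in enumerate(GENES):
    print(f"gene {gene.name} ({gene.strand}): length {length_bp(gene)} bp, "
          f"gap {upstream_gap(GENES, i)} bp")
```

For genes 122-127, the sketch gives gaps of 81, 75, -4, -4, 167, and 167 bp, which match the GAP fields in the SSC lines. It also gives a length of 387 bp for gene 125.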
CDS 68470 - 68796 /gene="128" /product="gp128" /function="hypothetical protein" /locus tag="ModicumRichard_128" /note=Original Glimmer call @bp 68470 has strength 12.13; Genemark calls start at 68470 /note=SSC: 68470-68796 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_128 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 3.38932E-73 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.905314844093283, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_128 [Gordonia phage Cafasso] ],,QXN74343,100.0,3.38932E-73 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Muff, Margaret /note=Auto-annotation: Glimmer and GeneMark both called the gene and agree on the start site at 68470 bp. /note=Coding Potential: Coding potential is on the forward strand; both the Self-Trained and Host-Trained GeneMarks show it, so this is likely a real, forward gene. /note=SD (Final) Score: -4.905; the least negative (best) final score of the possible start sites. /note=Gap/overlap: -4 bp; a common overlap indicating an operon, also conserved in ObLaDi and Aleemily. /note=Phamerator: Pham 12239 as of 10/30/2024. This gene is conserved in Morgana, Cafasso, Aleemily, and ObLaDi (DZ). /note=Starterator: Start site 6 is conserved in four of four finalized phage genomes. Start site 6 is at 68470 bp in ModicumRichard, and this is the start that both Glimmer and GeneMark called. /note=Location call: Given the above evidence, the gene is real and most likely starts at 68470 bp. /note=Function call: NKF; there are no clear functions called in HHpred, CDD, or PhagesDB BLAST (no good hits). /note=Transmembrane domains: 0 TMDs. According to DeepTMHMM, the protein does not cross the membrane; the entire protein is predicted to lie on the inside (cytoplasmic) side. This does not help identify a function. /note=Secondary Annotator Name: Lieu, Jadelien /note=Secondary Annotator QC: 1) Glimmer and GeneMark agree on the start site.
Coding potential lines up in the host- and self-trained GeneMarks. 2) SS on Starterator; conserved in other pham members and displays synteny. 3) ATG start codon and -4 overlap. 4) The Z-score and final score (1.352 and -6.053, respectively) are not the best, but all other evidence supports the call. 5) Function call, evidence boxes, and drop-down menus quality check completed. Overall: I agree with the primary annotation. CDS 68796 - 68990 /gene="129" /product="gp129" /function="hypothetical protein" /locus tag="ModicumRichard_129" /note=Original Glimmer call @bp 68793 has strength 8.69; Genemark calls start at 68796 /note=SSC: 68796-68990 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_129 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.22929E-37 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.171, -4.455662434380964, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_129 [Gordonia phage Cafasso] ],,QXN74344,100.0,1.22929E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Choudhary, Khushi /note=Auto-annotation: Glimmer calls the start site at 68793 and GeneMark calls it at 68796. /note=Coding Potential: There is some overlap between frames in the coding potential; coding potential starts in the middle of the gene's ORF. /note=SD (Final) Score: Z-score of 2.171 and final score of -4.395. The Z-score is strong because it is above 2, while the other candidate starts have Z-scores below 2 and final scores more negative than -4.395. /note=Gap/overlap: An overlap of 1 bp is minor and reasonable. The length of 195 bp is also reasonable, given the 1 bp overlap with the previous gene. /note=Phamerator: The gene is in pham 88916 (run date 18 October 2024). It is conserved in this phage and in ObLaDi, Cafasso, Aleemily, and Morgana, all of which are in the DZ cluster. /note=Starterator: Start site 6 in Starterator was manually annotated in 4/4 non-draft genes in the pham.
The start site for this gene is 68796 (site 6), which is conserved in other genes from the cluster; it also has appropriate Z and final scores and covers the coding potential. GeneMark is consistent, calling 68796. /note=Location call: The evidence suggests this is a real gene, and 68796 is the most likely start site, as it is conserved in Starterator and covers all coding potential. /note=Function call: The predicted function is unknown. NCBI BLASTp and PhagesDB BLASTp each had 2 hits with high query coverage, high % identity, and low E-values (0), but no function is known for these genes. HHpred and CDD gave no informative results on function (there are hits, but they are not significant because their E-values are high). /note=Transmembrane domains: The number of predicted TMRs was 0. /note=Secondary Annotator Name: Ornelas, Jessica /note=Secondary Annotator QC: I have QC'ed this location call and this function call, and I agree with the first annotator on both. CDS 68968 - 69342 /gene="130" /product="gp130" /function="hypothetical protein" /locus tag="ModicumRichard_130" /note=Original Glimmer call @bp 68968 has strength 15.07; Genemark calls start at 68968 /note=SSC: 68968-69342 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_128 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 2.10945E-82 GAP: -23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.939, -2.8030814177649614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_128 [Gordonia phage Aleemily] ],,UVK59868,100.0,2.10945E-82 SIF-HHPRED: SIF-Syn: N/A /note=Primary Annotator Name: Adamian, Andrew /note=Auto-annotation: Glimmer and GeneMark both call, and agree on, a start at 68968.
The Glimmer score is 15.07. /note=Coding Potential: The gene has decent coding potential, with a sharp upward tick and a steep downward tick at the predicted start and stop sites. /note=SD (Final) Score: The final score is -2.803, which is reasonable; however, there are no other candidate starts to compare it with. /note=Gap/overlap: There is a 23 bp overlap, which is somewhat larger than expected. /note=Phamerator: The gene is in pham 12019 as of 10/31/2024. It is somewhat conserved within other members of the pham, with a similar location in Cafasso but different locations (bp coordinates) in ObLaDi, Morgana, and Aleemily. /note=Starterator: The start site at 68968 is reasonable, as it is conserved among other members of the pham. It corresponds to start 1, the most called start site in the published annotations, with 4 MAs. There are 4 non-draft members of this pham, and they all call this start site. /note=Location call: The evidence from PECAAN, GeneMark, and Starterator all points to a start site of 68968. This gene appears to be real, despite the slightly larger overlap. /note=Function call: There does not seem to be a clear function for this gene, as BLASTp, HHpred, and CDD did not give definitive function calls. PhagesDB BLASTp returns hits for ObLaDi and Aleemily (both NKF, score 247, e-value 7e-66). NCBI BLASTp returns hits for Aleemily as well (score 248, e-value 2e-82, 124/124 identities) and for Cafasso (score 245, e-value 2e-81, 123/124 identities), but both are hypothetical proteins. /note=Transmembrane domains: There are no transmembrane domains. /note=Secondary Annotator Name: McFarlane, Kelly /note=Secondary Annotator QC: I agree with the called start site and the unknown function call. The function call section of the PECAAN notes could be more detailed and include specific numbers (e-values/probabilities).
Also, you should uncheck the selected HHpred hit, since it has a high e-value and low probability (it doesn’t meet the requirements). I am not sure whether CDD hits should be checked when the call is NKF; check with the instructional team. The Transmembrane domain PECAAN notes should state that there are no transmembrane domains. CDS 69425 - 69940 /gene="131" /product="gp131" /function="RuvC-like resolvase" /locus tag="ModicumRichard_131" /note=Original Glimmer call @bp 69479 has strength 10.74; Genemark calls start at 69425 /note=SSC: 69425-69940 CP: no SCS: both-gm ST: NI BLAST-Start: [RuvC-like resolvase [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.41347E-118 GAP: 82 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.161, -6.46371945197481, no F: RuvC-like resolvase SIF-BLAST: ,,[RuvC-like resolvase [Gordonia phage Cafasso] ],,QXN74346,100.0,2.41347E-118 SIF-HHPRED: c.55.3.6 (A:) RuvC resolvase {Escherichia coli [TaxId: 562]},,,d1hjra_,69.5906,99.7 SIF-Syn: This is a RuvC-like resolvase. The gene upstream is NKF and the gene downstream is NKF. /note=Primary Annotator Name: Anebere, Stephanie /note=Auto-annotation: Glimmer and GeneMark do not agree on the start of this gene. /note=Coding Potential: No coding potential is shown for this gene. /note=SD (Final) Score: The best final score is -7.016 and the best Z-score is 1.951, which is less than 2. /note=Gap/overlap: The 82 bp gap is not unreasonable, and it is unlikely that another gene could fill the space. The other candidate starts gave unfavorably large gaps, larger than the gene itself. /note=Phamerator: Pham 9549 (11/02/24). 7 members, 1 of which is a draft: Aleemily, Cafasso, Morgana, ObLaDi, VanLee. /note=Starterator: Pham 9549 (11/02/24). The most called start, number 24, was called in 5 of the 6 non-draft genes in the pham.
Start 24 is at 69425 and has 5 MAs. /note=Location call: Given the evidence shown, this is a real gene, as the final score and Z-score indicate. Coding potential is shown in the Host-Trained and Self-Trained GeneMarks. The best start site for this gene would be somewhere between about 69400 and 69500. /note=Function call: The e-values and hits are good in PhagesDB and NCBI BLAST, and the hits have high probability. This evidence supports that this is a RuvC resolvase. There are no TMDs, which is consistent with the function of this gene. There is also strong evidence in HHpred and CDD: this gene belongs to the RuvC superfamily in CDD and shows strong hits in HHpred. /note=Transmembrane domains: There are no TMDs. /note=Secondary Annotator Name: Lawson, Matthew /note=Secondary Annotator QC: 1) Based on the data collected, I agree that the start site at 69,425 is most likely, but some of the information you provided for this gene could be corrected to better support this start location. You indicated that GeneMark and Glimmer call different start sites (69,479 for Glimmer, 69,425 for GeneMark). Both start sites also have good coding potential in both the self-trained and host-trained GeneMarks. This gene also has synteny with other related phages, and there are strong PhagesDB BLAST hits. Additionally, the start codon at 69,425 (ATG) has a higher usage probability than the start codon at 69,479 (GTG) and covers all of the coding potential in the GeneMark maps. You did not indicate whether this start site gives the longest reasonable ORF for this gene, so I would fix this in your annotation notebook to strengthen the evidence for this location call. Although the start at 69,425 has a worse Z-score and final score (-4.323) than the 69,479 site, the Starterator data strongly agree with this location call. /note=I agree with your function call of RuvC resolvase.
In your PECAAN notes, it would help to comment on the HHpred and CDD results; you only discuss the NCBI and PhagesDB BLAST hits, but I think HHpred and CDD provide more valuable evidence. Additionally, make sure to fill out the function box on PECAAN (above the notes box) and the synteny box. There are also a few more hits in the BLAST, HHpred, and CDD sections on PECAAN that you could select as evidence. Lastly, you missed a few questions in the TMD section of your notebook, so make sure you answer all of them so that you don't lose any points. CDS 69993 - 70652 /gene="132" /product="gp132" /function="hypothetical protein" /locus tag="ModicumRichard_132" /note=Original Glimmer call @bp 69993 has strength 8.97; Genemark calls start at 69993 /note=SSC: 69993-70652 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_132 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 6.40672E-158 GAP: 52 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.165, -5.426906901365179, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_132 [Gordonia phage Cafasso] ],,QXN74347,100.0,6.40672E-158 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: BURKE, KAIDEN /note=Auto-annotation: Glimmer and GeneMark both call this gene and agree on the start codon at position 69993, which is a “GTG”. /note=Coding Potential: The host-trained and self-trained GeneMarks both give the 3rd ORF of the forward strand very high coding potential. The start site fits this coding potential well and matches the Glimmer/GeneMark predictions. /note=SD (Final) Score: The final score is -5.427, by far the best among start sites with a reasonable gap; the rest have gaps of 200+ bp and are unreasonable.
/note=Gap/overlap: The start codon at 69993 has the most reasonable and smallest gap, only 52 base pairs, while the other candidates have gaps of over 200 base pairs, which is very unreasonable given the coding potential they would miss. /note=Phamerator: As of 10/18/24, this gene is in pham 85114. The start number called most often in the published annotations is 19; it was called in 106 of the 132 non-draft genes in the pham. The members of the cluster do not share this site and instead match the auto-annotated site. The function is agreed to be a DNA-binding protein. /note=Starterator: This gene is very conserved in the pham. All 4 other phages in the cluster manually annotate the equivalent of site 69993. /note=Location call: The gene is a real gene that very likely starts at the start codon 69993 due to its small gap, high RBS value, and lack of reasonable alternatives. The other members of the pham and cluster also follow this start site pattern. /note=Function call: There is no known function. The top 3 NCBI and PhagesDB BLASTp hits have very low e-values (< 1e-127) and high identity (> 80%). There were no CDD hits, and all HHpred hits had high e-values (> 0.00045) and gave little insight into the role of the protein. /note=Transmembrane domains: DeepTMHMM found no likely TMDs. This cannot help determine function because nothing is known about the function. /note=Secondary Annotator Name: MUFF, MARGARET /note=Secondary Annotator QC: I agree with this annotation, as all of the evidence has been considered and supports the auto-annotated site. It appears to be the LORF. I also agree with the function NKF, since there are no significant hits in HHpred or CDD and no good e-values either. Synteny box complete. No notes.
CDS 70705 - 70947 /gene="133" /product="gp133" /function="hypothetical protein" /locus tag="ModicumRichard_133" /note=Original Glimmer call @bp 70792 has strength 3.19; Genemark calls start at 70717 /note=SSC: 70705-70947 CP: yes SCS: both-cs ST: NA BLAST-Start: [hypothetical protein SEA_ALEEMILY_131 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 3.02137E-49 GAP: 52 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.083, -5.402017758808125, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_131 [Gordonia phage Aleemily]],,UVK59871,98.75,3.02137E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@70947 F), Glimmer start called at 70792 bp, GeneMark start called at 70717 bp. The start at 70705, which is the manually called start, has a start codon of GTG. /note=Coding Potential: For the start at 70705, the gene has reasonable coding potential in the self-trained and host-trained GeneMark, and the site covers all of the coding potential. The ideal start in host-trained and self-trained GeneMark appears to be between 70700 and 70800, closer to 70700. The gene is 243 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the reverse GeneMark system. /note=SD (Final) Score: With a start at 70705, the final score is -5.402, which is in the middle of the listed final scores for suggested start sites. The Z-score is 2.083, which is on the higher end of the listed Z-scores for suggested start sites. /note=Gap/overlap: For the start site at 70705, there is a 52 bp gap. This is one of the smaller gaps among the listed start sites, but it is still large. The gap results from matching the length of genes at the same position in other DZ phages, although this gene does not have synteny with the similar genes in other phages.
This is not the longest ORF for the gene (the longest ORF completely overlaps the prior gene), and there is no coding potential suggesting another gene before the start site. /note=Phamerator: Pham 154274, run on 10/29/2024. It is not conserved, and no function is called. /note=Starterator: Report not generated. /note=Location call: Based on the above evidence, this is a real gene. The gene is not conserved but has reasonable coding potential. The likely start site for this gene is 70705. A start at 70705 gives a smaller gap, a better Z-score, a better final score, and still a common start codon. /note=Function call: No known function (NKF). The top 4 PhagesDB BLASTp hits have no known function (E-values < 1e-34). The top 2 NCBI BLASTp hits are hypothetical proteins from DZ cluster phages Cafasso and Aleemily (E-values < 5e-47, coverage = 100%, identity 96-99%). CDD returned no hits. HHpred returned NACHT N-terminal Helical domain as the top Pfam hit (E-value = 76) and Mitochondrial ATP synthase subunit as the top PDB hit (E-value = 51). /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains (TMDs), so this is not a transmembrane protein. /note=Secondary Annotator Name: Choudhary, Khushi /note=Secondary Annotator QC: I agree with the primary annotation of the likely start site and function for this gene, and have QC'd this call. All of the evidence categories have been considered.
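The GAP values in these records can be recomputed from the feature coordinates. A minimal sketch, assuming the GAP field counts the bases strictly between the upstream gene's last base and the current gene's first base (negative values meaning overlap); this reproduces the 52 bp and 5 bp gaps reported for genes 133 and 134. The function names are illustrative, not part of PECAAN.

```python
def forward_gap(upstream_end: int, start: int) -> int:
    """Bases between the upstream gene's last base and this gene's first base.

    Negative values are overlaps: -4 means the two genes share 4 bp.
    """
    return start - upstream_end - 1


def gene_length(left: int, right: int) -> int:
    """Length in bp of a feature spanning left..right, both ends inclusive."""
    return right - left + 1


# Coordinates from the gene 132, 133 and 134 records.
print(forward_gap(70652, 70705))           # 52  (gene 133 after gene 132)
print(gene_length(70705, 70947))           # 243 (gene 133)
print(forward_gap(70947, 70953))           # 5   (gene 134 after gene 133)
print(gene_length(70705, 70947) % 3 == 0)  # True: whole codons, stop included
```

Applied to gene 135 (start 73571) after gene 134 (end 73574), the same formula gives 73571 - 73574 - 1 = -4, i.e. a 4 bp overlap rather than a gap.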
CDS 70953 - 73574 /gene="134" /product="gp134" /function="ADP-ribosyltransferase" /locus tag="ModicumRichard_134" /note=Original Glimmer call @bp 70953 has strength 17.77; Genemark calls start at 70953 /note=SSC: 70953-73574 CP: no SCS: both ST: SS BLAST-Start: [ADP-ribosyltransferase [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 0.0 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.592, -3.5925599723783583, no F: ADP-ribosyltransferase SIF-BLAST: ,,[ADP-ribosyltransferase [Gordonia phage Aleemily]],,UVK59872,99.6564,0.0 SIF-HHPRED: ADP-RIBOSYLTRANSFERASE; ALPHA-BETA PROTEIN, PROTEIN-NAD COMPLEX, BINARY TOXIN, TOXIN; HET: NAD; 2.7A {Bacillus cereus} SCOP: d.166.1.1,,,1QS2_A,24.055,99.3 SIF-Syn: ADP-ribosyltransferase, upstream and downstream genes have no known functions in ObLaDi, Cafasso, and Aleemily (all cluster DZ). /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer and GeneMark both call the start site at 70953 with a start codon of ATG. /note=Coding Potential: There is good coding potential in the forward direction only for both the host-trained and the self-trained GeneMark. This indicates that this is a forward gene, but the start site does not cover all of the coding potential. The non-draft phage Cafasso (DZ) displays the same incomplete coverage. /note=SD (Final) Score: The final score is -3.593 and is the best possible final score listed on PECAAN for this gene. The Z-score is 2.592. /note=Gap/overlap: This start site has a gap of 5 bp from the previous gene. This gap is not extremely large and is reasonable for a real start site. /note=Phamerator: This gene is in pham 108247 on 10/31/24. It has 4 members, 1 of which is a draft, and all are found in cluster DZ. The gene is conserved, and the phages used for comparison are Aleemily, Cafasso, and ObLaDi, all in cluster DZ.
/note=Starterator: On 10/18/24, the start site called most often was number 1, at position 70953, in 3 out of 3 manual annotations. This start site agrees with the calls from both Glimmer and GeneMark. In Starterator, this was the only start site listed, making it the best and only option. /note=Location call: Based on the listed evidence, this gene is real and likely starts at 70953. Glimmer, GeneMark, and Starterator all call the same, highly conserved start site. The incomplete coverage of coding potential is also seen in Cafasso, a phage that has already been annotated, suggesting that this start site in ModicumRichard is real. /note=Function call: This gene likely functions as an ADP-ribosyltransferase, based on strong hits from PhagesDB and NCBI BLAST. The listed hits had E-values of 0 (strong hits), scores only slightly below the maximum score, percent identity over 99%, and 100% query coverage. All of these hits had the function ADP-ribosyltransferase. CDD hits also matched the function ADP-ribosyltransferase with low E-values (the highest was e-15), and HHpred matched the same function as well, although two PDB hits with sufficient values also classified it as a toxin (E-value lower than e-15, probability higher than 99%, and percent coverage greater than 157). There are no specific requirements for the ADP-ribosyltransferase function, but all hits from PhagesDB, NCBI, CDD, and HHpred point to the protein being an ADP-ribosyltransferase. /note=Transmembrane domains: There are no TMDs, so it is not considered a membrane protein, which makes sense for an ADP-ribosyltransferase. The protein is predicted to be outside of the membrane, which could conflict with the proposed function, since an ADP-ribosyltransferase would more likely act inside the cell to add ADP-ribose groups to target proteins. TOPCONS also places the protein outside, matching the TMHMM call.
This may be because the sequence resembles those of extracellular proteins even though the protein could actually act inside the cell. /note=Secondary Annotator Name: Adamian, Andrew /note=Secondary Annotator QC: I agree with the called start site of 70953. I also agree with the function call. HHpred and BLAST (both NCBI and PhagesDB) indicate this function, and CDD gives further insight into this call. Additionally, I agree with your TMD call, as DeepTMHMM doesn't predict any TMRs for this gene. CDS 73571 - 73984 /gene="135" /product="gp135" /function="hypothetical protein" /locus tag="ModicumRichard_135" /note=Original Glimmer call @bp 73571 has strength 13.43; Genemark calls start at 73571 /note=SSC: 73571-73984 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_135 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 7.00301E-92 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.734, -3.3019548882942655, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_135 [Gordonia phage Cafasso] ],,QXN74350,100.0,7.00301E-92 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation start source: Both Glimmer and GeneMark agree on start site 73571; the start codon is GTG. /note=Coding Potential: Good coding potential in the predicted ORF in both host-trained and self-trained GeneMark. The predicted start site covers all of this coding potential. /note=SD (Final) Score: The SD score is best for the chosen start site 73571, as it is the highest (least negative). The RBS score is technically irrelevant for the start call since the gene has a 4 bp overlap and is part of an operon, but it is the best RBS score regardless. The Z-score is good and above 2. /note=Gap/overlap: The 4 bp overlap with the upstream gene is reasonable, as the gene is part of an operon. Other start candidates have unfavorable gaps/overlaps and lower RBS/Z-scores. The chosen start has synteny with other phage genomes. /note=Phamerator: As of 10/31, part of pham 11381.
Conserved in other phages of the same cluster: Aleemily, Cafasso, ObLaDi, and Morgana. No function has been called in these yet. /note=Starterator: The start number called most often in the published annotations is 4; it was called in 4 of the 4 non-draft genes in the pham. For this phage, Start: 4 @73571 has 4 MA's. /note=Location call: Based on the current evidence, this gene is a real gene with good coding potential, final/Z-score, and synteny with other phages. The auto-annotated 73571 start site seems correct based on this same evidence. Starterator also provides strong evidence for calling this start site, since it was manually annotated to the same site in all 4 other non-draft phage genomes of the same cluster and pham. /note=Function call: No known function. PhagesDB BLAST and NCBI BLASTp returned no known function/hypothetical protein with strong e-values, CDD returned no results, and HHpred results were uninformative (the lowest e-value was 36). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: ANEBERE, STEPHANIE /note=Secondary Annotator QC: I agree with the proposed start site. The evidence is reasonable and suggests that this is the correct start site. I agree with the function call of this gene. I have QC’ed this gene.
CDS complement (73985 - 74305) /gene="136" /product="gp136" /function="DNA binding protein" /locus tag="ModicumRichard_136" /note=Original Glimmer call @bp 74272 has strength 10.88; Genemark calls start at 74272 /note=SSC: 74305-73985 CP: yes SCS: both-cs ST: NI BLAST-Start: [DNA binding protein [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 2.27375E-68 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.067, -2.8298788511194775, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Aleemily] ],,UVK59874,100.0,2.27375E-68 SIF-HHPRED: DUF2774 ; Protein of unknown function (DUF2774),,,PF11242.11,28.3019,96.4 SIF-Syn: DNA binding protein, upstream gene is NKF, downstream gene is NKF. This is the same for other DZ cluster phages Cafasso, Morgana, ObLaDi, and Aleemily. /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene and agree on the start codon at 74272. This start codon is a “GTG”. The "ATG" site at 74305 also shows some promise as the start. /note=Coding Potential: The host-trained and self-trained GeneMark both give the first reverse-strand frame very high coding potential. The host-trained GeneMark shows all of the coding potential within the ORF from the start site at 74272, while the self-trained GeneMark shows missed coding potential that is only covered by the LORF starting at 74305. /note=SD (Final) Score: -4.869 is the final score at the auto-annotated start of 74272, but it is neither the best score nor the LORF. The LORF, with the smallest gap and the best final score of -2.830, starts at 74305. /note=Gap/overlap: The start codon at 74305 has the most reasonable and smallest gap, only 15 base pairs, while the other candidates have gaps of roughly 50 base pairs or more, making them less reasonable given the extra gap and the coding potential they miss. /note=Phamerator: This gene is very conserved in the pham.
Two of the 4 pham phages share the auto-annotated start site, while the other 2 annotated the site at 74305. On its own, this evidence is inconclusive for either start. /note=Starterator: As of 11/11/24, this gene is in pham 12159. Starterator is uninformative because there are only 4 non-draft phages, and they are split equally between the start sites at 74305 and 74272. The function is agreed to be a DNA-binding protein. /note=Location call: The gene is a real gene that very likely starts at 74305 rather than at the auto-annotated start codon 74272, due to its smaller gap, higher RBS value, and full coverage of the self-trained GeneMark coding potential. Conservation in other members of the pham and cluster shows that this is a real gene with synteny, but it does not provide enough information on its own to choose the start site. /note=Functional call: The function is DNA binding protein, as shown by the top 2 NCBI BLASTp hits, which return nearly identical results with this function (e-value < 1e-100), and by CDD, which shows a matching domain (transpos_IS630) with a decent e-value (8.01e-03). HHpred shows multiple different functions, one being SiR5 and another of unknown function. We decided to call it a DNA binding protein due to the overwhelming evidence from NCBI BLASTp. /note=Transmembrane domains: DeepTMHMM predicts no TMDs, only one inside domain. /note=Secondary Annotator Name: BURKE, KAIDEN /note=Secondary Annotator QC: I wrote the entire PECAAN from Auto-annotations to location call. I agree with the location call. I agree with the final function call due to high similarity scores from BLAST despite the weak CDD e-value.
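On the complement strand the "upstream" neighbour is the gene at higher coordinates, which is easy to get backwards. A minimal sketch, assuming the same between-the-features convention as the forward-strand GAP field, shows why the 74305 start for gene 136 gives the smaller gap; the function and constant names are hypothetical.

```python
def reverse_gap(start: int, upstream_left: int) -> int:
    """Gap for a gene on the complement strand.

    A complement gene is read from its higher coordinate (its start) toward
    its lower coordinate, so the upstream gene is the neighbour at higher
    coordinates. The gap is the number of bases between this gene's start
    and that neighbour's lower (left) coordinate; negative values are overlaps.
    """
    return upstream_left - start - 1


UPSTREAM_LEFT = 74321  # gene 137, complement(74321-74584)

for start in (74305, 74272):  # manual LORF start vs. Glimmer/GeneMark start
    print(start, reverse_gap(start, UPSTREAM_LEFT))
# 74305 15
# 74272 48
```

The same convention reproduces the 44 bp gap for gene 137 (74629 - 74584 - 1) and the 8 bp overlap for gene 141 (75700 - 75707 - 1 = -8).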
CDS complement (74321 - 74584) /gene="137" /product="gp137" /function="hypothetical protein" /locus tag="ModicumRichard_137" /note=Original Glimmer call @bp 74584 has strength 15.01; Genemark calls start at 74584 /note=SSC: 74584-74321 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_137 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.61952E-57 GAP: 44 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.831, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_137 [Gordonia phage Cafasso]],,QXN74352,98.8506,6.61952E-57 SIF-HHPRED: A2L_zn_ribbon ; A2L zinc ribbon domain,,,PF08792.13,33.3333,96.8 SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@74321 R). Both Glimmer and GeneMark call the start at 74584 bp, and the start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the reverse strand (fourth frame) only, indicating that this is a reverse gene. The ORF is the longest (264 bp), and it has reasonable coding potential in both GeneMark Self and Host. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.490. It is the best final score on PECAAN. The Z-score is also the highest, at 2.831. /note=Gap/overlap: 44 bp gap. This gap is somewhat large, but ultimately reasonable because the gap is conserved in other phages (Cafasso, Morgana). This gap is also the best one, since the next smallest gap is 95 bp. /note=Phamerator: The pham number as of 10/29/2024 is 186793. The gene is conserved in phages Aleemily, Cafasso, Morgana, and ObLaDi, all in the same cluster as ModicumRichard (DZ). The function call for the gene is unknown. /note=Starterator: There are 88 non-draft genes in the pham. 71/88 non-draft members call start site 19. However, the called start site for ModicumRichard is 20 at 74584 bp, which has only 5 MA’s.
Despite this, the best annotated start site for this phage aligns with the auto-annotated start site, and ModicumRichard does not have an annotation for start site 19. Additionally, other phages in the same cluster, such as ObLaDi, Cafasso, and Aleemily, call start site 20, and both Glimmer and GeneMark call the start site at 74584. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 74584 bp. Starterator weakly agrees with Glimmer and GeneMark. Starterator states that the start number called most often in the published annotations is 19 (71/88 non-draft genes in the pham), but ModicumRichard does not have this “Most Annotated” start. Instead, the auto-annotated and most manually annotated start for ModicumRichard is 20 at 74584 bp. This start site, rather than start site 19, agrees with Glimmer and GeneMark. Furthermore, the start site is conserved across all homologues in Starterator, although at different positions. Starterator is informative but does not provide any new information. /note=Functional call: NKF. All PhagesDB BLAST hits have no known function (e-value < 7e-20), and all NCBI BLAST hits have unknown function (100% coverage, 98.8506% identities, and e-value < 8e-55). There were no hits on CDD, but HHpred returned numerous hits with high probability (red sequences, top two >98.16% and e-value < 4.7e-6). These hits are EDR4-like_1st; enhanced disease resistance 4-like, N-terminal domain for PF22910.1 and A2L_zn_ribbon; A2L zinc ribbon domain for PF08792.15. However, since none of the other three platforms return hits with a known function, a function is unlikely to be assignable to this ORF. Additionally, other phages from the same DZ cluster (Cafasso, Aleemily, ObLaDi, and Morgana) are all annotated as NKF. Most of the top hits with known function for ModicumRichard are not found on PECAAN, and none of these functions are officially on the SEA-PHAGES functional assignments list.
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Laughton, Alette /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's location call and function call. CDS complement (74629 - 74763) /gene="138" /product="gp138" /function="hypothetical protein" /locus tag="ModicumRichard_138" /note=Original Glimmer call @bp 74763 has strength 24.6; Genemark calls start at 74775 /note=SSC: 74763-74629 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_136 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 4.89852E-20 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.313, -4.024058836634684, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_136 [Gordonia phage Aleemily] ],,UVK59876,100.0,4.89852E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@74629 R). Both Glimmer and GeneMark call the gene, but they do not agree on the start site. Glimmer calls the start site at 74763 bp with a start codon of ATG. GeneMark calls the start site at 74775 bp with a start codon of GTG. /note=Coding Potential: [start site 74763] The gene has reasonable coding potential (very good coverage) within the putative ORF. The chosen start site covers all of this coding potential. The gene displays synteny with at least two other non-draft phage genomes. There are four PhagesDB BLAST hits to non-draft phage genomes with an e-value less than 10^-6. There is coding potential predicted by Glimmer, GeneMark Self, and GeneMark Host. The gene is at least 120 bp long (135 bp). There are no switches in gene orientation. Coding potential is found in only one frame on one strand. /note=SD (Final) Score: [start site 74763] The SD score -4.024 is the second best; the best is -2.906. The Z-value 2.313 is the second best; the best is 2.927.
Since the gene appears to be part of an operon, the RBS is irrelevant for the start call. /note=Gap/overlap: [start site 74763] The 1 bp overlap with the upstream gene is reasonable; this gene may be part of an operon. The length of the gene is acceptable (135 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 107538 (as of 10/27/24) is conserved in Cafasso and ObLaDi, both in the same cluster (DZ) as ModicumRichard. No function is called. /note=Starterator: There is a reasonable start site choice that is conserved: Start site #4 @74763 in ModicumRichard. All 4 non-draft genes call site #4. This evidence agrees with the site predicted by Glimmer. /note=Location call: [start site 74763] The gathered evidence suggests that this is a real gene: it is conserved in Phamerator and has good coding potential. The most likely start site is 74763, which is conserved in Starterator and covers all of the coding potential. /note=Function call: NKF. No program returned any informative results. PhagesDB BLASTp: function unknown. NCBI BLASTp: hypothetical protein. CDD: no hits. HHpred: no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Ea, Emily /note=Secondary Annotator QC: I agree with this location call; everything seems to have been considered. This function call is reasonable since there is no evidence that it has a known function.
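The gene 138 coding-potential note walks through a checklist of thresholds. A minimal sketch encoding those thresholds as they are stated in that note (length of at least 120 bp, non-draft BLAST hits with e-value below 10^-6, synteny with at least two non-draft genomes, a single coding frame, no orientation switch); the class, field, and function names are hypothetical, and treating "at least one qualifying hit" as the pass condition is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Hypothetical record of the evidence listed in a coding-potential note."""
    length_bp: int
    syntenic_nondraft_genomes: int
    nondraft_blast_hits_below_1e6: int  # non-draft BLAST hits with e-value < 1e-6
    coding_frames: int = 1
    orientation_switches: int = 0


def unmet_criteria(ev: GeneEvidence) -> list:
    """Return the names of the listed criteria that fail (empty list = all pass)."""
    checks = {
        "length >= 120 bp": ev.length_bp >= 120,
        "synteny with >= 2 non-draft genomes": ev.syntenic_nondraft_genomes >= 2,
        "non-draft BLAST hit with e < 1e-6": ev.nondraft_blast_hits_below_1e6 >= 1,
        "coding potential in a single frame": ev.coding_frames == 1,
        "no orientation switch": ev.orientation_switches == 0,
    }
    return [name for name, passed in checks.items() if not passed]


gene_138 = GeneEvidence(
    length_bp=74763 - 74629 + 1,       # 135 bp, coordinates inclusive
    syntenic_nondraft_genomes=2,       # Cafasso and ObLaDi
    nondraft_blast_hits_below_1e6=4,   # four PhagesDB hits cited in the note
)
print(unmet_criteria(gene_138))  # []
```

Returning the list of failed checks, rather than a single yes/no, mirrors how the note justifies each criterion separately.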
CDS complement (74763 - 75008) /gene="139" /product="gp139" /function="hypothetical protein" /locus tag="ModicumRichard_139" /note=Original Glimmer call @bp 75008 has strength 21.45; Genemark calls start at 75008 /note=SSC: 75008-74763 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_137 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 2.86299E-51 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.173, -4.660750147004694, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_137 [Gordonia phage Aleemily] ],,UVK59877,100.0,2.86299E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation start source: Both Glimmer and GeneMark call the start at 75008, with a GTG start codon. /note=Coding Potential: In the host-trained GeneMark, there is good coding potential on the reverse strand from about 75300 to 74700. The coding potential spans most of the ORF, but there is some forward-strand coding potential around 74700. The self-trained GeneMark follows a similar pattern, with good typical and atypical coding potential on the reverse strand and some typical and atypical coding potential on the forward strand around 74750 bp; because this spans only about 50 bp, I consider it insignificant. /note=SD (Final) Score: -4.661, the best final score on PECAAN, with the highest Z-score of 2.173. /note=Gap/overlap: -1 bp (an overlap of 1 bp), the smallest gap value in PECAAN, which can indicate an operon. /note=Phamerator: The gene is in pham 86823 (run on 10/30/24), which is well conserved in both the DZ and BD clusters. /note=Starterator: Start site 14 is the most manually annotated, in 13/18 non-draft genes, but the auto-annotated start site is 15 at 75008 (with 4 MA’s). Glimmer, GeneMark, the SD/Z-scores, and the gap all agree on the start at 75008.
Start site 15 is also fairly well conserved (in 4 phages of the same cluster). /note=Location call: This is a real gene, and in agreement with both Glimmer and GeneMark, I call the start at 75008. /note=Function call: NKF. No program returned any informative results. PhagesDB BLAST reports function unknown for the top 3 hits (despite good E-values of 3e-43), NCBI BLAST calls it a hypothetical protein, and CDD and HHpred have no significant hits. /note=Transmembrane domains: There are no predicted TMDs, so there is no evidence that this is a membrane protein. /note=Secondary Annotator Name: Venkatraman, Rakshaa /note=Secondary Annotator QC: I agree with your location call for the start site! Coding potential was slightly off from the start, but there was no better option. Given the indication of an operon, the 75008 site seems best. The SD score and Z-score are also promising, and 4 other DZ phages called the same site per Starterator. I added a comment saying "overlap of 1 bp", since -1 indicates an overlap rather than a gap; I saw a comment in the annotation lab manual to verify this wording, so I added it as clarification! I agree with your function call: both BLASTs return no known function/hypothetical protein with good e-values, and CDD and HHpred are uninformative. Make sure to still click the BLAST hits on PECAAN though; the lab manual mentions you can select them if the BLAST result is NKF, just not for CDD/HHpred! Make sure to fill out the Synteny box as well!
CDS complement (75008 - 75253) /gene="140" /product="gp140" /function="hypothetical protein" /locus tag="ModicumRichard_140" /note=Original Glimmer call @bp 75253 has strength 13.07; Genemark calls start at 75253 /note=SSC: 75253-75008 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_138 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 1.07723E-51 GAP: 130 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.19, -4.354778061539587, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_138 [Gordonia phage Aleemily]],,UVK59878,100.0,1.07723E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: TOROSIAN, ISABELLA /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and both place the start site at 75253. /note=Coding Potential: GeneMark Host shows high coding potential on the reverse strand from 75253 to 75050 bp; however, from 75050 bp to the stop site, there is no peak in coding potential. There is a small peak of coding potential on the forward strand at 75000-75050 bp. This peak is small relative to the reverse-strand coding potential in the ORF and is only 50 bp long, so there is no evidence of a real gene on the forward strand. GeneMark Self shows reasonable coding potential in this ORF on the reverse strand only. /note=SD (Final) Score: The final score is -4.355, the second-best final score. The Z-value is 2.19, the second-best Z-value. /note=Gap/overlap: The gap between this gene and the prior gene is 130 bp. This gap is somewhat large; however, there is no coding potential within the gap between this gene and its upstream gene. /note=Phamerator: As of October 29, 2024, the pham number is 112706. There is only one other member in the pham, the phage Aleemily, which belongs to the same cluster/subcluster as ModicumRichard. /note=Starterator: This pham has only two non-draft members.
Both non-draft members call start site 3, which corresponds to a start site of 75253 bp for ModicumRichard. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 75253. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. All of the PhagesDB BLAST hits have unknown function. NCBI returned only two strong hits (e-values around e-51 and 100% identity), both hypothetical proteins. CDD had no hits and was therefore uninformative. HHpred returned no strong hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: ALLATAIFIH, BASHAR /note=Secondary Annotator QC: This is a great catch. I think the original reverse start site misses a good bit of the coding potential, and 75050 or so sounds like it should be the right start. Everything else looks good for this gene. /note=Tertiary Annotator Name: Velez-Ramirez, Daniel /note=Tertiary Annotator QC: Although the start site at 75253 does not have the least negative final score, the evidence provided to call it is strong. Phage Aleemily is the only other phage of this cluster with this gene, and the length of 246 bp matches the start site at 75253. The gap of 130 bp is somewhat long, and it could accommodate a short gene.
CDS complement (75384 - 75707) /gene="141" /product="gp141" /function="hypothetical protein" /locus tag="ModicumRichard_141" /note=Original Glimmer call @bp 75707 has strength 14.85; Genemark calls start at 75707 /note=SSC: 75707-75384 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_139 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 1.38353E-69 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.927, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_139 [Gordonia phage Aleemily]],,UVK59879,98.1308,1.38353E-69 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Kyle /note=Auto-annotation: Glimmer and GeneMark both call the start site at 75707, so they agree. The start codon called is ATG. /note=Coding Potential: Coding potential for this gene’s ORF is found on the reverse strand only, indicating that it is a reverse gene. Typical coding potential is found in both GeneMark Self and Host, indicating reasonable coding potential. The chosen start site includes all of the coding potential and gives the longest ORF, at 324 bp. /note=SD (Final) Score: The SD score is -2.828, which is the best (most positive) of the start sites provided. The Z-score is also the highest, at 2.927. /note=Gap/overlap: The gap is -8 bp (an 8 bp overlap). This is just on the edge of what is reasonable. The next best option would be a 7 bp gap, although the Z-score and final score would be much lower. The length of the gene is also reasonable, at 324 bp. /note=Phamerator: On 10/25/24, the gene was determined to be part of pham 107930. The pham is common in the cluster; all members of the pham are in cluster DZ. This pham includes Aleemily_139 (DZ), Cafasso_142 (DZ), ModicumRichard_141 (DZ, draft), and ObLaDi_140 (DZ). Phamerator and PhagesDB did not identify the function of this gene. /note=Starterator: Start site 1 in Starterator was manually annotated in 3/3 non-draft genes in this pham.
Start site 1 is 75707 in ModicumRichard. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with a start site at position 75707. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. Although the top two PhagesDB BLAST hits (E-value <10^-55) and NCBI BLAST hits (E-value <10^-68) have low E-values, none of these hits has a known function. These hits are from phages of cluster DZ (the same cluster as this gene). HHpred had a hit to a YodL-like protein with 98% probability, 56% coverage, and an E-value of 3.7e-9, but no function is recognized for this protein. CDD had no relevant hits. Thus, the function is NKF. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so it is not a membrane protein; DeepTMHMM predicts it to be a globular protein. /note=Secondary Annotator Name: Qian, Audrey /note=Secondary Annotator QC: Based on the above evidence, I agree with the first annotator’s location and function calls. CDS complement (75700 - 75840) /gene="142" /product="gp142" /function="hypothetical protein" /locus tag="ModicumRichard_142" /note=Original Glimmer call @bp 75840 has strength 21.16; Genemark calls start at 75840 /note=SSC: 75840-75700 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_140 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 2.88905E-22 GAP: 458 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.404, -3.8366550365551095, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_140 [Gordonia phage Aleemily] ],,UVK59880,100.0,2.88905E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Khuu, Jacob /note=Auto-annotation: Glimmer and GeneMark agree on the same start site: 75840. The start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF, and the chosen start site covers all of it.
/note=SD (Final) Score: The final score, -3.837, was the best (most positive) of all the options. The Z-value was 2.404, which is above 2, indicating that the RBS is reasonable. /note=Gap/overlap: The 458 bp gap appears reasonable because no coding potential is observed within it, and the pham map comparison shows that other members of the pham also have a large gap between the start site and the previous gene. This gap also contains the tRNA-Ser(cga) gene (76229 - 76291). The length of 141 bp is reasonable for the gene. /note=Phamerator: As of Tuesday, Oct. 29, the gene is in pham 9842. This pham is present in other members of the same cluster; the phages used for comparison are Aleemily and Cafasso. Phamerator did not call a function for this gene. /note=Starterator: Start site 2 was called most often in published annotations; it corresponds to 75840 in ModicumRichard (Start: 2 @75840 has 5 MA’s). There are 5 other members in this pham, and all 5 call this most conserved site. /note=Location call: The gathered evidence suggests that the gene is real and that the start site is reasonable based on the RBS final score and Z-value. The most likely start site is Start 2, at 75840, given the Starterator results. /note=Function call: NKF. The top 2 BLAST hits (ObLaDi_141 and Morgana_149), with E-values of 2e-19, both have unknown functions. There were no CDD hits, and the HHpred hits were all insignificant (the graph was mostly blue), with all E-values between 30 and 50 and the best match below 70% probability. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Liu, Carissa /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. Note: For the gap/overlap section, please state the phages that were used for the pham map comparison. Did Phamerator or the phams database have a function called for this gene? Were the functions called consistent and found in the approved function list? I have QC’ed this function call and agree with the first annotator. All of the evidence categories have been considered. tRNA complement (76229 - 76291) /gene="143" /product="tRNA-Ser(cga)" /locus tag="MODICUMRICHARD_143" /note=tRNA-Ser(cga) CDS complement (76299 - 76448) /gene="144" /product="gp144" /function="hypothetical protein" /locus tag="ModicumRichard_144" /note=Original Glimmer call @bp 76448 has strength 8.08; Genemark calls start at 76448 /note=SSC: 76448-76299 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_141 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 1.06905E-25 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.734, -5.209740224197155, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_141 [Gordonia phage Aleemily]],,UVK59881,100.0,1.06905E-25 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Xu, Sasha /note=Auto-annotation: Both Glimmer and Genemark called 76448 as the start site. The start codon is ATG. /note=Coding Potential: There is reasonable coding potential for this region in the reverse direction in the 5th row. The start site covers all of the coding potential. /note=SD (Final) Score: The final score is -5.210 and the Z-score is 1.734. These are not the best scores; however, the start site with the best final and Z-scores is in the middle of the coding potential and does not include all of it. /note=Gap/overlap: -8. This is a larger than usual overlap between genes. The length of the gene is reasonable. /note=Phamerator: The gene is found in Pham 109698 as of 10/29/24.
Aleemily_141, the other gene in the same pham, is also in cluster DZ. /note=Starterator: The most reasonable start site is 3, which is conserved with the other gene in the pham; it corresponds to base pair coordinate 76448. There is only one other member in this pham, and it calls the most conserved site. /note=Location call: Based on the above information, specifically the coding potential map, the gene likely has a start site of 76448. /note=Function call: Both NCBI and PhagesDB BLAST showed only one strong hit, to Gordonia phage Aleemily, with a score of 99, query coverage of 100%, and an E-value of 1e-25, but that protein has no known function. There were no significant CDD or HHpred hits. Based on this evidence, the function is NKF. /note=Transmembrane domains: There are no transmembrane domains predicted, indicating that the gene does not encode a membrane protein. /note=Secondary Annotator Name: Streva, Amanda /note=Secondary Annotator QC: I have QC’ed and agree with the primary annotator’s start site call; there is strong evidence that the start site is 76448. As for function, I also agree that it is NKF because the PhagesDB BLAST top hit is "function unknown" and NCBI BLAST has one hit that is a "hypothetical protein". There were also no good hits in HHpred and no hits in CDD, so I agree with the primary annotator’s call of NKF.
CDS complement (76441 - 76707) /gene="145" /product="gp145" /function="hypothetical protein" /locus tag="ModicumRichard_145" /note=Original Glimmer call @bp 76707 has strength 7.26; Genemark calls start at 76701 /note=SSC: 76707-76441 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_142 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 9.78528E-57 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.231, -4.331668843916977, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_142 [Gordonia phage Aleemily]],,UVK59882,100.0,9.78528E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Su, Erin /note=Auto-annotation: Glimmer and GeneMark both call the gene. Glimmer calls the start site at 76707, and GeneMark calls it at 76701. /note=Coding Potential: This gene has coding potential on the reverse strand in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: The final score for the 76707 start site is -4.332 with a Z-score of 2.231, and the final score for the 76701 start site is -5.797 with a Z-score of 2.231. Both Z-scores are good, but the auto-annotated start site at 76707 has the better final score. The start codon is GTG. /note=Gap/overlap: The 76707 start site gives a 4 bp overlap, whereas the 76701 start site gives a 2 bp gap. /note=Phamerator: As of 10/29/2024, the pham number is 10824. The gene is conserved in each of the four other phages in Cluster DZ (ObLaDi, Cafasso, Aleemily, Morgana). No genes from other clusters are present. /note=Starterator: The four non-draft genes in this pham belong to the four other phages in Cluster DZ. Each of the four non-draft genes calls start site 2, which corresponds to 76707 for ModicumRichard, the same as Glimmer’s auto-annotation. This Starterator analysis was run on 10/18/2024 on database version 578. /note=Location call: Based on the above information, this is a real gene and 76707 is most likely the start site.
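The gp145 start comparison can be restated as a short calculation. On the reverse strand the stop end (76441) is fixed, so moving the start changes both the ORF length and the gap to gp146, whose left end is 76704. This is a sketch only. Ranking by final score alone simplifies the call above, which also weighs Starterator. The `StartCandidate` class and `describe` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StartCandidate:
    start: int          # start coordinate; for a reverse gene this is the right end
    final_score: float  # SD (Final) score quoted in the notes
    z_score: float


STOP = 76441            # gp145 stop end (left end on the reverse strand)
NEXT_LEFT_END = 76704   # left end of gp146, complement(76704 - 77261)

candidates = [
    StartCandidate(76707, -4.332, 2.231),  # Glimmer call
    StartCandidate(76701, -5.797, 2.231),  # GeneMark call
]


def describe(c: StartCandidate) -> tuple:
    """Return (start, ORF length in bp, gap to gp146 in bp; negative = overlap)."""
    length = c.start - STOP + 1
    gap = NEXT_LEFT_END - c.start - 1
    return c.start, length, gap


for c in candidates:
    print(describe(c), c.final_score)

# Least negative final score wins; the Z-scores are tied here.
best = max(candidates, key=lambda c: c.final_score)
print("preferred start:", best.start)
```

Both candidates keep whole codons (267 and 261 bp are multiples of 3). The 76707 start yields the 4 bp overlap and the better final score, which is consistent with the location call.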
/note=Function call: No known function. The top four PhagesDB BLAST hits (from phages Cafasso, Aleemily, ObLaDi, and Morgana) have no known functions (E-values less than 10^-40). The top four NCBI BLAST hits (from the same four phages) are all hypothetical proteins, with 80%+ identity, 90%+ coverage, and E-values less than 10^-48. There were no CDD hits. There were no significant HHpred hits either: the top E-values were above 10^1, and the top probabilities were below 50% (matches to synthetic proteins and to proteins also without a known function). /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, so this is not a transmembrane protein. /note=Secondary Annotator Name: Torosian, Isabella /note=Secondary Annotator QC: I agree with the location call made by the primary annotator. There is strong evidence to support start site 76707. I also agree with the function call made by the primary annotator. CDS complement (76704 - 77261) /gene="146" /product="gp146" /function="hypothetical protein" /locus tag="ModicumRichard_146" /note=Original Glimmer call @bp 77261 has strength 19.45; Genemark calls start at 77261 /note=SSC: 77261-76704 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_143 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 4.12799E-132 GAP: 94 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.166, -2.356391881054909, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_143 [Gordonia phage Aleemily]],,UVK59883,99.4595,4.12799E-132 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vo, Cindy /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 77261. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. Coding potential has good coverage and width in the ORF. /note=SD (Final) Score: The final score is the best option at -2.356, and the z-score is the highest at 3.166.
/note=Gap/overlap: The gap is a little large at 94 bp, but this gene and gap were conserved in other phages as well, such as Cafasso (DZ) and Aleemily (DZ). /note=Phamerator: pham: 107828. Date 10/24/2024. It is conserved; found in Aleemily and Cafasso. /note=Starterator: There are only 3 non-draft members of this pham. 3/3 non-draft members call start site 4, which corresponds to a start site of 77261 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 77261 bp. /note=Function call: Function unknown. The top three PhagesDB BLAST hits have unknown functions (E-value <10^-88), and the top three NCBI BLAST hits also have unknown functions (98.3%+ coverage, 83.7%+ identity, and E-value <10^-109). HHpred had hits, but they had low coverage, low probability, and high E-values. Therefore, HHpred cannot be used as evidence for function. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bouklas, Tejas; Holaway, Aaron /note=Secondary Annotator QC: QC1 complete; QC2 complete CDS complement (77356 - 77550) /gene="147" /product="gp147" /function="hypothetical protein" /locus tag="ModicumRichard_147" /note=Original Glimmer call @bp 77550 has strength 16.63; Genemark calls start at 77550 /note=SSC: 77550-77356 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_146 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.68598E-37 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.485, -4.197870532213559, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_146 [Gordonia phage Cafasso]],,QXN74361,98.4375,7.68598E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vo, Cindy /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 77550.
/note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. Coding potential has good coverage and width in the ORF. /note=SD (Final) Score: The final score is the best option at -4.198, and the z-score is the highest at 2.485. /note=Gap/overlap: The 4 bp overlap is ideal and suggests that this gene is part of an operon. This gene and its overlap were conserved in other phages as well, such as Cafasso and Aleemily. /note=Phamerator: pham: 167377. Date 10/24/2024. It is conserved; found in Aleemily and Cafasso. /note=Starterator: There are only 4 non-draft members of this pham. 2/4 non-draft members call start site 15, which corresponds to a start site of 77550 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 77550 bp. /note=Function call: Function unknown. The top two PhagesDB BLAST hits have unknown functions (E-value <10^-30), and the top two NCBI BLAST hits also have unknown functions (100% coverage, 96.8%+ identity, and E-value <10^-36). HHpred had hits, but they had low coverage, low probability, and high E-values. Therefore, HHpred cannot be used as evidence for function. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Bouklas, Tejas; Holaway, Aaron /note=Secondary Annotator QC: QC1 complete; QC2 complete CDS complement (77547 - 77705) /gene="148" /product="gp148" /function="hypothetical protein" /locus tag="ModicumRichard_148" /note=Original Glimmer call @bp 77705 has strength 12.46; Genemark calls start at 77705 /note=SSC: 77705-77547 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_146 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 7.29834E-28 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.794, -5.14847875699779, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_146 [Gordonia phage ObLaDi]],,UXE03869,98.0769,7.29834E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vo, Cindy /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 77705. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. Coding potential has good coverage and width in the ORF. /note=SD (Final) Score: The final score is -5.148 and the z-score is 1.794. These were not the highest final and z-scores. /note=Gap/overlap: The gap is small at 14 bp. This gene and gap were fully conserved in phage ObLaDi and partially conserved in Morgana. /note=Phamerator: pham: 1110773. Date 10/24/2024. It is conserved; found in ObLaDi (DZ). /note=Starterator: There is only 1 non-draft member of this pham, and it calls start site 5, which corresponds to a start site of 77705 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 77705 bp. /note=Function call: Function unknown. In both PhagesDB BLAST and NCBI BLAST, there were only two results: the top result (from ObLaDi) was a good match to the ModicumRichard gene (stop @77547, R), while the second result (from Morgana) was a weaker match.
So, for this gene there was only one strong match. The top PhagesDB BLAST hits have unknown functions (E-value = 1x10^-23), and the top NCBI BLAST hits also have unknown functions (100% coverage, 92.3% identity, and E-value = 7.3x10^-28). HHpred had hits, but they had low coverage, low probability, and high E-values. One hit had 40.38% coverage, but its identity was low and its E-value was high. Therefore, HHpred cannot be used as evidence for function. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bouklas, Tejas; Holaway, Aaron /note=Secondary Annotator QC: QC1 complete; QC2 complete CDS complement (77720 - 78115) /gene="149" /product="gp149" /function="hypothetical protein" /locus tag="ModicumRichard_149" /note=Original Glimmer call @bp 78115 has strength 12.38; Genemark calls start at 78082 /note=SSC: 78115-77720 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_148 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 4.53746E-90 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.166, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_148 [Gordonia phage Cafasso] ],,QXN74363,99.2366,4.53746E-90 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vo, Cindy /note=Auto-annotation start source: Glimmer and GeneMark. Glimmer calls the start at 78115. GeneMark calls the start at 78082. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. Coding potential has good coverage and width in the ORF. /note=SD (Final) Score: The final score is the best option at -2.276, and the z-score is the highest at 3.166. /note=Gap/overlap: The gap is small at 12 bp, and this gene and gap were conserved in other phages as well, such as Cafasso and Aleemily.
/note=Phamerator: pham: 107819. Date 10/24/2024. It is conserved; found in ObLaDi (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: There are only 3 non-draft members of this pham. 3/3 non-draft members call start site 1, which corresponds to a start site of 78115 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 78115 bp. Despite the differing start calls from Glimmer and GeneMark, Starterator agrees with Glimmer’s start call at 78115 bp. /note=Function call: Function unknown. Some PhagesDB hits had assigned functions, but they were not top hits and their functions were not consistent. The top three PhagesDB BLAST hits have unknown functions (E-value = 2x10^-75). There was only one hit in NCBI BLAST, and it was a good match; this top NCBI BLAST hit also has unknown function (100% coverage, 99.2% identity, and E-value = 4.5x10^-90). HHpred had hits, but they had low coverage, low probability, and high E-values. Some hits had higher probabilities (though still below 75%), but their coverage was very low and their E-values were high. Therefore, HHpred cannot be used as evidence for function. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Bouklas, Tejas; Holaway, Aaron /note=Secondary Annotator QC: QC1 complete; QC2 complete CDS complement (78128 - 78475) /gene="150" /product="gp150" /function="hypothetical protein" /locus tag="ModicumRichard_150" /note=Original Glimmer call @bp 78475 has strength 9.85; Genemark calls start at 78475 /note=SSC: 78475-78128 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_147 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 1.80556E-75 GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.998, -4.731360067354143, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_147 [Gordonia phage Aleemily]],,UVK59887,100.0,1.80556E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vo, Cindy /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 78475. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. Coding potential has good coverage and width in the ORF. /note=SD (Final) Score: The final score is the best at -4.731, and the z-score is 1.998. The z-score was not the highest. /note=Gap/overlap: The gap is a bit large at 101 bp, but this gene and gap were conserved in other phages as well, such as Cafasso and Aleemily. /note=Phamerator: pham: 107727. Date 10/24/2024. It is conserved; found in ObLaDi (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: There are only 3 non-draft members of this pham. 3/3 non-draft members call start site 5, which corresponds to a start site of 78475 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 78475 bp. /note=Function call: Function unknown. Some PhagesDB hits had assigned functions, but they were not top hits and their functions were not consistent.
The top three PhagesDB BLAST hits have unknown functions (E-value < 10^-59). NCBI BLAST had only three hits, of which only two were good matches. The top two NCBI BLAST hits also have unknown functions (100% coverage, 97.4%+ identity, and E-value <10^-73). HHpred had hits with high probability and coverage, but their E-values were also high, and the predicted functions differed between hits. Therefore, HHpred cannot be used as evidence for function. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bouklas, Tejas; Holaway, Aaron /note=Secondary Annotator QC: QC1 complete; QC2 complete CDS complement (78577 - 78858) /gene="151" /product="gp151" /function="hypothetical protein" /locus tag="ModicumRichard_151" /note=Original Glimmer call @bp 78858 has strength 9.68; Genemark calls start at 78858 /note=SSC: 78858-78577 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_150 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.25783E-62 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -3.8116689268127657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_150 [Gordonia phage Cafasso] ],,QXN74365,100.0,2.25783E-62 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vo, Cindy /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 78858. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. Coding potential has good coverage and width in the ORF. /note=SD (Final) Score: The final score is the best option at -3.812, and the z-score is the highest at 2.485. /note=Gap/overlap: The 1 bp overlap is ideal for an operon. This gene and its overlap were conserved in other phages as well, such as Cafasso and Aleemily.
/note=Phamerator: pham: 107824. Date 10/24/2024. It is conserved; found in ObLaDi (DZ), Cafasso (DZ), and Aleemily (DZ). /note=Starterator: There are only 3 non-draft members of this pham. 3/3 non-draft members call start site 7, which corresponds to a start site of 78858 bp for ModicumRichard. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 78858 bp. /note=Function call: Function unknown. The top three PhagesDB BLAST hits have unknown functions (E-value = 6x10^-51). There was only one hit in NCBI BLAST, and it was a good match; this top NCBI BLAST hit also has unknown function (100% coverage, 100% identity, and E-value = 2.2x10^-62). HHpred had hits, but they had low coverage, low probability, and high E-values. Therefore, HHpred cannot be used as evidence for function. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bouklas, Tejas; Holaway, Aaron /note=Secondary Annotator QC: QC1 complete; QC2 complete CDS complement (78858 - 79061) /gene="152" /product="gp152" /function="hypothetical protein" /locus tag="ModicumRichard_152" /note=Original Glimmer call @bp 79061 has strength 14.96; Genemark calls start at 79061 /note=SSC: 79061-78858 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_151 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.04801E-38 GAP: 128 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.19, -4.801936092881806, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_151 [Gordonia phage Cafasso] ],,QXN74366,100.0,2.04801E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Holaway, Aaron /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 79061. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host.
In GeneMark Host, coding potential in this ORF is only on the reverse strand, indicating it is a reverse gene. In GeneMark Self, the same coding potential is observed in this ORF on the reverse strand, but some background coding potential is also observed on the forward strand. A similar pattern was observed in GeneMark Self for Gene(stop 78187 R) of Aleemily (DZ), which was determined to be syntenic to this ORF. Additionally, this gene is located between several genes predicted to be on the reverse strand, so the forward-strand coding potential is unlikely to be real. This ORF is very likely a reverse gene. /note=SD (Final) Score: -4.802. It is the second best final score on PECAAN; the best final score belongs to an ORF of only 9 bp. /note=Gap/overlap: 128 bp. Somewhat large, but reasonable, as the same gap is observed in other phages (Aleemily and Morgana; DZ) and there is almost no coding potential in this gap. The only coding potential observed is some atypical coding potential on the forward strand in GeneMark Self. Again, this gene is located between several genes predicted to be on the reverse strand, so the forward-strand coding potential is unlikely to be real. /note=Phamerator: pham: 11625. Date accessed: October 25, 2024. The gene is completely conserved in Aleemily (DZ) and Morgana (DZ). /note=Starterator: Start site 2 in Starterator was manually annotated for 4/4 non-draft genes in this pham. Start 2 corresponds to 79061 in ModicumRichard. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 79061. /note=Function call: No known function. The top 2 PhagesDB hits have no known function (e-value: 8e-32), and neither do any of the 3 NCBI BLAST hits (e-value <9e-35, 100% coverage, >94.03% identity). CDD and HHpred had no significant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Vo, Cindy /note=Secondary Annotator QC: QC1 Complete; QC2 Complete CDS complement (79190 - 79699) /gene="153" /product="gp153" /function="hypothetical protein" /locus tag="ModicumRichard_153" /note=Original Glimmer call @bp 79699 has strength 10.56; Genemark calls start at 79699 /note=SSC: 79699-79190 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_152 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 1.43838E-120 GAP: 94 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.084, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_152 [Gordonia phage Cafasso] ],,QXN74367,100.0,1.43838E-120 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Holaway, Aaron /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 79699. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. In GeneMark Host, coding potential in this ORF is only on the reverse strand, indicating it is a reverse gene. In GeneMark Self, the same coding potential is observed in this ORF on the reverse strand, but some atypical background coding potential is also observed on the forward strand. This gene is located between several genes predicted to be on the reverse strand, so the forward-strand coding potential is unlikely to be real. This ORF is very likely a reverse gene. /note=SD (Final) Score: -2.505. It is the best final score on PECAAN. /note=Gap/overlap: 94 bp. Somewhat large, but reasonable, as about the same gap is observed in other phages (Aleemily also 94 bp, Morgana 95 bp; DZ) and there is almost no coding potential in this gap. The only coding potential observed is some atypical coding potential on the forward and reverse strands in GeneMark Self. Because this gap lies between several genes predicted to be on the reverse strand, the forward-strand coding potential is likely not real.
Furthermore, the coding potential on both strands is only atypical, so it is likely not real either. /note=Phamerator: pham: 193253. Date accessed: October 25, 2024. The gene is completely conserved in Aleemily (DZ) and mostly conserved among the other genes in the pham. /note=Starterator: Start site 5 in Starterator was manually annotated for 4/4 non-draft genes in this pham. Start 5 corresponds to 79699 in ModicumRichard. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 79699. /note=Function call: No known function. The top 2 PhagesDB hits have no known function (e-value: 1e-94), and neither do the top 2 NCBI BLAST hits (e-value <7e-119, 100% coverage, >98.82% identity). CDD had a significant hit to DUF6011 (e-value: 8.55e-13), described as a member of a biosynthetic gene cluster (BGC) and a domain of unknown function (this hit does not appear in PECAAN). HHpred also had a significant hit to DUF6011 (e-value: 3.1e-11, 99.16% probability), and HHpred in PECAAN reported a coverage of 98.5%. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Vo, Cindy /note=Secondary Annotator QC: QC1 Complete; QC2 Complete CDS complement (79794 - 80567) /gene="154" /product="gp154" /function="hypothetical protein" /locus tag="ModicumRichard_154" /note=Original Glimmer call @bp 80567 has strength 18.65; Genemark calls start at 80567 /note=SSC: 80567-79794 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_153 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 0.0 GAP: 96 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.177, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_153 [Gordonia phage Cafasso] ],,QXN74368,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Holaway, Aaron /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 80567. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. In GeneMark Host, coding potential in this ORF is only on the reverse strand, indicating it is a reverse gene. In GeneMark Self, the same coding potential is observed in this ORF on the reverse strand, but some atypical background coding potential is also observed on the forward strand. This gene is located between several genes predicted to be on the reverse strand, so the forward-strand coding potential is unlikely to be real. This ORF is very likely a reverse gene. /note=SD (Final) Score: -2.253. It is the best final score on PECAAN. /note=Gap/overlap: 96 bp. Somewhat large, but reasonable, as about the same gap is observed in other phages (Aleemily also 96 bp, Morgana 97 bp; DZ) and there is almost no coding potential in this gap. The only coding potential observed is some atypical coding potential on the forward and reverse strands in GeneMark Self. Because this gap lies between several genes predicted to be on the reverse strand, the forward-strand coding potential is likely not real. Furthermore, the coding potential on both strands is only atypical, so it is likely not real either.
/note=Phamerator: pham: 186523. Date accessed: October 25, 2024. This pham is well conserved, with 494 members, only 39 of which are drafts. ModicumRichard’s gene in this pham is completely conserved with phages Aleemily (DZ), Cafasso (DZ), and ObLaDi (DZ) and mostly conserved among other genes in the pham. /note=Starterator: Start site 46 in Starterator was manually annotated for 367/455 non-draft genes in this pham. Start 46 corresponds to 80567 in ModicumRichard. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 80567. /note=Function call: No known function. The top 2 PhagesDB hits have no known function (e-value: 1e-143) and so do the top 2 NCBI BLAST hits (e-value: 0, 100% coverage, >99.22% identity). CDD had a significant hit to DUF2786 (e-value: 2.60e-9), representing a family of proteins with no known function. CDD in PECAAN predicted a coverage of 15.18% and a percent identity of 46.15%. HHpred also had a significant hit to DUF2786 (e-value: 3.1e-10, 99.12% probability). HHpred in PECAAN predicted a coverage of 15.18%. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Vo, Cindy /note=Secondary Annotator QC: QC1 Complete; QC2 Complete CDS complement (80664 - 81200) /gene="155" /product="gp155" /function="hypothetical protein" /locus tag="ModicumRichard_155" /note=Original Glimmer call @bp 81200 has strength 20.66; Genemark calls start at 81200 /note=SSC: 81200-80664 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_154 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 5.61068E-128 GAP: 923 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.927, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_154 [Gordonia phage Cafasso] ],,QXN74369,100.0,5.61068E-128 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Holaway, Aaron /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start site at 81200. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. In both GeneMark Host and GeneMark Self, coding potential in this ORF is only on the reverse strand, indicating it is a reverse gene. /note=SD (Final) Score: -2.828. It is the best final score on PECAAN. /note=Gap/overlap: 923 bp. Very large gap, but it appears legitimate, as there is no typical coding potential between this gene and the next auto-identified gene (Gene(stop 82510 F)). Only some atypical coding potential is observed in this gap, and not a substantial amount. Furthermore, this large gap appears to be conserved within the cluster (Aleemily 907 bp, Morgana 1023 bp; DZ). Additionally, after this gene, gene orientation switches to the forward strand, and this gap is more than enough to accommodate the switch. /note=Phamerator: pham: 88850. Date accessed: October 25, 2024. This pham is completely conserved between ModicumRichard and ObLaDi (DZ) and mostly conserved among Aleemily (DZ), Morgana (DZ), and Cafasso (DZ). /note=Starterator: Start site 7 in Starterator was manually annotated for 4/4 non-draft genes in this pham. 
Start 7 corresponds to 81200 in ModicumRichard. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 81200. /note=Function call: No known function. The top 3 PhagesDB hits have no known function (e-value: 1e-103) and so do the top 2 NCBI BLAST hits (e-value <2e-87, 100% coverage, >70.33% identity). CDD and HHpred had no significant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Vo, Cindy /note=Secondary Annotator QC: QC1 Complete; QC2 Complete CDS 82124 - 82510 /gene="156" /product="gp156" /function="hypothetical protein" /locus tag="ModicumRichard_156" /note=Original Glimmer call @bp 82124 has strength 11.27; Genemark calls start at 82178 /note=SSC: 82124-82510 CP: no SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_155 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 4.49641E-83 GAP: 923 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.166, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_155 [Gordonia phage ObLaDi]],,UXE03878,100.0,4.49641E-83 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Holaway, Aaron /note=Auto-annotation start source: Glimmer and GeneMark. Glimmer calls the start site at 82124 and GeneMark calls the start site at 82178. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. In both, the GeneMark start does not cut off any coding potential, but the stop does: more coding potential is present beyond the stop and is not included in the ORF. Furthermore, a sharp peak of typical coding potential between this gene and its downstream gene is observed in both GeneMark Self and Host. However, this coding potential is not located within any open reading frame and is unlikely to be a separate gene or an extension of another gene. 
Additionally, the same pattern was observed in Aleemily (DZ) for Gene(stop 81823 F). Some background atypical coding potential on the reverse strand was observed in GeneMark Self, but all typical coding potential is located on the forward strand, indicating it is a forward gene. /note=SD (Final) Score: -2.276 for Start @ 82124. It is the best final score on PECAAN. /note=Gap/overlap: 923 bp for Start @ 82124; very large gap. This gap is conserved in Aleemily (907 bp; DZ). Typical coding potential is observed in this gap, but it is not within any open reading frame and is unlikely to be a separate gene or an extension of another gene. /note=Phamerator: pham: 108230. Date accessed: October 27, 2024. This pham is completely conserved in Aleemily (DZ) and ObLaDi (DZ) and mostly conserved in Cafasso (DZ). /note=Starterator: Start site 1 in Starterator was manually annotated for 3/3 non-draft genes in this pham. Start 1 corresponds to 82124 in ModicumRichard. This evidence agrees with the start site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 82124. /note=Function call: No known function. The top 2 PhagesDB hits have no known function (e-value <6e-67) and so do 3/3 NCBI BLAST hits (e-value <4e-72, 100% coverage, >94.53% identity). CDD and HHpred had no significant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Vo, Cindy /note=Secondary Annotator QC: QC1 Complete; QC2 Complete CDS 82598 - 82831 /gene="157" /product="gp157" /function="hypothetical protein" /locus tag="ModicumRichard_157" /note=Original Glimmer call @bp 82598 has strength 17.92; Genemark calls start at 82598 /note=SSC: 82598-82831 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_154 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 5.31791E-47 GAP: 87 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.417, -3.8900809529022813, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_154 [Gordonia phage Aleemily] ],,UVK59894,100.0,5.31791E-47 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Holaway, Aaron /note=Auto-annotation start source: Glimmer and GeneMark. Both Glimmer and GeneMark call the start site at 82598. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. In both, a small peak of typical coding potential is seen on the reverse strand. However, this peak is conserved in Aleemily (DZ) and is small compared to the coding potential on the forward strand. Additionally, near the stop codon there is some overlapping typical coding potential on the forward strand in another track. This is conserved in Aleemily (DZ) as well. Furthermore, this overlapping coding potential spans approximately 100 bp and covers a reading frame that was not automatically called as a gene. The same thing is observed in Aleemily (DZ), where it has been manually annotated as a gene in the Phams database. Also, some coding potential extends beyond the possible reading frame toward the start. This is also observed in Aleemily (DZ). Overall, this gene appears to be a forward gene. /note=SD (Final) Score: -3.890 for Start @82598. It is the second best final score on PECAAN. The best final score belongs to a start @ 82790, which would result in a gene only 42 bp long. /note=Gap/overlap: 87 bp for Start @ 82598; somewhat large gap. 
On the PECAAN database, this gap is conserved in Aleemily (DZ). However, in the Phams database, there is an overlap of 3 bp instead. This is more consistent with the overlapping coding potential observed in GeneMark, so the gap reported on PECAAN does not appear to be legitimate. /note=Phamerator: pham: 1781. Date accessed: October 27, 2024. This pham is completely conserved in Aleemily (DZ), ObLaDi (DZ), and Cafasso (DZ) and mostly conserved in Morgana (DZ). /note=Starterator: Start site 14 in Starterator was manually annotated for 11/52 non-draft genes in this pham. The most common manually annotated start site was 15, which was called in 18/52 non-draft genes in this pham but is not present in ModicumRichard. Start site 14 corresponds to 82598. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 82598. However, I do not believe it has a gap of 87 bp; I think it overlaps with a downstream gene that has not been automatically identified. /note=Function call: No known function. The top 2 PhagesDB hits have no known function (e-value: 3e-41) and so do the top 2 NCBI BLAST hits (e-value <4e-46, 100% coverage, >98.70% identity). CDD and HHpred had no significant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Vo, Cindy /note=Secondary Annotator QC: QC1 Complete; QC2 Complete CDS 82981 - 83229 /gene="158" /product="gp158" /function="hypothetical protein" /locus tag="ModicumRichard_158" /note=Original Glimmer call @bp 82981 has strength 14.31; Genemark calls start at 82987 /note=SSC: 82981-83229 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_158 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 1.93931E-51 GAP: 149 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.067, -2.5588120788329394, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_158 [Gordonia phage ObLaDi]],,UXE03881,100.0,1.93931E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: SUN, YANZI /note=Auto-annotation: Glimmer and GeneMark both call the gene but disagree on the start site (Glimmer at 82981, GeneMark at 82987). Since Glimmer is weighted more heavily than GeneMark, the Glimmer start site, 82981, was chosen. /note=Coding Potential: Yes, the start site covers all the coding potential for this gene. /note=SD (Final) Score: -2.559 /note=Gap/overlap: Gap of 149 bp; this start also gives the longest ORF (LORF). /note=Phamerator: The gene is part of pham 11857 as of Oct. 31, 2024. /note=Starterator: Starterator shows a highly conserved start site, number 3, which corresponds to 82981 in ModicumRichard. /note=Location call: The evidence suggests that the gene’s initial start site is correct. It gives the longest forward ORF, and the gap size is reasonable for a start site. It is likely that this is the correct start site. /note=Function call: Unknown; neither CDD nor HHpred returned any significant hits. 
/note=Transmembrane domains: No transmembrane domains. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 83216 - 83479 /gene="159" /product="gp159" /function="hypothetical protein" /locus tag="ModicumRichard_159" /note=Original Glimmer call @bp 83216 has strength 10.05; Genemark calls start at 83216 /note=SSC: 83216-83479 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBLADI_159 [Gordonia phage ObLaDi]],,NCBI, q1:s1 100.0% 4.3928E-58 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.013, -4.699386225878371, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBLADI_159 [Gordonia phage ObLaDi]],,UXE03882,100.0,4.3928E-58 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@83479 F), Glimmer start called at 83216 bp, GeneMark start called at 83216 bp. This start has a start codon of ATG. /note=Coding Potential: The gene has reasonable coding potential in the Self-Trained and Host-Trained GeneMark systems, and the start site covers all of the coding potential. The gene is 264 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the forward GeneMark system. /note=SD (Final) Score: -4.699. This final score is the highest of the listed final scores for suggested start sites. The Z score is 2.013. This Z score is the highest of the listed Z scores for suggested start sites. /note=Gap/overlap: 14 bp overlap. This is the smallest of the gaps/overlaps for listed start sites. This is not the longest ORF for the gene, but there is no coding potential suggesting the presence of another gene before the start site. /note=Phamerator: Pham 12290, run on 10/31/2024. It is conserved, found in Aleemily (DZ), ObLaDi (DZ), Morgana (DZ), and Cafasso (DZ). The function is not called. 
/note=Starterator: 3/5 non-draft members call start site 1, which does not correspond to the chosen start site 5 at 83216 for this phage. Glimmer and GeneMark’s auto-annotated start at 83216 still appears to be the most likely start, because ModicumRichard does not contain start site 1. ObLaDi also calls start site 5 for this gene. /note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 83216: both Glimmer and GeneMark call it, ObLaDi also calls start site 5, and this start has the best final and Z scores. /note=Function call: No Known Function (NKF). PhagesDB BLASTp returned top 4 hits of unknown function (E values < 1e-43). NCBI BLASTp returned top 4 hits of hypothetical proteins, found in DZ cluster phages Aleemily, ObLaDi, Cafasso, and Morgana (E values < 1e-45, coverage > 96%, identity 60-100%). CDD returned no hits. HHpred returned PCNA-associated factor as the top PDB hit (E value = 0.25) and Probable zinc-ribbon domain as the top Pfam hit (E value = 1.5). /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains (TMDs), and this is therefore not a transmembrane protein. 
/note=Secondary Annotator Name: Bouklas, Tejas /note=Secondary Annotator QC: Minor comments; add the z score, codon, whether any start sites are annotated more outside of cluster DZ CDS 83476 - 83724 /gene="160" /product="gp160" /function="hypothetical protein" /locus tag="ModicumRichard_160" /note=Original Glimmer call @bp 83476 has strength 17.1; Genemark calls start at 83476 /note=SSC: 83476-83724 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_158 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 1.23621E-51 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.246, -2.17469248771465, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_158 [Gordonia phage Aleemily] ],,UVK59897,100.0,1.23621E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Both Glimmer and GeneMark agree on start site 83476; the start codon is GTG. /note=Coding Potential: Good coding potential in the predicted ORF in both host-trained and self-trained GeneMark. The predicted start site covers the entirety of this coding potential. /note=SD (Final) Score: The chosen start has the best final score (-2.175), the least negative of all options, and the best Z-score (3.246), which is well above 2. /note=Gap/overlap: No gap; a 4 bp overlap, indicative of an operon. The gene length given this start is acceptable: 249 bp. /note=Phamerator: As of 11/12/24, the gene is in Pham 88945. This pham has only 4 non-draft phages, all of cluster DZ. The pham is conserved across cluster DZ, as shown by Aleemily, Cafasso, Morgana, and ObLaDi. /note=Starterator: Reasonable start site that is conserved among phages in the pham: Start 3, called by 4 of 4 non-draft phage annotations. 
In this phage, this corresponds to Start 3 @83476, which has 4 MAs. /note=Location call: Based on the current evidence, this gene is a real gene with good coding potential, final/Z-score, and synteny with other phages. The auto-annotated start site 83476 appears correct based on the same evidence, and Starterator/Phamerator agree as well (with only 4 other phages to compare to, this is not the strongest evidence, but it agrees with all other data). /note=Functional call: No known function. PhagesDB BLAST and NCBI BLASTp returned no known function/hypothetical protein with strong e-values, CDD returned no results, and HHpred results were uninformative (poor e-values). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 83726 - 84076 /gene="161" /product="gp161" /function="hypothetical protein" /locus tag="ModicumRichard_161" /note=Original Glimmer call @bp 83726 has strength 13.68; Genemark calls start at 83726 /note=SSC: 83726-84076 CP: yes SCS: both ST: SS BLAST-Start: [DUF4262 domain-containing protein [Mycobacterium sp. JS623] ],,NCBI, q23:s176 81.0345% 1.46256E-10 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.435, -4.301170562178417, yes F: hypothetical protein SIF-BLAST: ,,[DUF4262 domain-containing protein [Mycobacterium sp. JS623] ],,WP_015298249,21.3235,1.46256E-10 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@84076 F). Both Glimmer and GeneMark call the start site at 83726. /note=Coding Potential: Coding potential in this ORF is on the forward strand (second frame) only, indicating that this is a forward gene. The ORF has reasonable coding potential in both GeneMark Self and Host. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.301. It is the best score on PECAAN. The Z-score is also the highest, at 2.435. 
/note=Gap/overlap: 1 bp gap. This is the smallest gap and the most reasonable option, since it is not too large and there is no overlap. Additionally, this start site is agreed upon by Glimmer and GeneMark, and this gap is conserved in other phages, such as Cafasso and ObLaDi. /note=Phamerator: The pham number as of 11/4/2024 is 193896. The gene is conserved in phages Aleemily, Cafasso, and Morgana, all in the same cluster as ModicumRichard (DZ). The function call for the gene in these phages is unknown. /note=Starterator: There are 83 non-draft genes in the pham. 25/83 non-draft members call start site 36. However, the auto-annotated start site for ModicumRichard is 28 at 83726 bp, which has only 3 MAs. This auto-annotated start site disagrees with the start site with the most manual annotations in ModicumRichard, which is 31 at 83753 bp with 14 MAs. Furthermore, ModicumRichard does not have start site 36, which is called in 25 others. Despite this mismatch, other phages in the same cluster, such as ObLaDi and Cafasso, call start site 28, and both Glimmer and GeneMark call the start site at 83726 bp. The start site may be suggestive of an operon. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 83726 bp. Starterator partially agrees with Glimmer and GeneMark. Starterator states that the start number called the most often in the published annotations is 36 (25/88 non-draft genes in the pham), but ModicumRichard does not have this “Most Annotated” start. Instead, the auto-annotated start for ModicumRichard is 28 at 83726 bp. This start site, rather than start site 36 or 31 (the one with the most MAs), agrees with Glimmer and GeneMark. Furthermore, the start site in Starterator is conserved across most homologues, just at different locations. Starterator is informative but does not provide any new information. 
/note=Functional call: NKF. All PhagesDB BLAST hits have no known function (e-value < 8e-6), except for one hit whose function was called as MazG-like nucleotide pyrophosphohydrolase (score 50, e-value 2e-6). The best hits all have unknown functions. NCBI BLAST returned hits with unknown function (81.8966%+ coverage, 40%+ identity, and e-value < 5e-06) or with the function called as DUF4262 domain-containing protein (e-values < 2e-06, 37%+ identity), such as sequence IDs WP_015298249.1 and WP_302285300. There were no hits on CDD, and HHpred did not return any good hits, as they were all green and blue (low probability of < 77.99% and high e-value of > 34). NCBI BLAST found no significant similarity between the amino acid sequence of ModicumRichard and that of the phage protein with the MazG-like nucleotide pyrophosphohydrolase function. Additionally, DUF4262 domain-containing protein is not listed on the SEA-PHAGES Functional Assignments sheet, and no significant similarity was found between the sequence of the corresponding phage, Librie_53, and that of ModicumRichard. As a result, none of the hits were relevant, and the functional call is NKF. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Bouklas, Tejas /note=Secondary Annotator QC: No comments. Good work! Would add that the start site is suggestive of an operon. 
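The GAP values reported in these records can be reproduced from the coordinates alone. The sketch below is a minimal illustration, not PECAAN's own code: it assumes, based on the values reported above rather than on PECAAN documentation, that a forward gene's gap is its start minus the upstream gene's stop minus 1, and that a reverse (complement) gene's gap is measured from its start (the higher coordinate) to the lower coordinate of the next gene. Negative values are overlaps. The helper name gap_bp is hypothetical.

```python
def gap_bp(left_gene_end: int, right_gene_start: int) -> int:
    """Bases strictly between two adjacent genes; a negative result is an overlap.

    left_gene_end:    higher coordinate of the left-hand gene
    right_gene_start: lower coordinate of the right-hand gene
    (Hypothetical helper; coordinates are 1-based and inclusive.)
    """
    return right_gene_start - left_gene_end - 1


# Forward genes: gap = this gene's start - upstream gene's stop - 1.
forward = {
    "157": gap_bp(82510, 82598),  # gp156 stop -> gp157 start
    "158": gap_bp(82831, 82981),
    "159": gap_bp(83229, 83216),
    "160": gap_bp(83479, 83476),
    "161": gap_bp(83724, 83726),
}

# Reverse (complement) genes, assumed convention: this gene's start, which is
# its higher coordinate, to the lower coordinate of the next gene.
reverse = {
    "154": gap_bp(80567, 80664),  # gp154 start -> gp155 lower end
    "155": gap_bp(81200, 82124),  # gp155 start -> gp156 start
}

for gene, gap in {**forward, **reverse}.items():
    kind = "overlap" if gap < 0 else "gap"
    print(f"gp{gene}: {abs(gap)} bp {kind}")
```

Under this convention, the frequent -4 bp value means the upstream gene's TGA stop codon shares bases with the downstream start codon (for example, the sequence ATGA or GTGA), which is why a 4 bp overlap is read as a sign of an operon.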
CDS 84073 - 84393 /gene="162" /product="gp162" /function="hypothetical protein" /locus tag="ModicumRichard_162" /note=Original Glimmer call @bp 84073 has strength 10.39; Genemark calls start at 84073 /note=SSC: 84073-84393 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_160 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 3.04017E-71 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.927, -4.371751636464124, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_160 [Gordonia phage Aleemily] ],,UVK59899,100.0,3.04017E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: BURKE, KAIDEN /note=Auto-annotation: Glimmer and GeneMark both call this gene and agree on the start codon at position 84073. This start codon is an ATG. /note=Coding Potential: Host-trained and self-trained GeneMark both give the first ORF of the forward strand very high coding potential. I believe the start site fits well within this coding potential and matches the Glimmer/GeneMark predictions. /note=SD (Final) Score: -4.372 is the final score, with a Z-score of 2.927 (codon is ATG), and it is by far the best of the start sites with a reasonable gap. The rest have large gaps and less reasonable scores. /note=Gap/overlap: The start codon at 84073 gives the most reasonable gap, an overlap consistent with an operon, while the other start sites have large gaps, making them very unreasonable given the coding potential they miss. /note=Phamerator: As of 10/18/24, this gene is in pham 7265. The start number called most often in the published annotations is 6; it was called in 8 of the 9 non-draft genes in the pham. The members of the cluster share this site, and it matches the auto-annotated site. /note=Starterator: This gene is very conserved in the pham. All 4 other phages in the cluster have manual annotations at their equivalents of start site 84073, giving more evidence that this is the correct site. 
/note=Location call: This is a real gene that very likely starts at the start codon 84073, given its overlap (a sign of an operon) and high RBS score. The other members of the pham and cluster also follow this start site pattern, and synteny is maintained around this gene with this start site. /note=Function call: There is no known function. The top 2 NCBI and PhagesDB BLASTp hits have very low (strong) e-values (< 1e-66) and high identity (100% and > 95%). There were no CDD hits, and all HHpred hits had high e-values (> 10), which provides no evidence of the role of the protein. /note=Transmembrane domains: DeepTMHMM found no likely TMDs. This cannot help determine function because nothing is known about function. /note=Secondary Annotator Name: Bouklas, Tejas /note=Secondary Annotator QC: Good work! Minor comments; add the z score, codon, the ending sentence of starterator has some typos or confusing to read, remove the spaces between the lines as we need these to be consistent because of how it gets processed at submission to HHMI. /note=Do not fill synteny for NKF calls CDS 84390 - 84686 /gene="163" /product="gp163" /function="hypothetical protein" /locus tag="ModicumRichard_163" /note=Original Glimmer call @bp 84390 has strength 8.93; Genemark calls start at 84390 /note=SSC: 84390-84686 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEEMILY_161 [Gordonia phage Aleemily] ],,NCBI, q1:s1 100.0% 1.02009E-64 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.329, -6.100660902441702, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_161 [Gordonia phage Aleemily] ],,UVK59900,100.0,1.02009E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Su, Erin /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree on a start site of 84390. The Glimmer score is 8.93. /note=Coding Potential: The gene has coding potential on the forward strand, as indicated by GeneMark Self and GeneMark Host. 
The start site of 84390 encompasses all of this coding potential. /note=SD (Final) Score: The final score for this start site is -6.101 and the Z-score is 1.329. This is neither the longest ORF nor the highest Z or final score, but the gap is the most ideal (-4 bp). The start codon is GTG; all of the top candidate start sites also use GTG. /note=Gap/overlap: The gap is -4 bp, indicating an overlap of 4 bp, which suggests the gene may be part of an operon. The other possible start sites had overlaps greater than 45 bp and gaps greater than 40 bp, so 84390 is the best possible start site. Because of this likely operon arrangement, the start site does not need a high RBS score. /note=Phamerator: The pham number as of 11/15/2024 is 8363. There are 6 other non-draft genes in this pham. Four correspond to genes from the other four cluster DZ Gordonia phages: Morgana, Cafasso, ObLaDi, and Aleemily. The two others are from Hannaconda and KashFlow in cluster J. Each of the 5 DZ genes is 297 bp long, and each of the 2 J genes is 309 bp long. /note=Starterator: The four other non-draft genes from phages in cluster DZ (Aleemily_161, ObLaDi_163, Morgana_170, Cafasso_165) all call start site 10, which corresponds to start site 84390 for ModicumRichard. The two genes from phages in cluster J (KashFlow_198, Hannaconda_197) both call start site 8. The Starterator analysis was last run on 11/02/24 on database version 579. /note=Location call: Based on the above evidence, this is a real gene and the start site is 84390. /note=Function call: No known function. PhagesDB BLAST’s top 4 hits (Aleemily_161, ObLaDi_163, Morgana_170, Cafasso_165) have E-values below 10^-50, but their functions are not known. The next two hits (KashFlow_198, Hannaconda_197) have E-values below 10^-18 but no known function either. NCBI BLAST’s top three hits have 85%+ identity, 100% coverage, and E-values below 10^-58, but are all hypothetical proteins as well. The HHpred hits were poor. 
The top hit had a probability of 90%+ but an E-value greater than 10^-1 and less than 50% coverage, so it is not strong enough evidence to call a function. CDD had no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so this is not a transmembrane protein. /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 85036 - 85746 /gene="164" /product="gp164" /function="PAPS reductase-like domain" /locus tag="ModicumRichard_164" /note=Original Glimmer call @bp 85036 has strength 14.72; Genemark calls start at 85036 /note=SSC: 85036-85746 CP: no SCS: both ST: NI BLAST-Start: [PAPS reductase-like domain protein [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 2.31229E-172 GAP: 349 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.859, -2.90473872338446, yes F: PAPS reductase-like domain SIF-BLAST: ,,[PAPS reductase-like domain protein [Gordonia phage Cafasso] ],,QXN74381,100.0,2.31229E-172 SIF-HHPRED: c.26.2.2 (A:) Phosphoadenylyl sulphate (PAPS) reductase {Escherichia coli [TaxId: 562]},,,d1sura_,79.2373,99.9 SIF-Syn: There is synteny; upstream and downstream gene phams are conserved in at least two non-draft genes. /note=Primary Annotator Name: Bouklas, Tejas /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree on a start site of 85036. /note=Coding Potential: The gene has coding potential on the forward strand, as indicated by GeneMark Self and GeneMark Host. The start site of 85036 encompasses all of this coding potential. /note=SD (Final) Score: The final score for this start site is -2.905 and the Z-score is 2.859. This is not the longest ORF, but it has the highest Z and final scores. The start codon is TTG, which is less common. /note=Gap/overlap: The gap is large (349 bp), but the other start options give comparable or larger gaps. /note=Phamerator: The pham number as of 6/7/2025 is 9427. 
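Gene lengths in these records can be checked the same way. With inclusive coordinates, a CDS spans stop - start + 1 bases on either strand, and a complete CDS, stop codon included, should be a whole number of codons. The sketch below is a minimal illustration using a hypothetical helper, cds_length, and coordinates taken from the records above.

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS with 1-based inclusive coordinates (either strand).

    Hypothetical helper: abs() lets complement genes be passed as (start, stop)
    with the start at the higher coordinate.
    """
    return abs(stop - start) + 1


genes = {
    "159": (83216, 83479),
    "160": (83476, 83724),
    "163": (84390, 84686),
    "164": (85036, 85746),
    "154": (80567, 79794),  # complement: start is the higher coordinate
}

for gene, (start, stop) in genes.items():
    length = cds_length(start, stop)
    # A complete CDS, stop codon included, is a whole number of codons.
    assert length % 3 == 0, f"gp{gene} length {length} is not a multiple of 3"
    # Protein length excludes the stop codon.
    print(f"gp{gene}: {length} bp, {length // 3} codons ({length // 3 - 1} aa)")
```

For gp160 (83476 - 83724) this gives 249 bp (83 codons), and the same helper gives 42 bp for the alternative gp157 start at 82790 with its stop at 82831.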
CDS 85743 - 86438 /gene="165" /product="gp165" /function="hypothetical protein" /locus tag="ModicumRichard_165" /note=Original Glimmer call @bp 85743 has strength 14.72; Genemark calls start at 85743 /note=SSC: 85743-86438 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_ALEEMILY_163 [Gordonia phage Aleemily]],,NCBI, q1:s1 100.0% 1.07881E-167 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.828, -4.574171834242154, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEEMILY_163 [Gordonia phage Aleemily]],,UVK59902,99.5671,1.07881E-167 SIF-HHPRED: SIF-Syn: /note=Glimmer and GeneMark agree on start site 85743, which is conserved in other non-draft phages. This start gives a 4 bp overlap, likely indicating an operon. No known function based on Starterator. CDS 86542 - 86688 /gene="166" /product="gp166" /function="hypothetical protein" /locus tag="ModicumRichard_166" /note=Original Glimmer call @bp 86542 has strength 5.44; Genemark calls start at 86542 /note=SSC: 86542-86688 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_168 [Gordonia phage Cafasso] ],,NCBI, q1:s1 100.0% 5.3411E-22 GAP: 103 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.007, -3.9067510144203146, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_168 [Gordonia phage Cafasso] ],,QXN74383,100.0,5.3411E-22 SIF-HHPRED: SIF-Syn: /note=Glimmer and GeneMark agree on start site 86542, which is conserved in other non-draft phages. No known function based on Starterator.