CDS 1 - 372 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="ObLaDi_1" /note=Original Glimmer call @bp 1 has strength 13.93; Genemark calls start at 1 /note=SSC: 1-372 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.32525E-82 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.758, -3.170856472917156, no F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Gordonia phage Cafasso]],,QXN74216,100.0,4.32525E-82 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,76.4228,99.8 SIF-Syn: Small terminase subunit protein is flanked by a helix-turn-helix DNA binding domain, just like in phage Cafasso. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agree on the same start site (1). The start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The start site covers all of the predicted coding potential. /note=SD (Final) Score: -3.171. It is the second best final score on PECAAN. /note=Gap/overlap: There is no gap or overlap. /note=Phamerator: This gene is in Pham 32951 as of 10/26/21. Our phage is in cluster DZ, and there is 1 non draft gene (Cafasso_1) that also has this pham. The other non draft gene (VanLee_1) also has this pham. Phamerator did not have a function called for this gene, but the function for this same gene in Cafasso is terminase small subunit. /note=Starterator: Yes, start site 1 at basepair position 1 is conserved among other members of the pham to which this gene belongs. 2/2 non draft genes in this pham call this site 1. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 1 at basepair position 1 is most likely the true start site. 
/note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggested function is terminase small subunit, with high query coverage (>98%), medium to high % identity (100%, 47%), and low e-values (<1e-26). CDD and HHpred hits had high probability of >99%, high coverage of >65%, and low e-values that met the <10e-3 threshold. The suggested function is also terminase small subunit. /note=Transmembrane domains: Since TMHMM and TOPCONS didn’t call at least 1 TMD, we can conclude that this protein doesn’t have any TMDs. This makes sense because this gene codes for a terminase subunit, which packages phage genome. /note=Secondary Annotator Name: Batteikh, Maysaa /note=Secondary Annotator QC: I agree with the start site called and the evidence presented. All of the evidence categories have been considered. CDS 374 - 967 /gene="2" /product="gp2" /function="helix-turn-helix DNA binding domain" /locus tag="ObLaDi_2" /note=Original Glimmer call @bp 374 has strength 19.61; Genemark calls start at 431 /note=SSC: 374-967 CP: yes SCS: both-gl ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 8.0951E-141 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.442, -3.8970005018578204, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,QXN74217,100.0,8.0951E-141 SIF-HHPRED: a.4.1.2 (C:) HIN recombinase (DNA-binding domain) {Synthetic},,,d1ijwc_,21.3198,97.7 SIF-Syn: Cafasso_2 is part of Pham 71137 with ObLaDi_2 and is annotated to be a helix-turn-helix DNA binding domain. Cafasso_1 and ObLaDi_1 are both part of Pham 32951, and Cafasso_1 is annotated to be a terminase small subunit. Finally, Cafasso_3 and ObLaDi_3 are part of Pham 13912, and Cafasso_3 is annotated to be a terminase large subunit. This synteny is still seen further downstream many genes. 
/note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Glimmer calls the gene and predicts its start site to be at 374 bp. GeneMark also calls the gene, but predicts its start site to be at 431 instead. /note=Coding Potential: The ORF has strong coding potential, which is seen in both start sites called by Glimmer and GeneMark. The start site @ 374 bp would capture all of the coding potential, whereas the start site @ 431 bp would not. /note=SD (Final) Score: -3.897. Best final score listed. /note=Gap/overlap: 1 bp gap upstream, 4 bp overlap (ATGA) downstream /note=Phamerator: It is part of pham 71137 as of 10/22/2021. Pham 71137 has 22 members, 16 of which are non-draft. Cafasso (DZ) is non-draft and calls the gene. Aleemily is a draft but also calls it. /note=Starterator: Start 5 is found in 3/22 of genes in pham 71137. Cafasso calls Start 5 as well and was manually annotated to it; Aleemily (Draft) also calls it. Start 5 corresponds to 374 bp for ObLaDi. This agrees with Glimmer’s predicted start site, but not GeneMark (431 bp). /note=Location call: Based on the evidence above, I would say that this is a real gene and its start site is @ 374 bp. I chose the start site @ 374 bp because it captures all coding potential predicted by GeneMark. /note=Function call: Helix-turn-helix DNA binding domain protein. The top two hits on PhagesDB BLAST encode a helix-turn-helix DNA binding domain protein, with very strong similarity in Cafasso_2, and ~ 40% similarity in VanLee_2. NCBI BLASTp also has similar results that further corroborate our findings from PhagesDB BLAST. PhagesDB Function Frequency box also mentions that some similar genes encode a minor tail protein, but I am placing greater weight on our hit with Cafasso_2 due to Cafasso being in the same cluster as ObLaDi. 
From our HHpred hits, it seems like we would annotate it as a serine homologous recombinase because it has a hit with a Hin recombinase, which belongs to the serine homologous recombinase family. However, a closer look shows that Hin recombinase also contains a helix-turn-helix DNA binding domain. Looking back at our other evidence, such as our BLASTp hits and Cafasso_2, with which ObLaDi_2 shares synteny, I would call it a helix-turn-helix DNA binding domain. /note=Transmembrane domains: TMHMM did not predict any transmembrane domains (TMD). TOPCONS did not either. This suggests that this gene product is not a transmembrane protein. /note=Secondary Annotator Name: Ma, Yiwen /note=Secondary Annotator QC: Good notes! I agree with your location call and function call. Your notes are really detailed and well-organized. 
CDS 964 - 2316 /gene="3" /product="gp3" /function="terminase, large subunit" /locus tag="ObLaDi_3" /note=Original Glimmer call @bp 964 has strength 13.95; Genemark calls start at 964 /note=SSC: 964-2316 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.504, -4.535884094130711, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Gordonia phage Cafasso]],,QXN74218,100.0,0.0 SIF-HHPRED: Large subunit terminase; large terminase, VIRAL PROTEIN; 2.2A {Deep-sea thermophilic phage D6E},,,5OE8_B,93.5556,100.0 SIF-Syn: Terminase large subunit, upstream gene is a helix-turn-helix DNA binding domain protein and downstream gene is portal protein, just like in phage Cafasso. /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 964. /note=Coding Potential: The ORF has good coding potential and is found on both GeneMark Self and Host. This is a forward gene as coding potential is only found on the forward strand. 
/note=SD (Final) Score: -4.536 /note=Gap/overlap: -4 bp /note=Phamerator: This is listed in pham 13912. Date: 10/31/21. /note=Starterator: Start site 19 in Starterator was manually annotated in 1/33 non-draft genes. Start 19 is 964 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 964 /note=Function call: Terminase large subunit. Many PhagesDB BLAST hits have the function of terminase large subunit with low e-values and high scores. Many NCBI BLAST hits also have this result. Many top hits in CDD and HHpred also list it as a terminase large subunit with probabilities of 100%, 90%+ coverage, and low e-values. /note=Transmembrane domains: TMHMM and TOPCONS do not call any transmembrane domains. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Given the good coding potential, the minimal overlap, and a start site that is conserved in other phages and agrees with the other algorithms, I agree with the primary annotator that the start site is 964. 
CDS 2279 - 3949 /gene="4" /product="gp4" /function="portal protein" /locus tag="ObLaDi_4" /note=Original Glimmer call @bp 2279 has strength 12.75; Genemark calls start at 2279 /note=SSC: 2279-3949 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Streptomyces phage Galactica]],,NCBI, q16:s3 92.446% 8.62267E-116 GAP: -38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.651, -3.4685663819143713, no F: portal protein SIF-BLAST: ,,[portal protein [Streptomyces phage Galactica]],,QJD53951,56.0521,8.62267E-116 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: HG, CA; 3.4A {BACTERIOPHAGE SPP1},,,2JES_A,85.7914,100.0 SIF-Syn: In both Cafasso and ObLaDi, gene 4 has a function of portal protein, while upstream gene 3 has a function of terminase large subunit and downstream gene 5 has a function of MuF-like minor capsid protein. 
/note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: Predicted by both GeneMark and Glimmer. Both predict a start site of 2279. /note=Start codon: TTG /note=Coding Potential: The gene has reasonable coding potential within the putative ORF, and the chosen start site covers all of this coding potential. /note=SD (Final) Score: -3.469. This is the second-highest final score. /note=Gap/overlap: -38 (original start site). This is a large overlap, but this start site creates the largest ORF and has the second-highest SD and Z-score. The next location with the absolute lowest gap value has a value of positive 64 with much lower SD and Z-scores. The length of the gene is reasonable at 1671 base pairs. /note=Phamerator: As of 10/25/21, this gene is in the pham 8910. All other phages of the cluster DZ (Cafasso, Draft Aleemily) contain this gene in this phamily. PhagesDB consistently (among all genes in this phamily with calls) calls the function of this gene to be a portal protein, although no call exists in Phamerator. This gene function exists in the approved function list. /note=Starterator: Start site 25 is conserved in 14/33 (42.4%) of the non-draft genes in this phamily. My chosen start site, 8, is only chosen 16.7% of the time when present but is called in (only) all three phages of the subcluster DZ (including drafts). This corresponds to the base pair 2279. Overall, only 1 of 33 non-draft genomes call this start site, but 3 of the phages of the subcluster DZ call it. /note=Location call: The evidence suggests that this is a real gene with a start site of 2279 and a stop site of 3949. /note=Function call: The top 12 hits (including bacteria) on NCBI BLASTp all have e-values of 0, with coverage ranging from 100-89%, and each except one calls this gene to be a portal protein (one calls this gene to be a "hypothetical protein"). Identity ranged from 99.82% to 51.86%. 
The top 2 phage-related hits on BLASTp not including phage results from PhagesDB also call this gene to be a portal protein. The PhagesDB Function Frequency calls this gene to primarily be a portal protein. Furthermore, PhagesDB BLAST shows many sequences with high coverage, significantly low e-values, and portal protein functions. CDD has one hit for this gene with a function of being a phage portal protein and with an e-value of 7.48e-08. Hhpred also calls this gene to be a portal protein and gives it a probability of 100 and an e-value of 4.5e-31. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Montoya Serpas, Cinthya /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 3924 - 4982 /gene="5" /product="gp5" /function="minor capsid protein" /locus tag="ObLaDi_5" /note=Original Glimmer call @bp 3924 has strength 10.29; Genemark calls start at 3924 /note=SSC: 3924-4982 CP: yes SCS: both ST: SS BLAST-Start: [MuF-like minor capsid protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -26 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.59, -3.513874779476501, no F: minor capsid protein SIF-BLAST: ,,[MuF-like minor capsid protein [Gordonia phage Cafasso]],,QXN74220,100.0,0.0 SIF-HHPRED: Phage_min_cap2 ; Phage minor capsid protein 2,,,PF06152.13,94.6023,100.0 SIF-Syn: Minor capsid protein; upstream gene is portal protein, downstream gene is scaffolding protein, as in phage Cafasso. /note=Primary Annotator Name: Daria Di Blasi /note=Auto-annotation: Both Glimmer and GeneMark show the same start site at 3924 with start codon ATG /note=Coding Potential: coding potential on both host-trained and self-trained GeneMark; the start site covers all of the coding potential within the ORF. /note=SD (Final) Score: -3.514; most favorable final score of all potential starts. 
/note=Gap/overlap: 26 bp overlap with upstream gene & 134 bp gap with downstream gene; this start site produces the smallest overlap (26 bp) with the upstream gene of all the potential start sites. The length of the gene (1059 bp) is reasonable. /note=Phamerator: The gene is part of pham 80974 as of October 24th, 2021. The pham has 31 members, 4 of which are draft genomes. Pham 80974 is present in 3 phages in the DZ cluster (ObLaDi_Draft, Aleemily_Draft, and Cafasso). /note=Starterator: The highly conserved start site (start site 10) is not present in the ObLaDi genome. The most conserved start site (10) is called in 19 of the 25 non-draft genes in the pham. Start site 16 corresponding to base pair 3924 is called for phages in the DZ cluster (Cafasso and Aleemily_Draft). /note=Location call: The evidence suggests that the auto-annotated start site (start site 16) is the correct start because the start site is highly conserved among members of the DZ cluster, is called in a non-draft genome (Cafasso), Glimmer and GeneMark agree on this start site, the start site includes all of the coding potential, the predicted start has the highest RBS score, and this start site produces the smallest overlap with the upstream gene (26 bp overlap). The 26 bp overlap is something to be wary of yet this start site produces the smallest overlap of all the potential starts and also is the closest start site to the most conserved site (site 10). 
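The gene lengths quoted in these notes (1059 bp for gene 5, 1671 bp for gene 4) follow from the 1-based, inclusive CDS coordinates. The following is a minimal sketch of that arithmetic, assuming forward-strand genes; the helper names `orf_length` and `codon_count` are illustrative only and are not part of PECAAN or DNA Master.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward ORF given 1-based, inclusive coordinates."""
    if stop < start:
        raise ValueError("stop must not be smaller than start for a forward gene")
    return stop - start + 1


def codon_count(start: int, stop: int) -> int:
    """Number of codons (including the stop codon) in the ORF.

    Raises ValueError if the length is not a multiple of 3, which would
    suggest that the coordinates are wrong.
    """
    length = orf_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


# Coordinates taken from the CDS lines for genes 4 and 5 in this file.
gene4_length = orf_length(2279, 3949)
gene5_length = orf_length(3924, 4982)
gene5_codons = codon_count(3924, 4982)
```

A length that is not divisible by 3 is a quick signal that a start or stop coordinate was mistyped.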
/note=Function call: Since the first 2 PhagesDB BLASTp hits called the gene product a MuF-like minor capsid (E-value = 0.0, E-value = e-110), the first 2 NCBI BLASTp hits called the gene product a MuF-like minor capsid (E-value = 0.0, E-value = 9e-134), the only CDD hit called the conserved domain part of a Phage minor capsid protein 2 (E-value = 8.09e-30), and the best 2 HHpred hits called the conserved domain part of a Phage minor capsid protein 2 (E-value = 6.4e-47) and Phage Mu protein F like protein (E-value = 5.9e-11), there is evidence to support that the function of this gene is a minor capsid protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with auto-annotation. All the evidence is there. 
CDS 5117 - 5893 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="ObLaDi_6" /note=Original Glimmer call @bp 5117 has strength 13.79; Genemark calls start at 5120 /note=SSC: 5117-5893 CP: yes SCS: both-gl ST: NI BLAST-Start: [scaffolding protein [Gordonia phage Cafasso]],,NCBI, q2:s1 99.6124% 0.0 GAP: 134 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.563, -3.586741874491512, no F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Gordonia phage Cafasso]],,QXN74221,100.0,0.0 SIF-HHPRED: Phage_GP20 ; Phage minor structural protein GP20,,,PF06810.13,60.8527,97.9 SIF-Syn: Scaffolding protein, upstream gene is minor capsid protein, downstream gene is major capsid protein, just like in phage Cafasso /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Glimmer and Genemark both predict a start site, but the predicted sites are different. Glimmer calls the start site at nucleotide 5117 whereas Genemark predicts the site to be at nucleotide 5120. Glimmer's predicted start codon is GTG whereas Genemark's predicted start codon is ATG. 
/note=Coding Potential: Good coding potential is predicted within the putative ORF. Both of the predicted start sites called by Glimmer and Genemark cover all predicted coding potential. /note=SD (Final) Score: The SD Score for a start at bp 5117 is -3.587. This is not the best final score, but it is the second-best and still reasonable. /note=Gap/overlap: When evaluating using a start site at bp 5117, the upstream gap of 134 bp is minimized but still just below the threshold for being unreasonable. This may suggest that a new gene needs to be added upstream. The length of the gene is reasonable. /note=Phamerator: As of 10/23/2021, this gene is in pham 33262. 3 of the 4 members in the pham, including Gene (stop@5893 F) for ObLaDi, belong to the same cluster (DZ), and the fourth member is a singleton. Cafasso_6 is a non-draft gene in cluster DZ while Aleemily_Draft_6 is a draft gene in cluster DZ. VanLee_7 is a singleton. A function was not called for this gene, but 2 of the 4 members of the pham (Cafasso_6 and VanLee_7) function as scaffolding proteins. These genes have been manually annotated, so there may be some evidence that Gene (stop@5893 F) has the same function. /note=Starterator: Two of the four genes in Starterator are draft genes, so Starterator was not very informative. However, start #2 is conserved in 1/2 manually annotated genes. Start #1 is called in 2/4 members of the pham if looking at both auto-annotated and manually annotated genes. Start #1 is at bp 5117 and start #2 is at bp 5120. Overall, there does not appear to be a clearly agreed-upon start site based on Starterator. /note=Location call: Gene (stop@5893 F) is a real gene. Altogether, the evidence suggests a start @ bp 5117. This call takes into account both Starterator evidence as well as evidence that bp 5117 provides the longest ORF and has a reasonable (second-best) Final Score. Start #1 also minimizes the upstream gap. 
/note=Function call: The top 4 hits from PhagesDB BLAST and NCBI BLAST suggest that this gene functions as a scaffolding protein, with a high identity for Cafasso (100%) and low e-values across the board (≤ 1e-31). BLAST also had high coverage hits (> 99%). For HHpred, the top hit also has good coverage and a pretty high probability with a low e-value. All of the databases’ top results agree that it is, at the very least, a structural protein, with all except for the top HHpred hit agreeing on scaffolding protein. CDD had no relevant hits. /note=Transmembrane domains: No transmembrane domains predicted. This makes sense because scaffolding proteins are likely to interact with viral proteins instead of the bacterial cell membrane. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 5963 - 6880 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="ObLaDi_7" /note=Original Glimmer call @bp 5963 has strength 17.45; Genemark calls start at 5963 /note=SSC: 5963-6880 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.005, -2.7423981586358774, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Gordonia phage Cafasso]],,QXN74222,100.0,0.0 SIF-HHPRED: Gene 5 protein; phage, Sf6, VIRUS; 2.89A {Shigella phage Sf6},,,5L35_B,97.3771,100.0 SIF-Syn: Major capsid protein. Located downstream of scaffolding protein, just like in Cafasso /note=Primary Annotator Name: Maisam Ghannam /note=Auto-annotation: Both Glimmer and Genemark call the same start site (5963), and the Glimmer score is 17.45. The start codon is ATG (methionine). /note=Coding Potential: Coding potential does exist within the putative ORF. The start site covers all the coding potential analyzed via Host and Self Trained Genemark. 
/note=SD (Final) Score: -2.742, best SD score listed /note=Gap/overlap: 69 bp gap between previous gene and start site for current gene. No evidence suggests there is coding potential within this gap to relocate start site, despite being over the typical 50 bp gap threshold. /note=Phamerator: As of 10/22/21, gene was found in pham #57253 with 218 members. Pham had highly conserved genes that were found in the rest of the phages within the same cluster, specifically Aleemily and Cafasso. Function call was for a PnuC-like nicotinamide riboside transporter. /note=Starterator: Starterator depicts highly conserved start site (7 @ 5963) with 148 human reviewed confirmations that share this site. All phages in cluster DZ agree on common start site and were cross referenced on Phamerator for synteny alignment. /note=Location call: This is a real gene with a start site 7 @ 5963, contains highly conserved gene sequence as verified via both Host and Self Trained Genemark. No missing gene required for area preceding start site, as no computable coding potential was discovered. /note=Function call: major capsid protein. HHpred, PhagesDB, NCBI confirm high probability amount (>99%) of gene belonging to superfamily 6 of major capsid proteins, Shares high identity with human annotated Cafasso, which lies in the same sub cluster DZ as ObLaDi. /note=Transmembrane domains: None. This correlates with low Exp and Total Prob quantities in TMHMM (both <.02) along with no predicted TMSs. This aligns with gene functioning as major capsid protein. Capsid does not need to pass through PM layer of host, only genomic information does. /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: Need to fill out the Starterator and Coding Potential drop downs. Other than that, everything looks good! Maybe add a bit more to your gap analysis and how 69 bp is just above the 50 bp threshold. 
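The gap and overlap values reported in the notes above can be reproduced from the CDS coordinates alone. The sketch below assumes forward-strand genes listed in genome order and uses the 50 bp gap threshold mentioned in the gene 7 notes; the names `upstream_gap`, `upstream_gaps`, and `GAP_THRESHOLD_BP` are illustrative and do not come from PECAAN.

```python
# (gene number, start, stop) copied from the CDS lines for genes 1-7.
CDS_FEATURES = [
    (1, 1, 372),
    (2, 374, 967),
    (3, 964, 2316),
    (4, 2279, 3949),
    (5, 3924, 4982),
    (6, 5117, 5893),
    (7, 5963, 6880),
]

# Threshold quoted in the gene 7 notes; treated here as an assumption.
GAP_THRESHOLD_BP = 50


def upstream_gap(prev_stop: int, start: int) -> int:
    """Bases between two consecutive genes; negative values are overlaps."""
    return start - prev_stop - 1


def upstream_gaps(features):
    """Map each gene number (except the first) to its upstream gap in bp."""
    gaps = {}
    for (_, _, prev_stop), (gene, start, _) in zip(features, features[1:]):
        gaps[gene] = upstream_gap(prev_stop, start)
    return gaps


gaps = upstream_gaps(CDS_FEATURES)
# Genes whose upstream gap exceeds the threshold and may deserve a closer look.
large_gaps = [gene for gene, gap in gaps.items() if gap > GAP_THRESHOLD_BP]
```

With these coordinates, a 4 bp overlap such as the one between genes 2 and 3 comes out as -4, matching the `GAP: -4 bp gap` field in the gene 3 summary.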
CDS 6917 - 7312 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="ObLaDi_8" /note=Original Glimmer call @bp 6917 has strength 15.06; Genemark calls start at 6917 /note=SSC: 6917-7312 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_8 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 9.76889E-87 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.84, -5.071299467497814, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_8 [Gordonia phage Cafasso]],,QXN74223,100.0,9.76889E-87 SIF-HHPRED: SIF-Syn: NKF, Pham #33204, upstream gene is major capsid protein, downstream is head-to-tail adaptor, just like in phage Cafasso /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: Both Glimmer and Genemark show the same start site at 6917 with a Glimmer Score of 15.06. The start codon is GTG. /note=Coding Potential: Coding potential is observed in the putative ORF. The start site covers all the coding potential. /note=SD (Final) Score: The SD score is not the best possible as there are two start sites that have more + values listed but it is still reasonable. /note=Gap/overlap: the gap is 36 bp long which is reasonable. /note=Phamerator: Conducted on 10/29/2021, this gene is in pham 33204. The gene was conserved among the other two members of the cluster, Cafasso and Aleemily but the function is unknown in both of these phages. /note=Starterator: Yes start site #1 (6917) is conserved among all pham members. All three members in this pham called start site 1 which is also the auto-annotation start-site. (3/3 call site #1). /note=Location call: All evidence supports this is a real gene that is conserved in phamerator and has good coding potential, it is most likely site #6917 is the start site as it was called for all three phages, specifically in Cafasso as that is the only member that has been manually annotated. Additionally, all other data supports this start site. 
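The `/note=SSC:` summary string packs several values into one line. As a rough sketch, the coordinates, gap, and RBS scores can be pulled out with regular expressions. The field interpretation here is inferred from the annotator notes (the third RBS value matches the quoted Z-score and the fourth matches the quoted final score); the trailing yes/no flag is kept uninterpreted, and `parse_summary` is a hypothetical helper, not a PECAAN function.

```python
import re

SSC_RE = re.compile(r"SSC: (\d+)-(\d+)")
GAP_RE = re.compile(r"GAP: (-?\d+) bp gap")
RBS_RE = re.compile(
    r"RBS: ([^,]+), ([^,]+), (-?\d+(?:\.\d+)?), (-?\d+(?:\.\d+)?), (yes|no)"
)


def parse_summary(note: str) -> dict:
    """Extract start/stop, gap, and RBS values from an SSC summary note."""
    ssc = SSC_RE.search(note)
    gap = GAP_RE.search(note)
    rbs = RBS_RE.search(note)
    if not (ssc and gap and rbs):
        raise ValueError("note does not look like an SSC summary string")
    return {
        "start": int(ssc.group(1)),
        "stop": int(ssc.group(2)),
        "gap_bp": int(gap.group(1)),
        "rbs_matrix": rbs.group(1),
        "rbs_spacing": rbs.group(2),
        "z_score": float(rbs.group(3)),
        "final_score": float(rbs.group(4)),
        "rbs_flag": rbs.group(5),  # meaning not stated in the notes
    }


# Shortened copy of the gene 8 summary note from this file.
EXAMPLE_NOTE = (
    "SSC: 6917-7312 CP: yes SCS: both ST: SS GAP: 36 bp gap LO: yes "
    "RBS: Kibler 6, Karlin Medium, 1.84, -5.071299467497814, no "
    "F: hypothetical protein"
)
parsed = parse_summary(EXAMPLE_NOTE)
```

Rounding `parsed["final_score"]` to three decimals gives the SD (Final) Score format used in the annotator notes.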
/note=Function call: No significant hits to genes with known functions were found in PhagesDB BLAST, NCBI BLASTp, HHpred, or CDD. /note=Transmembrane domains: No evidence of transmembrane domains. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with the annotation as all evidence categories have been considered. 
CDS 7322 - 7741 /gene="9" /product="gp9" /function="head-to-tail adaptor" /locus tag="ObLaDi_9" /note=Original Glimmer call @bp 7322 has strength 11.64; Genemark calls start at 7331 /note=SSC: 7322-7741 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail adaptor [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.44261E-91 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.249, -5.757584294088564, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Gordonia phage Cafasso]],,QXN74224,100.0,2.44261E-91 SIF-HHPRED: DUF3199 ; Protein of unknown function (DUF3199),,,PF11436.10,87.7698,99.5 SIF-Syn: Head-to-tail adaptor protein, upstream gene is NKF, downstream is head-to-tail stopper, just like in phage Cafasso. /note=PECAAN Notes /note=Primary Annotator Name: Erfanian M., Kiana /note=Auto-annotation: Glimmer and GeneMark. Glimmer called a start site at 7322 (original start site), and GeneMark at 7331. /note=Coding Potential: This gene has reasonable coding potential within the putative ORF, and the start site covers all of this coding potential. /note=SD (Final) Score: The RBS Final Score for the original start site is not the highest at -5.758, but is still reasonable. The Z-score for this start, however, is the highest at 2.249. /note=Gap/overlap: My gene has a 9 bp gap with the gene that comes before its start, a size that is considered reasonable given that it is less than 50 bp. It does not have a gap with its downstream gene, but does however have an overlap of 4 bp. This does not indicate an operon, and is very reasonable. 
/note=Phamerator: This gene was found in pham 56502 on 10/21/21, and consists of 28 members, eight of which are draft genomes. This pham was found to be present in another member of the cluster DZ, using phage Cafasso for comparison. /note=Starterator: Using information from the Starterator analysis run most recently on 10/15/21, it was found that the most conserved start site number is 12. The auto-annotated start is called at start number 13 (7322), which does not match the most conserved start. The track for ObLaDi does not contain start 12 at all, and has start 13 listed as the first start by a green line, indicating that it was determined as the final human annotated start. /note=Location call: The evidence gathered thus far indicates that the start site at 7322 as called by Glimmer appears to be the most probable site, and that this gene is indeed real. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp have several hits with low e values, high identity percentages, and reasonable scores. The top non-draft hit on PhagesDB BLASTp was for a gene in Cafasso, a final phage in the same cluster as ObLaDi. This hit has a significantly low E-value at 7e-73, a reasonable score, as well as a max identity percentage of 100%. Furthermore, the top non-draft hit on NCBI BLASTp was also for a gene in Cafasso. This hit has an E-value even lower than the first hit on PhagesDB of 3e-91, a reasonable score, as well as a max identity percentage of 100%. Each of these first hits have functions of head-to-tail adaptor. Given the above data, there is enough data to form a hypothesis for the function of my gene, which appears to be head-to-tail adaptor. Additionally, there is an HHPRED alignment to the Bacillus protein yqbG, which functions as a head-to-tail adaptor. The top hit on CDD also lists this function. /note=Transmembrane domains: No TMDs called by TmHmm or TOPCONS. The protein is not a membrane protein. 
/note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I have QC’ed this gene and agree with the first annotator based on the evidence provided. 
CDS 7738 - 8097 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="ObLaDi_10" /note=Original Glimmer call @bp 7738 has strength 8.46; Genemark calls start at 7783 /note=SSC: 7738-8097 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail stopper [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.29575E-80 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.966271270034708, no F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Gordonia phage Cafasso]],,QXN74225,100.0,4.29575E-80 SIF-HHPRED: Minor_capsid_1 ; Minor capsid protein,,,PF10665.11,87.395,99.6 SIF-Syn: Head-to-tail stopper, upstream gene is head-to-tail adaptor, downstream is NKF (Pham 32963), just like in phage Cafasso from the same cluster DZ and phage VanLee /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start at 7738 while GeneMark calls it at 7783. /note=Coding Potential: The ORF has reasonable coding potential with the 7738 start including all of the coding potential. /note=SD (Final) Score: The final score is -4.966 with a Z-score of 1.921. It is not the best score. /note=Gap/overlap: The overlap upstream is 4 bp, which is reasonable because the gene could belong to an operon. /note=Phamerator: The pham number is 33214 as of 10/26/2021. The gene is conserved in Cafasso in the same cluster DZ. It is also found in singleton VanLee. /note=Starterator: Even though this start number is not the most annotated, start number 3 at position 7738 is called in the non-draft phage Cafasso belonging to the same cluster DZ. /note=Location call: Based on the evidence above, this is a real gene with the start site at 7738. /note=Function call: Head-to-tail stopper. 
Top PhagesDB BLAST hits have head-to-tail stopper as their functions with the lowest e-value being 1e-12. The top NCBI BLAST hit also indicates head-to-tail stopper with an e-value of 4e-80 and 100% identity. HHPRED hit corresponds to the SEA-PHAGES requirement for SPP1 16 (5A21 chain E or F) with e-value of 7.2e-14, 99.56% probability and 94.12% coverage. /note=Transmembrane domains: No TMDs predicted by either TMHMM or TOPCONS. It is not a membrane protein. /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: I agree with the primary annotator based on the evidence. Make sure to put that GeneMark and Glimmer agree. 
CDS 8108 - 8416 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="ObLaDi_11" /note=Original Glimmer call @bp 8108 has strength 16.3; Genemark calls start at 8108 /note=SSC: 8108-8416 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_11 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.03622E-69 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.403, -3.898968357315439, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_11 [Gordonia phage Cafasso]],,QXN74226,100.0,5.03622E-69 SIF-HHPRED: SIF-Syn: NKF but there is a suggested tail component function that is not specified yet. The upstream genes for Cafasso and VanLee for the same gene are both head-to-tail stoppers and the downstream genes are tail terminators. Cafasso and VanLee were the only phages used in this analysis. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Glimmer and GeneMark both call 8108 as the start site. The start codon called is GTG. /note=Coding Potential: This gene has reasonable coding potential within the putative ORF. The start site is also inclusive of the coding potential. /note=SD (Final) Score: -3.899. This is the best SD final score on PECAAN /note=Gap/overlap: 10. This is the smallest reasonable gap on PECAAN. All other gaps were over 76 bp long. 
The gene length is also reasonable. /note=Phamerator: This gene belongs to pham 32963. There are four members of this pham. ObLaDi (draft), Cafasso (non-draft), and Aleemily (draft) are all part of the DZ cluster and VanLee (non-draft) is a singleton. The date is 10/26/21. There is no conserved function between the members of the pham. /note=Starterator: Start site 6 was found to be the most annotated in 4/4 of the pham members. Start 6 is 8108 in ObLaDi, which is the same start site predicted by Glimmer and GeneMark. /note=Location call: It is highly likely that this is a real gene. 8108 is the most likely start site based on the evidence from Starterator, GeneMark, and Glimmer. /note=Function call: In PhagesDB BLASTp there are two non-draft gene hits from phages Cafasso and VanLee with E-values 3e-54 and 3e-10 respectively. In NCBI BLASTp there are two non-draft gene hits from phages Cafasso and VanLee with E-values 5e-69 and 1e-09 respectively. Based on this information, both hits are significant, but the one from Cafasso is more reliable. Additionally, the identity scores were higher for Cafasso than VanLee. We are not able to make a function call because all the gene hits are for unknown functions. The PhagesDB function frequency shows hits for minor tail protein and minor capsid protein at 50% frequency. This is not enough evidence to call either of these as the function. Additionally, there were no significant hits in CDD, but HHpred had significant hits for a minor capsid protein and putative tail component with E-values of 4.8e-10 and 2e-8 respectively. Because the E-value, coverage %, and probability % are all better for the minor capsid protein and it is present at a 50% frequency in the PhagesDB function frequency, a minor capsid protein would be a good option. However, synteny suggests that it may be the putative tail component because the upstream and downstream genes in Cafasso and VanLee are head-to-tail stopper and tail terminator respectively. 
I therefore think that, given the still acceptable E-value, probability, and coverage for the putative tail component hit together with synteny, putative tail component is the most likely function. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 8416 - 8820 /gene="12" /product="gp12" /function="tail terminator" /locus tag="ObLaDi_12" /note=Original Glimmer call @bp 8416 has strength 14.23; Genemark calls start at 8416 /note=SSC: 8416-8820 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.36344E-94 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.211, -4.291708691156382, no F: tail terminator SIF-BLAST: ,,[tail terminator [Gordonia phage Cafasso]],,QXN74227,100.0,3.36344E-94 SIF-HHPRED: TAIL-TO-HEAD JOINING PROTEIN GP17; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_G,98.5075,99.5 SIF-Syn: Tail terminator (pham 45828), upstream gene is NKF (pham 32963), downstream is major tail protein (pham 81438), just like in phage Cafasso /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call start at 8416 (site 2) with an ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start site covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: -4.292, the best final score on PECAAN, but this could be irrelevant since this gene is possibly part of an operon with a 1 bp overlap. The Z-score is 2.211, the second best value on PECAAN. 
/note=Gap/overlap: Overlap of 1 bp (gap = -1) indicative of a likely operon within which this gene is located. This start site creates the LORF and the gene length is 405 bp which is acceptable. /note=Phamerator: The pham number as of 10/25/2021 is 45828. The gene is conserved in phages Aleemily_Draft (DZ) and Cafasso (DZ). Cafasso (DZ) is the best phage genome for comparison since it is non-draft. Based on PhagesDB the function call for the gene is a tail terminator. It is on the approved SEA-PHAGES list. /note=Starterator: Based on the 10/22/21 run the most annotated start site 2 is a reasonable choice that is conserved among members of pham 45828. There are 5 members total with only 2 non-draft members in this pham. 4/5 of total members and 1/2 of non-draft members call start site 2, which correlates to 8416 bp for ObLaDi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 8416 bp (site 2). Starterator agrees with Glimmer and Genemark. /note=Function call: Tail terminator protein. PhagesDB BLASTp top two non-draft hits are tail terminators with small e-values of Cafasso_12 (e: 5e-75, id: 100%) and VanLee_13 (e: 3e-33, id: 52%). Phagesdb Function Frequency shows tail terminator with highest total frequency (71%). NCBI BLASTp shows two strong hits with small e-values that are tail terminators of Cafasso (e: 3e-94 - lowest, id: 100%, cov: 100%) and VanLee (e: 3e-39 - third lowest, id: 52%, cov: 97%). No CDD hits. HHpred three top hits were all closely related to the structure and function of phage tail proteins. The top hit was 5A21_G (prob: 99.52%, cov: 98.5075%, e: 8.2e-13) which is the HHpred alignment that SEA-PHAGES requires to call a tail terminator, described as SPP1 17 (5A21 chain G in the macromolecular complex). 
The second top hit was 6TE9_F (prob: 99.44%, cov: 97.7612%, e: 1e-11) part of a phage-type particle that is responsible for DNA delivery. The third top hit was 4ACV_B (prob: 99.31%, cov: 97.0149%, e: 1.1e-10) a Listeria antigen that resembles the structure of phage tail proteins. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: Great job! I agree with your decision! Maybe in your synteny box just mention the functions of the upstream gene and this gene! CDS 8905 - 9573 /gene="13" /product="gp13" /function="major tail protein" /locus tag="ObLaDi_13" /note=Original Glimmer call @bp 8905 has strength 18.48; Genemark calls start at 8905 /note=SSC: 8905-9573 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.2803E-161 GAP: 84 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.332, -2.0720764396375664, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Gordonia phage Cafasso]],,QXN74228,100.0,5.2803E-161 SIF-HHPRED: SIF-Syn: major tail protein, upstream gene is tail terminator, downstream is NKF protein, just like in phage Cafasso. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both Genemark and Glimmer. /note=Both agreed on the same start site: 8905. Site # 14. /note=ATG start codon. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. /note=The chosen start site covers all this coding potential. /note=SD (Final) Score: -2.072 and the Z-score was above 2, at 3.332. /note=Gap/overlap: 84 bp. The gap is relatively large and is not below the accepted 50 bp. It is unknown whether there is a gene in the gap, but coding potential in the gap is low, so a gene there is unlikely. /note=Phamerator: The pham number was 80614 as of 10/22/21. The gene is conserved in Cafasso, which is also part of the DZ cluster. 
/note=Starterator: Start site 14 was the most annotated start number, having been called in 167 of the 172 non-draft genes. It corresponds to the 8905 start site. /note=Location call: From the evidence this gene is real and its most likely start site is 8905. /note=Function call: Major tail protein. Hits from PhagesDB, NCBI, and HHpred all suggest that this gene's function is major tail protein. HHpred predicted above 97% coverage; however, its e-values are a little higher than expected. NCBI had much better e-values at 6e-161 to 1e-75 and has high coverage with the Cafasso major tail protein sequence. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: Based on the notes, I agree with the location call of this gene. All the evidence provided seems to check out nicely and there doesn't seem to be any conflicting data. I also agree with the function call. All the data provided seems to support that conclusion nicely, just make sure to fill out synteny box! CDS 9666 - 9917 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="ObLaDi_14" /note=Original Glimmer call @bp 9666 has strength 19.11; Genemark calls start at 9666 /note=SSC: 9666-9917 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_14 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.53613E-45 GAP: 92 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_14 [Gordonia phage Cafasso]],,QXN74229,95.4023,2.53613E-45 SIF-HHPRED: SIF-Syn: NKF, upstream gene is major tail protein just like in phage Cafasso, downstream gene is tail assembly chaperone just like in phage Cafasso. /note=Primary Annotator Name: Montoya Serpas, Cinthya /note=Auto-annotation: The start site is 9666 per Glimmer and GeneMark. 
/note=Coding Potential: The coding potential spans over the putative ORF of this gene in both GeneMark and Glimmer. Therefore, there is very good coding potential. /note=SD (Final) Score: -2.828. This is the best final score since it is the least negative value among all three values. /note=Gap/overlap: There is a gap of 92 bps between the previous gene and the current gene, which is considered a large gap. However, this gap is also observed in Cafasso's genome and there is pretty good synteny with Cafasso's genome, which indicates that this gap is within reason. /note=Phamerator: The gene @stop site 9917 is found in pham 33520 as of 10/25/21. There are 3 members in this pham with 2/3 being draft genes. Aleemily and ObLaDi are draft genes and Cafasso is a finalized gene. Pham 33520 is present in all 3 members that belong to the DZ cluster. /note=Starterator: The Starterator analysis was run on 10/25/21. The most conserved site is start site #2 (9666) which also matches the most annotated start site. There are 3 members in this pham. All three members of this pham call start site 2. Cafasso phage was used for comparison. /note=Location call: Based on the evidence so far, the most accurate start site is 9666 for gene @stop 9917 which is a real gene as the start site covers all the coding potential. Additionally, the final score and the Z-score associated with this start site are the best when compared to others. This location call was verified by a second annotator who agrees with the evidence presented above. /note=Function call: NCBI BLASTp and PhagesDB BLASTp were used to predict the function of gene #14; however, the evidence gathered was inconclusive and no predicted function was found. The top NCBI BLASTp hit, which corresponds to hypothetical protein SEA_CAFASSO_14, has an e-value of 3e-45, a percent identity of 95.40%, and a total score of 150. The top two PhagesDB BLASTp hits correspond to phages Cafasso and Toniann. 
Again, Cafasso has a very good score of 163 bits, a low e-value of 1e-40, and an identity value of 95%, which is strong evidence in support of gene #14 being a real gene. The second PhagesDB BLASTp hit corresponds to phage Toniann and has a score of 34.3 bits, an E-value of 0.0087, and an identity value of 28%. Additionally, there are no conserved domains on the NCBI database and the hits generated by the HHpred hitlist are not within acceptable ranges since the percent coverages for the first two hits are 32% and 27% respectively, the probability values are much lower than desired at 49 and 39, and the e-values are extremely high at 37 and 66. Furthermore, NCBI BLASTp and PhagesDB BLASTp hits are not informative enough since the protein with an acceptable identity value is a hypothetical protein. /note=Transmembrane domains: There is no information available for the existence of transmembrane domains within Gene 14 according to TMHMM and TOPCONS. Therefore, the protein is not a transmembrane protein. /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I have QC’ed this gene and agree with the first annotator based on the evidence provided. CDS 9957 - 10469 /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="ObLaDi_15" /note=Original Glimmer call @bp 9957 has strength 12.24; Genemark calls start at 9957 /note=SSC: 9957-10469 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.62779E-118 GAP: 39 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.619, -4.443838065747148, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Cafasso]],,QXN74230,100.0,4.62779E-118 SIF-HHPRED: SIF-Syn: Tail assembly chaperone, upstream gene is in pham 33520, downstream gene is a tail assembly chaperone, just like Cafasso. 
/note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation start source: Both Glimmer and GeneMark agree on the same start site @ 9957 /note=Coding Potential: Gene has reasonable coding potential predicted within the putative ORF. The chosen start site does cover the coding potential. /note=SD (Final) Score: -4.444; only one other suggested start site has a better score and also has a reasonable Z-score (-2.765), but this other start site results in the shortest ORF and largest gap, which makes it less suitable than the proposed start with a Final Score of -4.444. /note=Gap/overlap: Gap with preceding gene is reasonable, only 39 base pairs. Start site that produces this gap also produces longest ORF. /note=Phamerator: Gene found in Pham 79886 as of 10/26/21. Only one other non-draft gene present within this pham had the same cluster as my gene, DZ, which belongs to phage Cafasso. The function called for the Cafasso phage gene and another singleton gene in phage VanLee was tail assembly chaperone. /note=Starterator: With only one other gene available for comparison, the start site number was the same for both genes in the DZ cluster, but different from the one other non-draft gene, identified as a singleton. The start site for the gene in phage VanLee was identified as "Most Annotated" and its start site number is 2 (10020 bp) while my gene's start site number is 1 (9957 bp). 1/2 call site #2. /note=Location call: Gathered evidence suggests this is a real gene that has a start site @ 9957bp: covers all coding potential, small gap with preceding gene, aligns with start site of gene in the same cluster (Cafasso). /note=Function call: Tail assembly chaperone; The top two hits from PhagesDB Blast with identities of 100% and 99%, and e-values of 1e-93 and 2e-92, respectively, have listed functions as tail assembly chaperone. These were the same two top hits from the NCBI database that came from a phage in the same cluster as ObLaDi as well. 
Additionally, while there were no hits in the CDD database, there were two hits in the HHpred database that provided evidence supporting the function of the gene as a tail assembly chaperone. These two hits had ideal coverage percentages above 40% and ideal e-values as well. /note=Transmembrane domains: No transmembrane domains. The absence of TMDs does make sense in the context of the hypothesized function for this gene. The hypothesized function of this gene is a tail assembly chaperone protein which does not require the protein to have a transmembrane domain. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: Looks good, agree with this call! Second QC: Looks good! How about CDD and HHpred results? CDS join(9957..10457,10457..10909) /gene="16" /product="gp16" /function="tail assembly chaperone" /locus tag="ObLaDi_16" /note= /note=SSC: 9957-10909 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -513 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.619, -4.443838065747148, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Cafasso]],,QXN74231,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=annotated by Amanda Freise /note=ObLaDi_15 ends in GKSPR /note=ObLaDi_16: has sequence ...GKIPA CDS 10934 - 15625 /gene="17" /product="gp17" /function="tape measure protein" /locus tag="ObLaDi_17" /note=Original Glimmer call @bp 10934 has strength 13.73; Genemark calls start at 10934 /note=SSC: 10934-15625 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.403, -4.48879389222639, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Gordonia phage Cafasso]],,QXN74232,96.5517,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,54.2546,99.6 SIF-Syn: Upstream 
gene (stop 10909; pham 82945) is a tail assembly chaperone and downstream gene (stop 16530; pham 83351) is a minor tail protein, both consistent with phage Cafasso from the same cluster. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark both call and agree on the start site at 10934. /note=Coding Potential: There appears to be good coding potential for this gene in the Host-Trained and Self GeneMark. There is both alternate and typical coding potential. The start site includes all of the coding potential. /note=SD (Final) Score: -4.489. This is the best/highest score out of all the potential start sites. /note=Gap/overlap: There is a gap of 24 bp. This gap is not long enough to be considered for a gene addition. /note=Phamerator: Gene belongs to pham 61725. Of the four members, two are non-draft genes. These are Cafasso (DZ) and VanLee (singleton). /note=Starterator: The most conserved start site is number 1 at position 10934. It is annotated in 2 of the 2 non-draft genes in the pham. /note=Location call: Based on all of the evidence, this is a real gene. The chosen start site is at 10934. /note=Function call: Tape measure protein. All of the hits displayed on PhagesDB BLASTp had highly significant e-values--the highest value being 9x10^-49--with tape measure protein functions. NCBI BLASTp also returned several highly significant hits for tape measure protein functions with identity percentages >42% (highest at 94% belonging to Cafasso and 53.98% belonging to Gordonia phage GEazy). Query coverage for these significant hits is >53%. The top 4 hits returned by HHpred had significant e-values with 99% probability of tape measure protein function, matching those returned from PhagesDB and NCBI. /note=Transmembrane domains: One transmembrane domain (TMD/TMH) called by TMHMM and several TMDs called by TOPCONS. 
It is a membrane protein, which is consistent with its role in determining tail length and transporting DNA to the cytoplasm. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: Great job! Just a note: while there is a 24 bp gap, it doesn't seem long enough to fit in a new gene due to its small length. CDS 15622 - 16530 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="ObLaDi_18" /note=Original Glimmer call @bp 15622 has strength 15.46; Genemark calls start at 15622 /note=SSC: 15622-16530 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.735, -3.296037457437639, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Cafasso]],,QXN74233,99.6689,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_CC,94.702,99.9 SIF-Syn: Minor tail protein (pham 80800), upstream gene is pham 61725, downstream is pham 81305, just like in phage Cafasso. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 15622, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 15622 covers all of the coding potential. /note=SD (Final) Score: The SD score is not the best (only one potential start site has a better SD score), but it is still reasonable to suggest the presence of a credible ribosome binding site. The gene does seem to be organized into an operon, so the RBS score could even be irrelevant for this start call. /note=Gap/Overlap: There is a 4 base pair overlap with the upstream gene, indicating that this gene is likely part of an operon. This start site creates the longest ORF and the length of the gene is acceptable. 
/note=Phamerator: As of 10/21/2021, the gene is found in Pham 80800. The pham is conserved in other members of the cluster - comparison was done between ObLaDi and Cafasso, as this was the only non-draft genome available. Both Phamerator and PhagesDB called the function of this gene as “minor tail protein,” which is on the approved function list. /note=Starterator: The “Most Annotated” start site is present in 235 of 511 non-draft genes in this pham; however, this start site is not present in ObLaDi. In terms of ObLaDi, the start site with the most manual annotations is 56, which is at position 15622. This start site was found in 4 of 550 genes in the pham (2 of which are finalized) but it is called 100 percent of the time when present. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 15622. /note=Function Call: The top 2 NCBI BLASTp hits suggested function is minor tail protein, with high query coverage (>98%), high % identity (>54%), and low e-values (<2e-99). The top 2 PhagesDB BLASTp hits suggested function is minor tail protein, with high % identity (>54%) and low e-values (<3e-84). Thus, the two databases seem to be in agreement; however, NCBI also has some strong hits for “phage tail protein” and “hypothetical protein” that may warrant consideration. While there were no hits in CDD, there were strong hits in HHpred - with high probability (99.92%), high coverage (>87.4172%), and low e-values (<1.6e-22) - that listed a similar function. This function is also conserved in a finalized phage genome (see Synteny box). /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: I agree with the primary annotator based on all the evidence. 
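The Gap/overlap values quoted in these notes can be checked directly from the CDS coordinates. The following is a minimal sketch, assuming forward-strand genes and the convention that the gap is the number of bases strictly between the upstream stop and the downstream start (a negative value is an overlap). The function name gap_bp is hypothetical and is not part of PECAAN or DNA Master.

```python
def gap_bp(upstream_stop: int, downstream_start: int) -> int:
    """Bases strictly between two forward-strand genes.

    A negative result means the genes overlap by that many bases.
    This convention is an assumption chosen to match the GAP values
    reported in the notes; it is not taken from any tool's source.
    """
    return downstream_start - upstream_stop - 1


# Coordinates copied from the CDS lines in this file.
print(gap_bp(15625, 15622))  # gp17 stop, gp18 start: 4 bp overlap
print(gap_bp(8820, 8905))    # gp12 stop, gp13 start: 84 bp gap
```

Under this convention the 4 bp overlap between gp17 and gp18 appears as -4, which is how the GAP field records it.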
CDS 16527 - 18191 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="ObLaDi_19" /note=Original Glimmer call @bp 16527 has strength 17.3; Genemark calls start at 16527 /note=SSC: 16527-18191 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.073, -4.864188174661353, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Cafasso]],,QXN74234,99.8195,0.0 SIF-HHPRED: Protein gp18; NP_465809.1, prophage tail protein gp18, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative; HET: MSE, MLY; 1.7A {Listeria monocytogenes EGD-e},,,3GS9_A,93.6823,99.2 SIF-Syn: Function is minor tail protein, upstream gene is minor tail protein, and downstream is HNH endonuclease just like Cafasso /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and GeneMark at the same start site 16527 along with the same start codon, ATG. This is a very common start codon so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -4.864, which is not the best score listed; however, the z-score of 2.073 (>2) suggests that this may be the correct start site. Furthermore, the start sites with better SD scores all have unreasonable gaps and/or lengths. /note=Gap/overlap: There is a reasonable gap of -4 which could indicate the presence of an operon and the gene is a reasonable length of 1665 bp. /note=Phamerator: The pham number as of (11/29/2021); the gene is conserved and found in Cafasso /note=Starterator: There are 1099 non-draft members of this pham. Start number 104, which corresponds to ObLaDi_19, is called 5/1099. This start corresponds to start site 16527 in ObLaDi. 
/note=Location call: Based on the data it appears that this is a real gene with a 18191 stop site and 16527 start site. /note=Function call: Based on the phages Cafasso and VanLee, it is safe to conclude that the function of ObLaDi gene 19 is minor tail protein. The top two BLAST hits on both PhagesDB have E-values of 0 and high scores, both of which have a minor tail protein function listed (both have high coverage, 99.9% and 60% respectively, 65%+ identity and e-value <10^-63). HHpred hits have tail protein listed with 99% probability, 87%+ coverage, and E-values of 4.5E-21 and 2.9E-8. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I hope this gene can be further annotated once Phamerator/Starterator start working. CDS complement (18204 - 18692) /gene="20" /product="gp20" /function="HNH endonuclease" /locus tag="ObLaDi_20" /note=Original Glimmer call @bp 18692 has strength 5.83; Genemark calls start at 18692 /note=SSC: 18692-18204 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.42463E-100 GAP: 243 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.826, -3.108788982793736, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage Cafasso]],,QXN74235,90.7407,1.42463E-100 SIF-HHPRED: HNH homing endonuclease; HNH catalytic motif, Helix-turn-helix DNA binding domain, protein-DNA complex, DNA binding protein-DNA COMPLEX; HET: EDO; 2.92A {Bacillus phage SPO1} SCOP: d.4.1.3, d.285.1.1,,,1U3E_M,66.0494,99.9 SIF-Syn: HNH endonuclease for gene 20 for both ObLaDi and Cafasso. Genes 19 and 21 flank this gene in ObLaDi and show synteny with Cafasso gene 19 and gene 21, respectively. 
Cafasso gene 19 has the function of minor tail protein, which matches the function of gene 19 for ObLaDi. The function for gene 21 in ObLaDi is minor tail protein, and the function for gene 21 in Cafasso is not noted; both of these genes are part of pham 74961. /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer and Genemark. Both give a start site at 18692. Start codon: ATG. /note=Coding Potential: Coding potential is observed in a reverse-oriented reading frame in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: -3.109 (best final score on PECAAN) /note=Gap/overlap: Gap = 243; large gap that is conserved in Cafasso’s genome. No coding potential is observed in this gap. /note=Phamerator: pham 80837. Date: 10/26/2021. Conserved: also found in Cafasso (DZ). Phamerator suggests the gene function to be HNH endonuclease. /note=Starterator: Start number 208 was manually annotated in 19/607 non-draft genes for pham 80837 and was called 84% of the time when present. The respective start position is at 18692 bp. This data matches the Glimmer and GeneMark start site call. /note=Function call: HNH endonuclease. Supported by most hits with small e-values found through PhagesDB BLASTp (3e-56). The top PhagesDB BLASTp hit suggested function is minor tail protein with decent percent identity (57% and 33%) and low e-values (e-138 and 3e-58). The other hits in PhagesDB also suggest the same function, but have lower percent identities, or with high e-values, but the suggested function for this protein is a minor tail protein. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, therefore it is not a membrane protein. /note=Location call: This gene is likely a real gene that starts at 18692. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: I agree with the evidence presented, and the start site called. 
I also agree with the function call, but there is a lot more evidence that can be selected under both BLASTs to support this claim. CDS 21434 - 21595 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="ObLaDi_23" /note=Original Glimmer call @bp 21434 has strength 14.57; Genemark calls start at 21434 /note=SSC: 21434-21595 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_23 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.51784E-28 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.796, -5.909146062529675, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_23 [Gordonia phage Cafasso]],,QXN74238,96.2963,5.51784E-28 SIF-HHPRED: SIF-Syn: NKF, upstream gene is minor tail protein, downstream is NKF and in Pham 83185, just like in phage Cafasso. /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used and both agreed on the same start site. The called start site is 21434. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The SD score of -5.909 is the best option and the z-score is the highest at 1.796. /note=Gap/overlap: The gap with the upstream gene is very reasonable at a 13 bp gap. The length of the gene (162 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of October 20, 2021, the gene is found in pham 78677. The gene is conserved in Phage Cafasso which belongs to the same cluster (DZ) as Phage ObLaDi. The phage used for comparison was Phage Cafasso. The function call for this gene is a putative carbohydrate binding domain protein and it is only listed on the phams database. There was no function call on Phamerator and it is not on the approved SEA-PHAGES list. 
/note=Starterator: The start site choice that is conserved among the members of the pham in which this gene belongs is start site 21434 which is start number 17. There are 41 non-draft members and 5 draft members in this pham and 16/41 non-draft members call start number 6 as the most conserved; however, the start site that made most sense for this gene is 17 which is called by 4/41 non-draft members. /note=Location call: The gathered evidence suggests that the original start site call at 21434 by Glimmer and Genemark is reasonable and it is the potential start site candidate that seems most likely. In addition, it also suggests that the gene is a real gene. /note=Function call: PhagesDB BLAST and NCBI BLASTp have hits with small E-values, but do not suggest what the function of this gene is. PhagesDB BLAST gave hits with E-values of e-25 and e-18, while NCBI BLASTp gave E-values of e-18 and e-19. The top 2 NCBI BLASTp and PhagesDB BLAST hits sorted by E-values show high identity values (>75%) and 100% query coverage. HHpred and CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. 
/note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: Looks good :) CDS 21632 - 21970 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="ObLaDi_24" /note=Original Glimmer call @bp 21632 has strength 16.09; Genemark calls start at 21632 /note=SSC: 21632-21970 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_24 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.39456E-73 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.005, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_24 [Gordonia phage Cafasso]],,QXN74239,97.3214,1.39456E-73 SIF-HHPRED: SIF-Syn: No Known Function protein, upstream gene is pham 81057, downstream gene is pham 79018, similar to phage Cafasso, although in Cafasso gene pham 79018 function is known as minor tail protein. Has synteny /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer and GeneMark, start site 21632, start codon ATG /note=Coding Potential: Coding potential in 2nd frame, Forward gene, coding potential found in both GeneMark Self and Host /note=SD (Final) Score: -2.742, it is the best final score /note=Gap/overlap: 36 bp gap is reasonable, no coding potential in the gap, gap conserved in Cafasso /note=Phamerator: Pham 74084 as of 10/22/2021, the pham is conserved in other members of DZ cluster (Cafasso), no function called for this gene /note=Starterator: Start site 22 was manually annotated in 24/64 genes (non-draft) in this pham, start 22 is at base pair 21632 in ObLaDi, genemark and glimmer agrees with this evidence. /note=Location call: Likely a real gene, start site num 22 at bp 21632 /note=Function call: No hits on CDD and no significant hits on HHpred (the best hit was 79% probability, 68.75% coverage, and e-value of 33 for a Chitinase C protein). Thus, I went with NKF. /note=Transmembrane domains: TMHMM showed no TMDs, neither did TOPCONS. The gene function is unknown. 
/note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: The gap seems a bit large since it`s more than 7 bp. Is this gap conserved in other phages? Also, I would specify that the manual annotations you are referencing for Starterator are non-draft genes. Otherwise, looks good! Good job! /note=11/24/21: For gap/overlap, a small gap/overlap size is generally no more than 7 bp, so I wouldn`t note that 36 bp is small. Overall looks great! Everything is filled out correctly and with detail! Good job! CDS 21974 - 23899 /gene="25" /product="gp25" /function="minor tail protein" /locus tag="ObLaDi_25" /note=Original Glimmer call @bp 21974 has strength 13.84; Genemark calls start at 21974 /note=SSC: 21974-23899 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.433, -3.773970443115436, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Cafasso]],,QXN74240,100.0,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein, upstream gene is NKF, downstream gene is also a minor tail protein, just like in phage Cafasso, /note=Primary Annotator Name: Lee, Adrienne /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 21974. /note=Coding Potential: There is coding potential in both GeneMark Self and Host. The coding potential in the ORF is only found in the forward strand and there is no switching. /note=SD (Final) Score: The final score is -3.774 which was the least negative score out of all the gene candidates. /note=Gap/overlap: There is a 3 base pair gap. This is highly reasonable because it is extremely small. This gap is also conserved in the other phage (Cafasso) in the cluster. /note=Phamerator: As of October 21, 2021, the pham number is 79018 and it is conserved in Aleeemily_draft_24 and Cafasso_25 which are in the same cluster as ObLaDi. 
/note=Starterator: Starterator determined that start site 5 is the most conserved start site among 18 phages in the pham, but it does not exist in ObLaDi. The auto-annotated start site 10 at 21974 was called 100% of the time it was present (in 3 other phages). /note=Location call: Based on the evidence above, this is a real gene and the start site is 21974. This site was called by both Glimmer and GeneMark and yields the longest reasonable ORF. This was also the auto-annotated start site by Starterator. /note=Function call: Minor tail protein: There were no good hits from CDD or HHpred. They all had very high e-values. Both Cafasso and Mollymur had the function call of minor tail protein with an e-value of 0 in PhagesDB BLAST, in agreement with NCBI BLASTp. Cafasso had 100% identity, alignment, and coverage and Mollymur had 74% identity, 83% alignment, and 95% coverage. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: I agree with the start site of 21974 as well as function call for this gene. CDS 23899 - 24609 /gene="26" /product="gp26" /function="minor tail protein" /locus tag="ObLaDi_26" /note=Original Glimmer call @bp 23899 has strength 7.2; Genemark calls start at 23899 /note=SSC: 23899-24609 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_26 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.82753E-161 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.0, -4.741344217713455, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_26 [Gordonia phage Cafasso]],,QXN74241,100.0,1.82753E-161 SIF-HHPRED: SIF-Syn: Minor tail protein, upstream gene is minor tail protein, downstream is lysin A protease, just like in Cafasso (though Cafasso does not call a function for this particular pham). 
/note=Primary Annotator Name: Janelle Tricia Magaling /note=Auto-annotation: Glimmer and Genemark. Both agree on the start site 23899. /note=Coding Potential: Coding potential is in one forward ORF. There is also coding potential shown in GeneMark Self and Host. /note=SD (Final) Score: -4.741. Not the best, but it doesn’t matter since it’s most likely an operon. /note=Gap/overlap: 1bp which indicates an operon. /note=Phamerator: pham: 80920. Date 10/15/2021. It is conserved; found in Cafasso (DZ). /note=Starterator: The most conserved start site was 6 which was called in 230/336 non-draft genes, but ObLaDi does not have this start. Start site 43 in Starterator was manually annotated in 8/336 non-draft genes in this pham. ObLaDi uses start 43 @ 23899. This is supported by GeneMark and Glimmer. /note=Location call: Based on the previous evidence, this is a real gene that is conserved in Phamerator and has good coding potential with the start site 43 at 23899 which was conserved in Starterator and supported by Glimmer and Genemark. /note=Function call: minor tail protein. The phages DB function call has all of the phages function as a minor tail protein. The phagesDB blast sorted by increasing e value had unknown functions for the first twelve phages, but the 13th and 16th phages were minor tail proteins with higher, but still good e values (e-64, e-52). The NCBI blast shows the second phage, sorted by e-value, has a 52% identity and has a function of minor tail protein. Many phages were hypothetical proteins with some being minor tail proteins with higher, but still good e values (e-54, e-15). Minor tail proteins tend to be in a chain of the other minor tail proteins so can check if surrounding genes have the same function. /note=Transmembrane domains: There are no hits for TMHMM so we cannot check TOPCONS for further evidence. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: Looks good! No notes here. 
The start site seems to clip a little into the CP, but I don`t think there`s one that wouldn`t without being very awkward. CDS 24644 - 25318 /gene="27" /product="gp27" /function="lysin A, protease C39 domain" /locus tag="ObLaDi_27" /note=Original Glimmer call @bp 24644 has strength 13.91; Genemark calls start at 24644 /note=SSC: 24644-25318 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, protease C39 domain [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.74818E-166 GAP: 34 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.519, -3.659876950936526, no F: lysin A, protease C39 domain SIF-BLAST: ,,[lysin A, protease C39 domain [Gordonia phage Cafasso]],,QXN74242,100.0,1.74818E-166 SIF-HHPRED: Peptidase_C39_2 ; Peptidase_C39 like family,,,PF13529.8,60.7143,99.5 SIF-Syn: Lysin A, protease C39 domain, with upstream gene in Pham 81317 and downstream gene in Pham 79453 and identified as another lysin A domain, like Cafasso 27. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 24644. /note=Coding Potential: Coding Potential in this ORF is on the forward strand, indicating that this is a forward gene. Coding Potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.660. This is the best final score in PECAAN. /note=Gap/overlap: Gap of 34 bp. All other gaps and overlaps for other possible start sites are much larger, so this is the most reasonable one. /note=Phamerator: Pham 53671 (as of 10/25/21). This pham is also present in Cafasso, the only non-draft phage in cluster DZ. /note=Starterator: Start Site 38 was manually annotated 28/83 non-draft genes in this pham. However, this gene does not possess start site 38, and Starterator called start site 21 for this gene, which corresponds to 24644 (the Glimmer and GeneMark start site). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 24644. 
/note=Function call: Lysin A, protease C39 domain, due to 100% identity call with Cafasso_27, which has this function. Support for peptidase C39 function was also found in CDD ( e-value of 4.44e-08) and HHpred (e-value of 2.2e-12). /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC`ed this location call and agree with the first annotator. I have also QC`ed this function call and agree with the first annotator. CDS 25315 - 26022 /gene="28" /product="gp28" /function="lysin A, N-acetylmuramoyl-L-alanine amidase domain" /locus tag="ObLaDi_28" /note=Original Glimmer call @bp 25315 has strength 12.51; Genemark calls start at 25315 /note=SSC: 25315-26022 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, N-acetylmuramoyl-L-alanine amidase domain [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.33731E-172 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.153, -4.429150775994987, yes F: lysin A, N-acetylmuramoyl-L-alanine amidase domain SIF-BLAST: ,,[lysin A, N-acetylmuramoyl-L-alanine amidase domain [Gordonia phage Cafasso]],,QXN74243,100.0,1.33731E-172 SIF-HHPRED: d.118.1.1 (A:1-157) N-acetylmuramoyl-L-alanine amidase PlyG {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]},,,d1yb0a1,78.7234,99.6 SIF-Syn: ObLaDi Gene 28 shows synteny with Cafasso 28, which is also a lysin A protein. Both the upstream (ObLaDi Gene 29 - lysin A, glycosyl hydrolase domain) and downstream genes (ObLaDi Gene 27 - lysin A, protease C39 domain) are also lysin A related function proteins, so this is another indicator that confirms the function of this gene. /note=AF: made function more specific based on HHpred hits (lysin a, n-acetylmuramoyl-l-alanine amidase domain) /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and Genemark. Both start at 25315. 
/note=Coding Potential: Coding potential in this ORF is found only on the forward strand, thus this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.429. It is the best final score on PECAAN. /note=Gap/overlap: -4. Very small overlap, meaning there is no room for another gene in between the previous gene and this one. /note=Phamerator: 10/24/21 – pham: 79453; found in Cafasso, Aleemily_Draft, and ObLaDi_Draft /note=Starterator: Start site 21 in Starterator was manually annotated in 5/57 non-draft genes in this pham. Start 21 is 25315 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 25315. /note=Function call: Lysin A. Based on the BLAST results for this sequence, I believe that my gene has a lysin A function because the top hits were both lysin A function genes and they had high alignment with my gene. Furthermore, the CDD results also point to a lysin function, since the hits include both an amidase and a peptidoglycan recognition protein. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with this annotation and location call. All of the evidence categories have been considered. Note: the starterator and coding potential drop down menus have not been filled. I also agree with your function call. 
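The GAP values recorded for the forward-strand genes above appear to follow one coordinate rule: the start of a gene, minus the stop of the upstream gene, minus 1, with a negative result meaning an overlap. That rule is inferred from the coordinates in these records and is not taken from PECAAN documentation; the helper name `gap_bp` is hypothetical. A minimal Python sketch under that assumption:

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between two adjacent same-strand genes; negative means overlap.

    Assumed convention (inferred from the records, not from PECAAN docs):
    gap = start - upstream_stop - 1.
    """
    return start - upstream_stop - 1


# (gene, start, stop) copied from the CDS coordinates of the forward-strand records.
forward_genes = [
    ("gp23", 21434, 21595),
    ("gp24", 21632, 21970),
    ("gp25", 21974, 23899),
    ("gp26", 23899, 24609),
    ("gp27", 24644, 25318),
    ("gp28", 25315, 26022),
]

# Pair each gene with its upstream neighbour and compute the gap.
gaps = {
    name: gap_bp(prev_stop, start)
    for (_, _, prev_stop), (name, start, _) in zip(forward_genes, forward_genes[1:])
}

for name, gap in gaps.items():
    label = "gap" if gap >= 0 else "overlap"
    print(f"{name}: {gap} bp ({label})")
```

The computed values can be compared against the `GAP:` field of each record; a mismatch would suggest that the notes refer to a different candidate start site than the final call.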
CDS 26045 - 26428 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="ObLaDi_29" /note=Original Glimmer call @bp 26045 has strength 13.22; Genemark calls start at 26045 /note=SSC: 26045-26428 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, glycosyl hydrolase domain [Gordonia phage Pleakley] ],,NCBI, q11:s217 89.7638% 1.16587E-45 GAP: 22 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.255, -2.0895162839948163, yes F: hypothetical protein SIF-BLAST: ,,[lysin A, glycosyl hydrolase domain [Gordonia phage Pleakley] ],,YP_010097450,27.2997,1.16587E-45 SIF-HHPRED: SIF-Syn: /note=QC note: Was formerly called "lysin A, glycosyl hydrolase domain" on basis of 2 PhagesDB BLAST hits to this function only. Upstream two genes have strong evidence for 2 lysin A subunits. As of now (July 2022) I do not know of any lysin A genes split into three parts. Calling NKF for now but this may merit a more specific call. -Amanda Freise /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Glimmer and GeneMark both list the start site at 26045. /note=Coding Potential: Both the Host-Trained GeneMark and the Self-Trained GeneMark show a reasonable coding potential in the ORF and the coding potential is covered by the chosen start site. /note=SD (Final) Score: The RBS Final Score is the least negative among other start sites with a value of -2.090. The Z-score is the highest with a value of 3.255. /note=Gap/overlap: The gap is 22bp. This gap is reasonable and observed in Cafasso, another phage in the DZ cluster. /note=Phamerator: The gene is found in pham 22213, as of October 26, 2021. This pham is found in Cafasso (DZ), the only non-draft phage in the same cluster. The Phamerator did not have a function called for this gene. /note=Starterator: The start site for the gene seems to be somewhat complicated. Phages in the DZ cluster appear to have a start site distinct from the most conserved start site. 
The most conserved start site in this pham is start site number 2 and 4 of 5 genes call it. This phage does not have that start site, so it uses start site number 1 and 1 of 5 genes call it. Both of these start sites are used in 100% of the genes in which they are present (as of October 26, 2021). This pham has a total of 8 members, 3 of which are drafts. /note=Location call: This is a real gene and it has a start site of 26045. /note=Function call: The top hits with functions in both the NCBI BLASTp and PhagesDB BLASTp list the function as lysin A, glycosyl hydrolase domain. The top hit in NCBI BLASTp comes from a gene in Pleakley, and it has a coverage of 89.7638%, identity of 22.8487%, and an e-value of 1.16587e-45. CDD and HHpred were not informative. /note=Transmembrane domains: There are no transmembrane domains predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: I agree with the location and functional call. CDS 26425 - 27102 /gene="30" /product="gp30" /function="cysteine protease" /locus tag="ObLaDi_30" /note=Original Glimmer call @bp 26425 has strength 13.04; Genemark calls start at 26431 /note=SSC: 26425-27102 CP: yes SCS: both-gl ST: SS BLAST-Start: [cysteine protease [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.66732E-162 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.468, -6.8551355900001, no F: cysteine protease SIF-BLAST: ,,[cysteine protease [Gordonia phage Cafasso]],,QXN74245,100.0,1.66732E-162 SIF-HHPRED: d.3.1.0 (A:) automated matches {Yellow mealworm (Tenebrio molitor) [TaxId: 7067]},,,d3qj3a_,77.7778,100.0 SIF-Syn: Upstream gene is pham 22213 (lysin A), Downstream gene is pham 32716, holin (as of 11/24/21). These match phage Cafasso of Cluster DZ and also make sense as part of an operon. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: Both GeneMark and Glimmer call this gene. Glimmer calls the gene 6 bases down from GeneMark. 
/note=Coding Potential: Both GeneMark Self and Host identify coding potential. There is coding potential in the forward strand only. /note=SD (Final) Score: -6.855, SD score not the best, SD Score irrelevant because gene is part of operon (evidence 4 bp gap) /note=Gap/overlap: -4, indicates operon, length 678 is acceptable /note=Phamerator: as of 10/26, this gene is a member of pham 56312, common to 1 other non-draft phage in cluster DZ: Phage Cafasso, function cysteine protease /note=Starterator: reasonable start site 12, not well conserved, present in Cluster DZ. Start Site 19 is most conserved, not present in phage, 6/24 phages in pham call this MA start site. Uninformative, no agreeable start site. /note=Location call: real gene (good coding potential, synteny with Cafasso, 4bp overlap), best start site 26425 because of 4bp overlap, conserved within cluster /note=Function call: Cysteine Protease, The only other non-draft phage in cluster DZ is Cafasso, and Cafasso calls an identical gene as a cysteine protease. This is supported by homology with conserved domains in cysteine proteases in CDD, as well as by homology of conformation with other cysteine proteases in HHPred, which are highly conserved and present even in humans. /note=Transmembrane domains: none /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 27099 - 27458 /gene="31" /product="gp31" /function="holin" /locus tag="ObLaDi_31" /note=Original Glimmer call @bp 27105 has strength 16.24; Genemark calls start at 27105 /note=SSC: 27099-27458 CP: yes SCS: both-cs ST: SS BLAST-Start: [holin [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.88504E-75 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.822, -7.90556050301914, no F: holin SIF-BLAST: ,,[holin [Gordonia phage Cafasso]],,QXN74246,100.0,2.88504E-75 SIF-HHPRED: SIF-Syn: Holin protein, upstream gene is a cysteine protease. 
Downstream is a membrane protein, followed by lysin B. This matches the gene order in phage Cafasso. /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: Glimmer and GeneMark agree on 27,105 as the start site, starting with an ATG codon (Methionine). /note=Coding Potential: The gene has reasonable coding potential within the putative ORF in the forward direction, with the chosen start site encompassing all of the coding potential. This is the case for GeneMark Self and Host. /note=SD (Final) Score: -2.523. Final Score and Z-score (3.082) are the best of all start sites for this gene. /note=Gap/overlap: 2 bp, a reasonable gap. This start codon produces a 354 bp transcript, which is a reasonable length for a gene. However, there is a start site (27,099) that has a 4 bp overlap, which is ideal. /note=Phamerator: The gene was found in pham 32716 as of 10/23/2021, and is conserved within the DZ cluster. The phage Cafasso was primarily used for comparison, as it is the only non-draft phage in the cluster and contains the only non-draft gene of the pham. While the ObLaDi gene does not currently have a function listed in phamerator, the corresponding gene of the same pham in Cafasso was a holin. /note=Starterator: Start site 1 is the only start site chosen in a non-draft gene of the pham. This start site corresponds to 27,099 in ObLaDi. /note=Location call: Based upon the evidence above, this is a real gene with the start site at 27,099. While autoannotations and z-scores favor start site 27105, 27099 has a 4 bp overlap (indicative of an operon) and is the start site chosen for the only non-draft gene within the same pham. /note=Function call: Holin. Though there is currently only one hit for NCBI BLASTp, from the phage Cafasso of the same subcluster as ObLaDi, it suggests the function is to act as a holin. There is high query coverage (100%), a high percentage identity match (100%), and a low e-value (3e-75) to offer strong support. 
While the e-values for HHpred hits are high, there is a high probability of a match to a holin (90.76% and 86.13% probability for 2 separate areas of coverage by the same holin). It is also close to multiple lysins, providing further support for the gene being a holin. CDD had no hits. /note=Transmembrane domains: TMHMM and TOPCONS both predict transmembrane domains, TMHMM predicting 4 and TOPCONS programs predicting 2-4 transmembrane domains. Based upon this evidence, there are real TMDs within the gene. This is consistent with the function call of holin, which is a membrane protein as its function is involved in lysis. /note=Secondary Annotator Name: Chuzhi Zhuang /note=Secondary Annotator QC: Agree with your decision on start site 27099. Great work! Agree with the function call too. CDS 27455 - 27826 /gene="32" /product="gp32" /function="membrane protein" /locus tag="ObLaDi_32" /note=Original Glimmer call @bp 27455 has strength 17.91; Genemark calls start at 27455 /note=SSC: 27455-27826 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.83328E-82 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.953, -7.910617765408869, no F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso]],,QXN74247,100.0,3.83328E-82 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is holin, downstream is lysin B, just like in phage Cafasso. 
/note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: Both Glimmer and Genemark agree on the start site #27455, with start codon GTG. /note=Coding Potential: The gene has reasonable coding potential, and the chosen start site covers all of it. /note=SD (Final) Score: -7.911, not the best, but it may not matter because this gene may be in an operon, as indicated by the -4 overlap /note=Gap/overlap: -4, it is most reasonable with longest ORF, and the length is also reasonable /note=Phamerator: pham number - 80808, date - 10/26/2021, the gene is conserved in other phages in DZ cluster, Cafasso is used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 22; 64/141 called site 22. This gene also calls site 22, and the corresponding base pair coordinate is 27455. /note=Location call: real gene, start at #27455 /note=Function call: The top NCBI BLASTp hit, sorted by E-value, suggests the function is a membrane protein, with high query coverage (100%), high % identity (100%), and a low E-value (4e-82). /note=Transmembrane domains: Both TOPCONS and TMHMM predicted one transmembrane sequence in this protein, and the function predicted by NCBI BLASTp is also a membrane protein. /note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: agree with the primary annotator except info in the starterator. For starterator, start 22 is most annotated and called in 64 out of 141 non-draft genes in this pham. Start site 44 doesn`t seem to be conserved either. (Chuzhi’s Reply: Hi Shiqing, after reading your comment, I checked the starterator again and I believe the pham for my gene has changed over the past few weeks, because when I accessed the starterator before, the most annotated site was indeed site 44, and I saved screenshots. But now it switched to 22, and I updated my notes to reflect this change. Thank you for pointing it out! 
) CDS 27823 - 28575 /gene="33" /product="gp33" /function="lysin B" /locus tag="ObLaDi_33" /note=Original Glimmer call @bp 27823 has strength 13.06; Genemark calls start at 27823 /note=SSC: 27823-28575 CP: yes SCS: both ST: SS BLAST-Start: [lysin B [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.34922E-179 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.433, -4.125079303122735, no F: lysin B SIF-BLAST: ,,[lysin B [Gordonia phage Cafasso]],,QXN74248,100.0,1.34922E-179 SIF-HHPRED: Gene 12 protein; alpha/beta sandwich, CELL ADHESION; 2.0A {Mycobacterium phage D29},,,3HC7_A,90.0,99.9 SIF-Syn: Gene 33 in Cafasso is also lysin B, upstream in both ObLaDi and Cafasso is a gene in Pham 81796. Downstream is an immunity repressor. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and GeneMark both call 27823 as the start site with GTG (Valine) as the start codon. /note=Coding Potential: There is coding potential within the putative ORF; however, the start site does not cover all of this coding potential. /note=SD (Final) Score: -4.125, this was the best score on PECAAN. /note=Gap/overlap: -4 bp, reasonable overlap suggesting it may be co-transcribed. The gene length is acceptable at 753 base pairs. /note=Phamerator: This gene was in pham 80396 as of October 23, 2021. This gene in Cafasso was in the same pham. /note=Starterator: Start site 56 is called by 16/96 genes in this pham. This corresponds to bp 27823 in ObLaDi. /note=Location call: This is most likely a real gene with start site 27823. /note=Function call: Lysin B. The top 5 non-hypothetical BLASTp hits with the lowest e values (less than 6e-42) suggest that the function is Lysin B. They have between 41.7 and 100% identity and greater than 96% coverage. The top hit in HHpred was for the A chain of lysin B in mycobacteriophage D29; this hit had an e value of 6.4e-21, 99.87% probability, and 90% coverage. CDD did not call any specific hits. 
/note=Transmembrane domains: TMHMM did not predict any transmembrane domains and neither did TOPCONS, so this is not a membrane protein. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: Good job! I agree with the start site call. Don`t forget to mention if the length of the gene is acceptable in the Gap/overlap section. CDS complement (28968 - 29255) /gene="34" /product="gp34" /function="immunity repressor" /locus tag="ObLaDi_34" /note=Original Glimmer call @bp 29255 has strength 11.89; Genemark calls start at 29255 /note=SSC: 29255-28968 CP: yes SCS: both ST: SS BLAST-Start: [immunity repressor [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.2335E-62 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.243, -2.17469248771465, yes F: immunity repressor SIF-BLAST: ,,[immunity repressor [Gordonia phage Cafasso]],,QXN74249,100.0,6.2335E-62 SIF-HHPRED: Orf20; SaPI, Repressor, STRUCTURAL PROTEIN; HET: SO4; 1.8A {Staphylococcus aureus},,,6H49_A,68.4211,98.8 SIF-Syn: Portal, upstream gene is lysin b, downstream is tyrosine integrase, just like in Cafasso_34 /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: Glimmer and Genemark mark start at 29255 with starting codon as GTG /note=Coding Potential: Coding potential is found only in the reverse strand, so this is a reverse gene; coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.175 final score /note=Gap/overlap: There are no gene overlaps /note=Phamerator: Pham 64224 (10/29/21). Conserved in Cafasso (DZ) /note=Starterator: Start Site is 11 on bp 29255. Agrees with Glimmer and conserved in two other non draft members of pham /note=Location call: Most likely start at 29255 based on evidence /note=Function call: Immunity repressor. PhagesDB’s two highest hits were both immunity repressors with e values of 1e-49 and 2e-16 and identities of 95% and 54%. NCBI’s two highest hits had e values of 7e-62 and 2e-19 with identity scores of 91/94 and 44/81. 
Based on CDD, it aligns with immunity repressor because it has proteins that contribute to transcriptional regulators with e value as 1.43 e-3 /note=Transmembrane domains: Not a transmembrane protein because no TMD’s were found /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: I agree this is the most likely start site. Explain the final score, give more context on how it compares to the other final scores (less negative is better). Everything else looks good, I agree with function call! I noticed you checked the boxes for evidence on HHpred but did not discuss in the "function call" PECAAN Notes, consider adding them. It seems like they don`t support your functions call, if that`s true I think you want to uncheck them. CDS complement (29271 - 30398) /gene="35" /product="gp35" /function="tyrosine integrase" /locus tag="ObLaDi_35" /note=Original Glimmer call @bp 30398 has strength 12.53; Genemark calls start at 30398 /note=SSC: 30398-29271 CP: yes SCS: both ST: SS BLAST-Start: [tyrosine integrase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.484, -4.322809268821859, yes F: tyrosine integrase SIF-BLAST: ,,[tyrosine integrase [Gordonia phage Cafasso]],,QXN74250,99.4667,0.0 SIF-HHPRED: Integrase; PROTEIN-DNA COMPLEX, DNA BINDING PROTEIN-DNA COMPLEX; HET: PTR; 3.8A {Enterobacteria phage lambda} SCOP: d.163.1.1, d.10.1.4,,,1Z1B_A,96.0,100.0 SIF-Syn: Tyrosine integrase, upstream is Pham 64224, an immunity repressor, downstream is peptidase, just like in phage Cafasso. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 30398. /note=Coding Potential: There is high coding potential based on the middle frame going in the reverse direction within the gene range for both host-trained and self-trained GeneMark. 
/note=SD (Final) Score: The Final Score is -4.323, and the Z-score is 2.484, both of which are the best among all start site options. /note=Gap/overlap: There is a 1 bp overlap which is reasonable because it is all going on the reverse strand. This start site produces the longest ORF of 1128 bp which is acceptable because it is consistent with the idea that the genes must be densely packed. /note=Phamerator: Pham: 20631. Date Analyzed: 10/22/2021. The gene is conserved in cluster AS and found in phages Abidatro and Amelia. /note=Starterator: Start site 7 is called in 41 out of 43 of the non-draft members in this pham. Start site 7 correlates to 30398 bp in ObLaDi. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 30398. /note=Function call: Tyrosine Integrase. Multiple PhagesDB BLAST hits have the tyrosine integrase function (E-value < 1E-58), and 2 out of 3 top NCBI BLAST hits also have the tyrosine integrase function (> 94% coverage, 42%+ identity, and E-value < 10^-86). HHpred had a hit for Lambda Integrase Dimer with 100% probability, 96% coverage, and E-value of 5.3E-33. CDD had five hits that corresponded to an integrase with an E-value < 1E-04. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene is not a membrane protein. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I agree with the annotation based on the evidence above. For synteny notes, mention that the upstream protein is immunity repressor. 
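The gene lengths quoted in the notes above (753 bp for gene 33, 1128 bp for gene 35) match the inclusive coordinate span, right minus left plus 1, on either strand. A complete ORF should also be a whole number of codons. The sketch below checks both for three of the records; the helper name `orf_length` is hypothetical and the coordinates are copied from the CDS lines.

```python
def orf_length(left: int, right: int) -> int:
    """Length in bp of a feature given inclusive 1-based coordinates (either strand)."""
    return abs(right - left) + 1


# (left, right) coordinates copied from the CDS lines of the records above.
features = {
    "gp33": (27823, 28575),  # forward strand
    "gp34": (28968, 29255),  # complement
    "gp35": (29271, 30398),  # complement
}

lengths = {name: orf_length(*coords) for name, coords in features.items()}

for name, length in lengths.items():
    codons, remainder = divmod(length, 3)
    # remainder == 0 means the span is a whole number of codons (stop codon included).
    print(f"{name}: {length} bp, {codons} codons, in frame: {remainder == 0}")
```

A span that is not divisible by 3 would usually indicate a coordinate typo rather than a biological feature, so this is a cheap check to run before submitting a feature table.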
CDS complement (30398 - 30973) /gene="36" /product="gp36" /function="peptidase" /locus tag="ObLaDi_36" /note=Original Glimmer call @bp 31132 has strength 13.2; Genemark calls start at 30973 /note=SSC: 30973-30398 CP: no SCS: both-gm ST: NI BLAST-Start: [peptidase [Gordonia phage Cafasso]],,NCBI, q1:s1 98.9529% 7.29067E-125 GAP: 285 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.585, -5.593718379696664, no F: peptidase SIF-BLAST: ,,[peptidase [Gordonia phage Cafasso]],,QXN74251,95.7672,7.29067E-125 SIF-HHPRED: IrrE protein; Deinococcus, Radiotolerance, Gene regulation, Metallopeptidase, IrrE; HET: MSE; 2.6A {Deinococcus deserti},,,3DTE_A,64.3979,99.7 SIF-Syn: peptidase. In ObLaDi, upstream is a tyrosine integrase and downstream is possibly a Cro protein. Gene 36 of Cafasso is also a peptidase. In Cafasso, downstream is a cro protein and upstream is a tyrosine integrase. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 31132. Genemark calls the start at 30973. The start codon is GTG. /note=Coding Potential: The coding potential in this ORF is only in the reverse strand, suggesting it is a reverse gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: The final score is -6.666. It is not the best Final Score, but compared to the start sites with better final scores, it has the best Z-score. /note=Gap/overlap: There is a gap of 126 base pairs. It is somewhat large, but there is insignificant coding potential in the gap that could suggest another gene. /note=Phamerator: Pham 79866 on 10/23/2021. It is conserved in Cafasso (DZ). /note=Starterator: The conserved start site 3 is base pair 30973. Of the 3 members in this pham, the 1 non-draft member calls it. This start site agrees with the GeneMark start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 31132. /note=Function call: The likely function is a peptidase. 
PhagesDB’s two top hits predicted peptidase function with e-values of 1e-98 and 5e-15 and identities of 94% and 40%, respectively. NCBI’s two top hits also predicted peptidase function with e-values of 4e-124 and 3e-21 and identities of 94% and 42%, respectively. The CDD database has a hit for peptidase function with an e-value of 2.49e-03. /note=Transmembrane domains: No transmembrane domains were called in TMHMM or TOPCONS. It is not a membrane protein. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: I think you should mention the Starterator info for start site 31132 and not just 30973. You could say that the start site 31132 is not conserved; however, there is only one non-draft phage in this pham of three members. CDS complement (31259 - 31789) /gene="37" /product="gp37" /function="Cro (control of repressor's operator)" /locus tag="ObLaDi_37" /note=Original Glimmer call @bp 31864 has strength 11.3; Genemark calls start at 31717 /note=SSC: 31789-31259 CP: yes SCS: both-cs ST: NI BLAST-Start: [Cro protein [Gordonia phage Cafasso]],,NCBI, q7:s1 96.5909% 1.37218E-118 GAP: 180 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.724, -3.5290106867422555, no F: Cro (control of repressor's operator) SIF-BLAST: ,,[Cro protein [Gordonia phage Cafasso]],,QXN74252,100.0,1.37218E-118 SIF-HHPRED: Orf20; SaPI, Repressor, STRUCTURAL PROTEIN; HET: SO4; 1.8A {Staphylococcus aureus},,,6H49_A,44.3182,98.2 SIF-Syn: There is a high level of synteny with the phage known as Cafasso. Aside from being the same gene number, the nearby genes are in the same direction and equal in the number of associated genes in said direction. There is a slight shift in their location but it is very minor. Their functions are also similar: down the operon, the functions are, in order, peptidase, tyrosine integrase, and immunity repressor for both phages. 
/note=Rogelio68 (stop@31259 R) /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Glimmer and Genemark are utilized. They vary in the start site. Glimmer says @31864 R and Genemark says @31717 R. The start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential; the typical predicted coding region matches with the ORF. The atypical is a little outside of the ORF near the start site. /note=SD (Final) Score: The final score is -3.529. It came in as second best but is still strongly supportive of my call when presented along with the other factors. /note=Gap/overlap: The gap is 180 bp and there is some space between genes up and downstream. It is not very spacious and actually is similar to non-draft phage Cafasso. /note=Location call: I call the start site to be @31789 R. It is consistent, as mentioned before, with the non-draft gene in Cafasso and covers coding potential. /note=Phamerator: It is found in DZ as of November 2, 2021. There is only one actual non-draft gene I can compare it to. I used Cafasso. The other draft, although unreliable, is in the same pham. Cafasso and ObLaDi phams align. There does seem to be a function of Cro protein. /note=Starterator: @31759 R is the start and 31259 R is the stop site for most MA. It is hard to tell about phams because only one, Cafasso, is a non-draft gene. They all seem to align in location and perhaps could indicate function similarities. The one that seemed the best to me was a start site of @31789 R. /note=Location call: This appears to be a real gene with coding potential and the start/stop sites appear to be as mentioned before, @31789 R/@31259 R. /note=Function call: (Cro protein?) There is an immunity repressor and integrase down the line in this reverse operon. Could indicate that this is the function. /note=Transmembrane domains: There is no supporting information. Due to a lack of hits, it does not appear to be associated with any transmembrane domains. 
/note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: It appears as though the start site agrees with the one provided on the PECAAN Starterator, which seems to have the largest Z-score and least negative final score. CDS 31970 - 32287 /gene="38" /product="gp38" /function="helix-turn-helix DNA binding domain" /locus tag="ObLaDi_38" /note=Original Glimmer call @bp 31970 has strength 4.76; Genemark calls start at 31970 /note=SSC: 31970-32287 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 9.14873E-68 GAP: 180 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.484, -4.197870532213559, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,QXN74253,100.0,9.14873E-68 SIF-HHPRED: Endothelial differentiation-related factor 1; EDF1, HMBF1alpha, helix-turn-helix, Structural Genomics, NPPSFA, National Project on Protein Structural and Functional Analyses, RIKEN Structural; NMR {Homo sapiens} SCOP: l.1.1.1, a.35.1.12,,,1X57_A,82.8571,98.2 SIF-Syn: Helix-turn-helix DNA binding protein, upstream gene's function is not called yet, downstream is exonuclease. Upstream gene in phage Cafasso is a Cro protein, and downstream gene is a RecE-like exonuclease in phage Cafasso. /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site called at 31970; the start codon is ATG. Start site at 32270 is not possible because the resulting gene would only be 17bp long, as opposed to the minimum length of 120bp. /note=Coding Potential: Good coding potential from GeneMarkS and Host-Trained GeneMark within predicted ORF; start site at 31970 covered in this coding potential. /note=SD (Final) Score: -4.198, not the best but still reasonable to suggest the presence of a credible ribosome binding site. 
There are also two other start sites, both with better final scores (-4.167) but both of these genes have longer gaps. /note=Gap/overlap: Gap of 105bp, no alternative start sites upstream of 31970; Pham map of Phage Cafasso used for comparison and synteny was observed for this gene with start site 31970 /note=Phamerator: 78520. Date 10-31-2021. It is conserved; found in Cafasso (DZ). /note=Starterator: Start site 7, which corresponds to 31970, was manually annotated once in Cluster DZ. This gene does not have the "Most Annotated" start number 22, but this start number is only called in 12 of 39 non-draft genomes, suggesting that this is not the Best site, seeing as it is called less than 1/3rd of the time. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 31970 bp. /note=Function call: Helix-turn-helix DNA binding protein; PhagesDB Blast indicates low e-values (<10e-3) + high sequence similarity (99%) with a Cafasso gene with HTH DNA binding functionality; HHpred results also show hits for proteins involved in binding DNA/have HTH binding-domains, with high probability (>98%), low e-values (<10e-3) /note=Transmembrane domains: Does not have transmembrane domains. No TMDs, no TOPCONs results. /note=Secondary Annotator Name: Light, Isabel /note=Secondary Annotator QC: Everything looks good! I would try and be more specific with your evidence, for example saying like "the final score is -4.198, two other start sites have a better final score which is -4.167 but both of these genes have longer gaps." and also when discussing the hits for function call, maybe list the e value and coverage %s to make it easier for quality control moving forward. Also I noticed you checked boxes with function immunity repressor, I think you only want to check boxes that support your HTH function. /note=--> Updated accordingly! 
Made changes to SD score section, have included e-value and coverage % in PhagesDB hits discussion; unchecked immunity repressor and have selected a more relevant HTH function hit. CDS 32337 - 33350 /gene="39" /product="gp39" /function="RecE-like exonuclease" /locus tag="ObLaDi_39" /note=Original Glimmer call @bp 32337 has strength 12.62; Genemark calls start at 32337 /note=SSC: 32337-33350 CP: yes SCS: both ST: SS BLAST-Start: [RecE-like exonuclease [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 49 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.065, -3.1309088467834587, yes F: RecE-like exonuclease SIF-BLAST: ,,[RecE-like exonuclease [Gordonia phage Cafasso]],,QXN74254,97.3294,0.0 SIF-HHPRED: Uncharacterized protein R354; MIMIVIRE, Cas4-like, nuclease, R354, NUCLEAR PROTEIN; 2.806A {Acanthamoeba polyphaga mimivirus},,,5YET_B,93.1751,100.0 SIF-Syn: RecE-like exonuclease, upstream gene is helix-turn-helix DNA binding protein, downstream is RecT-like ssDNA binding protein, just like in phage Cafasso. /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Glimmer and GeneMark both call the start at 32337. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.131. The SD final score and the Z-score (3.065) are the best relative to the other start sites. /note=Gap/overlap: 49 bp gap. There are alternative start positions but they don't have good Z-scores and final scores. The length of the gene is 1014 bp, which is acceptable. /note=Phamerator: The pham number is 80929 as of 10/22/2021. The gene doesn't have the most conserved start number, but the start number was similar between the phages from the same cluster. I compared my gene with Cafasso_87 which codes a DNA binding protein. My gene is draft and doesn't have a function yet. 
/note=Starterator: The most conserved start number is 72, which was manually annotated in 35/101 non-draft genes; however, all the phages in cluster DZ had start number 58, which corresponds to coordinate 32337. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Start site 58 at 32337. The gene is real because it has good coding potential, and the start site 32337 is a potential candidate. /note=Function call: Hits for RecE-like exonuclease in BLAST. RecE-like exonucleases are most likely to be called when upstream of a RecT-like ssDNA binding protein, which is the downstream gene. /note=Transmembrane domains: No transmembrane domain. Supports our findings for the gene function because the gene interacts with DNA, and it’s not required to bind to the membrane. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Please add more detailed notes to support Genome Profile claims. CDS 33395 - 34282 /gene="40" /product="gp40" /function="RecT-like ssDNA binding protein" /locus tag="ObLaDi_40" /note=Original Glimmer call @bp 33395 has strength 14.01; Genemark calls start at 33389 /note=SSC: 33395-34282 CP: yes SCS: both-gl ST: NI BLAST-Start: [RecT-like ssDNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 44 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -2.2763497933341483, yes F: RecT-like ssDNA binding protein SIF-BLAST: ,,[RecT-like ssDNA binding protein [Gordonia phage Cafasso]],,QXN74255,97.619,0.0 SIF-HHPRED: RecT ; RecT family,,,PF03837.16,68.4746,100.0 SIF-Syn: RecT-like ssDNA binding protein, upstream gene is RecE-like exonuclease, downstream gene 41 function is not given, just like phage Cafasso. /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer - 33395 (original), GeneMark - 33389. They don’t agree on the start site. Both start sites are ATG. /note=Coding Potential: There is coding potential within the putative ORF. 
Both start sites cover this coding potential. /note=SD (Final) Score: -2.276 is the best SD score for original. /note=Gap/overlap: 44 bp gap. This is reasonable. Another start candidate is 33389. 33389 and 33395 are very similar except 33395 has a higher final/z score. Hence, 33389 was not chosen despite it being the LORF. The length of the gene is reasonable. The gap is also syntenic with Cafasso. /note=Phamerator: Gene is found in Pham 79706 as of 10/26/2021. This pham is in one other member of cluster DZ - Cafasso_40. Function call is Rec-T like ssDNA binding protein and is conserved and approved according to SEA-PHAGES. /note=Starterator: Most annotated start is 41 (92/244 call it), but this start is not called in my gene. Start site 36 (33389) and 39 (33395) are called. 36 is more conserved (8 MAs) relative to 39 (2 MAs). However, in other phages, 36 is only called when there isn’t a start site 39, but ObLaDi has both 36 and 39. So, auto annotation data overrules the starterator analysis, and 33395 is the best start site. This is why I put Starterator as NI. /note=Location call: Based on above evidence, the gene is real and starts at 33395. /note=Function call: Top 2 phagesdb hits suggested RecT-like ssDNA binding protein. (1) Cafasso with score 572, E e-163, ident 95%. (2) VanLee with score 269, E 5e-72, ident 48%. BLASTp suggested RecT recombinase: Gordonia bronchialis with E 6e^-88, ident 50%, score 274, cover 94%. This supports the function being RecT-like ssDNA binding protein. HHpred and CDD say this is a RecT protein that binds to ssDNA. Both databases give very low e-values (94%), medium to high % identity (51%, 98%), and low e-values (93%, and low e-values that met the <10e-3 threshold. The suggested function is also DnaB like DNA dsDNA helicase. /note=Transmembrane domains: Since TMHMM and TOPCONS didn’t call at least 1 TMD, we can conclude that this protein doesn’t have any TMDs. 
This makes sense because this gene codes for a dsDNA helicase, which unwinds the DNA strands during replication. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with this annotation and location call. All of the evidence categories have been considered. I also agree with your function call. CDS 38842 - 39060 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="ObLaDi_50" /note=Original Glimmer call @bp 38842 has strength 13.25; Genemark calls start at 38842 /note=SSC: 38842-39060 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_50 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.39971E-34 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.623, -4.2914814936838255, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_50 [Gordonia phage Cafasso]],,QXN74265,86.1111,1.39971E-34 SIF-HHPRED: SIF-Syn: ObLaDi_50 has synteny with Cafasso_50, which is currently labelled as NKF; both are part of Pham 33203. Their upstream genes, ObLaDi_49 / Cafasso_49, are both part of Pham 19781. Cafasso_49 is annotated to be a DnaB-like dsDNA helicase. Their downstream genes, ObLaDi_51 / Cafasso_51, are both part of Pham 54259. While Cafasso_51 does not have a function annotated, ObLaDi_51 is annotated to be a DNA-dependent RNA polymerase. This synteny is maintained several genes upstream and downstream. /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and predict the start site to be at 38842 bp. /note=Coding Potential: The ORF has strong coding potential and also contains the predicted start site at 38842 bp. There is an earlier start codon that is very close to the predicted start site (38821 bp) that looks like it would capture all of the coding potential, compared to 38842 bp which looks like it is missing just a little bit of coding potential. /note=SD (Final) Score: -4.291. This is the best Final Score available. 
/note=Gap/overlap: 10 bp gap upstream, 4 bp overlap downstream /note=Phamerator: It is part of pham 33203, last checked 10/22/2021. Pham 33203 only has 3 members, and only Cafasso is non-draft. All three members call this gene. /note=Starterator: Pham 33203 only has 3 members, and Cafasso is the only non-draft member. All three members call Start 3. Cafasso was manually annotated for start 3. Cafasso’s Start 3 correlates to 38852 bp, whereas ObLaDi’s is 38842 bp. The start site at 38842 bp agrees with the start site predicted by GeneMark and Glimmer. /note=Location call: Based on the information above, I would say that this is likely a real gene and its start site is at 38842 bp. While it does not capture all of the coding potential, it is only missing a bit. This start site has the best final score, is predicted by both GeneMark and Glimmer, and was manually annotated for Cafasso. /note=Function call: NKF (No Known Function). PhagesDB BLAST found only 1 strong hit, Cafasso_50, which is annotated to be of unknown function. It has very strong similarity and is in the same cluster as ObLaDi. NCBI BLASTp also only has 1 strong hit, which is also Cafasso_50, predicted to be a hypothetical protein. The PhagesDB Function Frequency Box has no results for any similar genes. Based on our shaky CDD hit, lack of significant HHpred hits, and few BLASTp hits, I will currently call this gene as NKF (No Known Function). While this gene does have synteny with Cafasso, Cafasso_50 is also labelled as No Known Function. Our CDD hit barely missed the significance cutoff for e-value, but suggests that this gene encodes aldehyde dehydrogenase. While it is a gene found in plants, bacteria, and archaea, I can’t really think of a useful function for this gene in phages, though it may be an artifact of horizontal gene transfer. Additionally, SEA-PHAGES does not accept aldehyde dehydrogenase as a function. 
Regardless, I still think labelling this gene as No Known Function is the safest, most reliable choice right now. /note=Transmembrane domains: TMHMM did not predict any transmembrane domains (TMD). TOPCONS did not either. This suggests that this gene product is not a transmembrane protein. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Great job! Concise yet descriptive. CDS 39057 - 39305 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="ObLaDi_51" /note=Original Glimmer call @bp 39057 has strength 5.33; Genemark calls start at 39057 /note=SSC: 39057-39305 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_51 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.01555E-22 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.738, -5.340269015678895, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_51 [Gordonia phage Cafasso]],,QXN74266,74.3902,2.01555E-22 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (Pham 33338), downstream is NKF (Pham 33203), just like in phage Cafasso from the same cluster DZ. /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and Genemark both agree on start site @39057, start codon is GTG /note=Coding Potential: Reasonable coding potential in putative ORF, covered by chosen start site /note=SD (Final) Score: The original start site of 39057 has a final score of -5.340 and a Z-score of 1.738. This start site does not have the best Ribosome Binding Site score but the other starts are not better because the gap is much larger. /note=Gap/overlap: Overlap with upstream gene is reasonable and so is gene length /note=Phamerator: Pham 54259 - 10/27/21. The pham my gene belongs to does present in other members of the cluster, DZ. The phage that I used for comparison is Cafasso. No function called. 
/note=Starterator: Conserved start site number 1, @39057, 2/2 other members of pham call same start site number /note=Location call: Real gene with most likely start site @39057, conserved in starterator /note=Function call: None of the databases got hits except for HHpred. /note=Transmembrane domains: No TMDs predicted /note=Secondary Annotator Name: Chavez, Valeria /note=Secondary Annotator QC: I agree with the location call. Please fill out the synteny box and include more detail about the HHpred hits and the other evidence that led you to call that function. It also seems like there were some hits on BLAST so it might be helpful to include some of that evidence as well. CDS 39302 - 39775 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="ObLaDi_52" /note=Original Glimmer call @bp 39302 has strength 11.75; Genemark calls start at 39302 /note=SSC: 39302-39775 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_52 [Gordonia phage Cafasso]],,NCBI, q11:s11 93.6306% 4.27378E-84 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.829, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_52 [Gordonia phage Cafasso]],,QXN74267,83.4395,4.27378E-84 SIF-HHPRED: SIF-Syn: NKF. Per the PECAAN pham map, gene 52 in ObLaDi is of pham 33338 and is downstream of gene 51 of pham 54259 and upstream of gene 53 of pham 33065. In Cafasso, gene 52 is also of pham 33338, gene 51 of pham 54259. and gene 53 of pham 33065. In both ObLaDi and Cafasso,downstream gene 53 is a DNA binding protein. /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: Predicted by both GeneMark, Glimmer. Both predict a start site of 39302. /note=Start codon: GTG /note=Coding Potential: The gene has reasonable coding potential within the putative ORF and the chosen start site covers nearly all of this coding potential. /note=SD (Final) Score: -3.490. This is the highest final score. 
/note=Gap/overlap: -4 (original start site). This small amount of overlap suggests the gene’s involvement in an operon. /note=Phamerator: As of 10/25/21, this gene is in the pham 33338. All other phages of the cluster DZ (Cafasso, Draft Aleemily) contain this gene in this phamily. Phamerator and PhagesDB do not call a function for this gene. /note=Starterator: Start site 8 is conserved in 1/1 of the non-draft genes in this phamily. It is called in all 3/3 of the genes in this phamily, including drafts. Each of those phages is in the subcluster DZ. This start site corresponds to the base pair 39302. /note=Location call: Start site of 39302 and a stop site of 39775 /note=Function call: Both PhagesDB and NCBI BLASTp only suggested Cafasso as a phage with this gene. In both cases, this gene was given a hypothetical function. PhagesDB function frequency suggested that this gene may have other functions, although only two were called and these do not agree with other results. Furthermore, CDD returned no hits and HHpred returned no significant hits (all hits had an e-value > 20), so this gene currently has no known function. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: In full agreement, great job! 
CDS 39810 - 39962 /gene="53" /product="gp53" /function="DNA binding protein" /locus tag="ObLaDi_53" /note=Original Glimmer call @bp 39810 has strength 8.39 /note=SSC: 39810-39962 CP: yes SCS: glimmer ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q2:s1 98.0% 3.9806E-24 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.661, -5.4987073736983305, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74268,95.9184,3.9806E-24 SIF-HHPRED: zf-ZPR1 ; ZPR1 zinc-finger domain,,,PF03367.15,70.0,92.8 SIF-Syn: DNA binding protein, upstream gene is NKF, downstream gene is helix-turn-helix DNA binding domain, as in phage Cafasso. /note=Primary Annotator Name: Di Blasi, Daria /note=Auto-annotation: Glimmer and GeneMark do not agree, as only Glimmer calls this gene, with a start site at 39810 and start codon GTG. /note=Coding Potential: Coding potential on both host-trained and self-trained GeneMark; the start site covers all of the coding potential within the ORF. /note=SD (Final) Score: -5.499; this is not the most favorable Final Score but it is on the higher end. /note=Gap/overlap: 34 bp gap with upstream gene & 8 bp overlap with downstream gene; this start site produces the smallest gap (34 bp) with the upstream gene of all the potential start sites. The length of the gene (153 bp) is acceptable. /note=Phamerator: The gene is part of pham 33065 as of October 25th, 2021. The pham has 3 members, all of which are members of the DZ cluster and 2 of which are draft genomes (ObLaDi_Draft, Aleemily_Draft, and Cafasso). /note=Starterator: The “Most Annotated” start site (start site 2) is present in the ObLaDi genome but it is not called. Start site 2 is the start site for the only non-draft phage genome in the pham (Cafasso); called in 1 of the 1 non-draft genes in the pham. Start site 2 corresponds to the basepair coordinate 39813. 
Both start sites 1 and 2 are highly conserved among members of the DZ cluster, yet start site 1 (39810) is the auto-annotated start site. /note=Location call: The evidence suggests that the auto-annotated start site (start site 1) is the correct start. Although the only non-draft genome calls site 2, start site 1 is highly conserved among members of the DZ cluster, provides the LORF, includes all of the coding potential, has a favorable RBS score compared to the other potential start (start site 2), and produces the smallest gap (34 bp) of all the potential starts. /note=Function call: DNA binding protein. The only PhagesDB BLASTp hit called the gene product a DNA binding protein (E-value = 1e-21 & 93% identity), and the only NCBI BLASTp hit called the gene product a DNA-binding protein (E-value = 4e-24 & 93.88% identity). The best 2 HHpred hits (although I am wary of the high E-values) called the conserved domain a zinc-finger/zinc ribbon (E-value = 0.25 & 70% coverage, E-value = 0.17 & 72% coverage). Together, this is evidence to support that the gene is a DNA binding protein with a zinc-finger domain. There were no relevant CDD hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: I have QC'ed this location call and agree with the primary annotator. 
CDS 39955 - 40215 /gene="54" /product="gp54" /function="helix-turn-helix DNA binding domain" /locus tag="ObLaDi_54" /note=Original Glimmer call @bp 39955 has strength 7.63; Genemark calls start at 39955 /note=SSC: 39955-40215 CP: yes SCS: both ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.51318E-54 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.776, -3.5985503360675457, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,QXN74269,97.6744,1.51318E-54 SIF-HHPRED: sigma factor; Bergerat fold, helix-turn-helix, PROTEIN BINDING; HET: ADP; 2.9A {Geobacillus stearothermophilus} SCOP: a.4.13.2,,,1L0O_C,53.4884,97.5 SIF-Syn: Helix-turn-helix DNA binding domain, upstream gene is a DNA binding protein, downstream gene is a PnuC-like Nicotinamide riboside transporter, just like in phage Cafasso /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and Genemark do call this gene at the same start site of nucleotide 39955. This corresponds to a start codon of ATG. /note=Coding Potential: There is good coding potential within the putative ORF for this gene. The coding potential is encompassed from the predicted start site to the stop site. /note=SD (Final) Score: The SD Score that corresponds to the predicted start site is the best at -3.599. /note=Gap/overlap: There is a reasonable gap of -8 bp. The length of the gene created is also acceptable. While there is another start site that creates the LORF, every other piece of evidence (such as Z-Value, Final Score /note=Phamerator: As of 10/24/2021, this gene is found in pham 32707. ¾ of the genomes (including 1/2 non-draft phages, Cafasso) are in the same cluster as ObLaDi (DZ) while the fourth member, VanLee, is a singleton (this data includes the draft phages). 
The pham database had the function “helix-turn-helix DNA binding domain protein” called for the same gene in Cafasso, a non-draft phage in the same cluster as ObLaDi. /note=Starterator: Starterator was uninformative in this case. Only ½ of the non-draft phage genomes call the “most conserved” start site (#8). However, ObLaDi does not have this start site number. One helpful piece of information is that Cafasso has the same start site called (#7) as this gene in ObLaDi, and the two phages are in the same cluster. This supports the auto-annotated start site. /note=Location call: Altogether, the evidence supports that this is a real gene with the auto-annotated start site at bp 39955. Both Glimmer and GeneMark call the same start site, and this start site has the best Final Score and Z-Value. It also has the same start as the phage, Cafasso, that is in the same cluster. /note=Function call: Using PhagesDB BLAST and NCBI BLAST, it was determined that Cafasso is the only genome that has a good enough e-value to compare this function to. When compared to Cafasso, a high identity (95%) is identified and a good e-value is present (3e-47). If we use Cafasso as a function hypothesis, we can hypothesize that this function is a helix-turn-helix DNA binding domain protein. CDD did not return any relevant hits. The top hits on HHpred were not relevant because they were not supported by any of the other resources used to determine the function. All other resources, including synteny, agreed on a helix-turn-helix DNA binding domain. Only one HHpred hit agreed here, and it had a high probability (> 97%), coverage (> 53%), and a low enough e-value (0.001). /note=Transmembrane domains: No transmembrane domains predicted. 
The absence of TMDs makes sense in this context because the hypothesized function of a helix-turn-helix binding domain suggests that the protein encoded by this gene primarily interacts with viral DNA and does not interact with the bacterial cell membrane or aid in lysis/entry. /note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: Looks good! CDS 40212 - 40466 /gene="55" /product="gp55" /function="PnuC-like Nicotinamide riboside transporter" /locus tag="ObLaDi_55" /note=Original Glimmer call @bp 40212 has strength 14.16; Genemark calls start at 40212 /note=SSC: 40212-40466 CP: yes SCS: both ST: SS BLAST-Start: [PnuC-like nicotinamide riboside transporter [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.2104E-50 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.243, -2.1924212546750814, yes F: PnuC-like Nicotinamide riboside transporter SIF-BLAST: ,,[PnuC-like nicotinamide riboside transporter [Gordonia phage Cafasso]],,QXN74270,98.8095,4.2104E-50 SIF-HHPRED: NMN_transporter ; Nicotinamide mononucleotide transporter,,,PF04973.14,84.5238,99.5 SIF-Syn: PnuC-like nicotinamide riboside transporter just like Cafasso, no known gene function discovered yet in surrounding genes. /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Both Glimmer and Genemark. Same start site agreed upon (40212 F). /note=Coding Potential: High coding potential cross verified by both host and self trained Genemark. Synteny projection very similar to other phages in the same cluster. Start site does cover all coding potential. /note=SD (Final) Score: -2.192, best SD score listed /note=Gap/overlap: 4 bp gap between previous gene and start site for current gene. Evidence suggests operon exists within this overlap. /note=Phamerator: As of 10/22/21, gene was found in pham #54954. Pham had highly conserved genes that were found in the rest of the phages within the same cluster, specifically Aleemily and Cafasso. 
Function call was for a PnuC-like nicotinamide riboside transporter. /note=Starterator: Starterator depicts common start site for this pham to be at 30, however this phage has evidence pointing to start site 27. Cluster is small and highly conserved, suggesting that human reviewed Cafasso with start site 27 has better correlation to ObLaDi than suggested start site 30. /note=Location call: Evidence suggests this is a real gene with high coding potential and a start site 27 @ 40212 F. No gene documentation needed before the start site as Genemark and Glimmer both confirm a low coding potential for the region. /note=Function call: Low e-values on PhagesDB BLASTp and NCBI BLASTp, and high probability values over 99% on HHpred and NCBI, correlate with a gene found on Cafasso, which is in the same subcluster as ObLaDi. /note=Transmembrane domains: High probability for transmembrane helix domains, 3 alpha helices projected across 55 amino acid sequence. Aligns with functional role of nicotinamide riboside transporters. /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: Make sure to put “Yes” for all GM coding Capacity! Also, you mention a -4bp gap, it is an overlap as you even mention it could be part of an operon. If it is minus, it is overlap if I am not mistaken. Otherwise, I agree everything checks out. Good job! 
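The sign convention raised in the QC note above (a negative GAP value is an overlap) can be sketched as follows. This is a minimal illustration, not PECAAN's actual implementation: it assumes the GAP value equals the next gene's leftmost coordinate minus the previous gene's rightmost coordinate minus 1, which is consistent with the coordinates and GAP values listed in these records. The function names `gap_bp` and `describe_gap` are hypothetical.

```python
def gap_bp(prev_right: int, next_left: int) -> int:
    """Bases between two adjacent genes (assumed convention).

    prev_right: rightmost coordinate of the previous gene.
    next_left:  leftmost coordinate of the next gene.
    A negative result means the two genes overlap.
    """
    return next_left - prev_right - 1


def describe_gap(prev_right: int, next_left: int) -> str:
    """Phrase the value the way the annotation notes do."""
    gap = gap_bp(prev_right, next_left)
    if gap < 0:
        return f"{-gap} bp overlap"
    return f"{gap} bp gap"


# Coordinates taken from the records in this file.
print(describe_gap(40466, 40469))  # ObLaDi_55 -> ObLaDi_56
print(describe_gap(42943, 42940))  # ObLaDi_61 -> ObLaDi_62
print(describe_gap(42170, 42680))  # ObLaDi_60 -> ObLaDi_61
```

Under this assumed convention, a value of -4 corresponds to the 4 bp overlap that several notes in this file treat as a possible sign of an operon.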
CDS 40469 - 40765 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="ObLaDi_56" /note=Original Glimmer call @bp 40469 has strength 7.96; Genemark calls start at 40469 /note=SSC: 40469-40765 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_56 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.6485E-65 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.255, -2.1695583717155773, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_56 [Gordonia phage Cafasso]],,QXN74271,98.9796,2.6485E-65 SIF-HHPRED: SIF-Syn: NKF, Pham #21988, upstream gene is PnuC-like Nicotinamide riboside transporter, downstream is lipoprotein, just like in phage Cafasso /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: Both Glimmer and Genemark showed the same start site, 40469 with a start codon of ATG. /note=Coding Potential: Coding potential observed is high in the putative ORF. Coding potential is fully included in start and stop sites. /note=SD (Final) Score: The best SD score is -2.170 which is the most positive SD score value for all the potential start sites. /note=Gap/overlap: There is a gap of 2 bp which is reasonable and this creates the longest and most reasonable ORF. /note=Phamerator: Conducted on 10/29/2021, this gene is in pham 21988. The gene was conserved among an undefined number of phages. The function seems to be unknown. /note=Starterator: Start site #8 (40469) is not conserved among all pham members. Aleemily, Cafasso, ObLaDi all called the same start site which was the “most annotated” start. This is also the auto-annotation start-site. (3/4 call site #8). Skog did not contain start site #8 but was in the pham. /note=Location call: All evidence supports this is a real gene that is conserved in phamerator and has good coding potential, it is most likely site #40469 is the start site as it was conserved in three phages and was called as the start site for 3 of the 4 phages in the pham. 
Specifically in Cafasso as that is the only member that has been manually annotated. /note=Function call: There were no significant hits to genes with known functions on PhagesDB BLAST, NCBI BLASTp, HHpred, or CDD. /note=Transmembrane domains: No evidence of transmembrane domains. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I have also QC'ed this function call and agree with the first annotator. CDS complement (40741 - 41109) /gene="57" /product="gp57" /function="lipoprotein" /locus tag="ObLaDi_57" /note=Original Glimmer call @bp 41109 has strength 7.51; Genemark calls start at 41109 /note=SSC: 41109-40741 CP: yes SCS: both ST: SS BLAST-Start: [lipoprotein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.77137E-40 GAP: 110 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.601, -6.785629227243047, no F: lipoprotein SIF-BLAST: ,,[lipoprotein [Gordonia phage Cafasso]],,QXN74272,77.8689,1.77137E-40 SIF-HHPRED: LpqV ; Putative lipoprotein LpqV,,,PF17301.4,67.2131,99.5 SIF-Syn: Lipoprotein, upstream gene is NKF, downstream is also NKF, just like in phage Cafasso. Phage Cafasso has an additional gene upstream of it that is not found in ObLaDi. /note=PECAAN Notes /note=Primary Annotator Name: Erfanian M., Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site at 41109. /note=Coding Potential: This gene has good coding potential within the putative ORF, and covers all this coding potential. /note=SD (Final) Score: The RBS Final Score of the original start site at -6.786 and Z-score at 1.601 for this start are not the highest but are still favorable. /note=Gap/overlap: My gene has a 110 bp gap with the gene before it, downstream. While this gap is on the large side, it is not considered unreasonable, and is additionally present in the genome of phage Cafasso. 
Therefore this gap does not require the addition of a gene in this gap, which would not fit anyway. Additionally, there is no gap with its upstream gene, but does however have an overlap of 24 bp. This however, is reasonable and the gene can still be considered legitimate. /note=Phamerator: This gene was found in pham 33649 on 10/21/21, and consists of three members, two of which are draft genomes. This pham was found to be present in another member of the cluster DZ, using phage Cafasso for comparison. /note=Starterator: Using information from the Starterator analysis run most recently on 10/15/21, it was found that the most conserved start site number is 2 (41109), which was called in the one and only non-draft genes. Looking at ObLaDi’s track, start 2 is listed as the first start by a green line, indicating that it was determined as the final human annotated start. /note=Location call: The evidence gathered thus far indicates that the start site at 41109 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp have several hits with low e values, high identity percentages, and reasonable scores. The top non-draft hit on PhagesDB BLASTp was for a gene in Cafasso, a final phage in the same cluster as ObLaDi. This hit has a significantly low E-value at 1e-41, a reasonable score, as well as a high identity percentage of 69%. Its function is lipoprotein. Furthermore, the top non-draft hit on NCBI BLASTp was also for a gene in Cafasso. This hit also has a low E-value of 2e-40, a reasonable score, as well as a high identity percentage of 100%. This hit also has a function of lipoprotein. The hits returned by HHpred further provided information on many genes with this same function. It is therefore reasonable to conclude that the function of the gene in question is lipoprotein. CDD and HHpred did not have any hits. 
Given the above data, there is enough data to form a hypothesis for the function of my gene, which appears to be lipoprotein. /note=Transmembrane domains: No TMDs called by TMHMM or TOPCONS. The protein is not a membrane protein. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: Great job! I would agree with this annotation. Don't forget to mark the GM Coding Capacity and Starterator! CDS 41220 - 41642 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="ObLaDi_58" /note=Original Glimmer call @bp 41220 has strength 5.18; Genemark calls start at 41235 /note=SSC: 41220-41642 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_59 [Gordonia phage Cafasso]],,NCBI, q2:s16 99.2857% 3.35324E-93 GAP: 110 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.111, -6.547901311046603, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_59 [Gordonia phage Cafasso]],,QXN74273,87.6623,3.35324E-93 SIF-HHPRED: SIF-Syn: NKF, upstream gene is lipoprotein, downstream is NKF (Pham 33170), just like in phage Cafasso from the same cluster DZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Both Glimmer and GeneMark call the gene at slightly different start sites. Glimmer calls it at 41220 and GeneMark calls it at 41235. /note=Coding Potential: There is a reasonable coding potential. The start sites include all of the coding potential. /note=SD (Final) Score: The final SD score for 41220 is -6.548 with z-score of 1.111. The final SD score for 41235 is -6.080 with a z-score of 1.348. /note=Gap/overlap: The gap upstream is somewhat large at 110 for 41220 and 125 for 41235 but the gap is observed in another phage and does not have a coding potential. /note=Phamerator: The pham this gene is found in is 33540 as of 10/26/2021. The pham is conserved and found in the non-draft phage Cafasso of the same cluster. 
/note=Starterator: The manually annotated start site is 1, only found in non-draft Cafasso phage and not in this phage. The auto-annotated start site is 3 at 41220 and 4 at 41235, neither of which is the most annotated. /note=Location call: Based on the evidence above, this is a real gene with the start site at 41220. /note=Function call: Unknown function. One top hit from both PhagesDB and NCBI BLAST has unknown function (e-value of 3e-93 and identity of 96%). No CDD hits are found. HHPRED hits are not reliable to determine a function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. The protein is not a membrane protein. /note=Secondary Annotator Name: Erfanian M., Kiana /note=Secondary Annotator QC: The information provided indicates that your call on projected start site 41220 is correct. CDS complement (41615 - 41788) /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="ObLaDi_59" /note=Original Glimmer call @bp 41788 has strength 5.7; Genemark calls start at 41788 /note=SSC: 41788-41615 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_60 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.42504E-34 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.79, -5.999909193173244, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_60 [Gordonia phage Cafasso]],,QXN74275,100.0,1.42504E-34 SIF-HHPRED: SIF-Syn: NKF. Cafasso was the only other phage used for evidence in this gene call. The upstream and downstream genes for the related gene in Cafasso were also NKF. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Both Genemark and Glimmer call the start site at 41788. The codon called is ATG. /note=Coding Potential: Both GeneMark and Glimmer suggested coding potential at the putative ORF. The start site covers all of the coding potential. /note=SD (Final) Score: -6.000. This was the best final SD score. /note=Gap/overlap: 4. 
This was the most reasonable gap out of the two start sites. Additionally, the length of the gene for this start site is more reasonable. /note=Phamerator: The pham for this gene is 33170. The date is 10/26/21. The other two genomes where this pham is conserved are in the same DZ cluster: Cafasso (non-draft) and Aleemily (draft). /note=Starterator: Start site 1 was called in 100% of the genes in this pham. Start 1 in ObLaDi is 41788, which is the same start site predicted in Glimmer and GeneMark. Starterator was marked as NA because there was only one possible start site. /note=Location call: This is most likely a real gene. Based on Glimmer, GeneMark, and Starterator, I believe that 41788 is the most likely start site. /note=Function call: PhagesDB BLASTp and NCBI BLASTp were used to try and determine the gene’s function. There is one hit in each database from non-draft genes that has significance for this gene’s function. Both hits have very low E-values (1e-29 and 1e-29 respectively) and 100% identities, making them good matches. However, neither has any suggested function, meaning we are not able to determine a putative function for this gene. Additionally, CDD and HHpred had no significant hits, supporting the fact that there is no known function. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Wang, Jennifer Yiyang /note=Secondary Annotator QC: Looks good, but I think the Starterator is not quite informative, maybe double check that! Agree on the start site. 
CDS complement (41793 - 42170) /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="ObLaDi_60" /note=Original Glimmer call @bp 42170 has strength 9.34; Genemark calls start at 42170 /note=SSC: 42170-41793 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_61 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.0679E-82 GAP: 509 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.008, -4.7084971843946395, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_61 [Gordonia phage Cafasso]],,QXN74276,97.6,1.0679E-82 SIF-HHPRED: SIF-Syn: NKF (pham 33606), upstream gene is NKF (pham 83057), downstream is NKF (pham 33170), just like in phage Cafasso. /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call start at 42170 (site 1) with an ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start site covers all the coding potential. This ORF only has reverse strand coding potential, thus this is a reverse gene. /note=SD (Final) Score: -4.708, the best final score on PECAAN, but with a tie. The Z-score is 2.008, the best value on PECAAN, but with a tie. /note=Gap/overlap: Large gap of 509 bp is present that likely does not contain a missing gene. Gap has no coding potential in any of the 6 reading frames but does contain the two promoters for the adjacent gene start sites due to switch from reverse to forward reading frames. Additionally the large gap is present in other non-draft phages such as Cafasso. This start site creates the LORF and the gene length is 378 bp which is acceptable. /note=Phamerator: The pham number as of 10/25/2021 is 33606. The gene is conserved in phages Aleemily_Draft (DZ) and Cafasso (DZ). Cafasso (DZ) is the best phage genome for comparison since it is non-draft. Based on PhagesDB there is no function call for the gene. 
/note=Starterator: Based on the 10/22/21 run the most annotated start site 1 is a reasonable choice that is conserved among members of pham 33606. There are 3 members total with only 1 non-draft member in this pham. 3/3 of total members and 1/1 of non-draft members call start site 1, which correlates to 42170 bp for ObLaDi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 42170 bp (site 1). Starterator agrees with Glimmer and Genemark. /note=Function call: Not enough data to form a function hypothesis, but this is likely a real protein. The only PhagesDB BLASTp non-draft hit with a small e-value has an unknown function from Cafasso_61 (e: 2e-69, id: 96%). NCBI BLASTp shows only a single hit of a hypothetical protein with a small e-value from Cafasso_61 (e: 1e-82, id: 97%, cov: 100%). This gene possibly has a function of thymidylate synthase but it is very unlikely. PhagesDB BLASTp hits of thymidylate synthase have e-values that are too large (Nebkiss_89, Gaia_90: 0.25). NCBI BLASTp has no thymidylate synthase hits. Additionally the thymidylate synthase phams in the Phagesdb Function Frequency (57278, 55821) do not match with this gene’s pham (33606). There were no CDD hits. The HHpred hits were not significant enough to use with the lowest three e-values of 1.1, 5.1, 5.2. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: Agree with the primary annotator. 
CDS 42680 - 42943 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="ObLaDi_61" /note=Original Glimmer call @bp 42680 has strength 11.36; Genemark calls start at 42680 /note=SSC: 42680-42943 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_62 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.29143E-54 GAP: 509 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.077, -4.856184980221285, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_62 [Gordonia phage Cafasso]],,QXN74277,100.0,1.29143E-54 SIF-HHPRED: SIF-Syn: NKF protein, upstream gene is NKF protein, downstream is NKF protein, just like in phage Cafasso. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both Genemark and Glimmer. /note=Both agreed on the same start site: 42680. Site # 61. /note=ATG start codon. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. /note=The chosen start site covers all this coding potential. /note=SD (Final) Score: -4.856, with a Z-score of 2.077. /note=Gap/overlap: 509 bp was found. Gap is relatively big, and is not below the accepted 50bp. Unknown if there is gene in Gap. /note=Phamerator: The pham number was 77992 as of 10/22/21. The gene is conserved in Cafasso, which is also part of the DZ cluster. /note=Starterator: Start site 61 was the most annotated start number that was called for 4/5 non-draft phages. This corresponds to the 42680 start site. /note=Location call: From the evidence this gene is real and its most likely start site is 42680. /note=Function call: NKF, From the evidence of hits from databases, PhagesDB and NCBI have suggested unknown function of gene. HHpred and CDD both did not have informative hits for this gene. NCBI BLAST did have some better hits that gave NKF with e-values of 1e-54 to 4e-19 and high alignment identity with phage Cafasso. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with the location call of 42680 and the function call of NKF. Make sure to fill out the synteny box below and fix the location call note above so that it has the same start site as the rest of your annotation. CDS 42940 - 43251 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="ObLaDi_62" /note=Original Glimmer call @bp 42940 has strength 12.46; Genemark calls start at 42940 /note=SSC: 42940-43251 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_63 [Gordonia phage Cafasso]],,NCBI, q1:s1 99.0291% 3.40838E-59 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.966271270034708, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_63 [Gordonia phage Cafasso]],,QXN74278,96.0784,3.40838E-59 SIF-HHPRED: SIF-Syn: NKF, gene upstream is NKF just like in Cafasso and gene downstream is NKF just like in Cafasso. /note=Primary Annotator Name: Montoya Serpas, Cinthya /note=Auto-annotation: The start site is 42940 per Glimmer and GeneMark. /note=Coding Potential: There is good coding potential that spans the start site and the stop site. /note=SD (Final) Score: -4.966. This is the best SD score when compared to other scores because it is the second least negative which results in a reasonably long ORF. The best final score would result in a 293 bp gap which is unlikely to be correct. /note=Gap/overlap: 4 bp overlap with gene upstream which indicates that this gene is part of an operon. This is very strong evidence for this gene being a real gene. The gene length for this gene is 264 bp which is about 100 bp less than the two other genes in Pham 33179. /note=Phamerator: This gene is found in pham #33217 as of 10/26/21. 
The pham in which my gene is conserved is also present in phages Cafasso and Aleemily which were also used for comparison. As of 10/26/21 the function of this gene is unknown. /note=Starterator: Start site #2 at 42940 is a reasonable start site among the members of pham #33217 since it is conserved among all members of this phamily. It was manually annotated as the start site of the corresponding Cafasso and Aleemily’s genes. There are 3 members in this phamily and 3/3 members call site #2 at 42940. /note=Location call: The evidence gathered so far suggests that the best start site for this gene is #2 at 42940 because it is the most manually annotated start site and it is conserved among the other members of pham #33217. This is in fact a real gene as it has good coding potential that spans the putative ORF for this gene. /note=Function call: NKF: Based on the BLAST results for this sequence, there is not enough evidence to form a hypothesis regarding the function of gene # 62. There are no phages other than Cafasso that contain reasonably high identity values. According to PhagesDB, the identity percentage for Cafasso is 100%, the e-value is 4e-51, and the score is 198 bits. Similarly, NCBI reports a lower identity percentage of 90%, a lower e-value of 4e-59, and a slightly lower score of 186. The second highest hit corresponds to phage Blino which has much worse values when compared to Cafasso. PhagesDB reports a score of 32 bits, an e-value of 0.44, and an identity value of 27% which tells us that ObLaDi_62 contains a very different sequence when compared to other phages of different clusters. Although there aren’t any hits that have functions and have reasonable e-values, Cafasso_63 is a very good match to ObLaDi_62 which indicates that this is in fact a real gene. Additionally, there are no conserved domains listed in the CDD. There are 34 hits present in the HHpred database; however, none of these hits are significant. 
For example, for hit PF11860.10, a muramidase protein, the probability value is very low at 20.72, the e-value is extremely large at 40, and the identity value is 24.15 which is not sufficiently large. The next hit corresponds to protein d1iga2 which corresponds to the Alpha and beta protein and the dimeric alpha and beta barrel superfamily. The probability value for this protein is much higher at 57.1% but the e-value is slightly higher at 48 which is much higher than the required threshold of 10e-3. /note=Transmembrane domains: There are no predicted transmembrane domains for this gene according to TMHMM or TOPCONS. Therefore, this is not a transmembrane domain protein. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: I agree with this call! Everything looks good! You could add Cafasso as evidence in the section NCBI BLAST. CDS 43248 - 43619 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="ObLaDi_63" /note=Original Glimmer call @bp 43248 has strength 12.0; Genemark calls start at 43248 /note=SSC: 43248-43619 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PBI_YUNGJAMAL_100 [Mycobacterium phage YungJamal] ],,NCBI, q6:s17 85.3659% 5.59383E-12 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.966271270034708, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_YUNGJAMAL_100 [Mycobacterium phage YungJamal] ],,AII28339,42.2535,5.59383E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Both Genemark and Glimmer were analyzed for auto-annotation start sites and both softwares agree that the gene start site is at 43248. 
/note=Coding Potential: The gene does have reasonable coding potential predicted within the putative ORF and the chosen start site does cover coding potential /note=SD (Final) Score: -4.966; A couple of other suggested start sites had slightly more ideal Final Scores but with start site @ 43248, we get the longest ORF and the gene may be part of an operon. /note=Gap/overlap: 4bp overlap (operon); This overlap is ideal and suggests gene is part of an operon. With this overlap, the gene also has the longest ORF. Other start sites have gaps present and shorter ORFs. /note=Phamerator: This gene is in Pham 79655 as of 10/27/21. The 64 genes that are part of this pham do not share the same cluster as this gene. Additionally, no functions were listed for the other genes present in the pham. /note=Starterator: There was no conserved start site among the other genes present in my gene's pham. My gene was the only one to have a start site number 16, which is identified as the auto-annotated start site for the gene. My gene also lacked the most conserved start site which is start site 18 for genes within the pham. 41/64 genes call start site #18 which is not present in my gene. /note=Location call: This gene covers all coding potential but is not conserved in phamerator. Despite lack of conservation, this gene seems to be a part of an operon based on base pair overlap and consistency in auto-annotation start site called between GeneMark, Glimmer, and Phamerator suggesting that the start site for this gene is the auto-annotated start site @ 43248 bp. /note=Function call: Function unknown; All of the hits in PhagesDB Blast with an acceptable e-value and identity, that are not draft genes, do not have a function listed (labeled as “function unknown”). The second hit from NCBI had an identity of 39% and an e-value of 7e-12 which are okay values, and the function listed was gp98. All the other hits from NCBI were hypothetical proteins. I’m not sure what the exact function of the ORF is. 
The CDD database did not yield any hits. The top two hits from HHpred for this gene did have the same function which is the C-terminal domain of a ribosomal protein. I do not think this is sufficient evidence to determine the function of this ORF. /note=Transmembrane domains: No transmembrane domains. The absence of transmembrane domains does not impact the hypothesized function of this gene because this gene has no known function thus far. The TOPCONS database does have one transmembrane domain hit but this is not sufficient evidence to call the function of this protein a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with both the location call and the function call. Make sure to fill out your notes for the Transmembrane domains section above. CDS 43616 - 43852 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="ObLaDi_64" /note=Original Glimmer call @bp 43616 has strength 14.66; Genemark calls start at 43616 /note=SSC: 43616-43852 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_65 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.06113E-44 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.484, -3.75071250087134, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_65 [Gordonia phage Cafasso]],,QXN74280,93.5897,2.06113E-44 SIF-HHPRED: SIF-Syn: /note=Location: start 43,616 chosen by Glimmer/Genemark; best stats; supported by Starterator. /note=Function call: unknown. CDD does not call it and the probabilities/e-values on HHpred are too weak to support a function call. /note=Transmembrane domains: None /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: I agree with the location call. However, it is missing the drop-down menus. 
CDS 43849 - 44427 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="ObLaDi_65" /note=Original Glimmer call @bp 43849 has strength 17.22; Genemark calls start at 43849 /note=SSC: 43849-44427 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_66 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.13267E-97 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.2, -4.252258253355557, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_66 [Gordonia phage Cafasso]],,QXN74281,92.6316,3.13267E-97 SIF-HHPRED: SIF-Syn: Upstream gene (stop 43852; pham 33553) and downstream gene (stop 44848; pham 74739) have no known function (NKF). /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark both call the start site at 43849. /note=Coding Potential: The ORF for this gene has a reasonable amount of coding potential across the ORF in both Host and Self GeneMark, self containing both alternate and typical coding potential. The chosen start site does not include all of the coding potential. /note=SD (Final) Score: -4.252. This is the best RBS final score out of the potential start sites on PECAAN. /note=Gap/overlap: -4. Overlap is possibly indicative of an operon. Not too large to consider gene addition. /note=Phamerator: pham: 34370. Ran on 10/27/2021. The gene is found in phage Cafasso (DZ) and singleton phage VanLee. /note=Starterator: start site 4 at position 43849 is the most conserved start site; it is annotated 2/2 of the non-draft genes in the pham. /note=Location call: Based on information collected, this is a “real” gene. The chosen start site is at 43849. /note=Function call: NKF - Running Phages DB BLASTp yielded significant hits (e-value < 10^-22) with acceptable identity and query coverage percentages (>59% and >46%, respectively), however none of these hits had known functions. 
Similarly, all hits, whether significant or not, from NCBI BLASTp were labeled as hypothetical proteins - meaning there is no known function. HHpred too failed to return any significant hits, with the smallest e-value belonging to 5VLA_Z at a value of 11. Furthermore, running the protein sequence of the gene in CDD did not return any hits at all. All of this data suggests no known function (NKF). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains. Gene is not a membrane protein. /note=Secondary Annotator Name: Teoh, Bryan /note=Secondary Annotator QC: -4bp Gap (Operon), large reading frame, and good coding potential provide the necessary rationale to conclude the gene call is accurate. CDS 44420 - 44848 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="ObLaDi_66" /note=Original Glimmer call @bp 44420 has strength 10.66; Genemark calls start at 44420 /note=SSC: 44420-44848 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein M176_gp045 [Rhodococcus phage E3] ],,NCBI, q39:s5 40.8451% 2.99279E-16 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.601, -5.8313867178037215, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M176_gp045 [Rhodococcus phage E3] ],,YP_008061083,34.0909,2.99279E-16 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 44420, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 44420 covers all of the coding potential. /note=SD (Final) Score: The SD score is not the best (only one potential start site has a better SD score), but it is still reasonable to suggest the presence of a credible ribosome binding site. /note=Gap/Overlap: There is an 8 base pair overlap with the upstream gene, which may be cause for concern. This start site creates the longest ORF and the length of the gene is acceptable. 
/note=Phamerator: As of 10/24/2021, the gene is found in Pham 74739. The pham is not conserved in other members of the cluster - comparison was done between ObLaDi and Cafasso, as this was the only non-draft genome available. The function is not called by either Phamerator or PhagesDB. /note=Starterator: Starterator was uninformative for this gene because it is an orpham. /note=Location Call: The gathered evidence suggests that this is a real gene - even though it is an orpham, it has good coding potential, an acceptable length, strong PhagesDB BLAST hits, and a reasonable Final Score and Z-Score. Its start site is likely at position 44420. /note=Function Call: The top 2 NCBI BLASTp hits suggested function is hypothetical protein, with mediocre query coverage (30-40%), decent % identity (67% and 80%), and decent e-values (<2e-15). The top 2 PhagesDB BLASTp hits suggested function is unknown, with decent % identity (67% and 71%) and decent e-values (<1e-20). Similarly, the CDD and HHpred hits were uninformative, with very high e-values and low probabilities and coverages. As such, there does not seem to be enough evidence to call the function of this gene; this makes sense since the gene is an orpham. /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Ghannam, Maisam /note=Secondary Annotator QC: Good job! Mentioning the ribosome binding site potentiality was great. 
CDS 44845 - 45333 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="ObLaDi_67" /note=Original Glimmer call @bp 44830 has strength 13.3; Genemark calls start at 44845 /note=SSC: 44845-45333 CP: yes SCS: both-gm ST: NA BLAST-Start: [hypothetical protein SEA_CAFASSO_68 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.12633E-88 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.889, -3.1896562502401133, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_68 [Gordonia phage Cafasso]],,QXN74283,90.1235,3.12633E-88 SIF-HHPRED: SIF-Syn: Function is NKF, upstream gene (pham 34370) is also NKF, and the gene downstream (pham 33421) is DNA binding protein just like Cafasso /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and Genemark at two different start sites. Glimmer calls the start at 44830 with an ATG start codon and Genemark calls the start at 44845 with a GTG start codon. Because the start codons called by Glimmer and Genemark are equally common, we cannot determine based on this information alone if one is more likely than the other. /note=Coding Potential: Both of the sites have reasonable coding potential. Both of the start sites cover all of the coding potential and therefore one cannot be ruled out over the other in this case. /note=SD (Final) Score: The SD score for 44830 is much lower than the score for 44845, with 44830 at -7.152 and 44845 having the best SD score at -3.190. Additionally, the Z-score for the 44830 start site is unreasonable because it is far below 2, at 0.825. The Z-score for the Genemark start site is much more reasonable at 2.889. /note=Gap/overlap: The 44830 start site has an unreasonable overlap with a gap of -19 but has a reasonable 504 bp length. The 44845 start site has a reasonable length of 489 and reasonable overlap that could indicate the presence of an operon with a -4 gap. 
/note=Phamerator: The Pham number as of 10/28/21 was 71514. The gene is conserved in Cafasso which is in the same cluster as ObLaDi. was also called by Cafasso. /note=Starterator: The start number is 3 for ObLaDi, which corresponds to the 44830 start site. This evidence does not correspond to Genemark but does correspond to Glimmer. /note=Location call: Based on the data provided it seems that 44845 is most likely the correct start site because of its sufficient overlap, its coding potential and its SD and final score. /note=Function call: Based on the evidence shown, I do not believe a function can be called. In NCBI, there is only one hit (Cafasso) which has an unknown function (100% coverage, 83% identity, E-value of 3.13E-88). Similarly in PhagesDB Cafasso is also called with no function and no other hit listed has a strong e-value or score (Cafasso: E-value 1E-76). HHpred had no useful hits and CDD contained one hit with a very weak e-value and therefore was not very informative. Therefore, we cannot call a gene function in this case. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, this is likely not a membrane protein. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: Looks good, agree with this call! Second QC: Looks good, don't forget to fill out Synteny box + comment on CDD and HHpred in the function call. CDS 45330 - 45518 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="ObLaDi_68" /note=Original Glimmer call @bp 45330 has strength 7.55 /note=SSC: 45330-45518 CP: no SCS: glimmer ST: NA BLAST-Start: [hypothetical protein [Streptomyces tuirus]],,NCBI, q4:s1 85.4839% 3.22992E-4 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.222, -2.2186743274732437, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces tuirus]],,MBR8638664,23.6111,3.22992E-4 SIF-HHPRED: SIF-Syn: NKF. 
Cafasso gene 68 and gene 69 show synteny with ObLaDi gene 67 and 69, which are upstream and downstream of this gene, respectively. Cafasso gene 68 has no function noted and is a part of pham 71541, which matches with ObLaDi gene 67 (NKF). Both gene 69 in Cafasso and ObLaDi have the function of DNA binding protein. /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer only. Start site at 45330. Start codon: GTG. /note=Coding Potential: A typical and an alternate coding potential is observed for a forward direction frame in GeneMark Self, and an alternate coding potential is observed for a reverse direction frame in GeneMark Self as well. Additionally, a typical coding potential is found on a forward direction frame in GeneMark Host. /note=SD (Final) Score: -2.219 (best final score on PECAAN) /note=Gap/overlap: Overlap = 4 bp; this gene is potentially a part of an operon. /note=Phamerator: pham 74545. Date: 10/26/2021. This gene is the only member in this pham. No function predicted by Phamerator. /note=Starterator: orpham. /note=Location call: This gene is possibly a real gene that starts at 45330. /note=Function call: NKF. NCBI BLASTp, PhagesDB BLASTp, CDD, and HHpred all either provided no hits or yielded uninformative information. However, some poor PhagesDB BLAST hits suggest HNH endonuclease, which would make sense for this apparent insertion of a gene (compared to Cafasso). /note=Transmembrane domains: No TMDs detected /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: Great job! I agree with your calls! 
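The gap and length figures quoted in the gene 67 notes above (gaps of -19 and -4, lengths of 504 bp and 489 bp) follow from the CDS coordinates. The short Python sketch below reproduces that arithmetic. It assumes forward-strand genes with inclusive, 1-based coordinates, and the function names are hypothetical helpers introduced only for illustration; they are not part of PECAAN or DNA Master.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand gene with inclusive 1-based coordinates."""
    return stop - start + 1


def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Number of bases between the upstream stop and this start.

    A negative value means the two genes overlap by that many bases.
    """
    return start - upstream_stop - 1


# Coordinates copied from the gene 66 and gene 67 records above.
upstream_stop_66 = 44848
stop_67 = 45333
candidate_starts_67 = {"Glimmer": 44830, "GeneMark": 44845}

for caller, start in candidate_starts_67.items():
    gap = gap_to_upstream(start, upstream_stop_66)
    length = gene_length(start, stop_67)
    print(f"{caller}: start {start}, gap {gap} bp, length {length} bp")
```

The same two helpers give the -4 bp gap recorded for gene 68 (start 45330, upstream stop 45333).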
CDS 45515 - 45709 /gene="69" /product="gp69" /function="DNA binding protein" /locus tag="ObLaDi_69" /note=Original Glimmer call @bp 45515 has strength 18.23; Genemark calls start at 45515 /note=SSC: 45515-45709 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 8.40092E-23 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.984, -2.707694805492614, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74284,78.125,8.40092E-23 SIF-HHPRED: SIF-Syn: DNA binding protein, upstream gene is of pham 33371, just like in Cafasso. /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 45515. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both Genemark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -2.708. This is the best final score on PECAAN. /note=Gap/overlap: -4 bp. There is overlap between this gene and the upstream gene; however, a 4 bp overlap is typical of a gene found in an operon. The 4 bp overlap is GTGA, which signifies the start codon of this gene overlapping with the stop codon of the upstream gene. /note=Phamerator: Pham 33421. Date 10/22/21. It is conserved and found in Cafasso (DZ). /note=Starterator: Start site 10 was manually annotated in 1/1 non-draft phages in this pham, and there are only 3 members of this pham. Start site 10 is 45515 in ObLaDi. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with the most likely start site at 45515. /note=Function call: DNA Binding Protein. The top non-draft PhagesDB BLAST hit has the function of DNA binding protein (68% identity, E-value 3e-19), and the top NCBI BLAST hit also has the function of DNA binding protein (100% coverage, 68.75% identity, and E-value 9e-23). 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Ma, Yiwen /note=Secondary Annotator QC: Done, good notes! I agree with your call! But what about the downstream gene? CDS 45706 - 46581 /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="ObLaDi_70" /note=Original Glimmer call @bp 45706 has strength 11.69; Genemark calls start at 45706 /note=SSC: 45706-46581 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_70 [Gordonia phage Cafasso]],,NCBI, q25:s30 91.4089% 3.57518E-122 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.981, -4.700704706091107, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_70 [Gordonia phage Cafasso]],,QXN74285,76.2411,3.57518E-122 SIF-HHPRED: SIF-Syn: No known function, upstream gene is pham 33421, downstream gene is pham 4753, just like in phage Cafasso, with the same functions for these genes. Upstream is the DNA binding protein and downstream is the DNA polymerase III subunit. /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Glimmer and GeneMark both called 45706 as the start site /note=Coding Potential: The ORF has coding potential, which is seen in both GeneMark Host and Self. The 45706 start site covers all the coding potential for the gene. /note=SD (Final) Score: Neither the final score of -4.701 nor the z-score of 1.981 is the best option, but they are good enough. /note=Gap/overlap: The overlap with the upstream gene is reasonable at 4 bp (gap of -4), which may indicate an operon /note=Phamerator: As of 10/21/2021, the gene belongs to pham 33371. The pham is conserved in another phage, Cafasso, which belongs to the same cluster, DZ. Cafasso was used for comparison since it is the only non-draft phage that is in pham 33371. No known function for this pham. 
/note=Starterator: The start site that is conserved among the members of the pham to which the gene belongs is 45706. The start number for the phage is 3, at 45706 bp. There are 2 other members in this pham, making a total of 3 members overall: two draft phages and one non-draft phage. All three phages call the most conserved start site, 3. /note=Location call: The overall evidence suggests that the gene is a real gene and that the auto-annotated start site, 45706, is the best start site for this gene. With respect to the previous gene, the start site has only a 4 bp overlap, which is reasonable, so there is no room for another gene in between and no need for a different start site. The gene is seen to be conserved in Phamerator and has good coding potential, which means it is a real gene. The start site of 45706 contains all the coding potential for the gene and is seen to be conserved in Starterator. /note=Function call: No informative data was provided by CDD, HHpred, PhagesDB BLASTp and NCBI; therefore, no known function. The only NCBI BLASTp hit suggests that the function is a hypothetical protein, with high query coverage (89%), good percent identity (72%) and low e-value (4e-122). The top PhagesDB BLASTp hit suggested function is unknown with good percent identity (72%) and low e-value (2e-98). The other hits in PhagesDB are on draft genes or have high e-values, therefore there does not appear to be enough evidence to call a function for this gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, therefore it is not a membrane protein /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with this annotation and location call. All of the evidence categories have been considered. Note: the starterator and dropdown menus are not filled out. I also agree with your function call. 
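Several records above treat a 4 bp overlap as a sign of an operon, and the gene 69 note reads its overlapping bases, GTGA, as the start codon of the downstream gene sharing bases with the stop codon of the upstream gene. The Python sketch below checks a 4-base overlap for that pattern. The codon sets are the standard bacterial start and stop codons, and the function name is a hypothetical helper introduced only for illustration.

```python
# Standard bacterial start and stop codons (assumed here; not stated in the notes).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_start_stop_overlap(four_mer: str) -> bool:
    """Return True if a 4-base overlap has the start/stop codon sharing pattern.

    For a gap of -4, bases 1-3 of the overlap are the downstream start codon
    and bases 2-4 are the upstream stop codon.
    """
    seq = four_mer.upper()
    if len(seq) != 4:
        return False
    return seq[:3] in START_CODONS and seq[1:] in STOP_CODONS


# GTGA is the overlap reported in the gene 69 Gap/overlap note.
print(is_start_stop_overlap("GTGA"))
```

A check like this only describes the sequence pattern; whether the genes are actually co-transcribed is a separate question that the notes leave as "may indicate an operon".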
CDS 46578 - 47111 /gene="71" /product="gp71" /function="DnaQ-like (DNA polymerase III subunit)" /locus tag="ObLaDi_71" /note=Original Glimmer call @bp 46578 has strength 14.93; Genemark calls start at 46578 /note=SSC: 46578-47111 CP: yes SCS: both ST: NI BLAST-Start: [DnaQ-like DNA polymerase III subunit [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.26775E-126 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.23, -4.331668843916977, yes F: DnaQ-like (DNA polymerase III subunit) SIF-BLAST: ,,[DnaQ-like DNA polymerase III subunit [Gordonia phage Cafasso]],,QXN74286,100.0,7.26775E-126 SIF-HHPRED: DNA polymerase III subunit epsilon; DNA editing Proofreading Exonuclease Polymerase, DNA Binding protein; 6.7A {Escherichia coli K12},,,5M1S_D,96.6102,99.5 SIF-Syn: DnaQ-like (DNA polymerase III subunit), upstream gene is NKF and in Pham 33371, downstream is ThyX-like thymidylate synthase, just like in phage Cafasso. /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used and both agreed on the same start site. The called start site is 46578. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The SD score of -4.332 is the best option and the z-score is the highest at 2.23. /note=Gap/overlap: The 4 base pair overlap with the upstream gene is reasonable. The length of the gene (534 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of October 20, 2021, the gene is found in pham 4753. The gene is conserved in Phage Cafasso which belongs to the same cluster (DZ) as Phage ObLaDi. The phage used for comparison was Phage Cafasso. The function call for this gene is a DnaQ-like (DNA polymerase III subunit). The function call is consistent between Phamerator and the phams database. It is approved on the SEA-PHAGES function list. 
/note=Starterator: The start site choice that is conserved among the members of the pham in which this gene belongs is start site 46578 which is start number 43. There are 93 non-draft members and 13 draft members in this Pham and 37/93 non-draft members call start site 44; however, the start site that made most sense for this gene is 43 which is called by 3/93 non-draft members. /note=Location call: The gathered evidence strongly suggests that the original start site call at 46578 by Glimmer and Genemark is reasonable and it is the potential start site candidate that seems most likely. In addition, it also suggests that the gene is a real gene. /note=Function call: PhagesDB BLAST and NCBI BLASTp have hits that suggest the function of this gene is either a DnaE-like DNA polymerase III alpha or a DnaQ-like DNA polymerase III subunit with very small E-values. PhagesDB BLAST gave hits with E-values of e-103 and e-71, while NCBI BLASTp gave E-values of e-126 and e-84. The top 2 NCBI BLASTp and PhagesDB BLAST hits sorted by E-values show high identity values (>74%) and over 99% query coverage. HHpred and CDD had a hit for RNase AS protein with 99.7% probability, 98.3% coverage, and E-value of 2.2e-16. The function of RNase AS is stated as a 3'-5' exoribonuclease. The SEA-PHAGES approved function list states that DnaQ is the exonuclease of Pol III (epsilon subunit). Therefore, it is concluded that the function is likely to be a DnaQ-like (DNA polymerase III subunit). /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: I agree with everything; scores look significant and I came up with the same conclusions. Great job! 
CDS 47184 - 47888 /gene="72" /product="gp72" /function="ThyX-like thymidylate synthase" /locus tag="ObLaDi_72" /note=Original Glimmer call @bp 47184 has strength 17.9; Genemark calls start at 47184 /note=SSC: 47184-47888 CP: yes SCS: both ST: NI BLAST-Start: [ThyX-like thymidylate synthase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.89081E-170 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.796, -5.432024807810012, no F: ThyX-like thymidylate synthase SIF-BLAST: ,,[ThyX-like thymidylate synthase [Gordonia phage Cafasso]],,QXN74287,99.1453,2.89081E-170 SIF-HHPRED: Thymidylate synthase thyX; ThyX, FAD, FdUMP, Flavoprotein, Methyltransferase, Nucleotide biosynthesis, Transferase, Structural Genomics, Seattle Structural Genomics Center for Infectious; HET: FAD, UFP; 1.9A {Mycobacterium tuberculosis},,,3GWC_E,99.1453,100.0 SIF-Syn: ThyX-like thymidylate synthase protein, upstream gene is pham 4753, downstream gene is pham 32967, just like in phage Cafasso /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer and GeneMark, start site 47184, start codon ATG /note=Coding Potential: Coding potential found in 3rd frame, Forward gene, coding potential found in both GeneMark Self and Host, start site encompasses full coding potential of gene /note=SD (Final) Score: -5.432, it is the best RBS final score /note=Gap/overlap: 72 bp, it is the shortest gap out of all the potential start sites, no coding potential in the gap, has synteny with Cafasso, seems reasonable /note=Phamerator: Pham 57278 as of 10/22/2021, this pham is conserved in Cafasso, Phamerator had function of this gene in other phages as: ThyX-like thymidylate synthase /note=Starterator: Start site number 38 called the most often, 294 out of 548 non-draft phage annotations called it. ObLaDi does not have this start site, instead the likely start is site 51 and it corresponds to base pair 47184. Start site 51 only found in 4 out of 579 genes in pham, but called 100% of time when present. 
/note=Location call: Likely a real gene as evidenced by the information above, start site is 47184 /note=Function call: The top 7 hits on HHpred based on e value (e-32 or lower) all suggested thymidylate synthase protein function (most said Thyx, one said Thy1), with coverage in the high 90s, probability 100%. The CDD top 5 hits were Thymidylate synthase hits with low e values, best one had 32% identity, 96% coverage, e value of 0. /note=Transmembrane domains: TMHMM showed no TMDs, neither did TOPCONS. This agrees with the function call of Thy-X since the enzyme is not a transmembrane protein. /note=Secondary Annotator Name: Fleming, Hanna /note=Secondary Annotator QC: I agree with this location call and have reviewed all of the evidence categories. I also agree with your function call. Check Starterator again - seems like you chose the suggested start site. CDS 47900 - 48262 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="ObLaDi_73" /note=Original Glimmer call @bp 47900 has strength 14.81; Genemark calls start at 47900 /note=SSC: 47900-48262 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_73 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.30147E-79 GAP: 11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.175, -2.3158002311349737, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_73 [Gordonia phage Cafasso]],,QXN74288,99.1667,3.30147E-79 SIF-HHPRED: SIF-Syn: NKF, upstream gene is ThyX-like thymidylate synthase, downstream gene is also NKF, just like phage Cafasso /note=Primary Annotator Name: Lee, Adrienne /note=Auto-annotation: GeneMark and Glimmer call the start site at 47900. /note=Coding Potential: There is high coding potential according to both the Self GeneMark and Host GeneMark and they were all forward genes. /note=SD (Final) Score: The final score is -2.316, which was the least negative score compared to the other gene candidates. 
/note=Gap/overlap: There is an 11 base pair gap which is very small so this is reasonable. This is also the smallest gap compared to the other gene candidates. /note=Phamerator: This gene is part of Pham 32967 as of October 21, 2021. This pham only has 3 phages in total and the other two are Aleemily_Draft_72 and Cafasso_73. /note=Starterator: The most conserved start site is 2 at 47900 which is also the auto-annotated start site for ObLaDi. This start site is the same for the other 2 phages in the pham. /note=Location call: Based on the evidence, this is a real gene and the start site is 47900, which was called by both GeneMark and Glimmer. This start site was also determined by Starterator and conserved in other phages in the pham. /note=Function call: NKF: There were multiple PhagesDB BLAST hits with low e-values (4e-63) but they all have function unknown. In NCBI BLASTp, all the strong hits with low e-values and high identity have “hypothetical protein” as the description. There were no strong hits from CDD and HHpred; they all had very poor e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Montoya Serpas, Cinthya /note=Secondary Annotator QC: I agree with this annotation. 
All of the evidence categories have been considered CDS 48259 - 48615 /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="ObLaDi_74" /note=Original Glimmer call @bp 48259 has strength 10.18; Genemark calls start at 48259 /note=SSC: 48259-48615 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_74 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.15031E-78 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.362, -4.000014702296222, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_74 [Gordonia phage Cafasso]],,QXN74289,99.1379,3.15031E-78 SIF-HHPRED: SIF-Syn: NKF (pham 33523), upstream gene is NKF (pham 32967), downstream is NKF (pham 79772), just like in Cafasso /note=Primary Annotator Name: Janelle Tricia Magaling /note=Auto-annotation: Glimmer and Genemark. Both agree on the start site 48259. /note=Coding Potential: Coding potential is in the first forward ORF. There is also typical and atypical coding potential shown in GeneMark Self and Host. The atypical coding potential tapers off at 48500. /note=SD (Final) Score: -4.000. Not the best, but irrelevant because it’s likely an operon. /note=Gap/overlap: 4 bp overlap. Maybe an operon. /note=Phamerator: pham: 33523. Date: 10/25/2021. It is conserved; found in Cafasso (DZ) /note=Starterator: The most conserved start site 3 was found in 1/1 non-draft genes in the pham. Start 3 is at 48259 in ObLaDi. This is supported by GeneMark and Glimmer. /note=Location call: Based on the evidence, this is a real gene with good coding potential and is conserved in Phamerator with the start site at 48259 which is supported by Glimmer and Genemark. /note=Function call: unknown function. phagesDB blast shows that top 5 hits have unknown functions and the frequency table had one function with a high e value. 
NCBI BLAST showed the top two phages with unknown function and the third with putative regulator, but it had an e-value of e-9 and had no other phages to support the function call. There were also no hits for CDD and HHpred. /note=Transmembrane domains: TMHMM has no hits, so TOPCONS cannot be used /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC'd this location call and agree with the first annotator. CDS 48599 - 48763 /gene="75" /product="gp75" /function="hypothetical protein" /locus tag="ObLaDi_75" /note=Original Glimmer call @bp 48599 has strength 19.14; Genemark calls start at 48599 /note=SSC: 48599-48763 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_75 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.08598E-29 GAP: -17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.005, -2.6814417326944517, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_75 [Gordonia phage Cafasso]],,QXN74290,98.1481,2.08598E-29 SIF-HHPRED: SIF-Syn: NKF and gene is in Pham 79772, with upstream gene in Pham 33523 and downstream gene in Pham 18992, like Cafasso 75. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 48599. /note=Coding Potential: Coding Potential for this gene is mainly found on the forward strand, indicating that this is a forward gene. Coding Potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.681. It is the best final score on PECAAN. /note=Gap/overlap: Overlap of 17 bp. Most reasonable overlap given out of the potential start sites. /note=Phamerator: Pham 79772 (as of 10/25/21). It is found in the only non-draft genome (Cafasso) in cluster DZ /note=Starterator: Start Site 2 in Starterator was manually annotated for Cafasso. Start 2 is 48599 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. 
However, the nucleotide sequence and possible start sites are the same for the 3 members of this pham, so the Starterator data is not very informative. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 48599. /note=Function call: NKF. There were no hits in CDD and none of the hits in HHpred made sense (low probability, high E-values, lots of Domains of Unknown Function). Cafasso was checked as evidence due to low e-value, despite not having a known function. /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Wang, Jennifer Yiyang /note=Secondary Annotator QC: Looks great! Agree on start site 48599. CDS 48760 - 49281 /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="ObLaDi_76" /note=Original Glimmer call @bp 48760 has strength 16.15; Genemark calls start at 48772 /note=SSC: 48760-49281 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_76 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.73172E-125 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.433, -4.824049307458754, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_76 [Gordonia phage Cafasso]],,QXN74291,100.0,5.73172E-125 SIF-HHPRED: SIF-Syn: ObLaDi gene 76 shows synteny with Cafasso 76 as well as Mollymur 9, neither of which have a known function (NKF). Many of the genes upstream and downstream of this gene also have NKF, which may further confirm the absence of function in this gene. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and Genemark, Glimmer starts at 48760 and GeneMark starts at 48772. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.824. It is the best final score on PECAAN. /note=Gap/overlap: -4. 
Very small overlap, meaning there is no room for another gene in between the previous gene and this one. /note=Phamerator: pham: 18992; Date: 10/24/21; It is conserved, it is found in Aleemily, Cafasso, ObLaDi_76 /note=Starterator: The start site number that was called most often in the published annotations is 3. In total, it was called in 1 out of the 2 non-draft genes. Start 3 is 48760 in ObLaDi. This evidence agrees with the site predicted by Glimmer. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 48760. /note=Function call: NKF; Based on the BLAST, CDD, and HHpred information I’ve collected, there seems to be no known function for this gene. It does not align with very many known sequences or share synteny with other genes. It also does not have very many hits throughout the databases, leading me to believe that the function may not be known yet. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I agree with this annotation and all evidence has been considered. CDS 49278 - 49469 /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="ObLaDi_77" /note=Original Glimmer call @bp 49278 has strength 9.23; Genemark calls start at 49278 /note=SSC: 49278-49469 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.55797E-32 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.905, -5.2088931604536635, no F: hypothetical protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74292,96.8254,3.55797E-32 SIF-HHPRED: SIF-Syn: DNA binding protein, upstream gene does not have a known function (Pham: 18992, as of 11/30/2021), downstream gene does not have a known function (Pham: 33111, as of 11/30/2021), just like in Cafasso. 
/note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: GeneMark and Glimmer both call the start site of the gene at 49278. /note=Coding Potential: There is high coding potential seen in both the Host-Trained GeneMark and the Self-Trained GeneMark and the start site covers this potential. /note=SD (Final) Score: The RBS score is -5.209 and the Z-score is 1.905. There are other start sites with higher scores, but they do not have a reasonable overlap with the previous gene. /note=Gap/overlap: The gene has an overlap of 4bp, which may indicate that this gene is part of an operon. This gap is observed in a gene in the phage Cafasso, which is in the same cluster. The length of the gene is acceptable. /note=Phamerator: The gene is found in pham 33634, as of October 26, 2021. The pham is conserved in another phage in the DZ cluster. Cafasso (DZ) was used for comparison. The Phamerator did not have a function called for this gene. /note=Starterator: There is a reasonable site choice that is conserved among members of the pham. Start site number 7 is conserved and it corresponds to the coordinate 49278. 1 out of 1 non-draft members in this pham call the most conserved start site. This supports the Glimmer and GeneMark data. /note=Location call: Based on the evidence, this gene is real and it has a start site of 49278. /note=Function call: The top hit in both the NCBI BLASTp and PhagesDB BLASTp lists the function as a DNA binding protein. This hit comes from a gene in Cafasso, and it has a coverage of 100%, identity of 96.8254%, and an e-value of 3.55797e-32. CDD and HHpred were not informative. No other evidence, so calling as NKF for now. /note=Transmembrane domains: Transmembrane domains are not predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: I would also note that the gap size may indicate that this gene is an operon. Other than that, looks good! Good job! 
/note=11/24/21: I would also state that the Starterator data supports the Glimmer and Genemark data. Providing the coverage, identity, and e-value numbers for the function call data would be helpful. Otherwise, everything is detailed fully and correctly! Good job! CDS 49549 - 49806 /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="ObLaDi_78" /note=Original Glimmer call @bp 49549 has strength 12.94; Genemark calls start at 49549 /note=SSC: 49549-49806 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_78 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.78064E-51 GAP: 79 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.781, -3.2026596621721617, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_78 [Gordonia phage Cafasso]],,QXN74293,98.8235,2.78064E-51 SIF-HHPRED: SIF-Syn: Upstream: Pham 33634, DNA binding protein. Downstream: Pham 33643, NKF. This matches Phage Cafasso of Cluster DZ. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation start source: Both Glimmer and GeneMark called this gene at the same site, 49549. /note=Coding Potential: The ORF was on the forward strand only. Coding potential was evident in both GeneMark Self and Host. /note=SD (Final) Score: -3.202, which is the best SD score in the mix. /note=Gap/overlap: 79 bp, which is reasonable. This gene has a length of 258 bp, which is reasonable. /note=Phamerator: The gene is found in pham 33111 as of 10/26/21. Pham present in phage Cafasso of Cluster DZ. No function called. /note=Starterator: Reasonable start site conserved among members of pham. Conserved is #12, which corresponds to bp 49,549. 3 other phages in this pham. 3/3 call this start site. However, Starterator is uninformative because there are too many draft phages. /note=Location call: I think the best start site for this gene is 49,549, because it has the best RBS and Z-score, and is also present in phage Cafasso. 
/note=Function call: NKF, This sequence did not match well with any other known sequences, and so I cannot assign it a function. /note=Transmembrane domains: None /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC`ed this location call and agree with the first annotator CDS 49839 - 50165 /gene="79" /product="gp79" /function="hypothetical protein" /locus tag="ObLaDi_79" /note=Original Glimmer call @bp 49839 has strength 13.59; Genemark calls start at 49839 /note=SSC: 49839-50165 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_79 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.56998E-46 GAP: 32 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.623, -4.2123002476362, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_79 [Gordonia phage Cafasso]],,QXN74294,85.4369,1.56998E-46 SIF-HHPRED: SIF-Syn: NKF, the gene is of pham 33643, upstream is a gene of pham 33111 and downstream is a gene of pham 22572, just like in phage Cafasso. /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: Glimmer and GeneMark agree on bp 49839 as the start site, starting with an ATG codon (Methionine). /note=Coding Potential: The gene has reasonable coding potential in the forward direction within the putative ORF (bp 49839 to 50165). The start site encompasses all of the coding potential, and there is coding potential on both GeneMark Self and Host. /note=SD (Final) Score: -4.212. This and the z-score (2.623) are the best of all possible start sites. /note=Gap/overlap: 32 bp gap. This gap isn’t large enough to be unreasonable. /note=Phamerator: As of 10/22/2021, the gene belonged to pham 33643. This pham is conserved within other members of the DZ cluster, containing both genes from Cafasso (non-draft) and Aleemily (draft). Cafasso was primarily used for comparison. There was no listed function as of this time. 
/note=Starterator: While there is only one non-draft gene (Cafasso) in the pham, its selected start site (1) is also selected within the other draft members in the pham. Start site 1 corresponds to 49839 in ObLaDi. There are 2 other members in this pham (1 non-draft) and both call start site #1. /note=Location call: Based on the evidence above, this is a real gene with the likely start site at 49,839. /note=Function call: No known function. Blastp hit featured 1 non-draft hit with a low e-value (3e-38) and high identity match (76%) and 100% coverage, though it was a gene with no known function. CDD and HHpred did not return informative hits. /note=Transmembrane domains: There are no predicted TMDs in TOPCONS or TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: I agree with the location and function call. All evidence categories have been filled out. CDS 50173 - 50493 /gene="80" /product="gp80" /function="hypothetical protein" /locus tag="ObLaDi_80" /note=Original Glimmer call @bp 50173 has strength 13.27; Genemark calls start at 50173 /note=SSC: 50173-50493 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_80 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.74234E-72 GAP: 7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.488, -5.7740198441752755, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_80 [Gordonia phage Cafasso]],,QXN74295,100.0,3.74234E-72 SIF-HHPRED: SIF-Syn: NKF(from Pham 22572), upstream gene is from pham 33643, downstream is from pham 55417, just like in phage Cafasso /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: both Glimmer and Genemark agree on the start site #50173, start codon ATG /note=Coding Potential: have reasonable coding potential, chosen start site cover all this coding potential /note=SD (Final) Score: -5.774, it is not the best final score but it is reasonable. 
Other options with better final scores have unreasonable overlap or huge gaps. /note=Gap/overlap: 7, it is most reasonable with longest ORF, and the length is also reasonable /note=Phamerator: pham number - 22572, date - 10/26/2021, the gene is conserved in other phages in DZ cluster, Cafasso is used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 10. It corresponds to 50173 in my phage. 3/6 called site 10. /note=Location call: real gene, start at #50173 /note=Function call: NKF, because no blast results reflect any known function, no CDD hits are found, and no HHpred hits are significant. /note=Transmembrane domains: No hit for transmembrane domains. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 50497 - 51111 /gene="81" /product="gp81" /function="hypothetical protein" /locus tag="ObLaDi_81" /note=Original Glimmer call @bp 50497 has strength 16.85; Genemark calls start at 50497 /note=SSC: 50497-51111 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_81 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.58997E-146 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.433, -3.773970443115436, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_81 [Gordonia phage Cafasso]],,QXN74296,99.5098,2.58997E-146 SIF-HHPRED: SIF-Syn: This gene is in Pham 55417. The upstream gene is in Pham 22572. The downstream gene is 8556. These both have NKF. These are all in the same Phams as the respective genes in Cafasso. The downstream gene in Cafasso is a DNA binding protein but it currently not annotated for ObLaDi. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and Genemark both call 50497 as the start site. The start codon is ATG (methionine). /note=Coding Potential: There is good coding potential within the putative ORF. However, the start codon does not cover all of this coding potential. 
/note=SD (Final) Score: The SD score was -3.774; this was the best score on PECAAN. /note=Gap/overlap: 3 bp gap. This is the most reasonable start site. The length of the gene is reasonable at 615 bp. /note=Phamerator: Pham 55417 on 10/24/21. This pham was also found in Cafasso. /note=Starterator: Start site 4 was found in 3 of the 48 genes in the pham. Start 4 is bp 50497 in ObLaDi. This agrees with Glimmer and Genemark. /note=Location call: This is most likely a real gene and the most reasonable start site is 50497. /note=Function call: The top hits on NCBI BLAST have an unknown function but a hit with e value of 6e-18 and with 48% (48/100) identity and 48.5% coverage suggests that this may be a DNA methylase. CDD and HHpred were uninformative with no specific hits. The lowest e-value in HHpred was 1.8. /note=Transmembrane domains: TMHMM did not predict any transmembrane domains and neither did TOPCONS, so this is not a membrane protein. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with the location call for the correct start site. I would look at the annotation lab manual and add more detail to your notes sections via the `Instructions for Each Section`. Also, the SD score you put is not technically the "best" SD score, but it is still reasonable because it is more negative than -2. 
CDS 51108 - 51530 /gene="82" /product="gp82" /function="DNA binding protein" /locus tag="ObLaDi_82" /note=Original Glimmer call @bp 51108 has strength 12.53; Genemark calls start at 51108 /note=SSC: 51108-51530 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.49995E-88 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.012, -5.544484265892628, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74297,85.8974,3.49995E-88 SIF-HHPRED: g.41.3.1 (A:) RBP9 subunit of RNA polymerase II {Thermococcus celer [TaxId: 2264]},,,d1qypa_,35.0,97.4 SIF-Syn: Portal, upstream gene is NKF, downstream is NKF, just like in Cafasso_82 as well as both genes being DNA binding proteins /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: Both Glimmer and Genemark call the start at 51108, with GTG as the start codon. /note=Coding Potential: Coding potential is found only in the forward strand, suggesting a forward gene. Coding potential is found in both GeneMark Host and Self. /note=SD (Final) Score: The Final Score is -5.544 and the z-score is 2.012, providing strong evidence for start site 51108 since z is above 2. /note=Gap/overlap: Gene overlap of 4 base pairs. Gap does not contain coding potential. /note=Phamerator: Pham 8556, conserved in Cafasso (DZ), in total 7 members with 2 drafts. /note=Starterator: Start 51108 for conserved start site 12. 4 non-draft members of the pham show the same start site. /note=Location call: The most likely start site is 51108, as the evidence suggests that it is a real gene and the start maintains all coding potential in both GeneMark Host and Self as well as Glimmer. /note=Function call: DNA binding protein because of its 84% identity with Cafasso_82 with e-value of 2e-75 /note=Transmembrane domains: Not a membrane protein because no TMDs were found. 
Through HHPRED we find d1qypa_ to be a protein matching our gene; its function is to form a subunit of RNA polymerase II. /note=Secondary Annotator Name: Wright, Nicklas /note=Secondary Annotator QC: I have QC'ed this location call and I agree with the start site of 51108. However, I disagree with everything else in the PECAAN notes. The primary annotator used the z-value and Final score from the wrong start site. Start site 51108 actually has a good z-value (2.012). Also, this is definitely not a reverse gene. There is no coding potential in the reverse direction and there is plenty of coding potential in the forward direction. CDS 51531 - 51707 /gene="83" /product="gp83" /function="hypothetical protein" /locus tag="ObLaDi_83" /note=Original Glimmer call @bp 51531 has strength 11.72; Genemark calls start at 51531 /note=SSC: 51531-51707 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_83 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.97511E-35 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.405, -4.483031786805435, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_83 [Gordonia phage Cafasso]],,QXN74298,100.0,4.97511E-35 SIF-HHPRED: SIF-Syn: NKF, upstream is Pham 8556, downstream is Pham 33541, just like in phage Cafasso. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 51531. /note=Coding Potential: There is high coding potential based on the third frame going in the forward direction within the gene range for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The Final Score is -4.483, and the Z-score is 2.405, both of which are the best among other start site options. /note=Gap/overlap: 0 bp gap which is reasonable because the coding potential is all covered and there is no need to add another gene. /note=Phamerator: Pham: 32757. Date Analyzed: 10/22/2021. The gene is conserved in cluster DZ when compared to phage Cafasso. 
/note=Starterator: There is only one non-draft member of this pham which calls start site 1. Start site 1 correlates to a start site of 51531 bp in ObLaDi. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 51531. /note=Function call: Inconclusive. There is only one phagesdb and NCBI BLAST hit with E-values < 3e-30 (100% identity, 100% coverage). There is no known function of this ORF because CDD showed nothing and HHpred does not have any significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, indicating that this gene does not encode a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with the location and function call. CDS 51704 - 52078 /gene="84" /product="gp84" /function="membrane protein" /locus tag="ObLaDi_84" /note=Original Glimmer call @bp 51704 has strength 7.33; Genemark calls start at 51785 /note=SSC: 51704-52078 CP: yes SCS: both-gl ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.92255E-82 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.984, -3.1725816037952645, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso]],,QXN74299,100.0,4.92255E-82 SIF-HHPRED: SIF-Syn: membrane protein; Upstream on ObLaDi is NKF and downstream is also a membrane protein. On Cafasso, the gene as well as the upstream and downstream genes all have unknown function. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 51704. GeneMark calls the start at 51785. The start codon is ATG. /note=Coding Potential: The coding potential in this ORF is only in the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -3.173. It is not the best Final Score, but compared to the better scores, it has the best Z-score. 
/note=Gap/overlap: There is an overlap of 4 base pairs. This is a small and normal overlap, which is evidence of an operon. /note=Phamerator: Pham 33441 on 10/23/2021. It is conserved in Cafasso (DZ). /note=Starterator: The conserved start site 1 is base pair 51704. Of the 4 members in this pham, both non-draft members call it. This start site agrees with the Glimmer start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 51704. /note=Function call: The function is a membrane protein. PhagesDB’s two top hits both had unknown functions. NCBI’s top hit predicted membrane protein function with an e-value of 5e-82 and 100% identity, while the second top hit predicted a hypothetical protein function with an e-value of 9e-30 and 76% identity. Both CDD and HHpred hits were uninformative. It has enough transmembrane domains called in TMHMM to classify this gene product as a membrane protein. /note=Transmembrane domains: 2 transmembrane domains were called in TMHMM. TOPCONS has not called any transmembrane domains. Because of the TMHMM calls, this is a membrane protein. /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. Note: please edit auto-annotation start sites to the correct ones. I would also elaborate on the overlap and how it could be due to an operon! 
CDS 52071 - 52418 /gene="85" /product="gp85" /function="membrane protein" /locus tag="ObLaDi_85" /note=Original Glimmer call @bp 52071 has strength 14.17; Genemark calls start at 52071 /note=SSC: 52071-52418 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.1187E-78 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.59, -3.513874779476501, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso]],,QXN74300,100.0,1.1187E-78 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Both Glimmer and Genemark agree it starts @52071 F. The starting codon is ATG. /note=Coding Potential: It has significant coding potential in its ORF. The ORF covers almost all of the coding potential; a slight stretch at the end, where coding potential drops, falls outside the ORF but might be insignificant. /note=SD (Final) Score: The SD score is significant. This score appears to be still relevant because the Z score is also the best and the positioning on the Pham map backs this up too. /note=Gap/overlap: There is an overlap of 8 bp. It is only a few bp, so not too much. /note=Phamerator: In Pham 8075; it was run on 10/29/21. I compared it to Cafasso because it was the only non-draft member of its cluster (DZ). It seemed to be conserved for the area surrounding my gene. The function is a little vague; it says secreted protein, and it's next to a DNA binding protein. Perhaps a generic protein. /note=Starterator: For the DZ cluster, the conserved start site works for the ObLaDi draft gene and the non-draft Cafasso gene. It is unrelated to the starts that are most common across all of the phages, because Starterator calls a most common start of 38 and this does not exist in our cluster. Our chosen start was (Start: 29 @52071 has 1 MA's). It has been chosen before and is most relevant. So I think Starterator was helpful to a degree. 
/note=Location call: All the gathered evidence suggests this is a real gene, start at 52,071. /note=Function call: Membrane protein /note=Transmembrane domains: There is clear evidence between one result on TMHMM and TOPCONS that there appears to be a significant marker for a membrane protein. (There is also a membrane protein next to mine, Gene 84.) /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: Make sure to be clearer about which start site you decided on. It appears that 52071 is the best choice for this gene. CDS 52418 - 52798 /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="ObLaDi_86" /note=Original Glimmer call @bp 52418 has strength 10.49; Genemark calls start at 52418 /note=SSC: 52418-52798 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_86 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.50234E-87 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.4, -5.972387497942108, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_86 [Gordonia phage Cafasso]],,QXN74301,100.0,7.50234E-87 SIF-HHPRED: SIF-Syn: This gene's function is NKF, ObLaDi upstream gene is membrane protein, downstream is DNA-binding protein. Upstream gene in phage Cafasso has no function called, downstream gene is DNA-binding protein. Cafasso gene 86 (corresponding to this gene in ObLaDi) has no function called. /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Both Glimmer and GeneMark agree on start site called at 52418, start codon is ATG /note=Coding Potential: Good coding potential from GeneMarkS and Host-Trained GeneMark within predicted ORF; start site at 52418 covered in this coding potential. /note=SD (Final) Score: -5.972, not the best but still reasonable to suggest the presence of a credible ribosome binding site. 
/note=Gap/overlap: Overlap of 1 bp (gap of -1), no large gaps upstream or downstream of gene, Pham map of Phage Cafasso used for comparison and synteny was observed for this gene with start site 52418 /note=Phamerator: 33541. Date 10-31-2021. It is conserved; found in Cafasso (DZ). /note=Starterator: Start site 3, which corresponds to 52418. This corresponds to the "Most Annotated" start site of the pham. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 52418 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: No Known Function. PhagesDB BLASTp yielded 2 hits, both proteins with unknown function, NCBI BLASTp provided one hit for a hypothetical protein in phage Cafasso. HHpred results are all very poor matches - the E-values are extremely high (95, 210, 430, etc.), while the probabilities are ~50% or lower. No CDD results. /note=Transmembrane domains: Does not have transmembrane domains. No TMHMM results, no TOPCONS results. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with the annotation as all evidence categories have been considered. Please update the upstream gene function in the synteny box for ObLaDi (updated!) 
CDS 52795 - 52935 /gene="87" /product="gp87" /function="DNA binding protein" /locus tag="ObLaDi_87" /note=Original Glimmer call @bp 52795 has strength 11.26; Genemark calls start at 52795 /note=SSC: 52795-52935 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 8.05738E-26 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.826, -4.0178693334748665, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74302,100.0,8.05738E-26 SIF-HHPRED: zinc_ribbon_15 ; zinc-ribbon family,,,PF17032.7,65.2174,96.1 SIF-Syn: DNA binding protein, the upstream gene is NKF, downstream is helix-turn-helix DNA binding domain, just like in phage Cafasso /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Glimmer and GeneMark both marked 52795 as the start position. The start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.018 and the Z-score is 2.826, which are the best numbers among the other start sites. /note=Gap/overlap: There is an overlap of 4 bp, which is normal. The length of the gene is 141 bp, which is above 120 bp and normal. /note=Phamerator: 33171 on 10/22021. The pham is also in the other members of the cluster. The gene is draft and doesn't have a function. I compared the gene with Cafasso_87, which functions as a DNA binding protein. /note=Starterator: The start site is 2 and it is conserved among all the members of the pham. Start site 2 was manually annotated in 1/1 non-draft genes in this pham. Start site 2 corresponds to base pair coordinate 52795. /note=Location call: Gathered evidence suggests that 52795 is the best start site for my gene; it covers all the coding potential and has the best Z-score and final score. 
The gene is real because it has good coding potential and it is conserved in Phamerator. The start site covers all the coding potential. Also, there is an overlap of 4 bp between this gene and the upstream gene, and there is no other start codon upstream. /note=Function call: In both BLASTp and NCBI, only phage Cafasso has a function called for this gene, as DNA binding protein. The only NCBI BLASTp hit suggested the function DNA binding protein, with high query coverage (100%), high % identity (100%), and a low E-value (8e-26). There was some evidence for the function of this gene as a DNA binding protein on HHpred but nothing on CDD. /note=Transmembrane domains: 0 transmembrane domains. This supports our findings for the gene function because the gene product interacts with DNA, and it is not required to bind to the membrane. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: Please make the notes more detailed! Although it seems like the correct start site, the module calls for more information. Please also update the All GM Coding Capacity tab. 
CDS 52999 - 53217 /gene="88" /product="gp88" /function="Helix-turn-helix DNA binding domain" /locus tag="ObLaDi_88" /note=Original Glimmer call @bp 52999 has strength 6.49; Genemark calls start at 53062 /note=SSC: 52999-53217 CP: yes SCS: both-gl ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 9.49287E-43 GAP: 63 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -2.9063687850157054, yes F: Helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Cafasso]],,QXN74303,100.0,9.49287E-43 SIF-HHPRED: Transcriptional regulator ComR; RNPP family TPR domain HTH domain bacterial signaling peptide binding, TRANSCRIPTION; 1.894A {Streptococcus vestibularis F0396},,,6HU8_A,91.6667,98.9 SIF-Syn: Helix-turn-helix DNA binding domain, upstream gene is DNA-binding protein and downstream gene 88 is unknown in function just like Cafasso. /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer - 52999, GeneMark - 53062. Glimmer start codon is GTG. GeneMark’s start codon is ATG. /note=Coding Potential: There is coding potential within the putative ORF. Only 52999 covers all the coding potential, though. This makes 53062 less likely. /note=SD (Final) Score: -2.906 is the best final score. 53062 has a lower final score. /note=Gap/overlap: Gap for 52999 is reasonable (63 bp) and it is the LORF. The gap for 53062 is too high and not syntenic, which makes it less likely. /note=Phamerator: Pham 33263, 10/26/2021. This pham is in one other member of cluster DZ - Cafasso_40. Function call is DNA-binding protein and is conserved and approved according to SEA-PHAGES. /note=Starterator: Start site 1 is conserved and reasonable (52999 bp, 1/1 call it). Note that there is only one other non-draft member in this pham, and it is Cafasso. So, Starterator info may not be reliable here. 
/note=Location call: Based on the above evidence, the gene is real and the start site is 52999. This start has a high Final score and is the LORF. Also, it covers all of the coding potential. This start site is also conserved in the one other non-draft member of this pham. /note=Function call: Phagesdb’s two top hits are Cafasso and Denise. While scores were only 40-150, e value was < 10^-6 and ident% was greater than 40%. Both hits suggested this gene product is a helix-turn-helix DNA-binding protein. Blastp indicates the same result with Streptomyces litmocidini instead. This hit has e = 2e-07, ident 48%, score 53. Hence, this gene most likely codes for a helix-turn-helix domain-containing DNA-binding protein. Both HHpred and CDD yield a HTH DNA binding domain as the most significant hit with low e-value (e-7, e-12), high probability (98%), and high coverage (76%, 88%). However, both sources also suggest the same anti-toxin protein. While this hit has a low e-value/high probability, it is not also suggested by phagesdb like the HTH DNA binding protein option. Also, the HTH option is suggested by similarities in other phages like Denise/Cafasso, but anti-toxin is not. /note=Transmembrane domains: Both TMHMM and TOPCONS predicted no TMDs in my gene product, meaning this protein doesn’t interact with the membrane. This makes sense because my protein is a DNA-binding domain, so it most likely doesn’t associate with the membrane. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: Per the reasoning and evidence above, I agree with the location call. I also agree with the function call, and would like to add that the anti-toxin suggested by HHPRED does feature a helix-turn-helix motif. Additionally, I would suggest adding at least Cafasso as evidence for the NCBI BLAST due to its high percentage match across all categories, though there is a lot more evidence beyond that which could also be added. 
CDS 53214 - 53648 /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="ObLaDi_89" /note=Original Glimmer call @bp 53214 has strength 12.19; Genemark calls start at 53214 /note=SSC: 53214-53648 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_89 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.55281E-101 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.484, -4.197870532213559, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_89 [Gordonia phage Cafasso]],,QXN74304,100.0,3.55281E-101 SIF-HHPRED: SIF-Syn: NKF (pham number 33607), downstream is also NKF (pham number 33414), upstream might be helix-turn-helix DNA binding domain. Same in Cafasso /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Both Glimmer and GeneMark annotate this gene. They also agree on the start site at 53214. /note=Coding Potential: The gene has reasonable coding potential predicted; however, the chosen start site covers more than the whole region of the coding potential in both Host-trained GeneMark and Self-Trained GeneMark. The start codon after the predicted start codon covers the whole region. /note=SD (Final) Score: -4.198. This score is not important because this gene is likely to be part of an operon. /note=Gap/overlap: -4, likely to be within an operon, so SD Score is not important /note=Phamerator: Pham 33607. Date 10/21/2021. It is conserved; found in Cafasso (DZ) and Aleemily (DZ). /note=Starterator: Start site 2 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start site 2 is 53214 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Starterator Drop-Down Menu (see end of PECAAN Notes Instructions): NA because this gene is not the longest. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 53214. 
Starterator agrees with Glimmer and GeneMark /note=Function call: function unknown because all high hits from BLASTp don’t show functions, and the only one with function has an E value of 6.3. CDD and HHpred also don’t show good hits with functions. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Ma, Yiwen /note=Secondary Annotator QC: Very detailed notes, agree with your location call and function call! CDS 53700 - 53948 /gene="90" /product="gp90" /function="hypothetical protein" /locus tag="ObLaDi_90" /note=Original Glimmer call @bp 53700 has strength 17.28; Genemark calls start at 53700 /note=SSC: 53700-53948 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_90 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.42111E-51 GAP: 51 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.563, -4.609715333525589, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_90 [Gordonia phage Cafasso]],,QXN74305,100.0,2.42111E-51 SIF-HHPRED: SIF-Syn: NKF, upstream gene in pham 32955 with a function of MazG-like nucleotide pyrophosphohydrolase and a downstream gene in pham 32955, just like phage Cafasso. /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both GeneMark and Glimmer predicted a start site of 53700bp of codon ATG. /note=Coding Potential: The gene has solid coding potential, verified in GeneMark-host and GeneMark-self, both of which support the above start site. /note=SD (Final) Score: The SD score is -4.610, which is not ideal. /note=Gap/overlap: There is a 51bp gap, which is slightly worrisome. Though this gap is also seen in Cafasso's non-draft genome. This start site allows for the longest ORF compared to any other start site candidates and shortest gap. With this start site, the length of the ORF is very reasonable at 249 bp. /note=Phamerator: This gene is found in pham 33414 (as of 10/23/21). 
This pham is conserved in all other members of cluster DZ, only one of which is a non-draft phage; these phages are Cafasso and Aleemily_draft. There is no function called for this gene, but all genes have the same length which provides strong evidence for the legitimacy of this pham. /note=Starterator: Start site 1 is conserved among all members of this pham, which corresponds to a base pair coordinate of 53700bp in ObLaDi. In 3/3 of the pham members, site 1 is called. /note=Location call: The gathered evidence indicates that this is a real gene with a start site at 53700bp, which is the conserved start site 1 in all other members of the Pham. This site is called 100% of the time it is present, which provides strong evidence that this is the real start. There is very solid coding potential that is contained within the bounds set by this start and stop site. The ORF is significant in size but there is a significant non-coding gap upstream of this gene. There is synteny between ObLaDi and Cafasso in regards to this gap, so this gap seems to be real. /note=Function call: There is not enough data to claim a function for our gene. There is only one non-draft phage with a high aligning sequence, with a strong e-value, but this has no known function. There is one aligning sequence that has been found to code for DNA primase/polymerase, but we cannot predict our gene to have the same function because of the poor e-value and the fact that no other aligned sequences also call for this function. There were no hits from CDD and hits from HHpred had high e-values and low probabilities and percentage of coverage. This gene has NKF. The inability to detect transmembrane domains supports this hypothesis. /note=Transmembrane domains: There were no transmembrane domains detected by TMHMM or TOPCONS, which supports the proposed function of “NKF.” /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: Great job! 
There is great evidence to support the start site called. I agree with the call! CDS 53945 - 54571 /gene="91" /product="gp91" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="ObLaDi_91" /note=Original Glimmer call @bp 53945 has strength 13.47; Genemark calls start at 53945 /note=SSC: 53945-54571 CP: no SCS: both ST: SS BLAST-Start: [MazG-like nucleotide pyrophosphohydrolase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.91673E-148 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.477, -6.085678266381749, no F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[MazG-like nucleotide pyrophosphohydrolase [Gordonia phage Cafasso]],,QXN74306,99.0385,3.91673E-148 SIF-HHPRED: MAZG-LIKE NUCLEOSIDE TRIPHOSPHATE PYROPHOSPHOHYDROLASE; HYDROLASE, DIMERIC DUTPASE; HET: GOL, SO4; 1.7A {DEINOCOCCUS RADIODURANS},,,2YF4_B,99.0385,100.0 SIF-Syn: Upstream gene is NKF while the downstream gene is a DnaQ-like DNA polymerase III subunit. /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation: Glimmer and GeneMark both have the same start site at 53945. Both have a GTG start codon. /note=Coding Potential: The gene displays very strong coding potential predicted within the ORF. Start sites are covered by the predicted coding potential ORF. /note=SD (Final) Score: Final score -6.086, Z-score 1.477; these scores are NOT the best scores among the candidates. /note=Gap/overlap: -4bp overlap is likely indicative of an operon. /note=Phamerator: Pham 32955; at 10/22/2021; It is conserved in Cafasso of Cluster DZ. /note=Starterator: Start number 1 was manually annotated in 1/1 non-draft genes in this Pham. Start 1 has start site 53851. It appeared in Cafasso of cluster DZ. This evidence does not agree with Glimmer or GeneMark auto-annotations. /note=Location call: Based on the evidence, this gene is real and the appropriate start site is 53945. 
/note=Function call: MazG-like nucleotide pyrophosphohydrolase /note=Transmembrane domains: No TMDs detected or listed /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 54568 - 55293 /gene="92" /product="gp92" /function="DnaQ-like (DNA polymerase III subunit)" /locus tag="ObLaDi_92" /note=Original Glimmer call @bp 54568 has strength 16.34; Genemark calls start at 54574 /note=SSC: 54568-55293 CP: yes SCS: both-gl ST: SS BLAST-Start: [DnaQ-like DNA polymerase III subunit [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.7749E-178 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.635, -5.491506094349139, no F: DnaQ-like (DNA polymerase III subunit) SIF-BLAST: ,,[DnaQ-like DNA polymerase III subunit [Gordonia phage Cafasso]],,QXN74307,99.1701,7.7749E-178 SIF-HHPRED: c.55.3.5 (A:1-228) Three prime repair exonuclease 2, TREX2 {Human (Homo sapiens) [TaxId: 9606]},,,d1y97a1,75.1037,99.8 SIF-Syn: Synteny is maintained. Cafasso genes are arranged very similarly, and share functions for this gene and the previous one in their genomes. There are no function notes for the following gene in Cafasso, and it's an NKF in ObLaDi. /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Both Glimmer and GeneMark call it, but disagree on the start (Glimmer: 54568; GeneMark: 54574). The start codon for both is GTG. /note=Coding Potential: There is clear and marked coding potential in the location. Both the start sites cover the coding potential entirely. /note=SD (Final) Score: 54568 [Final score: -5.492 Z-score: 1.635] 54574 [Final score: -6.105 Z-score: 1.327]. The scores for 54568 are better, although not by much. These scores should not be very relevant, however, as the gap for 54568 is -4 and the gap for 54574 is 1. The gene is likely in an operon. /note=Gap/overlap: 54568: -4. 54574: 1. Both gaps are acceptable with the supposition that it is in an operon. 
These are not the LORFs, but the longer suggested ORFs have an overlap of 100+ bp (unlikely). /note=Phamerator: As of 10/25/21, it is in pham 77401. The only other non-draft member of the DZ subcluster is Cafasso, which has genes in that pham. /note=Starterator: Run on 10/25/21. The most conserved start site for similar genes in Starterator is not present in ObLaDi (37, called by 16/46). The most annotated start site for this gene is a tie between #34 (54568) and #36 (54574). This does not narrow down which start site is best. /note=Location call: The auto-annotated site (34@54568) is better than the other equally called site (36@54574) because it gives a bigger ORF with a gap of -4 with the previous gene. It also has better scores. /note=Function call: I believe it is a DnaQ-like polymerase subunit. The highest-scoring non-draft BLAST matches as well as most of the function frequencies are polymerase subunits. /note=Transmembrane domains: None, as per both TMHMM and TOPCONS. /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: Good job. I agree; it seems there are overlaps, but by very little, making your choice a significant one. I agree with your start site and supporting data. Make sure to note whether it is an SS, NA, or NI. 
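The gap and overlap values quoted in these notes follow directly from the CDS coordinates. The lines below are a minimal Python sketch of that arithmetic, assuming 1-based inclusive coordinates on the forward strand and the convention that the gap is the downstream start minus the upstream stop minus one, so that negative values are overlaps. The function name `gap_bp` is hypothetical and is not part of PECAAN, DNA Master, or any other annotation tool.

```python
def gap_bp(upstream_stop: int, downstream_start: int) -> int:
    """Number of bases between two adjacent forward-strand genes.

    Assumes 1-based inclusive coordinates. A positive result is a gap,
    zero means the genes abut, and a negative result is an overlap of
    that many base pairs.
    """
    return downstream_start - upstream_stop - 1


# Coordinates taken from the CDS records above.
print(gap_bp(372, 374))      # gene 1 -> gene 2: 1 (a 1 bp gap)
print(gap_bp(54571, 54568))  # gene 91 -> gene 92: -4 (a 4 bp overlap)
```

Both results match the `GAP:` values recorded in the corresponding SSC notes.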
CDS 55290 - 55457 /gene="93" /product="gp93" /function="hypothetical protein" /locus tag="ObLaDi_93" /note=Original Glimmer call @bp 55323 has strength 6.74; Genemark calls start at 55323 /note=SSC: 55290-55457 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_93 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.07451E-31 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.591, -5.642238619078679, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_93 [Gordonia phage Cafasso]],,QXN74308,98.1818,1.07451E-31 SIF-HHPRED: SIF-Syn: Function: NKF; Gene 92; Pham: 33181 In phage Cafasso: function and Pham number are the same, except Gene 93 Upstream function: DnaQ-like (DNA polymerase III subunit); Gene 91; Pham: 83425 In phage Cafasso: same function and pham number, except Gene 92 Downstream function: NKF; Gene 93; Pham: 83540 In phage Cafasso: Function: HNH endonuclease; Gene 94; Pham: 83449 /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 55323 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The ORF does have reasonable coding potential and the chosen start site does include all of the coding potential. /note=SD (Final) Score: Final Score: -6.488; Z-score: 1.514. It is the best final score on PECAAN. However, the Z-score is less than 2(not a significant/good score). /note=Gap/overlap: 4 bp overlap with upstream gene, which is evidence of an operon. /note=Phamerator: Pham:33181. Date 10/24/21. It is conserved; found in Cafasso(DZ) and Aleemily(DZ) /note=Starterator: The start number called the most often in the published annotations is 3, it was called in 1 of the 1 non-draft genes in the pham. The start site 3 is 55290 in ObLaDi, and manual annotations of this start is 1. 
Glimmer and GeneMark both agree on start site 4 (55323 in ObLaDi). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 55290. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: I agree with the annotation as all evidence categories have been considered. /note=Function call: NKF. The top phagesdb BLAST hit says that the function is unknown (E-value = 5e-29), and the only NCBI BLAST hit suggests a hypothetical protein, which is equivalent to unknown function (100% coverage, 98.2% identity, and E-value = 1.07451e-31). HHpred had no significant hits. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Rajiv, Subashni /note=Secondary Annotator QC: Looks good. All evidence has been considered. CDS 55454 - 55750 /gene="94" /product="gp94" /function="hypothetical protein" /locus tag="ObLaDi_94" /note=Original Glimmer call @bp 55454 has strength 15.45; Genemark calls start at 55454 /note=SSC: 55454-55750 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein CH289_07730 [Rhodococcus sp. RS1C4]],,NCBI, q3:s5 97.9592% 1.01791E-44 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.005, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein CH289_07730 [Rhodococcus sp. RS1C4]],,OZC55074,81.3726,1.01791E-44 SIF-HHPRED: SIF-Syn: The current gene is NKF (Pham 80125). The upstream gene (Pham 33181) is also NKF, which differs from phage Cafasso, where the upstream gene (Pham 83053) is an HNH endonuclease; the downstream gene (Pham 83016) is also NKF, just like in phage Cafasso of Cluster DZ. /note=Primary Annotator Name: Wang, Jennifer Yiyang /note=Auto-annotation: Both Glimmer and GeneMark agree on the same site 55454 as the start site. 
/note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: Final score = -2.601. It is the best final score on PECAAN. /note=Gap/overlap: -4. This indicates this gene is likely part of an operon. /note=Phamerator: Pham: 80125. Date 10/26/21. It is conserved; found in Cafasso (DZ) and Aleemily (DZ), which are within the same cluster as ObLaDi, as well as in 206 other members from other clusters. There is no function called for the gene. /note=Starterator: Start site 52 in Starterator was manually annotated in 70/198 non-draft genes in this pham. Start 52 is 55454 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 55454 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. All phagesdb BLAST top hits and NCBI BLAST top hits state “function unknown” for the gene. There is no hit for CDD and no good hit for HHpred. /note=Transmembrane domains: No TMDs called; neither TMHMM nor TOPCONS predicts any, and there is no evidence suggesting TMD function within the other databases. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: In full agreement with evidence and conclusion. Great job! 
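Starterator evidence in these notes is reported as the number of manual annotations that call a given start out of the non-draft genes in the pham. The following Python sketch shows that arithmetic only. The helper names `call_fraction` and `is_majority` and the one-half cutoff are illustrative assumptions; they are not part of Starterator or of any stated annotation rule.

```python
from fractions import Fraction


def call_fraction(calls: int, non_draft_genes: int) -> Fraction:
    """Exact fraction of non-draft genes whose annotation calls a start."""
    if non_draft_genes <= 0:
        raise ValueError("non_draft_genes must be positive")
    if not 0 <= calls <= non_draft_genes:
        raise ValueError("calls must lie between 0 and non_draft_genes")
    return Fraction(calls, non_draft_genes)


def is_majority(calls: int, non_draft_genes: int,
                cutoff: Fraction = Fraction(1, 2)) -> bool:
    """True when more than `cutoff` of the annotations call the start.

    The default cutoff of one half is an assumption for illustration.
    """
    return call_fraction(calls, non_draft_genes) > cutoff


# Counts quoted in the Starterator notes above.
print(float(call_fraction(70, 198)))  # start 52 in the pham of ObLaDi_94
print(is_majority(16, 46))            # start 37 in the pham of ObLaDi_92
print(is_majority(3, 3))              # start 1 called in 3/3 pham members
```

A low fraction does not by itself rule out a start; in the records above it is weighed together with coding potential, gap, and RBS scores.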
CDS 55798 - 56244 /gene="95" /product="gp95" /function="hypothetical protein" /locus tag="ObLaDi_95" /note=Original Glimmer call @bp 55798 has strength 16.15; Genemark calls start at 55798 /note=SSC: 55798-56244 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_96 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.28516E-86 GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.451, -3.877930595869325, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_96 [Gordonia phage Cafasso]],,QXN74311,97.973,3.28516E-86 SIF-HHPRED: SIF-Syn: NKF. Upstream gene is pham 80125, Downstream gene is pham 33362, just like in phage Cafasso. /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and Genemark agree on the projected start site of 55798. /note=Coding Potential: As indicated by the Glimmer and Genemark coding potential map as well as a lack of violation of the guiding principles, this gene does seem to have coding potential. However, the start site as predicted by Glimmer and Genemark does not include the entirety of the coding potential in either the host-trained Genemark or the self-trained Genemark. /note=SD (Final) Score: -3.878 for start site 55798. While this is not the most negative SD-score, it is reasonable because it is more negative than -2. /note=Gap/overlap: 47 bp gap with the upstream gene. /note=Phamerator: This gene is part of pham 80061, as of 10/26/2021. There are 2 other genes that are part of the same cluster (DZ). I have been using Cafasso’s corresponding gene to compare my gene to. There are no functions listed for this gene in Phamerator. /note=Starterator: The start site 7 is conserved among the members of this pham. The basepair coordinate for this start site is 55798. 14/31 non-draft genes called this same start site. /note=Location call: The gathered evidence further affirms that 55798 is the correct start site. 
This gene is a real gene because it is conserved in Phamerator and has good coding potential. The correct start site should be 55798. /note=Function call: According to all information (BLASTp, CDD, HHpred), there seems to be no known function for this gene. /note=Transmembrane domains: No transmembrane domains indicated by TMHMM or TOPCONS. /note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: I agree with the location and functional call. CDS 56244 - 56477 /gene="96" /product="gp96" /function="hypothetical protein" /locus tag="ObLaDi_96" /note=Original Glimmer call @bp 56244 has strength 15.66; Genemark calls start at 56244 /note=SSC: 56244-56477 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_97 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.85546E-41 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.17, -4.394706008439538, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_97 [Gordonia phage Cafasso]],,QXN74312,91.25,1.85546E-41 SIF-HHPRED: SIF-Syn: No known function (pham 33362), upstream gene is in pham 81424, downstream gene is in pham 80975, just like in phage Cafasso. /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: Glimmer and Genemark both call the gene with a start site of 56244. /note=Coding Potential: The gene has a lot of coding potential, all of which is covered by the start site 56244. /note=SD (Final) Score: -4.395; this is the best score in PECAAN. /note=Gap/overlap: 1 bp overlap; this gene may be in an operon. /note=Phamerator: pham 33362 as of 10/23/2021. The pham is conserved in all members of subcluster DZ. No function is called. /note=Starterator: Start site 1 is conserved in all members of the pham. This start site corresponds to position 56244 in ObLaDi. All 3 members of the pham call this start site. /note=Location call: This is likely a real gene with start site 56244. /note=Function call: There are no good hits for CDD, HHpred, or BLASTp. 
Therefore, the function is unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: I agree with the evidence that the start site is 56244. The evidence for Function and TMD also makes sense. Good Job! CDS 56474 - 57064 /gene="97" /product="gp97" /function="metallophosphoesterase" /locus tag="ObLaDi_97" /note=Original Glimmer call @bp 56474 has strength 16.93; Genemark calls start at 56474 /note=SSC: 56474-57064 CP: yes SCS: both ST: SS BLAST-Start: [metallophosphoesterase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.15776E-138 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.661, -3.368377069717189, yes F: metallophosphoesterase SIF-BLAST: ,,[metallophosphoesterase [Gordonia phage Cafasso]],,QXN74313,99.4898,2.15776E-138 SIF-HHPRED: MPP_GpdQ; Enterobacter aerogenes GpdQ and related proteins, metallophosphatase domain. GpdQ (glycerophosphodiesterase Q, also known as Rv0805 in Mycobacterium tuberculosis) is a binuclear metallophosphoesterase from Enterobacter aerogenes that catalyzes the hydrolysis of mono-, di-, and triester substrates, including some organophosphate pesticides and products of the degradation of nerve agents.,,,cd07402,89.2857,99.6 SIF-Syn: Metallophosphoesterase protein is flanked by two proteins of no known function. The gene to the left is in pham 33362 and the gene to the right is in pham 33178, just like in phage Cafasso. /note=Primary Annotator Name: Chavez, Valeria /note=Auto-annotation: Glimmer and Genemark both agreed on the same start site (56474). The start codon is GTG. /note=Coding Potential: This gene had reasonable coding potential predicted within the putative ORF. The chosen start site covered all of the predicted coding potential. /note=SD (Final) Score: -3.368. It is the best final score on PECAAN. /note=Gap/overlap: The overlap is 4 bp. 
This indicates this gene is likely part of an operon. /note=Phamerator: This gene is in Pham 80975 as of 10/26/21. Our phage is in cluster DZ, and there is only 1 non draft gene (Cafasso_98) that also has this pham. Phamerator did not have a function called for this gene, but the function for this same gene in Cafasso is metallophosphoesterase. /note=Starterator: Yes, start site 47 at basepair position 56474 is conserved in the only other member of the pham to which this gene belongs (Cafasso_98). 20/95 non draft genes in this pham call site 47. /note=Location call: The gathered evidence suggests that this is a real gene and that start site 47 at basepair position 56474 is most likely the true start site. /note=Function call: The top 2 NCBI and PhagesDB BLASTp hits, sorted by e-value, suggested function is metallophosphoesterase, with high query coverage (>96%), medium to high % identity (98%, 63%), and low e-values. CDD and HHpred hits had high probability of >99%, high coverage of >88%, and low e-values that met the <10e-3 threshold. The suggested function is also metallophosphoesterase. /note=Transmembrane domains: Since TMHMM and TOPCONS didn’t call at least 1 TMD, we can conclude that this protein doesn’t have any TMDs. This makes sense because this gene codes for a metallophosphoesterase, which are required for ER-Golgi transport. /note=Secondary Annotator Name: Wang, Jennifer Yiyang /note=Secondary Annotator QC: Looks great! Agree on the start site. 
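The `RBS:` entry in each SSC note carries the same numbers that the "SD (Final) Score" notes quote in rounded form. The Python sketch below splits such an entry into its parts, under the assumption (inferred from the records above, where gene 92 lists "Final score: -5.492 Z-score: 1.635" against an `RBS:` entry of `1.635, -5.491506094349139`) that the third value is the Z-score and the fourth is the final score. The function name `parse_rbs` and the dictionary keys are hypothetical, and the meaning of the trailing yes/no flag is not stated in these notes, so it is kept as raw text.

```python
def parse_rbs(field: str) -> dict:
    """Split an 'RBS:' entry such as
    'Kibler 6, Karlin Medium, 2.661, -3.368377069717189, yes'.

    Assumed layout: two setting labels, Z-score, final score, flag.
    """
    parts = [p.strip() for p in field.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 comma-separated values, got {len(parts)}")
    label_a, label_b, z_score, final_score, flag = parts
    return {
        "settings": (label_a, label_b),    # opaque labels, carried through as-is
        "z_score": float(z_score),
        "final_score": float(final_score),
        "flag": flag,                      # meaning not stated in these notes
    }


# RBS entry from the gene 97 record above.
rbs = parse_rbs("Kibler 6, Karlin Medium, 2.661, -3.368377069717189, yes")
print(rbs["z_score"])                 # 2.661
print(round(rbs["final_score"], 3))   # -3.368
```

Rounding the final score to three decimals reproduces the -3.368 quoted in the gene 97 SD note, which makes this a quick way to spot notes whose quoted score does not match the structured field.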
CDS 57065 - 57361 /gene="98" /product="gp98" /function="hypothetical protein" /locus tag="ObLaDi_98" /note=Original Glimmer call @bp 57065 has strength 20.14; Genemark calls start at 57065 /note=SSC: 57065-57361 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_99 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.22788E-58 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.743, -3.2017655760970833, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_99 [Gordonia phage Cafasso]],,QXN74314,97.9592,1.22788E-58 SIF-HHPRED: SIF-Syn: NKF; the upstream gene is metallophosphoesterase (Pham 80975) and the downstream gene is NKF (Pham 33356), just like in phage Cafasso from the same cluster DZ. /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and Genemark both agree on start site @57065, start codon is GTG /note=Coding Potential: Reasonable coding potential in putative ORF, covered by chosen start site /note=SD (Final) Score: The original start site of 57065 has a final score of -3.302 and a Z-score of 2.743. This start site does not have the best Ribosome Binding Site score but the other starts are not better because the gap is much larger. /note=Gap/overlap: Upstream gap is reasonable, so is gene length /note=Phamerator: Pham 33178 - 11/3/21. The pham my gene belongs to is present in other members of the cluster, DZ. The phage that I used for comparison is Cafasso. No function called. /note=Starterator: Conserved start site number 1, @57065, 2/2 other members of pham call same start site number /note=Location call: Real gene with most likely start site @57065, conserved in Starterator /note=Function call: No databases provided any evidence to support a call. /note=Transmembrane domains: No TMDs predicted /note=Secondary Annotator Name: Esparza, Pablo /note=Secondary Annotator QC: Recall to include its possible function being metallophosphoesterase. 
Also choose whether it is SS, NA, or NI for the reports, and confirm whether your chosen gene covers all the coding potential by selecting "Yes". I am confused as to whether this is truly the correct start site. While the data seems to support it, it does not cover the whole coding potential in the start portion. CDS 57358 - 57615 /gene="99" /product="gp99" /function="hypothetical protein" /locus tag="ObLaDi_99" /note=Original Glimmer call @bp 57358 has strength 15.16; Genemark calls start at 57358 /note=SSC: 57358-57615 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_100 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.52828E-46 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.818, -7.149082784683789, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_100 [Gordonia phage Cafasso]],,QXN74315,92.9412,1.52828E-46 SIF-HHPRED: SIF-Syn: NKF, the genes upstream and downstream are NKF, similarly to phage Cafasso. /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site at 57358 bp. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF has good coding potential. This is a forward gene as coding potential in the ORF is on the forward strand only. /note=SD (Final) Score: -7.149 /note=Gap/overlap: 4 bp overlap /note=Phamerator: It is in Pham 33356. Date: 10/31/21. This pham is only in cluster DZ. /note=Starterator: Start site 2 in Starterator was manually annotated in 1/3 non-draft genes in this pham. Start 2 is 57358 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: 57358 /note=Function call: Unknown function. All of the PhagesDB hits list the function as unknown and NCBI only has 1 hit which lists it as a hypothetical protein. There were also no CDD hits and no significant hits on HHpred. All of the HHpred hits had low probabilities, high e-values, and bad coverage. 
/note=Transmembrane domains: TMHMM and TOPCONS both call no transmembrane domains. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: Looks good. Interesting that the chosen start has lower final/Z score, but all other evidence points to it being the start. I agree with both the location and function call. All evidence categories have been filled out. CDS 57612 - 58025 /gene="100" /product="gp100" /function="hypothetical protein" /locus tag="ObLaDi_100" /note=Original Glimmer call @bp 57612 has strength 16.33; Genemark calls start at 57612 /note=SSC: 57612-58025 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KNU10_gp05 [Gordonia phage Foxboro] ],,NCBI, q19:s49 74.4526% 2.17163E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.416, -3.8900809529022813, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KNU10_gp05 [Gordonia phage Foxboro] ],,YP_010098261,37.5661,2.17163E-25 SIF-HHPRED: SIF-Syn: ObLaDi gene 100 corresponds to Cafasso gene 101, according to Pham Maps. Like ObLaDi gene 100, Cafasso gene 101 has no known function. ObLaDi gene 100 and Cafasso gene 101 are both of pham 73833 and preceded upstream by gene 99/100 of pham 33356 and followed downstream by gene 101/102 of pham 20258. Genes 99/100 both have an unknown function, while genes 101/102 both have the function of ribonucleotide reductase. /note=Primary Annotator Name: Gibbons, Alicia /note=Auto-annotation: Predicted by both GeneMark, Glimmer. Both predict a start site of 57612. Start codon: ATG /note=Coding Potential: The gene has reasonable coding potential within the putative ORF and the chosen start site covers all of this coding potential. /note=SD (Final) Score: -3.890. This is the highest final score. /note=Gap/overlap: -4 (original start site). This suggests that this gene is part of an operon. /note=Phamerator: As of 10/25/21, this gene is in the pham 73833. 
/note=Starterator: Start site 3 is called in 3/4 of the non-draft genes and is called 100% of the time when present. Start site 8 is found in 3 of 7 of genes in the pham and is called 100% of the time when present, including in all 3 of the subcluster DZ phage genes. This corresponds to the base pair 57612. /note=Location call: The evidence suggests that this is a real gene with a start site of 57612 and a stop site of 58025. All other phages of the cluster DZ (Cafasso, Draft Aleemily) contain this gene in this phamily. Phamerator and PhagesDB do not call a function for this gene. /note=Function call: PhagesDB BLAST calls this gene to have an unknown function for every gene with an e-value greater than 1e-06. Similar results are seen on NCBI BLAST. Phagesdb Function Frequency is not informative, since it calls three different functions not repeated elsewhere. CDD returns no hits for this gene and HHpred only returns hits with e-values of 76 or greater and at most a probability of 43.12 /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with this annotation. All evidence has been considered and reviewed. Phamerator information got cut off, please check that again. 
CDS 58022 - 58972 /gene="101" /product="gp101" /function="ribonucleotide reductase" /locus tag="ObLaDi_101" /note=Original Glimmer call @bp 58022 has strength 20.13; Genemark calls start at 58022 /note=SSC: 58022-58972 CP: yes SCS: both ST: SS BLAST-Start: [ribonucleotide reductase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.391, -4.2111970702621555, no F: ribonucleotide reductase SIF-BLAST: ,,[ribonucleotide reductase [Gordonia phage Cafasso]],,QXN74317,99.6835,0.0 SIF-HHPRED: Ribonucleoside-diphosphate reductase subunit beta; OXIDATION-REDUCTION, FLAVIN MONONUCLEOTIDE, MANGANESE, OXIDOREDUCTASE; HET: MN; 2.65A {Streptococcus sanguinis} SCOP: a.25.1.0,,,4N83_B,99.6835,100.0 SIF-Syn: This gene is a ribonucleotide reductase, the corresponding gene in Cafasso has the same function. The upstream gene is in Pham 73833 in both ObLaDi and Cafasso and the downstream gene is in pham 11677 in both as well. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and Genemark both call the gene with a start site of 58022. /note=Coding Potential: There is good coding potential in both self-trained and host-trained Genemark. The start site does include all of that coding potential. /note=SD (Final) Score: -4.211, this is the second best score on PECAAN. /note=Gap/overlap: There is a 4 bp overlap with the previous gene, this is reasonable. /note=Phamerator: Pham 20258 as of 10/25/21. This is also found in Cafasso. /note=Starterator: Start site 21 was found in 7/45 genes in the pham. This corresponds to bp 58022 in ObLaDi. This agrees with Glimmer and GeneMark. /note=Location call: This is likely a real gene with start site 58022. /note=Function call: ribonucleotide reductase. The top 5 hits on NCBI BLAST all call ribonucleotide reductase as the function. These all have e-values of 0, coverages above 98% and identities above 78%. 
The top two hits on CDD have an e-value of 0, coverage of 98.7%, and % identity of >50.7%; they both have a function of ribonucleoside-diphosphate reductase beta subunit. The top two hits on HHpred also have a function of ribonucleoside-diphosphate reductase beta subunit with an e-value of 0, probability of 100% and coverage of >99.3%. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts a transmembrane domain, so this is not a transmembrane protein. /note=Secondary Annotator Name: Teoh, Bryan /note=Secondary Annotator QC: I agree that 58022 is the best call. There is evidence of strong coding potential, appropriate overlap, and synteny with other established genomes. Therefore, the call is accurate. CDS 58969 - 59115 /gene="102" /product="gp102" /function="hypothetical protein" /locus tag="ObLaDi_102" /note=Original Glimmer call @bp 58969 has strength 8.87; Genemark calls start at 58969 /note=SSC: 58969-59115 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_103 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 8.3182E-24 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.433, -3.914968956777623, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_103 [Gordonia phage Cafasso]],,QXN74318,95.8333,8.3182E-24 SIF-HHPRED: SIF-Syn: NKF, upstream gene is a ribonucleotide reductase, downstream gene is in pham 4791, just like in phage Cafasso /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and Genemark call this gene at the start site of nucleotide 58969. This corresponds to a start codon of GTG. /note=Coding Potential: There is good coding potential for this gene within the putative ORF. The potential is encompassed by the predicted start site and stop site. /note=SD (Final) Score: The Final Score of -3.915 is the best. However, this may be irrelevant due to the -4 bp overlap indicating an operon. /note=Gap/overlap: The overlap of -4 bp is reasonable. 
The length of the gene is acceptable. /note=Phamerator: As of 10/23/21, this gene is found in pham 11677. 3/6 members (ObLaDi, Cafasso, and Aleemily) belong to cluster DZ. 2/6 members (Phrappuccino and Settecandela) belong to cluster AA, and the final member (BrutonGaster) belongs to (sub)cluster CQ2. A function was not called for this gene. /note=Starterator: There was not a reasonable, conserved start site for this gene. ObLaDi did not contain the most conserved start site. The start site number called most often in published annotations was #4. It was called in 2/4 non-draft annotations. Because ObLaDi did not contain the most conserved start site, Starterator was not very useful. /note=Location call: This gene is a real gene. Given that this gene (stop @59115, forward) does not even contain the “most annotated” start, that start cannot be the best start site here. Furthermore, the fact that both Glimmer and GeneMark called the start site at 58969 is strong evidence that start #2 is the better site for this gene. Based on all of the evidence, I agree with the auto-annotated start site: both Glimmer and GeneMark call it, and it has the LORF as well as the best Z-value and Final Score. /note=Function call: Utilizing PhagesDB BLAST and NCBI BLAST was not particularly useful because all comparative genes have unknown functions. No relevant CDD hits. No useful HHpred hits. The lack of evidence across all databases supports a call of NKF. /note=Transmembrane domains: No transmembrane domains predicted. 
There was very little known about this gene's function, so the absence of TMDs can at least tell us that this protein does not interact with the bacterial cell membrane. /note=Secondary Annotator Name: Whang, Allison /note=Secondary Annotator QC: Agree with the projected start site and reasoning. CDS 59187 - 59504 /gene="103" /product="gp103" /function="hypothetical protein" /locus tag="ObLaDi_103" /note=Original Glimmer call @bp 59187 has strength 17.6; Genemark calls start at 59187 /note=SSC: 59187-59504 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_104 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.63955E-64 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.802, -3.3692135650586397, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_104 [Gordonia phage Cafasso]],,QXN74319,100.0,3.63955E-64 SIF-HHPRED: SIF-Syn: Downstream from ribonucleotide reductase, showing consistent synteny with Cafasso. There is a single gene shift in synteny that can be attributed to upstream genes 93 and 94 not aligning. /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Both Glimmer & Genemark show identical start site calls at 59187. Start codon TTG. /note=Coding Potential: High coding potential over putative start site, cross referenced by both Host and Self trained Genemark. Chosen start site does cover all of the coding potential regions. /note=SD (Final) Score: -3.369 /note=Gap/overlap: 71 bp gap, no coding potential before the start site to indicate extension. /note=Phamerator: Pham 4791 containing only 4 members, 3 of which are in the DZ cluster including ObLaDi. No known functional call for this Pham perhaps due to low Pham membership. Highly conserved genes in DZ subcluster as verified by Phamerator synteny diagrams. /note=Starterator: Suggested start site (2 @ 59503) shares this start site with all other members in this pham (also the most called start site). 
No coding potential found in gaps preceding gene. /note=Location call: This is a real gene with start site (2 @ 59504) and functions in the forward direction. Unknown gene function found. Updated as of 11/14/21 /note=Function call: NKF -- no final function labeled in NCBI BLASTp. Each hit is either a draft, a hypothetical gene, or belongs outside Mycobacterium host range. /note=Transmembrane domains: N/A -- no TMD probability indicating any related AA sequence hits /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: All evidence has been considered and reviewed. Be more specific about what the Final score entails. Also, why do you say 59503 when it is not one of the start site selections? Function Call is incomplete and Synteny is not detailed enough; check the annotation manual sample. CDS 59501 - 59914 /gene="104" /product="gp104" /function="hypothetical protein" /locus tag="ObLaDi_104" /note=Original Glimmer call @bp 59501 has strength 13.97; Genemark calls start at 59501 /note=SSC: 59501-59914 CP: yes SCS: both ST: NI BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.99921E-93 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.661, -3.3861058366776207, yes F: hypothetical protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74320,99.2701,6.99921E-93 SIF-HHPRED: SIF-Syn: DNA binding protein, the upstream gene is in pham 4791 and downstream gene is in pham 33368, just like in phage Cafasso /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and Genemark do call this gene with a start site at nucleotide 59501. This correlates to a predicted start codon of GTG. /note=Coding Potential: The gene does have good coding potential within the putative ORF. All coding potential is covered within the predicted start site and stop site. /note=SD (Final) Score: The SD Score corresponding to the predicted start site is the best at -3.386. 
However, this may not be very relevant given the 4 bp overlap, indicating a possible operon. /note=Gap/overlap: There is a reasonable overlap of 4 bp (gap of -4). This may indicate the presence of an operon. The length of the gene is reasonable given the predicted start site. /note=Phamerator: As of 10/23/2021, this gene is found in pham 73151. The only non-draft phage genome that this pham is also present in is Cafasso. There is not a function called for this gene in Phamerator. /note=Starterator: There does not appear to be a well conserved start site across the members of the pham as less than half of the non-draft genomes contain the same start site. There are 244 non-draft members in this pham, and 83/244 call site #15 (the most conserved site). However, Starterator does not appear very helpful because less than half of the non-draft genomes call the same start site. It is worth noting that Starterator calls Start Site 10 for all phages of the subcluster DZ (including drafts). /note=Location call: All of the evidence suggests that this is a real gene with a start site at bp 59501. The Final Score, Z-Value, and LORF data all point to this start site being reliable. The Final Score and Z-score are both the best scores available, and the selected start site contains the LORF. /note=Function call: The top hit on PhagesDB BLAST and NCBI BLAST indicates that this function may be a DNA binding protein. The minimum query coverage was high (≥84%), identity was ≥ 50% and the e-value was low (≤1e-38). CDD did not return any relevant hits. The top two hits on HHpred had high probabilities (> 90%) and coverage (> 73%), but the e-values were high (> 3). However, the top HHpred hit still agreed that this ORF coded for a DNA binding protein. /note=Transmembrane domains: No transmembrane domains predicted. The absence of TMDs does make sense in this context because the hypothesized function for this gene is a DNA binding protein.
As the name suggests, this protein is primarily binding to viral DNA, not interacting with the bacterial cell membrane or aiding in entry/lysis. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: I agree! You can note that Starterator calls Start Site 10 for all phages of the subcluster DZ (including drafts). You can also describe in the Location Call section why those parameters point to the start site being reliable. For the PhagesDB BLAST suggesting a DNA binding protein, were the top 4 hits non-draft phages? CDS 59914 - 60171 /gene="105" /product="gp105" /function="hypothetical protein" /locus tag="ObLaDi_105" /note=Original Glimmer call @bp 59914 has strength 15.97; Genemark calls start at 59914 /note=SSC: 59914-60171 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_106 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.51897E-51 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.584, -3.8156109669835954, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_106 [Gordonia phage Cafasso]],,QXN74321,100.0,2.51897E-51 SIF-HHPRED: SIF-Syn: Protein of unknown function, upstream gene is DNA binding protein, downstream is HNH endonuclease, just like in phage Cafasso. /note=Primary Annotator Name: Erfanian M., Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site at 59914. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF, and covers all this coding potential. /note=SD (Final) Score: The RBS Final Score at -5.758 for the original start at 59914 is not the best (highest), but is still reasonable. The Z-score for this start, however, is the best (highest) at 2.584. /note=Gap/overlap: The original start has an overlap of 1 bp with the gene that comes before its start; this gene starts at 59914, and the gene that precedes it has a stop at 59914.
/note=Phamerator: This gene was found in pham 33368 on 10/15/21, which has three members, two of which are drafts. Additionally, this pham was found to be present in another member of the cluster DZ, using phage Cafasso for comparison. /note=Starterator: Using information from the Starterator analysis run most recently on 10/22/21, it was found that the most conserved start site number is 2. The auto-annotated start is called at start number 2 (59914), which matches the most conserved start. ObLaDi’s track contains start site 2 by a yellow line, which denotes it as an auto-annotated start. Start site 2 in ObLaDi’s track corresponds to that of other phages in the cluster, such as Aleemily. Start site 2 has been determined to be the Final Human Annotated start, as represented by a green line on the track representing these phages. The analogous start site between ObLaDi and other phages in this cluster is therefore promising, indicating that the auto-annotated start site 2 at 59914 is indeed correct. /note=Location call: The evidence gathered thus far indicates that the start site at 59914 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp each have one non-draft hit with low E-values, high identity percentages, and reasonable scores. The top non-draft hit on PhagesDB BLASTp was for a gene in Cafasso, a final phage in the same cluster as ObLaDi. This hit has a significantly low E-value at 1e-40, a reasonable score, as well as a max identity percentage of 100%. There are a total of three hits, although only the aforementioned hit belongs to a final draft phage genome. Regardless, each hit has an unknown function. Furthermore, there was only one hit on NCBI BLASTp, and belongs to a final draft phage genome. This was also for a gene in Cafasso, a final phage in the same cluster as ObLaDi- although it should be noted that the hit is for a hypothetical protein. 
This hit has an even lower E-value of 3e-51, a reasonable score, as well as a max identity percentage of 100%. No CDD return hits. The HHpred returned several hits for proteins of various or unknown functions, and was therefore inconclusive in determining the function. Given the above information, the data is insufficient in formulating a hypothesis for the function of my gene, whose function appears to be unknown. /note=Transmembrane domains: No TMDs called by TmHmm or TOPCONS. The protein is not a membrane protein. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: Good coding potential, high Z-score, small gap between genes, and a conserved start site makes this start site a great candidate, I agree with primary annotator. CDS 60168 - 60557 /gene="106" /product="gp106" /function="HNH endonuclease" /locus tag="ObLaDi_106" /note=Original Glimmer call @bp 60168 has strength 13.26; Genemark calls start at 60168 /note=SSC: 60168-60557 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.77038E-88 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.826, -3.875201829906135, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Gordonia phage Cafasso]],,QXN74322,97.6744,7.77038E-88 SIF-HHPRED: d.4.1.8 (A:513-673) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Actinomyces naeslundii [TaxId: 1115803]},,,d4ogca2,44.9612,97.8 SIF-Syn: HNH endonuclease, upstream gene is NKF (Pham 33368), downstream is NKF (Pham 32923), just like in phage Cafasso from the same cluster DZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Both Glimmer and GeneMark call the gene at the same start 60168. /note=Coding Potential: This coding potential of this ORF is reasonable and only in forward direction. The ORF start includes all of the coding potential. /note=SD (Final) Score: The final SD score is -3.875 with a z-score of 2.826. This is the best option. 
/note=Gap/overlap: The overlap upstream is 4 bp which indicates that this gene belongs to an operon. /note=Phamerator: The pham this gene is in is 80925 as of 10/26/2021. This pham has 515 members with one non-draft phage Cafasso belonging to the same cluster DZ as this phage. /note=Starterator: Start site 121 at 60168 is manually annotated in two other non-draft genes in the pham. This site agrees with the sites called by Glimmer and GeneMark. /note=Location call: Considering the evidence above, this is a real gene with a start site at 60168. /note=Function call: HNH endonuclease. Top two hits in both phagesdb and NCBI BLAST have HNH endonuclease as their function (the lower e-value of 3e-24 and at least 48% identity). HHpred top hits also correspond to HNH endonuclease with good enough e-value (1e-4) and about 97% probability. /note=Transmembrane domains: No TMDs predicted by both TMHMM and TOPCONS, so this is not a membrane protein. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 60554 - 60913 /gene="107" /product="gp107" /function="hypothetical protein" /locus tag="ObLaDi_107" /note=Original Glimmer call @bp 60542 has strength 17.51; Genemark calls start at 60542 /note=SSC: 60554-60913 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_108 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.60053E-81 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.651, -3.3275678682521845, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_108 [Gordonia phage Cafasso]],,QXN74323,100.0,6.60053E-81 SIF-HHPRED: SIF-Syn: NKF. All evidence gathered for this gene call is from Cafasso. The upstream genes for ObLaDi and Cafasso are both called as HNH endonuclease and the downstream genes are NKF. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Both Glimmer and Genemark call 60542 as the start site. 
The start site codon is GTG. /note=Coding Potential: This gene has good coding potential. Both GeneMark and Glimmer suggested coding potential at the putative ORF. The start site includes all of the coding potential. /note=SD (Final) Score: -3.673. This is the second best SD final score on PECAAN by a few tenths of a point. /note=Gap/overlap: -16. This was the second most reasonable gap. /note=Phamerator: This gene belongs to pham 32923. There are 3 members of this pham. ObLaDi (draft), Cafasso (non-draft), and Aleemily (draft) are all part of the DZ cluster. The date is 10/26/21. /note=Starterator: Starterator calls start site 4 for ObLaDi, which is 60542. Cafasso calls start site 5 (60554 in ObLaDi). Cafasso is the only other non-draft member. /note=Location call: 60554 based on -4 gap: probable operon. /note=Function call: PhagesDB BLASTp and NCBI BLASTp were used to try and determine the gene’s function. There is one hit in each database from non-draft genes that has significance for this gene’s function. Both hits have very low E-values (5e-64 and 3e-80 respectively) and ~96% identities, making them good matches. However, neither has any suggested function, meaning we are not able to determine a putative function for this gene. Additionally, the terminase function is suggested, with very little weight, by the PhagesDB Function Frequency. This may show that there was gene recombination or horizontal transfer at some point, but it does not mean that there was a conserved function. Additionally, CDD and HHpred had no significant hits. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I agree with the annotations above.
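The two gap values quoted for gp107 (-16 for the auto-annotated start at 60542, -4 for the chosen start at 60554) can be reproduced from the coordinates, since the upstream gene gp106 ends at 60557. The following is a minimal sketch, assuming the convention gap = start - upstream stop - 1 for forward-strand genes, which matches the GAP fields in these records; the function name is hypothetical and is not part of PECAAN or any other annotation tool.

```python
def forward_gap(upstream_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start on the forward strand.

    A negative value means the two genes overlap by that many bases.
    Hypothetical helper for illustration only.
    """
    return start - upstream_stop - 1


# gp106 ends at 60557; compare the two candidate starts for gp107.
auto_gap = forward_gap(60557, 60542)    # auto-annotated start
chosen_gap = forward_gap(60557, 60554)  # manually chosen start
print(auto_gap, chosen_gap)
```

The same arithmetic gives the 1 bp gap recorded between gp1 (stop 372) and gp2 (start 374).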
CDS 60910 - 61128 /gene="108" /product="gp108" /function="hypothetical protein" /locus tag="ObLaDi_108" /note=Original Glimmer call @bp 60919 has strength 15.65; Genemark calls start at 60919 /note=SSC: 60910-61128 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_109 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.62327E-44 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.738, -5.340269015678895, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_109 [Gordonia phage Cafasso]],,QXN74324,98.6111,1.62327E-44 SIF-HHPRED: SIF-Syn: NKF (pham 33519), upstream gene is NKF (pham 32923), downstream is NKF (pham 33579), just like in phage Cafasso. /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both auto call start at 60919 (site 4) with an GTG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The manually chosen start site (60910 - site 3) covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: For the 60910 start site it is -5.340, the best final score on PECAAN, but this could be irrelevant since this gene is possibly part of an operon with a 4 bp overlap. The Z-score is 1.738, the best value on PECAAN. /note=Gap/overlap: For the 60910 start site there is an overlap of 4 bp (gap = -4) indicative of a likely operon within which this gene is located. This start site does not create the LORF and the gene length is 219 bp which is acceptable. /note=Phamerator: The pham number as of 10/25/2021 is 33519. The gene is conserved in phages Aleemily_Draft (DZ) and Cafasso (DZ). Cafasso (DZ) is the best phage genome for comparison since it is non-draft. Based on PhagesDB there is no function call for the gene. 
/note=Starterator: Based on the 10/22/21 run the most annotated start site 3 is a reasonable choice that is conserved among members of pham 33519. There are 3 members total with only 1 non-draft member in this pham. 3/3 of total members and 1/1 of non-draft members call start site 3, which correlates to 60910 bp for ObLaDi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 60910 bp (site 3). Starterator (60910 - site 3) does not agree with Glimmer and Genemark (60919 - site 4). /note=Function call: Not enough data to form a function hypothesis, but this is likely a real protein. PhagesDB BLASTp hits are all of unknown function, the top non-draft hit with a small e-value is from Cafasso_109 (e: 4e-37, id: 98%). NCBI BLASTp shows only a single hit of a hypothetical protein with a small e-value from Cafasso_109 (e: 2e-44, id: 98.61%, cov: 100%). Phagesdb Function Frequency has no available data. One weak CDD hit, phosphate ABC transporter ATP-binding protein (e: 2.05e-03, id: 6.34328%, cov: 66.6667%). Many HHpred hits but all not significant with the lowest three e-values of 0.89, 0.95, 1.1. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: In full agreement with evidence and conclusion. Great job! 
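The 219 bp gene length quoted for gp108 follows from its coordinates (60910 to 61128, inclusive). The following is a minimal sketch of that arithmetic, with a check that the length is a whole number of codons; both helper names are hypothetical, and the codon count here includes the stop codon.

```python
def gene_length(start: int, stop: int) -> int:
    """Inclusive length in bp; works for either strand orientation."""
    return abs(stop - start) + 1


def codon_count(start: int, stop: int) -> int:
    """Number of codons (stop codon included); rejects partial codons."""
    length = gene_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"length {length} is not a multiple of 3")
    return length // 3


chosen = gene_length(60910, 61128)  # manually chosen start (site 3)
auto = gene_length(60919, 61128)    # Glimmer/GeneMark start (site 4)
print(chosen, auto, codon_count(60910, 61128))
```

Moving the start from 60919 to 60910 lengthens the ORF by 9 bp, that is, three codons.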
CDS 61131 - 61493 /gene="109" /product="gp109" /function="hypothetical protein" /locus tag="ObLaDi_109" /note=Original Glimmer call @bp 61131 has strength 15.17; Genemark calls start at 61131 /note=SSC: 61131-61493 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_CAFASSO_110 [Gordonia phage Cafasso]],,NCBI, q2:s1 99.1667% 1.13473E-76 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_110 [Gordonia phage Cafasso]],,QXN74325,99.1597,1.13473E-76 SIF-HHPRED: SIF-Syn: NKF protein, upstream gene is NKF protein, downstream is NKF protein, just like in phage Cafasso. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both Genemark and Glimmer. Both agreed on the same start site: 61131. Site # 133. (Starterator and PhamDB data suggest the start site is 61134, which is not the same as the Glimmer and GeneMark calls). GTG start codon. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. /note=SD (Final) Score: The gene called by GeneMark and Glimmer is not the LORF. Final score is -3.673 with a 2.925 Z-score as noted by Starterator evidence. /note=Gap/overlap: 2 bp; there is no overlap and the gap is small, so there may be an operon here. /note=Phamerator: The pham number was 33579 as of 10/22/21. The gene is conserved in Cafasso, which is also part of the DZ cluster. /note=Starterator: Start site 3 was the most annotated start number that was called for 1/1 non-draft phages. This does NOT correspond to the 61131 start site. Starterator uses 61134. Unclear why; 61131 has better stats and closes the gap. /note=Location call: From the evidence from GeneMark and Starterator this gene is real and its most likely start site is 61131. /note=Function call: NKF, all databases PhagesDB and NCBI BLAST show hits that suggest no known function for this gene.
Both CDD and HHpred did not have informative hits for this gene; both had very high e-values and low coverage. NCBI BLAST gave one good hit with low e-value at 1e-76 and 100% coverage for the Cafasso hypothetical protein sequence. /note=Transmembrane domains: TMHMM did not predict any TMDs for this gene. Topcons did predict one TMD, however this is not enough evidence to conclude that this gene is a membrane protein. /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC'd this location call and agree with the first annotator. CDS 61490 - 61705 /gene="110" /product="gp110" /function="hypothetical protein" /locus tag="ObLaDi_110" /note=Original Glimmer call @bp 61511 has strength 13.07; Genemark calls start at 61511 /note=SSC: 61490-61705 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_111 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.60247E-44 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.529, -5.627704621960157, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_111 [Gordonia phage Cafasso]],,QXN74326,100.0,3.60247E-44 SIF-HHPRED: SIF-Syn: There are no known functions for the genes upstream and downstream of gene #110 (83,215-83,478) in ObLaDi's genome. Likewise, there are no functions predicted for Cafasso's corresponding downstream and upstream genes. /note=Primary Annotator Name: Montoya Serpas, Cinthya /note=Auto-annotation: The start site is 61511 per Glimmer and GeneMark. /note=Coding Potential: There is good coding potential that spans the start site and the stop site of this gene. /note=SD (Final) Score: -5.628. This is not the best final score as it is the second smallest number. However, this is the most reasonable LORF among all the options due to the presence of an operon. /note=Gap/overlap: There is a 4 bp overlap with the gene upstream, which indicates that this gene is part of an operon.
This overlap is very reasonable and it serves for strong evidence for this gene being a real gene. The gene length for this gene is 216 which is consistent with the other two members of pham #33427. /note=Phamerator: This gene is found in pham #33427 as of 10/26/21. This pham is conserved in other members of the DZ cluster as it has been manually annotated in Cafasso, which was used for comparison. As of 10/36/21, there is no known function available for this gene. /note=Starterator: The most reasonable start site for this gene is #1 at 61490 as it is conserved in Cafasso and Aleemily phages. It results in a gene length of 216 bp which is consistent with Cafasso and Aleemily. There are currently 3 members in this pham, with 2 of them being draft genes (ObLaDi and Aleemily) . Both Cafasso and Aleemily call start site #1 the most annotated start site. /note=Location call: 61490; Altogether, start site #1 @ 61490 is the best start site for this gene because it results in the LORF in spite of not having the best Z-score and Final score. This start site is conserved among other members of the same pham which serves as strong evidence for this gene being real. Additionally, start site 1 at 61490 covers all coding potential in the genemark map. /note=Function call:Based on the BLAST results for this sequence, there is not enough evidence to form a hypothesis regarding the function of gene # 110. There are no phages other than Cafasso that contain reasonably high identity values. According to PhagesDB, the identity percentage for Cafasso is 100%, the e-value is 1e-34, and the score is 143 bits. Similarly, NCBI reports the same identity percentage of 100%, a much lower e-value at 4e-44, and a slightly higher score of 146. The second highest hit corresponds to phage CloverMinnie which has worse values when compared to Cafasso. 
PhagesDB reports a score of 31.2 bits, an e-value of 0.74, and an identity value of 56%, which tells us that ObLaDi_110 contains a very different sequence when compared to other phages of different clusters. Although there aren’t any hits that have functions and have reasonable e-values, Cafasso_111 is a very good match to ObLaDi_110, which indicates that this is in fact a real gene. There are no hits on the NCBI conserved domain database and the hits generated by HHpred are not very good as all of them have high e-values. The first hit PF08894.13 corresponds to a protein of unknown function but it has the highest probability of all entries (75.1) and a high e-value of 4.8, which is not within the desired threshold of 10e-3. The second hit 6QJA-A corresponds to a nuclear mitotic apparatus protein which has the second highest probability at 60.69 and a smaller e-value at 1.6, which is still too high to be considered strong evidence. /note=Transmembrane domains: There are no predicted transmembrane domains for this sequence according to TOPCONS and HHPRED. Therefore, protein is not a TMP. /note=Secondary Annotator Name: Liao, Shiqing /note=Secondary Annotator QC: I agree with the primary annotator.
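The HHpred reasoning for gp110 amounts to comparing each hit's e-value with the cutoff quoted in the notes (written there as 10e-3). The following is a minimal sketch of that filter, using only the two hits described above; the `Hit` class and `passes_evalue` function are hypothetical names for illustration, not part of HHpred or PECAAN.

```python
from dataclasses import dataclass

# Cutoff exactly as written in the notes; numerically this equals 0.01.
EVALUE_CUTOFF = 10e-3


@dataclass
class Hit:
    accession: str
    probability: float  # HHpred probability, in percent
    evalue: float


def passes_evalue(hit: Hit, cutoff: float = EVALUE_CUTOFF) -> bool:
    """True when the hit's e-value is below the cutoff."""
    return hit.evalue < cutoff


hits = [
    Hit("PF08894.13", 75.1, 4.8),  # protein of unknown function
    Hit("6QJA-A", 60.69, 1.6),     # nuclear mitotic apparatus protein
]
informative = [h.accession for h in hits if passes_evalue(h)]
print(informative)  # prints []
```

Neither hit passes, which is consistent with the call that HHpred gives no usable functional evidence for this gene.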
CDS 61698 - 62201 /gene="111" /product="gp111" /function="hypothetical protein" /locus tag="ObLaDi_111" /note=Original Glimmer call @bp 61698 has strength 13.42; Genemark calls start at 61698 /note=SSC: 61698-62201 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_112 [Gordonia phage Cafasso]],,NCBI, q1:s1 95.8084% 1.83782E-82 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.153, -4.411422009034556, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_112 [Gordonia phage Cafasso]],,QXN74327,83.6364,1.83782E-82 SIF-HHPRED: SIF-Syn: No known function, upstream gene is in pham 33427, downstream gene is in pham 33109, just like Cafasso /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Host GeneMark and Self GeneMark were used for analysis. Glimmer was used with GeneMark to compare the auto-annotated start sites, which are the same between the two. Start site is 61698 while the stop site is 62201. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF and the predicted start site is covered in coding potential. /note=SD (Final) Score: -4.411; Other suggested start sites did not have better final scores in comparison to the start site at 61698. Additionally, the length with this start site is the longest and has the most ideal gap. /note=Gap/overlap: There is an overlap of 8 bp with the gene before (may be something to keep an eye out for). /note=Phamerator: Gene found in Pham 33631 as of 10/27/21. No function called and only one other non-draft gene in phage Cafasso is present in this pham. /note=Starterator: Start site number is conserved in our gene and in gene Cafasso (Start Site #1). Start site #1 of my gene is at base pair 61698. 1/1 genes called start site #1. /note=Location call: Gene covers all coding potential and start site number is conserved in gene Cafasso which is the same cluster as this gene.
Length of ORF and overlap is also most ideal with start site listed in Starterator as well as the auto-annotated start site suggested in GeneMark and Glimmer. This suggests that start site is at 61698bp. /note=Function call: Function unknown; All of the hits in PhagesDB Blast with an acceptable e-value and identity, that are not draft genes, do not have a function listed (labeled as “function unknown”). The NCBI Blast had similar data where all the genes listed with an acceptable e-value and identity only have hypothetical proteins listed. PhagesDB database and the NCBI database did not provide any evidence to support a function call; on both, all the hits of non-draft genes were “function unknown”. There were no hits on CDD and the hits on HHpred were all hits with functions called for bacterial cell division, which does not align with this ORF. /note=Transmembrane domains: No transmembrane domains. There were no hits off TmHmm and Topcon. The gene does not have a hypothesized function thus far so having no transmembrane domains is not inconsistent with the no known function. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: I agree with the location and function calls per the evidence listed above. Make sure to select evidence that this is a gene on the BLAST results- even if the evidence provides no help as to a function call. 
CDS 62202 - 62372 /gene="112" /product="gp112" /function="hypothetical protein" /locus tag="ObLaDi_112" /note=Original Glimmer call @bp 62202 has strength 10.44; Genemark calls start at 62202 /note=SSC: 62202-62372 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_113 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.32807E-32 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.266, -6.166667396375102, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_113 [Gordonia phage Cafasso]],,QXN74328,100.0,6.32807E-32 SIF-HHPRED: SIF-Syn: NKF. Upstream and downstream genes are also NKF; however, they have similar phams to those in Cafasso. Gene order does not line up exactly with Cafasso; however, the same pham is found on 113 for Cafasso while ObLaDi has it at 112. This lines up with the adjacent genes having similar phams as well. /note=Location call: 62202, supported by Glimmer/Genemark and Starterator (Cafasso). Smallest reasonable gap. /note=Function call: Unknown /note=Transmembrane domains: None /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. I have also QC'ed this function call and agree with the first annotator. CDS complement (62369 - 62764) /gene="113" /product="gp113" /function="hypothetical protein" /locus tag="ObLaDi_113" /note=Original Glimmer call @bp 62725 has strength 11.63; Genemark calls start at 62764 /note=SSC: 62764-62369 CP: yes SCS: both-gm ST: NA BLAST-Start: [hypothetical protein SEA_CAFASSO_114 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.17278E-90 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.481, -7.77805343054184, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_114 [Gordonia phage Cafasso]],,QXN74329,100.0,4.17278E-90 SIF-HHPRED: SIF-Syn: Upstream gene (stop 62372; pham 33109) and downstream gene (stop 62761; pham 71450) have no known function.
Conserved genes in phage Cafasso (from the same cluster) also do not have called functions. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark disagree on the start site. Glimmer calls the start at 62725 while GeneMark calls the start at 62764. /note=Coding Potential: There is good coding potential for the Host-Trained and Self GeneMark. Both typical and alternate coding potential are present. The start at 62764 called by GeneMark seems to include all of the coding potential while the start at 62725 called by Glimmer does not. /note=SD (Final) Score: -2.807 belonging to start at 62725 is the best RBS final score out of all of the potential start sites. However, the chosen start site is 62764 and its RBS final score is -7.778 with a z-score of 0.481. /note=Gap/overlap: The chosen start site is at 62764. It has an overlap of 4 bp, possibly indicating that the gene is part of an operon. /note=Phamerator: Gene belongs to pham 5117. Of the 6 members, 4 are non-draft genes. They are Bantam (DL), Cafasso (DZ), DatBoi (DL) and SpeedDemon (DL). /note=Starterator: Start number 3 is the most conserved start site in the pham; it is annotated in 4 of the 4 non-draft genes. However, the Starterator start site for my gene is number 4 at position 62725. /note=Location call: Based on all of the evidence, this is a real gene. The chosen start site is at 62764. While this start site did not have the best RBS final score, the coding potential inclusion and overlap support it as the start site. The low RBS score may be due to the overlap with the gene before it, due to it being part of an operon. /note=Function call: Possible toxin or Fic family protein, but more evidence needed. Three of the significant hits on PhagesDB BLASTp with e-values of 9x10^-20, 4x10^-21, and 3x10^-22 are toxins; however, they have 44-45% identities with the gene of interest.
NCBI BLASTp also had significant hits that correspond to those of phages DB suggesting toxin function. These hits had >94% coverage but had identities around 50%. Other significant hits in NCBI with greater identity percentages >55% and >95% query coverage suggest the gene to be a fic family protein.. Neighboring antitoxins for toxin/antitoxin system to be investigated. /note=Transmembrane domains: No transmembrane domains (TMD) called by TMHMM. No TMD called by any of the programs in TOPCONs. Gene is not a membrane protein. /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (62761 - 63006) /gene="114" /product="gp114" /function="hypothetical protein" /locus tag="ObLaDi_114" /note=Original Glimmer call @bp 63009 has strength 9.37; Genemark calls start at 63009 /note=SSC: 63006-62761 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_115 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.59187E-51 GAP: 54 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_115 [Gordonia phage Cafasso]],,QXN74330,100.0,1.59187E-51 SIF-HHPRED: SIF-Syn: NKF (pham 71450), upstream gene is ADP-ribosyl glycohydrolase (pham 56397), downstream is pham 5117, just like in phage Cafasso. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and GeneMark. Both agree on the same start site of 63009, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 63009 covers all of the coding potential. /note=SD (Final) Score: The SD score is not the best (only one potential start site has a better SD score), but it is still reasonable to suggest the presence of a credible ribosome binding site. 
/note=Gap/Overlap: There is a 51 base pair gap between this gene and the upstream gene, which may be cause for concern. However, this start site creates the longest ORF and the length of the gene is acceptable. /note=Phamerator: As of 10/24/2021, the gene is found in Pham 71450. The pham is conserved in other members of the cluster - comparison was done between ObLaDi and Cafasso, as this was the only non-draft genome available. The function is not called by either Phamerator or PhagesDB. /note=Starterator: The “Most Annotated” start site (2) is present in both genes in this pham, but it is only manually annotated in one non-draft genome (Cafasso). This start site corresponds to position 63006 in ObLaDi. Start site 1, which is the auto annotated start site at position 63009, has no manual annotations. /note=Location Call: The gathered evidence suggests that this is a real gene, and the potential start site of 63006 seems most likely. While it widens the gap between this gene and the upstream one by 3 base pairs, it meets all of the other criteria in Module 3 and was manually annotated in a non-draft genome. /note=Function Call: The top 2 NCBI BLASTp hits suggested function is hypothetical protein with high query coverage (100%), high % identity (73% and 100%), and low e-values (<3e-31). The top PhagesDB BLASTp hit suggested function is unknown, with high % identity (100%) and a low e-value (<1e-40). The other hits in PhagesDB are on draft genes or have high e-values. Similarly, the CDD and HHpred hits were uninformative, with very high e-values and low probabilities and coverages. As such, there does not seem to be enough evidence to call the function of this gene. /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with both the location and function call. Make sure to fill out the synteny box. 
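For a gene on the complement strand, such as gp114, the start is the higher coordinate and the upstream gene (gp115, complement 63061 - 64269) lies at higher coordinates still. The 51 bp and 54 bp gaps discussed for gp114 can be reproduced as follows. This is a minimal sketch, assuming gap = upstream stop - start - 1 on the reverse strand, which agrees with the GAP field in this record; the function name is hypothetical.

```python
def reverse_gap(upstream_stop: int, start: int) -> int:
    """Bases between a complement-strand start and the upstream gene's stop.

    On the reverse strand the upstream gene sits at higher coordinates,
    so its stop is its lowest coordinate. Negative means overlap.
    Hypothetical helper for illustration only.
    """
    return upstream_stop - start - 1


# gp115 (upstream, complement strand) stops at 63061.
auto_gap = reverse_gap(63061, 63009)    # auto-annotated start of gp114
chosen_gap = reverse_gap(63061, 63006)  # manually chosen start of gp114
print(auto_gap, chosen_gap)
```

Choosing 63006 over 63009 widens the gap by 3 bp, as stated in the location call.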
CDS complement (63061 - 64269) /gene="115" /product="gp115" /function="ADP-ribosyl glycohydrolase" /locus tag="ObLaDi_115" /note=Original Glimmer call @bp 64269 has strength 14.13; Genemark calls start at 64269 /note=SSC: 64269-63061 CP: yes SCS: both ST: SS BLAST-Start: [ADP-ribosyl glycohydrolase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.758, -3.7606820078281067, no F: ADP-ribosyl glycohydrolase SIF-BLAST: ,,[ADP-ribosyl glycohydrolase [Gordonia phage Cafasso]],,QXN74331,99.7512,0.0 SIF-HHPRED: ADP-ribosyl-(Dinitrogen reductase) hydrolase; Glycohydrolase, METAL BINDING PROTEIN; HET: ADP; 1.55A {Azospirillum brasilense} SCOP: a.209.1.0,,,5OVO_A,84.3284,100.0 SIF-Syn: Function is ADP-ribosyl glycohydrolase, upstream gene (pham 33168) is DNA binding protein, and the gene downstream (83921) is also DNA binding protein, just like Cafasso /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: Called by Glimmer and GeneMark at the same start site, 64269, with an ATG start codon (very common). /note=Coding Potential: The gene has reasonable coding potential and the start site covers all coding potential. /note=SD (Final) Score: The SD score for 64269 is -3.761, which is the third best of the scores listed. The start sites with better scores all have unreasonable gaps and are therefore unlikely to be correct despite their scores. /note=Gap/overlap: There is a reasonable gap of -1 (a 1 bp overlap, which could indicate an operon). The gene itself also has a reasonable length of 1209 bp. /note=Phamerator: The Pham number as of 10/28/21 is 56397 and is also found in Cafasso, contained in the same cluster. The function for this gene in Cafasso was ADP-ribosyl glycohydrolase, which could indicate a similar function in ObLaDi. /note=Starterator: This gene contains the most conserved start, number 17, but instead calls start number 10, which agrees with the auto-annotated start site.
Start number 10 was found in 2 of 13 genes while 17 was found in 7 of 13. /note=Location call: I believe this is a real gene with start site at 64269. Although this start site is not the most conserved in Starterator, the most conserved site called by Starterator, at 64098, does not have sufficient supporting evidence; the evidence supporting 64269 is much stronger. /note=Function call: Based on the data shown, I believe that the function of this gene is ADP-ribosyl glycohydrolase. Among the PhagesDB hits, Cafasso (E-value 0) and Boopy (E-value 8E-77) have this function listed. Similarly, the top four NCBI hits have this function (Cafasso: 100% coverage, 99% identity, and E-value 0). HHpred had multiple hits for ADP-ribosyl glycohydrolase with 100% probability, 80%+ coverage, and E-values between 1E-32 and 1E-34. CDD contains a hit with a high e-value that indicates that this protein is conserved and retains the ADP-ribosylglycohydrolase function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: I agree with the function and the start site. Fill out the synteny box. CDS complement (64269 - 64475) /gene="116" /product="gp116" /function="hypothetical protein" /locus tag="ObLaDi_116" /note=Original Glimmer call @bp 64475 has strength 12.68; Genemark calls start at 64475 /note=SSC: 64475-64269 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_117 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.50951E-42 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.271, -5.208452476010064, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_117 [Gordonia phage Cafasso]],,QXN74332,98.5294,3.50951E-42 SIF-HHPRED: SIF-Syn: NKF. ObLaDi genes 115, 116, and 117 show synteny with Cafasso genes 116, 117, 118, respectively.
ObLaDi gene 115 has the function ADP-ribosyl glycohydrolase, and Cafasso gene 116 also has the function ADP-ribosyl glycohydrolase. ObLaDi gene 117 has NKF (pham 32705), and Cafasso gene 118 has no function noted (pham 32705). This gene, ObLaDi gene 116 (pham 83893) has NKF, and Cafasso gene 117 has no function noted (pham 83893). ObLaDi gene 115 and 117 are downstream and upstream of gene 116, respectively. /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer and GeneMark. Both give a start site at 64475. Start codon: ATG. /note=Coding Potential: No coding potential is found with GeneMark Host. There are two typical coding potentials and one alternate coding potential found with GeneMark Self in two reverse frames. /note=SD (Final) Score: -5.208 (best final score on PECAAN) /note=Gap/overlap: Overlap = 1; this gene may be a part of an operon. /note=Phamerator: pham 80655. Date: 10/26/2021. Conserved: also found in Cafasso (DZ). No function predicted by Phamerator. /note=Starterator: Start number 91 was manually annotated in 4/242 non-draft genes for pham 80655. The respective start position is at 64475 bp. This data matches with the Glimmer and GeneMark start site call. /note=Location call: This gene is likely to be a real gene with a start site at 64475. /note=Function call: NKF. Support: one hit with a small e-value found using NCBI BLASTp (4e-42), and two hits with small e-values found using PhagesDB BLASTp (1e-37 & 3e-8). CDD provided no hits. HHpred yielded no informative hits. /note=Transmembrane domains: No transmembrane domains were found with TMHMM or Topcons. This gene does not code for a transmembrane protein. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: I would re-evaluate the coding potential piece of it. 
I am unsure of how the absence of coding potential or varying coding potential between Glimmer and GeneMark should be weighted since the information given in Starterator and suggested start sites by Glimmer and Genemark suggest the start site is correct. Second annotation looks good! I agree with all of the information provided. I would look at the synteny of Aminay as well since it was used for your NKF evidence. CDS complement (64475 - 64600) /gene="117" /product="gp117" /function="hypothetical protein" /locus tag="ObLaDi_117" /note=Original Glimmer call @bp 64600 has strength 19.72; Genemark calls start at 64600 /note=SSC: 64600-64475 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_118 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.93131E-20 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.013, -6.827750065480549, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_118 [Gordonia phage Cafasso]],,QXN74333,100.0,7.93131E-20 SIF-HHPRED: SIF-Syn: NKF, upstream gene is of pham 33439, downstream gene is of pham 82002, just like in Cafasso. /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 64600. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both Genemark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -6.828. This is the most negative final score on PECAAN, however this gene is likely part of an operon as it has a -4 bp overlap with the upstream gene. Since it is part of an operon, it does not have a RBS directly upstream itself. /note=Gap/overlap: - 4 bp. This is the smallest gap of all the possible start sites on PECAAN and is also typical of a gene in an operon. There is also the presence of the ATGA sequence directly upstream of the gene which occurs in an operon when the stop codon of the upstream gene and start codon of the gene overlap. 
This overlap is also conserved in the Cafasso genome. /note=Phamerator: Pham 32705. Date 10/22/21. It is conserved and found in Cafasso (DZ). /note=Starterator: There is only 1 non-draft member of this pham, but the pham has 3 total members. Start site 3 was manually annotated in this non-draft phage. Start site 3 is 64600 in ObLaDi. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this gene is real with a start site likely at 64600. /note=Function call: NKF. PhagesDB and NCBI BLAST show no phage hits with known function. CDD and HHpred only show phage hits with large e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Turon Font, Guillem /note=Secondary Annotator QC: Very detailed and comprehensive notes. Start site seems good according to CP map. Synteny is maintained and there are no excessive gaps. Good job :)) CDS complement (64597 - 64809) /gene="118" /product="gp118" /function="hypothetical protein" /locus tag="ObLaDi_118" /note=Original Glimmer call @bp 64809 has strength 12.1; Genemark calls start at 64809 /note=SSC: 64809-64597 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_119 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.11241E-43 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.992, -4.820269098574683, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_119 [Gordonia phage Cafasso]],,QXN74334,100.0,1.11241E-43 SIF-HHPRED: SIF-Syn: No known function, upstream gene is pham 33648, downstream gene is pham 32705, just like in phage Cafasso, but the functions of those genes are not identified in Cafasso /note=Primary Annotator Name: Batteikh, Maysaa /note=Auto-annotation: Glimmer and GeneMark both called 64809 as the start site. /note=Coding Potential: The ORF has coding potential, which is seen in both GeneMark Host and Self.
The chosen start site, 64809, covers all the coding potential for the gene. /note=SD (Final) Score: The final score of -4.820 is reasonable, as is the z-score of 1.992. /note=Gap/overlap: The overlap between this gene and the upstream gene is reasonable at -4 bp, which most likely indicates an operon. /note=Phamerator: As of 10.21.2021, the gene belongs to pham 33439. The pham is conserved in another phage, Cafasso, which belongs to the same cluster, DZ. Cafasso was used for comparison since it is the only non-draft phage that is in pham 33439. /note=Starterator: The start site that is conserved among the members of the pham to which the gene belongs is 64809. The start number for the phage is 3, at 64809 bp. There are 2 other members in this pham, making a total of 3 members overall: two draft phages and one non-draft phage. All three phages call the most conserved start site, 3. /note=Location call: The overall evidence suggests that this is a real gene and that the auto-annotated start site, 64809, is the best start site for this gene. The gene is conserved in Phamerator and has good coding potential, which means it is a real gene. The start site of 64809 contains all the coding potential for the gene and is conserved in Starterator. /note=Function call: No informative data was provided by CDD, HHpred, PhagesDB BLASTp and NCBI, therefore there is no known function. The only NCBI BLASTp hit suggests that the function is a hypothetical protein, with high query coverage (92%), strong percent identity (100%) and low e-value (1e-43). The top PhagesDB BLASTp hit suggested function is unknown with high percent identity (100%) and a low e-value (4e-35). The other hits in PhagesDB are non draft genes or with high e-values, therefore there does not appear to be enough evidence to call a function for this gene.
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I have QC’ed this gene and agree with the first annotator based on the evidence provided. Note: fill out the "Function:" box and the "Synteny:" box. Everything else looks great! CDS complement (64806 - 65120) /gene="119" /product="gp119" /function="hypothetical protein" /locus tag="ObLaDi_119" /note=Original Glimmer call @bp 65120 has strength 15.41; Genemark calls start at 65120 /note=SSC: 65120-64806 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_120 [Gordonia phage Cafasso]],,NCBI, q2:s1 99.0385% 4.00155E-67 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.175, -2.394485424036831, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_120 [Gordonia phage Cafasso]],,QXN74335,100.0,4.00155E-67 SIF-HHPRED: SIF-Syn: NKF, upstream gene is a membrane protein. In Cafasso, there is no annotated function for the upstream gene, but the two genes are both part of Pham 71050. Downstream is NKF and in Pham 33439, just like in phage Cafasso. /note=Primary Annotator Name: Kamarzar, Minehli /note=Auto-annotation: Glimmer and GeneMark were used and both agreed on the same start site. The called start site is 65120. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all the coding potential. /note=SD (Final) Score: The SD score of -2.394 is the best option and the z-score is the highest at 3.175. /note=Gap/overlap: The 4 base pair overlap with the upstream gene is reasonable, indicating this gene is part of an operon. The length of the gene (315 bp) is acceptable given the auto-annotated start site. /note=Phamerator: As of October 20, 2021, the gene is found in pham 33648. The gene is conserved in Phage Cafasso which belongs to the same cluster (DZ) as Phage ObLaDi.
The phage used for comparison was Phage Cafasso. There was no function call for this gene. /note=Starterator: The start site choice that is conserved among the members of the pham in which this gene belongs is start site 65120 which is start number 1. There is 1 non-draft member and 2 draft members in this Pham and 1/1 non-draft members call start site 2; however, the start site that made most sense for this gene is 1. /note=Location call: The gathered evidence strongly suggests that the original start site call at 65120 by Glimmer and Genemark is reasonable and it is the potential start site candidate that seems most likely. In addition, it also suggests that the gene is a real gene. /note=Function call: PhagesDB BLAST and NCBI BLASTp have hits with small E-values, but do not suggest what the function of this gene is. PhagesDB BLAST gave hits with E-values of e-56, while NCBI BLASTp gave E-values of e-67. There was one valid phage to compare with, Cafasso_120, in which the NCBI BLASTp and PhagesDB BLAST hit sorted by E-value showed high identity values (100%) and >99% query coverage. All other hits had extremely low scores or were draft genes. HHpred resulted in hits that were not useful due to very high E-values. CDD resulted in one hit of a stalled ribosome rescue protein Dom34 from the PelA super family (E-value of 8.67e-03). The function of this gene is not known. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, which indicates that it is not a membrane protein. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: Considering all of the evidence presented, I agree with the location and functional calls above. 
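The gene lengths cited in the gap/overlap notes follow directly from the CDS coordinates. As a minimal sketch, assuming inclusive 1-based coordinates (the helper names `orf_length` and `codon_count` are hypothetical and do not come from any annotation tool), the length and codon count of a CDS can be checked like this:

```python
def orf_length(low: int, high: int) -> int:
    """Length in bp of a feature with inclusive 1-based coordinates."""
    return high - low + 1


def codon_count(low: int, high: int) -> int:
    """Number of codons (stop codon included) in a complete CDS.

    Raises ValueError when the length is not a multiple of 3, which
    usually points to a mistyped coordinate.
    """
    length = orf_length(low, high)
    if length % 3 != 0:
        raise ValueError(f"length {length} is not a multiple of 3")
    return length // 3


# ObLaDi_119: CDS complement (64806 - 65120)
print(orf_length(64806, 65120), codon_count(64806, 65120))  # 315 105

# ObLaDi_115: CDS complement (63061 - 64269)
print(orf_length(63061, 64269))  # 1209
```

Both results match the lengths stated in the notes (315 bp for gene 119 and 1209 bp for gene 115).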
CDS complement (65117 - 65626) /gene="120" /product="gp120" /function="membrane protein" /locus tag="ObLaDi_120" /note=Original Glimmer call @bp 65626 has strength 12.09; Genemark calls start at 65602 /note=SSC: 65626-65117 CP: yes SCS: both-gl ST: SS BLAST-Start: [membrane protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.9392E-113 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.563, -3.586741874491512, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso]],,QXN74336,100.0,4.9392E-113 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is pham 33648, downstream gene is pham 33168, just like in phage Cafasso (although in Cafasso the gene function is not specified) /note=Primary Annotator Name: Krug, Kelley /note=Auto-annotation: Glimmer says start site 65626 (ATG), GeneMark says 65602 (GTG) /note=Coding Potential: Coding potential is on frame 4, indicating reverse gene, found on both GeneMark Host and Self but they differ on start site. I think 65626 makes more sense since it includes the full gene. /note=SD (Final) Score: -3.587, it is the best final score, has Z-score of 2.563 /note=Gap/overlap: -1 bp gap, so slight overlap, likely part of an operon /note=Phamerator: Pham is 71050 as of 10/22/2021, conserved in Cafasso, most of the genes in this pham belong to cluster A while ours belongs to cluster DZ. /note=Starterator: Most annotated start number is 48, ObLaDi doesn’t have this start site. ObLaDi has start site number 53 which is found in 3 of 407 genes in the pham, called 100% of the time when present, it correlates to a start site of 65626 bp for ObLaDi. /note=Location call: Due to the evidence above, this is likely a gene, and the start site of 65626 seems most likely due to including all coding potential, -1 bp gap, and having better Z-score and final score than start site 65602 /note=Function call: The CDD returned a Pfam with low e value but unknown function. HHpred returned proteins with unknown functions. 
However, NCBI BLASTp showed a great match with phage Cafasso membrane protein (100% identity, 100% coverage, 100% aligned, and e-value of 4.94e-113). Although the results from CDD and HHpred left much to be desired, I think the result from NCBI BLASTp is sufficient to identify this gene’s function as a membrane protein. /note=Transmembrane domains: TMHMM predicted 4 TMDs and all the TOPCONS programs predicted 4 TMDs as well. Due to the results of TMHMM, TOPCONS, and the previous NCBI BLASTp result, it is likely that the gene encodes a membrane protein. /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: I agree with your final call, but I think your Phamerator notes are lacking. I think you need to add more info on clusters. I also think you need to add something about how the -1 bp overlap is a good indicator of this being the better start site because -1 bp overlaps are better than -304 bp overlaps. This should be added to the gap section and explains why you would pick this start site over the LORF one. /note=Note from the third annotator (Amanda Chai): Excise was a function called since it has a relatively high function frequency and since this phage has functions that also required an excise gene, we decided to call this excise function.
CDS complement (65626 - 65766) /gene="121" /product="gp121" /function="DNA binding protein" /locus tag="ObLaDi_121" /note=Original Glimmer call @bp 65766 has strength 9.79; Genemark calls start at 65730 /note=SSC: 65766-65626 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.21364E-23 GAP: 81 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.175, -2.394485424036831, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74337,100.0,2.21364E-23 SIF-HHPRED: Uncharacterized protein; zinc finger, UNKNOWN FUNCTION; HET: ZN; NMR {Caulobacter crescentus},,,2NB9_A,67.3913,95.1 SIF-Syn: DNA binding protein, upstream gene is a membrane protein, downstream has no known function and is in pham 20935. Both of the upstream and downstream genes in Cafasso have no known function. /note=Primary Annotator Name: Lee, Adrienne /note=Auto-annotation: GeneMark and Glimmer were both used but they have different start sites. GeneMark called the start site at 65730 and Glimmer called the start site at 65766. /note=Coding Potential: There is coding potential in both GeneMark and Glimmer and the coding potential seems to cover the whole ORF. /note=SD (Final) Score: The final score is -2.394, which is the best score out of all the start site candidates since it is the least negative. /note=Gap/overlap: There is a gap of 81 base pairs, which is not ideal but is also not that large. This gap is conserved in the other phage, Cafasso. /note=Phamerator: This gene is part of Pham 33168 as of October 21, 2021. There are only 3 members in this pham; the other two are Cafasso_122 and Aleemily_Draft_120, both part of cluster DZ. /note=Starterator: The most conserved start site is start site 1 at 65766. This is called in all 3 genes in the pham. The most conserved site is also the auto-annotated start site for ObLaDi.
/note=Location call: Based on the evidence above, this is a real gene and the start site is 65766. /note=Function call: DNA binding protein: both NCBI BLASTp and PhagesDB BLAST had low e-values (2.2e-23 and 3e-20) associated with the DNA binding protein function. The NCBI BLASTp hit also had 100% identity, alignment, and coverage. HHpred had weak hits for DNA-binding functions. Cafasso and Aleemily, phages in the same cluster, have the same function for the same gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (65848 - 66243) /gene="122" /product="gp122" /function="hypothetical protein" /locus tag="ObLaDi_122" /note=Original Glimmer call @bp 66243 has strength 20.31; Genemark calls start at 66243 /note=SSC: 66243-65848 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CAFASSO_123 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 7.69971E-90 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.065, -2.4787699911121788, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_123 [Gordonia phage Cafasso]],,QXN74338,100.0,7.69971E-90 SIF-HHPRED: SIF-Syn: NKF, upstream gene is DNA binding protein, downstream is NKF, just like in Cafasso /note=Primary Annotator Name: Janelle Tricia Magaling /note=Auto-annotation: Glimmer and Genemark. Both agree on the start site 66243. /note=Coding Potential: Coding potential is in one reverse ORF. There is also typical and atypical coding potential shown in GeneMark Self and Host. /note=SD (Final) Score: -2.479. Best score and LORF. /note=Gap/overlap: 75 bp gap. A bit large, but the gap is conserved in Cafasso and is not large enough to insert another gene. /note=Phamerator: pham: 20935. Date: 10/25/2021.
It is conserved; found in Cafasso (DZ) /note=Starterator: The most conserved start site was 26 which was called in 32/71 non-draft genes, but ObLaDi does not have this start. Start site 21 was found in 1/71 non-draft genes in the pham. Start 21 is 66243 in ObLaDi. This is supported by GeneMark and Glimmer. /note=Location call: Based on the previous evidence, this is a real gene with good coding potential and is conserved in Phamerator with the start site at 66243 which is supported by Glimmer and Genemark. /note=Function call: NKF. The top 10 NCBI blast hits and phagesDB blast hits all had unknown functions. There were no functions in the function frequency box. HHpred had good hits suggesting a membrane protein, but TMHMM predicted no TMPs. /note=Transmembrane domains: There were no hits on TMHMM. TOPCONS did suggest one TMD. HHpred also has hits for membrane proteins. /note=Secondary Annotator Name: Erfanian M., Kiana /note=Secondary Annotator QC: The information provided indicates that your call on projected start site 66243 is correct. CDS complement (66319 - 66723) /gene="123" /product="gp123" /function="hypothetical protein" /locus tag="ObLaDi_123" /note=Original Glimmer call @bp 66693 has strength 11.27; Genemark calls start at 66723 /note=SSC: 66723-66319 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_124 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.87034E-91 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.021, -5.271243305869519, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_124 [Gordonia phage Cafasso]],,QXN74339,100.0,6.87034E-91 SIF-HHPRED: SIF-Syn: NKF and gene is in Pham 32714, with downstream gene in Pham 33004 and upstream gene in Pham 20935, like Cafasso 124. /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer calls the start site as 66693 and GeneMark calls it as 66723. 
/note=Coding Potential: Coding Potential in this ORF is on the forward and reverse strands, but it is indicated as a reverse gene. Coding Potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -5.271 (for GeneMark start site 66723). This is not the best final score, but is one of the better ones. /note=Gap/overlap: Overlap of 4 bp ( for 66723, likely an operon). By far the most reasonable gap/overlap of all the possible start sites. /note=Phamerator: Pham 32714 (as of 10/25/21). Found in Cafasso (DZ). /note=Starterator: Start Site 2 was manually annotated for Cafasso. Start 2 in ObLaDi is 66723 (GeneMark call). Start Site 3 was autoannotated in Starterator for ObLaDi and corresponds to 66693 (Glimmer call). /note=Location call: Based on the above evidence, this is a real gene and the most likely start is 66723. /note=Function call: NKF. There were no hits returned in CDD and no good hits found in HHpred (low probability, high e-values, many DUFs, etc.) I have checked Cafasso because of the low e-value and 100% identity, despite there being no known function. /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Tenney, Megan /note=Secondary Annotator QC: I completely agree! Nice! You might want to add a little bit more detail and interpretation overall, just to be safe! 
CDS complement (66720 - 67106) /gene="124" /product="gp124" /function="hypothetical protein" /locus tag="ObLaDi_124" /note=Original Glimmer call @bp 67106 has strength 12.07; Genemark calls start at 67106 /note=SSC: 67106-66720 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_125 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.78595E-86 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.66, -5.501485936053402, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_125 [Gordonia phage Cafasso]],,QXN74340,99.2188,6.78595E-86 SIF-HHPRED: SIF-Syn: ObLaDi gene 124 shares synteny with Cafasso gene 125 and NiceHouse gene 108, neither of which have a known function (NKF). The genes upstream and downstream of this gene also have NKF, which further confirms that this gene may lack a known function. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and Genemark, they both start at 67106. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -5.501. It is the best final score on PECAAN. /note=Gap/overlap: -4. This means that there is very little overlap between the previous gene and this gene, indicating that there is unlikely to be a gene in between them. /note=Phamerator: pham: 33604. Date 10/24/2021. It is conserved; found in Cafasso, Aleemily, & ObLaDi. /note=Starterator: Start site 1 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start 1 is 67106 in ObLaDi. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 67106. 
/note=Function call: Based on the BLAST, CDD, and HHpred results for this sequence, I believe that my gene has an unknown function because it does not have any significant hits or matches in the database that would confidently confirm a function. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: I agree with the start site with the Starterator evidence. I also agree with the function call as there were no hits, and TMD results. Good job. CDS complement (67103 - 67306) /gene="125" /product="gp125" /function="hypothetical protein" /locus tag="ObLaDi_125" /note=Original Glimmer call @bp 67306 has strength 12.73; Genemark calls start at 67306 /note=SSC: 67306-67103 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_126 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.52977E-40 GAP: 167 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -2.356391881054909, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_126 [Gordonia phage Cafasso]],,QXN74341,100.0,1.52977E-40 SIF-HHPRED: SIF-Syn: No known function (Pham: 33411, as of 11/30/2021), upstream gene does not have a known function (Pham: 33604, as of 11/30/2021), downstream gene does not have a known function (Pham: 32952, as of 11/30/2021), just like in Cafasso. /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark list the start site as 67306. /note=Coding Potential: There is high coding potential in the open reading frame and this potential is covered by the start site, according to both the Host-Trained GeneMark and Self-Trained GeneMark. /note=SD (Final) Score: The RBS final score is -2.356 and the Z-score is 3.163. These are the highest scores among the other potential start sites. /note=Gap/overlap: The gap is 167, which is quite large. 
There is no typical coding potential in the region upstream from the gene, which is past 67306 since it is a reverse gene. The gap could be explained by the switch from reverse coding genes to forward coding genes. /note=Phamerator: The gene is found in pham 33411, as of October 27, 2021. The pham is conserved in another member of the DZ cluster. Cafasso (DZ) was used for comparison since it was the only non-draft phage in the pham. The Phamerator did not have a function called for this gene. /note=Starterator: The start that was chosen by the only non-draft member in the pham and it is also the auto-annotated start site for this gene. The conserved start is start site number 4 and it corresponds to the coordinate 67306 on ObLaDi. 1 of 1 non-draft members in this pham call the most conserved start site. /note=Location call: Based on the evidence, this is a real gene and its start site is at 67306. /note=Function call: The only hit with a low e-value in both PhagesDB BLASTp and NCBI BLASTp does not have a known function. This hit comes from a gene in Cafasso and it has a coverage of 100%, identity of 100%, and an e-value of 1.52977e-40. CDD and HHpred were not informative. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any transmembrane domains. 
/note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have reviewed the evidence and I agree with this location call. CDS 67474 - 67875 /gene="126" /product="gp126" /function="hypothetical protein" /locus tag="ObLaDi_126" /note=Original Glimmer call @bp 67474 has strength 13.12; Genemark calls start at 67474 /note=SSC: 67474-67875 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_127 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.64556E-91 GAP: 167 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.065, -2.5410833118725082, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_127 [Gordonia phage Cafasso]],,QXN74342,99.2481,1.64556E-91 SIF-HHPRED: SIF-Syn: Upstream: Pham 33411, NKF. Downstream: Pham 33215, NKF. This matches phage Cafasso of Cluster DZ. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation start source: Both GeneMark and Glimmer called this gene at the same start site. /note=Coding Potential: There was good coding potential called by GeneMark Self and Host. /note=SD (Final) Score: This site has the highest SD score of -2.541, which is good evidence that this is the correct start site. /note=Gap/overlap: The gap is 167 bp, which is reasonable. Phage Cafasso of Cluster DZ also has a gap here, so I do not think there is a missing gene. There is no coding potential in the gap. Length 402 is reasonable. /note=Phamerator: As of 10/26/21, pham number 32952. Conserved within a small cluster. This pham is present in phage Cafasso of Cluster DZ. No function called. /note=Starterator: Reasonable start site conserved among members of the cluster. Start site 2 is most annotated, bp 67474; 3/3 members of this pham call this start site. However, Starterator is not informative because there are too many draft phages. /note=Location call: Due to the high Z-score and good RBS score, I think 67474 is the best start site for this gene.
/note=Function call: NKF, this sequence is not a good match for any known proteins. It has no significant matches on HHpred, BLAST, or CDD. /note=Transmembrane domains: none /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. I think you should explain why the large gap is ok in this case (no coding potential in gap, Cafasso has no gene in gap either). CDS 67872 - 68198 /gene="127" /product="gp127" /function="hypothetical protein" /locus tag="ObLaDi_127" /note=Original Glimmer call @bp 67872 has strength 12.34; Genemark calls start at 67872 /note=SSC: 67872-68198 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_128 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.7006E-73 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -4.905314844093283, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_128 [Gordonia phage Cafasso]],,QXN74343,100.0,1.7006E-73 SIF-HHPRED: SIF-Syn: NKF, the gene is of pham 33215, upstream is a gene of pham 32952 and downstream is a gene of pham 23932, just like in phage Cafasso. /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: Both Glimmer and GeneMark had 67872 as the start site for the putative gene in the forward direction, starting with an ATG codon (Methionine). /note=Coding Potential: This gene has good coding potential within the base pair range of the putative gene (67872 to 68198). The coding potential extends a little past the stop codon. /note=SD (Final) Score: -4.905. This is not the best final score: start position 67635 has a score of -4.255. However, that start site features a 241 base pair overlap with the previous gene, and both Glimmer and GeneMark call 67,872 as the start site. /note=Gap/overlap: There is a 4 bp overlap with the previous gene. This amount of overlap is preferred by the ribosome and supports this start site. 
/note=Phamerator: As of 10/22/2021 this gene is in pham 33215. This pham is conserved within the DZ cluster, with Cafasso primarily used for comparison as it is the only non-draft phage within the cluster as of this time. There is no function currently called for genes of this pham. /note=Starterator: Start site 5 is the only start site within this pham selected for a non-draft gene. Start site 5 corresponds to 67,872 within ObLaDi. /note=Location call: Based on the evidence above, particularly the 4 bp overlap and good final/z scores, this is a real gene with the start at 67,872. /note=Function call: No known function. NCBI BLAST featured 4 hits with low e-values (all less than 10e-10). Percent identity ranged from 33% to 100% with coverage ranging from 78% to 100%. All hits were hypothetical proteins with no known function. The best match was from a hypothetical protein in Cafasso (which is in the same cluster as ObLaDi), with 100% identity and 100% coverage. The other hits matched hypothetical proteins from multiple species of Rhodococcus. CDD and HHpred did not return informative hits. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Zhuang, Chuzhi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. I just want to add that -4 bp overlap could be an indication of an operon, so Z score and final score do not matter. 
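The GAP values recorded for the forward genes in this region can be recomputed from the CDS coordinates alone. A minimal Python sketch, assuming the convention gap = next start - previous end - 1 (negative values are overlaps); the names `gap_bp` and `adjacent_gaps` are illustrative only and are not part of PECAAN, Glimmer, or GeneMark:

```python
def gap_bp(prev_end: int, next_start: int) -> int:
    """Bases between two adjacent forward genes; a negative value is an overlap."""
    return next_start - prev_end - 1


# (gene, start, end) copied from the CDS lines of this section
forward_genes = [
    ("gp126", 67474, 67875),
    ("gp127", 67872, 68198),
    ("gp128", 68198, 68392),
    ("gp129", 68370, 68744),
    ("gp130", 68827, 69342),
]


def adjacent_gaps(genes):
    """Map each gene (except the first) to its gap relative to the previous gene."""
    return {nxt[0]: gap_bp(prev[2], nxt[1]) for prev, nxt in zip(genes, genes[1:])}


gaps = adjacent_gaps(forward_genes)
for gene, gap in gaps.items():
    label = "overlap" if gap < 0 else "gap"
    print(f"{gene}: {abs(gap)} bp {label}")
```

With the coordinates listed, this gives -4, -1, -23, and 82 bp for gp127 through gp130, matching the GAP fields of those records.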
CDS 68198 - 68392 /gene="128" /product="gp128" /function="hypothetical protein" /locus tag="ObLaDi_128" /note=Original Glimmer call @bp 68198 has strength 9.22; Genemark calls start at 68198 /note=SSC: 68198-68392 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_129 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.02397E-38 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.17, -4.455662434380964, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_129 [Gordonia phage Cafasso]],,QXN74344,100.0,6.02397E-38 SIF-HHPRED: SIF-Syn: NKF(from Pham 23932), upstream gene is from pham 33215, downstream is from pham 32818, just like in phage Cafasso /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: both Glimmer and Genemark agree on the start site #68198, start codon ATG /note=Coding Potential: have reasonable coding potential, chosen start site cover all this coding potential /note=SD (Final) Score: -4.456, it is the best final score /note=Gap/overlap: -1, it is the most reasonable one. Although there is another option with 3-bp longer ORF, this option has a slightly better final score. The length is also reasonable. /note=Phamerator: pham number - 23932, date - 10/26/2021, the gene is conserved in other phages in DZ cluster, Cafasso is used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 10, but this gene does not have this start site. 1/2 called site 10. This gene called 8 with 1 MA’s and the corresponding basepair coordinate is at 68198. /note=Location call: real gene, start at #68198 /note=Function call: NKF, because no blast results reflect any known function, no CDD hits are found, and no HHpred hits are significant. No hit for TmHmm, suggesting that it is not a membrane protein. /note=Transmembrane domains: No hit for transmembrane domains. /note=Secondary Annotator Name: Cheng, Celine /note=Secondary Annotator QC: Great job! 
Please don't forget to include your notes on Starterator; I would also say the Starterator was somewhat informative (suggested start) because you can reference Cafasso! CDS 68370 - 68744 /gene="129" /product="gp129" /function="hypothetical protein" /locus tag="ObLaDi_129" /note=Original Glimmer call @bp 68370 has strength 16.05; Genemark calls start at 68370 /note=SSC: 68370-68744 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_130 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 8.24609E-82 GAP: -23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.937, -2.8030814177649614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_130 [Gordonia phage Cafasso]],,QXN74345,99.1936,8.24609E-82 SIF-HHPRED: SIF-Syn: This gene is in Pham 32818. The upstream gene is in Pham 23932. The downstream gene is in Pham 32734. These are in the same Phams as the corresponding genes in Cafasso. They all have NKF in ObLaDi although the downstream gene in Cafasso is a RuvC-like resolvase. /note=Primary Annotator Name: Fleming, Hanna /note=Auto-annotation: Glimmer and Genemark both call the gene with a start site of 68370 bp. /note=Coding Potential: There is good coding potential in both self-trained and host-trained GeneMark and the start codon does cover all of that coding potential. /note=SD (Final) Score: -2.803, this is the best and only score displayed on PECAAN. /note=Gap/overlap: There is a 23 bp overlap with the previous gene; this is a large overlap but is consistent with Cafasso. /note=Phamerator: Pham 32818 as of 10/25/21. This is also found in Cafasso. /note=Starterator: Start site 1 was found in 1/1 of non-draft genomes and 3/3 genes in the pham. This is bp 68370 in ObLaDi. This agrees with Glimmer and GeneMark. /note=Location call: This is likely a real gene with start site 68370 bp. /note=Function call: NKF. The function is unknown. 
There was one hit on NCBI BLAST with an e-value of 8e-82, 100% coverage and 99% identity but the function of this gene is also unknown. CDD had no specific hits. HHpred also had no hits with an e-value below 7. /note=Transmembrane domains: Not a membrane protein. TMHMM does not predict any transmembrane domains and neither does TOPCONS. /note=Secondary Annotator Name: Batteikh, Maysaa /note=Secondary Annotator QC: I agree with the start site called and the evidence presented. Note: Don't forget to select the evidence for the function call on PhagesDB BLAST and NCBI BLASTp. CDS 68827 - 69342 /gene="130" /product="gp130" /function="RuvC-like resolvase" /locus tag="ObLaDi_130" /note=Original Glimmer call @bp 68827 has strength 9.27; Genemark calls start at 68827 /note=SSC: 68827-69342 CP: yes SCS: both ST: SS BLAST-Start: [RuvC-like resolvase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.39813E-118 GAP: 82 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.161, -6.46371945197481, no F: RuvC-like resolvase SIF-BLAST: ,,[RuvC-like resolvase [Gordonia phage Cafasso]],,QXN74346,100.0,1.39813E-118 SIF-HHPRED: c.55.3.6 (A:) RuvC resolvase {Thermus thermophilus [TaxId: 300852]},,,d4ep4a_,85.3801,99.8 SIF-Syn: Shows synteny with Cafasso_131, which is annotated as a RuvC-like resolvase. Upstream gene is NKF, downstream is ribbon-helix-helix DNA binding protein, just like for Cafasso_131 /note=Primary Annotator Name: Gonzalez, Celio /note=Auto-annotation: Both Glimmer and Genemark call the start at 68827 with start codon as ATG. /note=Coding Potential: Coding potential is only found in the forward strand, suggesting it is a forward gene, and was found in both GeneMark Self and Host. /note=SD (Final) Score: -6.464 is the final score, with the Z-score being 1.161, which is less than 2 and therefore does not provide strong evidence. /note=Gap/overlap: No overlap, but there is an 82 bp gap. 
However, no new genes can be added to this gap because no coding potential is demonstrated in that region and it shows synteny with Cafasso_131, meaning this gap is not out of the ordinary. /note=Phamerator: Pham 32734 (10/29/2021). 4 members with 2 of them as drafts. Conserved in Cafasso (DZ). /note=Starterator: Conserved in 1 non-draft member and one draft as start site 19, with the start at 68827. Agrees with Glimmer. /note=Location call: Start at 68827, as supported by the above evidence. /note=Function call: RuvC-like resolvase. Both NCBI and PhagesDB BLASTp returned Cafasso_131 as a hit, with e-values of 1e-118 and 4e-96, respectively. /note=Transmembrane domains: No TMDs were found for the gene, which is consistent because a RuvC-like resolvase doesn't need transmembrane domains. /note=Secondary Annotator Name: Semaan, Sasha /note=Secondary Annotator QC: Looks great! Agree with function call based on all the evidence provided. Add slight additional detail to synteny box according to guide. CDS 69395 - 70054 /gene="131" /product="gp131" /function="hypothetical protein" /locus tag="ObLaDi_131" /note=Original Glimmer call @bp 69395 has strength 10.9; Genemark calls start at 69395 /note=SSC: 69395-70054 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_132 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.71622E-157 GAP: 52 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.164, -5.426906901365179, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_132 [Gordonia phage Cafasso]],,QXN74347,100.0,3.71622E-157 SIF-HHPRED: SIF-Syn: NKF, upstream is Pham 32734, downstream is Pham 32823, just like in phage Cafasso. /note=Primary Annotator Name: Paek, Brian /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is 69395 with a start codon of GTG. /note=Coding Potential: There is high coding potential based on the middle frame going in the forward direction within the gene range for both host-trained and self-trained GeneMark. 
The start site covers all of the coding potential. /note=SD (Final) Score: The Final Score is -5.427, and the Z-score is 2.164. The Z-score is the best amongst the options; however, the RBS score is higher for a start site at 69629. /note=Gap/overlap: Gap: 52 bp. This is considered large; however, it is reasonable because the gap is conserved in comparison to Cafasso and the self and host trained coding potential maps show no coding potential within the gap. /note=Phamerator: Pham: 55117. Date Analyzed: 10/22/2021. The gene is conserved and found in cluster F when compared to Phage Minnie and Phage BobaPhett. /note=Starterator: Start site 15 is called in 88 of the 105 non-draft genes in the pham. This start site correlates to 69395 bp in ObLaDi. /note=Location call: The gathered evidence suggests that this is a real gene and the most probable start site is at 69395. /note=Function call: NKF. If more supporting evidence found in future, could call Ribbon-Helix-Helix DNA Binding Domain Protein - some evidence to support from BLAST and HHpred but too weak right now. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, suggesting that this gene is not a membrane protein. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Looks good! I agree with the primary annotator. 
CDS 70104 - 70349 /gene="132" /product="gp132" /function="hypothetical protein" /locus tag="ObLaDi_132" /note=Original Glimmer call @bp 70164 has strength 3.84; Genemark calls start at 70119 /note=SSC: 70104-70349 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_133 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.04315E-51 GAP: 49 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.082, -4.635604911695725, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_133 [Gordonia phage Cafasso]],,QXN74348,100.0,3.04315E-51 SIF-HHPRED: SIF-Syn: NKF; In ObLaDi, the upstream gene is NKF and the downstream gene functions as a ADP-ribosyltransferase, just like in Cafasso. /note=Primary Annotator Name: Rajiv, Subashni /note=Auto-annotation: Glimmer calls the start at 70164. GeneMark calls the start at 70119. The start codon is GTG. /note=Coding Potential: The coding potential in this ORF is only in the forward strand, suggesting it is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -6.064. A start site at 70104 has a better final score, Z-score, and gap. /note=Gap/overlap: Gap: There is a gap of 109 base pairs. This gap does not contain coding potential. /note=Phamerator: Pham 32823 on 10/23/2021. It is not conserved in any of the members of the pham and DZ cluster. /note=Starterator: The conserved start site 8 is base pair 70104, which Cafasso calls. Aleemily, the only other non-draft member, calls start 9 (70107 in ObLaDi). Neither start site agrees with either the Glimmer or the GeneMark start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 70104, which matches with Cafasso and closes the gap further. This site also has better stats compared to 70107. /note=Function call: The likely function is unknown. 
PhagesDB and NCBI’s only final-draft hits were annotated as a membrane protein of unknown function and a hypothetical protein, respectively. The e-values were 1e-41 and 3e-51 while the identities were both 100%. The CDD and HHpred had uninformative hits. There is not enough evidence to hypothesize a function for this gene. /note=Transmembrane domains: No transmembrane domains were called in TMHMM. TOPCONS has transmembrane domains; however, it is not a membrane protein because there are no TMHMM hits. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Looks good! I agree with the primary annotator. CDS 70355 - 72976 /gene="133" /product="gp133" /function="VIP2-like ADP-ribosyltransferase toxin" /locus tag="ObLaDi_133" /note=Original Glimmer call @bp 70355 has strength 18.09; Genemark calls start at 70355 /note=SSC: 70355-72976 CP: yes SCS: both ST: SS BLAST-Start: [ADP-ribosyltransferase [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.59, -3.5925599723783583, no F: VIP2-like ADP-ribosyltransferase toxin SIF-BLAST: ,,[ADP-ribosyltransferase [Gordonia phage Cafasso]],,QXN74349,99.6564,0.0 SIF-HHPRED: ADP-RIBOSYLTRANSFERASE; ALPHA-BETA PROTEIN, PROTEIN-NAD COMPLEX, BINARY TOXIN, TOXIN; HET: NAD; 2.7A {Bacillus cereus} SCOP: d.166.1.1,,,1QS2_A,23.9404,99.9 SIF-Syn: The synteny here isn't well described, as many of the genes are NKF. However, they do yield a very similar gene arrangement, in the same direction as in phage Cafasso. /note=Primary Annotator Name: Esparza, Pablo /note=Auto-annotation: Start site is called @70355 F and the starting codon sequence is ATG. Both Genemark and Glimmer call this start. /note=Coding Potential: It has great protein coding potential and the longest most reasonable ORF (LORF). /note=SD (Final) Score: The final score and Z-score were -3.593 and 2.59, respectively. My call had the best SD score but the second best Z score. 
The best Z score had a more negative SD score. I do not believe that one to be a valid option because of its gene length of around 570 bp. It is way too short, and the Pham maps along with coding potential and the Glimmer and Genemark start site suggest it is a long gene. /note=Gap/overlap: There is a 5 bp gap before the start site, so no other genes seem to fit here. The downstream gene has a 4 bp overlap with this gene, which could indicate the next gene is part of an operon. /note=Phamerator: It is in Pham 33418 and this analysis was run 10/29/21. It is common among the others, but there is only one established gene and one draft gene in this cluster, though all are consistent as I said. So basically, it's found in 3 of 3 ( 100.0% ) of genes in Pham. The function is suggested to be ADP-ribosyltransferase. /note=Starterator: This is a reasonable start choice and the start seems conserved among the three choices (again, only Cafasso is reliable), with the start site being 1. /note=Location call: This appears to be a real, functional gene with the start site in my observations being @70355 F. /note=Function call: Evidence from HHPred + PhagesDB BLAST to support VIP2-like ADP-ribosyltransferase toxin. /note=Transmembrane domains: There are no hits present in either program. /note=Secondary Annotator Name: Chuzhi, Zhuang /note=Secondary Annotator QC: Great work! I agree with your start site. My suggestion would be to check HHPRED hits again because some hits look like they could serve as additional evidence to your function call, with one hit also indicating the function of ADP-ribosyltransferase. For CDD I think the second hit also looks reliable as evidence. You could use these pieces of evidence to elaborate on your function call section. 
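Length arguments like the one made for gp133 can be checked directly from the coordinates. A small Python sketch, assuming inclusive 1-based coordinates and that the stop codon is counted as part of the CDS; the helper names `gene_length_bp` and `codon_count` are hypothetical and not taken from any annotation tool:

```python
def gene_length_bp(start: int, end: int) -> int:
    """Length of a CDS from inclusive coordinates; works for either strand."""
    return abs(end - start) + 1


def codon_count(start: int, end: int) -> int:
    """Number of codons (including the stop codon); the length must be a multiple of 3."""
    length = gene_length_bp(start, end)
    if length % 3 != 0:
        raise ValueError(f"length {length} bp is not a multiple of 3")
    return length // 3


# Coordinates copied from the CDS lines of this section
print(gene_length_bp(70355, 72976), codon_count(70355, 72976))  # gp133
print(gene_length_bp(67474, 67875), codon_count(67474, 67875))  # gp126
```

For gp133 (70355 - 72976) this gives 2622 bp, or 874 codons, far longer than the roughly 570 bp alternative rejected above; for gp126 it gives the 402 bp noted in that record.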
CDS 72973 - 73386 /gene="134" /product="gp134" /function="hypothetical protein" /locus tag="ObLaDi_134" /note=Original Glimmer call @bp 72973 has strength 13.47; Genemark calls start at 72973 /note=SSC: 72973-73386 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_135 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.51378E-92 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.732, -3.3019548882942655, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_135 [Gordonia phage Cafasso]],,QXN74350,100.0,3.51378E-92 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF just like in phage Cafasso, downstream is NKF however in phage Cafasso the downstream gene is a DNA-binding protein. /note=Primary Annotator Name: Melkote, Aditi /note=Auto-annotation: Both Glimmer and GeneMark agree on start site called at 72973, start codon called is GTG /note=Coding Potential: Good coding potential from GeneMarkS and Host-Trained GeneMark within predicted ORF; start site at 72973 covered in this coding potential. /note=SD (Final) Score: -3.302, best Final Score, but also gene is likely part of operon as evidenced by overlap. /note=Gap/overlap: Overlap of -4 so gene is likely part of operon, Pham map also shows no large gaps upstream or downstream of gene /note=Phamerator: 33369. Date 10-31-2021. It is conserved; found in Cafasso (DZ). /note=Starterator: Start site 1, which corresponds to 72973bp, is the "Most Annotated" start site, manually annotated in 1 out of 1 non-draft phage genomes, i.e., in Cafasso. Important to note: there are only 3 members of this pham, of which Cafasso is the only non-draft genome; only cluster DZ is represented in this pham. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 72973bp. Starterator agrees with Glimmer and Genemark. /note=Function call: No Known Function. All PhagesDB BLASTp hits are for genes with unknown functions, NCBI BLASTp provides hits for hypothetical proteins. 
HHpred hits have low probability (<80%), low coverage (<40%), and very high E-values (35, 200, etc. which are >>10e-3). CDD provides no hits /note=Transmembrane domains: Does not have transmembrane domains. No TMHMM or TOPCONS results. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks good! For the second annotation, I would look at your evidence for Sixama since I don't think the E-value is low enough to justify its inclusion in the PhagesDB BLAST box. Additionally, the second NCBI result is a bacterium and not a phage and the E-value is too high, so I think that should also be excluded from the evidence. /note=--> Updates: I included Sixama because all PhagesDB hits are NKF, so as directed in the slides for this situation, I have selected the top two non-draft hits. As for the NCBI value, I think including a bacterium is okay, because there is always a chance of HGT between bacteria and phages; further, the bacterial hit is from a Gordonia bacterium, which is the same genus as ObLaDi's host, so this is also a potential consideration. The e-value is 1e-05, and <10^-3 can be considered significant as per the slides. 
CDS complement (73387 - 73707) /gene="135" /product="gp135" /function="DNA binding protein" /locus tag="ObLaDi_135" /note=Original Glimmer call @bp 73674 has strength 11.87; Genemark calls start at 73674 /note=SSC: 73707-73387 CP: yes SCS: both-cs ST: SS BLAST-Start: [DNA binding protein [Gordonia phage Cafasso]],,NCBI, q12:s1 89.6226% 1.611E-59 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.065, -2.8298788511194775, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Gordonia phage Cafasso]],,QXN74351,100.0,1.611E-59 SIF-HHPRED: a.4.1.2 (C:) HIN recombinase (DNA-binding domain) {Synthetic},,,d1ijwc_,41.5094,95.3 SIF-Syn: DNA binding protein, the upstream gene is NKF, downstream is NKF, just like in phage Cafasso /note=Primary Annotator Name: Niazmandi, Kiana /note=Auto-annotation: Glimmer and GeneMark both call the start position at 73674. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host /note=SD (Final) Score: -4.869. It's in the middle relative to the other final scores. The Z-score is 2.436, which is also in the middle. There is a start codon upstream of the gene that has a good Z-score and SD score. /note=Gap/overlap: 48 bp gap, which is normal. The length of the gene is 288 bp, which is more than 120 bp and is acceptable. /note=Phamerator: Pham number 32686 on 10/24/2021. It is conserved in gene Cafasso_136. It's a draft and doesn't have a function yet, but the Cafasso gene codes for a DNA binding protein. /note=Starterator: Start number 2 (73674) was manually annotated in the non-draft Cafasso gene. Aleemily calls start 1 (73707 in ObLaDi). They do not agree. /note=Location call: 73707 based on better stats (SD score, close gap, LORF) and Aleemily. 
/note=Function call: The top 2 BLASTp hits were Cafasso and VanLee, sorted by E-value. The suggested function is DNA binding protein, with high query coverage (100%), high % identity (100%), and low E-values (6e-49). There is also a hit on CDD and HHpred (but not strong), so we can assume that this gene functions as a transcription factor or sigma factor. Overall there is strong evidence that this gene codes for a DNA binding protein, but not sure about its role as a sigma factor or a transcription factor. /note=Transmembrane domains: 0 transmembrane domains. Supports our findings for the gene function because the gene product interacts with DNA, and it’s not required to bind to the membrane. /note=Secondary Annotator Name: Gibbons, Alicia /note=Secondary Annotator QC: I agree with this call. There is some evidence (but not all) for this start site over the LORF start site at 73707 (it also covers the ORF, has a higher z-score and final score as well), so this start site should also be kept in mind as a possible alternative start site. To strengthen your annotation, fill in the Pham Starterator box, the GM coding capacity box, and the synteny box, and finish your note on the coding potential. 
Can you explain why you say through CDD and HHpred you can assume this gene functions as a transcription factor or sigma factor and then say there is overall no strong evidence CDS complement (73723 - 73986) /gene="136" /product="gp136" /function="hypothetical protein" /locus tag="ObLaDi_136" /note=Original Glimmer call @bp 73986 has strength 16.24; Genemark calls start at 73986 /note=SSC: 73986-73723 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_137 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.32996E-55 GAP: 44 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.829, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_137 [Gordonia phage Cafasso]],,QXN74352,98.8506,1.32996E-55 SIF-HHPRED: DNA-directed RNA polymerase subunit P; Transcription, DNA-directed RNA polymerase; HET: ZN; 3.5A {Thermococcus kodakarensis},,,4QIW_W,35.6322,98.3 SIF-Syn: /note=Primary Annotator Name: Senthilvelan, Jayasuriya /note=Auto-annotation: Glimmer and GeneMark agree on the start site and start codon: 73986, ATG. /note=Coding Potential: There is coding potential within the putative ORF. The start site covers this coding potential. /note=SD (Final) Score: -3.490. This is the best score. /note=Gap/overlap: 44 bp gap. It is reasonable. Gap has no coding potential. The called start site is the LORF. Gap is syntenic with Cafasso. The other start site gaps are ~100 bp. Hence, called start site is more likely. /note=Phamerator: Pham 2467, 10/26/2021. This pham is in one other member of cluster DZ - Cafasso_137. Function call is DNA-binding protein and is conserved and approved according to SEA-PHAGES. /note=Starterator: ObLaDi was not found in the tracks, but it was found in the text output. Start 28 is called most (63/86) at 73986 bp and is also called in ObLaDi. /note=Location call: Above evidence suggests this is a real gene and starts at 73986. Starterator evidence indicates conservation in other phages in the same pham. 
This start site covers all coding potential and has the highest final score and is the LORF. /note=Function call: Only one result from phagesdb has a known function, but this hit is very poor (E>1). Hence, the two strongest phagesdb hits were identified and are Cafasso and Peregrin (score > 150, E < e-30, ident > 70%), both of which are unknown function. Blastp suggested a new hit, Serenity (score 137, E 3e-40, ident 78%) also with unknown function. CDD yielded no hits. HHPred yielded two significant hits: zinc ribbon (98% prob, e-5, 34% coverage) and DNA-directed RNA polymerase subunit P (98% prob, e-5, 35% coverage). Since RNA polymerase has a zinc ribbon domain, both hits (although they have non-ideal coverage) may be pointing to the same function: zinc ribbon-containing RNA polymerase subunit. However, this is not found in the approved SEA-PHAGES functions, so NKF will be recorded for now. HHPred also points at Mu-like prophage protein Com, but this has a much lower ident% than the two other top hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMHs. No evidence to suggest this gene product is associated with the membrane. This supports the prediction by HHpred that this protein is an RNA polymerase subunit. 
/note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: Looks good :) CDS complement (74031 - 74165) /gene="137" /product="gp137" /function="hypothetical protein" /locus tag="ObLaDi_137" /note=Original Glimmer call @bp 74177 has strength 23.92; Genemark calls start at 74177 /note=SSC: 74165-74031 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_138 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.21181E-18 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.311, -4.024058836634684, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_138 [Gordonia phage Cafasso]],,QXN74353,95.4545,2.21181E-18 SIF-HHPRED: SIF-Syn: NKF (pham number 32964), downstream is NKF (pham number 33580), upstream is NKF (2467). same in Cafasso /note=Primary Annotator Name: Liao, Shiqing /note=Auto-annotation: Both Glimmer and GeneMark annotate this gene. They also agree on the start site at 74177. /note=Coding Potential: The gene has reasonable coding potential predicted; however, the range covered by the coding-potential curve is slightly smaller than the range of the gene. /note=SD (Final) Score: -6.070. The original Z-score is 1.352 and the Final Score is -6.070. It’s not the highest RBS Final score. The highest one has a start site at 74300, with Z-score at 2.925 and Final Score at -2.906 (both are the best scores). Although this start site has higher scores, because it has a 136bp overlap with the previous gene, it can’t be the actual start site. There is also another start site at 74165, with 1bp overlap, and higher Z-score and Final score than the one at 74177. /note=Gap/overlap: -13, might be too big a overlap /note=Phamerator: 32964. Date 10/21/2021. It is conserved; found in Cafasso (DZ), Aleemily (DZ), and VanLee(singleton). /note=Starterator: Start site 4 (74165) in Starterator was manually annotated in 3/3 non-draft genes in this pham. 
This evidence doesn’t agree with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene. I would prefer 74165 over 74177 because 74165 has a higher Z-score and Final score, and also a smaller overlap (-1, likely to be in an operon) with the previous gene. This indicates this gene should be part of an operon. /note=Starterator Drop-Down Menu (see end of PECAAN Notes Instructions): NA because this gene is not the longest. /note=Function call: Function unknown because no hits have recorded functions in PhagesDB BLASTp or NCBI BLAST. CDD doesn’t have hits either. Though HHpred has hits, the E-values are really high, indicating that the match is due to randomness, and the functions of hits are also unknown. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Melkote, Aditi /note=Secondary Annotator QC: Looks good, agree with this call! Second QC: Agree with this call! CDS complement (74165 - 74410) /gene="138" /product="gp138" /function="hypothetical protein" /locus tag="ObLaDi_138" /note=Original Glimmer call @bp 74410 has strength 22.65; Genemark calls start at 74410 /note=SSC: 74410-74165 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_139 [Gordonia phage Cafasso]],,NCBI, q1:s1 97.5309% 1.87289E-37 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.172, -4.660750147004694, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_139 [Gordonia phage Cafasso]],,QXN74354,82.716,1.87289E-37 SIF-HHPRED: SIF-Syn: NKF, downstream gene in pham 33580, like in phage Cafasso. /note=Primary Annotator Name: Tenney, Megan /note=Auto-annotation: Both GeneMark and Glimmer predicted a start codon of GTG at 74410 bp. /note=Coding Potential: The start site mentioned above is covered by both self-trained and host-trained coding potential, which is substantial. 
/note=SD (Final) Score: The SD score is -4.661, which is the largest and best of the candidates. /note=Gap/overlap: There is a 1bp overlap, which could indicate an operon. This start site of 74410 does not give the largest ORF, but it minimizes the overlap and has the most ideal z and SD scores. /note=Phamerator: This gene is a part of Pham 33580 with all other members of cluster DZ (Cafasso and Aleemily_draft) and a singleton (VanLee). The ObLaDi gene has an identical gene length to the Cafasso and Aleemily genes in the pham, which supports the legitimacy of this pham. Cafasso was used for comparison because of its non-draft status and membership in the DZ cluster. There was no function called for these genes. /note=Starterator: The start site 6 is conserved in 3 of 4 of the members of this pham and is called 100% of the time that it’s present. In ObLaDi, this start site corresponds to a base pair coordinate of 74410bp. This start candidate also contains all coding potential. /note=Location call: This gene does seem to be real, though synteny is not observed. The start site of 74410 seems to be the best candidate in all aspects except for maximizing the ORF length, and it is conserved in 3 of 4 members of the pham, but 3 of 3 members of the cluster DZ. It is called for 100% of the time that it is present, and manual annotation of Cafasso, the non-draft phage within cluster DZ, agrees. Coding-potential essentially starts and stops where the predicted gene does and the Z-score is above 2, which provides substantial evidence for this hypothesis. There is also a slight overlap when this start site is called which indicates an operon. /note=Function call: There is not enough evidence to predict a function for our gene. All non-draft phages with aligning sequences and strong e-values (Cafasso and VanLee) have no known function, according to phagesdb. Most of the aligning genes from other phages with insignificant e-values also have no known function. 
NCBI Blast results provide a long list of phages with aligning sequences and significant e-values, though 63/64 of these sequences are “hypothetical proteins.” Of actinobacteriophages, both non-draft phages Cafasso and VanLee have an aligning sequence with a strong e-value, which suggests that when the functions of these genes are discovered, our gene’s function is likely to be the same. There were no hits from CDD, and HHpred returned only hits with high e-values and low probabilities and % coverage. There is NKF for this gene as of now. No transmembrane domains were detected, which is consistent with the NKF call. /note=Transmembrane domains: No transmembrane domains were identified by TMHMM or TOPCONS, which is consistent with the call of “NKF.” /note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: I agree. Write more information about HHpred, probability, % coverage and e-value. CDS complement (74410 - 74655) /gene="139" /product="gp139" /function="hypothetical protein" /locus tag="ObLaDi_139" /note=Original Glimmer call @bp 74655 has strength 13.76; Genemark calls start at 74655 /note=SSC: 74655-74410 CP: yes SCS: both ST: SS BLAST-Start: GAP: 130 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.189, -4.354778061539587, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF; Upstream gene and downstream gene are both NKF. /note=Primary Annotator Name: Teoh, Bryan /note=Auto-annotation start source: Glimmer and GeneMark both have the same start site at 74655. Both call an ATG start codon. /note=Coding Potential: The gene displays strong coding potential predicted within the ORF. Start site 74655 covers the predicted coding potential. /note=SD (Final) Score: Z-score 2.189, Final Score -4.355; start site 74490 has a better Final Score of -3.756 but leaves too large a gap and a small reading frame. /note=Gap/overlap: 130 bp gap; this is a very large gap and may suggest non-coding regions. 
This gap is not conserved in any other phage genome. /note=Phamerator: Pham 55262 as of 10/22/2021. It is only conserved in Daredevil_133 of cluster DL. /note=Starterator: ObLaDi does not have the most called start site (13). It does have start site 10 (location 74655), which is also called in Aleemily. /note=Location call: Based on the evidence, this gene is possibly real and the appropriate start site is 74655. /note=Function call: Unknown; based on the evidence provided, there is no known function for this gene. This is due to the high e-values and low probability of functional hits. /note=Transmembrane domains: None detected or listed /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have reviewed the evidence and I agree with the location call. CDS complement (74786 - 75109) /gene="140" /product="gp140" /function="hypothetical protein" /locus tag="ObLaDi_140" /note=Original Glimmer call @bp 75109 has strength 15.11; Genemark calls start at 75109 /note=SSC: 75109-74786 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Gordonia sp. GAMMA]],,NCBI, q1:s1 98.1308% 8.42551E-36 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Gordonia sp. GAMMA]],,WP_137811226,69.7248,8.42551E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Turon Font, Guillem /note=Auto-annotation: Both Glimmer and GeneMark call it and agree on the start (75109). The start codon called is ATG. /note=Coding Potential: There is clear and marked coding potential in the designated area, and the autoannotated start site covers it all. /note=SD (Final) Score: 75109 [Final score: -2.828 Z-score: 2.925]. These are very good scores, even though the called start site has an 8 bp overlap (gap of -8) with the previous gene, suggesting an operon. Maybe start site 75094 with a gap of 7 would work better if we’re under the assumption that it’s not an operon, but the scores for that one are significantly worse. 
/note=Gap/overlap: -8. It is the longest possible ORF. The gene length is compatible with the surrounding genes and is mimicked in synteny with Cafasso in the same subcluster. /note=Phamerator: as of 10/25/21, it is in pham 33110. This pham is very small, with only 3 genes, 2 of which are drafts. The non-draft gene belongs to Cafasso, a phage in the same subcluster as ObLaDi (DZ). I don’t believe a called function would be significant with such a small sample size of genes. /note=Starterator: as of 10/25/21, the most called start site (#1@75109) is not only present in ObLaDi, but also autoannotated. It has 1 MA, which corresponds to Cafasso’s manually annotated start site. Start site 1 is called by 1/1 non-draft genes (there are only 3 genes, and the other is a draft as well). While this is encouraging, I don’t think a single MA is that much help. /note=Location call: It’s the only called site and has much better scores than alternatives. It also overlaps nicely with the previous gene (-8) as well as being the only start site with an MA in Starterator. I believe 75109 should be assigned. /note=Function call: I don’t think I can call a function. Both PhagesDB BLAST and NCBI BLAST had no non-draft matches which had a function. This gene already doesn’t have that many discovered genes it is related to (it has only 1 non-draft gene in its pham). I’m not surprised that I’m unable to determine a function. /note=Transmembrane domains: None, as per TMHMM and TOPCONS. /note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: I agree with the location and functional call. Try to reformat your synteny description to fit the example in the manual to make it easier for the reader. 
CDS complement (75102 - 75242) /gene="141" /product="gp141" /function="hypothetical protein" /locus tag="ObLaDi_141" /note=Original Glimmer call @bp 75242 has strength 23.29; Genemark calls start at 75242 /note=SSC: 75242-75102 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_143 [Gordonia phage Cafasso]],,NCBI, q1:s1 97.8261% 6.85835E-22 GAP: 443 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.403, -3.8366550365551095, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_143 [Gordonia phage Cafasso]],,QXN74358,97.8261,6.85835E-22 SIF-HHPRED: SIF-Syn: Function: NKF; Gene 140; Pham: 17035 In phage Cafasso: function and Pham number are the same, except Gene 143 Upstream function: NKF; Gene 139; Pham: 33110 In phage Cafasso: same function and pham number, except Gene 142 Downstream function: NKF; Gene 141; Pham: 33641 In phage Cafasso: same function and pham number, except Gene 144 /note=Primary Annotator Name: Ma, Yiwen (Kristy) /note=Auto-annotation: GeneMark and Glimmer both call at the same site: 75242 /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. This start site includes all of the coding potential for this gene. /note=SD (Final) Score: Final Score is -3.837, the Z-score is 2.403, which is significant. /note=Gap/overlap: 443 bp gap with the upstream gene. Although this is a large gap, there is no coding potential within it. Also, there is a large gap at the same location in Cafasso. /note=Phamerator: Pham:17035. Date 10/24/21. It is conserved; found in Cafasso(DZ) and Aleemily(DZ). (Pham number 17035 has 4 members, 2 are drafts.) /note=Starterator: The start number called the most often in the published annotations is 2; it was called in 2 of the 2 non-draft genes in the pham. 
Found in 4 of 4 (100.0%) of genes in pham /note=Start number 2 was manually annotated 1 time for cluster DZ. Start number 2 was manually annotated 1 time for cluster DL. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 75242. /note=Secondary Annotator Name: Verpukhovskiy, Philipp /note=Secondary Annotator QC: While the gene has a large gap, its start site is conserved in 2 of the genes, with valid Z and Final scores. I agree with the location call HOWEVER I do not know whether the primary annotator has fully secured their location call. /note=Function call: NKF. The only non-draft phagesdb BLAST hit says that the function is unknown (E-value = 4e-19), and the only NCBI BLAST hit suggests a hypothetical protein, which likewise means the function is unknown (97.8% coverage, 97.8% identity, and E-value = 5.66315e-22). HHpred had no significant hits. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. CDS complement (75686 - 75961) /gene="142" /product="gp142" /function="hypothetical protein" /locus tag="ObLaDi_142" /note=Original Glimmer call @bp 75961 has strength 7.01; Genemark calls start at 75955 /note=SSC: 75961-75686 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_144 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.9474E-57 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.402, -3.978223840957539, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_144 [Gordonia phage Cafasso]],,QXN74359,98.9011,1.9474E-57 SIF-HHPRED: SIF-Syn: The current gene is NKF (Pham 33641), upstream gene (Pham 17035) is NKF, downstream gene (Pham 16032) is also NKF, just like in phage Cafasso of Cluster DZ. 
/note=Primary Annotator Name: Wang, Jennifer Yiyang Wang /note=Auto-annotation: Both Glimmer and GeneMark call the gene, but they do not agree on the start site: Glimmer has 75961 as the start site, while GeneMark has 75955. /note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all the predicted coding potential. /note=SD (Final) Score: 75961: final score=-3.978. It is not the best final score on PECAAN but is better than that of 75955; 75955: final score=-5.444. It is not the best final score on PECAAN. /note=Gap/overlap: 75961: -4. This indicates this gene is likely part of an operon; 75955: 2. A small and reasonable gap. /note=Phamerator: Pham:33641. Date 10/26/21. It is conserved; found in Cafasso(DZ) and Aleemily(DZ). There is no function called for the gene. /note=Starterator: Start site 2 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start 2 is 75961 in ObLaDi. This evidence agrees with the site predicted by Glimmer, but not with GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and its start site should be at 75961 bp. Starterator agrees with Glimmer. /note=Function call: The function of this gene is most likely unknown. All phagesdb BLAST top hits and NCBI BLAST top hits state “function unknown” for the gene. NKF; there is no hit for CDD and no good hit for HHpred. /note=Transmembrane domains: No TMDs called; neither TMHMM nor TOPCONS calls any, and there is no evidence suggesting TMDs within the other databases. 
/note=Secondary Annotator Name: Niazmandi, Kiana /note=Secondary Annotator QC: Niazmandi, Kiana CDS complement (75958 - 76515) /gene="143" /product="gp143" /function="hypothetical protein" /locus tag="ObLaDi_143" /note=Original Glimmer call @bp 76515 has strength 14.5; Genemark calls start at 76515 /note=SSC: 76515-75958 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_145 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 6.43365E-124 GAP: 93 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -2.356391881054909, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_145 [Gordonia phage Cafasso]],,QXN74360,96.2162,6.43365E-124 SIF-HHPRED: SIF-Syn: NKF. Upstream gene is pham 33641, Downstream (no synteny), just like in phage Cafasso. /note=Primary Annotator Name: Whang, Allison /note=Auto-annotation: Glimmer and Genemark agree on a projected start site of 76515. The agreed-upon start codon is GTG. /note=Coding Potential: There is strong coding potential in the entire ORF via both the host-trained and self-trained Genemark. /note=SD (Final) Score: -2.356 (for start site 76515). While this isn’t the most negative Final Score listed, the Z-score for this start site is 3.163, which is above 2 and makes it reasonable. /note=Gap/overlap: 93 bp gap with the gene directly upstream. This seems to be a reasonable gap (albeit on the larger side), because Cafasso (which has a very similar gene layout to ObLaDi) has a 95 bp gap between the corresponding gene and its downstream counterpart. /note=Phamerator: This gene is part of the pham 16032 as of 10/26/21. The only other gene from the same cluster as ObLaDi is Cafasso. I have been using Cafasso as a point of reference for this ObLaDi gene. There are no functions listed for this pham in Phamerator. /note=Starterator: Starterator is not necessarily helpful for this particular gene because this gene does not have the most annotated start site (start site 4). 
2 of the genes in this pham have start site 4 (which is the most annotated), while 3 of the genes in this pham (one of which is the gene from ObLaDi) have start site 5. The base pair corresponding with start site 5 is 76515. /note=Location call: The gathered evidence suggests that 76515 is the correct start site and that this gene is a real gene. While it is part of a small pham, it shows high amounts of synteny with the corresponding gene from Cafasso, which suggests that it is a real gene. It also has high coding potential in the ORF. /note=Function call: The function is unknown (NKF). Neither BLASTp, CDD, nor HHpred resulted in any hits detailing what the function of this gene may be. /note=Transmembrane domains: No transmembrane domains indicated by TMHMM or TOPCONS. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this location. All of the evidence categories have been considered. Note: Your gap is 93 bp not 3 bp, and it’s not an overlap. You should expand on this gap in your notes: is it reasonable or not? Also, the Starterator provides a little insight since the start site 76515 has been manually annotated in Cafasso, which is non-draft and in the same cluster as ObLaDi. CDS complement (76609 - 76794) /gene="144" /product="gp144" /function="hypothetical protein" /locus tag="ObLaDi_144" /note=Original Glimmer call @bp 76794 has strength 19.17; Genemark calls start at 76794 /note=SSC: 76794-76609 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BEN60_gp045 [Gordonia phage Smoothie] ],,NCBI, q1:s1 100.0% 1.60932E-16 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BEN60_gp045 [Gordonia phage Smoothie] ],,YP_009269275,75.4098,1.60932E-16 SIF-HHPRED: SIF-Syn: No known function (pham 42651), upstream gene is in pham 16032, downstream gene is in pham 32706. 
In phage Cafasso, there is also a gene whose upstream gene is in pham 16032, and whose downstream gene is in pham 32706, but this gene (#147 with stop@77064 R) is in pham 53521 instead of 42651. /note=Primary Annotator Name: Wright, Nicklas /note=Auto-annotation: Genemark and Glimmer agree on a start site of 76794 /note=Coding Potential: The gene has good coding potential and this is captured by the start site 76794 /note=SD (Final) Score: -2.845, this is the best score on PECAAN. /note=Gap/overlap: 1 bp overlap; the gene is likely part of an operon. /note=Phamerator: Pham 42651 as of 10/23/2021. This pham is not present in other members of cluster DZ. It is only present in 1 other phage, Smoothie, of cluster CQ. No function is called. /note=Starterator: There is only 1 other phage in the pham and it has start site 6 called. Start site 6 is also the auto-annotated start site for ObLaDi. This start is at position 76794. /note=Location call: This is likely a real gene with start site 76794. /note=Function call: BLASTp, CDD, and HHpred are all uninformative or lacking hits; therefore, this gene has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: Looks good. I agree with both the function and location call. All evidence categories have been filled out. Note: perhaps mention the Smoothie phage in your function call notes since you checked that as evidence in both phagesdb and NCBI-blast. 
CDS complement (76794 - 77018) /gene="145" /product="gp145" /function="hypothetical protein" /locus tag="ObLaDi_145" /note=Original Glimmer call @bp 77018 has strength 18.64; Genemark calls start at 77018 /note=SSC: 77018-76794 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_147 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.76747E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.484, -4.197870532213559, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_147 [Gordonia phage Cafasso]],,QXN74362,100.0,2.76747E-45 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (Pham 75432), downstream is NKF (Pham 42651), just like in phage Cafasso from the same cluster DZ. /note=Primary Annotator Name: Huq, Naveed /note=Auto-annotation: Glimmer and Genemark both agree on start site @77018, start codon is GTG /note=Coding Potential: Reasonable coding potential in putative ORF, covered by chosen start site /note=SD (Final) Score: SD score is the best. Z-score is also higher than 2. /note=Gap/overlap: There is no gap but a slight overlap between this gene’s suggested start and the previous gene’s stop, with a gap of -4. This is the longest reasonable ORF for this gene call, with a length of 225 bases. /note=Phamerator: Pham 32706 - 11/5/21. The pham my gene belongs to is present in other members of the cluster, DZ. The phage that I used for comparison is Cafasso. No function called. /note=Starterator: Conserved start site number 1, @77018, 2/2 other members of pham call same start site number /note=Location call: Real gene with most likely start site @77018, conserved in Starterator /note=Function call: No databases provided any evidence to support a call. /note=Transmembrane domains: No TMDs predicted /note=Secondary Annotator Name: Krug, Kelley /note=Secondary Annotator QC: I agree with the start site and have QC’d this gene. I’d suggest mentioning that the 4 bp overlap means it is likely part of an operon. 
Also, maybe mention how small the Pham that your gene belongs to is (3 members, only one of which is finalized). CDS complement (77015 - 77173) /gene="146" /product="gp146" /function="hypothetical protein" /locus tag="ObLaDi_146" /note=Original Glimmer call @bp 77173 has strength 10.04; Genemark calls start at 77173 /note=SSC: 77173-77015 CP: yes SCS: both ST: NA BLAST-Start: GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.793, -5.14847875699779, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: This specific gene, ObLaDi_146, does not have synteny with any other gene in any other phage. This gene is an orpham. However, it is worth noting that most of its upstream genes share phams with Cafasso, such as ObLaDi_145 and Cafasso_147, which are both part of Pham 32706. ObLaDi_145 is annotated to be a Bacterial RNA polymerase inhibitor, while Cafasso_147 does not have a function assigned. Downstream, ObLaDi_147 is in the same pham as Cafasso_148; both are part of Pham 57246 and currently do not have a function assigned. This synteny is observed upstream and downstream of our orpham. /note=QC note: may not be real gene? Strong coding potential present on GM host, but neither Aleemily nor Cafasso calls this gene. /note=Primary Annotator Name: Cheng, Celine /note=Auto-annotation: Both Glimmer and GeneMark call the gene and predict its start site to be at 77173 bp. /note=Coding Potential: The ORF has strong coding potential for about half of its length. It takes some time to develop, but it is strong starting at around 77125 bp and quickly drops off at about 77050 bp. /note=SD (Final) Score: -5.148. This was not the best final score, but the other start site (final score -3.285) suggests a significantly shorter ORF of only 72 bp, which is typically not long enough to be considered. Additionally, the other start site would result in a large gap, which is usually not preferred. 
/note=Gap/overlap: 14 bp gap upstream, 4 bp overlap downstream. 4 bp overlap may suggest an operon. These gap sizes are small and therefore acceptable. /note=Phamerator: Pham 75432. This gene is an orpham; it is the only member in this pham. /note=Starterator: Pham 75432. This gene is an orpham; it is the only member in this pham. As a result, there is no Starterator analysis. /note=Location call: Because this gene is an orpham, it is difficult to determine if it is a real gene or not. However, there is strong coding potential, and a similar sized gap is seen in Cafasso (though no gene was called there). As a result, I would say that it is a real gene and its start site is at 77173 bp. I selected 77173 bp as the start site not only because it was predicted by both Glimmer and GeneMark, but also because it creates a long-enough protein compared to the other predicted start site. /note=Function call: NKF (No Known Function). There are no strong hits on PhagesDB BLAST and NCBI BLASTp. The PhagesDB Function Frequency Box also has no hits. There were no CDD hits, and there were no significant HHpred hits. All of this, combined with the fact that this gene is an orpham, leads me to label this gene as NKF (No Known Function). /note=Transmembrane domains: TMHMM did not predict any transmembrane domains (TMD). TOPCONS did not either. This suggests that this gene product is not a transmembrane protein. /note=Secondary Annotator Name: Montoya, Cinthya /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS complement (77188 - 77583) /gene="147" /product="gp147" /function="hypothetical protein" /locus tag="ObLaDi_147" /note=Original Glimmer call @bp 77583 has strength 11.99; Genemark calls start at 77550 /note=SSC: 77583-77188 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_148 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.90488E-91 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.754, -3.2575878860394365, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_148 [Gordonia phage Cafasso]],,QXN74363,100.0,1.90488E-91 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Cosentino, Evan /note=Auto-annotation: Both Glimmer and GeneMark call the gene, but they disagree on the start site. Glimmer lists the start site at 77583 bp, whereas GeneMark lists it at 77550 bp. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, meaning that this is a reverse gene. Good coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.258 /note=Gap/overlap: 12 bp gap, which is reasonable. /note=Phamerator: It is in Pham 57246. It is only in cluster DZ. /note=Starterator: Start site 1 in Starterator was manually annotated in 1/1 non-draft genes (Cafasso) in this pham. Start 1 is 77583 in ObLaDi. This evidence agrees with the site predicted by Glimmer. /note=Location call: The location call is at 77583. /note=Function call: Function unknown. All of the PhagesDB hits list the function as unknown and NCBI only had 1 hit, which says it’s a hypothetical protein. There were also no CDD hits and no significant HHpred hits. The hits in HHpred all had low probabilities, high e-values, and bad coverage. /note=Transmembrane domains: TMHMM and TOPCONS don’t call any transmembrane domains. /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with the annotation. Detail the Final Score a bit more, check annotation manual sample. 
Synteny can be a little more detailed using pham numbers. CDS complement (77596 - 77943) /gene="148" /product="gp148" /function="hypothetical protein" /locus tag="ObLaDi_148" /note=Original Glimmer call @bp 77943 has strength 10.73; Genemark calls start at 77943 /note=SSC: 77943-77596 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_149 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.27815E-75 GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.997, -4.731360067354143, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_149 [Gordonia phage Cafasso]],,QXN74364,100.0,2.27815E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: Gene is called by Glimmer and Genemark at the same start site, 77943, with an ATG start codon. Because ATG is a more common start codon, this evidence supports a 77943 start site. /note=Coding Potential: The gene has reasonable coding potential and the start site covers all coding potential based on the host-trained Genemark /note=SD (Final) Score: The final score for the auto-annotated site is -4.731; this is the best SD score of all of the potential start sites, further supporting a 77943 start site. The next best start sites have either unreasonable gaps or unreasonable lengths. The Z-score for this start site is not >2; however, it is 1.997, which is very close to 2 and could help indicate that this is the correct start site. It is also important to note that 1.997 is the third best Z-score; however, the top two both have unreasonable overlaps. /note=Gap/overlap: There is a 101 bp gap between the auto-annotated start site and the next gene. Because this gap is a bit large, it could indicate the presence of a missing gene. /note=Phamerator: The Pham number as of 10/31/21 is 16460. 
This gene is also found in Cafasso (it is also called in Aleemily, which is a draft genome and thus cannot be used as evidence, but it is worth noting that Aleemily calls it regardless). Because it is called by Cafasso, we can conclude that there is a good chance that the gene is conserved between the two phages, since they are both in cluster DZ. /note=Starterator: The most called start number across multiple phage genomes was 6. This most annotated start number was called by ObLaDi. This most annotated start number coincides with the 77943 start site, indicating that this start site is conserved across multiple genomes and providing further evidence that it is correct. /note=Location call: Based on all of the evidence above, we can conclude that 77943 is the correct start site. /note=Function call: Based on the data shown, it is not safe to conclude any particular function for this gene at this time. This is because the best non-draft hits in PhagesDB, Cafasso and Daredevil (Cafasso having an E-value of 8E-61 and Score of 230 and Daredevil having an E-value of 7E-10 and Score of 61), have no function listed in either PhagesDB or NCBI. CDD contained no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein /note=Secondary Annotator Name: Paek, Brian /note=Secondary Annotator QC: I agree with this annotation. All evidence has been considered and reviewed. 
CDS complement (78045 - 78326) /gene="149" /product="gp149" /function="hypothetical protein" /locus tag="ObLaDi_149" /note=Original Glimmer call @bp 78326 has strength 10.34; Genemark calls start at 78326 /note=SSC: 78326-78045 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_150 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.13287E-62 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.484, -3.8116689268127657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_150 [Gordonia phage Cafasso]],,QXN74365,100.0,1.13287E-62 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream gene is NKF, as in phage Cafasso. /note=Primary Annotator Name: Di Blasi, Daria /note=Auto-annotation: Both Glimmer and GeneMark show the same start site at 78326 with start codon ATG. /note=Coding Potential: coding potential on both host-trained and self-trained GeneMark; the start site covers all of the coding potential with the ORF. /note=SD (Final) Score: -3.812; this is the most favorable Final Score. /note=Gap/overlap: 1 bp overlap with downstream gene & 101 bp gap with upstream gene; the 1 bp overlap with the downstream gene seems reasonable. The length of the gene (282 bp) is reasonable. /note=Phamerator: The gene is part of pham 33539 as of October 25th, 2021. The pham has 3 members, all of which are members of the DZ cluster and 2 of which are draft genomes (ObLaDi_Draft, Aleemily_Draft, and Cafasso). /note=Starterator: The “Most Annotated” start site (start site 7, coordinate 78326) is highly conserved and thus present in all of the members of the cluster DZ. This start site is also the auto-annotated start site for all 3 phages that are members of the pham and the DZ cluster. 1 of the 1 non-draft genes in the pham call this start site. 
/note=Location call: The evidence suggests that the auto-annotated start site (start site 7) is the correct start because both Glimmer and GeneMark agree on this start site, the start site includes all of the coding potential, the predicted start has the highest RBS score, this start site produces the smallest overlap with the downstream gene (1 bp overlap), and the start site is highly conserved and called in the non-draft genome. /note=Function call: NKF; all good phagesdb BLASTp hits were of unknown function (E-value = 5e-51), the only NCBI BLASTp hit called the gene product a hypothetical protein (E-value = 1e-62), and there were no relevant CDD hits or HHpred hits (all HHpred hits had E-values > 6). Together, this evidence supports that the gene product has no known function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (78326 - 78529) /gene="150" /product="gp150" /function="hypothetical protein" /locus tag="ObLaDi_150" /note=Original Glimmer call @bp 78529 has strength 17.11; Genemark calls start at 78529 /note=SSC: 78529-78326 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_151 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 4.53321E-35 GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.189, -4.801936092881806, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_151 [Gordonia phage Cafasso]],,QXN74366,95.5224,4.53321E-35 SIF-HHPRED: SIF-Syn: NKF, the upstream gene is in pham 33539 just like in phage Cafasso, but the downstream gene is in pham 74936. However, the next gene downstream is in pham 82739, just like in phage Cafasso /note=Primary Annotator Name: Empson, Brianna /note=Auto-annotation: Both Glimmer and GeneMark call this gene at a start site of nucleotide 78529. 
This corresponds to a start codon of ATG. /note=Coding Potential: There is good coding potential present in the putative ORF. The coding potential is entirely encompassed between the predicted start site and stop site. /note=SD (Final) Score: The auto-annotated start site has the best Final Score at -4.802. /note=Gap/overlap: There is a 101 bp gap upstream of this gene that is reasonable. The length of this gene is reasonable given the predicted start site. /note=Phamerator: As of 10/24/2021, this gene is in pham 33169. All members of this pham are members of the DZ cluster, including ObLaDi. The only non-draft phage included in this pham is Cafasso. There was not a function called for this gene. /note=Starterator: Start #2 is the most conserved start site across the non-draft phages. However, there is only one non-draft phage to compare to. This start site corresponds to a bp of 78529 in ObLaDi, which is also the auto-annotated start site. 1/1 non-draft phage genomes call this start site. I am indicating that Starterator was not informative in this situation because there is only one non-draft gene present in the pham. /note=Location call: Altogether, the evidence suggests that this is a real gene with a start site at bp 78529 (#2). This start site has the best Z-score and Final Score and is the most conserved site across non-draft genomes. /note=Function call: Utilizing PhagesDB BLAST and NCBI BLAST was not particularly useful because all comparative genes have unknown functions. No relevant CDD hits. No relevant HHpred hits. The lack of evidence across all databases supports a call of NKF. /note=Transmembrane domains: No transmembrane domains predicted. There was very little known about this gene’s function, so the absence of TMDs can at least tell us that this protein does not interact with the bacterial cell membrane. /note=Secondary Annotator Name: Senthilvelan, Jayasuriya /note=Secondary Annotator QC: Looks good. 
I agree with both the location and function call. All appropriate evidence categories have been filled out. CDS complement (78631 - 79068) /gene="151" /product="gp151" /function="hypothetical protein" /locus tag="ObLaDi_151" /note=Original Glimmer call @bp 79068 has strength 13.18; Genemark calls start at 79068 /note=SSC: 79068-78631 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Gordonia bronchialis] ],,NCBI, q20:s30 86.8966% 2.57357E-20 GAP: 120 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.032, -4.597144958218091, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Gordonia bronchialis] ],,WP_223374396,43.6364,2.57357E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ghannam, Maisam /note=Auto-annotation: Glimmer and Genemark both call the same start site of 79068 R. /note=Coding Potential: Strong coding potential on Host trained Genemark. /note=SD (Final) Score: -4.597 /note=Gap/overlap: 120 bp gap /note=Phamerator: 74936 indicates single pham member ObLaDi. 438 bp length /note=Starterator: orpham; no report /note=Location call: The gene is located at 79068 R /note=Function call: No strong hits for any functions. 
/note=Transmembrane domains: None CDS complement (79189 - 79698) /gene="152" /product="gp152" /function="hypothetical protein" /locus tag="ObLaDi_152" /note=Original Glimmer call @bp 79698 has strength 9.1; Genemark calls start at 79698 /note=SSC: 79698-79189 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_152 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.3619E-119 GAP: 94 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.082, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_152 [Gordonia phage Cafasso]],,QXN74367,98.8166,3.3619E-119 SIF-HHPRED: SIF-Syn: NKF Pham #82739, upstream gene is NKF, downstream is NKF, but conserved across phage Cafasso /note=Primary Annotator Name: Light, Isabel /note=Auto-annotation: Both Glimmer and Genemark show the same start site at 79698 with a Glimmer Score of 9.1. The start codon is ATG. /note=Coding Potential: Coding potential is observed in the putative ORF. /note=SD (Final) Score: -2.505 and is the best SD score. /note=Gap/overlap: 94 bp gap, which is reasonable. There is likely not another gene in this area, but it is possible that a different start site is the true start site, which would decrease the gap size. /note=Phamerator: This gene is found in pham 78565 as of 10/29/2021. This pham is conserved among 8 other members. In the DZ cluster, it is conserved in Cafasso and Aleemily. The function of this gene is unknown. /note=Starterator: 2/2 non draft genes in the pham called start site 4 (79698 in ObLaDi), which corresponds to the auto-annotation in both Glimmer and Genemark. /note=Location call: It seems this gene is a real gene and the predicted start site of 79698 is the most likely start site. While it is not conserved among all phages in the gene pham, it is conserved in all the phages in the DZ cluster and Cafasso does call start site 5. 
/note=Function call: There were very strong hits for this gene: PhagesDB, NCBI BLASTp, HHpred, and CDD showed hits for numerous alignments; however, each of the hits showed function unknown. No evidence for the function of this gene was found. /note=Transmembrane domains: No evidence of transmembrane domains. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: The start site is included in the coding potential on the bottom most strand, but the stop site is not. For the second annotation, make sure to check the genomes with good E-values in the Phages DB blast box as evidence, even if they were not able to give an exact function. Also make sure to check off the NCBI BLAST evidence for Cafasso hypothetical protein (very good E-value). CDS complement (79793 - 80566) /gene="153" /product="gp153" /function="hypothetical protein" /locus tag="ObLaDi_153" /note=Original Glimmer call @bp 80566 has strength 13.69; Genemark calls start at 80566 /note=SSC: 80566-79793 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_153 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 0.0 GAP: 96 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.175, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_153 [Gordonia phage Cafasso]],,QXN74368,99.2218,0.0 SIF-HHPRED: SIF-Syn: Protein of unknown function with pham number 56816, downstream gene is NKF with pham number 82739, upstream is also NKF with pham number 33605, just like in phage Cafasso. /note=Primary Annotator Name: Erfanian M., Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site at 80566. /note=Coding Potential: This gene has reasonable coding potential within the putative ORF, but does not cover all this coding potential from start to stop. /note=SD (Final) Score: Of all the suggested start sites, the original start site has the best (highest) RBS Final score at -2.253, as well as the best (highest) Z-score at 3.175. 
/note=Gap/overlap: The original start site of this gene has a gap of 96 bp with the gene that comes before it. Any gap greater than 50 bp is considered to be large, yet this gap is perfectly conserved with that of phage Cafasso in the same cluster (DZ). Therefore, this is most likely no cause for concern, and the gene still appears to be 'real'. /note=Phamerator: This gene was found in pham 56816 on 10/22/21, and consists of 240 members, twelve of which are draft genomes. This pham was found to be present in another member of the cluster DZ, using phage Cafasso for comparison. /note=Starterator: Using information from the Starterator analysis run most recently on 10/22/21, it was found that the most conserved start site number is 35. This site was called in 191 of the 228 non-draft genes in the pham and matches the auto-annotated start, which is start number 35 (80566). This start site has been decided as the Final Human Annotated Start across the vast majority of phage tracks, including Cafasso. /note=Location call: The evidence gathered thus far indicates that the start site at 80566 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp have several hits with low e values, high identity percentages, and reasonable scores. The top non-draft hit on PhagesDB BLASTp was for a gene in Cafasso, a non-draft phage in the same cluster as ObLaDi. This hit has a significantly low E-value at 1e-142, a reasonable score, as well as a max identity percentage of 99%. The function of this gene, as well as all other genes for available hits on PhagesDB BLASTp, is unknown. Furthermore, the top non-draft hit on NCBI BLASTp was also for a gene in Cafasso. This hit has an E-value of 0, a reasonable score, as well as a max identity percentage of 99%. It should be noted, however, that this gene is for a hypothetical protein, and therefore does not have a listed function. 
All other hits with listed functions, however, are identified only as domain-containing proteins. CDD has one hit only, for a protein of unknown function. The HHpred returned several hits for proteins of various or unknown functions, and was therefore inconclusive in determining the function. Given the above information, there is not enough data to form a hypothesis for the function of my gene, and the function is therefore unknown (NKF). /note=Transmembrane domains: No TMDs called by TMHMM or TOPCONS. The protein is not a membrane protein. /note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: Wonderful job! I absolutely agree with all the evidence presented suggesting that the start site called appears to be the correct one. CDS complement (80663 - 81199) /gene="154" /product="gp154" /function="hypothetical protein" /locus tag="ObLaDi_154" /note=Original Glimmer call @bp 81199 has strength 20.75; Genemark calls start at 81199 /note=SSC: 81199-80663 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_154 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.81518E-128 GAP: 923 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_154 [Gordonia phage Cafasso]],,QXN74369,100.0,2.81518E-128 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (Pham 33412), downstream is NKF (Pham 56816), just like in phage Cafasso from the same cluster DZ. It also has the same downstream gene as phage VanLee. /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Both Glimmer and GeneMark mark the gene at the start 81199. /note=Coding Potential: The ORF has a good coding potential in the reverse direction. The start includes all of the coding potential. /note=SD (Final) Score: The final score is the best option at -2.828 and the z-score is the highest at 2.925. /note=Gap/overlap: The gap upstream is 923 bp which is considerably large. 
However, the gap has no coding potential and is also observed in the other phage Cafasso. /note=Phamerator: The pham this gene belongs to is 33605 as of 10/26/2021. It is found in two non-draft phages - Cafasso which belongs to the same cluster DZ and VanLee, a singleton. /note=Starterator: Start site number 6 is manually annotated in the other non-draft phage Cafasso of the same cluster DZ. This start position is at 81199, agreeing with the start called by Glimmer and GeneMark. /note=Location call: Based on the evidence, this gene is a real gene with the start site at 81199. /note=Function call: Unknown function (NKF). Top two hits in both phagesdb and NCBI BLAST have unknown function. HHpred does not have reliable hits to determine the function of this gene. /note=Transmembrane domains: Both TMHMM and TOPCONS do not give any predicted TMDs. This is not a membrane protein. /note=Secondary Annotator Name: Huq, Naveed /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 82123 - 82509 /gene="155" /product="gp155" /function="hypothetical protein" /locus tag="ObLaDi_155" /note=Original Glimmer call @bp 82123 has strength 12.32; Genemark calls start at 82177 /note=SSC: 82123-82509 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_155 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.83749E-80 GAP: 923 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.163, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_155 [Gordonia phage Cafasso]],,QXN74370,96.875,5.83749E-80 SIF-HHPRED: SIF-Syn: NKF. Cafasso is the only phage used as evidence for this gene call. The upstream and downstream genes in Cafasso and ObLaDi are all NKF as well. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Glimmer and Genemark do not agree on a start site. Glimmer calls 82123 as the start site. Genemark calls 82177 as the start site. 
/note=Coding Potential: This gene has good coding potential. Both GeneMark and Glimmer suggested coding potential at the putative ORF. Both start sites also include all of the coding potential. /note=SD (Final) Score: -2.276. This is the best final score on PECAAN. /note=Gap/overlap: 923. This is large but it is conserved in the finalized genome Cafasso. Additionally, there is no coding potential in this gap. /note=Phamerator: It belongs to pham 33412 (10/26/21). It is conserved in Cafasso and Aleemily. /note=Starterator: Start site 1 is found and called for each of the three members of the pham, one of which is a non-draft genome (Cafasso). Start site 1 is 82123 in ObLaDi. 82123 was called by Glimmer but not GeneMark. However, ribosome binding potential and Starterator evidence support 82123 as the start site. /note=Location call: This is most likely a real gene. The most likely start site is 82123. /note=Function call: PhagesDB BLASTp and NCBI BLASTp were used to try and determine the gene’s function. There is one hit in each database from non-draft genes that has significance for this gene’s function. Both hits have very low E-values (6e-67 and 6e-67 respectively) and ~96% identities, making them good matches. However, neither has any suggested function, meaning we are not able to determine a putative function for this gene. Additionally, the minor tail protein, tape measure protein, and scaffolding protein functions are suggested by the PhagesDB Function Frequency. This may show that there was gene recombination or horizontal transfer at some point, but it does not mean that there was a conserved function, especially since there were several different protein functions. Additionally, there were no significant hits in HHpred and CDD, supporting the indication of no known function. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within the other databases. 
/note=Secondary Annotator Name: Kamarzar, Minehli /note=Secondary Annotator QC: Looks great! I agree with all the evidence suggesting that the start site called is correct. Make sure to add that your gene is a real gene in the "Location call" section. CDS 82597 - 82830 /gene="156" /product="gp156" /function="hypothetical protein" /locus tag="ObLaDi_156" /note=Original Glimmer call @bp 82597 has strength 17.72; Genemark calls start at 82597 /note=SSC: 82597-82830 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_156 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.7841E-46 GAP: 87 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.416, -3.8900809529022813, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_156 [Gordonia phage Cafasso]],,QXN74371,98.7013,1.7841E-46 SIF-HHPRED: SIF-Syn: NKF (pham 54997), upstream gene is NKF (pham 33412), just like in phage Cafasso. Downstream is NKF (pham 33216), Cafasso first has NKF (pham 71224) downstream, then NKF (pham 33216). /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call start at 82597 (site 9) with an ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start site covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: -3.890, the second best final score on PECAAN. The Z-score is 2.416, the second best value on PECAAN. /note=Gap/overlap: Gap of 87 bp. Considering that the only other possible start site has a gap of 279 bp, 87 bp is a fairly reasonable gap. /note=Phamerator: The pham number as of 10/25/2021 is 54997. The gene is conserved in phages Aleemily_Draft (DZ) and Cafasso (DZ). Cafasso (DZ) is the best phage genome for comparison since it is non-draft and is from the same cluster as ObLaDi. 
Based on PhagesDB there is no function call for the gene. /note=Starterator: Based on the 10/22/21 run there are 47 members total with 4 draft members in this pham. The most annotated start site is 14 but ObLaDi does not have it (neither does Cafasso). Start site 9 is a reasonable choice that is found among some members of pham 54997. It is called in 3/47 of the total genes and manually annotated in 1/43 of the non-draft genes. It is called 100% of the time when present. Start site 9 correlates to 82597 bp for ObLaDi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 82597 bp (site 9). Starterator agrees with Glimmer and Genemark. /note=Function call: Not enough data to form a function hypothesis, but this is likely a real protein. PhagesDB BLASTp hits are all of unknown function; the top non-draft hit with a small e-value is from Cafasso_156 (e: 1e-40, id: 98%). NCBI BLASTp hits are all hypothetical proteins, with one having a small e-value from Cafasso_156 (e: 2e-46, id: 98.70%, cov: 100%). PhagesDB Function Frequency has no available data. The rest of the hits have much too high e-values. No CDD hits. Many HHpred hits, but none are significant; the three lowest e-values are 8.8, 17, and 17. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 82827 - 82901 /gene="157" /product="gp157" /function="hypothetical protein" /locus tag="ObLaDi_157" /note= /note=SSC: 82827-82901 CP: yes SCS: neither ST: NI BLAST-Start: [membrane protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.65211E-6 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.683, -3.342129169857323, yes F: hypothetical protein SIF-BLAST: ,,[membrane protein [Gordonia phage Cafasso]],,QXN74372,100.0,2.65211E-6 SIF-HHPRED: SIF-Syn: /note=CP present on both GM host and self /note=Both cluster members Cafasso and Aleemily call this gene CDS 82980 - 83228 /gene="158" /product="gp158" /function="hypothetical protein" /locus tag="ObLaDi_158" /note=Original Glimmer call @bp 82980 has strength 15.49; Genemark calls start at 82980 /note=SSC: 82980-83228 CP: yes SCS: both ST: SS BLAST-Start: GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.065, -2.5588120788329394, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NKF protein, upstream gene is NKF protein, downstream is NKF protein, just like in phage Cafasso. /note=Primary Annotator Name: Jin, Katherine /note=Auto-annotation: Both Genemark and Glimmer. /note=Both agreed on the same start site: 82980. Site # 188. ATG start codon. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. /note=SD (Final) Score: -2.559, Z-score is 3.065 which is higher than 2. /note=Gap/overlap: 149bp, Gap is relatively big, and is not below the accepted 50bp. Unknown if there is gene in Gap. There is high coding potential at the gap for the third GeneMark coding line. /note=Phamerator: The pham number was 33216 as of 10/22/21. The gene is conserved in Cafasso, which is also part of the DZ cluster. /note=Starterator: Start site 3 was the most annotated start number that was called for 1/1 non-draft phages. This does correspond to the 82980 start site. Starterator, Glimmer and GeneMark all agree. 
/note=Location call: From the evidence from GeneMark and Starterator this gene is real and its most likely start site is 82980. /note=Function call: NKF. Only PhagesDB had a relatively good hit for this gene. Both CDD and NCBI BLAST did not have any hits, and HHpred did not have informative hits as their e-values were high (around 7-8) and coverage was not optimal. PhagesDB showed a hit with e-value 1e-04, for an unknown protein in Phage Cafasso. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Gonzalez, Celio /note=Secondary Annotator QC: Needs synteny box. No evidence was marked on PECAAN. The gap is considerably large, please explain more about it as referenced in the annotation notebook (i.e., should there be a gap there, and why or why not) CDS 83215 - 83478 /gene="159" /product="gp159" /function="hypothetical protein" /locus tag="ObLaDi_159" /note=Original Glimmer call @bp 83215 has strength 11.06; Genemark calls start at 83215 /note=SSC: 83215-83478 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_159 [Gordonia phage Cafasso]],,NCBI, q4:s36 96.5517% 8.49613E-47 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.012, -4.699386225878371, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_159 [Gordonia phage Cafasso]],,QXN74374,63.8655,8.49613E-47 SIF-HHPRED: SIF-Syn: NKF, gene downstream is NKF just like in Cafasso; gene upstream is NKF just like in Cafasso /note=Primary Annotator Name: Montoya Serpas, Cinthya /note=Auto-annotation: Start site is 83215 per Glimmer and GeneMark /note=Coding Potential: There is coding potential present. The start site 83215 called by Genemark and Glimmer covers the coding potential within the putative ORF. /note=SD (Final) Score: -4.699. This is the best Final score among all 4 scores provided since it is the least negative value. 
/note=Gap/overlap: There is a 14 bp overlap with the gene upstream. There is a similar overlap in Cafasso's genome which indicates that an overlap of 14 bps is reasonable for my gene. /note=Phamerator: The gene @stop site 83,478 is part of Pham 33179 as of 10/26/21. The pham in which my gene is conserved is present in all three members of the cluster to which ObLaDi belongs. Phage Cafasso was used for comparison. The function of this gene is currently unknown. /note=Starterator: There is no reasonable start site choice that is conserved among the members of the pham to which the gene belongs. The most conserved start site is not present in track two where ObLaDi belongs. The start site number in this pham is #1 @83194. 2/3 members of this pham call site #1. /note=Location call: The gathered evidence suggests that start site #3 at 83215 is the best start site for this gene. The gene with stop site @83,478 is a real gene, but it does not appear to be conserved on Starterator as it has a unique start. The start site @83215 covers all the coding potential and has the best Z-score (greater than 2) and final score when compared to other start sites. /note=Function call: NKF: Based on the BLAST results for this sequence, there is not enough evidence to form a hypothesis regarding the function of gene # 158. There are no phages other than Cafasso that contain reasonably high identity values. According to PhagesDB, the identity percentage for Cafasso is 86%, the e-value is 2e-39, and the score is 159 bits. Similarly, NCBI reports an identity percentage of 87%, a much lower e-value at 3e-53, and a slightly lower score of 155. The second highest hit corresponds to phage MindFlayer, which has much worse values when compared to Cafasso. PhagesDB reports a score of 28.1 bits, an e-value of 6.3, and an identity value of 56%, which tells us that ObLaDi_158 contains a very different sequence when compared to other phages of different clusters. 
Although there aren’t any hits that have functions, Cafasso_159 is a very good match to ObLaDi_158 which indicates that this is in fact a real gene. There are no conserved domains listed in the NCBI database and the hits in the HHpred hit list are not within the acceptable ranges due to their relatively high e-values (0.21 and 1.1). The probability values are quite high at 91.26 and 83.31 for the first and second hit respectively. The percent coverages are quite low (25.2874 and 25.874) and not within the desired range of > 40-50%. Moreover, NCBI BLASTp and PhagesDB BLASTp are not informative enough since the only hit with a sufficiently high identity value and low e-value is from Cafasso, which lists its corresponding protein as a hypothetical protein. /note=Transmembrane domains: No TMDs were predicted for this sequence according to TMHMM and TOPCONS. Therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Erfanian M., Kiana /note=Secondary Annotator QC: I would mention whether the start sites called by GeneMark and Glimmer cover all of the coding potential within the putative ORF in the "Coding Potential" line above. The information provided indicates that your call on projected start site 83215 is correct. 
CDS 83475 - 83723 /gene="160" /product="gp160" /function="hypothetical protein" /locus tag="ObLaDi_160" /note=Original Glimmer call @bp 83475 has strength 18.28; Genemark calls start at 83475 /note=SSC: 83475-83723 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_160 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 5.68091E-49 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.243, -2.17469248771465, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_160 [Gordonia phage Cafasso]],,QXN74375,96.3415,5.68091E-49 SIF-HHPRED: SIF-Syn: No Known Function, upstream gene is in pham 33179, downstream gene is in pham 82545, just like Cafasso /note=Primary Annotator Name: Semaan, Sasha /note=Auto-annotation: Host GeneMark and Self GeneMark were used for auto-annotation analysis. Glimmer and GeneMark were used for assessment of the start site which is the same from both auto-annotations: 83475 /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF and the start site falls into coding potential as well. /note=SD (Final) Score: -2.175; The other suggested start sites for this gene did not have ideal Final Scores, but the start site of this suggested gene, 83,475, had the best final score of -2.175 and the most ideal overlap. /note=Gap/overlap: 4bp overlap (operon); This 4bp overlap suggests an operon is present with the suggested start site at 83,475 which is more optimal than any other gap lengths or overlaps. /note=Phamerator: This gene is part of Pham 15761 as of 10/27/21. This pham is present in phage Cafasso, which is in the same DZ cluster as this gene. Genes from other clusters are present in this pham as well. There is no function called for this gene on Phamerator thus far. /note=Starterator: There are conserved start sites between this gene and the gene in Cafasso but not with most of the genes present that are classified under a different cluster. 
This gene does not have the most conserved start site present in the pham which is Start Site #12. The auto-annotated start site number for this gene and Cafasso is Start Site #11 which corresponds to a start site at 83475 bp for this gene. 3/5 call site #12 while 1/5 (Cafasso) call site #11. /note=Location call: This gene contains all the coding potential of the gene with the auto-annotated start site. The overlap and ORF length are most ideal for the suggested start site as well. Although the start site is not conserved across the genes in the pham, it is conserved in the gene with the same cluster designation. This evidence suggests the start site should be @ 83475 bp. /note=Function call: Function unknown; All of the hits in PhagesDB Blast with an acceptable e-value and identity, that are not draft genes, do not have a function listed (labeled as “function unknown”). The NCBI Blast had three hits listed that were also all hypothetical proteins, therefore with the data available right now, it is difficult to hypothesize a function for this gene. The CDD and the HHpred databases also provided no evidence: CDD did not have any hits and the hits that showed up in HHpred were either hypothetical proteins, had an unknown function, or had an insufficient e-value or low coverage. /note=Transmembrane domains: No transmembrane domains. There were no hits from TMHMM and TOPCONS. The gene does not have a hypothesized function thus far so having no transmembrane domains is not inconsistent with the no known function. /note=Secondary Annotator Name: Abuwarda, Manar /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 83725 - 84075 /gene="161" /product="gp161" /function="hypothetical protein" /locus tag="ObLaDi_161" /note=Original Glimmer call @bp 83725 has strength 13.29; Genemark calls start at 83725 /note=SSC: 83725-84075 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_161 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.76956E-74 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.433, -4.301170562178417, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_161 [Gordonia phage Cafasso]],,QXN74376,99.1379,1.76956E-74 SIF-HHPRED: SIF-Syn: NKF, Downstream and upstream genes both are unknown but are from the same phams. /note=Auto-annotation: Both call @ 83725 /note=Coding Potential: excellent on GM host /note=SD (Final) Score: -4.301, better than other choice /note=Gap/overlap: 1 /note=Starterator: 2/2 DZ non-draft genes call 83725 (site 3) /note=Function call: no good evidence /note=Transmembrane domains: None /note=Secondary Annotator Name: Empson, Brianna /note=Secondary Annotator QC: You need to fill out the GM coding drop-down menu. Your notes are really lacking and don't have most of the required information. Double-check the annotation manual for what should be included. You need a lot more under pretty much every category, and you need to fill out auto-annotation. Also for the synteny box, make sure you include the pham numbers. 
CDS 84072 - 84392 /gene="162" /product="gp162" /function="hypothetical protein" /locus tag="ObLaDi_162" /note=Original Glimmer call @bp 84072 has strength 10.09; Genemark calls start at 84072 /note=SSC: 84072-84392 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_162 [Gordonia phage Cafasso]],,NCBI, q1:s1 99.0566% 1.15386E-67 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -4.371751636464124, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_162 [Gordonia phage Cafasso]],,QXN74377,89.4737,1.15386E-67 SIF-HHPRED: SIF-Syn: Upstream gene (stop 84075; pham 82545) and downstream gene (stop 84685; pham 16401) do not have known functions. The conserved genes in phage Cafasso also do not have known functions. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark both agree on the start site at 84072. /note=Coding Potential: On both Host and Self GeneMark there is a good amount of coding potential within the ORF, nearly spanning the entire frame. There is both typical and alternate coding potential present. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.372. This is the best RBS final score out of the potential start sites. /note=Gap/overlap: There is an overlap of -4 bp. This amount of overlap can be indicative of the gene being part of an operon. /note=Phamerator: Gene belongs to pham 22603, accessed 10/27/2021. Of the 7 members, 5 are non-draft phages, namely: Cafasso (DZ), Bantam (DL), Daredevil (DL), DatBoi (DL) and SpeedDemon (DL). /note=Starterator: start site 7 at position 84072 is the most conserved start site; it is annotated in 3 of 5 non-draft genes in the pham. /note=Location call: Based on the evidence collected, the gene is considered to be “real.” The chosen start site is at 84072. 
/note=Function call: NKF - The top 5 hits in PhagesDB BLASTp and NCBI BLASTp, meaning those with the lowest e-values, did not have a suggested function (function unknown). Only the top three hits in the latter had a high % identity (>77.76%). CDD did not return any hits. HHpred did not return any significant hits. The smallest e-value recorded was 30, belonging to hit PF00436.27. Given this data, the ORF has no known function. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any transmembrane domains. Gene is not a membrane protein. /note=Secondary Annotator Name: Cosentino, Evan /note=Secondary Annotator QC: Looks good! I agree with the primary annotator. CDS 84389 - 84685 /gene="163" /product="gp163" /function="hypothetical protein" /locus tag="ObLaDi_163" /note=Original Glimmer call @bp 84389 has strength 8.55; Genemark calls start at 84389 /note=SSC: 84389-84685 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_165 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 1.5912E-63 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.329, -6.100660902441702, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_165 [Gordonia phage Cafasso]],,QXN74380,98.9796,1.5912E-63 SIF-HHPRED: SIF-Syn: NKF (pham 16401), upstream is pham 22603, downstream gene is PAPS reductase-like domain (pham 32687), just like in phage Cafasso. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and GeneMark. Both agree on the same start site of 84389, with a start codon of GTG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 84389 covers all of the coding potential. /note=SD (Final) Score: The SD score is not the best (three other potential start sites have a better SD score), but it is still reasonable to suggest the presence of a credible ribosome binding site. 
The gene does seem to be organized into an operon, so the RBS score could even be irrelevant for this start call. /note=Gap/Overlap: There is a 4 base pair overlap with the upstream gene, indicating that this gene is likely part of an operon. This start site creates the longest ORF and the length of the gene is acceptable. /note=Phamerator: As of 10/24/2021, the gene is found in Pham 16401. The pham is conserved in other members of the cluster - comparison was done between ObLaDi and Cafasso, as this was the only non-draft genome available. The function is not called by either Phamerator or PhagesDB. /note=Starterator: The “Most Annotated” start site (4) is present in 2 of 3 non-draft genes in this pham; however, this start site is not present in ObLaDi. In terms of ObLaDi, the start site with the most manual annotations is 6, which is at position 84389. This start site was found in 3 of 5 genes in the pham (1 of which is finalized) but it is called 100 percent of the time when present. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 84389. /note=Function Call: The top 2 NCBI BLASTp hits suggested function is hypothetical protein with high query coverage (87% and 97%), decent % identity (45% and 48%), and low e-values (<5e-31). The top PhagesDB BLASTp hit suggested function is unknown, with high to decent % identity (44% and 96%) and a low e-value (<2e-19). The other hits in PhagesDB are on draft genes or have high e-values. Similarly, the CDD and HHpred hits were uninformative, with very high e-values and low probabilities and coverages. As such, there does not seem to be enough evidence to call the function of this gene. /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Jin, Katherine /note=Secondary Annotator QC: Evidence for start site, and Function call makes sense to call it NKF. I also agree with the TMD evidence. Good Job! 
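The gap and overlap values quoted in these notes can be reproduced from the CDS coordinates alone. The following is a minimal sketch, assuming the convention that a gap is the number of bases strictly between the upstream stop and the next start (so a negative value is an overlap) and that all five genes lie on the forward strand, as their coordinates suggest. The names `cds` and `gap_bp` are illustrative and are not part of PECAAN or DNA Master.

```python
# CDS coordinates (locus tag, start, stop) copied from the feature entries
# for genes 162-166 in this section.
cds = [
    ("ObLaDi_162", 84072, 84392),
    ("ObLaDi_163", 84389, 84685),
    ("ObLaDi_164", 85035, 85745),
    ("ObLaDi_165", 85742, 86437),
    ("ObLaDi_166", 86541, 86687),
]


def gap_bp(upstream_stop, start):
    """Bases between two forward-strand genes; negative means they overlap."""
    return start - upstream_stop - 1


# Pair each gene with the gene immediately upstream of it.
gaps = {
    name: gap_bp(prev[2], start)
    for prev, (name, start, _stop) in zip(cds, cds[1:])
}

for name, gap in gaps.items():
    kind = "overlap" if gap < 0 else "gap"
    print(f"{name}: {gap} bp ({kind})")
```

Under this convention the computed values agree with the GAP fields recorded for genes 163 through 166 (-4, 349, -4 and 103 bp).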
CDS 85035 - 85745 /gene="164" /product="gp164" /function="PAPS reductase-like domain" /locus tag="ObLaDi_164" /note=Original Glimmer call @bp 85035 has strength 16.03; Genemark calls start at 85035 /note=SSC: 85035-85745 CP: yes SCS: both ST: SS BLAST-Start: [PAPS reductase-like domain protein [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 3.17848E-173 GAP: 349 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.857, -2.90473872338446, yes F: PAPS reductase-like domain SIF-BLAST: ,,[PAPS reductase-like domain protein [Gordonia phage Cafasso]],,QXN74381,100.0,3.17848E-173 SIF-HHPRED: c.26.2.2 (A:) Phosphoadenylyl sulphate (PAPS) reductase {Escherichia coli [TaxId: 562]},,,d1sura_,80.0847,99.8 SIF-Syn: Function is PAPS reductase-like domain protein and upstream gene (pham 32686) is DNA binding protein just like Cafasso. All of the downstream genes are NKF. /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: Glimmer and GeneMark both call the start site at 85035. The start codon is TTG, which is a much less common start codon. /note=Coding Potential: The gene has reasonable coding potential and the start site covers all coding potential based on the host-trained GeneMark. /note=SD (Final) Score: The SD score for the 85035 start site is the best listed at -2.905 and thus would suggest that this could be the correct start site. /note=Gap/overlap: There is a large gap of 349 bp between this gene and the previous gene, which could indicate an unannotated gene between the two; however, there is no coding potential in this region and the gap is conserved. The current gene is a reasonable length. /note=Phamerator: The pham number is 32687 as of 10/28/21. This gene is also found in Cafasso (cluster DZ), which means it is likely conserved between the two phages. The function listed for Cafasso is PAPS reductase-like domain protein, which could indicate a similar function for this gene in ObLaDi. 
/note=Starterator: The start at 85035 was the most conserved site. The most conserved start number was 13, which coincides with the auto-annotated start site for the gene. Additionally, there were 4 genes in the pham, all of which had the same start number (2 were draft). /note=Location call: Based on the data, this is a real gene with a start site at 85035. /note=Function call: Based on the evidence, it is safe to determine that the function of this gene is PAPS reductase-like domain protein. This is because the best hits in PhagesDB, Cafasso and VanLee, both have low E-values (1E-37 and 2E-77 respectively) and both have this function. Similarly, the NCBI hits have this function (Cafasso: 100% coverage, 100% identity, and E-value 3E-173; VanLee: 96% coverage, 57% identity, and E-value 6E-95). HHpred has PAPS reductase listed with 99.8% probability, 78%+ coverage, and E-value <1E-18. CDD contained one hit with a very low e-value that was not strong enough evidence to conclude much of anything. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: Looks great! Good job! /note=11/29 Secondary QC: Is the gap conserved, or is there coding potential where the gap is? This can help predict whether an undetected gene is present. For Phamerator, I would also note that Cafasso and ObLaDi are both in the same cluster. For Starterator, specify how many non-draft pham members called the start site. Also, this is minor, but for location call, I would take out "I believe." I would comment on CDD for Function Call too; it looks like one hit has good coverage and a low e-value (I don't see probability through PECAAN). Don't forget to fill out the synteny box! 
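Whether a called gene is "a reasonable length" can be sanity-checked arithmetically from its coordinates. The following is a minimal sketch, assuming inclusive 1-based coordinates and that the final codon of the range is the stop codon; `cds_length` is a hypothetical helper, not a PECAAN or DNA Master function.

```python
def cds_length(start, stop):
    """Return (nucleotides, codons including stop, amino acids) for an inclusive range."""
    nt = stop - start + 1
    if nt % 3 != 0:
        raise ValueError(f"{start}-{stop} is not a whole number of codons")
    codons = nt // 3
    # The last codon is assumed to be the stop codon and encodes no residue.
    return nt, codons, codons - 1


# Gene 164 (PAPS reductase-like domain), coordinates from its CDS entry.
nt, codons, aa = cds_length(85035, 85745)
print(f"{nt} nt, {codons} codons, {aa} amino acids")
```

For gene 164 this gives 711 nt, 237 codons and 236 amino acids. Applying the same function to the other CDS ranges in this section also yields whole numbers of codons.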
CDS 85742 - 86437 /gene="165" /product="gp165" /function="hypothetical protein" /locus tag="ObLaDi_165" /note=Original Glimmer call @bp 85742 has strength 15.73; Genemark calls start at 85742 /note=SSC: 85742-86437 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_167 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 9.78955E-168 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.826, -4.574171834242154, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_167 [Gordonia phage Cafasso]],,QXN74382,99.1342,9.78955E-168 SIF-HHPRED: SIF-Syn: ObLaDi gene 165 (NKF, pham 32924) shows synteny with Cafasso gene 167 (no function noted, pham 32924). Upstream is ObLaDi gene 164 (PAPS reductase-like domain), which matches with Cafasso gene 166. Downstream is ObLaDi gene 166 (NKF, pham 32965), and similarly, Cafasso gene 168 has no function noted (pham 32965). /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer and GeneMark. Both give a start site at 85742. Start codon: ATG. /note=Coding Potential: Typical coding potential in a forward frame found on GeneMark Host. Typical and alternate coding potential in a forward frame and alternate coding potential in a reverse frame are observed on GeneMark Self. /note=SD (Final) Score: -4.574 (This is not the best final score on PECAAN, but this final score corresponds with the most reasonable gap/overlap size.) /note=Gap/overlap: Overlap = 4 bp; this gene may be a part of an operon. /note=Phamerator: pham 32924. Date: 10/26/2021. Conserved: also found in Cafasso (DZ). No function predicted by Phamerator. /note=Starterator: Start number 6 was manually annotated in 1/2 non-draft genes for pham 32924. The respective start position is at 85742 bp. This data matches with the Glimmer and GeneMark start site call. /note=Location call: This gene is likely to be a real gene with a start site at 85742. /note=Function call: NKF. 
Support: Two hits with small e-values found using PhagesDB BLASTp (e-132 & 4e-52), and many hits with small e-values found using NCBI BLASTp (<= 2e-8). CDD yielded no hits. HHpred provided no informative hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS identified any transmembrane domains. This gene does not code for a transmembrane protein. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: I agree with the location call/start site of 85742 based on the evidence gathered. CDS 86541 - 86687 /gene="166" /product="gp166" /function="hypothetical protein" /locus tag="ObLaDi_166" /note=Original Glimmer call @bp 86541 has strength 4.68; Genemark calls start at 86541 /note=SSC: 86541-86687 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAFASSO_168 [Gordonia phage Cafasso]],,NCBI, q1:s1 100.0% 2.67991E-22 GAP: 103 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.005, -3.9067510144203146, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CAFASSO_168 [Gordonia phage Cafasso]],,QXN74383,100.0,2.67991E-22 SIF-HHPRED: SIF-Syn: NKF, upstream gene is of pham 32924, just like in Cafasso. /note=Primary Annotator Name: Abuwarda, Manar /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 86541. /note=Coding Potential: The ORF has reasonable coding potential. Coding potential is found in both GeneMark Self and Host. The chosen start site includes all the coding potential. /note=SD (Final) Score: -3.907. This is the only RBS final score that is suggested by PECAAN. /note=Gap/overlap: 103 bp. The gap is somewhat large; however, it is conserved in the genome of phage Cafasso and has no coding potential. /note=Phamerator: Pham 32965. Date 10/22/21. It is conserved and found in Cafasso (DZ). /note=Starterator: Start site 1 was manually annotated in 1/1 non-draft phages in this pham of 3 members. Start site 1 is 86541 in ObLaDi. 
This evidence agrees with the start site predicted by Glimmer and GeneMark. Start site 1 is the only suggested start site on Starterator. /note=Location call: Based on the above evidence, this is a real gene with a start site most likely at 86541. /note=Function call: NKF. PhagesDB and NCBI BLAST show no phage hits with known function. CDD and HHpred only show phage hits with large e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Di Blasi, Daria /note=Secondary Annotator QC: I agree with the primary annotator based on all the evidence. Add what the large e-values are on CDD and HHpred.
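The short-form `SSC:` note packs the coordinates and the gap into a single string, which makes it convenient to extract them programmatically. The following is a minimal sketch using regular expressions, assuming the `SSC:` and `GAP:` labels always appear exactly as they do in the entries above; `parse_ssc_note` is a hypothetical helper, not a PECAAN or DNA Master function.

```python
import re


def parse_ssc_note(note):
    """Extract start, stop and gap (in bp) from a short-form SSC note string."""
    ssc = re.search(r"SSC:\s*(\d+)-(\d+)", note)
    gap = re.search(r"GAP:\s*(-?\d+) bp gap", note)
    if ssc is None or gap is None:
        raise ValueError("note lacks an SSC or GAP field")
    return {
        "start": int(ssc.group(1)),
        "stop": int(ssc.group(2)),
        "gap_bp": int(gap.group(1)),
    }


# Note text copied from the gene 166 entry.
note = (
    "SSC: 86541-86687 CP: yes SCS: both ST: SS BLAST-Start: "
    "[hypothetical protein SEA_CAFASSO_168 [Gordonia phage Cafasso]],,NCBI, "
    "q1:s1 100.0% 2.67991E-22 GAP: 103 bp gap LO: yes"
)
fields = parse_ssc_note(note)
print(fields)
```

The optional minus sign in the `GAP:` pattern lets the same helper read overlaps such as `GAP: -4 bp gap`.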