CDS 101 - 436 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="MooKitty_1" /note=Original Glimmer call @bp 101 has strength 12.5; Genemark calls start at 92 /note=SSC: 101-436 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein PP340_gp01 [Arthrobacter phage Adaia] ],,NCBI, q3:s4 98.1982% 9.895E-35 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.55, -4.680627165156441, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP340_gp01 [Arthrobacter phage Adaia] ],,YP_010649357,74.3363,9.895E-35 SIF-HHPRED: SIF-Syn: This gene has synteny with the other three genomes in its cluster: Adaia_1, Atraxa_1, Sputnik_1. All have this gene, but none has a function assigned to it. /note=Primary Annotator Name: Berber-Pulido, Rodrigo /note=Auto-annotation: Glimmer and Genemark do not agree on start site. Glimmer calls 101 while Genemark calls 92. Both use a codon of GTG. /note=Coding Potential: Coding potential is seen in both Host-Trained and phagesdb genemark. This is a forward gene as there is no significant coding in the reverse direction. Confirms there is high coding potential in entire gene. Both start sites cover entire coding potential. However, start site @101 lies closer to where the coding potential first begins. /note=SD (Final) Score: Start site @92 has a final score of -6.918 while start site @101 has a final score of -4.681. This suggests that start site 101 has a better final score than start site @92. Moreover, start site @92 has a z-score of 1.173 while 101 has a z-score of 2.55. This indicates that start site @101 has a much stronger z-score, since z-scores over 2 are considered good. /note=Gap/overlap: Since this is the first gene, there is no gap/overlap to previous genes. Length of gene is acceptable with start site @101 and @92. /note=Phamerator: Phamerator analysis was run on 4/5/23, shows that it is a part of AX cluster. Cluster number: 11339. This cluster has 4 members with 1 draft. 
This gene is also conserved as it is seen in the other genomes of the same cluster: Adaia_1, Atraxa_1, Sputnik_1. /note=Starterator: The starterator analysis was run on 4/5/23, and it concludes that start site #4 @101 is the correct start site. There is a conserved start site choice @88 that is not seen in this gene. This start site @101 also has 2 MAs and is the only one with any MAs. /note=Location call: start 4 @ 101, this gene is real. /note=Function call: Phagesdb BLAST confirms this gene's presence with other genomes mentioned in the phamerator report; however, the function is unknown. NCBI BLAST seconds this as the top hits are hypothetical proteins with no function mentioned. No CDD hits seen. HHpred has multiple hits of different proteins; however, they all have an E-value of 34 and above with the most probable ones having an E-value of 34, 130 and 46 in that order. I would conclude that this gene has an unknown function. /note=Transmembrane domains: There is no evidence from the DEEPTMHMM that shows this is a transmembrane protein as it predicts all proteins are inside. /note=Secondary Annotator Name: Martinez, Daniela /note=Secondary Annotator QC: I agree with the location and function call. My only suggestion is that you specify that the coding potential is found in the forward direction and that there was no significant coding potential in the reverse direction. 
CDS 417 - 671 /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="MooKitty_2" /note=Original Glimmer call @bp 486 has strength 7.94; Genemark calls start at 441 /note=SSC: 417-671 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein PP340_gp02 [Arthrobacter phage Adaia] ],,NCBI, q5:s1 77.381% 3.97544E-18 GAP: -20 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.445, -2.4406251440021154, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP340_gp02 [Arthrobacter phage Adaia] ],,YP_010649358,58.4416,3.97544E-18 SIF-HHPRED: SIF-Syn: Contains synteny with other phages such as Sputnik, Atraxa, and Adaia in terms of location and overlap, but the other phages are not annotated to list their functions, nor the functions of the genes surrounding either side of this gene. /note=Primary Annotator Name: Bursulaya, Isabelle /note=Auto-annotation: Both Glimmer and GeneMark were used, but they called the start site at different locations. Glimmer called the start site at 486 with a start codon of TTG (a less likely start codon) while GeneMark called the start site at 441 with a start codon of ATG. There are also other start sites listed, such as at 417 with a start codon of ATG. The length of all 3 possible start sites would be larger than 120 base pairs, suggesting all have potential to be protein coding genes. /note=Coding Potential: Coding potential at both start sites was seen in the forward orientation, but also had coding potential in the reverse orientation. However, there doesn’t seem to be enough space between the first and third gene for the gene to be in the reverse direction, so it is most likely a forward gene, but it is important to note that there is coding potential in the reverse direction. The coding potential was similar in both the Host and Self Trained GeneMark. 
However, the start site at 441 appears to be covered by the coding potential while the sites 417 and 486 are not, although it is possible that 417 is slightly covered but it is extremely hard to tell since the 417 and 441 are very close together. It is also important to note that if this gene is deleted, there would be a fairly large gap of 232 nucleotides, providing stronger support that the gene does exist. /note=SD (Final) Score: The final score at location 417 is -2.441, which is also the best score on PECAAN. /note=Gap/overlap: The gap is -20 at position 417, which is a little unlikely but not impossible if the gene is part of an operon. The same gene in other phages (Sputnik, Atraxa, and Adaia) also had similar overlap with the first gene. Position 417 is the only position where there is strong overlap with the other genes. The length of the gene would be 255, which is a reasonable length since it is long enough to be a gene. /note=Phamerator: As of 4/4/2023, the pham is listed as 16439. This is shared with another phage named Adaia, which is also in cluster AX. There are only 2 members in this pham, but one of them is listed to be the MooKitty draft. The other is Adaia_2. /note=Starterator: No start sites were manually annotated in Starterator for MooKitty, but the suggested start number was at start number 6 with position 486. Note that there is only one other phage’s gene (Adaia_2) that is also in Starterator, so this makes any notable comparisons very hard. On Starterator, the start sites were not conserved, but they were close. At 417 for MooKitty, the start site was 1 while in Adaia the start site was 2 at position 421. Again, since there is only one phage to compare MooKitty to, the results of this should not be weighed heavily in the final consideration. /note=Location call: Based on all the evidence, the gene is real, but 417 appears to be the correct start site. 
Although neither Glimmer nor GeneMark called this start site, it has the highest Z score and the most positive final score. Its start codon is ATG, and only 441 also has ATG as its start codon, while the rest are either TTG or GTG, which are rarer. The coding potential seems to begin near 417, so 441 and 486 are too far downstream. Also, at 417, there would be overlap with gene 1, which is conserved (synteny) with 3 other phages. Starterator, unfortunately, was not much help as there are no manually annotated start sites, but it did list 417 as a potential start site. 417 would also lead to the largest possible ORF, providing more evidence for it. /note=Function call: There is only one other phage in BLASTp for both the NCBI and PhagesDB, which is Adaia. In the PhagesDB BLASTp, there was an e value of 5e-19, and the identity was 59%, which is slightly lower than the preferred 75% threshold. For NCBI, the e value was 4e-18, which is favorably low, and the identity was 60%, which is higher than the threshold of 35%. The percent coverage was 77%, which is fairly good. However, both sites listed the function as unknown. Unfortunately, there were no hits on CDD, but HHpred had a hit with 4GDA_B, which had a probability of 61.14% (which isn’t very high but was the highest probability) and an e value of 9.5, which is extremely high and should probably not be trusted. The coverage was 29.76%, which is lower than the preferred 35% but is still fairly close. The listed function was a biotin binding protein, but because the scores are low and untrustworthy, and there is only one other phage to compare MooKitty to, I am skeptical about listing this as the function. Overall, I do not think there is enough strong evidence to really state the function of the gene and it should remain NKF for now. /note=Transmembrane domains: There appear to be no transmembrane domains, as the probability is 100 that all proteins are on the inside of the cell. 
Because I am unsure what the function of this gene is, I cannot state whether or not it makes sense that the protein has no transmembrane domains. /note=Secondary Annotator Name: Unanwa, Nnaemeka /note=Secondary Annotator QC: Notes look very good! Btw, according to the lab manual you do not need to fill out the synteny box if your gene has NKF.
CDS 668 - 2077 /gene="3" /product="gp3" /function="terminase" /locus tag="MooKitty_3" /note=Original Glimmer call @bp 668 has strength 8.8; Genemark calls start at 668 /note=SSC: 668-2077 CP: yes SCS: both ST: NI BLAST-Start: [terminase family protein [Rothia sp. ZJ1223] ],,NCBI, q8:s22 98.5075% 8.6412E-118 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.131, -4.820269098574683, no F: terminase SIF-BLAST: ,,[terminase family protein [Rothia sp. ZJ1223] ],,WP_204878207,56.6462,8.6412E-118 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,93.8166,100.0 SIF-Syn: Displays synteny with Adaia, Atraxa, and Sputnik --> downstream gene is portal protein for all of them /note=Primary Annotator Name: Okumura, Joey /note=Auto-annotation: Glimmer and Genemark used → agree on start 668 with TTG codon /note=Coding Potential: gene has coding potential in second forward ORF starting around 700 bp + minimal spikes of coding potential in other ORFs → similar results in host and self trained GeneMark /note=SD (Final) Score: FS of -4.820 and z score of 2.131 for start 668. In general, FS tend to increase as z scores increase so no starts have great FS and z scores (best FS of -3.693 with z score of 2.962 for start 1496, best z score of 0.953 but FS of -7.651 for 1127) /note=Gap/overlap: -4. Overlaps with previous gene if 668 is treated as start site since previous gene has 671 stop /note=Phamerator: pham number 73449. Date 4/4/2023. Pham has 1183 members, 106 are drafts. 
Gene also found in phages of the same cluster as MooKitty (AX); found in Adaia, Atraxa, and Sputnik. /note=Starterator: Most annotated start number was 138, called in 139 of the 1049 non-draft genes in the pham but MooKitty not included. Start 183 @764 has 4 MAs. Start 194 @782 has 14 MAs. Other phages in AX cluster call different start sites than MooKitty but these start sites are early in the gene. This evidence does not agree with the site predicted by Glimmer and GeneMark. /note=Location call: Based on above evidence, this is likely a real gene with the start site at 668. Start site for this one was tricky. Evidence used to determine 668 start site is the following: /note=668 prevents a gap and the 4 bp overlap could indicate gene is part of operon /note=Only 10% of phages in pham call most annotated start site so it is not too concerning for MooKitty to not have this start /note=Other members in AX cluster have different starts than MooKitty but these starts are close to beginning of gene, similar to the 668 start for MooKitty /note=Function call: Multiple PhagesDB and NCBI BLAST hits with terminase and large subunit terminase (all e values less than e-22). CDD had one hit of a large subunit terminase (e value of 4.34e-12, lower coverage but still >35%). HHpred also indicates terminase or large subunit terminase with 100% probability, >90% coverage, and 50bp, which is generally accepted), smallest gap with possible start site candidates (no start sites with overlap) and thus results in the longest reasonable ORF (>120bp). Synteny shows ~100bp gap for Atraxa and Sputnik and ~70bp gap for Adaia between genes 5 and 6), showing a >100bp gap between gene of interest and upstream gene is not conserved. Gap is not large enough for another gene (<120bp). No coding potential present in the gap. /note=Phamerator: Gene is found in pham #1643 as of 2023-04-03; pham contains 2 other non-draft AX subcluster phages (Adaia_6, Sputnik_6). 44 phages are part of the A cluster. 
All functions called for this pham if applicable are head-to-tail adaptors. /note=Starterator: MooKitty does not contain the most annotated start site (start site 3 in 38/48 non-draft genes). Most manually annotated start site for MooKitty is start site 12 @4861, which matches the auto-annotated start site. No other start sites have manual annotations. /note=Location call: Gene is most likely real with start site @4861 based mainly on a decent RBS score, being the most manually annotated on Starterator, and having the start site that closes the gap between the gene and the upstream gene. Synteny shows that the <100bp gap is conserved between all 3 non-draft AX phages (Adaia, Atraxa and Sputnik). /note=Function call: Head-to-tail adaptor. Phagesdb and NCBI BLAST call head-to-tail adaptor proteins in phages Adaia and Atraxa with small e-values (<10^-26) and high identity (50%). HHpred calls hits to proteins that have head-to-tail connector function with e-values of around 10^-15, coverage >94% and high probability >99%. Fulfills SEA-PHAGES requirement of HHpred alignment with crystal structure HK97 gp6 (3JVO_K hit) necessary to call this function (function decided over portal protein suggested by HHpred). No hits present in CDD for amino acid sequence. /note=Transmembrane domains: TMHMM does not call any transmembrane domains in this region of the genome. /note=Secondary Annotator Name: Okumura, Joey /note=Secondary Annotator QC: I agree with all conclusions and notes are very thorough! 
CDS 5193 - 5561 /gene="7" /product="gp7" /function="tail terminator" /locus tag="MooKitty_7" /note=Original Glimmer call @bp 5193 has strength 10.09; Genemark calls start at 5193 /note=SSC: 5193-5561 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP340_gp07 [Arthrobacter phage Adaia] ],,NCBI, q1:s1 100.0% 2.07562E-31 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.345, -4.256861663406435, yes F: tail terminator SIF-BLAST: ,,[hypothetical protein PP340_gp07 [Arthrobacter phage Adaia] ],,YP_010649363,61.4173,2.07562E-31 SIF-HHPRED: Minor tail protein U; Mixed Alpha-Beta fold, VIRAL PROTEIN; HET: SO4, MSE; 2.7A {Enterobacteria phage lambda} SCOP: d.323.1.1,,,3FZ2_C,99.1803,98.0 SIF-Syn: /note=Primary Annotator Name: Rodriguez, Justin /note=Auto-annotation: Coding potential is called for this gene in both Glimmer and GeneMark. They both agree on the start site of 5193. ATG is the start codon which is normal. /note=Coding Potential: There is reasonable coding potential as shown by Self-trained and Host-trained GeneMark graphs. Coding potential covers the proposed length of the gene. /note=SD (Final) Score: -4.257, which is the best one /note=Gap/overlap: -1 which is an overlap. This is fine since it shows that it is potentially part of an operon. Length of the gene at 369 is reasonable too. Final score is therefore not as necessary to look at. /note=Phamerator: 1640, 4/3/2023. Three other non-draft genomes (Adaia, Atraxa, and Sputnik) in the AX cluster are in the pham. There are 56 members in the pham, 8 are drafts. No gene function called /note=Starterator: There is a called start site that is found in 11 of 56 genes. There are 7 manual annotations at this site. The called start site is 5193. This gene does not call the most annotated start site, but 36 of 48 non-draft genomes in the pham call it. Starterator is not informative for this gene. /note=Location call: The start site is likely 5193. 
This is most likely a real gene taking coding potential and calls from Glimmer and GeneMark (5193 called for both) into consideration. Starterator also reports 5193 as a start site. /note=Function call: The predicted function is NKF as significant hits (e-values of less than e-15) from NCBI or PhagesDB BLASTs represent hypothetical proteins with no function. In PhagesDB BLAST, sequence aligns with genes from Sputnik, Adaia, and Atraxa (all in the same AX cluster). NCBI CDD has no hits at all, and HHPRED has no significant hits. /note=Transmembrane domains: The absence of TMDs from DeepTMHMM does not inform much about the function of this protein since the protein has NKF at the moment. Looking at pham maps in PECAAN, genes upstream and downstream of this gene in other phages are associated with the tail structure. This supports there being no transmembrane domains. /note=Secondary Annotator Name: Martinez, Daniela /note=Secondary Annotator QC: I agree on the location and function calls. Please fill out the synteny box! Overall, great job.
CDS 5563 - 5988 /gene="8" /product="gp8" /function="major tail protein" /locus tag="MooKitty_8" /note=Original Glimmer call @bp 5563 has strength 8.96; Genemark calls start at 5563 /note=SSC: 5563-5988 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Adaia] ],,NCBI, q1:s1 99.2908% 3.39782E-79 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.901, -3.8130952303033587, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Adaia] ],,YP_010649364,91.4894,3.39782E-79 SIF-HHPRED: Phage major tail protein, TP901-1 family; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_G,95.0355,99.6 SIF-Syn: There is synteny with other phages (Sputnik and Adaia) that also call the function for this gene a major tail protein. Both the upstream gene and downstream gene have NKF. 
/note=Primary Annotator Name: Ortiz-Gomez, Diana /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 5563. /note=Coding Potential: There is strong coding potential present in both Glimmer and GeneMark. The suggested start site covers all of this coding potential. /note=SD (Final) Score: The suggested start site has the best Z-score of 2.901 and a final score of -3.813. /note=Gap/overlap: There is a gap of 1bp with the upstream gene, which may indicate that this gene is part of an operon. We see this overlap in other phages such as Adaia and Sputnik. /note=Phamerator: Pham 618 (04/04/2023). This gene is conserved in other members of the cluster AX (phages Adaia and Sputnik). The function call is a major tail protein. /note=Starterator: MooKitty does not contain the most annotated start site, which is start site 5 found in 81/136 genes. Start site 11 is the most annotated start site for MooKitty (41/136 manual annotations) which is called in phages Adaia and Sputnik. This start site agrees with GeneMark and Glimmer. /note=Location call: The evidence above shows that this gene is a real gene and has a start site at 5563. Starterator provides evidence for this start site called in GeneMark and Glimmer. /note=Function call: Major tail protein. The top PhagesDB hits suggest a function of major tail protein with e-values of 3e-58 and 1e-63. Top two hits in NCBI BLAST also suggest this function (identity score of >79% and e-values of <2e-72). No CDD hits. Three of the best HHPRED hits agree with this function and have a probability score of 99.4-99.6%, coverage of >91%, and e-values of <2.9e-11. /note=Transmembrane domains: TmHmm predicts no transmembrane proteins. /note=Secondary Annotator Name: Berber-Pulido, Rodrigo /note=Secondary Annotator QC: Great job! I agree with your notes and the suggested start sites/function. I really liked how concise you were with the notes. It made it easy to read. 
CDS 5991 - 6389 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="MooKitty_9" /note=Original Glimmer call @bp 5991 has strength 10.43; Genemark calls start at 5991 /note=SSC: 5991-6389 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP342_gp09 [Arthrobacter phage Atraxa] ],,NCBI, q1:s1 99.2424% 5.92843E-53 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.804, -5.402031654876326, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP342_gp09 [Arthrobacter phage Atraxa] ],,YP_010649393,80.303,5.92843E-53 SIF-HHPRED: SIF-Syn: NKF, upstream gene is major tail protein and downstream gene is tail assembly chaperone, just like in cluster AX phages Atraxa, Sputnik, and Adaia. Other AX phages do not have function called for this gene. /note=Primary Annotator Name: SCRIVEN, SAVANNAH /note=Auto-annotation: Glimmer and GeneMark both call start site at 5991 with ATG start codon. /note=Coding Potential: In both Host- and Self-Trained GeneMark, the third, forward translational frame had two peaks of high coding potential separated by a dip of zero coding potential. Start site includes all coding potential. There was no coding potential in any other translational frames, except for a narrow peak on the first reverse strand between the peaks on the third forward strand. This coding potential was ignored as it is unlikely that a reverse gene exists between two forward genes. /note=SD (Final) Score: -5.402. Second highest final score on PECAAN. Z score of 1.804 does not show a good sequence match to potential RBS. While an alternate start site has a Z score of 2.001, it would introduce an unlikely 308bp gap and cut off high coding potential. Although final and Z scores are not the best, more weight can be given to synteny evidence. Published AX phage genomes Atraxa, Sputnik, and Adaia show this gene in a similar location and order with surrounding genes. 
/note=Gap/overlap: There is a reasonable upstream gap of 2bp and downstream gap of 7bp. The auto-annotated start site minimizes gaps and is the LORF of 399bp. /note=Phamerator: (04/04/2023) Pham 58402 is conserved in all three other published AX genomes (Adaia, Atraxa, Sputnik). This pham is also frequently annotated in cluster AN and EE phages. In 42 of 126 non-draft genes of Pham 58402, function called as minor tail protein. /note=Starterator: Run on 03/24/2023. Gene does not have “Most Annotated” start 3 in Pham 58402 (81/126), but its auto-annotated start site 5 at 5991bp in MooKitty is present in 7 non-draft genes and called 100% of time. Start is called in 3 AX genomes (Atraxa, Adaia, Sputnik), 3 DM genomes (Emperor, EpicDab, SallySpecial), and CW3 phage Schiebs. /note=Location call: Starterator, Phamerator, coding potential, and synteny evidence suggest this is a real gene with a likely start site at 5991bp. /note=Function call: No Known Function. PhagesDB Blastp has strong alignment scores (>173) to genes in cluster AX phages of no known function with low e values (1e-43 to 7e-45). NCBI Blastp shows strong hits to hypothetical proteins with >98% query coverage, >60% identity, and low e values (6e-53 and 1e-49). Although both Blastp programs show significant hits to genes with functions called as minor tail proteins, there is not enough evidence to call this gene a minor tail protein according to SEA-Phages guidelines. There must be significant hits to collagen-like or glycine-rich proteins as well as synteny evidence to confidently call a gene a minor tail protein; neither of these requirements were met. CDD gave no hits and HHPRED gave strong hits to other functions: minor capsid protein (99.51% probability with e value 7.8e-13) and putative tail-component (99.19% probability with e value 3.3e-10). There is not enough downstream synteny in phages where the gene is called a minor tail protein (mainly cluster AN phages). 
Synteny evidence is strongest in cluster AX phages and these also call the gene as No Known Function. Even though genes in Pham 58402 have functions called as minor tail protein, there is not enough evidence to call the function of this gene a minor tail protein, so it is NKF for now. /note=Transmembrane domains: DeepTMHMM predicts 0 TMDs. Not a membrane protein. /note=Secondary Annotator Name: BERBER-PULIDO, RODRIGO /note=Secondary Annotator QC: Great notes! I really liked your explanation on why the function is NKF. It makes sense that it is not a minor capsid protein and rather the function is undefined.
CDS 6396 - 6680 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="MooKitty_10" /note=Original Glimmer call @bp 6396 has strength 8.31; Genemark calls start at 6396 /note=SSC: 6396-6680 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Adaia] ],,NCBI, q1:s1 91.4894% 2.15602E-26 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.3, -4.34663776219455, no F: hypothetical protein SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Adaia] ],,YP_010649366,64.9485,2.15602E-26 SIF-HHPRED: Phage_TAC_11 ; Phage tail tube protein, GTA-gp10,,,PF11836.11,60.6383,93.0 SIF-Syn: Similar to MooKitty, for Adaia, Atraxa, and Sputnik, the upstream gene is a similar gene with no known function. MooKitty has a similar downstream gene to Adaia, whose gene function is a tail assembly chaperone. /note=Primary Annotator Name: Nguyen, Angelynn /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is 6396. /note=Coding Potential: There is good coding potential throughout the ORF in the forward direction based on the host and self-trained genemark graphs. /note=SD (Final) Score: -4.347, this is the second highest (closest to 0) final score which corresponds to the start site of 6396. The final score that is the closest to 0 is -3.980 which states that the start is at 6588. /note=Gap/overlap: 0. 
There is no gap or overlap between this gene and the ones that come before and after it. This means that no gene needs to be added here. /note=Phamerator: The pham number as of 4/3/2023 is 72486. The pham has 14 members total with 5 drafts. The gene is conserved in phages Adaia, Atraxa, and Sputnik. However, this gene is not in the same cluster as the other genes since it currently has no cluster according to the phagesdb phamerator. Moreover, these three other genes have the same function call of tail assembly chaperone. /note=Starterator: Start number 12 in the Starterator was manually annotated in 3 of 9 non-draft genes in this pham. Start number 12 was also the most commonly called number. Start 12 corresponds to the start site of 6396 which agrees with the auto-annotated start site from Glimmer and Genemark. /note=Location call: All of the evidence above indicates that gene is real and the start site is 6396. Although the final score that corresponds to the start site of 6396 is not the closest to 0, the starterator, Glimmer, and Genemark all agree on this site. /note=Function call: Three of the top hits in NCBI BLAST have the function of tail assembly chaperone. For instance, YP_010649366 has 91% coverage, 53% identity, and an E-value of 2.15e-26. From the HHpred, two of the top hits with a known function were a phage tail tube protein (93.01% probability, 60% coverage, and E-value of 1.2) and a phage tail assembly chaperone protein (83% probability, 68% coverage, and E-value of 9.8). CDD had no relevant hits. /note=Transmembrane domains: The DeepTMHMM suggests that there is near 100% certainty that the protein is found inside. There are no transmembrane domains. /note=Secondary Annotator Name: Unanwa, Nnaemeka /note=Secondary Annotator QC: Very good notes! I would just check the Gap/Overlap section because the selected start site seems to have a 6 bp gap according to PECAAN. 
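The gap and length figures quoted in these records can be cross-checked directly from the CDS coordinates. The following is a minimal sketch, not part of the PECAAN output: it assumes all genes are on the forward strand and that the gap is counted as `start - previous_stop - 1` (a convention inferred from the GAP values in the records, so negative values mean overlap). The function names and the `coords` dictionary are hypothetical helpers introduced for illustration.

```python
# Sketch: recompute gap/overlap and gene length from CDS coordinates.
# Assumptions (not stated in the records): forward-strand genes only,
# gap = start - previous_stop - 1, negative result = overlap in bp.

def gap_to_upstream(prev_stop: int, start: int) -> int:
    """Bases strictly between the upstream stop and this start."""
    return start - prev_stop - 1

def gene_length(start: int, stop: int) -> int:
    """Inclusive length of a CDS in bp."""
    return stop - start + 1

# (start, stop) pairs copied from the CDS lines for genes 9 to 13.
coords = {
    9: (5991, 6389),
    10: (6396, 6680),
    11: (6698, 6826),
    12: (6819, 8936),
    13: (8937, 10196),
}

# Gap between each gene and the gene immediately upstream of it.
gaps = {
    gene: gap_to_upstream(coords[gene - 1][1], coords[gene][0])
    for gene in coords
    if gene - 1 in coords
}

for gene, gap in gaps.items():
    label = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: {gap} bp ({label}), length {gene_length(*coords[gene])} bp")
```

Under these assumptions, gene 10 comes out with a 6 bp gap, which matches the GAP field in its record and the secondary annotator's comment rather than the "0" entered in the Gap/overlap note.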
CDS 6698 - 6826 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="MooKitty_11" /note=Original Glimmer call @bp 6698 has strength 7.41 /note=SSC: 6698-6826 CP: yes SCS: glimmer ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Adaia] ],,NCBI, q1:s1 92.8571% 1.74389E-9 GAP: 17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.336, -7.073324118416107, no F: hypothetical protein SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Adaia] ],,YP_010649367,77.5,1.74389E-9 SIF-HHPRED: SIF-Syn: Tail assembly chaperone, upstream gene is tape measure protein, downstream is tail assembly chaperone, just like in phage Adaia /note=Primary Annotator Name: Unanwa, Nnaemeka /note=Auto-annotation: Gene is called by Glimmer, but not by GeneMark. TTG start codon according to Glimmer (TTG start codon is rare, only about 7% of all genes use TTG). This start site covers all of the coding potential. /note=Coding Potential: Coding potential in a part of the region in Self Trained and Host Trained Genemark, but does not reach all the way to the stop codon in both cases. Synteny with gene 11 in phage Adaia, but no synteny in the other two phages. Only one strong non-draft PhagesDB hit from phage Adaia (1e-8), other hits are weak. /note=SD (Final) Score: SD Score is -7.073 and Z-score is 1.336 for start @6698. These are considered to not be ideal values (SD score is far from 0, z-score is less than 2). The other start sites have better z-scores and SD scores, but they do not cover all of the coding potential. /note=Gap/overlap: 17 bp gap. This is considered to be a small gap and no gene can be inserted into this area. /note=Phamerator: Phamerator report was run on 4/5/23. Only one other result for pham 24721, Adaia_11. It is in cluster AX, like MooKitty. It is a tail assembly protein in Adaia. /note=Starterator: Starterator report was run on 4/5/23. The starterator also has site 2 (start @6698) as the most annotated site for pham 24721. 
However, there are not many members of this pham and both members only have one manual annotation at this site. /note=Location call: This seems to be a real gene, albeit a rare one. It has much similarity to gene 11 in Adaia. It has synteny with this gene, it shares the same start site 2 with Adaia_11, high PhagesDB Blast hit with Adaia_11, etc. /note=Function call: No data on CDD. Lowest e-value on HHPred is 4, so the results from this resource were not helpful either. 4 non-draft hits on PhagesDB blast, but only one had a reasonable e-value: Adaia_11 with an e-value of 1e-08 and an identity of 100%. There is only 1 NCBI Blast hit, the tail assembly protein from Adaia with an e-value of 2e-09, 72% identity, and 92% query cover. /note=Transmembrane domains: No transmembrane domains detected on DeepTMHMM. This protein does not interact with the membrane. /note=Secondary Annotator Name: Bursulaya, Bella /note=Secondary Annotator QC: Great notes, I agree with your assessment! However, remember to mention coverage and probability for the blast hits. Although I agree with the start site you chose, it's a little strange how it has the worst Z score and SD final score out of all 4 options, and also starts with TTG which is very rare. I would also include the date that you ran Phamerator and Starterator. 
CDS 6819 - 8936 /gene="12" /product="gp12" /function="tape measure protein" /locus tag="MooKitty_12" /note=Original Glimmer call @bp 6882 has strength 6.07; Genemark calls start at 6807 /note=SSC: 6819-8936 CP: yes SCS: both-cs ST: SS BLAST-Start: [tail length tape measure protein [Arthrobacter phage Atraxa] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.893, -4.085313187509542, yes F: tape measure protein SIF-BLAST: ,,[tail length tape measure protein [Arthrobacter phage Atraxa] ],,YP_010649396,76.1364,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_CF,29.078,99.9 SIF-Syn: This gene has synteny with the other three genomes in its cluster: Adaia_1, Atraxa_1, Sputnik_1. All have this gene and all have the same function of tape measure protein. Genes around this gene both share synteny with other phages in AX. /note=Primary Annotator Name: Berber-Pulido, Rodrigo /note=Auto-annotation: Glimmer and Genemark do not agree on start site. Glimmer calls 6882 while Genemark calls 6807 /note=Coding Potential: Coding potential is seen in both Host-Trained and phagesdb genemark. It confirms that there is high coding potential for the entire gene. Start site 6882 does not cover all of coding potential but 6807 does. It is important to note that start site 6819 was not called by auto-annotation but also covers all of coding potential. /note=SD (Final) Score: Start site @6819 has best final score and z value with -4.085 and 2.893 respectively. Start site @6807 has a z-score of 1.868 and a final score of -5.849. Lastly, start site @6837 has the worst scores with a z-score of 1.259 and a final score of -7.498. This overall strongly suggests that start site @6819 is the most probable start site. /note=Gap/overlap: Start site @6819 has an overlap of 8 while @6807 has an overlap of 20. 
This shows how @6819 is also ideal because it has less of an overlap than the other start site. This start site also has an acceptable length. /note=Phamerator: Phamerator analysis, run on 4/5/23, shows that it is a part of the AX cluster. Cluster number: 11043. This cluster has 4 members with 1 draft. This gene is also conserved as it is seen in the other genomes of the same cluster: Adaia_1, Atraxa_1, Sputnik_1. /note=Starterator: The starterator analysis was run on 4/5/23 and it concludes that start site #2 @6819 is the correct start site. This start site has 3 MAs while the others do not have any and when comparing the other factors previously mentioned, it supports that this start site @6819 is the best for this gene. The conserved start site choice is @6894 but is not seen in this gene. /note=Location call: start 2 @6819, this gene is real. /note=Function call: Phagesdb BLAST shows that the function is a tape measure protein, as seen in Adaia_12, Sputnik_12, and Atraxa_12 with an E-value of E-171, E-155, and E-155 respectively. NCBI BLAST confirms this as many of the top hits are also tape measure proteins. YP_010649396.1 and YP_010649368.1 both have an E-value of 0, which is strong evidence that it is a tape measure protein. They also have query covers of 99 and 100 percent respectively. CDD states that it has a domain for a phage-related protein which makes sense as a tape measure protein is a phage protein. Lastly, HHpred shows a hit for 6V8I_CF which has a 99.88 probability with an E-value of 4.1e-14 that indicates it is a tape measure protein. Overall, these results indicate that this gene is a tape measure protein. 
/note=Transmembrane domains: There is no evidence from the DEEPTMHMM that shows this is a transmembrane protein as it predicts all proteins are inside /note=Secondary Annotator Name: Scriven, Savannah /note=Secondary Annotator QC: Y: Great evidence and reasoning for this start site that neither auto-annotation program got right! PECAAN notes for start site have required evidence and clear reasoning. Make sure to note that the start codon is GTG. I agree with your functional call of tape measure protein. Remember to check the boxes in PECAAN for PhagesDB blast, HHPRED, and NCBI blast! Your PECAAN notes for functional call are missing a little bit of information - remember to mention query cover, percent identity, and probability for blast hits. For the synteny box, make sure to include the upstream and downstream functional calls for MooKitty and the other AX phages. CDS 8937 - 10196 /gene="13" /product="gp13" /function="minor tail protein" /locus tag="MooKitty_13" /note=Original Glimmer call @bp 8967 has strength 3.69; Genemark calls start at 8937 /note=SSC: 8937-10196 CP: yes SCS: both-gm ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Adaia] ],,NCBI, q4:s5 99.0453% 0.0 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.235, -4.535415599498544, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Adaia] ],,YP_010649369,91.4286,0.0 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,87.8282,98.6 SIF-Syn: /note=Primary Annotator Name: Bursulaya, Isabelle /note=Auto-annotation: Both Glimmer and GeneMark were used but they called different start sites at 8967 and 8937, respectively. Glimmer’s start site suggests a GTG codon while GeneMark suggests an ATG codon. 
/note=Coding Potential: The coding potential covers both start sites up until the stop sites, but there are a few dips that are suspicious. Overall though, I think there is good coding potential, and both the Host Trained GeneMark and GeneMarkS had similar coding potentials. There is no noticeable coding potential in the reverse orientation, so the gene is probably in the forward direction. /note=SD (Final) Score: PECAAN selected the start site of 8967 with a start codon of GTG. The Z score is 2.208 and the final score is -5.832. The other start site proposed by GeneMark at 8937 has a start codon of ATG with a Z score of 2.235 and a final score of -4.535. It appears that the GeneMark start site has a more positive final score and a higher Z score, suggesting that it is potentially a better start site. The Z and final scores for this start site are also the best while keeping the length of the gene as long as possible and the gap as small as reasonable. /note=Gap/overlap: The Glimmer start site has a gap of 30, while the GeneMark start site has a gap of 0. While comparing the Pham map of MooKitty to the other three phages (Adaia, Sputnik, and Atraxa), only MooKitty had a gap between this gene (13) and the previous gene (12) while the other three phages had overlap between the 12th and 13th genes. This shows that the start site at 8937 (GeneMark) would provide more evidence of synteny with other phages, so I am starting to think that 8937 is a better start site than 8967. Both gaps are small enough at 0 and 30 so that another gene would not be able to fit. /note=Phamerator: As of 4/9/2023, the Pham number is 11573, with 4 members. They are the final versions of Adaia, Atraxa, and Sputnik, as well as the draft of MooKitty, all in the 13th position. Therefore, this gene is conserved by other members of the AX cluster. 
Another interesting note is that the length of the gene in the other non-draft members ranges from 1260 to 1266, while MooKitty’s gene has a length of 1230. This provides even more proof that the start site should be 8937 instead of 8967, since this will add another 30 base pairs and keep the length of the gene consistent with the same gene in other phages. /note=Starterator: According to Starterator, the start site 8937 has 1 manual annotation, while the site 8967 does not have any manual annotations, but it was chosen by the computer. However, start number 3 (8967) was only called in 1 out of 4 genes, and only in MooKitty, suggesting that this start site isn’t likely if it’s only called in 25% of the genes. /note=Location call: Based on all the evidence I listed above (such as Starterator having a manual annotation at start site 8937, the start site 8937 having slight overlap with gene 12 as the other phages, the start site 8937 giving a length of the gene that is consistent with the other phage’s genes, and having a better Z and final score), the true start site is most likely at 8937. This start site also grants the largest possible ORF. This is most likely a real gene with a forward orientation, as corroborated by synteny with other phages in the AX cluster. /note=Function call: PhagesDB BLASTp: There were three phages (Adaia, Atraxa, and Sputnik) that had very small e values (0, 1e-116, 1e-116 respectively) and all called the function to be a minor tail protein. Adaia had 85% identity, and Atraxa and Sputnik both had 53% identity. All three phages also had >=200 on the alignment score, which is the highest possible score, showing that there is strong evidence that these genes are the same. NCBI BLASTp: Strangely, only Adaia and Atraxa are listed on NCBI BLASTp, and both have very low e values (0 and 1e-143 respectively). They both have a coverage percentage of 99%, but Adaia has a higher identity with 85.54%, while Atraxa has an identity score of 53.17%. 
Although Atraxa’s identity score is a little low, it is still higher than the threshold of 35%. Both Adaia and Atraxa listed the function as a minor tail protein. The alignment scores for these two phages were >= 200, which is the highest alignment score possible. CDD: There were no hits in CDD at all. HHpred: Good hits with 3D37_A and 3CDD_C with probabilities of 98.61 and 98.52 respectively, and e values of 0.000056 and 0.000029 respectively. These are high probabilities and low e values, which is good. Both of these hits list the function as tail protein. /note=Transmembrane domains: DeepTMHMM shows that this protein is found entirely outside of the cell, with 100% probability. This makes sense because a minor tail protein would remain on the outside of the bacterial cell and not enter it. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 10196 - 11755 /gene="14" /product="gp14" /function="minor tail protein" /locus tag="MooKitty_14" /note=Original Glimmer call @bp 10220 has strength 3.66; Genemark calls start at 10196 /note=SSC: 10196-11755 CP: yes SCS: both-gm ST: NI BLAST-Start: [tail protein [Arthrobacter phage Adaia] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.32, -6.626913615660345, no F: minor tail protein SIF-BLAST: ,,[tail protein [Arthrobacter phage Adaia] ],,YP_010649370,73.6031,0.0 SIF-HHPRED: Peptidoglycan N-acetylglucosamine deacetylase; peptidoglycan N-acetylglucosamine deacetylase, HYDROLASE; HET: MSE; 1.54A {Agathobacter rectalis (strain ATCC 33656 / DSM 3377 / JCM 17463 / KCTC 5835 / VPI 0990)},,,5JMU_A,47.0135,99.7 SIF-Syn: For Adaia, Atraxa, and Sputnik, downstream gene is lysin A and upstream gene is minor tail protein. MooKitty has a downstream endolysin and upstream minor tail protein. /note=Primary Annotator Name: Okumura, Joey /note=Auto-annotation: Glimmer has start at 10220. Genemark has start at 10196. 
/note=Coding Potential: start site covers good coding potential in second forward ORF (with slight dip in middle of gene) → similar results in host and self trained GeneMark (some coding potential in third reverse ORF but worse than forward + does not cover entire coding region with suggested start site of 3676) /note=SD (Final) Score: FS of -6.627 and z score of 1.32 for 10196. FS of -5.412 and z score of 2.219 for 10220. FS tend to increase as z scores increase so no starts have great FS and z scores. /note=Gap/overlap: -1 gap for 10196. 23 gap for 10220 → gap of 23 is too small to fit another gene. /note=Phamerator: Pham number 12180. Date 4/08/2023. Pham has 4 members, only draft is MooKitty. The gene is conserved in 3 nondraft phages. This gene is found in phages that are in the same cluster as MooKitty (AX); found in Adaia, Atraxa, and Sputnik. /note=Starterator: Start site 5 in Starterator was called in 2/3 non-draft genes in the pham. MooKitty does not have start site 5. Other non-draft gene does not have start 5 → it has and calls start 6. Start 6@10196 has 1 MA. Start 7 is 10220. /note=Location call: Based on above evidence, this is likely a real gene with the start site at 10196. This start site eliminates a gap. It is also near the beginning of the gene, similar to the 5 start site that other phages call. This agrees with GeneMark (but not Glimmer). /note=Function call: This one was tricky. Phages DB: minor tail protein, multiple phages with e values < E-147 and >200 alignment call minor tail protein. NCBI BLAST: tail protein for multiple phages with < E-90 e values, >54% percent identity, > 70% coverage + deacetylase family protein for multiple phages with slightly worse values but more hits. CDD had one hit of deacetylase family protein with e value of E-3. The HHpred had peptidoglycan deacetylase for multiple phages with >99% probability and < E-14 e values (HHpred Job ID: 1352137). 
Even though there are good HHPred results, since peptidoglycan deacetylase is not an approved function, I did not see anything similar on the SEAPHAGES forum, PhagesDB and NCBI BLAST have better e values, and other phages call a minor tail protein, I believe this is a minor tail protein. /note=Transmembrane domains: Not a transmembrane protein. /note=Secondary Annotator Name: Li, Anna /note=Secondary Annotator QC: I agree with the start site based on conserved overlap with other AX phages, starterator report, not best RBS but -1 overlap suggests operon presence /note=Not too sure about function, would call minor tail protein function based on BLAST (has good hits) and polysaccharide deacetylase isn`t in the approved functions list; commenting that there can be multiple genes coding for minor tail proteins (e.g. from VroomVroom); hhpred may have hard time calling minor tail proteins (https://seaphages.org/forums/topic/4538/) and coverage spans ~50% of gene, also adding this: https://seaphages.org/forums/topic/5472/?page=1#post-10012 CDS 11770 - 12438 /gene="15" /product="gp15" /function="endolysin" /locus tag="MooKitty_15" /note=Original Glimmer call @bp 11824 has strength 6.37; Genemark calls start at 11722 /note=SSC: 11770-12438 CP: yes SCS: both-cs ST: NA BLAST-Start: [endolysin [Arthrobacter phage Atraxa] ],,NCBI, q1:s1 67.1171% 2.44024E-48 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.248, -3.005970110175159, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage Atraxa] ],,YP_010649399,49.0291,2.44024E-48 SIF-HHPRED: Murein hydrolase activator EnvC; Complex, PROTEIN BINDING; 2.1A {Escherichia coli (strain K12)},,,6TPI_A,50.0,98.4 SIF-Syn: MooKitty has synteny with phage Adaia, which is also a part of cluster AX /note=Primary Annotator Name: Martinez, Daniela /note=Auto-annotation: Glimmer and Genemark do not agree on the same start site. Glimmer predicts a start @11824 and Genemark @11722. 
/note=Coding Potential: Coding potential is observed on both Host-Trained Genemark and GenemarkS maps after the @11700 region in the forward direction. There are some very minimal coding potential spikes in the reverse direction. /note=SD (Final) Score: A start @11824 has a score of -5.109 and the start @11722 has a score of -6.567. Since a higher final score means a better sequence match, the start @11824 appears to be the best one, however it does not cover a small region of coding potential of about 100bp. /note=Gap/overlap: a start @11824 has a gap of 68bp and a start @11722 has an overlap of -34bp. The “most annotated” start @11770 has a gap of 14bp, which is reasonable. /note=Phamerator: As of 4/10/2023, this gene was found on pham 73968 with 2 members, 1 of which is a draft. /note=Starterator: The start called the most often is start #5. MooKitty has the “most annotated” start, but does not call it. Start #5 is @11770 and has a better final score and Z-score than the start sites predicted by glimmer and genemark. Start site #5 also has one manual annotation on phage Adaia. Start site #5 @11770 covers more coding potential than the start @11824 predicted by glimmer. /note=Location call: Since start site #5 @11770 is conserved in the pham, covers most of the coding potential, and has a better final score and z-score, this is likely the correct start site. Based on the coding potential this is a real gene. /note=Function call: PhagesDB BLASTp results indicate a Lysin A with an e-value of 6e^-92. The top three results match with phage Adaia, Sputnik, and Atraxa. These all belong in the same cluster as MooKitty. The first NCBI BLASTp hit is a tail length tape measure protein and the fourth best hit is an endolysin from phage Atraxa (67% coverage, 2e^-78, 56.67% identity). CDD results indicated a peptidase (e-value 2.41e^-28). The top hit on HHPRED suggests a hydrolase with an e-value of 4.6e^-6, which meets the 10^-3 requirement. 
Since there is no evidence of a Lysin B and this is not a mycobacteriophage, I can not call this gene a Lysin A. However, based on the SEA-PHAGES functional assignment sheet, I can call this gene an endolysin instead. This function makes sense since some of the hits called this a hydrolase, which is consistent with an endolysin (endolysins are peptidoglycan hydrolases). For this reason, the function call for this gene was determined to be an endolysin. /note=Transmembrane domains: DeepTMHMM predicted zero transmembrane domains. /note=Secondary Annotator Name: Ortiz-Gomez, Diana /note=Secondary Annotator QC: I agree that the start site at 11770 is the best one, but make sure to mark "yes" for the "Coding Capacity" box. It looks like you went over a lot of evidence to determine the function, but please check the boxes for each database that helped you determine this function. The starterator box needs to be filled. CDS 12438 - 12659 /gene="16" /product="gp16" /function="membrane protein" /locus tag="MooKitty_16" /note=Original Glimmer call @bp 12438 has strength 6.21; Genemark calls start at 12438 /note=SSC: 12438-12659 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein PP340_gp16 [Arthrobacter phage Adaia] ],,NCBI, q1:s3 100.0% 3.32665E-37 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.501, -4.600921586871208, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein PP340_gp16 [Arthrobacter phage Adaia] ],,YP_010649372,93.4211,3.32665E-37 SIF-HHPRED: SIF-Syn: This gene displays synteny with the other genomes. /note=Glania - Changing function to membrane protein because there are two TMDs found in DeepTMHMM. /note= /note=Primary Annotator: Hernandez, Edgar /note=Auto-annotation: Glimmer and GeneMark both display the same start site 12438, with the start codon ATG. /note=Coding Potential: There’s a reasonable amount of coding potential present in both the Host-Trained and Self GeneMark. There’s an upward hash near the start site 12438, and there is a downward hash near the stop site 12659. 
The black line is included within the start and stop sites, which is indicative of typical coding potential that’s present within the putative ORF. /note=SD (Final) Score: There’s an SD final score of -4.601, which is indicative of a higher sequence match. Meanwhile, the Z-score is 2.501, which is good since anything higher than 2 indicates that the RBS was above the mean. /note=Gap/Overlap: There’s a very small overlap of -1 present, and it seems to be conserved when compared to the other genes in the AX cluster (Adaia, Atraxa, Sputnik). /note=Phamerator: The gene is located in Pham 61456. AX cluster group members like Adaia, Atraxa, and Sputnik, revealed synteny with MooKitty since the genes were found possessing the same location and order. The functions associated with the gene are unknown. /note=Starterator: The evidence suggests that the start site 7 at 12438 is semi conserved across all members of Pham 61456 because 2 out of 5 final genes claimed it as a real start site. The gene has only 1 manual annotation. /note=Location call: Based on all the pieces of evidence gathered from Phamerator, Starterator, synteny comparison, and coding potential, the possible start site for the gene is 12438 seeing as GeneMark and Glimmer agree. Additionally, only one person has actually called this the correct start site as there is only 1 manual annotation. /note=Function call: PhagesDB and NCBI BlastP both called for significant hits with unknown or hypothetical function as the gene function. The hits Adaia, Sputnik, and Atraxa, yielded significant low E-values when examined with PhagesDB. PDB HHpred did not suggest any hits at all. Likewise, CDD did not provide any useful information seeing as no hits were found. Therefore, the gene function most likely remains unknown. 
/note=Transmembrane Domains: TMHMM predicted 2 transmembrane domains and as a result, it’s possible that the gene is a membrane protein since at least 2 transmembrane domains are required for the gene to claim this function. /note=Secondary Annotator: Scriven, Savannah /note=Secondary Annotator QC: Y: I agree with start site and functional call. Make sure to note whether all coding potential is included and check yes for the box “All GM Coding Capacity.” Also make sure to note somewhere that this is the longest ORF. Add that both Z and SD scores were highest options. Include date of phamerator analysis. For starterator, is this start site the most conserved option or are all start sites equally good/bad? Does this gene even have a most-annotated start? Make sure to select SS, NI, or NA for the starterator box (look at annotation lab manual to see which option fits best for any drop down menus you have to fill out). For functional evidence, note the e-values, query cover, and percent identity for BLAST hits. For synteny box, indicate the function of upstream and downstream genes in MooKitty and the phages you used for comparison. CDS 12652 - 12951 /gene="17" /product="gp17" /function="membrane protein" /locus tag="MooKitty_17" /note=Original Glimmer call @bp 12652 has strength 7.29; Genemark calls start at 12652 /note=SSC: 12652-12951 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP340_gp17 [Arthrobacter phage Adaia] ],,NCBI, q1:s1 100.0% 5.95773E-40 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.814, -5.382696858041375, no F: membrane protein SIF-BLAST: ,,[hypothetical protein PP340_gp17 [Arthrobacter phage Adaia] ],,YP_010649373,79.0,5.95773E-40 SIF-HHPRED: SIF-Syn: N/A for membrane protein /note=Primary Annotator Name: Li, Anna /note=Auto-annotation: Both Glimmer and GeneMark agree at the same start site (site #: 12652); start codon called: ATG. /note=Coding Potential: Gene has reasonable coding potential (both host and self-trained). 
Chosen start site covers all of coding potential. Surrounding coding potential is all in the forward direction. /note=SD (Final) Score: -5.383, 2nd best final score. Start site with better final score has one of -5.279, but also has a worse z-score (1.867 versus 1.814) and has a larger gap compared to the autoannotated start site. /note=Gap/overlap: -8 overlap, which is a greater overlap than what is normally accepted but a small overlap (-11bp) is also conserved in other AX subcluster phages between this gene and the gene directly upstream. /note=Phamerator: Gene is found in pham #11427 as of 2023-04-03; pham contains 3 other non-draft AX subcluster phages (Adaia_17, Atraxa_17, Sputnik_17). No functions are called for any genes in this pham. /note=Starterator: Most annotated site located at start site 9 for 3/3 non-draft genes within the pham. Start site 9 is located @12652 for MooKitty with 3 manual annotations. /note=Location call: Gene is most likely real with start site @12652 based mainly on a decent RBS score, coverage of coding potential, being the most manually annotated on Starterator, and having the start site that closes the gap between the gene and the upstream gene. Synteny shows that the small overlap is conserved between all 3 non-draft AX phages (Adaia, Atraxa and Sputnik). /note=Function call: Unknown function; Phagesdb and NCBI BLAST only call hits to genes with no known function. Phagesdb calls Adaia and Sputnik gene 17 with small e-values (<10^-31) and decent identity (68% and 37%, respectively). NCBI BLAST calls Adaia and Atraxa hits with small e-values (<10^-9) and decent identity (>35%). No hits present in CDD for amino acid sequence. No hits in HHpred with an e-value < 10^-2. 
/note=Transmembrane domains: TMHMM calls 1 transmembrane domain with a length of 18 amino acids (between aa79-96), which is within the TMD length range SEA-PHAGES requires to call a gene with a membrane protein function /note=Secondary Annotator Name: Rodriguez, Justin /note=Secondary Annotator QC: I agree with the location call based on reasoning from covered coding potential, final score, and the start site being the most annotated in Starterator. No function is supported based on the evidence provided (no hits in CDD or HHpred). Calling it a membrane protein is supported based on SEA-PHAGES guidelines, which is good to include. Check the TmHmm prediction box as evidence since it is a membrane protein. Make sure to fill out the coding capacity dropdown box and the synteny box, but other than that good work! CDS 13114 - 13341 /gene="18" /product="gp18" /function="helix-turn-helix DNA binding domain" /locus tag="MooKitty_18" /note=Original Glimmer call @bp 13114 has strength 3.52; Genemark calls start at 13153 /note=SSC: 13114-13341 CP: yes SCS: both-gl ST: NI BLAST-Start: [type II toxin-antitoxin system HicB family antitoxin [Candidatus Rubneribacter avistercoris]],,NCBI, q7:s78 80.0% 0.0101911 GAP: 162 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.596, -6.385349498381551, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[type II toxin-antitoxin system HicB family antitoxin [Candidatus Rubneribacter avistercoris]],,HIY82835,25.3165,0.0101911 SIF-HHPRED: a.6.1.5 (B:1-69) automated matches {Escherichia phage [TaxId: 10710]} | CLASS: All alpha proteins, FOLD: Putative DNA-binding domain, SUPFAM: Putative DNA-binding domain, FAM: Terminase gpNU1 subunit domain,,,SCOP_d6hn7b1,72.0,99.3 SIF-Syn: No synteny for this gene since it is an orpham. /note=Primary Annotator Name: Rodriguez, Justin /note=Auto-annotation: Coding potential is called for this gene in both Glimmer and GeneMark. They do not agree on the same start site. 
Glimmer calls 13114 while GeneMark calls 13153. The start codon of 13114 is GTG while for 13153 it is ATG which is not helpful in differentiating the likelihood of the starts. /note=Coding Potential: There is reasonable coding potential as shown by Self-trained and Host-trained GeneMark graphs. The proposed start of 13114 covers all the coding potential of the gene for both graphs. /note=SD (Final) Score: The final score for start 13114 is not the best, but it is not the worst either; it is -6.385. It is somewhat reasonable. /note=Gap/overlap: The gap for the proposed start of 13114 is 162 between it and the upstream gene; perhaps a new gene can be added. GeneMark graphs show that it would be a reverse gene in a field of forward genes however. There are multiple potential start sites for this gene, but the proposed start of 13114 has one of the longest ORFs of 228 bp. The start that would produce the longest ORF of 246 bp is not present on either GeneMark graph. /note=Phamerator: In pham 61816 as of 4/11/2023. Only member in pham so it is an orpham. No gene function called /note=Starterator: No Starterator report since it is an orpham /note=Location call: Evidence supports 13114 being the start site. This start site covers all the coding potential for this gene. This proposed start site creates a reasonably long ORF, and the Final Score is better than most starts. This is most likely a real gene taking coding potential and calls from Glimmer and GeneMark into consideration. /note=Function call: Predicted function is helix-turn-helix DNA binding domain protein, based on a hit from CDD with an E-value of 9.54e-04. HHpred also has multiple hits with probabilities above 98% and low E-values (smaller than 10e-3). Still, there are no significant hits with low e-values in NCBI or PhagesDB BLASTs. There were hits for winged helix-turn-helix DNA binding domain proteins as well, but there is evidence for a helix-turn-helix DNA binding domain protein for now. 
/note=Transmembrane domains: There are no TMDs according to DeepTMHMM. Looking at pham maps in PECAAN, genes upstream and downstream of this gene in other phages do not have functions so that does not provide evidence for this gene`s transmembrane status. /note=Secondary Annotator Name: Nguyen, Angelynn /note=Secondary Annotator QC: I agree with the conclusions! Great job, especially considering that this was a weird gene with not a lot of hits in the BLAST databases. CDS 13338 - 13517 /gene="19" /product="gp19" /function="membrane protein" /locus tag="MooKitty_19" /note=Original Glimmer call @bp 13338 has strength 6.76; Genemark calls start at 13338 /note=SSC: 13338-13517 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.031, -2.90473872338446, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: There is no synteny. This gene is an orpham. /note=Primary Annotator Name: Ortiz-Gomez, Diana /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 13338. /note=Coding Potential: The chosen start site includes all the coding potential and is in the forward strand. /note=SD (Final) Score: This start site has the best final score of -2.905 and a Z-score of 3.031. /note=Gap/overlap: There is an overlap of 4bp which is acceptable and potentially indicates that this gene is part of an operon. There is no synteny. /note=Phamerator: This gene is found in pham 61795 (04/09/2023) and MooKitty is the only phage in this pham with a length of 180bp. /note=Starterator: Since this gene is an orpham, there is no starterator page to compare it to. /note=Location call: The evidence makes it difficult to confirm if this is a real gene (there are no non-draft members that call this gene real). However, there is good coding potential. The best start site for this gene, if real, would most likely be at 13338 since this start has the best final score and is agreed by both Glimmer and GeneMark. 
/note=Function call: There are no hits for NCBI BLASTp, CDD, and phagesDB BLAST. HHpred has very poor hits with very high e-values. Given the lack of informative hits, there is no known function for this gene. /note=Transmembrane domains: There are two TMDs called by TMHMM, meaning this might be a membrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 13541 - 13735 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="MooKitty_20" /note=Original Glimmer call @bp 13541 has strength 14.16; Genemark calls start at 13541 /note=SSC: 13541-13735 CP: no SCS: both ST: NA BLAST-Start: GAP: 23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.526, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: SCRIVEN, SAVANNAH /note=Auto-annotation: Both Glimmer and GeneMark call start site as 13541bp with ATG start codon. /note=Coding Potential: Coding potential is high throughout ORF with a couple small dips. Auto-annotated start @ 13541bp cuts off some high coding potential and has a 23bp gap with the preceding gene. Upstream start site @ 13454 maximizes coding potential and has a -64bp overlap with the preceding gene. Other AX phage genomes (Adaia, Sputnik, Atraxa) in this region have bp overlaps of only a few basepairs. /note=SD (Final) Score: Auto-annotated start site @ 13541bp has highest Final Score (-1.993) and Z score (3.526). Start site @ 13454 has a Final Score of -5.228 and Z score of 2.12. /note=Gap/overlap: The auto-annotated start site @ 13541bp is 195bp and leaves a 23bp gap with the preceding gene which is the shortest gap or overlap. Start site @ 13454bp is 282bp and has a -64bp overlap with the preceding gene. AX phage genomes are most similar to MooKitty and seem to limit gaps and overlaps to no more than 15bp. A -64 overlap is large and may not be favorable even if the gene is longer. /note=Phamerator: (4/10/2023) Pham 61828. 
Gene is an orpham - single member of this pham. /note=Starterator: Starterator not informative. /note=Location call: Above evidence suggests this is a real gene as there is good coding potential and it makes more sense for the genome to have a gene here to maximize genome space. Manual annotators agreed with the auto-annotated start site call @ 13541bp because it minimizes gap/overlap with preceding gene and has the most favorable Final and Z Scores. /note=Function call: No Known Function. PhagesDB and NCBI did not have any informative BLASTp hits (e values 2.5 - 5.5). No CDD hits. Top HHPRED hits had large e values. Overall lack of functional evidence to assign specific function. /note=Transmembrane domains: DeepTMHMM predicts 0 TMDs. Gene is completely inside - not a membrane protein. /note=Secondary Annotator Name: BERBER-PULIDO, RODRIGO /note=Secondary Annotator QC: Great job again with your notes! I had a similar case with one of my genes and I also determined it was NKF. It also makes sense why that start site was chosen. Overall, great job at writing good notes. Just don`t forget to fill out the Synteny box CDS 13838 - 14080 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="MooKitty_21" /note=Original Glimmer call @bp 13838 has strength 10.7; Genemark calls start at 13838 /note=SSC: 13838-14080 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PP340_gp19 [Arthrobacter phage Adaia] ],,NCBI, q4:s1 93.75% 5.55877E-29 GAP: 102 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.419, -6.161165465030078, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP340_gp19 [Arthrobacter phage Adaia] ],,YP_010649375,79.2208,5.55877E-29 SIF-HHPRED: SIF-Syn: There is not that much synteny between the genes immediately upstream and downstream this gene. The most synteny seen with MooKitty is with Adaia which also has a membrane protein upstream and a similar gene with NKF downstream. 
/note=Primary Annotator Name: Nguyen, Angelynn /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is 13838. /note=Coding Potential: There is good coding potential throughout the ORF in the forward direction based on the host and self-trained genemark graphs. Moreover, there is no coding potential in the gap before and after the gene. /note=SD (Final) Score: The start site at 13790 has a final score of -6.089, which is the third highest (closest to 0). The final score closest to 0 is -3.980, which corresponds to a start at 13847. The autoannotated start site (13838) has a final score of -6.161. /note=Gap/overlap: There is a gap of 102 basepairs before the gene and 68 basepairs after the gene. Even though these are rather large gaps, this is acceptable since there is no coding potential in these gaps and this sort of gap is conserved in phages like Adaia. Therefore, a gene does not need to be added. /note=Phamerator: The pham number as of 4/4/2023 is 72486. The pham has 2 members total with 1 draft. The gene is conserved in phage Adaia. However, this gene is not in the same cluster as the other gene (which is in AX) since it currently has no cluster according to the phagesdb phamerator. /note=Starterator: There were two start numbers, 3 and 4. Start 4 was the most annotated and this start number was manually annotated in 1 of 1 non-draft genes in this pham. Start site 3 was found in MooKitty_21, which had no manual annotations but corresponds to the start site of 13838. /note=Location call: All of the evidence above indicates that the gene is real and the start site is 13838. Although the final score that corresponds to the start site of 13838 is not the closest to 0, the starterator, Glimmer, and Genemark all agree on this site. /note=Function call: The top non-draft Phagesdb BLAST hit has an unknown function with an e-value of 3e-23. 
The rest of the hits are portal proteins with relatively high E-values of 3.3. The NCBI BLAST had one hit that indicates that the gene is a hypothetical protein (93% coverage, 68% identity, and E-value 6e-29). HHpred had a few hits but all of them had very high E-values that were greater than 15. Lastly, CDD had one hit with an E-value of 5.27e-03 which indicates that the gene could be a synthetase. Considering all of the evidence above, it is likely that this gene has no known function since all the relevant hits indicate that it is a hypothetical protein. /note=Transmembrane domains: DeepTMHMM suggests with near 100% certainty that the protein is found inside. There are no transmembrane domains. /note=Secondary Annotator Name: Okumura, Joey /note=Secondary Annotator QC: I corrected a typo in the synteny box (NFK to NKF) + corrected typo in location call section (Although the final score that corresponds to the start site of [6396 --> 13838]) /note=For the Phamerator section, MooKitty belongs in cluster AX. CDS 14127 - 14342 /gene="22" /product="gp22" /function="hypothetical protein" /locus tag="MooKitty_22" /note=Original Glimmer call @bp 14148 has strength 5.28; Genemark calls start at 14148 /note=SSC: 14127-14342 CP: yes SCS: both-cs ST: NA BLAST-Start: GAP: 46 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.382, -6.295851664414997, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Unanwa, Nnaemeka /note=Auto-annotation: Called by GeneMark and Glimmer. ATG start codon for start@14148 predicted by GeneMark and Glimmer, which is common. TTG codons are also predicted for start@14127, start@14136. /note=Coding Potential: The gene is mostly covered by coding potential in both Host Trained and Self Trained GeneMark. Lacks synteny with Adaia, Atraxa, and Sputnik. No strong hits on PhagesDB. This seems to be a singleton in its pham. 
/note=SD (Final) Score: Z-score of 1.382 and RBS score of -6.296 for start@14127. These values are both not as satisfactory as they could be (Z-score is less than 2, RBS is relatively far from 0). However, this seems to be the most likely start because it leaves the smallest gap while still being at least 120 BP long. The predicted start@14148 with the ATG codon has more favorable scores, but it leaves a rather large gap between this gene and gene 21. In addition, start@14325 has a Z-score of 2.147 (which would be considered "good"), but it would cause the gene to only be 17 BP long, which is not reasonable. /note=Gap/overlap: 46 BP. This is a reasonable gap; no gene can be inserted here. /note=Phamerator: This gene is a singleton in the pham 61809. There is nothing else to compare it to and there is no phamerator report. /note=Starterator: A starterator report is not available for this pham. /note=Location call: The most likely start is start@14127 because it leaves the smallest gap and creates a gene that is at least 120 BP long. There is also a lack of starterator and phamerator reports, so there is a limited amount of evidence that can be used. This seems to be a real gene. It is called by GeneMark and Glimmer. There is coding potential for this gene in both the Self and Host Trained GeneMark. Only one strand is used for this gene (the forward strand), and there are no switches in orientation with the surrounding genes. The most likely start means that the gene would have a length over 120 BP. However, the gene lacks synteny with other phage genes, and there are no strong hits on the PhagesDB Blast. /note=Function call: NKF. All data that would give evidence of function either has no results (CDD, NCBI Blast, Starterator and Phamerator Reports) or unacceptably high e-values (PhagesDB Blast, HHPred). /note=Transmembrane domains: DeepTMHMM gives a high probability that the protein stays inside, and no transmembrane domains were detected. 
This protein does not interact with the cell membrane. /note=Secondary Annotator Name: Nguyen, Angelynn /note=Secondary Annotator QC: I agree with your conclusions. Good job with your descriptive notes. Remember to fill out the synteny box! CDS 14339 - 14515 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="MooKitty_23" /note=Original Glimmer call @bp 14339 has strength 3.04 /note=SSC: 14339-14515 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein PP340_gp25 [Arthrobacter phage Adaia] ],,NCBI, q1:s1 84.4828% 2.63154E-7 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.345, -4.336903751127196, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP340_gp25 [Arthrobacter phage Adaia] ],,YP_010649381,52.3077,2.63154E-7 SIF-HHPRED: SIF-Syn: Contains synteny with only one other phage, Adaia. The gene in this other phage also has an unknown function. /note=Primary Annotator Name: Berber-Pulido, Rodrigo /note=Auto-annotation: Called only by Glimmer, which suggested start site @14339. /note=Coding Potential: This gene has good coding potential overall, but, towards the end, there is some coding potential in the reverse direction. This is seen in both host-trained and phagesdb genemark. The suggested start site also covers all the coding potential. /note=SD (Final) Score: The suggested start site @14339 has a z-score of 2.345 and a final score of -4.337. Overall, these results are strong evidence that this start site is the correct one. /note=Gap/overlap: With this start site there is an overlap of 4 bp, which is the smallest gap/overlap out of the potential start sites. This start site also results in the longest length for this gene. /note=Phamerator: Phamerator analysis was run on 4/6/23, shows that it is a part of AX cluster. Cluster number: 23733. This cluster has 2 members with 1 draft. This gene is also conserved as it is seen in the only other member: Adaia_25. 
/note=Starterator: The starterator analysis was run on 4/6/23. Start site #1 @14339 has 1 MA while the others do not have any, and when comparing the other factors previously mentioned, it supports that this start site @14339 is the best for this gene. As there is only one other member in this pham, there is no conserved start site and thus, there isn’t a start site to compare to. /note=Location call: Start 1 @14339 and the gene is real. /note=Function call: NCBI and Phagesdb BLAST both show hypothetical protein/function unknown, as the only comparison that they made was to the gene in Adaia. The E-value for NCBI was 3e-7 while the phagesdb had an E-value of 9e-8. CDD had no hits. HHpred had some results; however, they will not be taken into consideration as their E-values are over 2 and thus not reliable. Overall, I am confident that this protein is a hypothetical one. /note=Transmembrane domains: There is no evidence from DeepTMHMM that shows this is a transmembrane protein, as it predicts that the entire protein is inside. /note=Secondary Annotator Name: Ortiz-Gomez, Diana /note=Secondary Annotator QC: I agree with the location and function call. All boxes have been checked and completed. CDS 15286 - 15675 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="MooKitty_24" /note=Original Glimmer call @bp 15286 has strength 7.31; Genemark calls start at 15286 /note=SSC: 15286-15675 CP: no SCS: both ST: NA BLAST-Start: [hypothetical protein PP340_gp27 [Arthrobacter phage Adaia] ],,NCBI, q4:s14 62.0155% 3.08636E-18 GAP: 770 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.653, -4.955328222098073, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP340_gp27 [Arthrobacter phage Adaia] ],,YP_010649383,50.8929,3.08636E-18 SIF-HHPRED: SIF-Syn: The synteny with other genes is not very well conserved. This gene itself does not seem similar to any other gene in the other 3 phages except for Adaia. 
The other phages have smaller genes with unknown functions that MooKitty does not have, or vice versa. /note=Primary Annotator Name: Bursulaya, Isabelle /note=Auto-annotation: Both Glimmer and GeneMark were used and agreed on the start site at 15286, which begins with a start codon of ATG. /note=Coding Potential: The coding potential is strong in the forward direction, but it does not cover the proposed start site of 15286. Instead, it starts around 15300. There is no evidence of coding potential in the reverse direction. The coding potential does dip a little bit, but it overall looks strong. /note=SD (Final) Score: The proposed start site of 15286 has a final score of -4.955 and a Z score of 2.653, but this start site does not have the best scores. The start site at 15313 has a final score of -2.584 and a Z score of 3.266, which are the best overall. This site also has a start codon of ATG. /note=Gap/overlap: The gap at 15286 is 770, which is extremely large. The gap at 15313 is 797, which is also extremely large, meaning another protein coding gene can potentially fit there. When looking at the Pham map, there appears to be a large gap between this gene and the previous one in all three of the other phages (Adaia, Atraxa, Sputnik). /note=Phamerator: As of 4/9/2023, the Pham number is 61759. The only member of this Pham appears to be the draft of MooKitty, suggesting that either this gene is an orphan or does not exist. /note=Starterator: This was very strange because when I clicked the link to Starterator, the page was not even found. It does not seem that this gene has a Starterator page, providing further evidence that this gene is an orphan or is not real. /note=Location call: Because there is no evidence so far that this gene is real, I will keep the auto-annotated start site at 15286, which both Glimmer and GeneMark agreed upon. However, I do not know if this is the best start site possible. This start site also grants the largest possible ORF. 
/note=Function call: PhagesDB BLASTp: Adaia_27 is the only good hit with an e value of 2e-16 and an alignment score of 82. It lists the function as unknown. NCBI BLASTp: Adaia is the only hit with a percent identity of 56.47%, which isn’t very high but is still above 35%. The e value was 3e-18 and the function is listed as “hypothetical.” The percent identity was 42.9% and the coverage was 62%. CDD: CDD did not have any hits at all. HHpred: Although a few hits had high probabilities, they all had poor e values of 1.1 or higher. This makes me skeptical about any evidence found in HHpred, so no evidence would be selected for this section. For now, the function of this gene is probably NKF. /note=Transmembrane domains: There was a 100% probability that the protein was completely inside the membrane. Without any information about what the function of this protein is, I don’t know whether this makes sense or not. /note=Secondary Annotator Name: Okumura, Joey /note=Secondary Annotator QC: The gaps were switched for the Gap/Overlap section so I corrected this. I agree that this one is tricky. Since the other phages also seem to have a gene with a similarly large gap, I agree that this can be kept as a real gene for now with the autoannotated start site. 
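The gap/overlap and gene length values quoted in these notes can be checked with simple coordinate arithmetic. The sketch below is only an illustration, not part of PECAAN or DNA Master: the helper names `gap_before` and `gene_length` are hypothetical, and it assumes forward-strand genes with 1-based inclusive coordinates. Under that assumption it reproduces the GAP values in the SSC lines above (for example, 770 for gene 24 and -4 for gene 23).

```python
def gap_before(start: int, prev_stop: int) -> int:
    """Number of bases between the previous gene's stop and this gene's start.

    Assumes forward-strand genes with 1-based inclusive coordinates.
    A negative result is an overlap of that many bases.
    """
    return start - prev_stop - 1


def gene_length(start: int, stop: int) -> int:
    """Length of a gene in bp, counting both end coordinates."""
    return stop - start + 1


# Gene 24: compare the two candidate starts against the stop of gene 23 (14515).
prev_stop = 14515
stop = 15675
for candidate_start in (15286, 15313):
    gap = gap_before(candidate_start, prev_stop)
    length = gene_length(candidate_start, stop)
    print(f"start {candidate_start}: gap {gap} bp, length {length} bp")
```

A later start always widens the gap and shortens the gene by the same number of bases, which is why the gap and length criteria pull in opposite directions when comparing candidate starts.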
CDS 15668 - 15949 /gene="25" /product="gp25" /function="HNH endonuclease" /locus tag="MooKitty_25" /note=Original Glimmer call @bp 15758 has strength 3.43 /note=SSC: 15668-15949 CP: yes SCS: glimmer-cs ST: NA BLAST-Start: [HNH endonuclease [Arthrobacter phage Adaia] ],,NCBI, q1:s1 98.9247% 2.52312E-38 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.334, -4.629474642709059, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Adaia] ],,YP_010649384,75.7895,2.52312E-38 SIF-HHPRED: HNH endonuclease domain protein; CRISPR-Cas, Cas9, HNH, RuvC, RNA-guided DNA endonuclease, cytoplasmic, Hydrolase; HET: SPD; 2.201A {Actinomyces naeslundii},,,4OGE_A,69.8925,97.8 SIF-Syn: MooKitty upstream gene has NKF and upstream genes for all other phages are still waiting for function to be called. No downstream genes. /note=Primary Annotator Name: Okumura, Joey /note=Auto-annotation: Glimmer has 15758 start with TTG codon. No Genemark start. /note=Coding Potential: start site covers good coding potential in second forward ORF → similar results in host and self trained GeneMark /note=SD (Final) Score: FS of -4.554 and z score of 2.525 for 15758 with TTG (best FS score and average z score). FS of -4.629 and z score of 2.334 for 15668 with ATG (second best FS score and average z score). /note=Gap/overlap: 82 gap for start 15758 = on the larger side but likely too small for another gene. (But other phages have minimal gap before gene + gene is shorter in MooKitty than other phages if this is called as start site.) -8 gap for start 15668. /note=Phamerator: pham number 72719. Date 2/07/2023. Pham has 4 members, only draft is MooKitty. The gene is conserved in the 3 nondraft phages. This gene is found in phages that are in the same cluster as MooKitty (AX); found in Adaia, Atraxa, and Sputnik. /note=Starterator: Start site 9 in Starterator was called in 2/3 non-draft genes in the pham. MooKitty start 9 @ 15668 has 2 MAs. 
This evidence disagrees with the site predicted by Glimmer. /note=Location call: Based on above evidence, this is likely a real gene with the start site at 15668. /note=Function call: Phages DB: HNH endonuclease for multiple phages with e values < E-30 and >200 alignment. NCBI BLAST: HNH endonuclease for multiple phages with < E-29 e values, >60% percent identity, > 90% coverage. CDD had one hit of HNH endonuclease (e value of E-7, lower coverage but still >35%) + second hit for different type of endonuclease but e>E-3. (Some CDD hits in PECAAN not present on CDD website.) HHpred had various hits with HNH or CRISPR associated endonuclease. Likely an HNH endonuclease since this is the most common result. /note=Transmembrane domains: TMHMM does not call any transmembrane domains in this region of the genome and it makes sense for an endonuclease to be found inside the cell and not on the membrane. /note=Secondary Annotator Name: Bursulaya, Bella /note=Secondary Annotator QC: Great, thorough notes! For the synteny box, I would fill out downstream genes and their functions. I see that you said CDD only has one hit on the website, but PECAAN shows more than 1 hit. I am not sure how well the websites transfer data over to each other, so I would ask Dr. Freise what to do in this situation. For the transmembrane domain section, I would include a little more evidence as to how you came to your conclusion. However, I agree with your final classification of the gene as an HNH endonuclease, and the start site that you chose. Make sure to change the tab under pham starterator to "Not informative".