CDS 15 - 272 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="JulietS_1" /note=Original Glimmer call @bp 15 has strength 12.49; Genemark calls start at 15 /note=SSC: 15-272 CP: yes SCS: both ST: SS BLAST-Start: [GP1 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 5.00551E-57 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.908, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[GP1 [Mycobacterium phage Cali] ],,YP_002224478,100.0,5.00551E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark list the start site of this gene at 15, with the start codon being GTG. /note=Coding Potential: The gene demonstrates strong coding potential in the forward direction only, being completely covered by the start site. Additionally, coding potential is also demonstrated equally in host-trained and self-trained GeneMark. /note=SD (Final) Score: The final score for the gene was -2.906 with the z-score of 2.908, making it the best candidate. /note=Gap/overlap: There is no gap, this is because it is the first gene. /note=Phamerator: It belongs to pham 574, dated 1/8/2023. It is conserved, also being found in Momo and Phlegm. /note=Starterator: Start site 1 was manually annotated in 154 of the 154 non-draft genes in the pham. This start site is 15 in JulietS, which agrees with the auto-annotation by Glimmer and GeneMark. /note=Location call: Based on the fact that Glimmer and GeneMark share the same start site, 15, which is validated by starterator, and the gene is conserved in Momo and Phlegm, this gene is most likely real. Other considerations include the fact that the start site 15 covers the entire gene for coding potential. /note=Function call: NKF. 
The NCBI BLAST produced one significant hit, which was listed as a hypothetical protein (NKF), with a probability of 100%, an e-value of 5.01e-57, and percent coverage of 100%, indicating that the gene also does not have a known function. Additionally, PhagesDB BLAST listed two hits, BadAgartude and Breeniome, each with an e-value of 1e-45 and no known function. HHpred did not have a significant hit, and neither did CDD. Given the statistics of the single NCBI BLAST hit, this is most likely a protein with no known function. /note=Transmembrane domains: DeepTMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the primary annotator's location call and function call based on the provided evidence. CDS 269 - 469 /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="JulietS_2" /note=Original Glimmer call @bp 269 has strength 4.13; Genemark calls start at 269 /note=SSC: 269-469 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SHAQNATO_2 [Mycobacterium phage Shaqnato]],,NCBI, q1:s1 100.0% 3.52625E-41 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.485, -6.094858505415773, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SHAQNATO_2 [Mycobacterium phage Shaqnato]],,QAY04966,100.0,3.52625E-41 SIF-HHPRED: SIF-Syn: /note=FA: Function call changed to NKF because DeepTMHMM predicted no TMDs, only a signal peptide. /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 269 (ATG). /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF, and the chosen start site covers all of this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. 
/note=SD (Final) Score: -6.095, which is the third-best score on PECAAN. The Z-score is 1.485, which is also not strong. I didn't choose a different candidate, though, because this start site has stronger start codon and Starterator evidence. /note=Gap/overlap: 4 bp overlap. Reasonable, evidence of an operon, acceptable gene length. I didn't choose a start site that would make a longer ORF because both Glimmer and GeneMark agreed on the current start site and all other final scores are lower than the current start site. /note=Phamerator: Pham 556. Date 01/08/2023. It is conserved, found in Phlegm and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 3 with a base pair coordinate of 269. 168 of 168 call start #3. /note=Location call: Considering all of the evidence above, this is a real gene with a start site at 269 bp. /note=Function call: The top three PhagesDB BLAST hits have unknown function (e-value < 10^-33). No conserved domains were identified for this query sequence. The NCBI hit for phage Shaqnato supports the NKF call. /note=Transmembrane domains: TOPCONS and TMHMM predicted 1 TMD, but DeepTMHMM predicted no TMDs. This is most likely a signal peptide according to DeepTMHMM. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree with this gene's function call and location. Just make sure you add DeepTMHMM. 
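The GAP values and gene lengths quoted in these records can be reproduced from the CDS coordinates alone. The following is a minimal Python sketch, assuming inclusive 1-based coordinates on the forward strand and the convention gap = next start - previous stop - 1. That convention is inferred from the values in these records rather than taken from PECAAN documentation, and the function names are hypothetical.

```python
def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases between two adjacent forward genes; a negative value is an overlap."""
    return next_start - prev_stop - 1


def length_bp(start: int, stop: int) -> int:
    """Gene length in bp for inclusive coordinates."""
    return stop - start + 1


# (start, stop) pairs copied from the CDS lines for gp1, gp2, and gp3.
genes = [(15, 272), (269, 469), (473, 1339)]

for (_, prev_stop), (next_start, next_stop) in zip(genes, genes[1:]):
    print(
        f"start {next_start}: gap {gap_bp(prev_stop, next_start)} bp, "
        f"length {length_bp(next_start, next_stop)} bp"
    )
```

With these inputs the sketch reports a gap of -4 bp for gp2 and 3 bp for gp3, matching the GAP fields in the records.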
CDS 473 - 1339 /gene="3" /product="gp3" /function="nucleotidyl transferase" /locus tag="JulietS_3" /note=Original Glimmer call @bp 473 has strength 13.61; Genemark calls start at 473 /note=SSC: 473-1339 CP: yes SCS: both ST: SS BLAST-Start: [nucleotidyl transferase [Mycobacterium phage EasyJones]],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.889, -3.711231203526307, yes F: nucleotidyl transferase SIF-BLAST: ,,[nucleotidyl transferase [Mycobacterium phage EasyJones]],,QZM07121,99.3056,0.0 SIF-HHPRED: RlaP ; RNA repair pathway DNA polymerase beta family,,,PF10127.12,83.6806,100.0 SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Both Glimmer Start and GeneMark Start have the same call; start of 473 and start codon of ATG. /note=Coding Potential: There is strong coding potential present within the host-trained gene mark within the coding range of the gene. Within the self-trained GeneMark the coding potential also presents a strong potential. Coding potential is found in the forward direction on the second row. /note=SD (Final) Score: -3.711 ; it is not the best final score, but when compared to the other options this is the best score available. The z-score is 2.889 which is not strong but still better than others provided. /note=Gap/overlap: There is a gap of 3. Which is reasonable. Compared to the other start sites which had gaps that range from 42 to above 300. /note=Phamerator: Completed on 1/25/23 ; there are 234 phages that are comparable within the pham of 65577. They fall within cluster C, GD, L, M, CV, CY, DB, V, and DU. 8 of the 234 phages are draft genomes, so only 228 are valuable to be compared to (ex. Ading, ArcherS7, and Audrick) /note=Starterator: Completed on 1/10/23; Pham 65577 ;Yes there is a conserved start site. Number 19. Keeping with the start site of 473, 94 of the 214 MA phages call this start site as well for the gene. 
It is also called 99.0% of the time when it is present in the gene. This start site is highly conserved within the pham. /note=Location call: 473 ; Confirms that this is a real gene /note=Function call: nucleotidyl transferase ; The BLAST results provided strong evidence for the gene to be a nucleotidyl transferase or HNH endonuclease. There was evidence within the CDD for nucleotidyl transferase, but the % aligned and % identity are low. There are over 30 aa present within the sequence so HNH endonuclease is the most likely function. In the PhagesDB BLAST, phage Blackbrain calls the function HNH endonuclease with an e-value of e-167. The NCBI BLAST presented the HNH endonuclease from phage Blackbrain with 97.22% identity, 99.3056% aligned, and an e-value of 0. HHpred presents nucleotidyl transferase with 97.8% probability for 6UN8_B and 97.5% probability for 3C18_C. It is likely that there is an HNH endonuclease present within the nucleotidyl transferase. (https://seaphages.org/forums/topic/4841/) /note=Transmembrane domains: DeepTMHMM presented no transmembrane domains so it is likely not a membrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: I agree with the primary annotator's location call and function call based on the provided evidence. For Phamerator: the total number of drafts listed is 8, but the report states 20 ; for Starterator: the site number is also different and the Starterator report date is wrong ; agree with function call /note=Tertiary QC: Fadi Albanaa. HHpred hits unchecked (non-significant). Phage Blackbrain unchecked for PhagesDB BLAST. 
CDS 1339 - 1590 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="JulietS_4" /note=Original Glimmer call @bp 1369 has strength 8.25; Genemark calls start at 1390 /note=SSC: 1339-1590 CP: yes SCS: both-cs ST: SS BLAST-Start: [gp4 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.00256E-52 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.798, -5.151368342989444, no F: hypothetical protein SIF-BLAST: ,,[gp4 [Mycobacterium phage Bxz1] ],,NP_818080,98.7952,1.00256E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark call different sites; Glimmer Start is 1369; Genemark Start is 1390; The start site I am going with is 1339; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 1 (indicating that this is a forward gene). The chosen start site covers this ORF. Coding potential is found in both Genemark Self and Host. There is synteny with non-draft phages such as Astraea and Blackbrain. /note=SD (Final) Score: -5.151. SD score is not the best, but still reasonable. /note=Gap/overlap: Yes, the overlap is reasonable (1 bp). The other start site of 1369 has a reasonable gap of 29 bp while 1390 has a gap of 50 bp. However, the small overlap of 1 bp at start site 1339 is preferable due to other factors as well. /note=Phamerator: Date of investigation: 1/10/23; Pham 691; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Astraea and Blackbrain); /note=Starterator: Yes, there is a conserved start site choice. It is start number 9 with a base pair coordinate of 1339. Has 115 MA’s. Found in 144/144 (100%) of genes in pham. This disagrees with both the Glimmer and Genemark calls, but according to Phamerator and Starterator is conserved. E-values are good at -44. /note=Location call: Yes, the evidence suggests this is a real gene. 
Start site 1339 seems most likely. /note=Function call: NKF; CDD had no hits. NCBI predicted hypothetical proteins with coverages of 100%, identities above 98%, and e-values at -52. PhagesDB did not predict functions for this gene. Phages BackyardAgain and BananaFence support the NKF conclusion since they have synteny for this gene and no known function (e-values -44). HHPred had some hits for functional proteins, but e-values were high (40-50) and coverage was low (30%). /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with this gene`s function call and location. CDS 1583 - 1762 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="JulietS_5" /note=Original Glimmer call @bp 1583 has strength 5.09; Genemark calls start at 1583 /note=SSC: 1583-1762 CP: yes SCS: both ST: SS BLAST-Start: [gp5 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.53122E-35 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: ,,[gp5 [Mycobacterium phage Bxz1] ],,NP_818081,100.0,1.53122E-35 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 1583 with a start codon of ATG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. The start site of #1583 covers all of the coding potential regions on the forward strand, which supports the forward direction of this gene. Additionally, coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs which suggest that this is a potential gene at start site #1583. 
/note=SD (Final) Score: Even though the final score is not the most negative (SD = -3.095), the Z-score is the highest of the options (Z-score = 3.062), indicating that the autogenerated start site is the better choice. Because GeneMark and Glimmer both call the same start site, there is legitimacy in choosing the autogenerated site, especially with a Z-score over 2, as it allows the gene to be the longest reasonable length. /note=Gap/overlap: There is an overlap of 8 bp (gap of -8 bp), which seems reasonable when looking at the synteny of other phages in the C1 cluster against this gene in JulietS. There is also a reasonable overlap of about 4 bp between this gene and the downstream gene, which allows the gene to reach its longest length of 180 bp without too much overlap or unnecessary gaps. /note=Phamerator: The gene was found to be in Pham 1456 (01/12/2023), which is common in Cluster C1 phages, as seen in phages Ava and Ewok. There were no listed functions in connection with this pham number. However, the base pair length was conserved at 180 bp. /note=Location call: There is a reasonable and highly conserved start site, examined on 01/12/2023, at (9, 1583), which was called by 61 of the 61 non-draft genes among the 68 total pham members. /note=Function call: The gene seems to have no known function. In the BLASTp on PhagesDB.org, it has two matches, phages Sauce and Iota, that have an e-value of 10^-29 and 100% positives with no known function (NKF). Additionally, the NCBI BLASTp also indicates no known function (NKF), with an e-value of 10^-35 and a 100% match with phage Bxz1. There are additional matches with a phage named MoMoMixon that indicate a hypothetical protein, but this is not enough to determine a function. 
HHpred and CDD both give inconclusive results, with CDD giving no hits and HHpred's best hit (zinc finger) having an e-value of 9.9 and a probability of 50. This indicates a high probability that the function of this gene is not known. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, so this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains, placing the protein inside the cell. /note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the call of the annotator, both in the gene's location and function. CDS 1759 - 2151 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="JulietS_6" /note=Original Glimmer call @bp 2020 has strength 5.69; Genemark calls start at 1759 /note=SSC: 1759-2151 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein SEA_PETERSON_79 [Mycobacterium phage Peterson]],,NCBI, q1:s1 53.0769% 1.29085E-4 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.564, -4.536085090614939, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PETERSON_79 [Mycobacterium phage Peterson]],,UVK60627,32.3308,1.29085E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark call different starts. Glimmer start 2020. GeneMark start 1759. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the start site. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -4.536. It is the best final score on PECAAN. Z-score is 2.564. /note=Gap/overlap: 4 bp overlap (gap of -4 bp). It is small and reasonable, and suggests an operon. /note=Phamerator: Pham: 54992. Date 1/12/2023. All 7 members of this pham are draft genomes and may not be convincing evidence. Conserved in phages Concombre_16 and Spec_12. 
/note=Starterator: Start site 9 in Starterator had no manual annotations. This may not be convincing evidence. Start 9 is 2020 in JulietS. This evidence agrees with the site predicted by Glimmer. Starterator is not informative. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 1759. The GeneMark start site has more convincing evidence. Even though there is no synteny with any other phage genomes, the coding potential, Final Score, overlap, and Z-score heavily support the GeneMark site. Glimmer is supported by Starterator, but no manual annotations have been made. /note=Function call: NKF. There is not enough evidence to suggest a function for this protein, as neither NCBI nor PhagesDB had any significant hits. Function unknown was listed for every PhagesDB hit and no e-values were close to zero (all were very high). Hypothetical proteins were listed for NCBI, with some e-values close to zero. Example: Mycobacterium phage Sagefire. E-value = 0.002. HHpred and CDD had zero hits. /note=Transmembrane domains: 0 predicted TMRs from DeepTMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 2148 - 2384 /gene="7" /product="gp7" /function="hypothetical protein" /locus tag="JulietS_7" /note=Genemark calls start at 2148 /note=SSC: 2148-2384 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein KHO63_gp007 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 5.17433E-49 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.496, -3.769471247018311, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp007 [Mycobacterium phage QBert] ],,YP_010058106,100.0,5.17433E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: GeneMark calls the start site at bp 2148, but Glimmer does not call the gene at all. 
The start codon is GTG. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORFs, and the chosen start site covers all of this coding potential. /note=SD (Final) Score: -3.769 is the best choice of SD (final) score. It is the second highest (second least negative) value, but it has a significantly smaller overlap than the other start sites and doesn't include a large gap, unlike the start site with the highest SD value. Z-score is above 2. /note=Gap/overlap: The 4 base pair overlap with the upstream gene is reasonable since it is below the threshold of 50 bp. This suggests that it could be part of an operon. /note=Phamerator: Gene found in pham 65591 on 1/12/23. When compared to phages HyRo, Pinkcreek, and Stubby, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. There is no synteny with other non-draft phages, but the gene still exists in phages of the same cluster. /note=Starterator: Start: 74 @2148 has 185 MAs. The start number called the most often in the published annotations is 74; it was called in 185 of the 228 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 2148 seems most likely. /note=Function call: NKF. The top three PhagesDB BLAST hits (IkeLoa, QBert, and RoMag) have no known function (E-value <10^-39), and the top 3 NCBI BLAST hits also have no known function (100% coverage, 99%+ identity, and E-value <10^-115). There were no CDD hits. There were no HHpred hits that fulfilled the requirements of probability > 80%, coverage >40%, and an e-value <10^-3. /note=Transmembrane domains: None of TMHMM, DeepTMHMM, or TOPCONS predicts any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the location and function call; however, do include the Starterator report date and DeepTMHMM predictions. CDS 2368 - 2484 /gene="8" /product="gp8" /function="membrane protein" /locus tag="JulietS_8" /note=Original Glimmer call @bp 2368 has strength 3.94; Genemark calls start at 2368 /note=SSC: 2368-2484 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein MOMOMIXON_11 [Mycobacterium phage MoMoMixon] ],,NCBI, q1:s1 100.0% 2.78232E-18 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.077, -4.585118207934747, no F: membrane protein SIF-BLAST: ,,[hypothetical protein MOMOMIXON_11 [Mycobacterium phage MoMoMixon] ],,YP_009017347,100.0,2.78232E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site of 2368 with start codon TTG /note=Coding Potential: No coding potential at the beginning of this gene in either host-trained or self-trained GeneMark. Coding potential begins around 2400 and continues to the stop site on Track 1 in the forward direction /note=SD (Final) Score: -4.585, NOT the best final score on PECAAN; however, the manual annotations and the auto-annotation justify the SD score /note=Gap/overlap: -17, indicates an operon and appears to be conserved in other phages (IkeLoa, QBert) /note=Phamerator: Pham 62087, Date 01/12/23, 169 members, 12 drafts, all pham members are cluster C including Drazdys and IkeLoa /note=Starterator: Start 4 @ 2368; 101/142 manual annotations for this start site (most MA), agrees with the call from Glimmer and GeneMark. It calls the most annotated start site /note=Location call: Based on the above evidence this is a real gene with a start site at 2368. /note=Function call: Membrane Protein. All PhagesDB BLAST hits have e-values in the range of 10^-15 and list function unknown. 
Top two NCBI results call it as a hypothetical protein and have 100% coverage and E-values in the 10^-18 range with 100% identity for the top hit and 97.4% identity for the second hit. There were no significant hits on CDD or HHpred but DeepTMHMM called one TMD that is 20 aa long /note=Transmembrane domains: DeepTMHMM predicts 1 TMR that is 20 amino acids long and is most likely an alpha TMR. This indicates that the protein may be a membrane protein /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. However, make sure to name in the Phamerator section 2-3 phages that also share that Pham (show that it`s conserved). For the Starterator section, make sure to say how many total manual annotations there were (101 manual annotations out of how many total) to further highlight that it`s the most manually annotated start site. For the Functional call, make sure to also include whether you had significant hits or not from CDD. CDS 2481 - 2687 /gene="9" /product="gp9" /function="membrane protein" /locus tag="JulietS_9" /note=Original Glimmer call @bp 2466 has strength 17.2; Genemark calls start at 2481 /note=SSC: 2481-2687 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_ROMAG_8 [Mycobacterium phage RoMag]],,NCBI, q1:s1 100.0% 5.61662E-37 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.339, -4.307065658309795, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_ROMAG_8 [Mycobacterium phage RoMag]],,QAY26705,100.0,5.61662E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Glimmer listed the start site as 2466, and Genemark listed the start site as 2481. The start codon called by Glimmer is ATG and the start codon called by Genemark is ATG. /note=Coding Potential: Coding potential is in the forward strand only, and has coding potential predicted by Host Trained and Self Trained Genemark. 
The chosen start site covers all the coding potential. /note=SD (Final) Score: The start site predicted by Genemark, 2481, has the best score, with a final score of -4.307 and a Z- score above 2, of 2.339. However, RBS score may be irrelevant as this gene is part of an operon. /note=Gap/overlap: The start site predicted by Genemark has the smallest overlap of 4 bp, which is the most favorable. /note=Phamerator: Pham 540. Date 1/11/23. It is conserved; found in RoMag_8 and ZygoTaiga_10. Function was listed as unknown on the phams database. /note=Starterator: Start site 8 @2481 or the start site predicted by GeneMark has 156 calls, while the start site at 2466 has none. /note=Location call: Based on the above evidence, this is a real gene, and the most likely start site is 2481. /note=Function call: Function is called as a membrane protein. The top three phagesDB BLAST hits have an unknown function, with e-value <-30. The top three phages on NCBI blast also are listed as “hypothetical proteins,” or proteins with unknown function. CDD states no conserved domains. HHpred does show similarities with other genes, with probability percentages of <70% (70, 60, 59%) and low percent coverage (below 50 %), as well as high e-values. However, there are two transmembrane domains shown in TMHMM and TOPCONS, so the function is called a membrane protein. /note=Transmembrane domains: 2 TMD’s are predicted in deepTMHMM, so the function is called a membrane protein. 2 transmembrane domains are consistent with a membrane protein. 
/note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with this gene`s function call and location CDS 2771 - 3454 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="JulietS_10" /note=Original Glimmer call @bp 2771 has strength 11.9; Genemark calls start at 2771 /note=SSC: 2771-3454 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ET08_9 [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 6.46134E-163 GAP: 83 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.908, -3.2925703904164987, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ET08_9 [Mycobacterium phage ET08] ],,YP_003347692,99.5595,6.46134E-163 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer and GeneMark both call 2771 as the start site. /note=Coding Potential: There is reasonable coding potential on the forward strand, indicating it is a forward gene. The coding potential covers the entire start to stop coding region, however, on GeneMark Host, towards the middle of the gene sequence the coding potential dips resulting in a short region of bad coding potential. On GeneMark Self, the gene has very good coding potential with very little dips in the potential. /note=SD (Final) Score: The Final Score and the Z-score are both above the qualifications for good scores. Z-score is 2.908 (above 2, so it is a good score) and final score is -3.293 (a relatively low negative which also indicates a good score). Both of these indicate the presence of a credible ribosome binding site. /note=Gap/overlap: The length of the gene is reasonable (684 bp) as it is above 120 bp. The gap is slightly too large (83 bp) as it is above 50 bp. However, since the gene is conserved in other phages such as ET08 and Shifa then we can conclude that this 83 bp gap is reasonable. /note=Phamerator: Pham number 60681 as of 1/08/23. It is conserved, found in ET08_9 and Shifa_7. 
/note=Starterator: Start site (@2771) is 11 and was manually annotated 154 times. This is the same as the conserved start site. There are 191 members of the pham and 154 call the same conserved start site. /note=Location call: Based on the above evidence, this is a real gene and the start site is 2771. /note=Function call: NKF. PhagesDB, NCBI BLAST, CDD, and HHpred all call no known function. The e-values for the top hits on PhagesDB and NCBI BLAST are both very good, 1e-125 and 6.46134e-163 respectively, but neither calls a function. CDD did not have any results and the top hits on HHpred had very poor e-values (such as 150 and 290). The top hits on NCBI BLAST had very good identity percentages (99% and 98%), and the top hits on PhagesDB had high scores (447 and 445), but without function calls this data isn't relevant. /note=Transmembrane domains: There are no TMDs predicted by DeepTMHMM, so it is not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with the location and function call. CDS 3492 - 3746 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="JulietS_11" /note=Original Glimmer call @bp 3492 has strength 12.91; Genemark calls start at 3492 /note=SSC: 3492-3746 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_13 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 2.95845E-55 GAP: 37 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.564, -3.5660483139923818, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_13 [Mycobacterium phage ScottMcG] ],,YP_002224046,100.0,2.95845E-55 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Forward gene. Glimmer (3492), GeneMark (3492), start codon: ATG /note=Coding Potential: The coding potential is found in the forward frame. The start site of the gene does not cover all the coding potential in the self-trained GeneMark. 
In the host-trained GeneMark, the gene does cover all the coding potential /note=SD (Final) Score: -3.566. It is the most reasonable score because it has a small gap, and it is the best score because it is the least negative. The Z-score is 2.564, which is above 2. This shows that the site is a good candidate for the start site. /note=Gap/overlap: The gap is 37 bp, which is below 50 bp and makes the start site a good candidate /note=Phamerator: Pham 581, 01/08/23, gene conserved in phage SilverDipper and phage Alice /note=Starterator: 03/24/23. Start 1 (1, 3492) was manually annotated in 155 of 155 non-draft genes in this pham. This evidence is also in line with the auto-annotation made by Glimmer and GeneMark /note=Location call: It is a real gene and the start site is Start 1 at 3492 based on the evidence above /note=Function call: NKF. The top two hits on PhagesDB BLAST (e-value = 4e-44 for phages Roots515 and ScottMcG) indicate unknown function. The top hits on NCBI BLAST (e-value = 3e-55 for phage ScottMcG and 7e-55 for phage Shelob, with >98% identity and 100% coverage) also indicate an unknown function. There are no significant hits on HHpred and CDD. /note=Transmembrane domains: No transmembrane domain predicted on TOPCONS, TMHMM, and DeepTMHMM. This indicates that this gene does not encode a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the function and location call. 
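Several function calls in these records dismiss HHpred hits as non-significant. The gp7 notes state the requirements as probability > 80%, coverage > 40%, and e-value < 10^-3. The following is a minimal Python sketch of that filter. The class and function names are hypothetical, and the coverage value in the example is a placeholder because the gp5 note does not report it.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    description: str
    probability: float  # percent
    coverage: float     # percent of the query covered
    e_value: float


def is_significant(hit: HHpredHit,
                   min_probability: float = 80.0,
                   min_coverage: float = 40.0,
                   max_e_value: float = 1e-3) -> bool:
    """Apply the three thresholds quoted in the gp7 function call."""
    return (hit.probability > min_probability
            and hit.coverage > min_coverage
            and hit.e_value < max_e_value)


# gp5's zinc finger hit: probability and e-value come from the gp5 note.
# coverage=0.0 is a placeholder (assumption); the hit already fails on the
# other two criteria, so the result does not depend on it.
gp5_hit = HHpredHit("zinc finger", probability=50.0, coverage=0.0, e_value=9.9)
print(is_significant(gp5_hit))
```

The thresholds are keyword arguments so that a reviewer applying different cutoffs can change them without editing the function body.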
CDS 3746 - 3958 /gene="12" /product="gp12" /function="hypothetical protein" /locus tag="JulietS_12" /note=Original Glimmer call @bp 3746 has strength 8.34; Genemark calls start at 3746 /note=SSC: 3746-3958 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SHAQNATO_13 [Mycobacterium phage Shaqnato]],,NCBI, q1:s1 100.0% 1.55539E-43 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.151, -4.490107201936413, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SHAQNATO_13 [Mycobacterium phage Shaqnato]],,QAY04977,100.0,1.55539E-43 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 3746 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.490. Although this is not the best SD score, because this gene has a 1 bp overlap, this may be evidence of an operon, so this SD score is still acceptable. /note=Gap/overlap: -1 bp upstream overlap. This overlap is very small and reasonable (less than 4 bp). This may be evidence of an operon. /note=Phamerator: Pham 584. Date 1/10/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 2 in Starterator was manually annotated in 154/154 non-draft genes in this pham. Start 2 is 3746 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 3746 bp. /note=Function call: No known function (NKF). The top two phagesDB BLAST hits are of unknown function (E-value < 8e-38), and the top three NCBI BLAST hits are also of unknown function (E-value < 6e-42, 97.14%+ identity, 100% coverage). 
CDD and HHpred had no significant hits (all with low probability, low coverage, and high E-value). /note=Transmembrane domains: Neither TMHMM, TOPCONS, nor DeepTMHMM predict any TMDs, therefore, it is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 3972 - 4586 /gene="13" /product="gp13" /function="hypothetical protein" /locus tag="JulietS_13" /note=Original Glimmer call @bp 3972 has strength 5.61; Genemark calls start at 3972 /note=SSC: 3972-4586 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO63_gp016 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 9.21879E-145 GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.852, -2.881875840424956, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp016 [Mycobacterium phage QBert] ],,YP_010058115,100.0,9.21879E-145 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 3972 (ATG codon). /note=Coding Potential: Host-trained and self-trained GeneMark show good coding potential in the putative ORF. Coding potential was noted in another ORF in Self-trained GeneMark (~100bp between 4400 and 4500), but this may not be of significance. The chosen start site covers all of the coding potential. Coding potential in this ORF is on the forward strand only. /note=SD (Final) Score: -2.882. It is the best final score on PECAAN. /note=Gap/overlap: There is a 13bp gap with the upstream gene, which is reasonable. /note=Phamerator: Gene is in pham 3218 (date accessed: 01/10/2023). It is conserved, found in Bxz1 (C1) and BigSwole (C1). /note=Starterator: Start site 2 at 3972 was manually annotated in 23 non-draft phage genomes. /note=Location call: The selected start site at 3972 has the highest final score (-2.882) of the listed start sites and a Z-score greater than 2. 
/note=Function call: NKF in PhagesDB BLASTp; top hits were phages Shaqnato (e-115) and QBert_16 (e-115), both of which had close to 100% coverage. HHPred evidence is not convincing. No CDD hits. NKF in NCBI BLAST; top hits were phages QBert (9.2e-145) and RoMag (1.09e-142). /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with location and function call; all evidence necessary have been checked on PECAAN. CDS 4583 - 4921 /gene="14" /product="gp14" /function="membrane protein" /locus tag="JulietS_14" /note=Original Glimmer call @bp 4583 has strength 5.58; Genemark calls start at 4607 /note=SSC: 4583-4921 CP: yes SCS: both-gl ST: SS BLAST-Start: [gp20 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.08968E-73 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.574, -3.5268154277366257, yes F: membrane protein SIF-BLAST: ,,[gp20 [Mycobacterium phage Bxz1] ],,NP_818096,100.0,1.08968E-73 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Both Glimmer and GeneMark call the gene but at different start sites. Glimmer calls the gene at the start site 4583 bp. GeneMark calls the gene at the start site 4607 bp. Preference is given to the Glimmer start site at 4583 bp and Starterator also notes that this site is the most manually annotated start site. The start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 4583 corresponds to a Final Score of -3.527 which is the best final score and a Z-score of 2.574. /note=Gap/overlap: There is a 4 nucleotide overlap which suggests that this gene is part of an operon. This overlap is seen in other non-draft phages like Astraea (C1) and BeanWater (C1). /note=Phamerator: Pham: 291. 
Date: 1/10/2023. It is conserved and found in Astraea (C1) and BeanWater (C1). No function has been called. /note=Starterator: Date: 1/10/2023. Start site 22 in Starterator was manually annotated in 98/206 non-draft genes in this pham. Start 22 is 4583 in JulietS. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene, and the most likely start site is 4583. /note=Function call: Membrane protein. The HHpred hits have E-values that are too large and too little coverage. Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but none has a known function. No hits in CDD. Called a membrane protein because TMHMM and DeepTMHMM both predict one TMD, although TOPCONS doesn’t predict any TMDs. /note=Transmembrane domains: TMHMM and DeepTMHMM both predict just one TMD. TOPCONS doesn’t predict any TMDs. /note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the call for both the location and function made by the primary annotator. CDS 4973 - 5158 /gene="15" /product="gp15" /function="hypothetical protein" /locus tag="JulietS_15" /note=Original Glimmer call @bp 4973 has strength 12.97; Genemark calls start at 4973 /note=SSC: 4973-5158 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein LINSTU_21 [Mycobacterium phage LinStu] ],,NCBI, q1:s7 100.0% 4.32586E-34 GAP: 51 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein LINSTU_21 [Mycobacterium phage LinStu] ],,YP_009014615,91.0448,4.32586E-34 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Glimmer and GeneMark agree on the same start site, 4973. The start codon is ATG. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host, on the forward strand only, and the chosen start site includes all of the coding potential.
/note=SD (Final) Score: -2.523, the best final score on PECAAN. /note=Gap/overlap: The 51 bp gap is large, but there is no other coding potential in the reverse direction, so this gap is fine. /note=Phamerator: Pham 591. Date 01/08/23. It is conserved; found in 151 other non-draft phages, such as Ading and Bxz1. /note=Starterator: Start site 11 in Starterator was manually annotated in 118/152 non-draft genes in this pham. Start 11 is 4973 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 4973. /note=Function call: No known function. The top two PhagesDB BLAST hits have unknown function (e-value of 1e-28), and the NCBI BLAST hit also has no known function (e-value of 4e-34). There are no hits in CDD. The highest probability in HHpred is 89.17%, but the e-value is 9.3, which is very high, so this hit is not reliable. Overall, there is no known function for this gene. /note=Transmembrane domains: Neither TMHMM, TOPCONS, nor DeepTMHMM predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: checked CDS 5252 - 5536 /gene="16" /product="gp16" /function="hypothetical protein" /locus tag="JulietS_16" /note=Original Glimmer call @bp 5300 has strength 10.0; Genemark calls start at 5300 /note=SSC: 5252-5536 CP: yes SCS: both-cs ST: SS BLAST-Start: [gp22 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 1.04806E-62 GAP: 93 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.43, -6.210473446429151, no F: hypothetical protein SIF-BLAST: ,,[gp22 [Mycobacterium phage Cali] ],,YP_002224499,100.0,1.04806E-62 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 5300. The suggested start codon is ATG.
/note=Coding Potential: The gene has very good coding potential and synteny with phages Grungle and Ghost. There was no atypical coding potential in either the self-trained or the host-trained GeneMark. Additionally, the two flanking genes are both in the forward direction. Lastly, the gene is long enough to be valid (237 bp). /note=SD (Final) Score: The SD final score for the auto-annotated start site is the lowest (-3.756). Additionally, the Z-score is the second highest (2.435) and is above the recommended threshold. /note=Gap/overlap: There is a gap of 141 bp. This gap is large and over the 50 bp threshold; however, the gene has good coding potential, the lowest final score, and an adequate Z-score. See previous sections for further explanation. /note=Phamerator: Pham 532, 1/19/23. This pham is conserved in both Alice (C1) and Amataga (C1). /note=Starterator: Start site 5 @ 5300. This was manually annotated in 3/171. Although this is very low, this start site makes the most sense for JulietS. It has one of the best final scores (-3.756) and Z-scores (2.435). /note=Location call: Based on the above evidence, this is a real gene that starts at 5300. /note=Function call: The function is unknown. The top three PhagesDB BLAST hits (phages Astraea (C1), BadAgartude (C1), and BananaFence (C1)) mark the function as unknown (E-value 1e-146). Additionally, the first three NCBI BLAST hits also call the function a hypothetical protein (100% coverage, >82% identity, and E-values <10^-49). HHpred did not give convincing evidence, and nothing showed up in CDD. /note=Transmembrane domains: DeepTMHMM indicates that the first 21 amino acids are a signal peptide and the rest are outside. There was no indication of this being a transmembrane protein (TMR=0), so it is predicted to carry a signal peptide rather than a transmembrane domain. /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: I agree with all the above evidence and the function/location call.
The only change I would make is to update TMHMM and TOPCONS to DeepTMHMM. There is no coding potential in the gap. /note=tertiary QC - Fadi Al Banaa: start site changed to 5252. This reduces the gap with the upstream gene and makes more sense in terms of gene length when compared to other phages in the pham (e.g., Audrick and Babyland). It is also the most manually annotated start site in Starterator. CDS 5533 - 5760 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="JulietS_17" /note=Original Glimmer call @bp 5533 has strength 18.0; Genemark calls start at 5533 /note=SSC: 5533-5760 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO63_gp020 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 1.12892E-41 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.153, -4.34663776219455, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp020 [Mycobacterium phage QBert] ],,YP_010058119,100.0,1.12892E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 60654. The start codon is TTG, which is rare (about 7% of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only, which indicates this is a forward gene. Coding potential is found in both GeneMark Self and Host. All of the coding potential is included. /note=SD (Final) Score: The Z-score is 2.153, which is the best option, and the final score is -4.347, which is the highest value. /note=Gap/overlap: -4 bp gap upstream of the gene and a 56 bp gap downstream of the gene. These gaps are conserved in other phages (Grungle, DrPhinkDaddy). /note=Phamerator: Pham: 60654. Date 01/10/2023. The gene is conserved in phages Ronan and BeanWater, which are in the same cluster as JulietS. /note=Starterator: Start site 16 was called for 155 of the 209 non-draft phage genomes in the pham.
It is the most annotated start site and is called 95.6% of the time when it is present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 5533 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. PhagesDB BLASTp shows phages Turret, Tyke, and YoungMoneyMata from cluster C1 having no function (all E-value: 6e-37). NCBI BLASTp shows phages QBert, Wally, and ValleyTerrace with no known function, full coverage, and E-values of e-41. PhagesDB Function Frequency, HHpred, and CDD showed no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: The auto-annotation portion of the notes is incorrect; please revise the called start and start codon. Make sure to include the date for the Starterator report. Other than that, I agree with the function calls. CDS 5819 - 5962 /gene="18" /product="gp18" /function="hypothetical protein" /locus tag="JulietS_18" /note=Original Glimmer call @bp 5816 has strength 7.16; Genemark calls start at 5819 /note=SSC: 5819-5962 CP: yes SCS: both-gm ST: NA BLAST-Start: [hypothetical protein LINSTU_24 [Mycobacterium phage LinStu] ],,NCBI, q1:s2 100.0% 2.2785E-25 GAP: 58 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.187, -4.4157344874810125, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein LINSTU_24 [Mycobacterium phage LinStu] ],,YP_009014618,97.9167,2.2785E-25 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark do not agree on the start site. Glimmer predicts 5816 while Genemark predicts 5819. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Good coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -4.355. This is the best final score on PECAAN.
The Z-score is also the highest at 2.187. /note=Gap/overlap: The 58 bp gap is reasonable. /note=Phamerator: Pham 543. Date 1/10/2023. It is conserved; found in Alice (C) and BigSwole (C). /note=Starterator: Start site 4 in Starterator was manually annotated in 95/156 non-draft genes in this pham, while start site 3 was manually annotated in 61/156 non-draft genes in this pham. Start 4 is 5819 in JulietS. This evidence agrees with the site predicted by GeneMark, but not Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 5819. /note=Function call: NKF. BLASTp hits only included proteins labeled function unknown. CDD returned no hits. HHpred returned no significant hits. The top NCBI BLAST hit was a hypothetical protein with an e-value of 3.93e-26, so there is not enough evidence to determine a function call. /note=Transmembrane domains: DeepTMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with the call, but don't forget to fill out the start codon and Starterator drop-down boxes; describe the coding potential (good? bad? etc.); list the Z-score as well as stating it's the highest; discuss the length of the gene in the gap section as well; the Starterator report calls start number 3 for JulietS, so discuss both 3 and 4 in this section (even though 4 is the most manually annotated); check DeepTMHMM for the transmembrane part and replace the notes in PECAAN; and don't check off evidence in HHpred unless there is a function.
CDS 5965 - 6117 /gene="19" /product="gp19" /function="hypothetical protein" /locus tag="JulietS_19" /note= /note=SSC: 5965-6117 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein KHO60_gp024 [Mycobacterium phage CharlieB] ],,NCBI, q1:s1 100.0% 1.90106E-27 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.908, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO60_gp024 [Mycobacterium phage CharlieB] ],,YP_010057432,100.0,1.90106E-27 SIF-HHPRED: SIF-Syn: CDS 6185 - 6484 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="JulietS_20" /note=Original Glimmer call @bp 6200 has strength 6.83; Genemark calls start at 6200 /note=SSC: 6185-6484 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_ATLANTEAN_24 [Mycobacterium phage Atlantean] ],,NCBI, q6:s1 67.6768% 3.32742E-31 GAP: 67 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.187, -4.337049294579155, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ATLANTEAN_24 [Mycobacterium phage Atlantean] ],,QAY14529,71.7391,3.32742E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: Glimmer and Genemark both call the same start site: 6200. However, based on further evidence below, the suggested start is not informative. /note=Coding Potential: There is coding potential present in ORF 2 of the direct sequence (as expected for a forward gene) in the Host-Trained and Self-Trained GeneMark. Start site 6185 covers all of the coding potential. /note=SD (Final) Score: -4.337. This is the best final score, with the best Z-score of 2.187, but it does not correspond to the start site recommended by Glimmer and Genemark. It corresponds to start site 6185. /note=Gap/overlap: There is a 223 bp gap located upstream of the gene. There is a 92 bp gap downstream of my gene. There is no coding potential present in GeneMark before and after the gene to indicate any genes that need to be added in the gaps.
/note=Phamerator: Pham 67737 as of 2/1/23. It is conserved, found in phages Phox (C1) and Sebata (C1). /note=Starterator: 5/7 non-draft genes called start site 3. Start site 3 is not present in JulietS. 1/7 non-draft genes called start site 2, which is 6185 bp in JulietS. /note=Location call: Most likely a real gene; start site 2 @ 6,185 bp, because this is the best final score on PECAAN and the Z-score is above 2. Since the MA start site is not found in phage JulietS, start site 2 is the best option. The start codon is GTG, which is a common start codon. /note=Function call: No known function. The top three PhagesDB BLAST hits have the function of hypothetical proteins (E-value = 2*10^-53) and the top three NCBI BLAST hits also have the function of hypothetical proteins (>60% coverage, >60% identity, and E-value <10^-27). HHpred had no relevant hits, with E-values being greater than 37. CDD had no relevant hits as well. /note=Transmembrane domains: DeepTMHMM does not show any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: I think the start site should be revisited. Final score: this is not the best final score ; Z-score: not listed ; Gap: does not appear to be in accordance with what is on PECAAN, need to check ; Phamerator: pham number does not match what the report says, need to double check. Look at the date that the report was generated. The phages listed also do not appear to match the report. Should also probably list the number of phages and non-draft phages in the pham ; Starterator: start 2 should also be considered as it also has 1 MA and is supported by suggested starts in PECAAN (has the best final score and Z-score) ; agree with function call but PhagesDB BLAST has e-value of e-50 not e-51 and NCBI BLAST had >96% identity.
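The GAP values reported in the SSC notes are consistent with counting the bases strictly between the stop of the upstream gene and the start of the next gene, with a negative value meaning an overlap. The following is a minimal Python sketch of that arithmetic for forward-strand genes only, using coordinates copied from the CDS entries for genes 13 to 16 above. The function name `gap_bp` is hypothetical and is not part of PECAAN, DNA Master, or any other tool named in these notes.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between two forward-strand genes; negative means overlap.

    Assumption: both coordinates are 1-based and inclusive, as in the
    CDS lines of this file.
    """
    return start - upstream_stop - 1


# (gene number, start, stop) copied from the CDS entries for genes 13-16.
cds = [
    (13, 3972, 4586),
    (14, 4583, 4921),
    (15, 4973, 5158),
    (16, 5252, 5536),
]

# Gap of each gene relative to the gene immediately upstream of it.
gaps = {
    gene: gap_bp(prev[2], start)
    for prev, (gene, start, _stop) in zip(cds, cds[1:])
}

for gene, gap in gaps.items():
    label = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: {gap} bp ({label})")

# The originally auto-annotated start for gene 16 (5300) gives a larger gap.
print("gene 16 with start 5300:", gap_bp(5158, 5300), "bp")
```

Recomputing a gap this way is a quick check when a note and the SSC line disagree, since only the two coordinates are needed.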
CDS 6576 - 6932 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="JulietS_21" /note=Original Glimmer call @bp 6576 has strength 7.98; Genemark calls start at 6576 /note=SSC: 6576-6932 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_26 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 5.46999E-79 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.056, -4.900166819979879, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_26 [Mycobacterium phage ScottMcG] ],,YP_002224059,100.0,5.46999E-79 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Glimmer and GeneMark agree that the start site is 6576. The start codon that is called is ATG. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 3. The coding potential increases near the potential start site (6576) and plateaus until the stop codon, where it tapers off. The gene similarly has good coding potential with the self-trained GeneMark on ORF 3, where the coding potential rises at the start site and plateaus until the stop site. Finally, the start and stop sites also encompass all of the coding potential. /note=SD (Final) Score: The Final Score is -4.900, and the Z-score is 2.056. The final score for this start site has the best sequence match (highest final score), and this site also has the highest Z-score of the possible start site candidates. /note=Gap/overlap: There is a 91 base pair gap with the gene upstream, and this is reasonable/acceptable because the gap isn’t too long (it is under 100 bp). This chosen start site also gives the gene the longest open reading frame and a gene length of 357 bp, which is also good. /note=Phamerator: The gene is found in Pham 65510 as of 1/12/23. The gene is conserved in other members of subcluster C1, and I used phages Stubby and Turret for comparison.
/note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 52, which is 6576 in JulietS. 156/594 non-draft genomes in the pham also call this start site. /note=Location call: The available evidence suggests that the start site is 6576. The gene appears to be real, and this proposed start site covers all the coding potential and has a good final score and Z-score. Although not as many genomes call the same start site in Starterator, it is the most commonly called site and is called 93.3% of the time when present. /note=Function call: The function of this protein is unknown (NKF). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-values of 5e-69, and those proteins have no known function. Similarly on NCBI BLASTp, e-values are low (5e-79), and those proteins also have no known function. /note=There were no hits on CDD, and HHpred results were not desirable, as e-values were too high. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicted any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: I agree with this annotation in that this gene appears to have no known function. CDS 6943 - 7050 /gene="22" /product="gp22" /function="hypothetical protein" /locus tag="JulietS_22" /note=Genemark calls start at 6943 /note=SSC: 6943-7050 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_27 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.61984E-16 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.597, -5.509286325636915, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_27 [Mycobacterium phage ScottMcG] ],,YP_002224060,100.0,1.61984E-16 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Only Genemark calls a start site for this gene, at 6943, with ATG listed as the start codon. Glimmer does not call a start site for this gene.
/note=Coding Potential: This gene has a good coding potential. Both the Host-trained and the Self-trained Genemarks show that the auto-annotated start does have coding potential along the forward strand. In addition to that, this gene shares synteny with phages Grungle and Daffodil as well. /note=SD (Final) Score: The Z-score that was chosen was 1.597 which corresponds to a Final score of -5.509. Though it is neither the highest Z-score nor the least negative Final Score, this start site matched the best with the gene’s position on the Pham Maps along with the GeneMark results /note=Gap/overlap: There’s a gap of 10 bp, which is the shortest possible gap. It is not likely that there is a gene in between that gap. /note=Phamerator: As of 01/12/23, this gene was found in Pham 587. It was found to be conserved in several other phages such as Daffodil, Audrick, and Alice which are all part of cluster C, which is the same as Juliet. /note=Starterator: As of 01/08/23 start site number 3 was manually annotated 152/152 of the non-draft genes within this pham. Start site 3 correlates to a start site of 6943 within JulietS, which is corroborated by the Genemark start site and the Z-score and Final scores mentioned previously. /note=Location call: Based on the evidence above, this is likely to be a real gene with a start site of 6943. /note=Function call: NKF. Based on all the available data, as of now it is impossible to conclude a function for this protein. There were no hits from CDD, and the two hits that were found on HHpred had very high e-values, making it not likely to match either of those functions found on HHpred. The NCBI blast had three hits with very good e-values (2e-16, 7e-16 and 1e-15), though all of these hits were hypothetical proteins, meaning that as of now there’s no known function. /note=Transmembrane domains: DeepTMHMM lists this as a protein that is on the inside, and therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: I agree with the function and location call. Mark the GM coding capacity box. Change TOPCONS/TMHMM to Deep TMHMM. CDS 7050 - 7250 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="JulietS_23" /note=Original Glimmer call @bp 7050 has strength 15.06; Genemark calls start at 7050 /note=SSC: 7050-7250 CP: yes SCS: both ST: SS BLAST-Start: [gp28 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 5.72618E-40 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.908, -3.6727816321281046, yes F: hypothetical protein SIF-BLAST: ,,[gp28 [Mycobacterium phage Cali] ],,YP_002224505,100.0,5.72618E-40 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMarks, both call the start of this gene at 7050 with the start codon of GTG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 7050 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -3.673, which is the best score of all the possible start sites on PECAAN. Additionally, this start site also have the best Z-score of 2.908 on PECAAN. /note=Gap/overlap: This gene has an overlap of 1 base pair with the previous gene. This is the best possible start site since any other option would greatly increase the size of the gap, which will potentially require an addition of another gene. /note=Phamerator: This gene belongs to pham number 65617 as of 1/08/2023. The gene is conserved in phages of this cluster (C) like DrPhinkDaddy and Momo. There is no function listed for members of this family, so it is highly likely that this is a gene with an unknown function. 
/note=Starterator: Start site 6 is most often called as it was manually annotated in 157/169 non-draft genes in the pham. Start 6 is 7050 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 7050. /note=Function call: No Known Function. The top three phagesdb BLAST hits are function unknown (E-value <10^-34), and the top three NCBI BLAST hits also have no known function for this gene (100% coverage, 100% identity, and E-value <10^-40). HHpred’s top hit indicates a different function but one hit called a domain of unknown function (45.5% probability, 28% coverage, and E-value <97). CDD had no relevant hits. /note=Transmembrane domains: Neither DeepTMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: I agree with the primary annotators location call and function call based on the provided evidence. Gap/overlap: may want to include which phages share this overlap (synteny) ; Phamerator: might want to list total number of phages in pham ; Starterator: total number of non-draft phages does not seem to align with the report, might need to check ; agree with function call but need to double check HHPRED values; need to include Deep TMHMM. 
CDS 7280 - 8224 /gene="24" /product="gp24" /function="membrane protein, Band-7-like" /locus tag="JulietS_24" /note=Original Glimmer call @bp 7286 has strength 18.39; Genemark calls start at 7286 /note=SSC: 7280-8224 CP: yes SCS: both-cs ST: SS BLAST-Start: [gp31 [Mycobacterium phage Spud] ],,NCBI, q1:s1 100.0% 0.0 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.123, -2.66590822473965, yes F: membrane protein, Band-7-like SIF-BLAST: ,,[gp31 [Mycobacterium phage Spud] ],,YP_002224285,100.0,0.0 SIF-HHPRED: Protein HflK; membrane microdomain organization, MEMBRANE PROTEIN; 3.27A {Escherichia coli},,,7VHP_S,81.5287,100.0 SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 7286 with start codon ATG. /note=Coding Potential: This ORF has good coding potential on the direct sequence (forward strand), indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site includes all of the coding potential. /note=SD (Final) Score: The start at 7280 has the highest Z-score (3.123) and final score (-2.666), while the 7286 start has a lower Z-score (1.919) and final score (-5.743). /note=Gap/overlap: With the auto-annotated start at 7286, the gap with the upstream gene is 35 bp; start site 7280 has a gap of 29 bp. This gene is conserved in several other phages of the same cluster (Alice, Grungle), and the gap does not contain coding potential and was seen in Alice and Grungle as well. The gene is 945 bp long, the longest ORF. /note=Phamerator: Pham 344, which has 203 members. Date 1/10/2023. It is conserved and found in Alice (C) and ArcherS7 (C). /note=Starterator: Start site 23, the site auto-annotated by Glimmer and GeneMark, was manually annotated in 19/203 non-draft genes in this pham, while start site 20, which is 7280 in JulietS, has the most manual annotations (129/203 non-draft genes in this pham).
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7280, since it has the best final score and the smallest gap, covers all the coding potential, and agrees with Starterator. /note=Function call: membrane protein. The top PhagesDB BLAST hits have the suggested function band-7-like membrane protein with small e-values of 1e-179 (BananaFence, Dietrick). Though the top NCBI BLASTp hit has the function of lipoprotein (Spud), which was never called in pham 344, the other top hits agree on the same function (100% coverage, 99%+ identity, and e-value of 0.0). HHpred top hits are HflK protein, a membrane protein, with 100% probability, >80% coverage, and e-value of <8.2e-25. CDD top hits are to the SPFH domain / Band 7 family with 66.8% coverage and e-value of <2.8e-9. /note=Transmembrane domains: DeepTMHMM predicts two TMDs, suggesting that this protein has real transmembrane domains, which supports the function call of membrane protein. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 8228 - 8590 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="JulietS_25" /note=Original Glimmer call @bp 8228 has strength 8.99; Genemark calls start at 8228 /note=SSC: 8228-8590 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BACKYARDAGAIN_27 [Mycobacterium phage BackyardAgain] ],,NCBI, q1:s1 100.0% 6.0238E-82 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.092, -4.473102278738214, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BACKYARDAGAIN_27 [Mycobacterium phage BackyardAgain] ],,QOC58142,100.0,6.0238E-82 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark agree that the start site for this gene is @ 8228. Start codon ATG.
/note=Coding Potential: In host-trained GeneMark, the coding potential does not fully span the region from the start site @ 8228 to the stop site @ 8590. In self-trained GeneMark, the coding potential is fully covered by the start site @ 8228 and the stop site @ 8590. In both cases the coding potential is on the forward strand. /note=SD (Final) Score: The final score of this start site is -4.473 with a Z-score of 2.092. This is the least negative final score among the candidate start sites, and the Z-score of 2.092 is favorable since it is above 2. /note=Gap/overlap: There is a 3 bp gap, which is below the recommended 50 bp limit. /note=Phamerator: Found in Pham 504. Date 01/10/2023. It is conserved and found in phages Ading, Pier, and Tyke, which are Cluster C phages. /note=Starterator: Start site 5 is called in 150/157 non-draft genes in this pham. Start site 5 is @ 8228 in JulietS, which agrees with the Glimmer and GeneMark auto-annotation. /note=Location call: Based on the listed evidence, the gene is real and the start site is most likely @ 8228. /note=Function call: NKF. The top PhagesDB BLAST hits (phages BackyardAgain and FoxtrotP1) have the function listed as unknown, with e-values less than 10^-64. The top 3 NCBI BLAST hits (phages BackyardAgain, FoxtrotP1, ScottMcG) list this protein as hypothetical, with e-values less than 10^-81. HHPred had no relevant hits. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM did not predict any transmembrane domains, so this gene does not encode a membrane protein. /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the primary annotator's location call and function call based on the provided evidence. 
CDS 8587 - 8718 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="JulietS_26" /note=Original Glimmer call @bp 8587 has strength 14.35; Genemark calls start at 8587 /note=SSC: 8587-8718 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_31 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 7.48019E-23 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -3.3503726477288405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_31 [Mycobacterium phage ScottMcG] ],,YP_002224064,100.0,7.48019E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark list the start site of this gene at 8587, with the start codon being GTG. /note=Coding Potential: Coding potential is demonstrated in both the host-trained and self-trained GeneMark, based on the selected start site at 8587, running the length of the gene. /note=SD (Final) Score: The Z-score is the best, being 3.062, while the final score is the second best, -3.350, suggesting that this start site is a very strong candidate. /note=Gap/overlap: The gap is -4 bp (a 4 bp overlap), which suggests that the gene is part of an operon. /note=Phamerator: It belongs to pham 512, dated 1/08/2023. It is conserved, also being found in Momo and Phlegm. /note=Starterator: Start site 2 was manually annotated in 169 of the 169 non-draft genes in the pham. This start site is 8587, which agrees with the auto-annotation by Glimmer and GeneMark and further validates the start site. /note=Location call: Based on the fact that Glimmer and GeneMark share the same start site, 8587, which is validated by Starterator, and the gene is conserved in Momo (C1) and Phlegm (C1), this gene is most likely real. Additionally, the start site covers coding potential for the entirety of the gene, further validating the suggested start site of 8587. /note=Function call: NKF. The NCBI Blast produced multiple hits, which were listed as a hypothetical protein (NKF). 
One significant hit had a probability of 100%, an e-value of 7.48e-23, and percent coverage of 100%, indicating that the selected gene also does not have a known function. HHPred did not have a significant hit, and neither did CDD. However, given the statistics of the single NCBI Blast, it is most definitely a protein with no known function. Additionally, Phages DB lists two phages Basquiat and BadAgartude with e-values of 4e-20, with no known function. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein or has a known function. /note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the location and location call but deep TMHMM information needs to be included for transmembrane domain prediction. Fix starterator info as only 157/157 nondraft called start 2 CDS 8792 - 9142 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="JulietS_27" /note=Original Glimmer call @bp 8792 has strength 15.49; Genemark calls start at 8792 /note=SSC: 8792-9142 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein DANDELION_36 [Mycobacterium phage Dandelion] ],,NCBI, q1:s1 100.0% 2.11813E-80 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.889, -4.410201207862326, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein DANDELION_36 [Mycobacterium phage Dandelion] ],,YP_009012815,100.0,2.11813E-80 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 8792, ATG /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.410. It is the best final score on PECAAN. /note=Gap/overlap: 73bp. 
Slightly large, but ultimately reasonable because the gap is conserved in other phages and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 62004. Date 01/08/2023. It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 18 with a base pair coordinate of 8792. 147 of 182 call site #18. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 8792 bp. /note=Function call: NKF; Both NCBI and PhagesDB database did not predict functions for this gene. The top three PhagesDB BLAST and HHpred hits have unknown function (e-value < 10^-65). No conserved domains identified for this query sequence. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with this location call and assigned function. CDS 9219 - 9506 /gene="28" /product="gp28" /function="hypothetical protein" /locus tag="JulietS_28" /note=Original Glimmer call @bp 9219 has strength 22.97; Genemark calls start at 9219 /note=SSC: 9219-9506 CP: yes SCS: both ST: SS BLAST-Start: [gp33 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.62425E-63 GAP: 76 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.986, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[gp33 [Mycobacterium phage Bxz1] ],,NP_818109,100.0,2.62425E-63 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Both Glimmer and GeneMark call the start site of 9219. Start codon of ATG. /note=Coding Potential: There is strong coding potential within the coding region, though in the last few codons there is a slight decrease in potential. Coding potential is found going forward on the third row. /note=SD (Final) Score: -2.601 ; this is the best score because it is the least negative. 
The z-score is 2.986; this is the strongest score because it is the highest among the candidate start sites. /note=Gap/overlap: Gap of 76 bp; this is the smallest gap among the candidate start sites, the others having gaps above 100 base pairs. The smaller gap makes this the preferable start site. /note=Phamerator: Pham: 554. Reviewed on 1/12/23. Conserved within phages, compared against Ading and Audrick. /note=Starterator: Reviewed on 1/12/23. Call start site 9219; 156 of 156 non-draft genes call this start. This agrees with the start site called by Glimmer and GeneMark. /note=Location call: Start site of 9219 and stop site of 9506. Based on the above data, the gene is real. /note=Function call: NKF; No CDD hits present on 1/12/23. HHPRED did not present any data of note. PhagesDB BLAST presented hits with function unknown and e-values on the order of 1e-52 (example phages are DTDevon and EmmaElysia). NCBI BLAST presented a hypothetical gp33 gene with 100% identity and 100% aligned, with an e-value of 2.62425e-63. /note=Transmembrane domains: DeepTMHMM presented no transmembrane domains, so it is likely not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call. Good work. 
CDS 9508 - 9669 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="JulietS_29" /note=Original Glimmer call @bp 9508 has strength 17.92; Genemark calls start at 9508 /note=SSC: 9508-9669 CP: yes SCS: both ST: SS BLAST-Start: [gp34 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 7.10033E-31 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.05, -4.6236495748204955, yes F: hypothetical protein SIF-BLAST: ,,[gp34 [Mycobacterium phage Bxz1] ],,NP_818110,100.0,7.10033E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 9508; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 1 (indicating that this is a forward gene). The chosen start site covers this ORF. Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Astraea and BananaFence. /note=SD (Final) Score: -4.624. SD score is reasonable. /note=Gap/overlap: Yes, the gap is reasonable (1 bp). It is preferable to the other start site which would cause the gene to have a 55 bp gap. /note=Phamerator: Date of investigation: 1/12/23; Pham 551; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Bread and Cali). /note=Starterator: Yes, there is a conserved start choice. It is start number 1 with a base pair coordinate of 9508. Has 156 MA’s. Found in 168/168 (100%) of genes in pham. This start site also agrees with Glimmer and Genemark’s call. The e-value is reasonable at -29. /note=Location call: Yes, the evidence suggests that this is a real gene. Start site 9508 seems most likely. /note=Function call: NKF; CDD had no hits. NCBI predicted hypothetical proteins with coverages of 100%, identities above 98%, and e-values at -30 and -31. PhagesDB did not predict functions for this gene. 
Phages Guwapp and I3 support the NKF conclusion since they have synteny for this gene and no known function (e-values -29). HHPred had some hits for functional proteins (i.e. cyclomaltodextrinase), but coverage was very low at 11%. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the call for both the function and location made by the primary annotator. CDS 9669 - 9887 /gene="30" /product="gp30" /function="ribbon-helix-helix DNA binding domain" /locus tag="JulietS_30" /note=Original Glimmer call @bp 9669 has strength 4.23; Genemark calls start at 9669 /note=SSC: 9669-9887 CP: yes SCS: both ST: SS BLAST-Start: [ribbon-helix-helix DNA binding protein [Mycobacterium phage Amataga]],,NCBI, q1:s6 100.0% 4.21989E-43 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.564, -3.5660483139923818, yes F: ribbon-helix-helix DNA binding domain SIF-BLAST: ,,[ribbon-helix-helix DNA binding protein [Mycobacterium phage Amataga]],,QAY10091,93.5065,4.21989E-43 SIF-HHPRED: CdbA; nucleoid ribbon-helix-helix, DNA BINDING PROTEIN; 2.24A {Myxococcus xanthus DK 1622},,,6SBW_A,61.1111,98.6 SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 9669 with a start codon of ATG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. The start site of #9669 covers all of the coding potential regions on the forward strand, which supports the forward direction of this gene. Additionally, coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs which suggests that this is a potential gene at start site #9669. 
/note=SD (Final) Score: The final score is the least negative (SD = -3.566), and the Z-score is the highest overall (Z-score = 2.564), indicating that the autogenerated start site is the best of the options. Since GeneMark and Glimmer both call the same start site, choosing the autogenerated site is well supported, especially with a Z-score over 2, as it allows the gene to be the longest reasonable length. /note=Gap/overlap: There is a 1 bp overlap (gap of -1 bp), which seems reasonable given the synteny with other phages in the C1 cluster. There is a similar overlap of about 1 bp between this gene and the downstream gene, which is consistent with the conserved 219 bp gene length. /note=Phamerator: The gene was found to be in Pham 770 (01/11/2023), which is common in Cluster C1 phages, as seen in phages Ading, Ava, and Ewok. The commonly listed function in the pham was ribbon-helix-helix DNA binding protein, but this alone was not enough to call a function. However, the gene length was conserved throughout the pham at 219 bp. /note=Starterator: There is a reasonable and highly conserved start site (start 9, @ 9669; reviewed 01/11/2023), which was called in 116 of the 120 non-draft genes among the 130 total pham members. /note=Location call: The gene and the start site are both conserved. The high coding potential in the ORF indicates a real gene within the operon; thus the evidence supports a start site of #9669 for this gene, consistent with other phages in the C1 group. /note=Function call: The function of the gene could potentially be a ribbon-helix-helix DNA binding protein. In BLASTp on PhagesDB.org it has two matches, phages YoungMoneyMeta and Valley Terrace, that have e-values of 10^-35 and 100% positives, both carrying the function of ribbon-helix-helix DNA binding protein. 
Additionally, NCBI BLASTp also indicates a function of ribbon-helix-helix DNA binding protein, with an e-value of 10^-43 and a 100% match with phage Amataga. The first HHpred hit is a ribbon-helix-helix DNA binding protein with an e-value of 7.8x10^-7 and a probability of 98.58%, but the second hit is a putative uncharacterized protein with an e-value of 0.000001 and a probability of 98.55%. CDD results are inconclusive, as only one hit was given, for the NikR-like superfamily, with an e-value of 8.24x10^-4. Together this indicates a high probability that the function of this gene is ribbon-helix-helix DNA binding protein. Further comparison with the HHpred hit (6SBW_A), following guidance on the SEA-PHAGES forum, indicates the presence of a beta strand and two alpha helices, supporting the function of a ribbon-helix-helix DNA binding protein. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains, so the whole protein is predicted to be inside the cell. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. CDS 9887 - 10048 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="JulietS_31" /note=Original Glimmer call @bp 9887 has strength 6.86; Genemark calls start at 9887 /note=SSC: 9887-10048 CP: yes SCS: both ST: SS BLAST-Start: [gp36 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.07156E-29 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.564, -3.5660483139923818, yes F: hypothetical protein SIF-BLAST: ,,[gp36 [Mycobacterium phage Bxz1] ],,NP_818112,100.0,1.07156E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 9887. ATG start codon. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.566. It is the best final score on PECAAN. Z-score is 2.564. /note=Gap/overlap: -1 overlap. Is small and reasonable as the overlap is conserved in other phages (Sprinklers, Shelob) and there is no coding potential in the overlap that might be a new gene. /note=Phamerator: Pham: 62148. Date 1/12/2023. Pham has 113 members and 8 phages are drafts. /note=Starterator: Start site 7 in Starterator was manually annotated in 95 of the 105 non-draft genes in the pham. This may not be convincing evidence. Start 7 is 9887 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 9887. /note=Function call: NKF. No, there is not enough evidence to suggest a function for this protein as both NCBI and PhagesDB did not have any significant hits.Function unknown was listed for every PhagesDB hit except for phage Mikro. DNA binding protein was listed as the function and there was an e-value of 7e-24. Not strong enough evidence to suggest a DNA binding function. Hypothetical proteins were listed for NCBI, with some e-values close to zero. Two phages were listed as gp. Phages Bxz1 (e-value of 1e-29) and Rizal (e-value of 5e-29). CDD had zero hits. HHpred had potential hits with probability higher than 90% but the score was less than 30 for all hits. /note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree with the gene being real and located with the suggested start site. I also agree with the function call. 
CDS 10113 - 10361 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="JulietS_32" /note=Original Glimmer call @bp 10113 has strength 18.15; Genemark calls start at 10113 /note=SSC: 10113-10361 CP: yes SCS: both ST: SS BLAST-Start: [gp37 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.41871E-51 GAP: 64 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[gp37 [Mycobacterium phage Bxz1] ],,NP_818113,100.0,2.41871E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree that the start site is 10113. The start codon is ATG. /note=Coding Potential: The ORF has good coding potential and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.505 is the best SD (final) score because it is the highest (least negative) value, and this start site has the smallest gap of all of the choices. Z-score is above 2. /note=Gap/overlap: There is a significant gap of 64 bp upstream, but there is no coding potential in the gap, and when compared to non-draft phages from the same cluster, there weren't any additional genes present. /note=Phamerator: Gene found in pham 975 on 1/13/23. The pham is also found in other members of the same cluster C, such as phages BackyardAgain, Daffodil, and Shrimp. The start number called the most often in the published annotations is 2; it was called in 94 of the 95 non-draft genes in the pham. There is synteny with other non-draft phages belonging to the same cluster. /note=Starterator: The start number called the most often in the published annotations is 2; it was called in 94 of the 95 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the candidate start site at 10113 seems most likely. /note=Function call: NKF. 
The top two phagesdb BLAST hits (Flabslab and HyRo) have no known function (E-value <10^-30), and the top 3 NCBI BLAST hits also have no known function. (100% coverage, 100%+ identity, and E-value <10^-51). There were no CDD hits with significant values. There were no HHpred hits that fulfilled the requirements of probability > 80%, coverage >40%, and an e-value <10^-3. /note=Transmembrane domains: Neither TMHMM, deep TMHMM, or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the function and location call. CDS 10432 - 10761 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="JulietS_33" /note=Original Glimmer call @bp 10432 has strength 17.04; Genemark calls start at 10432 /note=SSC: 10432-10761 CP: yes SCS: both ST: SS BLAST-Start: [gp38 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 6.41886E-74 GAP: 70 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[gp38 [Mycobacterium phage Bxz1] ],,NP_818114,100.0,6.41886E-74 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both GeneMark and Glimmer call 10432 as the start site with a start codon of ATG /note=Coding Potential: Host-trained and self trained genemark shows good coding potential beginning at the suggested start site on the first forward ORF and the start site contains all of the coding potential /note=SD (Final) Score: -2.505, the best final score in PECAAN. 
The Z-score is 3.062, which is reasonable. /note=Gap/overlap: 70 bp, which is a reasonable gap and has synteny with other phages (Astraea, Pio…). The gene has a length of 330 base pairs, which is reasonable. /note=Phamerator: Pham 500 as of 1/13/23; 170 members, 12 drafts; many other C cluster phages in this pham (Alice, Astraea…). /note=Starterator: Start 7 @ 10432 has 152 MAs (the most manual annotations), agrees with the call from Glimmer and GeneMark, and is conserved. /note=Location call: This appears to be a real gene with a start site at 10432. /note=Function call: NKF; PhagesDB hits have e-values of 5e-57 and their function is called as function unknown. NCBI hits are (identity: 100, coverage: 100, e-value: 6e-74) and (identity: 99.08, coverage: 100, e-value: 3.6e-73), and both call the function as hypothetical protein. CDD and HHPred had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts 0 TMRs and that the whole protein is inside the cell. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 10837 - 11091 /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="JulietS_34" /note=Original Glimmer call @bp 10837 has strength 20.2; Genemark calls start at 10837 /note=SSC: 10837-11091 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M181_gp041 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 4.79707E-52 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.22, -2.1924212546750814, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp041 [Mycobacterium phage Gizmo] ],,YP_008060838,100.0,4.79707E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 10837, with start codon ATG. 
/note=Coding Potential: Coding potential is in the forward strand only, and this gene has good coding potential in host-trained and self-trained GeneMark. The chosen start site covers all the coding potential. /note=SD (Final) Score: The SD score is less negative than the other options, at -2.192, and the Z-score is good, at 3.22. /note=Gap/overlap: The gap is 75 bp, which is somewhat large, but the previous gene is in the reverse direction so this is reasonable. /note=Phamerator: Pham 530. Date 1/12/23. It is conserved; found in Zalecks_40 and YemiJoy2021_39. Function was listed as unknown on the phams database. /note=Starterator: Start site 1 @10837 was manually called 157 times in non-draft phages. /note=Location call: Based on the above evidence, this gene is a real gene and the most likely start site is 10837. /note=Function call: PhagesDB BLAST shows top hits with function unknown, with e-values around 1e-40. NCBI BLASTp shows similar results, with hypothetical proteins listed with very low e-values (2. /note=Function call: NKF in PhagesDB BLASTp; top hits were phages Alice and Cane17, which had e-values of 8e-44. No convincing evidence in HHPred (all e-values too high). No CDD hits. NKF in NCBI BLAST; top hits were phages ET08 (2.2e-56) and Spud (1.2e-55). /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: Don't forget to select the drop-down menu under Starterator on PECAAN as well as check off evidence for NCBI hits on PECAAN, but otherwise I agree with the primary annotator's location and function calls. 
CDS 12143 - 12508 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="JulietS_39" /note=Original Glimmer call @bp 12143 has strength 15.64; Genemark calls start at 12143 /note=SSC: 12143-12508 CP: yes SCS: both ST: SS BLAST-Start: [gp45 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 1.78615E-84 GAP: 105 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.802, -5.080915947802713, no F: hypothetical protein SIF-BLAST: ,,[gp45 [Mycobacterium phage Cali] ],,YP_002224522,100.0,1.78615E-84 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 12143 bp. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 12143 bp corresponds to a Final Score of -5.081 which is the third best final score. This final score isn’t very good but it minimizes the gap and the start site is the most annotated start site. The Z-score is 1.802 which is close to 2. /note=Gap/overlap: Gap: 105 bp. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (BadAgartude, BeanWater) of the same cluster and there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 615. Date 1/11/2023. It is conserved; found in BadAgartude (C1) and BeanWater(C1). /note=Starterator: Start site 2 in Starterator was manually annotated in 142/145 non-draft genes in this pham. Start 2 is 12143 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 12143 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. E-values too large and not enough coverage in HHPRED. 
Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but none has a known function. No hits in CDD. Not a membrane protein because it wasn’t called by TMHMM, DeepTMHMM, or TOPCONS. /note=Transmembrane domains: None of DeepTMHMM, TMHMM, or TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with the primary annotator's location and function call. CDS 12545 - 12820 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="JulietS_40" /note=Original Glimmer call @bp 12545 has strength 9.69; Genemark calls start at 12545 /note=SSC: 12545-12820 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M182_gp047 [Mycobacterium phage Astraea] ],,NCBI, q1:s1 100.0% 1.26753E-59 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.231, -2.1695583717155773, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M182_gp047 [Mycobacterium phage Astraea] ],,YP_008061542,100.0,1.26753E-59 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site, 12545; the start codon is ATG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, there is only forward potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.170, the best final score on PECAAN. /note=Gap/overlap: 36 bp, which is less than 50 bp, so it is a reasonable gap; this gene and gap distance are conserved in many other phages, such as Daffodil (C). /note=Phamerator: pham: 690. Date 01/08/23. It is conserved; found in 147 other non-draft phages, such as Ading or Bread. /note=Starterator: Start site 3 in Starterator was manually annotated in 135/135 non-draft genes in this pham. Start 3 is 12545 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 12545. /note=Function call: No known function. The top 2 PhagesDB BLAST hits have unknown function with e-values of 2e-48, and the NCBI BLAST hits also have no known function, with an e-value of 1e-59. There are also no hits in CDD, and the highest probability in HHpred is 37.16%, which is not reliable, so overall there is no known function for this gene. /note=Transmembrane domains: None of TMHMM, TOPCONS, or DeepTMHMM predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the location and function calls. CDS 12870 - 13166 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="JulietS_41" /note=Original Glimmer call @bp 13005 has strength 5.77; Genemark calls start at 12870 /note=SSC: 12870-13166 CP: yes SCS: both-gm ST: SS BLAST-Start: [HTH DNA binding protein [Mycobacterium phage Astraea] ],,NCBI, q1:s1 100.0% 1.1059E-64 GAP: 49 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.088, -5.787564606701106, no F: hypothetical protein SIF-BLAST: ,,[HTH DNA binding protein [Mycobacterium phage Astraea] ],,YP_008061543,100.0,1.1059E-64 SIF-HHPRED: HTH marR-type domain-containing protein; heme-binding, helix-turn-helix, TRANSCRIPTION; HET: HEM; 1.7A {Streptococcus agalactiae serotype III (strain NEM316)},,,7DVR_A,88.7755,99.3 SIF-Syn: /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Glimmer calls the start site at 13005 and GeneMark calls it at 12870. The start codon is GTG. /note=Coding Potential: In host-trained GeneMark, there is coding potential starting just shy of 13000. In self-trained GeneMark, the coding potential extends all the way to 12870. There is only coding potential in the forward direction. /note=SD (Final) Score: -5.788. Although this is not a great SD score, this start site has the smallest gap and covers all the coding potential. 
/note=Gap/overlap: 49 bp. This gap is seen in other phages within this pham, such as Alice (50 bp), Astraea (50 bp), and Grungle (50 bp). /note=Phamerator: Pham 5320. Date 1/19/23. It is conserved and seen in Alice (C1), Astraea (C1), and Grungle (C1). /note=Starterator: Start 5 @ 12870, which is the most manually annotated start (8/12). This start site agrees with GeneMark and contains the most coding potential. /note=Location call: Based on all the provided evidence, this is a real gene that starts at 12870. Glimmer gave a start site of 13005; however, this does not cover all the coding potential found in the self-trained GeneMark. GeneMark places the start site at 12870, which does cover all the coding potential. Moreover, when looking at the manually annotated start sites, 12870 was called 8/12 times. This start site would also reduce the gap from 184 bp to 49 bp, which is within the acceptable range between genes. This gap is also conserved in other phages (Alice and Astraea). Lastly, this new start site has a better z-score, at 2.088, compared to 1.368 for the 13005 start site. /note=Function call: Helix-turn-helix DNA binding protein. The top two PhagesDB BLAST hits have the function HTH DNA binding protein (e-value 3e-51) and the top two hits on NCBI BLAST are also HTH DNA binding protein (100% coverage, <98% coverage, <1e-64). HHpred gave two promising hits (7DVR_A 4.3e-11 and 5FFX_A 6.5e-10) where the sequence had an alpha helix, a 3 amino acid space, another alpha helix, and finished with 3 more spaced. These hits came from S. agalactiae and S. aureus. CDD has no credible hits. /note=Transmembrane domains: DeepTMHMM predicts it to be inside the cell with 100% probability. /note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the function and location call of the primary annotator. 
CDS 13163 - 13675 /gene="42" /product="gp42" /function="immunity repressor" /locus tag="JulietS_42" /note=Genemark calls start at 13163 /note=SSC: 13163-13675 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PBI_BOBI_40 [Mycobacterium phage Bobi] ],,NCBI, q1:s1 100.0% 1.98272E-122 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.13, -4.39309052803282, no F: immunity repressor SIF-BLAST: ,,[hypothetical protein PBI_BOBI_40 [Mycobacterium phage Bobi] ],,YP_008408997,100.0,1.98272E-122 SIF-HHPRED: Immunity repressor; Immunity repressor, helix-turn-helix motif, DNA binding protein, GENE REGULATION, GENE REGULATION-DNA complex; HET: MSE; 2.79A {Mycobacterium phage TipsytheTRex},,,7TZ1_A,95.8824,100.0 SIF-Syn: Immunity repressor in JulietS not called for Lukilu (Pham 68557), upstream gene is a helix-turn-helix DNA binding domain in JulietS and Lukilu (Pham 5320), downstream gene has NKF in JulietS not called for Lukilu (Pham 320), all genes are from the same pham. /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Genemark calls the start site at 13163, Glimmer does not call a start site. The start codon is ATG, which is common. /note=Coding Potential: Coding potential found in GeneMark Self contains both typical and atypical peaks in the forward and the reverse direction. GeneMark Host contains very little coding potential in this region only in the forward direction. /note=SD (Final) Score: The z score is 2.13, a z-score of 2 is good. The final score is -4.393 which is the third highest value. /note=Gap/overlap: -4 bps gap upstream of the gene and a 136 gap downstream of the gene. These gaps are conserved in phages Alice and Lukilu. /note=Phamerator: Pham: 65503. Date 01/12/23. The gene is conserved in phages HyRo and LRRHood which are in the same cluster as JulietS. /note=Starterator: Start site 84 was called for 177 of the 737 non-draft phage genomes in the pham, it is found in 227 of 771 of the genes in this pham. 
It is the most annotated start site and is called 82.2 % of the time when it is present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 13163 bp. Genemark calls the start at 13163 and Glimmer does not make a call. /note=Function call: Immunity repressor. The top hit for HHpred has a function of immunity repressor (probability: 100, E-value: 1.8e-29, % coverage: 95.8824). In this pham, the Phagesdb Function Frequency also states that there is a frequency of 72% being an immunity repressor in subcluster A1 and 9% in subcluster C1. PhagesDB BLASTp shows phages Ronan and Phox with the function of immunity repressor (E-value: 3e-98, 1e-97), both are from cluster C1. NCBI BLASTp shows phage Phox with the function of immunity repressor, (% coverage: 100, E-value: 1.32391e-121). CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: Phamerator/starterator: Pham appears to have changed in starterator report (also include number of phages in pham), should double check, need to revise start site number ; agree with function call CDS complement (13719 - 13811) /gene="43" /product="gp43" /function="membrane protein" /locus tag="JulietS_43" /note=Genemark calls start at 13811 /note=SSC: 13811-13719 CP: yes SCS: genemark ST: SS BLAST-Start: [gp36 [Mycobacterium phage Fruitloop] ],,NCBI, q1:s1 100.0% 2.68054E-10 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.765, -5.220380545373514, no F: membrane protein SIF-BLAST: ,,[gp36 [Mycobacterium phage Fruitloop] ],,YP_002241721,100.0,2.68054E-10 SIF-HHPRED: SIF-Syn: This gene displays synteny. This gene, the gene preceding it, and the gene following it are conserved in the same order in phages Alice (C1) and Astraea (C1). 
/note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Only GeneMark calls the gene’s start site at 13811 bp. Glimmer does not call a start site. The start codon is GTG, which has a high probability of being used as a start site. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only on the second frame, indicating that this is a reverse gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. /note=SD (Final) Score: The final score is the second best option at -5.220 and the z-score is the second best option at 1.765. This is justified because the gap/overlap is at -8, indicating that the gene is likely part of an operon. /note=Gap/overlap: The gap/overlap is reasonable at -8 bp. This overlap is conserved and shows up in phage Alice from cluster/subcluster C1. /note=Phamerator: Pham: 320. Date 1/10/2023. It is conserved and found in Alice (C1) and Astraea (C1). /note=Starterator: Start number 4 in Starterator was manually annotated in 176/198 non-draft genes in this pham. Start number 4 is 13811 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 207/209 (99.0%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 13811 bp. Starterator agrees with GeneMark. /note=Function call: Membrane protein. The top two PhagesDB BLAST hits have the function of “function unknown” (100% identity, E-value = 4e-10), and the top two NCBI BLAST hits have the function of “hypothetical protein” (100%/96.7742% identity, E-value = 2.68054e-10/2.68539e-10). Another PhagesDB BLAST hit has the function of "putative membrane protein" (100% identity, E-value = 4e-10). This hit is backed by the results of DeepTMHMM, which shows that the gene has 1 TMD 17 amino acids long, which is long enough to be considered a membrane protein. 
Results from CDD and HHpred were irrelevant because either no results came up or unlikely results with unreasonably high e-values came up. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs. However, DeepTMHMM predicts the presence of 1 TMD 17 amino acids long. Therefore, it is likely a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. For the TMDs predicted by DeepTMHMM, please reference the annotation manual and add information for how long the TMD is (16 amino acids long in this case). CDS complement (13804 - 14025) /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="JulietS_44" /note=Genemark calls start at 14025 /note=SSC: 14025-13804 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein AU088_gp036 [Mycobacterium phage Cabrinians] ],,NCBI, q1:s1 100.0% 3.70006E-44 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.05, -5.213475109731446, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein AU088_gp036 [Mycobacterium phage Cabrinians] ],,YP_009189758,100.0,3.70006E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer does not predict a start site. Genemark calls the start site at 14025. /note=Coding Potential: Coding potential in this ORF is in the reverse direction only, indicating that this is a reverse gene. Coding potential is not found in the entire ORF and only in the host-trained Genemark. /note=SD (Final) Score: The final score is -5.213. This is not the best final score on PECAAN, but better than some of the gene candidates. /note=Gap/overlap: The gap of 199bp appears excessive, but this is the smallest gap of all gene candidates. /note=Phamerator: pham: 65669. Date 1/10/2023. It is conserved; found in Alice (C) and Astraea (C). /note=Starterator: Start site 9 in Starterator was manually annotated in 90/138 non-draft genes in this pham. 
Start 9 is 14025 in JulietS. This evidence agrees with the site predicted by GeneMark only. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 14025. /note=Function call: NKF. BLASTp hits only included proteins labeled function unknown. CDD returned no hits. HHPRED returned no significant hits. The top NCBI BLAST hit was a hypothetical protein with an e-value of 3.7e-44, so there is not enough evidence to determine a function call. /note=Transmembrane domains: Deep TMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: Include e-values for BLASTp and for NCBI BLAST, I would change the wording to say that the top hits are for hypothetical proteins (include e-values and coverage). Also, don't forget to check DeepTMHMM for the transmembrane call and check off two boxes of evidence for each phagesdb and NCBI blast CDS complement (14012 - 14107) /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="JulietS_45" /note= /note=SSC: 14107-14012 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein I5H41_gp038 [Mycobacterium phage Galactic] ],,NCBI, q1:s1 100.0% 3.95622E-13 GAP: 117 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.189, -4.4121902292613795, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein I5H41_gp038 [Mycobacterium phage Galactic] ],,YP_009957441,100.0,3.95622E-13 SIF-HHPRED: SIF-Syn: /note=added gene. Tricky start. All genes in pham are relatively short (87bp-114bp). The only other cluster C members of the pham select starts with either length of 96bp (start 14107, GTG in JulietS) or 114bp (start 14125, TTG). RBS scores similar; 14107 slightly better. Chose that one. 
CDS 14225 - 14464 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="JulietS_46" /note=Original Glimmer call @bp 14225 has strength 19.27; Genemark calls start at 14225 /note=SSC: 14225-14464 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein AWH68_gp051 [Mycobacterium phage Breeniome] ],,NCBI, q1:s3 100.0% 6.02566E-49 GAP: 117 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.22, -2.17469248771465, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein AWH68_gp051 [Mycobacterium phage Breeniome] ],,YP_009221181,97.5309,6.02566E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: The start site in Glimmer and GeneMark are the same: 14225. Start codon is ATG, which is more common. /note=Coding Potential: There is coding potential present in ORF 2 of the direct sequence (as expected for a forward gene) in the Host-Trained & Self-Trained Genemark. Start site 14225 covers all of the coding potential. /note=SD (Final) Score: -2.175. This is the best score and it corresponds to the start site reported by Glimmer and GeneMark. /note=Gap/overlap: There is a 200 bp gap upstream of the gene and a 2 bp gap downstream of the gene. Gap is reasonable because it is conserved in phage Astraea and there is no coding potential present to fill in the gaps in Host-Trained and Self-Trained Genemark. /note=Phamerator: Pham 487 as of 1/12/23. It is conserved, found in Ading (C1) and CindyLou (C1). /note=Starterator: 78/159 of non-draft genes called start site 9. Start 9 is @ 14225 bp in JulietS. Evidence agrees w/ site predicted by Glimmer and GeneMark. /note=Location call: Start site 9 @ 14225 bp. /note=Function call: No known function. The top three PhagesDB BLAST hits have the function of hypothetical proteins (E-value <10^-40) and the top three NCBI BLAST hits also have the function of hypothetical proteins (100% coverage, >96% identity, and E-value <10^-49). 
HHpred had no relevant hits with E-values being greater than 9.9. CDD had no relevant hits as well. /note=Transmembrane domains: DeepTMHMM does not show any TMDs, therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: Slight typo, just change it to DeepTMHMM does "not" show any TMDs. Otherwise it looks good - I agree with the function and location call. CDS 14466 - 14681 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="JulietS_47" /note=Original Glimmer call @bp 14466 has strength 15.75; Genemark calls start at 14466 /note=SSC: 14466-14681 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PLEIONE_53 [Mycobacterium phage Pleione] ],,NCBI, q1:s1 100.0% 4.60377E-42 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.744, -3.459652012164126, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PLEIONE_53 [Mycobacterium phage Pleione] ],,YP_009017827,100.0,4.60377E-42 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Glimmer and GeneMark agree that the start site is 14466. The start codon that is called is ATG. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 3. The coding potential increases near the potential start site (14466) and plateaus until the stop codon where it tapers off. The gene similarly has good coding potential with the self-trained GeneMark on ORF 3, where the coding potential rises at the start site and plateaus until the stop site. Finally, the start and stop sites also encompass all of the coding potential. /note=SD (Final) Score: The Final Score is -3.460, and the Z score is 2.744. The final score for this start site has a really good sequence match (highest final score). It is also the only listed start site candidate on PECAAN. 
/note=Gap/overlap: There is a 1 base pair gap with the gene upstream, and this is reasonable/acceptable because the gap is very short, and implies it could share an operon with the gene upstream. This chosen start site also makes the gene have the longest open reading frame, and have a gene length of 216 bp, which is also good. /note=Phamerator: The gene is found in Pham 621 as of 1/15/23. The gene is conserved in other members of subcluster C1, and I used phage Stubby and Turret for comparison. /note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 2 and 14466. 145/145 non-draft genomes in the pham also call this start site. /note=Location call: The available evidence suggests that the start site is 14466. The gene appears to be real, and this proposed start site covers all the coding potential, and has a good final score and z-score. In starterator, it is called 100% of the time when present. /note=Function call: The function of this protein is unknown (NKF). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-values 2e-34, and those proteins have no known function. Similarly on NCBI BLASTp, e-values are low (5e-42), and those proteins also have no known function. There were no hits on CDD, and HHpred results were not desirable, as e-values were too high. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicted any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with the location and function call. 
CDS 14681 - 15007 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="JulietS_48" /note=Original Glimmer call @bp 14681 has strength 14.46; Genemark calls start at 14681 /note=SSC: 14681-15007 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_50 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.18297E-75 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.908, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_50 [Mycobacterium phage ScottMcG] ],,YP_002224083,100.0,1.18297E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark have a start site listed at 14681, with a start codon of ATG /note=Coding Potential: There is good coding potential with this gene. Both the Host-trained GeneMark and the Self-Trained Genemark have complete direct sequences, which shows that they have good coding potential. In addition to that, this gene also shares synteny with phages Grungle and Daffodil. /note=SD (Final) Score: The Z-score that was chosen was 2.908, while the final score that was chosen was -2.845 for the start site of 14681. Both of these were the most optimal z-score and final score. This start site also matched with Glimmer and GeneMark’s listed start sites. /note=Gap/overlap: This gene has an overlap of 1 bp, as evidenced by the -1 gap, which is evidence of an operon. /note=Phamerator: As of 1/12/2023, this gene was found in Pham 914. It was found to be conserved in several other cluster C phages, such as Ava, Bipolarisk, and Bread. This is good because JulietS is also a cluster C phage. /note=Starterator: As of 01/08/2023, start site number 4 was the most annotated start site. It was called in 99/102 non-draft phages. In JulietS, start site 4 correlates to 14681, which also matches up with the GeneMark and Glimmer listed start sites, and in addition it matches up with the SD Scores. 
/note=Location call: Based on the evidence above, it is likely that this gene is a real gene with a start site of 14681. /note=Function call: NKF. Based on all the available data, as of now it is impossible to conclude a function for this protein. All of the top hits in NCBI Blast and PhagesDB Blastp are all hypothetical proteins with e-values that are close to zero. There were no significant hits in CDD. In addition to that, the top hits for the HHpred had rather medium probabilities, however their E-values were much greater than 0, showing that none of these could definitely say what function the gene has. /note=Transmembrane domains: Deep TMHMM predicted that this is a protein that is inside of the cell, therefore it is not a transmembrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree with location and function call. However, just make sure to add the DeepTMHMM conclusions to your notes and to be a little bit more specific in e-values instead of saying extremely low. 
CDS 15036 - 15107 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="JulietS_49" /note= /note=SSC: 15036-15107 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_NUEVOMUNDO_58 [Mycobacterium phage NuevoMundo] ],,NCBI, q1:s1 100.0% 2.24659E-6 GAP: 28 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.315, -6.4522068641678745, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NUEVOMUNDO_58 [Mycobacterium phage NuevoMundo] ],,AVJ48513,100.0,2.24659E-6 SIF-HHPRED: SIF-Syn: /note=Stop @ 15107: Added gene, very small, but excellent coding potential CDS 15180 - 15410 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="JulietS_50" /note=Original Glimmer call @bp 15180 has strength 17.68; Genemark calls start at 15180 /note=SSC: 15180-15410 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_52 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 3.06689E-48 GAP: 72 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.986, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_52 [Mycobacterium phage ScottMcG] ],,YP_002224085,100.0,3.06689E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, both call the start of this gene at 15180 with the start codon of ATG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 15180 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -2.601, which is the best score of all the possible start sites on PECAAN. Additionally, this start site also has the best Z-score of 2.986 on PECAAN. /note=Gap/overlap: This gene has a gap of 172 base pairs with the previous gene. 
This is not the best possible start site given that there’s two other possible start sites with a smaller gap. However, this gene is conserved in several other phages and the gap was seen in the other phages as well, such as phage Bread and StephanieG. /note=Phamerator: This gene belongs to pham number 542 as of 1/08/2023. The gene is conserved in phages of this cluster (C) like Bread and StephanieG. There is no function listed for members of this family, so it is highly likely that this is a gene with an unknown function. /note=Starterator: Start site 1 is most often called as it was manually annotated in 98/156 non-draft genes in the pham. Start 1 is 15180 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 15180. /note=Function call: No Known Function. The top three phagesdb BLAST hits are function unknown (E-value <10^-38), and the top three NCBI BLAST hits also have no known function for this gene (100% coverage, 78.35+% identity, and E-value <10^-48). HHpred’s top hits also indicate no known function (68+% probability, 35.5+% coverage, and E-value <15). CDD had no relevant hits. /note=Transmembrane domains: Neither DeepTMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree with this gene being real and the function of NKF. Make sure to add DeepTMHMM conclusions to your notes as well. 
CDS 15414 - 15788 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="JulietS_51" /note=Original Glimmer call @bp 15414 has strength 16.43; Genemark calls start at 15414 /note=SSC: 15414-15788 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_53 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.4207E-85 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.794070146961553, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_53 [Mycobacterium phage ScottMcG] ],,YP_002224086,100.0,1.4207E-85 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 15414 with start codon ATG /note=Coding Potential: Coding potential in this ORF is found on forward direct sequence and no switch in orientation is observed in both host-trained and self-trained Genemark. All the coding potential is included in the ORF and covered by the selected start site. /note=SD (Final) Score: final score of -2.794 and z-value of 3.062, which is the best final score on PECAAN /note=Gap/overlap: the gap with the upstream gene is 3 bp. This gene is conserved in several other phages of the same cluster (Fludd, Grungle) and the gap does not contain coding potential and was seen in Fludd and Grungle as well. /note=Phamerator: pham: 562, date 01/20/2023. It is conserved; found in Ading (C) and Alice (C). /note=Starterator: Start site number 49 in Starterator had the highest manual annotation in 155/155 non-draft genes in this pham. Start site 49 is at position 15414 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 15414. /note=Function call: NKF. 
None of phagesDB BLAST, NCBI BLAST, CDD, or HHpred shows any significant relevant hits with known functions; the only hits with significant probability or coverage have unknown functions. The top hits with unknown function in phagesDB BLAST, Dandelion and Darko, have an e-value of 2e-63. /note=Transmembrane domains: deep TMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call. However, since you specifically chose the two hits in the phagesDB BLAST, you should mention their names and e-values in the function call. CDS 15818 - 16120 /gene="52" /product="gp52" /function="helix-turn-helix DNA binding domain" /locus tag="JulietS_52" /note=Original Glimmer call @bp 15818 has strength 16.48; Genemark calls start at 15818 /note=SSC: 15818-16120 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_DTDEVON_56 [Mycobacterium phage DTDevon]],,NCBI, q1:s1 100.0% 8.62587E-64 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.442961286954254, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[hypothetical protein SEA_DTDEVON_56 [Mycobacterium phage DTDevon]],,ALF50919,99.0,8.62587E-64 SIF-HHPRED: GcrA ; GcrA cell cycle regulator,,,PF07750.14,36.0,91.9 SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark agree that the start site for this gene is @ 15818. Start codon is ATG. /note=Coding Potential: The host-trained Genemark shows that the coding potential does fully capture the start site @ 15818 and the stop site @ 16120. The self-trained Genemark shows that the coding potential fully captures the start site @ 15818, but does not fully capture the stop site @ 16120. Both of these coding potentials are found in the forward reading frame of the gene. /note=SD (Final) Score: The final score of this start site is -2.443 with a z score of 3.062. 
This final score is the least negative final score found from the listed potential start sites, and the z score is about 3, which is above 2 and makes this start site favorable. /note=Gap/overlap: There is a 29 bp gap which is below the recommended 50 bp limit. /note=Phamerator: Found in Pham 345. Date 01/11/22. This gene is conserved and found in Derek, QBert, and Grungle, which are all phages in Cluster C, the same cluster as JulietS. /note=Starterator: Start site 9 is called in 175/185 non-draft genes in this pham. Start site 9 is @ 15818 which agrees with the auto annotated start site of Glimmer and Genemark. /note=Location call: Based on the aforementioned evidence the start site for this gene is most likely @ 15818. /note=Function call: NKF. The PhagesDB blast top hits show phages BackyardAgain and StephanieG list the function as a helix turn helix DNA binding protein with both having an e value of 1e-50. The top 3 hits from NCBI blast show different functions with phage Bxz1 stating the function as a helix turn helix DNA binding protein with an e value of 1e-64, phage Janiyra stating the function as a helix-turn-helix DNA binding domain protein with an e value of 3e-64, and phage Mangeria listing the function as a hypothetical protein with an e value of 4e-64. CDD had no relevant hits. HHPred had no relevant hits. Because HHPred is the most reliable and did not have any relevant hits, I think this is NKF even though both PhagesDB Blast and NCBI Blast state it is an HTH DNA binding domain. /note=Transmembrane domains: Deep TMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: Be sure to mention the date of the starterator report, HHpred outside of PECAAN had high e-values for all results, while NCBI stated that there were hits for helix-turn-helix. 
As HHpred is more reliable, I agree with your conclusion that this gene's function is NKF as of now. CDS 16120 - 16260 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="JulietS_53" /note=Original Glimmer call @bp 16120 has strength 7.91; Genemark calls start at 16120 /note=SSC: 16120-16260 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MORIZZLED23_59 [Mycobacterium phage Morizzled23] ],,NCBI, q1:s1 100.0% 3.02931E-25 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.629, -3.4283035164720754, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MORIZZLED23_59 [Mycobacterium phage Morizzled23] ],,AZF95932,100.0,3.02931E-25 SIF-HHPRED: SIF-Syn: This phage demonstrates synteny with Momo and Phlegm. /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark list the start site at 16120, with the start codon being GTG. /note=Coding Potential: Coding potential is demonstrated in both the host-trained and self-trained GeneMark, based on the selected start site, running the length of the gene. /note=SD (Final) Score: The SD score is the best, being -3.428, and the Z-score is the best as well, being 2.629, suggesting that this gene is a very strong candidate. /note=Gap/overlap: The gap is -1, which suggests that it may be part of an operon. /note=Phamerator: It belongs to pham 68590, dated 2/7/2023. It is conserved, also being found in Momo and Phlegm. /note=Starterator: Start site 52 was manually annotated in 173 of the 381 non-draft genes in the pham. This is not the start that is called the most, being the second most called start site, however, among other genes within the same cluster, it is. This start site is 16120, which agrees with the auto annotation by Glimmer and GeneMark, and further validates the start site. 
/note=Location call: Based on the fact that Glimmer and GeneMark share the same start site, 16120, which is validated by starterator, and the gene is conserved in Momo (C1) and Phlegm (C1), this gene is most likely real. Additionally, the start site demonstrates coding potential for the entirety of the gene, further validating the suggested start site, 16120, as the actual start site. /note=Function call: NKF. There were no significant hits for this gene in CDD and HHPred. However, there were two significant hits in NCBI for NKF. One was listed with a probability of 98.2%, an e-value of 1e-25, and a percent coverage of 100%. The second hit had a probability of 96.7%, an e-value of 4e-25, and a percent coverage of 100%. Additionally, PhagesDB lists two phages, BadAgartude and BeanWater, each with an e-value of 7e-24, indicating that the function of the gene is unknown. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: List other phages that there is synteny with ; Phamerator/starterator: Pham appears to have changed in starterator report (also include number of phages in pham), should double check, need to revise start site number ; Function: agree with function call but need to list what PhagesDB said, for HHPRED should put "no significant hits" instead of no hits ; need to update notes to Deep TMHMM CDS 16260 - 16511 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="JulietS_54" /note=Original Glimmer call @bp 16260 has strength 13.27; Genemark calls start at 16260 /note=SSC: 16260-16511 CP: yes SCS: both ST: SS BLAST-Start: [gp58 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.65613E-54 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.085, -4.839586000312467, yes F: hypothetical protein SIF-BLAST: ,,[gp58 [Mycobacterium phage Bxz1] ],,NP_818134,100.0,1.65613E-54 SIF-HHPRED: SIF-Syn: /note=Primary Annotator 
Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 16260, ATG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.840. It is the best final score on PECAAN. /note=Gap/overlap: 1bp overlap. Reasonable, evidence of an operon, acceptable gene length. I didn’t choose a start site that would make a longer ORF because both Glimmer and GeneMark agreed on the current start site and all other final scores are lower than the current start site. /note=Phamerator: Pham 513. Date 01/08/2023. It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 3 with a base pair coordinate of 16260. 157 of 157 call site #3. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 16260 bp. /note=Function call: NKF; Both NCBI and PhagesDB database did not predict functions for this gene. The top three PhagesDB BLAST and HHpred hits have unknown function (e-value < 10^-44). No conserved domains identified for this query sequence. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with this location and function call. 
CDS 16508 - 16834 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="JulietS_55" /note=Original Glimmer call @bp 16508 has strength 11.29; Genemark calls start at 16508 /note=SSC: 16508-16834 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO63_gp053 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 1.73471E-72 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.492, -6.080970649622256, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp053 [Mycobacterium phage QBert] ],,YP_010058152,100.0,1.73471E-72 SIF-HHPRED: DUF5614 ; Family of unknown function (DUF5614),,,PF18474.4,83.3333,70.5 SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Glimmer and GeneMark both call 16508 start. Start codon is ATG. /note=Coding Potential: There is coding potential present within the host and self trained. It is mainly within the forward strand on the second row. /note=SD (Final) Score: -6.081 ; the final score is not the strongest given when compared to the other start sites. The Z-score is 1.492, this is also not the best score because it is not close to 2 as other scores. /note=Gap/overlap: -4 ; There is an overlap of 4 base pairs. This is a sign that there is an operon present at the start of the gene. /note=Phamerator: pham : 505, date 1/18/2023. Conserved and found within phage Amataga (C ) and Bread (C ). /note=Starterator: Start site 3 ; from Starterator has 155/157 non-draft genes calling the start of 16508. This agrees with what was presented from Glimmer and GeneMark. /note=Location call: Based on the data from Starterator, Glimmer and GeneMark the start site should be 16508. /note=Function call: NKF ; NCBI BLAST presents phage QBert with a 100% identity and 100% alignment with an e value of 1.73471e-72 that is a hypothetical protein. Phagesdb BLAST presents RoMag and Sauce phages with e values of 2e-58 that present an unknown function. 
HHPRED called unknown function with e-values of 18 and 25 and the highest coverage being 83.33%. No function frequency or CDD was given. /note=Transmembrane domains: DeepTmHmm presented no transmembrane domains so it is likely not a membrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree that this gene is real and the function is NKF. CDS 16920 - 17327 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="JulietS_56" /note=Original Glimmer call @bp 16920 has strength 17.39; Genemark calls start at 16920 /note=SSC: 16920-17327 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO63_gp054 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 7.88475E-93 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.986, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp054 [Mycobacterium phage QBert] ],,YP_010058153,100.0,7.88475E-93 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 16920; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 3 (indicating that this is a forward gene). The chosen start site covers this ORF. Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Astraea and Atlantean. /note=SD (Final) Score: -2.601. This is a reasonable SD score. /note=Gap/overlap: The gap is 85 bp, which is a bit bigger than the recommended gap of at most 50 bp. However, all other evidence seems to point towards this being a real gene. This is the smallest gap when compared with other start sites. /note=Phamerator: Date of investigation: 1/12/23; Pham 676; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Astraea and Atlantean). 
/note=Starterator: Yes, there is a conserved start choice. It is start number 4 with a base pair coordinate of 16920. It has 83 manual annotations (MAs). Found in 142/148 (95.9%) of genes in pham. This start site also agrees with Glimmer and Genemark’s call. E-values are good, on the order of 1e-75. /note=Location call: Yes, the evidence suggests this gene is real. Start site 16920 is most likely. /note=Function call: NKF; CDD had no hits. NCBI predicted hypothetical proteins with coverages of 100%, identities above 97%, and e-values on the order of 1e-91 and 1e-93. PhagesDB did not predict functions for this gene. Phages QBert and RoMag support the NKF conclusion since they have synteny with this gene and no known function (e-values on the order of 1e-75). HHPred has hits, but the coverage is low and the e-values are much higher than 0. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Pham, Truc /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 17418 - 18113 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="JulietS_57" /note=Original Glimmer call @bp 17418 has strength 14.96; Genemark calls start at 17418 /note=SSC: 17418-18113 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SPRINKLERS_57 [Mycobacterium phage Sprinklers]],,NCBI, q1:s1 100.0% 4.02413E-165 GAP: 90 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.295, -2.0949393225970705, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SPRINKLERS_57 [Mycobacterium phage Sprinklers]],,QAY13352,100.0,4.02413E-165 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 17418 with a start codon of ATG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. 
The suggested start site of #17418 covers around 99.9% of the coding potential regions on the forward strand, which supports the forward direction of this gene. Yet, the start site of #17364 covers the entire ORF for the gene, indicating a different start site could be used. Additionally, coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs, which suggests that this is a real gene. /note=SD (Final) Score: The suggested start site at #17418 has a final score of -2.095 and a Z-score of 3.295, which is not the most negative final score but is the highest Z-score. In comparison, the other start site that could be used has a final score of -5.991, which is more negative than that of the suggested start site, and a lower Z-score of 2.134. Both start sites suggest good potential to be the starting point of the coding for the gene in the ORF. But since GeneMark and Glimmer both call the same start site, there is legitimacy in choosing the autogenerated site, especially with a Z-score over 2 that is the highest score, as it allows the gene to be the most conserved between other phages (Grungle). /note=Gap/overlap: The suggested start site has a gap of 90 bp, which shows synteny with other genes in Cluster C1. The other potential start site has a much smaller gap between genes, 36 bp, which does not show synteny between the phage Grungle and JulietS. There is a reasonable overlap between this gene and the gene downstream of about -1 bp, which allows for the suggested start site and the new potential start site (@ #17364) to remain true. The start site at #17364 allows for the gene to be longer with no potential issues in the ORF. /note=Phamerator: The gene was found to be in Pham 62083 (01/18/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Ava, and Grungle. There are no functions listed for this Pham. However, the base pair length was conserved at 696 bp. 
/note=Starterator: There is a reasonable and highly conserved start site, reviewed on 01/18/2023, at (6, 17418), which was called by 147 of the 161 non-draft genes among the 175 total pham members. /note=Location call: The gene and the start site are both conserved for this gene. Even with mediocre coding potential in the ORF, the evidence still indicates a real gene placed in the operon and supports a start site at #17418, consistent with other phages within the C1 group. Because the pham has a conserved length of 696 bp across other phages in Cluster C1 and both Glimmer and GeneMark call the suggested start site, the suggested start site is favored over the start site at #17364. /note=Function call: The gene appears to have no known function. In the BLASTp on PhagesDB.org it has two matches, phages Sprinklers and Caravan, with e-values of 10^-134 and 100% positives and no known function (NKF). Additionally, the NCBI BLASTp also indicates no known function (NKF), with an e-value of 4x10^-165 and a 100% match with phage Sprinklers. There are additional matches with a phage named QBert that indicate a hypothetical protein, but this is not enough to determine a function. HHpred and CDD both give inconclusive results, with CDD giving no hits and HHpred's lowest e-value hit having an e-value of 89 and a probability of 36.65 for general control protein. Thus, there is a high probability that the function of this gene is not known. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains and places the protein inside the cell. 
/note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. CDS 18113 - 18424 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="JulietS_58" /note=Original Glimmer call @bp 18113 has strength 15.04; Genemark calls start at 18113 /note=SSC: 18113-18424 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_60 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.20194E-68 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.078, -4.643862285635445, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_60 [Mycobacterium phage ScottMcG] ],,YP_002224093,100.0,1.20194E-68 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 18113. GTG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.644. It is not the best final score on PECAAN. Z-score is 2.078. This final score was chosen based on the fact that it aligned with Glimmer, GeneMark, and Starterator. The best final score is -2.681 and best Z-score is 2.986 (for a start site of 18155). /note=Gap/overlap: -1 overlap. Is small and reasonable as the overlap is conserved in other phages (Bigswole, CindyLou) and there is no coding potential in the overlap that might be a new gene. /note=Phamerator: Pham: 538. Date 1/13/2023. Pham has 171 members and 14 phages are drafts. Conserved in Gabriel and HyRo. /note=Starterator: Start site 3 in Starterator was manually annotated in 156 of the 157 non-draft genes in the pham. Start 3 is 18113 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. Start site 18155 is not supported by Starterator as only 1 manual annotation was done. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 18113. /note=Function call: NKF. No, there is not enough evidence to suggest a function for this protein as both NCBI and PhagesDB did not have any significant hits. Function unknown was listed for every PhagesDB hit. Hypothetical proteins were listed for NCBI (e-values around 1e-68 and coverage of 100%). CDD had zero hits. HHpred produced 6 hits with probability significantly less than 90% and scores less than 30 for all hits. No significant e-values. /note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with your location and function call. However, I think you could add why you are choosing the start site despite the bad final score (under the SD score notes) and also state if the pham is conserved in other members of the same cluster (Phamerator). You could also include specific e-values and coverage values for the hypothetical protein hits for NCBI and don`t forget to check off two boxes for evidence for each phages db and NCBI BLAST CDS 18421 - 18627 /gene="59" /product="gp59" /function="helix-turn-helix DNA binding domain" /locus tag="JulietS_59" /note=Original Glimmer call @bp 18421 has strength 14.31; Genemark calls start at 18421 /note=SSC: 18421-18627 CP: yes SCS: both ST: SS BLAST-Start: [gp63 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 3.99989E-41 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.523003374675015, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[gp63 [Mycobacterium phage Bxz1] ],,NP_818139,100.0,3.99989E-41 SIF-HHPRED: Transcriptional regulator ComR; RNPP family TPR domain HTH domain bacterial signaling peptide binding, TRANSCRIPTION; 1.894A {Streptococcus vestibularis F0396},,,6HU8_A,83.8235,98.8 SIF-Syn: helix-turn-helix DNA binding domain, upstream gene is NKF, downstream is 
helix-turn-helix DNA binding domain, just like in phage Grungle from the same pham 534. /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 18421. The start codon is ATG. /note=Coding Potential: The ORF has good coding potential and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.523 is the best SD (final) score because it is the highest (least negative) value, and has the smallest gap of all of the choices. Z-score is above 2. /note=Gap/overlap: This start site had the smallest gap/overlap. Since the overlap is by 4 base pairs, this implies that the gene could be part of an operon. /note=Phamerator: Gene found in pham 1 on 1/13/23. When compared to phages BackyardAgain, Daffodil, Megamind, and Shrimp, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. The start number called the most often in the published annotations is 1; it was called in all 157 of the 157 non-draft genes in the pham. There is synteny with other non-draft phages belonging to the same cluster. /note=Starterator: The start number called the most often in the published annotations is 1; it was called in 157 of the 157 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 18421 seems most likely. /note=Function call: Helix-turn-helix domain binding protein. The top three phagesdb BLAST hits (Megamind, ZygoTaiga, and HyRo) have the same function (E-value <10^-33), and the top 2 NCBI BLAST hits (Dandelion and Fostrotp1) also agreed that this is a helix-turn-helix domain binding protein with 100% coverage, 98%+ identity, and E-value <10^-40. Top 2 CDD hits also agreed on the function with a <68% coverage and e-value < 10^-8. 
The second and third hits on HHpred, with a probability >98%, coverage >82%, and an e-value <10^-7, also agreed that the function is a HTH DNA binding protein. In HHpred, there are over 3 alpha helices that are separated by 3-4 amino acid sequences, indicating that this is indeed a HTH DNA binding domain. /note=Transmembrane domains: Neither TMHMM, DeepTMHMM, nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call, good work. However, for the synteny, remember to add the pham number for those conserved NKF genes. CDS 18624 - 18932 /gene="60" /product="gp60" /function="helix-turn-helix DNA binding domain" /locus tag="JulietS_60" /note=Original Glimmer call @bp 18663 has strength 13.14; Genemark calls start at 18624 /note=SSC: 18624-18932 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein M182_gp066 [Mycobacterium phage Astraea] ],,NCBI, q1:s1 100.0% 7.4951E-68 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.676, -5.4858509637772555, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[hypothetical protein M182_gp066 [Mycobacterium phage Astraea] ],,YP_008061561,100.0,7.4951E-68 SIF-HHPRED: HTH marR-type domain-containing protein; heme-binding, helix-turn-helix, TRANSCRIPTION; HET: HEM; 1.7A {Streptococcus agalactiae serotype III (strain NEM316)},,,7DVR_A,95.098,99.4 SIF-Syn: There is synteny with phage Alice, Blackbrain, and Cactojaque in both function and location (The gene is 58 in Alice, Blackbrain, and Cactojaque and 57 in JulietS) /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Glimmer calls the start as 18663 with a start codon of GTG while GeneMark calls the start as 18624 with a start codon of ATG /note=Coding Potential: Both starts include all of the coding potential in both host trained and self trained and are on the 3rd forward facing ORF /note=SD (Final) Score: 18624 has a final score 
of -5.486, which is not the highest SD score on PECAAN, and 18663 has a final score of -3.703, which is the best SD score /note=Gap/overlap: Start 18624 has a gap of -4, which indicates the gene is part of an operon, and start 18663 has a gap of 35, which is reasonable /note=Phamerator: Pham number 62068 as of 1/13/23 and has 193 members, 14 are drafts. Phages Alice and Astraea (both Cluster C phages) are members of this pham /note=Starterator: There are 157 manual annotations for Start 23 @ 18624 and none for 18663 /note=Location call: This is likely a real gene with a start at 18624 /note=Function call: helix-turn-helix DNA binding domain; There are 2 hits for the PhagesDB function frequency that call this same function. The top two hits on the PhagesDB BLAST also call this function and have e-values of 5e-54. HHpred has a hit with an e-value of 7.8e-12 which calls this function as helix-turn-helix and has 95% coverage and 99.4 probability. NCBI has 2 hits that call it as helix-turn-helix as well with (coverage: 100, identity: 100, e-value: 7.4951e-68) and (coverage: 100, identity: 99.01, e-value: 1.63434e-67). An independent HHpred investigation reveals that the amino acid sequence has a series of 3 alpha helices separated by spacers and the crystal structure confirms this. /note=Transmembrane domains: Deep TMHMM predicts 0 TMRs and that the whole protein is inside the cell /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the location call and function call based on the provided evidence. 
CDS 18932 - 19267 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="JulietS_61" /note=Original Glimmer call @bp 18932 has strength 16.32; Genemark calls start at 18932 /note=SSC: 18932-19267 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M182_gp067 [Mycobacterium phage Astraea] ],,NCBI, q1:s1 100.0% 8.77417E-74 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.078, -5.030063891036238, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M182_gp067 [Mycobacterium phage Astraea] ],,YP_008061562,100.0,8.77417E-74 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and Genemark call the start site at 18932, with the start codon ATG. /note=Coding Potential: Coding potential is predominantly in the top strand (there is a slight bit in the reverse strand at the very beginning, but there’s barely an overlap). The gene has good coding potential in host trained and self trained Genemark. It has slightly poorer coding potential near the end, but is overall reasonable CP. The start site covers all the coding potential. /note=SD (Final) Score: The SD score is -5.030, which is not the best SD score out of the options listed, but it’s comparable and the other options that have better SD scores have overlaps of over 100 bp. The Z score is 2.078. /note=Gap/overlap: There is an overlap of 1 bp, which is reasonable. /note=Phamerator: Pham 531. Date 1/12/23. It is conserved; found in Trinitium_60 and Shifa_58. Function was listed as unknown on the phams database. /note=Starterator: Start site 3 @18932 was the most annotated, with 142 manual annotations. Start site 4 @ 18935 was annotated much less in comparison, with 15 manual annotations. /note=Location call: Based on the above evidence, this gene is a real gene and the most likely start site is 18932. /note=Function call: PhagesDB matches were listed as function unknown. 
HHPred showed genes with similar functions, but the evalues were higher than 1, with the lowest being 2, and some matches were also listed as hypothetical proteins. NCBI BlastP shows hypothetical protein matches. CDD shows no conserved domains. Thus, the function was called NKF. /note=Transmembrane domains: Deep TMHMM predicted no TMRs, so this gene is likely not coding for a transmembrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I`m in agreement with the primary annotator`s location and function call. CDS 19454 - 19717 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="JulietS_62" /note= /note=SSC: 19454-19717 CP: yes SCS: neither ST: SS BLAST-Start: [hypothetical protein M181_gp070 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 9.60841E-55 GAP: 186 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.314, -6.181859025529208, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp070 [Mycobacterium phage Gizmo] ],,YP_008060867,100.0,9.60841E-55 SIF-HHPRED: SIF-Syn: CDS 19843 - 20073 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="JulietS_63" /note=Original Glimmer call @bp 19843 has strength 17.65; Genemark calls start at 19843 /note=SSC: 19843-20073 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_GRUNGLE_57 [Mycobacterium phage Grungle] ],,NCBI, q1:s1 100.0% 1.72151E-44 GAP: 125 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.295, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GRUNGLE_57 [Mycobacterium phage Grungle] ],,QDP44426,100.0,1.72151E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer and GeneMark both call 19843 as start. /note=Coding Potential: Yes, there is great coding potential on the forward strand, indicating it is a forward gene. 
On GeneMark Self, the coding potential does dip a little in the middle, but the coding potential still covers the entire start to stop coding region and is high for the rest of the region. /note=SD (Final) Score: The Z-score (1.847) and the Final Score (-5.814) of the auto called start site both don’t meet the requirements for ideal scores. The Z-score and the Final Score of the second start site called are ideal, 2.306 and -4.167 respectively, however since this start site has never been manually annotated and 19843 has been 16/16 times, I would not switch to calling the second start site. /note=Gap/overlap: The gap with the upstream gene is too large to be reasonable (416 bp), however since there is no coding potential within the gap region on Host Trained Gene Mark then there is no indication of gene insertion being an option to fill the gap. The length of the gene (390 bp) is an acceptable length. /note=Phamerator: Pham number is 3113 as of 1/08/23. It is conserved, found in StephanieG_66 and Sprinklers_63. /note=Starterator: Start number (@19843) is 6, manually annotated 16 times. This is the same as the conserved start site. There are 30 members of the pham and 16 call the same conserved start site. /note=Location call: There is good coding potential and the length of the gene is a good length, so despite the large gap with the upstream gene and the poor SD scores, it is still reasonable to call this a real gene. The start site is 19843. /note=Function call: NKF. Phagesdb, NCBI Blast, CDD, and HHpred all call no known function. /note=Transmembrane domains: There are no TMDs predicted by deepTMHMM, so it is not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with the location and function call. 
CDS 20213 - 20419 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="JulietS_64" /note=Original Glimmer call @bp 20213 has strength 11.93; Genemark calls start at 20213 /note=SSC: 20213-20419 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M182_gp070 [Mycobacterium phage Astraea] ],,NCBI, q1:s1 100.0% 1.4523E-39 GAP: 139 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.22, -2.639579286017301, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M182_gp070 [Mycobacterium phage Astraea] ],,YP_008061565,100.0,1.4523E-39 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Forward gene. Glimmer (20213), GeneMark (20213), start codon: ATG /note=Coding Potential: The coding potential is found in the forward frame. The gene does cover all coding potential in both the Host Trained and Self Trained GeneMark /note=SD (Final) Score: -2.640. It`s the most reasonable score because it has a small gap. It is the best score as it is the least negative. Z-score is 3.22, which is greater than 2. It shows that the site is a good candidate for a start site. /note=Gap/overlap: 139, which is a relatively big gap. However, the large gap is conserved in Alice and CharlieB from the same cluster /note=Phamerator: Pham 667, 01/14/23, gene conserved in phages Ronan and Cane17 /note=Starterator: 01/08/23. Manually annotated in 140/140 non-draft genes in this pham. Start 6 (6, 20213) is manually called by 140 others. Evidence is also in line with the auto-annotation made by Glimmer and GeneMark /note=Location call: It is a real gene and the start site is Start 6 at 20213 based on the evidence above /note=Function call: NKF. The top two hits on PhagesDB BLAST (e-value = 2e-32 for phage stubby and StephanieG) indicate unknown function. The top hits on NCBI BLAST (e-value = 1e-39 and 5e-39 for phage Astraea and ScottMcG, with over 98% identity and 100% coverage) also indicate an unknown function. 
There are no significant hits on HHpred and CDD. /note=Transmembrane domains: No transmembrane domains are predicted by TOPCONS, TMHMM, or DeepTMHMM, which indicates that the gene does not encode a membrane protein /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: Check the starterator and GM coding capacity boxes, otherwise, I agree with the function and location call. CDS 20529 - 21764 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="JulietS_65" /note=Original Glimmer call @bp 20529 has strength 18.93; Genemark calls start at 20529 /note=SSC: 20529-21764 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO65_gp070 [Mycobacterium phage Sauce] ],,NCBI, q1:s1 100.0% 0.0 GAP: 109 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO65_gp070 [Mycobacterium phage Sauce] ],,YP_010058629,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 20529 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.523. It is the best final score on PECAAN. /note=Gap/overlap: 109 bp gap. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (Grungle, Colt) and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: Pham 389. Date 1/11/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 18 in Starterator was manually annotated in 164/174 non-draft genes in this pham. Start 18 is 20529 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 20529 bp. /note=Function call: No known function (NKF). The top two phagesDB BLAST hits are of unknown function (E-value = 0), and the top three NCBI BLAST hits are also of unknown function (E-value = 0, 99.76%+ identity, 100% coverage). CDD and HHpred had no significant hits. /note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with the primary annotator`s location and function calls. CDS 21838 - 22521 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="JulietS_66" /note=Original Glimmer call @bp 21838 has strength 12.33; Genemark calls start at 21838 /note=SSC: 21838-22521 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein AVA3_72 [Mycobacterium phage Ava3] ],,NCBI, q1:s1 100.0% 1.41168E-167 GAP: 73 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.776, -3.392076448018144, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein AVA3_72 [Mycobacterium phage Ava3] ],,AFL46765,100.0,1.41168E-167 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 21838 (ATG codon). /note=Coding Potential: Good coding potential in GeneMark Host and Self. Coding potential in this ORF is on the forward strand only. /note=SD (Final) Score: -3.392. It is the best final score on PECAAN. /note=Gap/overlap: There is a 73bp gap with the upstream gene, which is a bit large. The gap is well-conserved. /note=Phamerator: Gene is in pham 507 (date accessed: 01/17/23). It is conserved, found in phages JPickles and Khaleesi. /note=Starterator: Start site 6 at 21838 was manually annotated 127 times. It is the most manually annotated start site on Starterator. 
/note=Location call: The selected start site at 21838 has the highest final score and a Z-score greater than two. The high number of manual annotations also support this location call. /note=Function call: NKF in PhagesDB BLASTp; top hits were phages Ava3 and FoxtrotP1, both with low e-values of 1e-135. No convincing evidence in HHPred (all e-values too high). No CDD hits. NKF in NCBI BLAST; top hits were phages Ava3 (1.41e-167) and ChaylaJr (1.42e-167). /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 22526 - 23098 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="JulietS_67" /note=Original Glimmer call @bp 22604 has strength 10.2; Genemark calls start at 22526 /note=SSC: 22526-23098 CP: yes SCS: both-gm ST: SS BLAST-Start: [gp72 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.69846E-137 GAP: 4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.437, -5.906571732736166, no F: hypothetical protein SIF-BLAST: ,,[gp72 [Mycobacterium phage Bxz1] ],,NP_818148,100.0,1.69846E-137 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Both Glimmer and GeneMark call the gene but at different start sites. Glimmer calls the gene at the start site 22604 bp. GeneMark calls the gene at the start site 22526 bp. Preference is given to the GeneMark start site at 22526 bp even though it has a final score that’s very negative and a Z-score that’s less than 2 because there is a significantly smaller gap (4 bp vs 82 bp with the Glimmer start site) and Starterator notes that this site is the most manually annotated start site. The start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 22526 bp corresponds to a Final Score of -5.907, which is not the best final score, but it minimizes the gap and is the most annotated start site. /note=Gap/overlap: Gap: 4 bp. Reasonable because it is less than 50 bp, the gap is conserved in other phages (Adlitam, Ading) of the same cluster, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 783. Date 1/12/2023. It is conserved; found in Adlitam (C1) and Ading (C1). /note=Starterator: Start site 5 in Starterator was manually annotated in 97/118 non-draft genes in this pham. Start 5 is 22526 bp in JulietS. This evidence agrees with the site predicted by GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 22526 bp. Starterator agrees with GeneMark; Glimmer calls the start at 22604 bp. /note=Function call: NKF. E-values too large and not enough coverage in HHPRED. Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but no known function. No hits in CDD. Not a membrane protein because it wasn’t called by TMHMM, Deep TMHMM, or TOPCONS. /note=Transmembrane domains: Neither TMHMM, Deep TMHMM, nor TOPCONS predicts any TMDs; therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call, good work. 
CDS 23103 - 23453 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="JulietS_68" /note=Original Glimmer call @bp 23103 has strength 11.44; Genemark calls start at 23103 /note=SSC: 23103-23453 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_70 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.17252E-78 GAP: 4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.644, -5.490418349731808, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_70 [Mycobacterium phage ScottMcG] ],,YP_002224103,96.6667,1.17252E-78 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site, 23103; the start codon is ATG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, there is only forward potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -5.490. This is not the highest final score, but considering other factors such as Starterator, it is reasonable. /note=Gap/overlap: 4 bp, the smallest gap, and also conserved in other phages such as Daffodil. /note=Phamerator: pham: 569. Date 01/08/23. It is conserved; found in 166 other non-draft phages, such as Ading or Bread. /note=Starterator: Start site 24 in Starterator was manually annotated in 141/154 non-draft genes in this pham. Start 24 is 23103 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 23103. /note=Function call: No known function. The top 2 PhagesDB BLAST hits (phages ZygoTaiga and Zeenon) have unknown function with an e-value of 4e-64, and the NCBI BLAST hits also have no known function, with an e-value of 1e-78. 
There is also no hits in CDD, the largest possibility in HHpred is 63.06%, so it is not reliable, so overall there is no known function in this gene. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Luk,Jarrett /note=Secondary Annotator QC:checked CDS 23514 - 24248 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="JulietS_69" /note=Original Glimmer call @bp 23514 has strength 14.92; Genemark calls start at 23514 /note=SSC: 23514-24248 CP: yes SCS: both ST: SS BLAST-Start: [gp74 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 6.44221E-177 GAP: 60 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.295, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[gp74 [Mycobacterium phage Bxz1] ],,NP_818150,99.1803,6.44221E-177 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 23514. The start site codon is ATG. /note=Coding Potential: There is good coding potential in both the Host trained and self trained GeneMark. Additionally, the coding potential for this ORF is only in the forward strand indicating that it is a forward gene. /note=SD (Final) Score: -1.954. This is the best final score in PECAAN. /note=Gap/overlap: 60bp. Although the gap is over the 50 bp threshold, it is the smallest gap PECAAN. There is no coding potential in the gap. /note=Phamerator: pham: 572 01/11/23. It is conserved and found in both Ading (C) and Grungle (C). /note=Starterator: Start site 4 was the most manually annotated site (154/154) for non draft genes. Start 4 is 23514 which agrees with both Glimmer and GeneMark /note=Location call: Based on the above evidence, this is a real gene that starts at 23514. Additionally the z-score was 3.295. /note=Function call: The function is unknown. 
The top three phages (Adlitam (C1), Amataga (C1), ArcherS7 (C1)) phagesdb BLAST hits mark the function as unknown (E-value 1e-146). Additionally the first 5 NCBI BLAST hits also call the function as a Hypothetical protein (100% coverage, 99% identity, and E-values <10^-176). HHPRED is not convincing evidence nor did anything show up for CDD. /note=Transmembrane domains: DeepTMHMM predicts it to be inside the cell with 100% probability. /note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the function and location call made by the primary annotator. CDS 24245 - 24457 /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="JulietS_70" /note=Genemark calls start at 24245 /note=SSC: 24245-24457 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein KHO62_gp078 [Mycobacterium phage NoodleTree] ],,NCBI, q1:s1 100.0% 8.14781E-41 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.389, -4.902475914772797, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO62_gp078 [Mycobacterium phage NoodleTree] ],,YP_010057952,98.5714,8.14781E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Genemark calls the start site at 24245 and Glimmer does not call a start site. The start codon is GTG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential found in GeneMark Self contains both typical and atypical peaks in the forward and the reverse direction. GeneMark Host contains very little coding potential in this region in the forward and the reverse direction. /note=SD (Final) Score: The z score is 2.389. The final score is -4.902 which is the fourth highest value. /note=Gap/overlap: -4 bps gap upstream of the gene and a 3 bp gap downstream of the gene. These gaps are conserved in phages Ava3, CharlieB, and Latch. /note=Phamerator: Pham: 572. Date 01/14/23. 
The gene is conserved in phages JustHall and Blackbrain which are in the same cluster as JulietS. /note=Starterator: Start site 4 was called for 154 of 154 non-draft phage genomes in the pham, it is found in 168 of 168 of the genes in this pham. It is the most annotated start site and is called 100% of the time when it is present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 24245 bp. Genemark calls the start at 24245 and Glimmer does not make a call. /note=Function call: NKF. The top hits from Phagesdb BLASTp shows phages ZygoTaiga, Zeenon, and Zalkecks all from cluster C1 having no function (all E-value: 4e-34). NCBI BLASTp shows phages NoodleTree, StephanieG, and LinStu with no known function, full coverage and E-values of e-40. Phagesdb Function Frequency, HHpred, CDD showed no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: Phamerator: there is a new pham assignment, list total number of phages in pham, need to update start number; agree with function call CDS 24454 - 24624 /gene="71" /product="gp71" /function="hypothetical protein" /locus tag="JulietS_71" /note=Original Glimmer call @bp 24460 has strength 9.15; Genemark calls start at 24460 /note=SSC: 24454-24624 CP: no SCS: both-cs ST: SS BLAST-Start: [gp76 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 9.00957E-33 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.797, -3.824830700482977, no F: hypothetical protein SIF-BLAST: ,,[gp76 [Mycobacterium phage Bxz1] ],,NP_818152,100.0,9.00957E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 24460 bp. The start codon is ATG, which has a high probability of being used as a start site. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only on the first frame, indicating that this is a forward gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. /note=SD (Final) Score: The final score is the best option at -3.348 and the z-score is the second best option at 2.797. /note=Gap/overlap: The gap/overlap is reasonable at 2 bp. This gap is conserved and shows up in phage Ading from cluster/subcluster C1. /note=Phamerator: Pham: 1194. Date 1/18/2023. It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 2 in Starterator was manually annotated in 44/73 non-draft genes in this pham. Start number 2 is 24460 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 80/80 (100.0%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 24460 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top two PhagesDB BLAST hits have the function of “function unknown” (100% identity, E-value = 3e-28), and the top two NCBI BLAST hits have the function of “hypothetical protein” (96.4286%/100% identity, E-value = 2.87274e-31/3.33173e-31). Results from CDD and HHpred were irrelevant because either no results came up or unlikely results with unreasonably high e-values came up. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs. DeepTMHMM agrees (1.0 probability inside throughout). Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with this gene`s function call and location. 
CDS 24680 - 25006 /gene="72" /product="gp72" /function="hypothetical protein" /locus tag="JulietS_72" /note=Original Glimmer call @bp 24680 has strength 13.6; Genemark calls start at 24680 /note=SSC: 24680-25006 CP: yes SCS: both ST: SS BLAST-Start: [gp76 [Mycobacterium phage Spud] ],,NCBI, q1:s1 100.0% 7.84431E-74 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.13, -4.534089041695007, no F: hypothetical protein SIF-BLAST: ,,[gp76 [Mycobacterium phage Spud] ],,YP_002224330,100.0,7.84431E-74 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 24680. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.534. This is the second best final score on PECAAN. /note=Gap/overlap: The gap is 55bp which is reasonable. /note=Phamerator: pham: 475. Date 1/10/2023. It is conserved; found in Alice (C) and Astraea (C). /note=Starterator: Start site 3 in Starterator was manually annotated in 157/160 non-draft genes in this pham. Start 3 is 24680 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 24680. /note=Function call: NKF. BLASTp hits only included hypothetical proteins. CDD returned no hits. HHPRED returned no significant hits. The top BLAST hit was a hypothetical protein with an e-value of 7.84e-74, so there is not enough evidence to determine a function call. /note=Transmembrane domains: Deep TMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the function and location calls. 
CDS 25045 - 25755 /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="JulietS_73" /note=Original Glimmer call @bp 25045 has strength 14.01; Genemark calls start at 25045 /note=SSC: 25045-25755 CP: yes SCS: both ST: SS BLAST-Start: [gp78 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 4.54243E-173 GAP: 38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.141, -2.338663114094478, yes F: hypothetical protein SIF-BLAST: ,,[gp78 [Mycobacterium phage Bxz1] ],,NP_818154,100.0,4.54243E-173 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: The start site in Glimmer and GeneMark are the same: 25045. Start codon is ATG, which is more common. /note=Coding Potential: There is coding potential present in ORF 1 of the direct sequence (as expected for a forward gene) in the Host-trained & Self-trained Genemark. Start site 25045 covers all of the coding potential. /note=SD (Final) Score: -2.339. This is the best score and it corresponds to the start site reported by Glimmer and GeneMark. /note=Gap/overlap: There is a 39 bp gap upstream of the gene and a 14bp gap downstream of the gene. Gap is reasonable because it is conserved in phage BananaFence and there is no coding potential present to fill in the gaps in Host-Trained and Self-Trained Genemark. /note=Phamerator: Pham 545 as of 1/12/2023. It is conserved, found in Latch (C1) and SmallFry (C1). /note=Starterator: 156/156 of non-draft genes called start site 1. Start 1 is @ 25045 bp in JulietS. Evidence agrees w/ start site predicted by Glimmer and GeneMark. /note=Location call: Start site 1 @ 25045 bp. /note=Function call: No known function. The top three PhagesDB BLAST hits have the function of hypothetical proteins (E-value = 1e-137) and the top three NCBI BLAST hits also have the function of hypothetical proteins (100% coverage, >98% identity, and E-value <10^-173). HHpred had no relevant hits with E-values being greater than 34. CDD had no relevant hits as well. 
/note=Transmembrane domains: DeepTMHMM does not show any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I can agree with the functional call and location of the gene CDS 25778 - 25978 /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="JulietS_74" /note=Genemark calls start at 25769 /note=SSC: 25778-25978 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein M181_gp084 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 4.50599E-41 GAP: 22 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.131, -4.3913063680686815, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp084 [Mycobacterium phage Gizmo] ],,YP_008060881,100.0,4.50599E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Glimmer did not call a start site. GeneMark called the start site at 25769. The start codon that is called is GTG. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 2. The coding potential increases near the potential start site (25769) and plateaus, but tapers off a little bit earlier around 25900 instead of at the stop site. The gene similarly has good coding potential with the self-trained GeneMark on ORF 2, where the coding potential rises at the start site, but also drops a bit earlier than the stop site. Finally, the start and stop sites encompass all of the coding potential. /note=SD (Final) Score: For start site 25769, the Final Score is -6.140 and the Z score is 1.334. These are not good metrics. For an alternate start site (25778), the Final Score is -4.391, and the Z score is 2.131. These are the best Final Score and Z scores among the possible candidate start sites. /note=Gap/overlap: For start site 25778, there is a 22 base pair gap with the gene upstream, and this is reasonable/acceptable because the gap is very short. 
This chosen start site does not have the longest open reading frame, but it has a gene length of 201 bp, which is good. /note=Phamerator: The gene is found in Pham 520 as of 1/15/23. The gene is conserved in other members of subcluster C1, and I used phage Stubby and Turret for comparison. /note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 2 and 25769. 84/157 non-draft genomes in the pham call also this start site, and is the most called start site. The 2nd most called start site is 25778, and is called in the remaining 73 non-draft genomes. /note=Location call: The available evidence suggests that the start site is 25778. The gene appears to be real, and this proposed start site covers all the coding potential, and has a good final score and z-score. In starterator, it is not the most called but is the 2nd most called. /note=Function call: The function of this protein is unknown (NKF). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-values 3e-33,, and those proteins have no known function. Similarly on NCBI BLASTp, e-values are low (5e-41), and those proteins also have no known function. There were no hits on CDD, and HHpred results were not desirable, as e-values were too high. /note=Transmembrane domains: Neither DeepTMHMM or TOPCONS predicted any TMDs, this is not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: After reviewing the above evidence, I agree. Although the start site had to be switched and does not agree with GeneMark, I do agree that 25778 is a better call than 25769. 
CDS 26016 - 26312 /gene="75" /product="gp75" /function="membrane protein" /locus tag="JulietS_75" /note=Original Glimmer call @bp 26016 has strength 3.12; Genemark calls start at 26016 /note=SSC: 26016-26312 CP: yes SCS: both ST: SS BLAST-Start: [gp77 [Mycobacterium phage Rizal] ],,NCBI, q1:s1 100.0% 5.16487E-64 GAP: 37 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.306, -2.0720764396375664, yes F: membrane protein SIF-BLAST: ,,[gp77 [Mycobacterium phage Rizal] ],,YP_002224775,100.0,5.16487E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark have a start site listed at 26016, with a start codon of GTG. /note=Coding Potential: There is good coding potential within this gene as both the Self-trained and Host-trained Genemark have direct sequences that are complete. In addition to that, the gene shares synteny with phages Adlitam, Bangla1971 and Chomp. /note=SD (Final) Score: The Z-score that was chosen was one of 3.306, with a final score of -2.072. These are the best scores out of the candidates listed. The start site also matched with the one listed by both GeneMark and Glimmer. /note=Gap/overlap: There is a gap of 37 bp which is less than the maximum recommended gap of 50 base pairs. This gap is also shared in other phages like Atlantean and Ava3. /note=Phamerator: As of 1/12/2023, this gene is a part of Pham 523. It was found to be conserved in several other cluster C phages such as Alice, Amataga, and BeanWater. /note=Starterator: As of 01/08/23, the most called start number is 1, which was called in 157/157 of the non-draft genes in this pham. In JulietS, start site 1 correlates with the start of 26016, which matches with the sites listed by GeneMark and Glimmer. /note=Location call: Based on the evidence above, it is likely that this is a real gene that has a start site at 26016. /note=Function call: NKF. This gene appears to have no known function based on all of the evidence. 
The BLASTp had several hits with low e-values, all of which were classified as function unknown.The NCBI Blast had shown that there were several hits with e-values of 0.0 for a hypothetical protein. All of the top results that were provided for HHpred all had extremely high e-values as well. In addition to that, there were no hits for CDD. /note=Transmembrane domains: There were no TOPCONS that were predicted. Deep TMHMM predicted 2 TMRs, meaning that this is a transmembrane protein, with no known function. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: I agree with the location and function call. Please add information about whether the gap/overlap is conserved and in what other phages. Please also include information regarding DeepTMHMM. CDS 26312 - 26473 /gene="76" /product="gp76" /function="hypothetical protein" /locus tag="JulietS_76" /note=Original Glimmer call @bp 26312 has strength 1.96; Genemark calls start at 26312 /note=SSC: 26312-26473 CP: yes SCS: both ST: SS BLAST-Start: [gp81 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 6.22317E-31 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.841, -3.812150084159046, yes F: hypothetical protein SIF-BLAST: ,,[gp81 [Mycobacterium phage Bxz1] ],,NP_818157,100.0,6.22317E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMarks, both call the start of this gene at 26312 with the start codon of ATG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 26312 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -3.812, which is the best score of all the possible start sites on PECAAN. Additionally, this start site also has the best Z-score of 2.841 on PECAAN. 
/note=Gap/overlap: This gene has a gap of 162 base pairs with the previous gene. This is not the best possible start site given that there are two other possible start sites with a smaller gap. However, this gene is conserved in several other phages and the gap was seen in the other phages as well, such as phage Spud and Shelob. /note=Phamerator: This gene belongs to pham number 514 as of 1/08/2023. The gene is conserved in phages of this cluster (C) like Spud and Shelob. There is no function listed for members of this family, so it is highly likely that this is a gene with an unknown function. /note=Starterator: Start site 1 is most often called as it was manually annotated in 157/157 non-draft genes in the pham. Start 1 is 26312 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 26312. /note=Function call: No Known Function. The top three phagesdb BLAST hits are function unknown (E-value <10^-25), and the top three NCBI BLAST hits also have no known function for this gene (100% coverage, 98.11+% identity, and E-value <10^-30). HHpred’s top hits also indicate no known function (89% probability, 75.47% coverage, and E-value <0.73). CDD had no relevant hits. /note=Transmembrane domains: Neither DeepTMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with the primary annotator`s location and function calls. However, don`t forget to include DeepTMHMM for the transmembrane domain call. 
CDS 26454 - 26942 /gene="77" /product="gp77" /function="membrane protein" /locus tag="JulietS_77" /note=Original Glimmer call @bp 26454 has strength 15.92; Genemark calls start at 26454 /note=SSC: 26454-26942 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M180_gp083 [Mycobacterium phage ArcherS7] ],,NCBI, q1:s1 100.0% 1.0667E-112 GAP: -20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.74, -3.76872822804853, no F: membrane protein SIF-BLAST: ,,[hypothetical protein M180_gp083 [Mycobacterium phage ArcherS7] ],,YP_008061341,100.0,1.0667E-112 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 26454 with start codon ATG /note=Coding Potential: Coding potential in this ORF is found on the forward direct sequence, and no switch in orientation is observed in either host-trained or self-trained Genemark. All the coding potential is included in the ORF and covered by the selected start site. /note=SD (Final) Score: the highest Final score (-7.140) starts at 26385, while the 7286 start has a lower final score (-5.743) but better Z-score (2.74) /note=Gap/overlap: the overlap with the upstream gene is 20 bp. This gene is conserved in several other phages of the same cluster (Fludd, ChaylaJr) and the overlap was seen in Fludd and Grungle as well. /note=Phamerator: pham: 482, date 01/20/2023. It is conserved; found in Ading (C) and Alice (C). /note=Starterator: Start site number 74 in Starterator had the highest manual annotation in 160/160 non-draft genes in this pham. Start site 74 is at position 26454 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26454 since it has a smaller overlap and has the highest manual annotation in this pham. /note=Function call: NKF. 
None of phagesDB BLAST, NCBI BLAST, CDD, and HHpred shows any significant relevant hits with known functions, only significant probability or coverage to hits with unknown function. /note=Transmembrane domains: deep TMHMM predict one TMD, therefore it is a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I`m in agreement with the primary annotator`s location and function call. However, I believe you can mark some evidence for the NCBI Blast. Also, I would run the protein sequence on DeepTMHMM to confirm this gene does not code for a membrane protein. CDS 26945 - 28630 /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="JulietS_78" /note=Original Glimmer call @bp 26945 has strength 8.47; Genemark calls start at 26945 /note=SSC: 26945-28630 CP: yes SCS: both ST: SS BLAST-Start: [gp78 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.926, -4.883837968736599, no F: hypothetical protein SIF-BLAST: ,,[gp78 [Mycobacterium phage Cali] ],,YP_002224555,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark agree the start site is @ 26945. Start codon is ATG. /note=Coding Potential: The host-trained Genemark indicates that there is coding potential between the start site @ 26945 and the stop site @ 28630. The self-trained Genemark also agrees that there is coding potential between the aforementioned start and stop sites. /note=SD (Final) Score: The final score is -4.884, which is not the least negative number, but it is close to the most negative number, and the z-score is 1.926 which is not 2, but relatively close to 2. /note=Gap/overlap: The gap found is 2 which is under the recommended 50bp limit for a gap. /note=Phamerator: This gene is found in Pham 53992. Ading, RoMag, and Shaqnato which are all in Cluster C which is the same cluster as our phage JulietS. 
/note=Starterator: Start site 1 is the most called start site with 151/153 non-draft genes calling this start site. Start site 1 is @ 26945 in JulietS which agrees with the Glimmer and Genemark auto annotation. /note=Location call: Based on the aforementioned evidence I believe that the start site is @ 26945. /note=Function call: ParB-like nuclease domain protein. In the PhagesDB BLAST, three phages (Ronan, CindyLou, and Capablanca) have an e-value of 0.0, and all list the function as a ParB-like nuclease domain protein. The top 2 results on NCBI BLAST come from phage Cali, which lists the function as gp78 with an e-value of 0.0 and percent identity of 100%, and phage Ronan, which lists the function as a ParB-like nuclease domain protein with an e-value of 0.0 and percent identity of 99.82%. CDD had no relevant hits. HHPred had 2 significant hits: Sulfiredoxin, code 6KY4_A, probability 96.49%, coverage of 14.4385%, e-value of 0.0098, an oxidoreductase; and ParB_N_like_MT, code 7BNR_A, probability 95.39%, e-value of 0.02, coverage of 9.80392%, a DNA binding protein. HHPred also had 100% alignment with phage Ronan, protein number 76, which was called as a ParB-like nuclease domain protein with an e-value of 6.2e-280. /note=Transmembrane domains: Deep TMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the primary annotator's location call and function call based on the provided evidence. /note=Tertiary QC - Fadi Al Banaa: Gene function changed to NKF. No significant HHpred hits. Many CDD hits for ParB partitioning system or ParB N-terminal domain (very poor coverage and sequence identity). 
tRNA 28721 - 28804 /gene="79" /product="tRNA-Ser(gct)" /locus tag="JULIETS_79" /note=tRNA-Ser(gct) tRNA 28997 - 29072 /gene="80" /product="tRNA-Leu(cag)" /locus tag="JULIETS_80" /note=tRNA-Leu(cag) tRNA 29192 - 29266 /gene="81" /product="tRNA-Leu(gag)" /locus tag="JULIETS_81" /note=tRNA-Leu(gag) tRNA 29267 - 29340 /gene="82" /product="tRNA-Leu(caa)" /locus tag="JULIETS_82" /note=tRNA-Leu(caa) CDS 29410 - 33123 /gene="83" /product="gp83" /function="hypothetical protein" /locus tag="JulietS_83" /note=Original Glimmer call @bp 29410 has strength 10.94; Genemark calls start at 29410 /note=SSC: 29410-33123 CP: yes SCS: both ST: SS BLAST-Start: [gp83 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 779 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.986, -2.6814417326944517, yes F: hypothetical protein SIF-BLAST: ,,[gp83 [Mycobacterium phage Cali] ],,YP_002224556,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 29410, ATG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.681. It is the best final score on PECAAN. /note=Gap/overlap: 208bp. Slightly large, but the smallest and most reasonable candidate, especially since the gene length is so large. I didn’t choose a different gene candidate because both Glimmer and GeneMark agreed on the current start site and all other final scores are lower than the current start site. There is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 64503. Date 01/08/2023. It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. 
It is start number 1 with a base pair coordinate of 29410. 161 of 163 call site #1. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 29410 bp. /note=Function call: NKF; Both NCBI and PhagesDB database did not predict functions for this gene. The top three PhagesDB BLAST hits have unknown function (e-value = 0). 1 CDD hit for PHA03169 super family, hypothetical protein was found. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree with the function call and location of the gene. Make sure to add your DeepTMHMM notes as well. CDS 33139 - 33507 /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="JulietS_84" /note=Original Glimmer call @bp 33232 has strength 5.1; Genemark calls start at 33139 /note=SSC: 33139-33507 CP: yes SCS: both-gm ST: SS BLAST-Start: [gp84 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 8.72637E-86 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.496, -3.979581593363423, no F: hypothetical protein SIF-BLAST: ,,[gp84 [Mycobacterium phage Cali] ],,YP_002224557,100.0,8.72637E-86 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Glimmer calls 33232 and GeneMark calls 33139. Start codon is GTG. /note=Coding Potential: There is coding potential that would cover both of the start sites presented by Glimmer and GeneMark within the self-trained GeneMark. Potential is on the forward strand on the first row. /note=SD (Final) Score: -3.980 ; This is the best final score presented on PECAAN. Z-score of 2.496. /note=Gap/overlap: gap of 15 bp; reasonable gap. /note=Phamerator: pham : 4856 , date completed on 1/18/2023. conserved within other phages that contain the gene with the gene length of 276 bp. Astraea (C ) and Derek (C ). 
/note=Starterator: Start site 2, 33139; this was not the start site that was called by the MA phages. This agrees with the data from GeneMark. /note=Location call: Based on the evidence from GeneMark, Starterator, and Phamerator, the start site would most likely be 33139. /note=Function call: NKF; Phagesdb BLAST has phages Alice and BackyardAgain with e-values of 3e-50 presenting unknown functions. HHPRED has an uncharacterized hit with 78.002% coverage. NCBI BLAST has phage ScottMcG with 74.5902% alignment and e-value of 5.76739e-60 with a hypothetical protein. No function frequency or CDD was given. /note=Transmembrane domains: DeepTMHMM presented no transmembrane domains so it is likely not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with this gene`s function call and location. CDS 33511 - 36135 /gene="85" /product="gp85" /function="portal protein" /locus tag="JulietS_85" /note=Original Glimmer call @bp 33511 has strength 13.72; Genemark calls start at 33511 /note=SSC: 33511-36135 CP: yes SCS: both ST: SS BLAST-Start: [gp85 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.719, -3.160956374632079, no F: portal protein SIF-BLAST: ,,[gp85 [Mycobacterium phage Cali] ],,YP_002224558,100.0,0.0 SIF-HHPRED: Portal_Gp20 ; Bacteriophage T4-like portal protein (Gp20),,,PF07230.14,50.4577,99.9 SIF-Syn: portal protein; upstream gene is in [pham 4856] function is NKF, downstream gene is in [pham 561] function is NKF, just like in phages Capablanca and Derek. /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 33511; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 1 (indicating that this is a forward gene). The chosen start site covers this ORF. 
Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Amataga and Audrick. /note=SD (Final) Score: -3.161. This is a reasonable SD score. /note=Gap/overlap: There is a gap of 3 bp, which is reasonable. This is the smallest gap when compared to the other start site choices. /note=Phamerator: Date of investigation: 1/17/23; Pham 64501; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Babyland and Bipolarisk). /note=Starterator: Yes, there is a conserved start choice. It is start number 5 with a base pair coordinate of 33511. Has 136 MA’s. Found in 147/158 (93%) of genes in pham. This start site also agrees with Glimmer and Genemark’s call. E-values are good at 0. /note=Location call: Yes, the evidence suggests this gene is real. Start site 33511 is most likely. /note=Function call: Portal protein; both PhagesDB and NCBI have hits to support this call; supporting evidence in phages CharlieB, FudgeTart, and Grungle with e-values of 0 and NCBI hits have coverages of 100%, identities above 99%, and e-values of 0. HHPred has hits with coverages at 50% and e-values very close to 0 (-20 and -18). CDD had no hits. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Arredondo, Alexis /note=Secondary Annotator QC: I agree with the function and location call made by the primary annotator. 
CDS 36242 - 36601 /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="JulietS_86" /note=Genemark calls start at 36242 /note=SSC: 36242-36601 CP: yes SCS: genemark ST: SS BLAST-Start: [gp90 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 8.62831E-80 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.496, -3.769471247018311, yes F: hypothetical protein SIF-BLAST: ,,[gp90 [Mycobacterium phage Bxz1] ],,NP_818163,100.0,8.62831E-80 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer does not call a start site but GeneMark calls one at 36242 with a start codon of TTG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. The suggested start site of #36242 covers all of the coding potential regions on the forward strand, which supports the forward direction of this gene. However, the coding potential is not strong enough to classify the gene as definitely real. Still, the weak coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs, which does suggest that this can be a real gene. /note=SD (Final) Score: The final score is the least negative (SD = -3.769), and the Z-score is the highest overall score (Z-score = 2.496), indicating that the autogenerated start site is the better of the options. GeneMark calls the suggested start site, which lends some legitimacy to choosing the autogenerated site, especially with a Z-score over 2, as it allows the gene to be the longest reasonable length. /note=Gap/overlap: There is a gap of 106 bp, which seems reasonable given the synteny of this gene with other phages in the C1 cluster, such as Grungle. There is a reasonable gap of about 71 bp between this gene and the downstream gene, which allows for the conserved 360 bp gene length. 
/note=Phamerator: The gene was found to be in Pham 561 (01/18/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Ava, and Grungle. There are no functions listed for this Pham. However, the base pair length was conserved at 360 bp. /note=Starterator: There is a reasonable and highly conserved start site (1, 36242), checked on 01/18/2023, which was called by 156 of the 157 non-draft genes among the 168 total pham members. /note=Location call: The gene and the start site are both conserved for this gene. Even though there is mediocre coding potential in the ORF, there is still an indication of a real gene’s placement in the operon due to synteny with other phages in C1 (Grungle), and thus the evidence supports that this gene starts at #36242. Because the Pham has the conserved length of 360 bp throughout other phages in Cluster C1 and GeneMark calls the suggested start site, the suggested start site is favoured over any other start site, since it allows the gene to be the longest it can be (which is conserved through synteny). Starterator also calls this site the most out of all of the phages in Pham 561, indicating a higher probability of the suggested start site being correct. /note=Function call: The gene seems to have no known function. In the BLASTp on PhagesDB.org it has two matches, phages Zeenon and Wally, that have an e-value of 3e-64 and 100% positives with no known function (NKF). Additionally, the NCBI BLASTp also indicates no known function (NKF) with an e-value of 5e-79 and a 99% match with phage ArcherS7. There are additional matches with a phage named ChickenPhender that indicate a hypothetical protein, but this is not enough to determine a function. 
HHpred and CDD both give inconclusive results, with CDD giving no hits and HHpred's lowest e-value hit having an e-value of 5.3 and a probability of 64.48 for Collagen alpha-1(III) chain. Thus, this indicates a high probability that the function of this gene is not known. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains, so the protein is predicted to remain inside the cell. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: I agree with the primary annotator`s location call and function call based on the provided evidence. CDS 36673 - 36828 /gene="87" /product="gp87" /function="membrane protein" /locus tag="JulietS_87" /note=Genemark calls start at 36673 /note=SSC: 36673-36828 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_89 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 6.38131E-21 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.66, -3.284949965096066, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_89 [Mycobacterium phage ScottMcG] ],,YP_002224117,100.0,6.38131E-21 SIF-HHPRED: SIF-Syn: membrane protein, upstream gene is in [pham 561] function of NKF, downstream gene is in [pham 114] function is ThyX-like thymidylate synthase, just like in phage Amataga and Guwapp. /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer score and start site not listed on PECAAN. GeneMark calls a start of 36673. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.285. It is the best final score on PECAAN. Z-score is 2.66. /note=Gap/overlap: Gap of 71bp. 
The gap is conserved in other phages (Amataga, Guwapp) and there are no ORFs longer than 120bp. /note=Phamerator: Pham: 65632. Date 1/13/2023. Pham has 183 members and 16 phages are drafts. Conserved in HyRo and SilverDipper. /note=Starterator: Start site 6 in Starterator was manually annotated in 163 of the 167 non-draft genes in the pham. Start 6 is 36673 in JulietS. This evidence agrees with the site predicted by GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36673. /note=Function call: Membrane protein. Both NCBI (e-values around e-11 and coverage of 100%) and PhagesDB (e-value of 2e-22) did not have any significant hits. CDD had zero hits. HHpred had potential hits with probability higher than 80% but the score was less than 30 for all hits. No significant e-values. Deep TMHMM supports the call of membrane protein. /note=Transmembrane domains: 2 predicted TMRs from Deep TMHMM, therefore it is a membrane protein. Type: alpha TM. Length of 51. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with the location and function calls. However, I think you could add if the pham is conserved in other members of the same cluster (Phamerator) and you could also include specific e-values and coverage values for the hypothetical protein hits for NCBI and for phagesdb. 
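The gap and gene-length figures quoted in these notes follow directly from the feature coordinates. A minimal sketch of that arithmetic, assuming forward-strand genes and 1-based inclusive coordinates as used in the records above (the function names are hypothetical, not part of any annotation tool):

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a feature with 1-based, inclusive coordinates."""
    return stop - start + 1


def gap_between(prev_stop: int, next_start: int) -> int:
    """Bases strictly between two features; negative values mean an overlap."""
    return next_start - prev_stop - 1


# Coordinates copied from the gp86, gp87, and gp88 records.
features = [
    ("gp86", 36242, 36601),
    ("gp87", 36673, 36828),
    ("gp88", 36893, 37600),
]

for name, start, stop in features:
    print(f"{name}: length {gene_length(start, stop)} bp")

# Compare each feature with the one immediately upstream of it.
for (_, _, prev_stop), (name, start, _) in zip(features, features[1:]):
    print(f"{name}: gap of {gap_between(prev_stop, start)} bp to upstream gene")
```

A negative result from `gap_between` corresponds to an overlap between adjacent genes.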
CDS 36893 - 37600 /gene="88" /product="gp88" /function="ThyX-like thymidylate synthase" /locus tag="JulietS_88" /note=Original Glimmer call @bp 36893 has strength 9.18; Genemark calls start at 36893 /note=SSC: 36893-37600 CP: yes SCS: both ST: SS BLAST-Start: [gp91 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 4.09082E-176 GAP: 64 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.235, -4.254107899658224, yes F: ThyX-like thymidylate synthase SIF-BLAST: ,,[gp91 [Mycobacterium phage Bxz1] ],,NP_818164,100.0,4.09082E-176 SIF-HHPRED: Thymidylate synthase ThyX; Tetramer, UMP/dUMP methylase, ThyX homolog, TRANSFERASE; HET: 5BU, FAD; 1.76A {Streptomyces cacaoi subsp. asoensis} SCOP: d.207.1.0,,,4P5A_A,99.5745,100.0 SIF-Syn: ThyX-like thymidylate synthase, upstream gene is NKF, downstream is also NKF, just like in phage Grungle from the same pham 114. /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 36,893. The start codon is GTG. /note=Coding Potential: The ORF has good coding potential and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.254 is the best SD (final) score because it is the highest (least negative) value, and has the smallest gap of all of the choices. Z-score is above 2. /note=Gap/overlap: There is a significant gap of 64bp upstream, but there is no coding potential in the gap, and when compared to non-draft phages from the same cluster, there weren`t any additional genes present. /note=Phamerator: Gene found in pham 57 on 1/13/23. When compared to phages BackyardAgain, Megamind, and Shrimp, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. The start number called the most often in the published annotations is 57, it was called in 230 of the 339 non-draft genes in the pham. There is synteny with other non-draft phages belonging to the same cluster. 
/note=Starterator: Start: 57 @36893 has 230 MA`s. The start number called the most often in the published annotations is 57, it was called in 230 of the 339 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 36893 seems most likely. /note=Function call: ThyX-like Thymidylate synthase. The top three phagesdb BLAST hits (Cali, Capablanca, and CindyLou) have the same function (E-value <10^-136), and the top 2 NCBI BLAST hits (Delylah and Nappy) also agreed that this is a ThyX-like Thymidylate synthase with 100% coverage, 100%+ identity, and E-value <10^-176. The top CDD hit also agreed on the function with a >93% coverage and e-value < 10^-25. Top three hits on HHpred with a probability of 100%, coverage >96%, and an e-value <10^-36 also agreed that the function is a Thymidylate Synthase. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call. However, for the synteny, remember to add the pham number for those conserved NKF. 
CDS 37621 - 37869 /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="JulietS_89" /note=Original Glimmer call @bp 37621 has strength 13.6; Genemark calls start at 37621 /note=SSC: 37621-37869 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BEANWATER_86 [Mycobacterium phage BeanWater] ],,NCBI, q1:s6 100.0% 3.36036E-55 GAP: 20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.426, -3.773970443115436, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEANWATER_86 [Mycobacterium phage BeanWater] ],,ATN87540,94.2529,3.36036E-55 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both Glimmer and Genemark call the start site at 37621 with a start codon of ATG /note=Coding Potential: There is coding potential in the first forward ORF for the whole range of the gene /note=SD (Final) Score: The final score is -3.774 which is not the least negative final score but it is close /note=Gap/overlap: There is a gap of 20 which is a very reasonable gap /note=Phamerator: The pham number is 450 as of 1/13/23 and there are 179 members, 14 of which are drafts. Many of the phages in the Pham are cluster C including Drasdys and EasyJones /note=Starterator: JulietS calls the most annotated start site (Start 5 @ 37621) with 148 manual annotations. /note=Location call: This is likely a real gene with a start at 37621 /note=Function call: NKF; phagesDB hits have E-values of 3e^-44 and their function is called as function unknown, NCBI hits are (Identity:100, coverage:100, e-value: 3.36e^-55) and (Identity: 99.08, Coverage: 100, e-value: 4.018e^-55) and both call the function as hypothetical protein. CDD and HHPred had no relevant hits /note=Transmembrane domains: 0 predicted TMRs on Deep TMHMM and the protein is predicted to be intracellular /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with this location and function call. 
CDS 37870 - 38421 /gene="90" /product="gp90" /function="RNA ligase" /locus tag="JulietS_90" /note=Original Glimmer call @bp 37870 has strength 16.77; Genemark calls start at 37870 /note=SSC: 37870-38421 CP: yes SCS: both ST: SS BLAST-Start: [gp90 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 8.85641E-133 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.295, -1.953940808934884, yes F: RNA ligase SIF-BLAST: ,,[gp90 [Mycobacterium phage Cali] ],,YP_002224563,100.0,8.85641E-133 SIF-HHPRED: d.61.1.2 (A:) 2`-5` RNA ligase LigT {Thermus thermophilus [TaxId: 274]} | CLASS: Alpha and beta proteins (a+b), FOLD: LigT-like, SUPFAM: LigT-like, FAM: 2`-5` RNA ligase LigT,,,SCOP_d1iuha_,78.6885,99.7 SIF-Syn: RNA ligase, upstream gene is NKF, and downstream gene is HNH endonuclease, just like in phage Ading. /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and Genemark call the start site at 37870, at start codon GTG. /note=Coding Potential: Coding potential is in the top strand, and the gene has good coding potential. The start site covers all the coding potential (there is a gene right before, but it has a gap of 0 and its coding potential dips to 0, so this start site covers all of the coding potential of the next gene). /note=SD (Final) Score: This start site has the best final score of -1.954, or the least negative, and a good Z-score of at least 2, at 3.295. /note=Gap/overlap: No gap/overlap. The gene length is 552 bp. /note=Phamerator: Pham 390. Date 1/14/23. It is conserved; found in YoungMoneyMata_95 and ValleyTerrace_95. Function was listed as RNA ligase on the phams database. /note=Starterator: Start site 12 @37870 was the most annotated, with 133 manual annotations. Start site 22 @37924 was annotated much less in comparison, with 27 manual annotations. /note=Location call: Based on the above evidence, this gene is a real gene and the most likely start site is 37870. 
/note=Function call: The function is called as RNA ligase due to PhamsDB data showing that most of the genes in this Pham are RNA ligase with strong e-values. HHPred also showed strong e-values for RNA ligase genes, with every hit being an RNA ligase and e-values of around -130, which is strong evidence. CDD did not show conserved domains. /note=Transmembrane domains: There are no transmembrane sequences predicted by Deep TMHMM, so this protein likely does not have transmembrane regions. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 38478 - 38861 /gene="91" /product="gp91" /function="HNH endonuclease" /locus tag="JulietS_91" /note=Genemark calls start at 38478 /note=SSC: 38478-38861 CP: no SCS: genemark ST: SS BLAST-Start: [gp91 [Mycobacterium phage Cali] ],,NCBI, q1:s21 100.0% 2.24831E-84 GAP: 56 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.088, -5.009413356317462, no F: HNH endonuclease SIF-BLAST: ,,[gp91 [Mycobacterium phage Cali] ],,YP_002224564,86.3946,2.24831E-84 SIF-HHPRED: zf-His_Me_endon ; Zinc-binding loop region of homing endonuclease,,,PF05551.14,82.6772,99.8 SIF-Syn: HNH endonuclease, upstream gene is RNA ligase, downstream gene is LysM-like peptidoglycan binding protein like in McGee and Quasimodo /note=PECAAN Notes /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Forward gene. Glimmer made no call; GeneMark calls the start at 38478. Start codon: ATG. /note=Coding Potential: The coding potential is found in the forward frame. The coding potential is poor on the forward strand /note=SD (Final) Score: -3.751; it is the best score because it has a small gap. The Z-score is 2.504, which is greater than 2. This shows that the site is a good candidate for a start site. /note=Gap/overlap: 4 bp overlap, which indicates that it is more likely part of an operon. 
/note=Phamerator: Pham 58525, 01/15/23. Gene conserved in phages YoungMoneyMata and Trinitium from Cluster C1 /note=Starterator: 01/08/23. Manually annotated in 52/53 non-draft genes in this pham. Start 1 (3,20213) is manually called by 52 others. Evidence is also in line with the auto-annotation made by Glimmer and GeneMark /note=Location call: It is a real gene and the start site is Start 3 at 38478 based on the evidence above /note=Function call: HNH endonuclease. PhagesDB BLAST, NCBI BLAST, and HHpred all indicate that the function of the gene is an HNH endonuclease. The top two hits on PhagesDB BLAST are phages Ading and Adlitam (e-value = 3e-72). The top hit on NCBI BLAST has an e-value of 2.25e-84, coverage over 70%, and identity over 80%. HHpred top hits also indicate the function to be HNH endonuclease (e-values = 1e-18 and 4e-17, coverage over 70%, and identity over 99%). However, there were no significant hits on CDD. /note=Transmembrane domains: No transmembrane domains were predicted by TOPCONS, TMHMM, or DeepTMHMM, which indicates that the gene does not encode a membrane protein /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the location and function call. Just remember to do the synteny box. 
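Each CDS record carries a one-line summary note beginning with `SSC:` whose fields can be checked against the free-text notes that follow it. A minimal sketch of pulling a few of those fields out with a regular expression; the field labels (`SSC`, `CP`, `SCS`, `ST`, `GAP`, `LO`) are taken from the records in this file, while the pattern, the function name `parse_summary`, and the dictionary keys are assumptions made for illustration only:

```python
import re

# Pattern inferred from the summary notes in this file; it is an assumption,
# not a documented format, and may not cover every record elsewhere.
SUMMARY_RE = re.compile(
    r"SSC:\s*(?P<start>\d+)-(?P<stop>\d+)\s+"
    r"CP:\s*(?P<cp>\S+)\s+"
    r"SCS:\s*(?P<scs>\S+)\s+"
    r"ST:\s*(?P<st>\S+)\s+"
    r".*?GAP:\s*(?P<gap>-?\d+)\s*bp gap\s+"
    r"LO:\s*(?P<lo>\S+)"
)


def parse_summary(note: str) -> dict:
    """Extract start, stop, CP, SCS, ST, GAP, and LO from a summary note."""
    match = SUMMARY_RE.search(note)
    if match is None:
        raise ValueError("text does not look like an SSC summary note")
    fields = match.groupdict()
    for key in ("start", "stop", "gap"):
        fields[key] = int(fields[key])  # numeric fields become integers
    return fields


# Summary note copied from the gp91 record.
example = (
    "SSC: 38478-38861 CP: no SCS: genemark ST: SS BLAST-Start: "
    "[gp91 [Mycobacterium phage Cali] ],,NCBI, q1:s21 100.0% 2.24831E-84 "
    "GAP: 56 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.088, "
    "-5.009413356317462, no F: HNH endonuclease"
)

parsed = parse_summary(example)
print(parsed["start"], parsed["stop"], parsed["gap"], parsed["scs"])
```

Comparing `parsed["gap"]` with the figure in the record's Gap/overlap note is one way to spot notes that disagree with the summary line.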
CDS 38910 - 42746 /gene="92" /product="gp92" /function="LysM-like peptidoglycan binding protein" /locus tag="JulietS_92" /note=Original Glimmer call @bp 38910 has strength 12.02; Genemark calls start at 38910 /note=SSC: 38910-42746 CP: yes SCS: both ST: SS BLAST-Start: [gp92 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.908, -2.8454123590742793, yes F: LysM-like peptidoglycan binding protein SIF-BLAST: ,,[gp92 [Mycobacterium phage Cali] ],,YP_002224565,100.0,0.0 SIF-HHPRED: SIF-Syn: LysM-like peptidoglycan binding protein, upstream gene is HNH endonuclease, downstream gene is NKF, like in McGee and Quasimodo /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 38910 bp. Start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.845. It is the best final score on PECAAN. /note=Gap/overlap: 48 bp gap. This gap is very small and reasonable (less than 50 bp). /note=Phamerator: Pham 65637. Date 1/20/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 3 in Starterator was manually annotated in 136 /162 non-draft genes in this pham. Start 3 is 38910 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 38910 bp. /note=Function call: LysM-like peptidoglycan binding protein. The top two phagesDB BLAST hits are of unknown function (E-value = 0) and the top two NCBI BLAST hits are of unknown function (E-value = 0, 99.76%+ identity, 100% coverage). 
However, the 3rd and 4th top NCBI BLAST hits were LysM domain proteins (E-value = 0, >99.45%, 100% coverage). The top two CDD hits were both related to LysM motifs/domains (E-values < 4.42e-11) but had low identity (<45.4545%) and low coverage (<3.59937%). HHpred had no significant hits. To confirm this gene is a LysM-like peptidoglycan binding protein, this sequence was aligned on HHpred with BananaFence_93, which is confirmed as a LysM-like peptidoglycan binding protein. The two sequences were aligned: https://toolkit.tuebingen.mpg.de/jobs/JulietS_LysM /note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with the primary annotator`s location and function calls. CDS 42782 - 46642 /gene="93" /product="gp93" /function="hypothetical protein" /locus tag="JulietS_93" /note=Original Glimmer call @bp 42782 has strength 9.46; Genemark calls start at 42782 /note=SSC: 42782-46642 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SHAQNATO_90 [Mycobacterium phage Shaqnato]],,NCBI, q1:s1 100.0% 0.0 GAP: 35 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.141, -2.356391881054909, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SHAQNATO_90 [Mycobacterium phage Shaqnato]],,QAY05050,99.8445,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 42782 (GTG codon). /note=Coding Potential: Good coding potential in GeneMark Host and Self. Coding potential on this ORF is on the forward strand only. /note=SD (Final) Score: -2.356. This is the highest final score in PECAAN. /note=Gap/overlap: There is a 35bp gap with the upstream gene, which is reasonable. /note=Phamerator: Gene is in pham 427 (date accessed: 01/17/23). It is conserved, found in phages Iota and Koguma. 
/note=Starterator: Start site 3 at 42782 has 160 manual annotations. It is the most manually annotated start site on Starterator. /note=Location call: The high final score and Z-score greater than 2 support the start site at 42782 called by Glimmer and Genemark. Additionally, this is supported by the high number of manual annotations for this start site in Starterator. /note=Function call: NKF in NCBI BLAST; top hits were phages Shaqnato and QBert (both had e-values of 0). NKF in PhagesDB BLASTp; top hits were Shaqnato and LRRHood (both had e-values of 0). No convincing evidence in HHPred (high e-values). No CDD hits. /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 46673 - 47215 /gene="94" /product="gp94" /function="hypothetical protein" /locus tag="JulietS_94" /note=Original Glimmer call @bp 46673 has strength 15.21; Genemark calls start at 46673 /note=SSC: 46673-47215 CP: yes SCS: both ST: SS BLAST-Start: [gp96 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.39734E-124 GAP: 30 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.986, -3.2535385006449706, yes F: hypothetical protein SIF-BLAST: ,,[gp96 [Mycobacterium phage Bxz1] ],,NP_818169,100.0,1.39734E-124 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 46673 bp. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 46673 corresponds to a Final Score of -3.254 which is the least negative and therefore the best final score. It also has a Z-score greater than 2. 
/note=Gap/overlap: There is a 30 bp gap which is reasonable because the gap is conserved in other phages (BadAgartude, BackyardAgain) of the same cluster, it is less than 50 bp, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 418. Date 1/11/2023. It is conserved; found in BadAgartude (C1) and BackyardAgain (C1). /note=Starterator: Start site 1 in Starterator was manually annotated in 157/168 non-draft genes in this pham and is the most manually annotated start site. Start 1 is 46673 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 46673 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. HHPRED had a hit that had sufficient coverage and an e-value of 0.63, however it was not sufficient evidence. Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but no known function. No hits in CDD. Not a membrane protein because it wasn’t called by TMHMM, Deep TMHMM, or TOPCONS. /note=Transmembrane domains: Neither TMHMM, Deep TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with this gene`s function call and location. 
CDS 47219 - 48220 /gene="95" /product="gp95" /function="major capsid protein" /locus tag="JulietS_95" /note=Original Glimmer call @bp 47219 has strength 19.54; Genemark calls start at 47219 /note=SSC: 47219-48220 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Mycobacterium phage Tonenili] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.799, -5.227163949899647, no F: major capsid protein SIF-BLAST: ,,[major capsid protein [Mycobacterium phage Tonenili] ],,YP_009287973,99.6997,0.0 SIF-HHPRED: Gp27 major capsid protein; phage G, major capsid protein, decoration protein, capsid, icosahedral, gp26, gp27, VIRUS; 6.1A {Bacillus virus G},,,6WKK_D,87.988,100.0 SIF-Syn: Major capsid protein, upstream gene is NKF with pham of 418, downstream is not conserved, just like in phage daffodil and chomp /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation:Both Glimmer and genemark, they agree on the same site which is 47219 the start codon is ATG /note=Coding Potential:Coding potential is found both in GeneMark Self and Host, and there is only forward potential, and the chosen start site does include all of the coding potential. /note=SD (Final) Score: -5.227, the best final score on PECAAN /note=Gap/overlap:3, the smallest gap and also conserved in other phages such as daffodil /note=Phamerator: pham: 425. Date 01/08/23. It is conserved; found in other 181 non-draft phages, such as Ading or Bread. /note=Starterator: Start site 3 in Starterator was manually annotated in 164/168 non-draft genes in this pham. Start 3 is 47219 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call:Based on the above evidence, this is a real gene and the most likely start site is 47219 /note=Function call: Major capsid protein. 
The top 2 phagesdb BLAST hits (phage:ZygoTaiga; Zeenon) have the function of major capsid protein with e-value of 0, and the ncbi blast also have major capsid protein with e-value of 0 for phage: Mycobacterium phage Tonenili and Mycobacterium phage ScoobyDoobyDoo. There are also no hits in CDD, the largest possibility in HHpred is 99.96%, with the e-value of 8.5e-27 with the function of major capsid protein, so overall this gene is for major capsid protein. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with this call. Based on the above evidence this gene is real, starts at 47219, and has the function of major capsid protein. Don`t forget to add DeepTMHMM. CDS 48305 - 48994 /gene="96" /product="gp96" /function="Holliday junction resolvase" /locus tag="JulietS_96" /note=Original Glimmer call @bp 48518 has strength 6.76; Genemark calls start at 48518 /note=SSC: 48305-48994 CP: yes SCS: both-cs ST: NI BLAST-Start: [holliday junction resolvase [Mycobacterium phage Melpomini] ],,NCBI, q1:s1 100.0% 1.48067E-170 GAP: 84 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.008, -6.822112620101109, no F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Mycobacterium phage Melpomini] ],,QAX93621,99.5633,1.48067E-170 SIF-HHPRED: SIF-Syn: Holliday junction resolvase, upstream NKF, downstream major capsid protein. Conserved in Grungle. /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 48518. The start codon is ATG. /note=Coding Potential: There is good coding potential in both the Host trained and self trained GeneMark. Additionally, the coding potential for this ORF is only in the forward strand indicating that it is a forward gene. /note=SD (Final) Score: Final score -6.822. 
Not ideal, but this start site covers the coding potential and has the smallest gap. Z-score is 1.008. /note=Gap/overlap: 84 bp. This is the smallest gap and lengthens the gene to 690 bp, which is similar to other genes in pham 3757. This is also the length of the LORF. /note=Phamerator: Pham 54797, 1/24/23. When looking into this pham, there were only 2 non-draft phages. But when looking at the Phagesdb Function Frequency, Pham 3757 has a better hit. When comparing this gene to Ava3, which is in pham 3757, through NCBI BLASTp, they are identical. Therefore, I believe that the actual pham for this gene should be 3757. /note=Starterator: /note=Location call: Based on the above evidence, this is a real gene that starts at 48305. /note=Function call: Holliday junction resolvase. The top phages in Phagesdb BLAST (EmmaElysia C1, Melpomini C1, and Napoleon13 C1) call the function as Holliday junction resolvase with an e-value of 1e-140. Additionally, the top 2 NCBI BLAST hits also call the function as Holliday junction resolvase (100% coverage, 99% identity, and E-values <10^-170). HHPRED is not convincing evidence, nor did anything show up for CDD. /note=Transmembrane domains: DeepTMHMM predicts it to be inside the cell with 100% probability. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: This call was difficult since the Pham was completely wrong; however, I am comfortable confirming that the evidence shown in Pham 3757 and the chosen start site agree with the function call given here, especially with NCBI BLASTp and Phagesdb BLASTp both calling a Holliday junction resolvase. 
CDS 49032 - 49664 /gene="97" /product="gp97" /function="hypothetical protein" /locus tag="JulietS_97" /note=Original Glimmer call @bp 49032 has strength 21.25; Genemark calls start at 49032 /note=SSC: 49032-49664 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SHAQNATO_94 [Mycobacterium phage Shaqnato]],,NCBI, q1:s1 100.0% 3.67685E-150 GAP: 37 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.381, -3.867841122493862, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SHAQNATO_94 [Mycobacterium phage Shaqnato]],,QAY05054,100.0,3.67685E-150 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 49032. The start codon is ATG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only, which indicates this is a forward gene. Coding potential is found in both GeneMark Self and Host. All of the coding potential is included. /note=SD (Final) Score: The z score is 2.381. The final score is -3.868 which is the highest value of all listed start sites. /note=Gap/overlap: 37 bps gap upstream of the gene and a 79 bp gap downstream of the gene. This gene and these gaps are conserved in phages Nappy, TinyTim, and Yucca. There is no other coding potential in these regions. /note=Phamerator: Pham: 66975. Date 01/14/23. The gene is conserved in phages JustHall and BeanWater which are in the same cluster as JulietS. /note=Starterator: Start site 82 was called for 155 of 349 non-draft phage genomes in the pham. It is the most annotated start site and is called 97.1% of the time when present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 49032 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. 
Phagesdb BLASTp shows phages Shaqnato, Grungle, and Wally from cluster C1 having no function and an E-value of e-114, e-114, and e-113 respectively. NCBI BLASTp shows phages Shaqnato, Grungle, and QBert with no known function, full coverage and E-values of e-150, e-149, and e-149 respectively. Phagesdb Function Frequency showed the function capsid decoration protein called for four phages, at a frequency of 100%, however based on the other evidence, this should be disregarded. HHpred and CDD showed no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. CDS 49746 - 50285 /gene="98" /product="gp98" /function="head fiber protein" /locus tag="JulietS_98" /note=Original Glimmer call @bp 49746 has strength 12.86; Genemark calls start at 49746 /note=SSC: 49746-50285 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein DANDELION_109 [Mycobacterium phage Dandelion] ],,NCBI, q1:s1 100.0% 2.17335E-116 GAP: 81 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.152, -2.604595770381943, yes F: head fiber protein SIF-BLAST: ,,[hypothetical protein DANDELION_109 [Mycobacterium phage Dandelion] ],,YP_009012883,100.0,2.17335E-116 SIF-HHPRED: Head fiber protein; supercoiled triple repeating helix-turn-helix, VIRAL PROTEIN; 1.52A {Bacillus phage phi29},,,3QC7_A,76.5363,99.6 SIF-Syn: This gene displays synteny. This gene, the gene preceding it (NKF, pham 72427), and the gene following it (NKF, pham 522) are conserved in the same order in phages Ading (C1) and Adlitam (C1). /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 49746 bp. The start codon is ATG, which has a high probability of being used as a start site. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only on the third frame, indicating that this is a forward gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. /note=SD (Final) Score: The final score is the best option at -2.605 and the z-score is the best option at 3.152. This provides strong evidence that the called start site is the real start site. /note=Gap/overlap: The gap/overlap is fairly large at 81 bp. However, this gap is justified because it is the smallest possible gap out of the potential start sites and there is no coding potential within the gap. Furthermore, this gap is conserved and shows up in phage Ading from cluster/subcluster C1. /note=Phamerator: Pham: 516. Date 1/18/2023. It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 5 in Starterator was manually annotated in 157/157 non-draft genes in this pham. Start number 5 is 49746 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 171/171 (100.0%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 49746 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: Head fiber protein. The top two PhagesDB BLAST hits have the function of “head fiber protein” (99% identity, E-value = 1e-95), and the top two NCBI BLAST hits have the function of “head fiber protein” (99.4413%/98.324% identity, E-value = 2.17335e-116/2.8631e-115). HHpred agreed that the gene may have the function of a head fiber protein (99.6% probability, E-value = 1.3e-14). Using HHpred, a comparative analysis of the gene and a gene within the same pham Capablanca_104, which is manually annotated as a head fiber protein, showed that the gene is most likely a head fiber protein (100% probability, E-value = 9.4e-52). 
A comparative analysis of the gene and another gene within the same pham ChaylaJr_97, which is also manually annotated as a head fiber protein, backed that the gene is most likely a head fiber protein (100% probability, E-value = 4e-50). Results from CDD were irrelevant because no results came up. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs. DeepTMHMM agrees (1.0 probability outside throughout). Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 50355 - 50573 /gene="99" /product="gp99" /function="hypothetical protein" /locus tag="JulietS_99" /note=Original Glimmer call @bp 50355 has strength 21.32; Genemark calls start at 50355 /note=SSC: 50355-50573 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BEANWATER_95 [Mycobacterium phage BeanWater] ],,NCBI, q1:s1 100.0% 1.93406E-43 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEANWATER_95 [Mycobacterium phage BeanWater] ],,ATN87549,100.0,1.93406E-43 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 50355. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -2.443. This is the second best final score on PECAAN. /note=Gap/overlap: There is a reasonable gap of 69bp. /note=Phamerator: pham: 522. Date 1/10/2023. It is conserved; found in Alice (C) and Astraea (C). /note=Starterator: Start site 2 in Starterator was manually annotated in 155/157 non-draft genes in this pham. Start 2 is 50355 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 50355. /note=Function call: NKF. BLASTp hits only included hypothetical proteins. CDD returned no hits. HHPRED returned no significant hits. The top BLAST hit was a hypothetical protein with an e-value of 1.93e-43, so there is not enough evidence to determine a function call. /note=Transmembrane domains: Deep TMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the function and location calls. CDS 50589 - 50753 /gene="100" /product="gp100" /function="hypothetical protein" /locus tag="JulietS_100" /note=Original Glimmer call @bp 50589 has strength 19.71; Genemark calls start at 50589 /note=SSC: 50589-50753 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_BEANWATER_96 [Mycobacterium phage BeanWater] ],,NCBI, q1:s1 100.0% 1.0549E-28 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.152, -3.16089827114923, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BEANWATER_96 [Mycobacterium phage BeanWater] ],,ATN87550,100.0,1.0549E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: The start sites called by Glimmer and GeneMark are the same: 50589. Start codon is ATG, which is more common. /note=Coding Potential: There is coding potential present in ORF 3 of the direct sequence (as expected for a forward gene) in the Host-trained & Self-trained Genemark. Start site 50589 covers all of the coding potential. /note=SD (Final) Score: -3.161. This is the best score and it corresponds to the start site reported by Glimmer and Genemark. /note=Gap/overlap: There is a 15 bp gap upstream of my gene and a 43 bp gap downstream of my gene. The gap is reasonable because it is conserved in phage Bangla1971 and there is no coding potential present to fill in the gaps in Host-Trained and Self-Trained Genemark. 
/note=Phamerator: Pham 64521 as of 1/18/23. It is conserved, found in Alice (C1) and Bread (C1). /note=Starterator: 139/145 of non-draft genes call start site 5. Start site 5 is @50589 bp in JulietS. Evidence agrees w/ start site predicted by Glimmer and GeneMark. /note=Location call: Start site 5 @50589 bp. /note=Function call: No known function. The top three PhagesDB BLAST hits have the function of hypothetical proteins (E-values = 2 x 10^-23) and the top three NCBI BLAST hits also have the function of hypothetical proteins (100% coverage, >96% identity, and E-value <10^-27). HHpred had no relevant hits, with E-values being greater than 2. CDD had no relevant hits as well. /note=Transmembrane domains: DeepTMHMM does not show any TMDs, therefore this is not a transmembrane protein. /note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: I agree with the primary annotator's location call and function call based on the provided evidence. CDS 50797 - 52380 /gene="101" /product="gp101" /function="hypothetical protein" /locus tag="JulietS_101" /note=Original Glimmer call @bp 50797 has strength 9.09; Genemark calls start at 50797 /note=SSC: 50797-52380 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_GRUNGLE_93 [Mycobacterium phage Grungle]],,NCBI, q1:s1 100.0% 0.0 GAP: 43 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.152, -2.905625766045924, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GRUNGLE_93 [Mycobacterium phage Grungle]],,QDP44459,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Glimmer and GeneMark agree that the start site is 50797. The start codon that is called is ATG. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 1. 
The coding potential increases near the potential start site (50797) with some fluctuation, then plateaus until the stop codon (52380), where it tapers off. The gene has better coding potential with the self-trained GeneMark on ORF 1, where the coding potential rises at the start site and plateaus until the stop site. Finally, the start and stop sites also encompass all of the coding potential. /note=SD (Final) Score: The Final Score is -2.906, and the Z score is 3.152. The final score for this start site has the best sequence match (highest final score), and also has the highest Z-score of the possible start site candidates. /note=Gap/overlap: There is a 43 base pair gap with the gene upstream, and this is reasonable/acceptable because the gap isn’t too long (it is under 100 bp). This chosen start site does not have the longest open reading frame, but it is the 2nd longest and has a gene length of 1584 bp. /note=Phamerator: The gene is found in Pham 58381 as of 1/15/23. The gene is conserved in other members of subcluster C1, and I used phage Stubby and Turret for comparison. /note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 5 and 50797. 77/158 non-draft genomes in the pham call this start site, and it is the most annotated start site. /note=Location call: The available evidence suggests that the start site is 50797. The gene appears to be real, and this proposed start site covers all the coding potential, and has a good final score and z-score. It is the most commonly called site on starterator and called 53% of the time when present. /note=Function call: The function of this protein is unknown (NKF). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-values 0, and those proteins have no known function. Similarly on NCBI BLASTp, e-values are also 0, and those proteins also have no known function. 
/note=There were no hits on CDD, and HHpred results were not desirable, as e-values were too high. /note=Transmembrane domains: Neither DeepTMHMM or TOPCONS predicted any TMDs, this is not a membrane protein. /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the primary annotators location call and function call based on the provided evidence. CDS 52373 - 53908 /gene="102" /product="gp102" /function="hypothetical protein" /locus tag="JulietS_102" /note=Original Glimmer call @bp 52373 has strength 10.57; Genemark calls start at 52373 /note=SSC: 52373-53908 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO63_gp207 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.72, -3.221100372932034, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp207 [Mycobacterium phage QBert] ],,YP_010058195,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark listed a start site of 52373, which corresponds to a start codon of ATG. /note=Coding Potential: There is good coding potential within this gene as both the Self-trained and Host-trained Genemark have direct sequences that are complete. In addition to that, the gene shares synteny with phages Astraea, and Lukilu. /note=SD (Final) Score: The Z-score that was chosen was one of 2.72, with a final score of -3.221, which are both the best candidates out of the ones listed. This also corresponds with the same start site of 52373. /note=Gap/overlap: There is an overlap of 8 base pairs, as evidenced by the -8 gap. /note=Phamerator: As of 01/18/23, this gene was found to be a part of Pham 548. It was found to be in several other C1 phages such as JustHall, Kamryn, and Jessibeth. /note=Starterator: As of 01/13/23, the most called start number of the published annotations is 8, and it was called in 150/156 non-draft genes in the pham. 
In JulietS, start site 8 doesn’t exist in the candidate starts. The start that did have the most MA’s was start site 2 which was at 52373. /note=Location call: Based on the evidence above, this is likely to be a real gene with a start site of 52373. /note=Function call: NKF. Based on the evidence, there doesn’t appear to be a known function for this protein. The NCBI Blast had shown that there were several hits with e-values of 0.0 for a hypothetical protein. The top results that were provided for HHpred all had extremely high e-values as well. In addition to that, there were no hits for CDD. /note=Transmembrane domains: Deep TMHMM predicted 0 TMRs, meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the location and function call but include deep TMHMM info in transmembrane domain. CDS 53908 - 54120 /gene="103" /product="gp103" /function="hypothetical protein" /locus tag="JulietS_103" /note=Original Glimmer call @bp 53908 has strength 16.37; Genemark calls start at 53908 /note=SSC: 53908-54120 CP: yes SCS: both ST: SS BLAST-Start: [gp105 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.43653E-41 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.986, -2.9525085049809894, yes F: hypothetical protein SIF-BLAST: ,,[gp105 [Mycobacterium phage Bxz1] ],,NP_818178,100.0,1.43653E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, both call the start of this gene at 53908 with the start codon of ATG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 53908 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -2.953, which is the best score of all the possible start sites on PECAAN. 
Additionally, this start site also has the best Z-score of 2.986 on PECAAN. /note=Gap/overlap: This gene has an overlap of 1 base pair with the previous gene. This is the best possible start site since any other option would increase the size of the gap, which will potentially require the addition of another gene. /note=Phamerator: This gene belongs to pham number 509 as of 1/08/2023. The gene is conserved in phages of this cluster (C) like Babyland and Blackbrain. There is no function listed for members of this family, so it is highly likely that this is a gene with an unknown function. /note=Starterator: Start site 4 is most often called as it was manually annotated in 157/157 non-draft genes in the pham. Start 4 is 53908 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 53908. /note=Function call: No Known Function. The top three phagesdb BLAST hits are function unknown (E-value <10^-35), and the top three NCBI BLAST hits also have no known function for this gene (98.57+% coverage, 100% identity, and E-value <10^-41). HHpred’s top hit indicates a different function but many hits called a domain of unknown function (27+% probability, 33+% coverage, and E-value <210). CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call, good work. 
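The GAP values reported in the records around this point follow from the CDS coordinates alone. Below is a minimal sketch, assuming a gap is counted as the number of bases strictly between the upstream stop and the downstream start, so that a negative result is an overlap; `gap_bp` and the `cds` list are hypothetical names used only for illustration, with coordinates copied from the gp101 to gp104 records.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Bases strictly between two forward-strand genes; negative means overlap."""
    return start - prev_stop - 1


# (name, start, stop) for consecutive forward-strand genes, from the records above.
cds = [
    ("gp101", 50797, 52380),
    ("gp102", 52373, 53908),
    ("gp103", 53908, 54120),
    ("gp104", 54123, 54494),
]

# Pair each gene with the one immediately upstream of it.
gaps = {
    cur[0]: gap_bp(prev[2], cur[1])
    for prev, cur in zip(cds, cds[1:])
}

for name, gap in gaps.items():
    label = "overlap" if gap < 0 else "gap"
    print(f"{name}: {gap} bp {label}")
```

With this convention the computed values agree with the GAP fields in the gp102, gp103, and gp104 records (-8, -1, and 2 bp respectively).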
CDS 54123 - 54494 /gene="104" /product="gp104" /function="membrane protein" /locus tag="JulietS_104" /note=Original Glimmer call @bp 54123 has strength 14.27; Genemark calls start at 54123 /note=SSC: 54123-54494 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ET08_97 [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 9.96617E-84 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.58, -3.5316035464369326, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein ET08_97 [Mycobacterium phage ET08] ],,YP_003347776,100.0,9.96617E-84 SIF-HHPRED: Iron_permease ; Low affinity iron permease,,,PF04120.15,73.1707,99.9 SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 54123 with start codon ATG. /note=Coding Potential: Coding potential in this ORF is found on the forward direct sequence, and no switch in orientation is observed in either host-trained or self-trained Genemark. All the coding potential is included in the ORF and covered by the selected start site. /note=SD (Final) Score: final score of -3.532 and z-value of 2.58, which is the best final score on PECAAN. /note=Gap/overlap: the gap with the upstream gene is 2 bp. This gene is conserved in several other phages of the same cluster (Fludd, Grungle) and the gap does not contain coding potential and was seen in Fludd and Grungle as well. /note=Phamerator: pham: 639, date 01/20/2023. It is conserved; found in Ading (C) and Alice (C). /note=Starterator: Start site number 103 in Starterator had the highest manual annotation in 138/144 non-draft genes in this pham. Start site 103 is at position 54123 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 54123. /note=Function call: NKF. 
None of phagesDB BLAST, NCBI BLAST, or CDD shows any significant relevant hits with known functions, only significant probability or coverage to hits with unknown function. Although HHpred top hit is low affinity iron permease with 99.9% probability, >70% coverage and e-value of 5.1e-24. /note=Transmembrane domains: TMHMM predicts two TMHs, though Topcons predicts none. Two TMHMM predictions can serve as supportive evidence to assume this gene to have a real TMD, and is therefore a membrane protein with no specific known function. /note=Secondary Annotator Name: Kumar, Preyasi /note=Secondary Annotator QC: I agree with this gene`s function call and location. Be sure to select any NCBI Blast and CDD hits as evidence where relevant. CDS 54494 - 54724 /gene="105" /product="gp105" /function="hypothetical protein" /locus tag="JulietS_105" /note=Original Glimmer call @bp 54494 has strength 11.3; Genemark calls start at 54494 /note=SSC: 54494-54724 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_INTERFOLIA_116 [Mycobacterium phage InterFolia]],,NCBI, q1:s47 100.0% 1.60744E-46 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_INTERFOLIA_116 [Mycobacterium phage InterFolia]],,AVR77616,62.2951,1.60744E-46 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is @ 54494. Start codon is ATG. /note=Coding Potential: On the host-trained Genemark and the self-trained Genemark we can see that the coding potential does cover the from the start site @ 54494 and the stop site @ 54724 with overlap. /note=SD (Final) Score: The final score is -2.584 which is the least negative number shown with a z-score of 3.062 which is way above the recommended z-score of 2. /note=Gap/overlap: There is a -1 overlap in this gene which may indicate that it is an operon. 
/note=Phamerator: This gene is in pham 65553. This gene is found in phages BackyardAgain, Bigswole, and CindyLou which are all found in the same cluster as JulietS. /note=Starterator: Start site 6 is @ 54494 in JulietS which agrees with auto annotated start site by Glimmer and Genemark. /note=Location call: Based on the aforementioned evidence the start site is most likely @ 54494. /note=Function call: The top 3 hits on PhagesDB (phages Tyke, Stubby, and Koguma) all have an e-value of 2e^-43 and list the function as unknown. NCBI blast showed phage InterFolia with an e-value of 2e^-46 and percent identity of 100% had the function listed as hypothetical protein and phage Bxz1 with an e-value of 5e^-46 and a percent identity of 100% had the function listed as gp107. CDD had no relevant hits. HHPred had no relevant hits. /note=Transmembrane domains: Deep TMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 54727 - 54948 /gene="106" /product="gp106" /function="hypothetical protein" /locus tag="JulietS_106" /note=Original Glimmer call @bp 54727 has strength 12.8; Genemark calls start at 54727 /note=SSC: 54727-54948 CP: yes SCS: both ST: SS BLAST-Start: [gp108 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.97836E-44 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.056, -5.200500400297053, yes F: hypothetical protein SIF-BLAST: ,,[gp108 [Mycobacterium phage Bxz1] ],,NP_818181,100.0,1.97836E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and Genemark mark the start site at 54727, with start codon ATG. /note=Coding Potential: Coding potential is demonstrated in both the host-trained and self-trained GeneMark, based on the selected start site, running the length of the gene. 
/note=SD (Final) Score: The gene candidate has the best Z-score and final score, 2.056 and -5.201, making it a very strong candidate. /note=Gap/overlap: The gap is 2, which is very reasonable and does not suggest that the gene may be part of an operon. /note=Phamerator: The gene belongs to pham 67085, dated 1/13/23. It is conserved, being found in Momo (C1) and Phlegm (C1). /note=Starterator: Start site 1 was manually annotated in 171 of the 171 non-draft genes in the pham. This start site is 54727, which agrees with the auto annotation by Glimmer and GeneMark, and further validates the start site. /note=Location call: Based on the agreed start site provided by both Glimmer and GeneMark, 54727, which is validated by starterator, along with the consideration that the gene is conserved in both Momo (C1) and Phlegm (C1), this gene is most likely real. /note=Function call: NKF. The NCBI Blast produced multiple hits, which were listed as a hypothetical protein (NKF). One significant hit had a probability of 100%, an e-value of 1.97e-44, and percent coverage of 100%, indicating that the selected gene also does not have a known function. HHPred did not have a significant hit, and neither did CDD. However, given the statistics of the single NCBI Blast, it is most definitely a protein with no known function. Additionally, Phages DB lists two phages Essence and Flabslab with an e-value of 2e-40 with no known function. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aunger, Sarah /note=Secondary Annotator QC: I agree with the function call and location of the gene. Just make sure to add your DeepTMHMM conclusions! 
CDS 54978 - 55178 /gene="107" /product="gp107" /function="hypothetical protein" /locus tag="JulietS_107" /note= /note=SSC: 54978-55178 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein KHO60_gp208 [Mycobacterium phage CharlieB] ],,NCBI, q1:s1 100.0% 4.30044E-39 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.098, -5.368077452953389, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO60_gp208 [Mycobacterium phage CharlieB] ],,YP_010057512,100.0,4.30044E-39 SIF-HHPRED: SIF-Syn: CDS 55212 - 55478 /gene="108" /product="gp108" /function="hypothetical protein" /locus tag="JulietS_108" /note=Original Glimmer call @bp 55212 has strength 11.72; Genemark calls start at 55227 /note=SSC: 55212-55478 CP: yes SCS: both-gl ST: SS BLAST-Start: [gp111 [Mycobacterium phage Spud] ],,NCBI, q1:s1 100.0% 1.54288E-56 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.22, -3.2136105537450197, yes F: hypothetical protein SIF-BLAST: ,,[gp111 [Mycobacterium phage Spud] ],,YP_002224360,100.0,1.54288E-56 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer calls the start at 55212, and GeneMark calls the start at 55227, ATG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.214. It is the best final score on PECAAN. Z-score of 3.22 is above 2, which is strong. /note=Gap/overlap: 263bp. Slightly large, but the smallest and most reasonable candidate. There is no coding potential in the gap that might be a new gene. I didn’t choose a start site that would make a longer ORF because all other final scores are lower than the current start site. /note=Phamerator: Pham 461. Date 01/13/2023. 
It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 2 with a base pair coordinate of 55212. 111 of 163 call site #2. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 55212 bp. /note=Function call: NKF; Both NCBI and PhagesDB database did not predict functions for this gene. The top three PhagesDB BLAST hits have unknown function (e-value < 10^-46). No conserved domains identified for this query sequence. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 55567 - 55686 /gene="109" /product="gp109" /function="hypothetical protein" /locus tag="JulietS_109" /note=Original Glimmer call @bp 55567 has strength 16.69; Genemark calls start at 55567 /note=SSC: 55567-55686 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_109 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 6.33104E-19 GAP: 88 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.986, -2.6814417326944517, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_109 [Mycobacterium phage ScottMcG] ],,YP_002224137,100.0,6.33104E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Glimmer and GeneMark call 55567 as start. Start codon of ATG. /note=Coding Potential: Coding potential within host and self trained with the range of 55567 to 55686. Potential on the first forward strand. /note=SD (Final) Score:-2.681 ; best final score presented on PECAAN. Z-score of 2.986, second best score presented. /note=Gap/overlap: 88bp gap ; within reasonable range and there is no coding range that falls within the gap. /note=Phamerator: pham : 1155. 86 phages match with 120 bp long. 
Both BananaFence and Bread are consistent. /note=Starterator: Calls start 6, 73 MA’s call this start. Start agrees with Glimmer and GeneMark of 55567. /note=Location call: Yes the evidence above supports that the start site is real. The most likely start site is 55567. /note=Function call: NKF ; Phagesdb BLAST presents functions unknown with phages Essence and ET08 with e-values of 2e-16. HHPRED does have hits with 6QKP_A of a nucleoid-associated protein but it has a 66.3% probability and 71.7949% coverage so not reliable relatedness and its e-value is 20. NCBI BLAST has phage ScottMcG with 100% identity and 100% aligned for hypothetical protein. No CDD. /note=Transmembrane domains: DeepTMHMM presented no transmembrane domains so it is likely not a membrane protein. /note=Secondary Annotator Name: Tseng, Kylie /note=Secondary Annotator QC: I agree with the primary annotator's location and function calls. CDS 55748 - 56242 /gene="110" /product="gp110" /function="hypothetical protein" /locus tag="JulietS_110" /note=Original Glimmer call @bp 55748 has strength 14.22; Genemark calls start at 55748 /note=SSC: 55748-56242 CP: yes SCS: both ST: SS BLAST-Start: [gp111 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.6251E-116 GAP: 61 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -3.3503726477288405, yes F: hypothetical protein SIF-BLAST: ,,[gp111 [Mycobacterium phage Bxz1] ],,NP_818184,100.0,2.6251E-116 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 55748; start codon: GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 2 (indicating that this is a forward gene). The chosen start site covers this ORF. Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Ading and Adlitam. /note=SD (Final) Score: -3.350. 
This is a reasonable SD score. /note=Gap/overlap: There is a gap of 61 bp which is reasonable. There is a smaller gap of 40 bp but that start site has a worse final score and worse Z-score as well. The 61 bp gap’s start site agrees with Glimmer and Genemark too. /note=Phamerator: Date of investigation: 1/19/23; Pham 491; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages BananaFence and BeanWater). /note=Starterator: Yes, there is a conserved start choice. It is start number 2 with a base pair coordinate of 55748. Has 84 MA’s. Found in 171/173 (98.8%) of genes in pham. This start site also agrees with Glimmer and Genemark’s call. E-values are good, on the order of e-94. /note=Location call: Yes, the evidence suggests this gene is real. Start site 55748 is most likely. /note=Function call: NKF; CDD has no hits. NCBI predicted hypothetical proteins with coverages of 100%, identities over 99%, and e-values on the order of e-116. PhagesDB did not predict any functions for this gene. Phages Adlitam and Audrick support the NKF conclusion since they have synteny with this gene and no known function (e-values on the order of e-94). HHPred has hits, but the coverage is low and the e-value is much higher than 0. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the function and location calls. 
CDS 56260 - 57165 /gene="111" /product="gp111" /function="hypothetical protein" /locus tag="JulietS_111" /note=Original Glimmer call @bp 56260 has strength 17.01; Genemark calls start at 56260 /note=SSC: 56260-57165 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ERNIEJ_109 [Mycobacterium phage ErnieJ]],,NCBI, q1:s1 100.0% 0.0 GAP: 17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.632, -4.24928381388937, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ERNIEJ_109 [Mycobacterium phage ErnieJ]],,ALF51201,99.6678,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 56260 with a start codon of ATG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. The start site of #56260 covers all of the potential coding regions on the forward strand, which supports the forward direction of this gene. Additionally, coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs which suggests that this is a potential gene at start site #56260. /note=SD (Final) Score: Even though the final score is not the most negative (SD = -4.249), the Z-score is the highest overall score (Z-score = 2.632), indicating that the autogenerated start site is the better of the options. /note=Gap/overlap: There is a gap of 17 bp, which seems reasonable when looking at the synteny of other phages in the C1 subcluster for this gene. There is also a reasonable gap of 28 bp between this gene and the downstream gene, which makes the 906 bp gene length reasonable. /note=Phamerator: The gene was found to be in Pham 62065 (01/12/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Ava, and Ewok. Differing functions were listed among pham members, but not enough to determine the function of this gene. 
However, the base pair length was conserved at 906 bp. /note=Starterator: There is a reasonable and highly conserved start site, examined on 01/12/2023, at (9, 56260), which was called by 165 out of the 176 non-draft genes out of the 195 total pham members. /note=Location call: The gene and the start site are both conserved for this gene. With a high coding potential in the ORF it indicates a real gene’s placement in the operon and thus the evidence supports that the start site of this gene is at #56260 and the gene is consistent with other phages within the C1 group. /note=Function call: The function of the gene is a capsid decoration protein. In the BLASTp on PhagesDB.org it has a strong match against Lukuli with an e-value of 10^-172 and 100% positives but has an unknown function. Yet, when looking at the NCBI BLASTp and the phage Lukilu it also has a 100% positive match with an e-value of 0.0 but has the function of capsid decoration protein. Also, within NCBI BLASTp, there is a 99% match with phage Dandelion that also has the function of capsid decoration protein. There was also another match for capsid decoration protein with Phage LinStu with an e-value of 0.0 on NCBI BLASTp. HHpred and CDD both give inconclusive results. Thus, it is probable that this gene functions as a capsid decoration protein. /note=Transmembrane domains: TmHmm does not predict any transmembrane domains and TOPCONS confirms this. Therefore there are no transmembrane domains. DeepTMHMM also predicts no transmembrane domains; it predicts that the entire protein is located outside the cell. 
/note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with the location and function call of this gene CDS 57194 - 58012 /gene="112" /product="gp112" /function="hypothetical protein" /locus tag="JulietS_112" /note=Original Glimmer call @bp 57194 has strength 13.73; Genemark calls start at 57194 /note=SSC: 57194-58012 CP: yes SCS: both ST: SS BLAST-Start: [gp113 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: 28 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.153, -4.408951082954879, no F: hypothetical protein SIF-BLAST: ,,[gp113 [Mycobacterium phage Bxz1] ],,NP_818186,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 57194. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the suggested start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.409. It is the best final score on PECAAN. Z-score is 2.153. /note=Gap/overlap: Gap of 28bp. The gap is conserved in other phages (Derek, McGee) and there are no ORFs longer than 120bp. /note=Phamerator: Pham: 502. Date 1/13/2023. Pham has 171 members and 14 phages are drafts. /note=Starterator: Start site 1 in Starterator was manually annotated in 157 of the 157 non-draft genes in the pham. Start 1 is 57194 in JulietS. This evidence agrees with the site predicted by GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 57194. /note=Function call: NKF. No, there is not enough evidence to suggest a function for this protein, as neither NCBI (e-values of 0 and 100% coverage) nor PhagesDB (e-values of e-152, Ading, BackyardAgain) returned any hits with a known function. CDD had zero hits. HHpred had potential hits with probability higher than 80% but the score was less than 50 for all hits. 
No significant e-values. /note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the location and function call. However you should choose 2 examples in the phagesDB Blast and mention it in the function call CDS 58012 - 58620 /gene="113" /product="gp113" /function="hypothetical protein" /locus tag="JulietS_113" /note=Original Glimmer call @bp 58012 has strength 10.35; Genemark calls start at 58012 /note=SSC: 58012-58620 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SPRINKLERS_113 [Mycobacterium phage Sprinklers]],,NCBI, q1:s1 100.0% 5.05081E-144 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.231, -2.1695583717155773, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SPRINKLERS_113 [Mycobacterium phage Sprinklers]],,QAY13402,100.0,5.05081E-144 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 58012. The start codon is ATG. /note=Coding Potential: The ORF has good coding potential on the forward direct sequence and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.170 is the best SD (final) score because it is the highest (least negative) value, and has the smallest gap of all of the choices. Z-score is above 2. /note=Gap/overlap: This start site has the lowest gap/overlap. There is an overlap of 1 bp upstream, implying that this gene could be part of an operon. /note=Phamerator: Gene found in pham 11 on 1/20/23. When compared to phages BackyardAgain, Megamind, and Shrimp, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. There is synteny with other non-draft phages belonging to the same cluster. /note=Starterator: Start: 11 @58012 has 11 MA's. 
The start number called the most often in the published annotations is 11; it was called in 154 of the 159 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 58012 seems most likely. /note=Function call: NKF. There were no phagesdb BLAST hits, and the top 2 NCBI BLAST hits (Sprinklers and Cali) also agreed that there is NKF with 100% coverage, 99%+ identity, and E-value <10^-143. There were also no hits on CDD or HHpred. /note=Transmembrane domains: Neither TMHMM, deep TMHMM, or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with this function and location call CDS 58692 - 59318 /gene="114" /product="gp114" /function="hypothetical protein" /locus tag="JulietS_114" /note=Original Glimmer call @bp 58692 has strength 13.25; Genemark calls start at 58692 /note=SSC: 58692-59318 CP: yes SCS: both ST: SS BLAST-Start: [gp115 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 3.57594E-153 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[gp115 [Mycobacterium phage Bxz1] ],,NP_818188,100.0,3.57594E-153 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both genemark and glimmer call the start site at 58692 which has a start codon of GTG /note=Coding Potential: There is good coding potential in both the host and self-trained genemarks for the entire range of the gene /note=SD (Final) Score: The final score is -2.523 and it is the best score on PECAAN. the Z score is 3.062 which is good /note=Gap/overlap: There is a gap of 71 bp which is a reasonable gap and the gene is 627 bp long which is a reasonable length /note=Phamerator: Pham 424 has 183 members as of 1/13/23, 15 are drafts. 
Many of the Pham members are cluster C including Alice and ArcherS7 /note=Starterator: JulietS calls the most annotated start site (Start: 16 @58692) and it has 159 MA's therefore the /note=Location call: This is likely a real gene with a start at 58692 /note=Function call: NKF; phagesDB hits have E-values of 1e-120 and their function is called as function unknown, NCBI hits are (Identity:100, coverage:100, e-value: 3.576e-153) and (Identity: 99.51, Coverage: 100, e-value: 9.188e-153) and both call the function as hypothetical protein. CDD and HHPred had no relevant hits /note=Transmembrane domains: 0 predicted TMRs on Deep TMHMM and the protein is predicted to be intracellular /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 59337 - 59627 /gene="115" /product="gp115" /function="hypothetical protein" /locus tag="JulietS_115" /note=Original Glimmer call @bp 59337 has strength 13.63; Genemark calls start at 59337 /note=SSC: 59337-59627 CP: yes SCS: both ST: SS BLAST-Start: [gp116 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.25876E-64 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.908, -3.6727816321281046, no F: hypothetical protein SIF-BLAST: ,,[gp116 [Mycobacterium phage Bxz1] ],,NP_818189,100.0,2.25876E-64 SIF-HHPRED: SIF-Syn: NKF, and upstream gene is NKF and downstream gene is NKF, similar to Ading. /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 59337, at start codon ATG. /note=Coding Potential: Coding potential is in the top strand only, and the start site covers all the coding potential. This gene has strong coding potential. /note=SD (Final) Score: The final score is -3.673, the least negative out of all the potential start sites besides 59589 which is slightly less negative (however, 59589 would make the length of the gene 39 bp). 
The Z score follows the same trend, with 2.908, which is the highest Z score out of all potential start sites besides 59589 (which is very slightly higher). /note=Gap/overlap: There is a gap of 18 bp, the smallest gap out of all the potential start sites listed. /note=Phamerator: Pham 480. Date 1/19/23. It is conserved; found in ZygoTaiga_117 and YoungMoneyMata_120. Function was listed as function unknown on the phams database. /note=Starterator: Start site 1 @59337 has 157 manual annotations. The rest of the potential start sites have none. /note=Location call: Due to the above evidence, the start site was called at 59337. /note=Function call: HHPred shows no hits besides one that has a high e-value (71). CDD shows no conserved domains. BLASTp shows strong matches (low e-values, e.g. 2.26e-64 on NCBI) only to hypothetical proteins. Therefore, this gene is called a real protein with no known function. /note=Transmembrane domains: DeepTMHMM shows that this is most likely a globular protein with no transmembrane domains. If there are no transmembrane domains, it is most likely not encoding for a membrane protein. /note=Secondary Annotator Name: Rheinhardt, Jenna /note=Secondary Annotator QC: I agree with the location and function call. CDS 59714 - 61324 /gene="116" /product="gp116" /function="hypothetical protein" /locus tag="JulietS_116" /note=Original Glimmer call @bp 59714 has strength 18.62; Genemark calls start at 59714 /note=SSC: 59714-61324 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_FUDGETART_118 [Mycobacterium phage FudgeTart] ],,NCBI, q1:s1 100.0% 0.0 GAP: 86 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.163, -4.325675514574479, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_FUDGETART_118 [Mycobacterium phage FudgeTart] ],,AYD83585,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer and GeneMark both call 59714 as start. 
/note=Coding Potential: There is really good coding potential on the forward strand, indicating this is a forward gene. The chosen start site covers all of the coding potential. /note=SD (Final) Score: The final score is the best option at -4.326 and the z score, although not the highest, is still a good score at 2.163. /note=Gap/overlap: There is a gap of 86 bp, which is higher than preferred. The length of the gene is 1611 which is a very adequate length. /note=Phamerator: Pham number is 451 as of 1/08/23. It is conserved, found in Sprinklers_116 and FudgeTart_118. /note=Starterator: Start number (@59714) is 3, manually annotated 161 times. This is the same as the conserved start site. There are 179 members of the pham and 161 call the same conserved start site. /note=Location call: Based on the above evidence, this is a real gene and the start site is 59714. /note=Function call: NKF. The top three hits on HHpred call a few different functions, however the e-values on these function calls are very poor e-values, so the information will be disregarded when determining function call. The probabilities of these calls were in the low nineties though which are good percentages, but the coverage was also very poor. NCBI Blast and CDD did not call any functions and neither did phagesdb. /note=Transmembrane domains: There are no TMDs predicted by deepTMHMM, so it is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
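The gene lengths quoted in these notes (1611 bp for gp116, 906 bp for gp111, and 39 bp for the rejected 59589 start of gp115) can be reproduced from the CDS coordinates. The sketch below is only an illustration: it assumes the coordinates are 1-based and inclusive, as the CDS lines suggest, and the function names are hypothetical rather than part of PECAAN or DNA Master.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene, assuming 1-based inclusive coordinates."""
    return stop - start + 1


def is_whole_codons(start: int, stop: int) -> bool:
    """True when the span divides evenly into 3 bp codons."""
    return gene_length(start, stop) % 3 == 0


if __name__ == "__main__":
    # Coordinates copied from the CDS lines and notes above.
    print(gene_length(59714, 61324))  # gp116
    print(gene_length(56260, 57165))  # gp111
    print(gene_length(59589, 59627))  # alternative start considered for gp115
```

A codon check such as `is_whole_codons` is a cheap sanity test that a start and stop pair lie in the same reading frame.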
CDS 61335 - 61838 /gene="117" /product="gp117" /function="acetyltransferase" /locus tag="JulietS_117" /note=Original Glimmer call @bp 61335 has strength 17.62; Genemark calls start at 61335 /note=SSC: 61335-61838 CP: yes SCS: both ST: SS BLAST-Start: [acetyltransferase [Mycobacterium phage Phusco]],,NCBI, q1:s1 100.0% 3.39373E-118 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.306, -1.9310779259753799, yes F: acetyltransferase SIF-BLAST: ,,[acetyltransferase [Mycobacterium phage Phusco]],,QAY06794,99.4012,3.39373E-118 SIF-HHPRED: GLUCOSAMINE-6-PHOSPHATE N-ACETYLTRANSFERASE; TRANSFERASE; HET: COA; 1.55A {CAENORHABDITIS ELEGANS} SCOP: d.108.1.0,,,4AG7_B,33.5329,93.6 SIF-Syn: Gene function is acetyltransferase. Downstream gene is NKF Pham 415, Upstream gene is NKF Pham 451, just like phage Ava3 and BananaFence /note=PECAAN Notes /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Forward gene. Glimmer (61335), GeneMark (61335), start codon: GTG /note=Coding Potential: The coding potential is found in the forward frame. The gene covers all coding potential in both the Host Trained and Self Trained GeneMark. The coding potential is also relatively regular which makes it good. /note=SD (Final) Score: -1.931; it is the best score because it is the least negative and has the smallest gap. Z score is 3.306, which is greater than 2. It shows that the site is a good candidate for start site. /note=Gap/overlap: 10 bp, which is a relatively small gap /note=Phamerator: Pham 515, 01/15/23 gene conserved in Phages FoxtrotP1, Guwapp, and Toneili /note=Starterator: 01/13/23. Manually annotated 157/157 nondraft in this pham. Start 1 (1,61335) is manually called by 157 others. 
Evidence is also in line with the autoannotation made by Glimmer and GeneMark /note=Location call: It is a real gene and the start site is Start 1 at 61335 based on the evidence above /note=Function call: acetyltransferase. Both PhagesDB blast and NCBI Blast indicate the function to be an acetyltransferase. The top two hits on phagesDB blast (e-value=3e-95 for phage Guwapp and ikeLoa) were acetyltransferase. The top hit on NCBI blast (e-value = 3e-119 for phage I3 and Phusco, with 99% identity and 100% coverage) also indicates the function to be an acetyltransferase. There are no significant hits on HHpred or CDD (both also indicate acetyltransferase, but with a high e-value and low identity and coverage). /note=Transmembrane domains: No transmembrane domain predicted on TOPCONS, TMHMM and deep TMHMM, which indicates that the gene does not encode a membrane protein /note= /note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with the location call and function call based upon the above evidence however make sure to fill out the synteny box CDS 61835 - 62824 /gene="118" /product="gp118" /function="hypothetical protein" /locus tag="JulietS_118" /note=Original Glimmer call @bp 61835 has strength 17.7; Genemark calls start at 61835 /note=SSC: 61835-62824 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ET08_110 [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.974, -5.371913467750883, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ET08_110 [Mycobacterium phage ET08] ],,YP_003347789,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 61835 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. 
Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -5.372. Although this is not the best SD score, because this start site has a 4 bp overlap, this may be evidence of an operon so this SD score is still acceptable. /note=Gap/overlap: -4 bp upstream overlap. This overlap is very small and reasonable and may be evidence of an operon. /note=Phamerator: Pham 415. Date 1/12/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 10 in Starterator was manually annotated in 123/168 non-draft genes in this pham. Start 10 is 61835 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 61835 bp. /note=Function call: No known function (NKF). The top two phagesDB BLAST hits are of unknown function (E-value = 0), and the top three NCBI BLAST hits are also of unknown function (E-value = 0, 99.70%+ identity, 100% coverage). CDD and HHpred had no significant hits. /note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. 
CDS 62824 - 63420 /gene="119" /product="gp119" /function="hypothetical protein" /locus tag="JulietS_119" /note=Original Glimmer call @bp 62824 has strength 13.59; Genemark calls start at 62824 /note=SSC: 62824-63420 CP: yes SCS: both ST: SS BLAST-Start: [gp120 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.3591E-146 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.175, -4.36137266396122, no F: hypothetical protein SIF-BLAST: ,,[gp120 [Mycobacterium phage Bxz1] ],,NP_818193,100.0,2.3591E-146 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 62824 (ATG). /note=Coding Potential: Good coding potential in GeneMark Host and Self. Coding potential on this ORF is on the forward strand only. /note=SD (Final) Score: -4.361. It is the highest final score on PECAAN. /note=Gap/overlap: There is a 1bp overlap with the previous gene. /note=Phamerator: Gene is in pham 443 (date accessed: 01/17/23). It is conserved, found in phages BackyardAgain and Bananafence. /note=Starterator: Start site 20 at 62824 has 166 manual annotations. It is the most manually annotated start site on Starterator. /note=Location call: High final score and Z-score > 2 support the location call at 62824. Also, the high number of manual annotations support the selected start site. /note=Function call: NKF in PhagesDB BLASTp; top hits were phages FudgeTart and Gabriel with e-values of 1e-120. NKF in NCBI BLAST; top hits were phages Bxz1 (2.36e-146) and Grungle (1.37e-145). No convincing evidence in HHPred (e-values too high). No hits in CDD. /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. 
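The GAP values reported for gp118 (-4 bp) and gp119 (-1 bp) follow from the stop coordinate of the upstream gene and the start coordinate of the gene in question. The following sketch reproduces that arithmetic, assuming 1-based inclusive coordinates and forward-strand neighbours; `gap_bp` and `describe_gap` are hypothetical helper names, not functions of any annotation tool.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Bases between two adjacent forward genes; negative means overlap.

    Assumes 1-based inclusive coordinates, so genes ending at 100 and
    starting at 101 have a gap of 0.
    """
    return start - prev_stop - 1


def describe_gap(prev_stop: int, start: int) -> str:
    """Render the gap the way the notes phrase it."""
    gap = gap_bp(prev_stop, start)
    if gap < 0:
        return f"{-gap} bp overlap"
    return f"{gap} bp gap"


if __name__ == "__main__":
    # Coordinates copied from the CDS lines above.
    print(describe_gap(61838, 61835))  # gp117 stop, gp118 start
    print(describe_gap(62824, 62824))  # gp118 stop, gp119 start
    print(describe_gap(55686, 55748))  # gp109 stop, gp110 start
```

Under this convention a 4 bp overlap corresponds to a start codon whose last base pairs are shared with the upstream stop codon, which is the pattern the notes cite as possible evidence of an operon.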
CDS 63417 - 64217 /gene="120" /product="gp120" /function="hypothetical protein" /locus tag="JulietS_120" /note=Original Glimmer call @bp 63417 has strength 12.92; Genemark calls start at 63417 /note=SSC: 63417-64217 CP: yes SCS: both ST: SS BLAST-Start: [ParB-like nuclease domain protein [Mycobacterium phage I3] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.13, -4.45540384879315, no F: hypothetical protein SIF-BLAST: ,,[ParB-like nuclease domain protein [Mycobacterium phage I3] ],,YP_010510532,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 63417 bp. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 63417 corresponds to a Final Score of -4.455 which is the least negative and therefore the best final score. It also has a Z-score of 2.13. /note=Gap/overlap: There is a 4 nucleotide overlap which suggests that this gene is part of an operon. This overlap is seen in other non-draft phages like Burrough (C1) and Capablanca (C1). /note=Phamerator: pham: 552. Date 1/12/2023. It is conserved; found in Burrough (C1) and Capablanca (C1). /note=Starterator: Start site 3 in Starterator was manually annotated in 156/156 non-draft genes in this pham and is the most manually annotated start site. Start 3 is 63417 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 63417 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. Many strong (e-value ~0) hits in PhagesDB and NCBI BLAST corresponding to ParB-like nuclease domain protein. 
Function was not called because not all requirements met in SEA-PHAGES list of approved functions. HHPRED was used to align protein sequences of JulietS with two other non-draft phages Burrough (C1) and Capablanca (C1). Both Burrough and Capablanca had synteny with JulietS and called the ParB-like nuclease domain protein in PhagesDB. The protein sequences were 100% identical. HHPRED had no good hits (no good coverage or probability; e-value is too high). No hits in CDD. Not a membrane protein because it wasn’t called by TMHMM, Deep TMHMM, or TOPCONS. /note=Transmembrane domains: Neither TMHMM, Deep TMHMM, or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with this location and function call. CDS 64214 - 64669 /gene="121" /product="gp121" /function="hypothetical protein" /locus tag="JulietS_121" /note=Original Glimmer call @bp 64214 has strength 14.61; Genemark calls start at 64214 /note=SSC: 64214-64669 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_MCWOLFISH_123 [Mycobacterium phage McWolfish]],,NCBI, q1:s1 100.0% 2.32234E-103 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.896, -5.711975353186392, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_MCWOLFISH_123 [Mycobacterium phage McWolfish]],,AZF96224,99.3377,2.32234E-103 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation:Both Glimmer and genemark, they agree on the same site which is 64214 the start codon is GTG /note=Coding Potential:Coding potential is found both in GeneMark Self and Host, and there is only forward potential, and the chosen start site does include all of the coding potential. /note=SD (Final) Score:-5.712, the best final score on PECAAN /note=Gap/overlap:-4bp, very likely an operon start /note=Phamerator: pham: 410. Date 01/08/23. It is conserved; found in other 181 non-draft phages, such as Ading or Bread. 
/note=Starterator: Start site 12 in Starterator was manually annotated in 161/168 non-draft genes in this pham. Start 12 is 64214 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 64214 /note=Function call: No known function. The top 2 PhagesDB BLAST hits (phages ZygoTaiga and Zeenon) have unknown function with e-values of 1e-85, and the NCBI BLAST hit also has no known function with an e-value of 2e-103. There are no hits in CDD. The highest probability in HHpred is 99.31%, with an e-value of 1.2e-11, and this hit is a minor capsid protein. However, according to the SEA-PHAGES forum, a similar case was previously resolved as no known function (https://seaphages.org/forums/topic/5207/). So overall there is no known function for this gene. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: checked CDS 64666 - 65379 /gene="122" /product="gp122" /function="hypothetical protein" /locus tag="JulietS_122" /note=Original Glimmer call @bp 64666 has strength 19.93; Genemark calls start at 64666 /note=SSC: 64666-65379 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ET08_114 [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 4.30243E-173 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.231, -2.1518296047551457, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ET08_114 [Mycobacterium phage ET08] ],,YP_003347793,100.0,4.30243E-173 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 64666. The start codon is GTG. /note=Coding Potential: There is good coding potential in both host trained and self trained. 
The coding potential is only in the forward strand indicating it is a forward gene. /note=SD (Final) Score: -2.152. This is the best final score. /note=Gap/overlap: -4 bp. This indicates that there is a 4 bp overlap which is highly favorable because it indicates the gene is a part of an operon. /note=Phamerator: 422 (01/15/22). This gene is conserved and found in Bipolarrisk (C1) and Fludd (C1). /note=Starterator: Start 10 @ 64666. (01/15/22) This site was manually annotated 160/168. This agrees with Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 64666. Additionally, it had a Z-score of 3.231. /note=Function call: The function is unknown. The top three phages (EasyJones (C1), ET08 (C1), Grungle (C1)) on phagesdb BLAST hits mark the function as unknown (E-value 1e-135). Additionally the first 5 NCBI BLAST hits also call the function as a Hypothetical protein (100% coverage, 99% identity, and E-values <10^-173). HHPRED provides no convincing evidence nor did anything show up for CDD. /note=Transmembrane domains: DeepTMHMM predicts it to be outside cell with 100% probability. /note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the function and location called. However, try to include the fact that a -4 bp overlap indicates the gene being a part of an operon in the gap section. Also include the date of the Starterator report. In addition, add the Deep TMHMM info in the transmembrane domain section. 
CDS 65445 - 67190 /gene="123" /product="gp123" /function="tail sheath protein" /locus tag="JulietS_123" /note=Original Glimmer call @bp 65445 has strength 20.34; Genemark calls start at 65445 /note=SSC: 65445-67190 CP: yes SCS: both ST: SS BLAST-Start: [tail sheath protein [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 0.0 GAP: 65 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.786, -3.6716023243868, yes F: tail sheath protein SIF-BLAST: ,,[tail sheath protein [Mycobacterium phage ET08] ],,YP_003347794,100.0,0.0 SIF-HHPRED: Pyocin sheath PA0622; bacteriocin, pyocin, helix, ANTIMICROBIAL PROTEIN; 2.9A {Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)},,,6PYT_N,57.1429,100.0 SIF-Syn: Tail sheath protein called for both JulietS and Wally (Pham 426), upstream gene has NKF in JulietS and not called in Wally (Pham 422), has NKF in JulietS and not called in Wally (Pham 413), all genes are from the same pham. /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 65445. The start codon is ATG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only, which indicates this is a forward gene. Coding potential is found in both GeneMark Self and Host. All of the coding potential is included. /note=SD (Final) Score: The z score is 2.786. The final score is -3.672 which is the second highest value of all listed start sites. /note=Gap/overlap: 65 bps gap upstream of the gene and a 79 bp gap downstream of the gene. This gene and these gaps are conserved in phages Wally, Melpomini, and Yucca. There is no other coding potential in these regions. /note=Phamerator: Pham: 426. Date 01/14/23. The gene is conserved in phages Wally and Yucca which are in the same cluster as JulietS. 
/note=Starterator: Start site 3 is found in 172 of 183 of genes in pham and was called for 158 of 168 non-draft phage genomes in the pham. It is the most annotated start site and is called 100.0% of the time when present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 65445 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Tail sheath protein. The top hit for HHpred has a function of sheath protein (probability: 100, E-value: 4.2e-27, % coverage: 57.1429) and other hits for sheath protein with similar values. In this pham, the Phagesdb Function Frequency also states that there is a frequency of 99% being a tail sheath protein (92 phages called) in subcluster C1 with the other 1% called as tail sheath (1 phage) in subcluster C1. PhagesDB BLASTp shows many hits for phages with the function of tail sheath protein, all with E-values of 0. NCBI BLASTp has the top three hits with the function of tail sheath protein, (% coverage: 100, E-value: 0). There is one CDD hit for tail sheath protein (% identity: 11.2128, % alignment: 16.9336, % coverage: 46.2995, E-value: 0.000921809). /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: I agree with the location and function call for this gene. Make sure to include the date that the starterator was done in the annotation notes. 
CDS 67225 - 67692 /gene="124" /product="gp124" /function="hypothetical protein" /locus tag="JulietS_124" /note=Original Glimmer call @bp 67225 has strength 14.82; Genemark calls start at 67225 /note=SSC: 67225-67692 CP: yes SCS: both ST: SS BLAST-Start: [gp125 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 6.83413E-109 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.306, -2.838489286749966, yes F: hypothetical protein SIF-BLAST: ,,[gp125 [Mycobacterium phage Bxz1] ],,NP_818198,100.0,6.83413E-109 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 67225 bp. The start codon is ATG, which has a high probability of being used as a start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only on the first frame, indicating that this is a forward gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. /note=SD (Final) Score: The final score is the best option at -2.838 and the z-score is the best option at 3.306. This provides strong evidence that the called start site is the real start site. /note=Gap/overlap: The gap/overlap is reasonable at 34 bp. This gap is justified because it is the smallest possible gap out of the potential start sites and there is no coding potential within the gap. This gap is conserved and shows up in phage Ading from cluster/subcluster C1. /note=Phamerator: Pham: 413. Date 1/19/2023. It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 3 in Starterator was manually annotated in 157/168 non-draft genes in this pham. Start number 3 is 67225 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 171/183 (93.4%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 67225 bp. 
Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top two PhagesDB BLAST hits have the function of “function unknown” (99% identity, E-value = 4e-85/9e-97), and the top two NCBI BLAST hits have the function of “hypothetical protein” (100%/99.3548% identity, E-value = 6.83413e-109/1.08363e-108). CDD and HHpred gave no usable evidence: either no results came up or the hits had unreasonably high e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. DeepTMHMM agrees (1.0 probability outside throughout). Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Lee, Amber /note=Secondary Annotator QC: I agree with the function and location call. CDS 67705 - 68538 /gene="125" /product="gp125" /function="hypothetical protein" /locus tag="JulietS_125" /note=Original Glimmer call @bp 67705 has strength 10.48; Genemark calls start at 67705 /note=SSC: 67705-68538 CP: yes SCS: both ST: SS BLAST-Start: [gp126 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.098, -4.522979412939132, no F: hypothetical protein SIF-BLAST: ,,[gp126 [Mycobacterium phage Bxz1] ],,NP_818199,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 67705. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.523. This is the best final score on PECAAN. /note=Gap/overlap: There is a reasonable gap of 12 bp. /note=Phamerator: pham: 476. Date 1/10/2023. It is conserved; found in Alice (C) and Astraea (C). /note=Starterator: Start site 7 in Starterator was manually annotated in 153/160 non-draft genes in this pham. Start 7 is 67705 in JulietS.
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 67705. /note=Function call: NKF. BLASTp hits only included hypothetical proteins. CDD returned no hits. HHPRED returned no significant hits. The top NCBI BLAST hit was a hypothetical protein with an e-value of 0, so there is not enough evidence to determine a function call. /note=Transmembrane domains: Deep TMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with location and function call. CDS 68551 - 69078 /gene="126" /product="gp126" /function="tail assembly chaperone" /locus tag="JulietS_126" /note=Original Glimmer call @bp 68551 has strength 14.91; Genemark calls start at 68539 /note=SSC: 68551-69078 CP: yes SCS: both-gl ST: SS BLAST-Start: [gp127 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 5.34524E-124 GAP: 12 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.762, -3.899580331731527, yes F: tail assembly chaperone SIF-BLAST: ,,[gp127 [Mycobacterium phage Bxz1] ],,NP_818201,100.0,5.34524E-124 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: The start sites called by Glimmer and GeneMark differ. Glimmer reports 68551 while GeneMark reports 68539. The start codon is ATG, which is the more common start codon. /note=Coding Potential: There is coding potential present in ORF 1 of the direct sequence (as expected for a forward gene) in the Host-Trained & Self-Trained Genemark. Start site 68538 covers all of the coding potential. /note=SD (Final) Score: -3.9. This is not the best final score (the best is -3.486), but it has the better Z-score (2.76 compared to 2.56) and it corresponds to the start site reported by Glimmer. /note=Gap/overlap: There is a 12 bp gap upstream of the gene and a 63 bp gap downstream of the gene.
These gaps are reasonable because there is no coding potential present to indicate the presence of another gene within the gaps. /note=Phamerator: Pham 411 as of 1/19/23. It is conserved, found in Ading (C1) and Bonray (C1). /note=Starterator: Most annotated start site is 9; 116/168 non-draft genes called start site 9. Start 9 is @ 68551 bp in JulietS. Evidence agrees with the site predicted by Glimmer. /note=Location call: Start site 22 @ 68551 bp. Starterator and final score evidence point toward the Glimmer start site. /note=Function call: Tail assembly chaperone. The top three PhagesDB BLAST hits have the function of tail assembly chaperone (score 348 and E-value = 4e-96) and the top three NCBI BLAST hits also have the function of tail assembly chaperone (100% coverage, >97% identity, and E-value 5e-124). HHpred had no relevant hits; all E-values were greater than 0.012. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not show any TMDs; therefore, this is not a transmembrane protein.
/note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the location and function call suggested by the PhagesDB and NCBI BLAST evidence. CDS join(68551..69066,69066..69449) /gene="127" /product="gp127" /function="tail assembly chaperone" /locus tag="JulietS_127" /note= /note=SSC: 68551-69449 CP: no SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Mycobacterium phage ET08] ],,NCBI, q1:s5 100.0% 0.0 GAP: -528 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.762, -3.899580331731527, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Mycobacterium phage ET08] ],,YP_003347797,98.6799,0.0 SIF-HHPRED: Phage_TAC_13 ; Phage tail assembly chaperone, TAC,,,PF16459.8,35.4515,94.9 SIF-Syn: CDS 69449 - 71707 /gene="128" /product="gp128" /function="tape measure protein" /locus tag="JulietS_128" /note=Original Glimmer call @bp 69449 has strength 15.44; Genemark calls start at 69449 /note=SSC: 69449-71707 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Mycobacterium phage Phox]],,NCBI, q1:s1 100.0% 0.0 GAP: -2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.632, -3.482870966776971, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Mycobacterium phage Phox]],,ATN91406,100.0,0.0 SIF-HHPRED: SIF-Syn: Tape Measure Protein called for both JulietS and BackyardAgain (Pham 48756). The upstream gene is a tail assembly chaperone in both phages (pham 411), and the downstream gene is a minor tail protein in both phages (pham 477). /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark list a start site of 69449, with a start codon of ATG. /note=Coding Potential: There appears to be good coding potential in this gene: both the Host-trained and the Self-trained GeneMark show coding potential in the direct sequences, and it covers the entire gene region. As for pham maps, the gene shares synteny with other phages Ading and Atlantean.
/note=SD (Final) Score: The chosen Z-score was 2.632, with a final score of -3.483, both of which corresponded to the start site of 69449. There was another start site that had a higher Z-score, but a much more negative final score. /note=Gap/overlap: There is a 1 bp overlap (gap of -1), which suggests that the gene could be part of an operon. /note=Phamerator: As of 1/19/23, this gene was found to be a part of pham number 48756. Other phages in this pham include Alice and Amataga, both of which are cluster C phages. /note=Starterator: As of 01/13/23, the most called start number was 6, and it was called in 113 of the 165 non-draft genes. In JulietS, start site 6 corresponds to a start of 69449 with 113 MAs. /note=Location call: Based on the evidence above, this is likely to be a real gene with a start site of 69449. /note=Function call: Tape Measure Protein. The NCBI Blast had numerous hits with 100% coverage and e-values of 0.0 for tape measure proteins/tail length tape measure proteins. HHpred showed that there was a major hit with a very low e-value for a tail-length tape measure protein. There were no hits for CDD. /note=Transmembrane domains: Deep TMHMM did not show any hits for transmembrane domains. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 71707 - 72390 /gene="129" /product="gp129" /function="minor tail protein" /locus tag="JulietS_129" /note=Original Glimmer call @bp 71707 has strength 16.07; Genemark calls start at 71707 /note=SSC: 71707-72390 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_129 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 3.08222E-166 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.496, -3.6907860541164537, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_129 [Mycobacterium phage ScottMcG] ],,YP_002224157,100.0,3.08222E-166 SIF-HHPRED: All3321 protein; contractile tail, injection system, macromolecular machine, PROTEIN TRANSPORT; 3.2A {Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)},,,7B5H_EK,51.1013,96.2 SIF-Syn: Minor tail protein called for both JulietS and BackyardAgain (Pham 477). The upstream gene is a tape measure protein in both phages, and the downstream gene is NKF in both. All genes are from the same pham (48756 for tape measure protein and 429 for NKF). /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, both call the start of this gene at 71707 with the start codon of ATG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 71707 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -3.691, which is the best score of all the possible start sites on PECAAN. Additionally, this start site also has the second-best Z-score of 2.496 on PECAAN. /note=Gap/overlap: This gene has an overlap of 1 base pair with the previous gene. This is the best possible start site since any other option would increase the size of the gap, which will potentially require the addition of another gene. /note=Phamerator: This gene belongs to pham number 477 as of 1/08/2023.
The gene is conserved in phages of this cluster (C) such as Shifa and Napoleon13. Many members of this family have their function listed as a minor tail protein, so there is a possibility that this gene also shares this function. /note=Starterator: Start site 2 is most often called, as it was manually annotated in 160/160 non-draft genes in the pham. Start 2 is 71707 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 71707. /note=Function call: Minor Tail Protein. The top three phagesdb BLAST hits are function unknown (E-value <10^-35), and the top three NCBI BLAST hits also have no known function for this gene (100% coverage, 99+% identity, and E-value <10^-165). However, HHpred’s top hit indicated a minor tail protein function (96+% probability, 51+% coverage, and E-value <0.088). CDD had no relevant hits. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 72406 - 72576 /gene="130" /product="gp130" /function="hypothetical protein" /locus tag="JulietS_130" /note=Original Glimmer call @bp 72406 has strength 11.67; Genemark calls start at 72406 /note=SSC: 72406-72576 CP: yes SCS: both ST: SS BLAST-Start: [gp131 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 9.93682E-33 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.056, -3.3633184689272606, yes F: hypothetical protein SIF-BLAST: ,,[gp131 [Mycobacterium phage Bxz1] ],,NP_818204,100.0,9.93682E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 72406 with start codon ATG. /note=Coding Potential: Coding potential in this ORF is found on the forward (direct) strand only, with no switch in orientation, in both host-trained and self-trained GeneMark. All of the coding potential is included in the ORF and covered by the selected start site. /note=SD (Final) Score: The final score is -3.363, which is the best final score on PECAAN, and the Z-score is 3.056. /note=Gap/overlap: The gap with the upstream gene is 15 bp. This gene is conserved in several other phages of the same cluster (Fludd, Grungle); the gap does not contain coding potential and is seen in Fludd and Grungle as well. /note=Phamerator: pham: 429, date 01/25/2023. It is conserved; found in Ading (C) and Alice (C). /note=Starterator: Start site number 129 in Starterator had the most manual annotations, in 166/168 non-draft genes in this pham. Start site 129 is at position 72406 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 72406. /note=Function call: NKF.
None of PhagesDB BLAST, NCBI BLAST, CDD, or HHPRED shows any significant hits with known functions; the only hits with significant probability or coverage have unknown function. The top hits in PhagesDB are HyRo and I3, which both call function unknown, with an E-value of 4e-30. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with the function and location call. However, the phamerator and starterator part is wrong; you might have mixed it up with your other gene. Also, even if there is no function call in PhagesDB, you should still choose 2 examples and show them in the function call part. CDS 72580 - 73170 /gene="131" /product="gp131" /function="hypothetical protein" /locus tag="JulietS_131" /note=Original Glimmer call @bp 72580 has strength 10.17; Genemark calls start at 72580 /note=SSC: 72580-73170 CP: yes SCS: both ST: SS BLAST-Start: [gp130 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 3.23334E-142 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.611, -5.62127637145861, no F: hypothetical protein SIF-BLAST: ,,[gp130 [Mycobacterium phage Cali] ],,YP_002224603,100.0,3.23334E-142 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is @ 72580. Start codon is ATG. /note=Coding Potential: The host-trained Genemark and self-trained Genemark both show coding potential fully captured between the auto-annotated start site @ 72580 and the stop site @ 73170. /note=SD (Final) Score: The final score of this start site is -5.621 with a z-score of 1.611. This is neither the most negative final score of the start sites nor the most favorable z score since the z score is less than 2. /note=Gap/overlap: There is a 3 bp gap, which is below the recommended 50 bp gap. /note=Phamerator: Found in Pham 428.
Date 01/12/2023. The gene is conserved and found in phages Bigswole, Bread, and FoxTrotP1. /note=Starterator: Start site 3 is called in 168/168 non-draft genes. Start site 3 is @ 72580 in JulietS, which agrees with the auto-annotated start site from Genemark and Glimmer. /note=Location call: Based on the aforementioned evidence, the start site is most likely @ 72580. /note=Function call: NKF. The top PhagesDB BLAST hits from Willis and Teardrop show the function is unknown, with both having an e-value of 1e-117. The top 3 hits from NCBI BLAST show phage Cali having gp130 with an e-value of 3e-142, phage Bxz1 having gp132 with an e-value of 9e-142, and phage Koguma having a hypothetical protein with an e-value of 2e-141. CDD had no relevant hits. HHPred had no relevant hits. /note=Transmembrane domains: Deep TMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with the location and function call. CDS 73170 - 76016 /gene="132" /product="gp132" /function="minor tail protein" /locus tag="JulietS_132" /note=Original Glimmer call @bp 73170 has strength 14.09; Genemark calls start at 73170 /note=SSC: 73170-76016 CP: yes SCS: both ST: SS BLAST-Start: [gp133 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.379, -3.935403931688939, no F: minor tail protein SIF-BLAST: ,,[gp133 [Mycobacterium phage Bxz1] ],,NP_818206,100.0,0.0 SIF-HHPRED: Putative cell wall hydrolase Tn916-like,CTn1-Orf17; Two domains protein, Slt/lysozyme-like muramidase, NlpC/P60 LD endopeptidase, Structural Genomics, Joint Center for Structural Genomics, JCSG; HET: MSE, GOL, OCS; 2.38A {Clostridium difficile},,,4HPE_C,10.443,99.5 SIF-Syn: This is a minor tail protein. Upstream gene is NKF, and the downstream gene is also a minor tail protein.
/note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark label the start site to be 73170, with the start codon being ATG. /note=Coding Potential: Coding potential is present along the entire length of the gene from the selected start codon, ATG, in both host-trained and self-trained GeneMark. /note=SD (Final) Score: The Z-score is 2.379 and the final score is -3.935; neither is the best score. /note=Gap/overlap: The gene has a 1 bp overlap (gap of -1), which is reasonable and may suggest that it is part of an operon. /note=Phamerator: The gene belongs to pham 421, dated 1/13/23. The gene is conserved, both in Momo (C1) and Phlegm (C1). /note=Starterator: Start site 5 was manually annotated in 157 of the 168 non-draft genes in the pham. This start site is 73170, which agrees with the auto-annotation by Glimmer and GeneMark, and further validates the start site. /note=Location call: Based on the agreed start site provided by both Glimmer and GeneMark, 73170, which is validated by starterator, along with the consideration that the gene is conserved in both Momo (C1) and Phlegm (C1), this gene is most likely real. /note=Function call: Minor Tail Protein. There were no listed hits for this gene in CDD and HHPred. However, there were two significant hits in NCBI for Minor Tail Protein. One was listed with a probability of 100%, an e-value of 0, and a percent coverage of 100%. The second hit had a probability of 99.89%, an e-value of 0, and a percent coverage of 100%. Additionally, PhagesDB lists two phages, Gizmo and Ghost, with an e-value of 0 and with minor tail protein as a function. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 76016 - 77482 /gene="133" /product="gp133" /function="minor tail protein" /locus tag="JulietS_133" /note=Original Glimmer call @bp 76016 has strength 13.51; Genemark calls start at 76016 /note=SSC: 76016-77482 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage CharlieB] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.053, -4.6353190821692705, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage CharlieB] ],,YP_010057539,100.0,0.0 SIF-HHPRED: SIF-Syn: minor tail protein, upstream gene is in [pham 421] function of minor tail protein, downstream gene is in [pham 68639] function is minor tail protein, just like in phage ShiaLabeouf and Shrimp. /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 76016, ATG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.635 is the final score associated with the original site, but it is the third-best score. Z-score is 2.053, which is strong. I did not choose a candidate with a higher final score because of the start site and start codon evidence. /note=Gap/overlap: 1bp overlap. Reasonable, evidence of an operon, acceptable gene length. I didn’t choose a start site that would make a longer ORF because both Glimmer and GeneMark agreed on the current start site. /note=Phamerator: Pham 519. Date 01/13/2023. It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 6 with a base pair coordinate of 76016. 126 of 157 call site #3. 
/note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 76016 bp. /note=Function call: Minor tail protein; The PhagesDB database and top NCBI BLAST hits say minor tail protein. HHpred hits have unknown function (e-value = 0). No conserved domains identified for this query sequence. /note=Transmembrane domains: None of TMHMM, TOPCONS, or DeepTMHMM predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hines, Kia /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 77482 - 78237 /gene="134" /product="gp134" /function="minor tail protein" /locus tag="JulietS_134" /note=Original Glimmer call @bp 77482 has strength 8.9; Genemark calls start at 77482 /note=SSC: 77482-78237 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_134 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.14664E-161 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.349, -4.075451192669358, no F: minor tail protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_134 [Mycobacterium phage ScottMcG] ],,YP_002224162,100.0,1.14664E-161 SIF-HHPRED: SIF-Syn: minor tail protein, upstream gene is in [pham 519] function of minor tail protein, downstream gene is in [pham 417] function is baseplate wedge protein, just like in phage BananaFence and BackyardAgain. /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Glimmer and GeneMark both call the start at 77482. The start codon is ATG. /note=Coding Potential: There is strong coding potential in the first ORF on the forward strand, and it covers the entire range between the start and stop positions. /note=SD (Final) Score: -4.075; best final score presented. The Z-score of 2.349 was not the best one given, but due to the 1 bp overlap the score is acceptable. /note=Gap/overlap: -1; reasonable overlap when compared to the other options presented. The gene may be part of an operon.
/note=Phamerator: Pham: 65582; range of clusters (A, C, and F). 160 phages had the bp length of 756, including Alice and Ava3. /note=Starterator: Start site 26 at 77482, which has 146 MAs calling the site. This agrees with Glimmer and GeneMark. /note=Location call: Yes, the evidence above supports that the start site is real. The most likely start site is 77482. /note=Function call: minor tail protein; The function frequency states it as a minor tail protein with a frequency of 100%. PhagesDB BLAST presents function unknown for phages Ading and AdLitam with e-values of 2e-147. HHPRED does have a hit with 7Nx3_A, a kinase receptor, but the e-value is 150. NCBI BLAST has phage ScottMcG with 100% identity and 100% aligned for a hypothetical protein, which could be categorized as a minor tail protein. No CDD hits. Rabinovish also presents 100% aligned for minor tail protein. It is glycine rich within its protein coding region. /note=Transmembrane domains: DeepTMHMM presents no transmembrane domains, so it is likely not a membrane protein. /note= /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: I agree with the location and function call. I'm a bit confused by the notes on coding potential; it seems like there's strong coding potential in the ORF on the first frame. Please discuss that the z score isn't the best, but since the gap/overlap is so good (-1, likely an operon), it's the most likely start site. Please only check evidence that your gene is a minor tail capsid (there are a good amount in both PhagesDB BLAST and NCBI BLAST.)
- resolved CDS 78272 - 78688 /gene="135" /product="gp135" /function="baseplate wedge protein" /locus tag="JulietS_135" /note=Original Glimmer call @bp 78272 has strength 13.67; Genemark calls start at 78272 /note=SSC: 78272-78688 CP: yes SCS: both ST: SS BLAST-Start: [baseplate wedge protein [Mycobacterium phage Phabba] ],,NCBI, q1:s1 99.2754% 3.14319E-55 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.152, -2.3335289980954053, yes F: baseplate wedge protein SIF-BLAST: ,,[baseplate wedge protein [Mycobacterium phage Phabba] ],,YP_010056838,76.4286,3.14319E-55 SIF-HHPRED: Baseplate wedge protein gp25; contractile sheath, baseplate, wedge, sheath polymerization, viral protein; HET: MSE; 2.47A {Enterobacteria phage T4},,,5IW9_B,99.2754,99.6 SIF-Syn: baseplate wedge protein; upstream gene is in [pham 68639] function is minor tail protein, downstream gene is in [pham 414] function is baseplate wedge protein, just like in phages Janiyra and Mikro. /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 78272; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 2 (indicating that this is a forward gene). The chosen start site covers this ORF. Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Catera and Darko. /note=SD (Final) Score: -2.334. This is a good SD score. /note=Gap/overlap: There is a gap of 34 bp which is reasonable. This is the smallest gap of all possible start sites for this gene. /note=Phamerator: Date of investigation: 1/19/23; Pham 417; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Astraea and Atlantean). /note=Starterator: Yes, there is a conserved start choice. It is start number 2 with a base pair coordinate of 78272. Has 168 MA’s. 
Found in 183/183 (100%) of genes in the pham. This start site also agrees with Glimmer and Genemark’s call. E-values are good, on the order of 1e-72. /note=Location call: Yes, the evidence suggests this gene is real. Start site 78272 is most likely. /note=Function call: baseplate wedge protein; PhagesDB, HHPred, and NCBI all have hits for baseplate wedge protein function (supporting evidence in phages I3, IkeLoa, and InigoMontoya with e-values on the order of 1e-72 in PhagesDB; coverage of 99% and an e-value on the order of 1e-13 in HHPred; coverage above 95% and e-values on the order of 1e-55 in NCBI); CDD has no hits, but all other databases provide supporting evidence. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. CDS 78699 - 80438 /gene="136" /product="gp136" /function="baseplate wedge protein" /locus tag="JulietS_136" /note=Original Glimmer call @bp 78699 has strength 19.93; Genemark calls start at 78699 /note=SSC: 78699-80438 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_136 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 0.0 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.306, -1.9310779259753799, yes F: baseplate wedge protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_136 [Mycobacterium phage ScottMcG] ],,YP_002224164,100.0,0.0 SIF-HHPRED: baseplate wedge protein, gp16; Myoviridae, Phage, Baseplate complex, VIRUS;{Vibrio phage XM1},,,7KH1_F5,38.6874,100.0 SIF-Syn: This is a baseplate wedge protein. Upstream gene is baseplate wedge protein, downstream gene is NKF, just like in phages Atlantean and Stubby. /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 78699 with a start codon of TTG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is on the forward strand, indicating a forward gene.
The start site of #78699 covers all of the coding potential regions on the forward strand, which supports the forward direction of this gene. Additionally, coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs, which suggests that this is a potential gene. /note=SD (Final) Score: The final score is the least negative (SD = -1.931), and the Z-score is the highest overall (Z-score = 3.306), indicating that the autogenerated start site is the best of the options. There is also synteny with Lukilu in the gene sequence, so there is no need to change the autogenerated start site. /note=Gap/overlap: There is an upstream gap of 10 bp, which seems reasonable when looking at the synteny of other phages classed in the C1 cluster against this gene. There is a reasonable gap of about 2 bp between this gene and the gene downstream, which is consistent with the 1740 bp gene length. /note=Phamerator: The gene was found to be in Pham 414 (01/17/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Alice, and Ewok. There were several commonly listed functions in the pham, including baseplate wedge protein and baseplate J protein, which are all conserved within the Pham. Moreover, the base pair length was conserved at 1740 bp. /note=Starterator: Starterator (checked 01/17/2023) shows a reasonable and highly conserved start site (start 2, 78699), which was called by 155 of the 168 non-draft genes among the 183 total pham members. /note=Location call: The gene and the start site are both conserved for this gene. The high coding potential in the ORF indicates a real gene placed in an operon, and thus the evidence supports that this gene starts at #78699 and that the gene is consistent with other phages within the C1 group. /note=Function call: The function of the gene is a potential baseplate wedge protein.
In the BLASTp on PhagesDB.org it has a strong match against Teardrop with an e-value of 0 and 100% positives, with the function of baseplate wedge protein. Additionally, in the BLASTp on PhagesDB.org, there is a strong match with ZygoTaiga with an e-value of 0 and a 100% match, with the function of baseplate J protein. The NCBI BLASTp also indicates a function of baseplate wedge subunit, with an e-value of 0 and a 100% match with phage ScottMcG. There are also matches with the function baseplate J protein, including a 99.83% match with phage Bangla1971. HHpred gives its best hit at an e-value of 1.6e-32 with the function call of baseplate wedge protein. The CDD shows one match with the baseplate J domain family with an e-value of 8.32e-17. Overall, the best-supported function for this gene is baseplate wedge protein, as it has a higher probability than baseplate J protein. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains and places the entire protein outside the cell. /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: Slight typos - correct to synteny and autogenerated. Otherwise it all looks good - I agree with the function and location calls. 
CDS 80441 - 80776 /gene="137" /product="gp137" /function="hypothetical protein" /locus tag="JulietS_137" /note=Original Glimmer call @bp 80441 has strength 10.79; Genemark calls start at 80441 /note=SSC: 80441-80776 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease domain protein [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 5.42874E-69 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.691, -4.048329621256372, yes F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease domain protein [Mycobacterium phage Gizmo] ],,YP_008060940,96.3964,5.42874E-69 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 80441. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the suggested start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.048. It is the best final score on PECAAN. Z-score is 2.691. /note=Gap/overlap: Gap of 2bp. The gap is conserved in other phages (Gabriel, Naija) and there are no ORFs longer than 120bp. /note=Phamerator: Pham: 539. Date 1/13/2023. Pham has 171 members and 14 phages are drafts. /note=Starterator: Start site 3 in Starterator was manually annotated in 132 of the 157 non-draft genes in the pham. Start 3 is 80441 in JulietS. This evidence agrees with the site predicted by GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 80441. /note=Function call: NKF. NCBI had all hits as hypothetical except for one call (a HNH endonuclease protein with a coverage of 100%, percentage identity of 95.5%, and an e-value of 5e-69) but it was not strong enough to support a clear function. PhagesDB did not have any significant hits. CDD had one hit for a pneumococcal surface protein but the e-value was 3.77e-3. 
HHpred had no significant hits (probability lower than 80% and score was less than 50 for all hits; no significant e-values). /note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Qi, Haocheng /note=Secondary Annotator QC: I agree with your location call, but for the function I recommend checking with the professor. I think HHpred is more reliable than NCBI here, so I would call this an NKF gene. Even though it has function, please look at the manual to understand the correct way to write the synteny. CDS 80773 - 80919 /gene="138" /product="gp138" /function="hypothetical protein" /locus tag="JulietS_138" /note=Original Glimmer call @bp 80773 has strength 9.34; Genemark calls start at 80773 /note=SSC: 80773-80919 CP: yes SCS: both ST: SS BLAST-Start: [gp140 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 7.02056E-26 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.528, -5.795782549058414, no F: hypothetical protein SIF-BLAST: ,,[gp140 [Mycobacterium phage Bxz1] ],,NP_818213,100.0,7.02056E-26 SIF-HHPRED: SIF-Syn: /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 80773. The start codon is GTG. /note=Coding Potential: The ORF has good coding potential on the complementary sequence and the chosen start site includes all of the coding potential. /note=SD (Final) Score: The start site at 80773 doesn't have the highest SD score (-4.199 at start site 80893 is better than -5.796 at 80773). However, it is the best choice because it has a significantly smaller gap/overlap. The z-score isn't above 2, but it is close at 1.528. /note=Gap/overlap: This start site has the lowest gap/overlap. Since the overlap is 4 bp, this suggests that the gene could be part of an operon. /note=Phamerator: Gene found in pham 3 on 1/20/23. 
The pham is also found in phages BackyardAgain, Megamind, and Shrimp, which are other members of the same cluster (C). There is synteny with other non-draft phages belonging to the same cluster. /note=Starterator: Start: 3 @80773 has 157 MAs. The start number called most often in the published annotations is 3; it was called in 157 of the 157 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the candidate start site at 80773 seems most likely. /note=Function call: NKF. There were no phagesdb BLAST hits, and the top 2 NCBI BLAST hits (Gizmo and Loanshark) are also NKF, with 100% coverage, 100% identity, and E-value <10^-26. There were also no hits on CDD or HHpred that fulfilled the e-value, coverage, and identity requirements. Phage Tyke has a hit on phagesDB blast with a low e-value for minor tail protein, but when the sequences of surrounding genes of minor tail proteins are blasted against each other, there is extremely low probability and a high e-value. Not enough evidence to claim this as a minor tail protein. /note=Transmembrane domains: None of TMHMM, DeepTMHMM, or TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: Typo: Glimmer & GeneMark call the start site as 80773, not 80919. Typo: It's an overlap of -4, not a gap of 4. Please mention that a gap/overlap of -4 likely indicates the presence of an operon. I agree with function call. 
CDS 80916 - 81059 /gene="139" /product="gp139" /function="membrane protein" /locus tag="JulietS_139" /note=Original Glimmer call @bp 80916 has strength 22.19; Genemark calls start at 80916 /note=SSC: 80916-81059 CP: yes SCS: both ST: SS BLAST-Start: [gp141 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 4.42022E-22 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.295, -2.0949393225970705, yes F: membrane protein SIF-BLAST: ,,[gp141 [Mycobacterium phage Bxz1] ],,NP_818214,100.0,4.42022E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both GeneMark and Glimmer call 80916 as the start site, with a start codon of ATG. /note=Coding Potential: Both the host-trained and the self-trained GeneMark show good coding potential for the auto-annotated start site. /note=SD (Final) Score: The final score is -2.095, which is the highest final score on PECAAN. /note=Gap/overlap: This start site has a gap of -4 bp, indicating that this gene is in an operon. /note=Phamerator: As of 1/13/23, this gene is in Pham 508, which has 171 members, 14 of which are drafts. Other members of this pham that are cluster C include HyRo and I3. /note=Starterator: JulietS calls Start 1 @ 80916, which is the most annotated start site with 157 manual annotations. /note=Location call: I believe this is a real gene with a start site of 80916 based on the above information. /note=Function call: NKF; There are no significant results on HHpred and no results on CDD. NCBI hits are (identity: 100, coverage: 100, e-value: 4.42e-22) and (identity: 97.87, coverage: 100, e-value: 1.18e-21) and both call it a hypothetical protein. The PhagesDB hits both have e-values of 5e-19 and call the function unknown. /note=Transmembrane domains: There is 1 predicted TMR on Deep TMHMM that is 20 amino acids in length. It is said to be an alpha TMR. 
Since there are no clues as to the function of this protein all this tells us is that there is likely an alpha TMR in the protein /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with location and function call. CDS 81061 - 82416 /gene="140" /product="gp140" /function="minor tail protein" /locus tag="JulietS_140" /note=Original Glimmer call @bp 81061 has strength 18.23; Genemark calls start at 81061 /note=SSC: 81061-82416 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage I3] ],,NCBI, q1:s1 100.0% 0.0 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.796, -5.443606692405957, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage I3] ],,YP_010510552,99.7783,0.0 SIF-HHPRED: baseplate wedge protein, gp17; Myoviridae, Phage, Baseplate complex, VIRUS;{Vibrio phage XM1},,,7KH1_A2,43.459,99.6 SIF-Syn: This is a minor tail protein. Upstream gene is NKF, and the downstream gene is also a minor tail protein, similar to Grungle and Fludd. /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 81061, at start codon ATG. /note=Coding Potential: The chosen start site covers all the coding potential - the previous gene has high coding potential and there’s only a gap of 1 bp between this gene and the previous. However, because there is the gene directly previous that covers that gene’s coding potential, this start site for this gene covers all the coding potential. This gene has good coding potential and it is in the top strand only. /note=SD (Final) Score: The final score of this start site is -5.444, which is not the most negative final score of the start site options, but it is the one which is fairly less negative and minimizes the gap. Similarly, the Z-score is not above 2, but it is fairly close, at 1.796. 
This is not the best z-score out of the potential start sites, but it is one of the best ones and it minimizes the gap. /note=Gap/overlap: There is a gap of 1 bp. /note=Phamerator: Pham 434. Date 1/19/23. It is conserved; found in Stubby_137 and Sprinklers_140. Function was listed as a minor tail protein on the phams database. /note=Starterator: Start site 3@81061 was the most annotated, with 165 manual annotations. No other candidate start site was annotated. /note=Location call: Due to the above evidence, the start site was called at 81061. /note=Function call: HHPred shows similarity with baseplate wedge proteins with low e-values. However, the baseplate wedge protein has multiple proteins within it, and the protein that this gene encodes for may be a tail protein that connects to this wedge protein, similar to other baseplate wedge structures across other phages. BLASTp shows matches with virion structural proteins with e-values of 0 and minor tail proteins with high e-values. Phages DB lists the pham function as minor tail protein. Therefore, the function is called a minor tail protein. /note=Transmembrane domains: Deep TMHMM shows that this is most likely a globular protein, but does not predict any TMD’s, meaning that it is most likely not a membrane protein. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with this start site and function call. 
CDS 82428 - 86231 /gene="141" /product="gp141" /function="minor tail protein" /locus tag="JulietS_141" /note=Original Glimmer call @bp 82437 has strength 16.35; Genemark calls start at 82428 /note=SSC: 82428-86231 CP: yes SCS: both-gm ST: SS BLAST-Start: [minor tail protein [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 0.0 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.295, -2.782170923661845, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Mycobacterium phage QBert] ],,YP_010058234,100.0,0.0 SIF-HHPRED: Endo-1,4-beta-xylanase C; Binding Site, Carbohydrates, Enzyme Stability, Substrate Specificity, Endo-1, 4-beta-xylanase, Xylan-binding domain, Thermophilic enzymes, Thermostabilizing Domains, sugar; HET: CA, GOL; 2.43A {Paenibacillus barcinonensis},,,4XUP_A,28.8871,98.7 SIF-Syn: Minor Tail protein, upstream gene is minor tail protein, and downstream gene is NKF, just like in phage Grungle. /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer calls 82437 as the start but GeneMark calls 82428 as start. /note=Coding Potential: Host Trained GeneMark shows that there is really good coding potential on the forward strand, indicating this is a forward gene. The chosen start sites both cover all of the coding potential. However, there is a region of no coding potential towards the end of the gene, but it picks up again before the stop codon. GeneMarkS also shows that the start site covers the entire coding potential; however, there isn’t an area of no coding potential like there is on Host Trained GeneMark. /note=SD (Final) Score: The two chosen start sites have the same z-score (3.295), which is the best of the options. However, start site 82428 has a final score of -2.782 and start site 82437 has a final score of -3.055. These are the two best (least negative) scores among the options. 
/note=Gap/overlap: Start site 82428 has a gap of 11 bp with the upstream gene and start site 82437 has a gap of 20 bp; both are acceptable gaps. The length of the gene is 3804 bp with the 82428 start site and 3795 bp with the 82437 start site; both are acceptable lengths. /note=Phamerator: Pham number is 419 as of 1/13/23. It is conserved, found in QBert_139 and Sprinklers_141. /note=Starterator: Start number for start site 82437 is 8, manually annotated 22 times. Start number for start site 82428 is 7, manually annotated 138 times. The most conserved start site is 7; it is seen in 138 of the 168 non-draft pham members. /note=Location call: Both start sites cover the entire region of coding potential, and there is great coding potential along the entire length of the gene, so this is a real gene. Start site 82428 has a better final score than start 82437 and is conserved and manually annotated more times than start 82437, so the start site of this gene is 82428. /note=Function call: Predicted function is Minor Tail Protein. NCBI Blast and Phages db both call this function with e-values of 0, coverage of 100%, and identity of 99%. However, the top two hits on HHpred did call 1,4-beta-glycosidase with e-values very close to zero (0.000025), but because NCBI and phagesdb have multiple hits with better e-values all stating minor tail protein, this was the final function call I decided to make. /note=Transmembrane domains: There are no TMDs predicted by deepTMHMM, so it is not a membrane protein. /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: It looks good - I agree with the function and location calls. I'm assuming HHPred and CDD had issues so they were omitted? 
CDS complement (86232 - 86558) /gene="142" /product="gp142" /function="hypothetical protein" /locus tag="JulietS_142" /note=Original Glimmer call @bp 86558 has strength 13.56; Genemark calls start at 86558 /note=SSC: 86558-86232 CP: yes SCS: both ST: SS BLAST-Start: [gp145 [Mycobacterium phage Spud] ],,NCBI, q1:s1 100.0% 1.93003E-73 GAP: 38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.632, -3.4219145408355454, yes F: hypothetical protein SIF-BLAST: ,,[gp145 [Mycobacterium phage Spud] ],,YP_002224394,100.0,1.93003E-73 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Reverse gene. Glimmer (86558), GeneMark (86558), start codon: ATG /note=Coding Potential: The coding potential is found in the reverse frame. The gene covers all coding potential in both the Host Trained and Self Trained GeneMark. The coding potential is also relatively regular, which makes it good. /note=SD (Final) Score: -3.422. It's the best score because it is the least negative and has the smallest gap. The Z-score is 2.632, which is greater than 2. This shows that the site is a good candidate for the start site. /note=Gap/overlap: 38 bp, which is a relatively small gap. /note=Phamerator: Pham 375, 01/15/23. The gene is conserved in other phages in the same pham, such as Fruitloop and Colt. /note=Starterator: 01/13/23. Manually annotated in 178/178 non-draft genes in this pham. Start 2 (2,61335) is manually called by 178 others. The evidence is also in line with the auto-annotation made by Glimmer and GeneMark. /note=Location call: It is a real gene and the start site is Start 2 at 86558 based on the evidence above. /note=Function call: NKF. The top two hits on phagesDB blast (e-value = 5e-60 for phages Adlitam and Alice) indicate unknown function. The top two hits on NCBI blast (e-values = 2e-73 and 3e-73 for phages Spud and Naija, with over 99% identity and 100% coverage) also indicate an unknown function. There are no significant hits on HHpred and CDD. 
/note=Transmembrane domains: No transmembrane domains are predicted by TOPCONS, TMHMM, or DeepTMHMM, which indicates that the gene does not encode a membrane protein. /note=Secondary Annotator Name: Pham, Truc /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: The starterator menu has not been filled out. CDS complement (86597 - 86923) /gene="143" /product="gp143" /function="hypothetical protein" /locus tag="JulietS_143" /note=Original Glimmer call @bp 86923 has strength 14.42; Genemark calls start at 86923 /note=SSC: 86923-86597 CP: yes SCS: both ST: SS BLAST-Start: [gp142 [Mycobacterium phage Rizal] ],,NCBI, q1:s13 100.0% 8.88139E-71 GAP: 176 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.22, -2.1123791669543204, yes F: hypothetical protein SIF-BLAST: ,,[gp142 [Mycobacterium phage Rizal] ],,YP_002224835,90.0,8.88139E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 86923 bp. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.112. It is the best final score on PECAAN. /note=Gap/overlap: 176 bp. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (Grungle, Daffodil) and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: Pham 510. Date 1/12/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 6 in Starterator was manually annotated in 131/157 non-draft genes in this pham. Start 6 is 86923 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 86923 bp. /note=Function call: No known function (NKF). The top two phagesDB BLAST hits are of unknown function (E-value = 2e-58), and the top three NCBI BLAST hits are also of unknown function (E-value < 3e-70, 99.07%+ identity, 100% coverage). CDD and HHpred had no significant hits. /note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with the above evidence that this is a real reverse gene with start site 86923. CDS 87100 - 87573 /gene="144" /product="gp144" /function="hypothetical protein" /locus tag="JulietS_144" /note=Original Glimmer call @bp 87100 has strength 18.4; Genemark calls start at 87100 /note=SSC: 87100-87573 CP: yes SCS: both ST: SS BLAST-Start: [gp146 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 6.1241E-108 GAP: 176 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.22, -2.253377680616507, yes F: hypothetical protein SIF-BLAST: ,,[gp146 [Mycobacterium phage Bxz1] ],,NP_818219,100.0,6.1241E-108 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 87100 (ATG start codon). /note=Coding Potential: Good coding potential in GeneMark Host and Self. Coding potential on this ORF is on the forward strand only. /note=SD (Final) Score: -2.253. It is the highest final score on PECAAN. /note=Gap/overlap: There is a large 176bp gap with the previous gene. No coding potential noted in this gap in GeneMark Host and Self. The gap is also conserved, found in phages BeanWater and Cali. /note=Phamerator: Gene is in pham 558 (date accessed: 01/19/2023). It is conserved, found in phages BeanWater and Cali. /note=Starterator: Start site 1 at 87100 has 156 manual annotations. It is the most manually annotated start site on Starterator. 
/note=Location call: This gene likely starts at 87100, supported by the high final score and a Z-score > 2. This is also supported by the high number of manual annotations for this start site. /note=Function call: NKF in PhagesDB BLASTp; top hits were phages Adlitam and Alice (e-values of 6e-88). NKF in NCBI BLAST; top hits were phages Bxz1 (6.12e-108) and Cane17 (1.11e-107). No convincing hits in HHPred (e-values too high). No hits in CDD. /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. CDS 87742 - 87885 /gene="145" /product="gp145" /function="hypothetical protein" /locus tag="JulietS_145" /note=Original Glimmer call @bp 87742 has strength 19.57; Genemark calls start at 87742 /note=SSC: 87742-87885 CP: yes SCS: both ST: SS BLAST-Start: [gp147 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 5.57119E-24 GAP: 168 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.22, -2.1123791669543204, yes F: hypothetical protein SIF-BLAST: ,,[gp147 [Mycobacterium phage Bxz1] ],,NP_818220,100.0,5.57119E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 87742 bp. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 87742 corresponds to a Final Score of -2.112, which is the best and only final score. It also has a Z-score of 3.22. /note=Gap/overlap: There is a 168 bp gap, which is slightly large, but it is reasonable because the gap is conserved in other non-draft phages (Cali, Capablanca) of the same cluster and there is no coding potential in the gap that might be a new gene. 
This gene is shorter than normal but the length is also conserved in other non-draft phages of the same cluster. /note=Phamerator: pham: 495. Date 1/12/2023. It is conserved; found in Cali (C1) and Capablanca (C1). /note=Starterator: Start site 6 in Starterator was manually annotated in 158/158 non-draft genes in this pham and is the most manually annotated start site. Start 6 is 87742 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 87742 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. E-values too large and not enough coverage in HHPRED. Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but no known function. No hits in CDD. Not a membrane protein because it wasn’t called by TMHMM, Deep TMHMM or TOPCONS. /note=Transmembrane domains: Neither TMHMM, Deep TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Shah, Amay /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 87885 - 88211 /gene="146" /product="gp146" /function="hypothetical protein" /locus tag="JulietS_146" /note=Original Glimmer call @bp 87885 has strength 12.25; Genemark calls start at 87885 /note=SSC: 87885-88211 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_146 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 2.58345E-71 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_146 [Mycobacterium phage ScottMcG] ],,YP_002224174,100.0,2.58345E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site, 87885. The start codon is ATG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, on the forward strand only, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.523, the best final score on PECAAN. /note=Gap/overlap: -1 bp, a very reasonable short overlap. /note=Phamerator: Pham 372. Date 01/13/23. It is conserved; found in 193 other non-draft phages, such as Ading and Bread. /note=Starterator: Start site 14 in Starterator was manually annotated in 154/179 non-draft genes in this pham. Start 14 is 87885 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 87885. /note=Function call: No known function. The top 2 phagesdb BLAST hits (phages ZygoTaiga and Zeenon) have unknown function with e-values of 2e-55, and the NCBI BLAST also shows no known function with an e-value of 3e-71. There are also no hits in CDD, and the highest probability in HHpred is 60.36%, which is not reliable, so overall there is no known function for this gene. 
/note=Transmembrane domains: None of TMHMM, TOPCONS, or DeepTMHMM predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: checked CDS 88276 - 88521 /gene="147" /product="gp147" /function="hypothetical protein" /locus tag="JulietS_147" /note=Original Glimmer call @bp 88276 has strength 16.87; Genemark calls start at 88276 /note=SSC: 88276-88521 CP: yes SCS: both ST: SS BLAST-Start: [gp147 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 1.33807E-48 GAP: 64 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[gp147 [Mycobacterium phage Cali] ],,YP_002224620,100.0,1.33807E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 88276. The start codon is ATG. /note=Coding Potential: Good coding potential in both self and host trained GeneMark, but there is some coding potential in the reverse direction. However, this gene is flanked by forward genes, indicating that this gene is in the forward direction. /note=SD (Final) Score: -2.443. This is the best final score. /note=Gap/overlap: 64 bp. Although this gap is over the 50 bp threshold, it is the smallest gap. /note=Phamerator: Pham 503, 1/15/23. This gene is conserved in both Alice (C1) and Astraea (C1). /note=Starterator: Start 1 @ 88276. This start site was manually annotated in 157/157. This agrees with both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 88276. Additionally, it had a Z-score of 3.062. /note=Function call: The function is unknown. The top three phagesdb BLAST hits (BackyardAgain (C1), BadAgartude (C1), Basquiat (C1)) mark the function as unknown (E-value 2e-37). Additionally, the first 5 NCBI BLAST hits also call the function as a hypothetical protein (100% coverage, >98% identity, and E-values <10^-48). 
HHPRED provides no convincing evidence nor did anything show up for CDD. /note=Transmembrane domains: DeepTMHMM predicts it to be inside the cell with 100% probability. /note=Secondary Annotator Name: Pham, Truc /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 88514 - 88744 /gene="148" /product="gp148" /function="hypothetical protein" /locus tag="JulietS_148" /note=Original Glimmer call @bp 88514 has strength 3.41; Genemark calls start at 88514 /note=SSC: 88514-88744 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_148 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 6.16935E-49 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.274, -4.620091156089588, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_148 [Mycobacterium phage ScottMcG] ],,YP_002224176,100.0,6.16935E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 88514. The start codon is ATG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only, which indicates this is a forward gene. Coding potential is found in both GeneMark Self and Host. All of the coding potential is included. /note=SD (Final) Score: The z score is 2.274. The final score is -4.620 which is the highest value of all listed start sites. /note=Gap/overlap: -8 bps gap upstream of the gene and a very large gap downstream of the gene, 517 bp. Pham maps indicate that the gap downstream of the gene is tRNA. This gene and these gaps, both upstream and downstream of the gene, are conserved in phages Amataga, Darko, and Yucca. There is no other coding potential in these regions. /note=Phamerator: Pham: 585. Date 01/14/23. The gene is conserved in phages Wally and Yucca which are in the same cluster as JulietS. 
/note=Starterator: Start site 7 is found in 166 of 166 of genes in pham and was called for 150 of 152 non-draft phage genomes in the pham. It is the most annotated start site and is called 98.8% of the time when present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 88514 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. Phagesdb BLASTp shows phages ZygoTaiga, Zeenon, and Zalkecks from cluster C1 having no function and an E-value of 2e-44. NCBI BLASTp shows phages ScottMcG, NoodleTree, and Shifa with no known function, full coverage and E-values of e-49, e-48, and e-48 respectively. Phagesdb Function Frequency, HHpred and CDD showed no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with the location and function call. tRNA 88936 - 89010 /gene="149" /product="tRNA-Pro(tgg)" /locus tag="JULIETS_149" /note=tRNA-Pro(tgg) tRNA 89025 - 89095 /gene="150" /product="tRNA-Trp(cca)" /locus tag="JULIETS_150" /note=tRNA-Trp(cca) CDS 89261 - 89542 /gene="151" /product="gp151" /function="hypothetical protein" /locus tag="JulietS_151" /note=Original Glimmer call @bp 89261 has strength 7.73; Genemark calls start at 89261 /note=SSC: 89261-89542 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_152 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 5.87211E-61 GAP: 516 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.141, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_152 [Mycobacterium phage ScottMcG] ],,YP_002224177,100.0,5.87211E-61 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 89261 bp. 
The start codon is ATG, which has a high probability of being used as a start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only on the second frame, indicating that this is a forward gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. /note=SD (Final) Score: The final score is the best option at -2.276 and the z-score is the best option at 3.141. This provides strong evidence that the called start site is the real start site. /note=Gap/overlap: The gap/overlap is very large at 516 bp. However, this gap is justified because there are no other potential start sites and there is no coding potential within the gap. Furthermore, this gap is relatively conserved and shows up in phage Ading from cluster/subcluster C1. There is a slight shift likely due to the presence of tRNAs in the gap for phage JulietS. /note=Phamerator: Pham: 577. Date 1/19/2023. It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 1 in Starterator was manually annotated in 154/154 non-draft genes in this pham. Start number 1 is 89261 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 167/167 (100.0%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 89261 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top two PhagesDB BLAST hits have the function of “function unknown” (100% identity, E-value = 3e-55), and the top two NCBI BLAST hits have the function of “hypothetical protein” (100%/98.9247% identity, E-value = 5.87211e-61/1.68508e-60). Results from CDD and HHpred were irrelevant because either no results came up or unlikely results with unreasonably high e-values came up. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs. DeepTMHMM agrees (1.0 probability inside throughout). 
Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with the above evidence that this is a real, forward gene with the start site of 89261. In addition the likely function is unknown. CDS 89607 - 89981 /gene="152" /product="gp152" /function="peptidyl tRNA hydrolase" /locus tag="JulietS_152" /note=Original Glimmer call @bp 89607 has strength 9.22; Genemark calls start at 89607 /note=SSC: 89607-89981 CP: yes SCS: both ST: SS BLAST-Start: [peptidyl tRNA hydrolase [Mycobacterium phage JustHall]],,NCBI, q1:s1 100.0% 3.27223E-85 GAP: 64 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.953, -4.843798539558957, no F: peptidyl tRNA hydrolase SIF-BLAST: ,,[peptidyl tRNA hydrolase [Mycobacterium phage JustHall]],,QAY09052,100.0,3.27223E-85 SIF-HHPRED: Peptidyl-tRNA hydrolase; alpha-beta, Structural Genomics, PSI-2, Protein Structure Initiative, Northeast Structural Genomics Consortium, NESG, Cytoplasm, Hydrolase; 1.8A {Archaeoglobus fulgidus} SCOP: c.131.1.1,,,3ERJ_B,87.9032,99.9 SIF-Syn: Peptidyl tRNA hydrolase, upstream gene is minor tail protein, and downstream gene is DNA helicase, just like in phages Astrea and Fludd. /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 89607. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.844. This is the second best final score on PECAAN. /note=Gap/overlap: There is a reasonable gap of 64bp. /note=Phamerator: pham: 479. Date 1/10/2023. It is conserved; found in Alice (C) and Astraea (C). /note=Starterator: Start site 1 in Starterator was manually annotated in 160/160 non-draft genes in this pham. Start 1 is 89607 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 89607. /note=Function call: peptidyl t-RNA hydrolase. BLASTp hits only included peptidyl tRNA hydrolases with an e-value of 1e-67. The top CDD hit was a peptidyl-tRNA hydrolase type 2. The top HHpred hit was a peptidyl-tRNA hydrolase with a probability of 99.9 and an e-value of 2e-25. The top NCBI BLAST hit was a peptidyl tRNA hydrolase with an e-value of 1.69e-85. Based on this, the likely function is a peptidyl tRNA hydrolase. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. Please remember to mention the called stop codon in the PECAAN Notes, and also talk about the e-values, probability, or % coverage in the BLASTp and CDD hits. Finally, please also fill out the synteny box in PECAAN. tRNA 90292 - 90364 /gene="153" /product="tRNA-Stop(cta)" /locus tag="JULIETS_153" /note=tRNA-Stop(cta) tRNA 90514 - 90588 /gene="154" /product="tRNA-Met(cat)" /locus tag="JULIETS_154" /note=tRNA-Met(cat) tRNA 90715 - 90786 /gene="155" /product="tRNA-Cys(gca)" /locus tag="JULIETS_155" /note=tRNA-Cys(gca) tRNA 90791 - 90865 /gene="156" /product="tRNA-Glu(ctc)" /locus tag="JULIETS_156" /note=tRNA-Glu(ctc) tRNA 90867 - 90940 /gene="157" /product="tRNA-His(gtg)" /locus tag="JULIETS_157" /note=tRNA-His(gtg) tRNA 91102 - 91176 /gene="158" /product="tRNA-Ala(tgc)" /locus tag="JULIETS_158" /note=tRNA-Ala(tgc) tRNA 91366 - 91438 /gene="159" /product="tRNA-Phe(gaa)" /locus tag="JULIETS_159" /note=tRNA-Phe(gaa) tRNA 91444 - 91517 /gene="160" /product="tRNA-Val(cac)" /locus tag="JULIETS_160" /note=tRNA-Val(cac) tRNA 91637 - 91710 /gene="161" /product="tRNA-Lys(ctt)" /locus tag="JULIETS_161" /note=tRNA-Lys(ctt) tRNA 91715 - 91791 /gene="162" /product="tRNA-Glu(ttc)" /locus tag="JULIETS_162" /note=tRNA-Glu(ttc) tRNA 
91871 - 91943 /gene="163" /product="tRNA-Gly(tcc)" /locus tag="JULIETS_163" /note=tRNA-Gly(tcc) tRNA 92003 - 92077 /gene="164" /product="tRNA-Thr(cgt)" /locus tag="JULIETS_164" /note=tRNA-Thr(cgt) tRNA 92078 - 92150 /gene="165" /product="tRNA-Thr(tgt)" /locus tag="JULIETS_165" /note=tRNA-Thr(tgt) tRNA 92210 - 92282 /gene="166" /product="tRNA-Thr(ggt)" /locus tag="JULIETS_166" /note=tRNA-Thr(ggt) CDS 92352 - 93428 /gene="167" /product="gp167" /function="minor tail protein" /locus tag="JulietS_167" /note=Original Glimmer call @bp 92352 has strength 12.52; Genemark calls start at 92352 /note=SSC: 92352-93428 CP: yes SCS: both ST: SS BLAST-Start: [gp169 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: 2370 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.729, -3.139452255336754, yes F: minor tail protein SIF-BLAST: ,,[gp169 [Mycobacterium phage Bxz1] ],,NP_818228,100.0,0.0 SIF-HHPRED: Beta-lactamase; avibactam, NXL, serine beta-lactamase, TRU-1, class C, Aeromonas enteropelogenes, HYDROLASE; HET: SO4, NXL; 1.04A {Aeromonas enteropelogenes} SCOP: e.3.1.0,,,6FM7_A,94.1341,100.0 SIF-Syn: This gene shares synteny with the phage BackyardAgain, with both having the gene lie under pham 41912. In addition to that, it has upstream synteny with a peptidyl tRNA hydrolase (pham 479) and downstream synteny with an NKF (pham 599) /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark list a start site of 92352, which corresponds to a start codon of ATG. /note=Coding Potential: There looks to be good coding potential in this gene as both the Host-trained and the Self-trained GeneMark appear to show complete direct sequences. This gene shares synteny with BeanWater and Bipolarisk. /note=SD (Final) Score: The chosen Z-score was 2.729, with a final score of -3.139, both of which corresponded to the start site of 92352. /note=Gap/overlap: There is a large gap of 959 base pairs. 
/note=Phamerator: As of 01/24/23 this gene was found to be in pham number 41912. Other phages in this pham include Burrough and Pleione. /note=Starterator: As of 01/13/23, the most called start number was 23, and it was called in 191 of the 386 non-draft genes in the pham. However, JulietS does not have a start number of 23 at all. Instead, the best candidate has a start number of 49 @92352, which has 157 MAs. /note=Location call: Based on the evidence above, it is very likely that this is a real gene with a start site of 92352. /note=Function call: NCBI Blast has numerous hits with low e-values for this being a minor tail protein. HHpred had several hits for this protein being a beta-lactamase. There was one hit on CDD for this being related to a beta-lactamase. Though there were hits on this protein being a beta-lactamase, it is more likely that this is a minor tail protein as it shares synteny with BackyardAgain, which has the same gene listed as a minor tail protein. /note=Transmembrane domains: Deep TMHMM predicted that there were 0 TMRs, meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 93436 - 93594 /gene="168" /product="gp168" /function="hypothetical protein" /locus tag="JulietS_168" /note=Original Glimmer call @bp 93436 has strength 7.46; Genemark calls start at 93436 /note=SSC: 93436-93594 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein ET08_161 [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 5.49649E-27 GAP: 7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.299, -4.691784380744663, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ET08_161 [Mycobacterium phage ET08] ],,YP_003347823,100.0,5.49649E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, both call the start of this gene at 93436 with the start codon of ATG. 
/note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 93436 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -4.692, which is the best score of all the possible start sites on PECAAN. Additionally, this start site also has the best Z-score of 2.299 on PECAAN. /note=Gap/overlap: This gene has a gap of 7 base pairs with the previous gene. This is the best possible start since the other start site option would have a large overlap of 71 base pairs. Additionally, this 7 base pair gap is not large enough to contain another gene. /note=Phamerator: This gene belongs to pham number 599 as of 1/08/2023. The gene is conserved in phages of this cluster (C) like Audrick and JPickles. There is no function listed for almost all members of this family, so it is highly likely that this is a gene with an unknown function. /note=Starterator: Start site 2 is most often called as it was manually annotated in 149/149 non-draft genes in the pham. Start 2 is 93436 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 93436. /note=Function call: No Known Function. The top three phagesdb BLAST hits are function unknown (E-value <10^-26), and the only NCBI BLAST hit also has no known function for this gene (100% coverage, 100% identity, and E-value <10^-27). HHpred’s top hits also indicate a known function, but these results have a low probability, coverage percentage, and poor e-value. Also, many other bioinformatic tools disagree, so it is most probable that this gene has no known function. CDD had no relevant hits. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Fernandez, Mackenzie /note=Secondary Annotator QC: I agree with the primary annotators location call and function call based on the provided evidence. CDS 93635 - 93832 /gene="169" /product="gp169" /function="hypothetical protein" /locus tag="JulietS_169" /note= /note=SSC: 93635-93832 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein M181_gp171 [Mycobacterium phage Gizmo] ],,NCBI, q1:s2 100.0% 1.00348E-38 GAP: 40 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.455, -6.157910251162862, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp171 [Mycobacterium phage Gizmo] ],,YP_008060956,98.4848,1.00348E-38 SIF-HHPRED: SIF-Syn: tRNA 94027 - 94102 /gene="170" /product="tRNA-Gly(gcc)" /locus tag="JULIETS_170" /note=tRNA-Gly(gcc) tRNA 94116 - 94188 /gene="171" /product="tRNA-Asp(gtc)" /locus tag="JULIETS_171" /note=tRNA-Asp(gtc) tRNA 94248 - 94320 /gene="172" /product="tRNA-Met(cat)" /locus tag="JULIETS_172" /note=tRNA-Met(cat) tRNA 94326 - 94400 /gene="173" /product="tRNA-Ile(gat)" /locus tag="JULIETS_173" /note=tRNA-Ile(gat) tRNA 94494 - 94566 /gene="174" /product="tRNA-Arg(acg)" /locus tag="JULIETS_174" /note=tRNA-Arg(acg) tRNA 94609 - 94681 /gene="175" /product="tRNA-Val(gac)" /locus tag="JULIETS_175" /note=tRNA-Val(gac) tRNA 94855 - 94928 /gene="176" /product="tRNA-Arg(cct)" /locus tag="JULIETS_176" /note=tRNA-Arg(cct) tmRNA 94942 - 95376 /gene="177" /locus tag="JULIETS_177" /note= tRNA 95469 - 95544 /gene="178" /product="tRNA-Gln(ttg)" /locus tag="JULIETS_178" /note=tRNA-Gln(ttg) tRNA 95548 - 95623 /gene="179" /product="tRNA-Arg(tct)" /locus tag="JULIETS_179" /note=tRNA-Arg(tct) tRNA 95625 - 95700 /gene="180" /product="tRNA-Lys(ttt)" /locus tag="JULIETS_180" /note=tRNA-Lys(ttt) CDS complement (95737 - 96213) /gene="181" /product="gp181" /function="hypothetical protein" /locus tag="JulietS_181" /note=Original Glimmer call @bp 96213 has strength 13.82; Genemark calls start at 96213 /note=SSC: 96213-95737 CP: 
yes SCS: both ST: SS BLAST-Start: [gp178 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.6978E-111 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.797, -3.077027835973012, yes F: hypothetical protein SIF-BLAST: ,,[gp178 [Mycobacterium phage Bxz1] ],,NP_818229,100.0,2.6978E-111 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 96213 with start codon ATG /note=Coding Potential: Coding potential in this ORF is found on the reverse complementary sequence, and no switch in orientation is observed in either host-trained or self-trained GeneMark. All the coding potential is included in the ORF and covered by the selected start site. /note=SD (Final) Score: final score of -3.077 and z-value of 2.797, which is the best final score on PECAAN /note=Gap/overlap: The overlap with the upstream gene is 1 bp, so the two genes may be part of an operon. This gene is conserved in several other phages of the same cluster (Fludd, Atlantean) and the gap does not contain coding potential and was seen in Fludd and Atlantean as well. /note=Phamerator: pham: 359, date 01/20/2023. It is conserved; found in Ading (C) and Alice (C). /note=Starterator: Start site number 180 in Starterator had the most manual annotations, in 158/180 non-draft genes in this pham. Start site 180 is at position 96213 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 96213. /note=Function call: NKF. None of phagesDB BLAST, NCBI BLAST, CDD, or HHPRED shows any significant relevant hits with known functions, only significant probability or coverage to hits with unknown function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. 
Also, this gene is probably part of an operon because the gap is -1 bp. CDS complement (96213 - 98075) /gene="182" /product="gp182" /function="DNA helicase" /locus tag="JulietS_182" /note=Original Glimmer call @bp 98075 has strength 19.65; Genemark calls start at 98075 /note=SSC: 98075-96213 CP: yes SCS: both ST: SS BLAST-Start: [gp179 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: 115 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.421, -6.530842035202567, no F: DNA helicase SIF-BLAST: ,,[gp179 [Mycobacterium phage Bxz1] ],,NP_818230,100.0,0.0 SIF-HHPRED: ATP-dependent DNA helicase; helicase, winged helix, HYDROLASE;{Mycolicibacterium smegmatis},,,7LHL_D,76.4516,99.9 SIF-Syn: DNA helicase, upstream gene is NKF, downstream is NKF, just like in phage CharlieB. /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark state the start site is @ 98075. Start codon is ATG. /note=Coding Potential: The coding potential on both the host-trained Genemark and the self-trained Genemark covers the start site @ 98075 and the stop site @ 96213 in the reverse direction. /note=SD (Final) Score: The final score of the start site @ 98075 is -6.531 which is not the most negative number in this list and the z-score is 1.421 which is not favorable since it is less than 2. /note=Gap/overlap: The gap in this gene is 115 bp which is above the recommended 50 bp maximum gap. /note=Phamerator: Found in Pham 472. Date 01/13/23. The gene is conserved and found in phages BeanWater, Derek, and Nappy which all belong to Cluster C. /note=Starterator: Start site 5 is called in 150/160 non-draft genes. Start site 5 is @ 98075 in JulietS which agrees with the auto annotated start site given by Glimmer and Genemark. /note=Location call: Based on the aforementioned evidence the gene start site is most likely @ 98075. /note=Function call: DNA Helicase. 
The top PhagesDB blast hits from phages ZygoTaiga, Zeenon, and Yucca have e-values of 0.0 with the function listed as DNA helicase. The top NCBI Blast hits are from phage Spud, which has the function listed as gp184 with an e-value of 0.0, and phage BackyardAgain, which has the function listed as helicase with an e-value of 0.0. CDD had one relevant hit (accession number pfam00271) with a bit score of 61.84 and an e-value of 9.39e-12, which states this is a Helicase conserved C-terminal domain. HHPred had 3 significant hits: code 6VZ4_K, with probability 100%, e-value of 9.2e-54, and % coverage 78.0645, stating it is a nuclear protein used in chromatin remodeling; code 6JYL_K, with probability 100%, e-value of 9.3e-54, and % coverage 78.0645, stating it is a chromatin-remodeling complex; and code 6PWF_K, with probability 100%, e-value of 4.4e-51, and % coverage 78.0645, stating it is a chromatin remodeling factor ISWI. /note=Transmembrane domains: Deep TMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. Please remember to fill out the synteny box on PECAAN. CDS 98191 - 98493 /gene="183" /product="gp183" /function="hypothetical protein" /locus tag="JulietS_183" /note=Original Glimmer call @bp 98191 has strength 15.98; Genemark calls start at 98191 /note=SSC: 98191-98493 CP: yes SCS: both ST: SS BLAST-Start: [gp185 [Mycobacterium phage Spud] ],,NCBI, q1:s1 100.0% 4.96052E-64 GAP: 115 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[gp185 [Mycobacterium phage Spud] ],,YP_002224406,100.0,4.96052E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark agree and suggest the same start site, 98191, with start codon ATG. 
/note=Coding Potential: The selected start codon covers the entire length of the gene with coding potential for both the host-trained and self-trained GeneMark. /note=SD (Final) Score: The gene candidate has the best Z-score, 3.062, and final score -2.505, making it the best candidate. /note=Gap/overlap: The gap is sizable at 115 bp; however, the gene prior to this gene is in the reverse direction, which may explain the size of the gap. /note=Phamerator: This gene belongs to pham 68, dated 1/13/23. It is conserved in other phages as well, such as Momo (C1) and Phlegm (C1). /note=Starterator: Start site 16 was manually annotated in 376 of the 399 non-draft genes in the pham. This start site is 98191, which agrees with the auto annotation by Glimmer and GeneMark, and further validates the start site. /note=Location call: Based on the agreed start site provided by both Glimmer and GeneMark, 98191, which is validated by Starterator, along with the consideration that the gene is conserved in both Momo (C1) and Phlegm (C1), this gene is most likely real. /note=Function call: NKF. The NCBI Blast produced multiple hits, which were listed as a hypothetical protein (NKF). One significant hit had a probability of 100%, an e-value of 4.96e-64, and percent coverage of 100%, indicating that the selected gene also does not have a known function. HHPred did not have a significant hit, and neither did CDD. However, given the statistics of the single NCBI Blast, it is most definitely a protein with no known function. Additionally, Phages DB lists two phages, Delilah and Derek with e-values of 2e-50, with no listed function. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with the called start site and the function of the gene. 
However, you need to add evidence from DeepTMHMM to see if there are any transmembrane domains and not rely on TMHMM and TOPCONS. Also, be sure to check the boxes which provide evidence that this gene encodes a protein with no known function under Phagesdb BLAST and include that evidence in your PECAAN notes. CDS 98926 - 99351 /gene="184" /product="gp184" /function="hypothetical protein" /locus tag="JulietS_184" /note=Original Glimmer call @bp 98926 has strength 6.97; Genemark calls start at 98926 /note=SSC: 98926-99351 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SPRINKLERS_187 [Mycobacterium phage Sprinklers]],,NCBI, q1:s1 100.0% 6.57006E-99 GAP: 432 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.364, -4.556208905684731, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SPRINKLERS_187 [Mycobacterium phage Sprinklers]],,QAY13446,100.0,6.57006E-99 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 98926, GTG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.556 is the original gene candidate's final score, but the fourth best final score on PECAAN. Z-score is 2.364, which is strong. I did not select a gene candidate with a higher final score because of start site and coding potential evidence. /note=Gap/overlap: 432bp. Slightly large, but the smallest and most reasonable candidate, especially since the gene length is so large. I didn’t choose a different gene candidate because both Glimmer and GeneMark agreed on the current start site, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 440. Date 01/13/2023. 
It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 14 with a base pair coordinate of 98926. 162 of 167 call site #14. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 98926 bp. /note=Function call: NKF; Both NCBI and PhagesDB database did not predict functions for this gene. The top three PhagesDB BLAST and HHpred hits have unknown function (e-value < 10^-78). No conserved domains identified for this query sequence. /note=Transmembrane domains: None of TMHMM, TOPCONS, or DeepTMHMM predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: I agree with the function and location calls. I would maybe check evidence for calling this function unknown below. CDS 99401 - 99607 /gene="185" /product="gp185" /function="hypothetical protein" /locus tag="JulietS_185" /note=Original Glimmer call @bp 99401 has strength 12.82; Genemark calls start at 99401 /note=SSC: 99401-99607 CP: yes SCS: both ST: SS BLAST-Start: [gp182 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.7049E-40 GAP: 49 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.553, -4.4158953075094765, yes F: hypothetical protein SIF-BLAST: ,,[gp182 [Mycobacterium phage Bxz1] ],,NP_818233,100.0,2.7049E-40 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Glimmer and GeneMark call 99401 as start. GTG as the start codon. /note=Coding Potential: Not a significant amount of coding potential in the three forward frames. In the other start options presented by PECAAN there is no coding potential either. /note=SD (Final) Score: -4.416 ; not the best final score given. Z-score is 2.553, not the best one presented. /note=Gap/overlap: 49 ; smallest gap/overlap and within acceptable range. /note=Phamerator: pham : 550. 
169 phages within the pham have the 207bp length including Astraea and BackyardAgain. /note=Starterator: Start site 1 at 99401, which has 155 MAs calling the site. This agrees with Glimmer and GeneMark. /note=Location call: Yes the evidence above supports that the start site is real. The most likely start site is 99401. /note=Function call: NKF ; PhagesDB presents unknown function from Grungle and Guwapp with e-values of 5e-31. The best HHPRED result was 3JQ0_A (structural protein) with 57.7% probability and an e-value of 84. NCBI BLAST presents phage Bxz1 that is 100% aligned and is listed as a hypothetical protein. No CDD. /note=Transmembrane domains: DeepTmHmm presents no transmembrane domains so it is likely not a membrane protein /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: I thought that the gene looked like it had good coding potential within the Host-trained and self-trained Genemark as they were both continuous. As for the Phamerator section, make sure to include the date Phamerator was last checked, as well as the Starterator run date. CDS 99607 - 99783 /gene="186" /product="gp186" /function="hypothetical protein" /locus tag="JulietS_186" /note=Original Glimmer call @bp 99607 has strength 14.69; Genemark calls start at 99607 /note=SSC: 99607-99783 CP: yes SCS: both ST: SS BLAST-Start: [gp183 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 5.32707E-31 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.153, -4.487636275856737, yes F: hypothetical protein SIF-BLAST: ,,[gp183 [Mycobacterium phage Bxz1] ],,NP_818234,100.0,5.32707E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 99607; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 1 (indicating that this is a forward gene). The chosen start site covers this ORF. 
Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Darko and Derek. /note=SD (Final) Score: -4.488. This is a reasonable SD score. The other start site choice has a SD score of -8.1, so -4.488 is preferable by far. /note=Gap/overlap: There is an overlap of 1 bp which is reasonable. The gap might be preferable (125 bp for the other start site choice) but z-score, final score, and Glimmer and Genemark calls agree with the start site with the overlap of 1. /note=Phamerator: Date of investigation: 1/19/23; Pham 544; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Alice and Amataga). /note=Starterator: Yes, there is a conserved start choice. It is start number 2 with a base pair coordinate of 99607. Has 156 MA’s. Found in 170/170 (100%) of genes in pham. This start site also agrees with Glimmer and Genemark’s call. E-values are good at e-25. /note=Location call: Yes, the evidence suggests this gene is real. Start site 99607 is most likely. /note=Function call: NKF; CDD has no hits. NCBI predicted hypothetical proteins with coverages of 100%, identities over 84%, and e-values of e-23 and e-31. PhagesDB did not predict any functions for this gene. Phages Guwapp and HyRo support the NKF conclusion since they have synteny with this gene and no known function (e-values e-25). HHPred has hits, but the e-values are much higher than 0. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation. All of the evidence has been considered. 
CDS 99780 - 99869 /gene="187" /product="gp187" /function="hypothetical protein" /locus tag="JulietS_187" /note=Genemark calls start at 99780 /note=SSC: 99780-99869 CP: no SCS: genemark ST: SS BLAST-Start: [hypothetical protein M181_gp163 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 5.46656E-12 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.26, -4.122392812353492, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp163 [Mycobacterium phage Gizmo] ],,YP_008060964,100.0,5.46656E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer does not call a start site but GeneMark calls one at 99780 with a start codon of ATG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. The start site of #99780 is mediocre in covering all of the coding potential regions on the forward strand, but it does support the forward direction of this gene. Additionally, mediocre coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs which suggests that this is a potential gene. /note=SD (Final) Score: The final score is the least negative (SD = -4.122) and the Z-score is the highest overall (Z-score = 2.26), indicating that the autogenerated start site is the better of the options. This is reasonable due to the length of the gene and synteny with Phage Grungle. /note=Gap/overlap: There is an upstream gap of -4 bp, which seems reasonable when looking at the synteny of other phages classed in the C1 cluster against this gene. There is a reasonable gap between this gene and the gene downstream of about -10 bp which allows for the 90 bp gene to be reasonable. This is the most reasonable choice because, if another start site were chosen, the gap would be either -256 or 41 bp, which are significantly bigger and would cause the gene to be deleted. 
This length and gap have synteny with Grungle with the length of 90 bp and gap of -4 indicating this position is highly probable. /note=Phamerator: The gene was found to be in Pham 796 (01/17/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Alice, and Ewok. There were no functions listed for this Pham which is conserved with all genes found in C1. However, the base pair length was conserved at 90 bp. /note=Starterator: There is a reasonable and highly conserved start site (start 2, 99780), checked on 01/11/2023, which was called by 114 of the 114 non-draft genes among the 127 total pham members. /note=Location call: The gene and the start site are both conserved for this gene. Although the coding potential in the ORF is mediocre, the gene’s placement in the operon, compared to the other start sites listed on PECAAN, indicates a real gene; thus the evidence supports that the start site of this gene is #99780 and that the gene is consistent with other phages within the C1 group. /note=Function call: The function of this gene is no known function as the CDD results are inconclusive. HHpred also gives no good hits with the same function; the best e-value was 0.0095 for a zinc finger at a 95% probability. BLASTp on PhagesDB.org has the better e-values for all hits, with phages EasyJones and CindyLou at 8e-12, all having an unknown function. This is also seen in the NCBI BLASTp, with all hits being hypothetical proteins. As a result, there is a high probability that this gene has no known function (NKF). /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains, so the protein is predicted to be entirely inside the cell. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with the location and function call. 
CDS 99860 - 100126 /gene="188" /product="gp188" /function="hypothetical protein" /locus tag="JulietS_188" /note=Original Glimmer call @bp 99860 has strength 21.58; Genemark calls start at 99860 /note=SSC: 99860-100126 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_187 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.67087E-56 GAP: -10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.762, -3.1513923047253267, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_187 [Mycobacterium phage ScottMcG] ],,YP_002224186,100.0,1.67087E-56 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 99860. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the suggested start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.151. It is the best final score on PECAAN. Z-score is 2.762. /note=Gap/overlap: Overlap of -10bp. Is small and reasonable as the overlap is conserved in other phages (Turret, Dietrick) and there is no coding potential in the overlap that might be a new gene. /note=Phamerator: Pham: 573. Date 1/13/2023. Pham has 168 members and 14 phages are drafts. /note=Starterator: Start site 12 in Starterator was manually annotated in 102 of the 154 non-draft genes in the pham. Start 12 is 99860 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 99860. /note=Function call: NKF. No, there is not enough evidence to suggest a function for this protein as both NCBI and PhagesDB did not have any significant hits. CDD had zero hits. HHpred had no significant hits (probability lower than 80% and score was less than 50 for all hits; no significant e-values). 
/note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. However, I would mark some evidence under Phagesdb BLAST and NCBI BLAST to support the function call. CDS 100238 - 100999 /gene="189" /product="gp189" /function="hypothetical protein" /locus tag="JulietS_189" /note=Original Glimmer call @bp 100238 has strength 20.92; Genemark calls start at 100238 /note=SSC: 100238-100999 CP: yes SCS: both ST: SS BLAST-Start: [gp188 [Mycobacterium phage Rizal] ],,NCBI, q1:s19 100.0% 0.0 GAP: 111 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[gp188 [Mycobacterium phage Rizal] ],,YP_002224853,93.3579,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 100,238. The start codon is ATG. /note=Coding Potential: The ORF has good coding potential on the direct sequence and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.505 is the best SD (final) score because it is the highest (least negative) value, and has the second smallest gap of all of the choices, although it is substantial at 111bps. However, there is no coding potential in the gap, and when compared to non-draft phages from the same cluster, there weren`t any additional genes present. Z-score is above 2. /note=Gap/overlap: There is a significant gap of 111bp upstream, but there is no coding potential in the gap, and when compared to non-draft phages from the same cluster, there weren`t any additional genes present. /note=Phamerator: Gene found in pham 5 on 1/20/23. When compared to phages BackyardAgain, Megamind, and Shrimp, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. 
There is synteny with other non-draft phages belonging to the same cluster. /note=Starterator: Start: 5 @100238 has 136 MA's. The start number called the most often in the published annotations is 3; it was called in 136 of the 167 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 100238 seems most likely. /note=Function call: NKF. The top 2 phagesdb BLAST hits (e value < 10^-140) had no known function (Capyblanca and Grungle), and there weren't any NCBI BLAST hits with high coverage, identity, and a low E-value. There were also no hits on CDD or HHpred that fulfilled the e-value, coverage, and identity requirements. /note=Transmembrane domains: Neither TMHMM, DeepTMHMM, nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: I agree with your function call and location call. Make sure to note the date that the starterator was run! Good work! 
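The GAP values reported in the records for gp188 through gp190 can be cross-checked from the CDS coordinates alone. The sketch below assumes that GAP is measured against the preceding gene and that coordinates are 1-based and inclusive; the helper name gap_bp is hypothetical and is not part of PECAAN or DNA Master.

```python
def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases strictly between two forward features (1-based, inclusive coordinates).

    A negative result means the features overlap by that many bases.
    """
    return next_start - prev_stop - 1


# (name, start, stop) copied from the CDS lines for gp187-gp190.
features = [
    ("gp187", 99780, 99869),
    ("gp188", 99860, 100126),
    ("gp189", 100238, 100999),
    ("gp190", 101003, 101650),
]

# Gap between each gene and the one immediately upstream of it.
gaps = {
    name: gap_bp(prev[2], start)
    for prev, (name, start, _stop) in zip(features, features[1:])
}

for name, gap in gaps.items():
    print(f"{name}: {gap} bp")
```

Under these assumptions the computed values agree with the GAP fields in the records (-10, 111, and 3 bp), which suggests the same arithmetic can be used to sanity-check a proposed alternative start site before it is recorded.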
CDS 101003 - 101650 /gene="190" /product="gp190" /function="hypothetical protein" /locus tag="JulietS_190" /note=Original Glimmer call @bp 101003 has strength 13.27; Genemark calls start at 101003 /note=SSC: 101003-101650 CP: yes SCS: both ST: SS BLAST-Start: [gp186 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.86798E-155 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[gp186 [Mycobacterium phage Bxz1] ],,NP_818237,100.0,1.86798E-155 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 101003 with a start codon of ATG /note=Coding Potential: There is coding potential in both the self- and host-trained GeneMark outputs for the second forward ORF; however, in both it appears to start slightly after the called start site /note=SD (Final) Score: The SD score is -2.443, which is the best SD score listed on PECAAN /note=Gap/overlap: This gene has a gap of 3 bp, which is reasonable /note=Phamerator: The Pham is 486 as of 1/13/23 and there are 173 members, 14 of which are drafts. Other members of subcluster C1 are in this Pham, including ErnieJ and Essence /note=Starterator: JulietS calls the most annotated start site, which is 101003. It is start site 1 with 159 manual annotations /note=Location call: This is likely a real gene with a start at 101003 /note=Function call: NKF; PhagesDB hits call the gene as function unknown with e-values of 1e-126. The top two NCBI hits both call a hypothetical protein (identity: 100, coverage: 100, e-value: 1.867e-155) and (identity: 99.53, coverage: 100, e-value: 2.62e-155). There are no significant results in HHpred or CDD /note=Transmembrane domains: DeepTMHMM predicts 0 TMRs /note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the location and function call. However, note that coding potential is found on the forward strand. 
Include starterator and phamerator dates. Also include start codon CDS 101691 - 102380 /gene="191" /product="gp191" /function="thymidylate kinase" /locus tag="JulietS_191" /note=Original Glimmer call @bp 101691 has strength 14.36; Genemark calls start at 101691 /note=SSC: 101691-102380 CP: yes SCS: both ST: SS BLAST-Start: [gp187 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 4.03373E-163 GAP: 40 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.908, -3.6727816321281046, yes F: thymidylate kinase SIF-BLAST: ,,[gp187 [Mycobacterium phage Bxz1] ],,NP_818238,99.1266,4.03373E-163 SIF-HHPRED: c.37.1.1 (A:) Thymidylate kinase {Human (Homo sapiens) [TaxId: 9606]} | CLASS: Alpha and beta proteins (a/b), FOLD: P-loop containing nucleoside triphosphate hydrolases, SUPFAM: P-loop containing nucleoside triphosphate hydrolases, FAM: Nucleotide and nucleoside kinases,,,SCOP_d1nn5a_,98.69,99.8 SIF-Syn: /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and Genemark call the start site at 101691, at start codon GTG. /note=Coding Potential: This gene has strong coding potential in the top strand only, and the chosen start site covers all of the coding potential. /note=SD (Final) Score: The start site has a final score of -3.673 and a Z score of 2.908, which are the best final scores and z-scores of all potential start sites listed. /note=Gap/overlap: There is a gap of 40 bp, which is not ideal, but it is acceptable. /note=Phamerator: Pham 494. Date 1/20/23. It is conserved; found in Aditlam_196 and Alice_184. Function was listed as a thymidylate kinase on the phams database. /note=Starterator: Start site 20@101691 has 153 manual annotations, and start site 29@101829 and start site 37@101937 have 1 manual annotation. /note=Location call: Due to the above evidence, the location is called at start site 101691. /note=Function call: PhagesDB data shows genes in the same pham having the thymidylate kinase function. 
NCBI BLAST shows significant hits with this gene being a thymidylate kinase (almost all the proteins listed were thymidylate kinases, with very low e-values). PhagesDB showed pham members being thymidylate kinases. CDD showed domain hits to P-loop containing nucleoside triphosphate hydrolases, which bind a phosphate of a bound nucleotide. HHPred shows hits relevant to thymidylate kinases as well, with low e-values. Thus, the function is called thymidylate kinase. /note=Transmembrane domains: Deep TMHMM shows no predicted TMRs, so this protein is likely not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. Please fill out the synteny box. CDS 102352 - 103395 /gene="192" /product="gp192" /function="aminotransferase" /locus tag="JulietS_192" /note=Original Glimmer call @bp 102352 has strength 11.37; Genemark calls start at 102352 /note=SSC: 102352-103395 CP: yes SCS: both ST: SS BLAST-Start: [aminotransferase [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 0.0 GAP: -29 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.299, -4.947056885847969, no F: aminotransferase SIF-BLAST: ,,[aminotransferase [Mycobacterium phage ET08] ],,YP_003347836,100.0,0.0 SIF-HHPRED: Adenosylmethionine-8-amino-7-oxononanoate aminotransferase; transaminase PLP Complex Fragment, transferase-transferase inhibitor complex; HET: EPE, EDO, PLP, 3VR; 1.35A {Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)} SCOP: c.67.1.0,,,4WYD_A,89.6254,100.0 SIF-Syn: Aminotransferase, upstream gene is thymidylate kinase, and downstream gene is NKF, just like in phage Napoleon13. /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer and GeneMark both call 102352 as start. /note=Coding Potential: There is, overall, good coding potential on the forward strand, indicating this is a forward gene. The chosen start site covers all of the coding potential. 
/note=SD (Final) Score: The final score is not the best option at -6.406 and the z score, also not the best, is 1.236. The best final score is for start site 102439 at -4.708 and the z-score is 2.148. /note=Gap/overlap: There is an overlap of -128 bp, which is not preferable. However, since this overlap is conserved in most other draft genes of the cluster, it doesn’t affect whether the gene is real or if a different start site should be called. The length of the gene is 1143 which is a very adequate length. /note=Phamerator: Pham number is 57264 as of 1/08/23. It is conserved, found in ValleyTerrace_199 and Teardrop_195. /note=Starterator: Start number (@102352) is 12, manually annotated 150 times. This is the same as the conserved start site. There are 182 members of the pham and 150 call the same conserved start site. The second start number is 22 (@ 102439) which has been manually annotated 6 times (6 members of the pham call this start site). /note=Location call: Based on the above evidence, this is a real gene but the original start site that is called, 102352, is not the best start site. I think 102439 should be the start site instead because of the lack of overlap and also the better SD scores. /note=Function call: Predicted function is aminotransferase. The top hits for non draft genes on phagesdb, NCBI Blast, and HHpred all call aminotransferase as the function of this gene with e-values of 0, 0, and 9.3e-30 respectively. The top hits on all three sites also have good percent coverage (high 90s) and probability (100%). /note=Transmembrane domains: There are no TMDs predicted by deepTMHMM so it is not a membrane protein. /note=Secondary Annotator Name: Gurunathan, Vibha /note=Secondary Annotator QC: I agree with the function and location calls for this gene. 
CDS 103415 - 103882 /gene="193" /product="gp193" /function="hypothetical protein" /locus tag="JulietS_193" /note=Original Glimmer call @bp 103439 has strength 13.89; Genemark calls start at 103478 /note=SSC: 103415-103882 CP: yes SCS: both-cs ST: SS BLAST-Start: [gp189 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.14466E-111 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.711, -4.483253196181581, no F: hypothetical protein SIF-BLAST: ,,[gp189 [Mycobacterium phage Bxz1] ],,NP_818240,100.0,1.14466E-111 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Forward gene. Glimmer called the start site at 103439; GeneMark called the start site at 103478 /note=Coding Potential: The coding potential is found in the forward frame. The gene covers almost all of the coding potential in both the Host-Trained and Self-Trained GeneMark /note=SD (Final) Score: -4.483; it is the best score because it has the smallest gap. The Z-score is 2.711, which is greater than 2, showing that the site is a good candidate for a start site. /note=Gap/overlap: 19 bp, which is a relatively small gap. The gap is conserved in Alice and Ading from the C1 cluster /note=Phamerator: Pham 484, 01/19/23; gene conserved in phages Ronan and Cane17 /note=Starterator: 01/13/23. Start 3 (3, 103415) is manually annotated in 130 of the 159 non-draft genes in this pham. This evidence does not agree with Glimmer and GeneMark; however, the gap is conserved in other phages, which makes it a better candidate /note=Location call: It is a real gene and the start site is Start 3 at 103415, based on the evidence above /note=Function call: NKF. The top two hits on PhagesDB BLAST (e-value = 1e-87 for phages Ading and Amataga) indicate unknown function. The top two hits on NCBI BLAST (e-values of 1e-111 and 5e-111 for phages Bxz1 and Shelob respectively, with over 99% identity and 100% coverage) also indicate an unknown function. There are no significant hits on HHpred and CDD. 
/note=Transmembrane domains: No transmembrane domains are predicted by TOPCONS, TMHMM, or DeepTMHMM, which indicates that the gene does not encode a membrane protein /note=Secondary Annotator Name: Pham, Truc /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 103949 - 104335 /gene="194" /product="gp194" /function="hypothetical protein" /locus tag="JulietS_194" /note=Original Glimmer call @bp 103949 has strength 14.22; Genemark calls start at 103949 /note=SSC: 103949-104335 CP: yes SCS: both ST: SS BLAST-Start: [gp190 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 9.49058E-90 GAP: 66 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.22, -2.4634880269616195, yes F: hypothetical protein SIF-BLAST: ,,[gp190 [Mycobacterium phage Bxz1] ],,NP_818241,100.0,9.49058E-90 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 103949 bp. Start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.463. It is the best final score on PECAAN. /note=Gap/overlap: 66 bp gap. A little large, but ultimately reasonable because the gap is conserved in other phages (Grungle, Daffodil) and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: Pham 436. Date 1/12/2023. It is conserved and found in Grungle, Daffodil, and QBert, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 9 in Starterator was manually annotated in 154/167 non-draft genes in this pham. Start 9 is 103949 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 103949 bp. /note=Function call: No known function (NKF). The top two phagesDB BLAST hits are of unknown function (E-value = 6e-71), and the top three NCBI BLAST hits are also of unknown function (E-value < 2e-89, 99.22%+ identity, 100% coverage). CDD and HHpred had no significant hits. /note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. CDS 104335 - 104709 /gene="195" /product="gp195" /function="hypothetical protein" /locus tag="JulietS_195" /note=Original Glimmer call @bp 104476 has strength 8.96; Genemark calls start at 104335 /note=SSC: 104335-104709 CP: no SCS: both-gm ST: SS BLAST-Start: [hypothetical protein ET08_189 [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 8.61219E-87 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.311, -6.188708875567495, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein ET08_189 [Mycobacterium phage ET08] ],,YP_003347839,100.0,8.61219E-87 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer (104476, ATG) and GeneMark (104335, ATG) call different start sites. /note=Coding Potential: /note=SD (Final) Score: The start site at 104476 has a final score of -5.863. The start site at 104335 has a final score of -6.189. /note=Gap/overlap: The start site at 104476 has a gap of 140 with the previous gene, which is large, but conserved with other phages (Darko and Patter). /note=Phamerator: This gene is in pham 4572. It is conserved, found in phages Darko and Norm. /note=Starterator: Start site 5 at 104476 has 11 manual annotations. It is the most annotated start site on Starterator. /note=Location call: This gene likely starts at 104476. 
This is supported by this start site having the highest final score and the most manual annotations in Starterator. Although the gap is large, it is conserved with phages Darko and Patter. Additionally, no coding potential was noted in any ORF in the gap. /note=Function call: NKF in NCBI BLAST; top hits were phages QBert (9.24e-51) and Bigswole (2.49e-50). NKF in PhagesDB BLASTp; top hits were Ading and Alice (1e-41). One hit in HHPred supporting NKF (1.5e-27). No hits in CDD. /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: Make sure to add the dates that the phamerator was done, as well as when the starterator was performed. Make sure to talk about the z-score within the annotation notes about your SD final score. Other than that, I agree with the function and location calls. CDS 104706 - 105488 /gene="196" /product="gp196" /function="DnaC-like helicase loader" /locus tag="JulietS_196" /note=Original Glimmer call @bp 104706 has strength 16.31; Genemark calls start at 104706 /note=SSC: 104706-105488 CP: yes SCS: both ST: SS BLAST-Start: [putative DnaC [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.49, -5.73456940637445, no F: DnaC-like helicase loader SIF-BLAST: ,,[putative DnaC [Mycobacterium phage ScottMcG] ],,YP_002224194,100.0,0.0 SIF-HHPRED: DNA replication protein DnaC; Helicase, helicase loader, AAA+, RecA, REPLICATION; HET: 08T, ADP;{Escherichia coli},,,6QEM_K,72.6923,99.8 SIF-Syn: Upstream is NKF, downstream is DnaB-like dsDNA helicase, just like Phage Melpomini /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 104706 bp. The start codon is GTG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
Coding potential is found both in GeneMark Self and Host. All coding potential is included. /note=SD (Final) Score: The start site 104706 corresponds to a Final Score of -5.735 which is the fourth best final score. It has a Z-score of 1.49. Despite the lower Z-score and the final score not being the best, this gene is conserved in other non-draft phages as part of an operon. /note=Gap/overlap: There is a 4 nucleotide overlap which suggests that this gene is part of an operon. /note=Phamerator: pham: 435. Date 1/17/2023. It is conserved; found in Ava3 (C1) and BackyardAgain (C1). /note=Starterator: Start site 20 in Starterator was manually annotated in 143/167 non-draft genes in this pham and is the most manually annotated start site. Start 20 is 104706 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 104706 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: DNA-C Helicase Loader. The top ten PhagesDB BLAST hits have the function of DNA-C Helicase Loader (E-value <10^-153); PhagesDB Function Frequency also calls DNA-C Helicase Loader. The top ten NCBI BLAST hits also have the function of DNA-C Helicase Loader (100% coverage, >95% identity, and E-value ~0). HHPRED had two hits for a DNA-C Helicase Loader with 99.8% probability, >65% coverage and E-values of 3.3e-17 and 9.5e-16). CDD had hits for proteins related to DNA replication and DNA-C Helicase Loader with low e-values (~0) and coverage >70%, but identity values were low (<20%) which is why they were not included as evidence. Not a membrane protein because it wasn’t called by TMHMM, Deep TMHMM, or TOPCONS. /note=Transmembrane domains: Neither TMHMM, Deep TMHMM, or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: I agree with the location and function call. 
Please discuss whether the ORF covers all coding potential. Great discussion of the start site likely being a part of an operon and the gap/overlap being conserved. Great notes on rationale for function call. CDS 105490 - 106746 /gene="197" /product="gp197" /function="DnaB-like dsDNA helicase" /locus tag="JulietS_197" /note=Original Glimmer call @bp 105481 has strength 17.7; Genemark calls start at 105490 /note=SSC: 105490-106746 CP: yes SCS: both-gm ST: SS BLAST-Start: [gp193 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.992, -6.918086385167798, no F: DnaB-like dsDNA helicase SIF-BLAST: ,,[gp193 [Mycobacterium phage Bxz1] ],,NP_818244,100.0,0.0 SIF-HHPRED: Replicative DNA helicase; Helicase ATPase DNA replication, dodecamer, hydrolase; HET: TBR; 6.7A {Helicobacter pylori},,,4ZC0_D,98.5646,100.0 SIF-Syn: DnaB-like dsDNA helicase, the upstream is conserved which is DnaC-like helicase loader, pham 435, the downstream is NKF, pham 580. This consistency is seen in Ading and Bread. /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Glimmer and GeneMark do not agree: Glimmer chooses 105481 (start codon TTG) and GeneMark chooses 105490 (start codon GTG) /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, on the forward strand only, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -6.918, not the best final score, but the other evidence favors this start. /note=Gap/overlap: 1 bp gap; very likely an operon start, which makes 105490 the chosen start /note=Phamerator: 439. Date 01/13/23. It is conserved; found in 182 other non-draft phages, such as Ading and Bread. /note=Starterator: Start site 11 in Starterator was manually annotated in 124/167 non-draft genes in this pham. Start 11 is 105490 in JulietS. This evidence agrees with the site predicted by GeneMark but not with the site predicted by Glimmer (105481). 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 105490 /note=Function call: DnaB-like dsDNA helicase. The top 2 PhagesDB BLAST hits (phages CharlieB and Zeenon) have the function of DnaB-like dsDNA helicase with e-values of 0, and the NCBI BLAST hits are also DnaB-like helicases with e-values of 0. The top CDD hit is a DnaB helicase C-terminal domain with an e-value of 2.11602e-15, but its identity and coverage are low, so it is not reliable. The highest HHpred probability is 100%, for a replicative DNA helicase, so overall the function is called DnaB-like dsDNA helicase. /note=Transmembrane domains: DeepTMHMM reports no TMDs, which means that this protein does not have transmembrane domains /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: checked CDS 106743 - 106949 /gene="198" /product="gp198" /function="hypothetical protein" /locus tag="JulietS_198" /note=Original Glimmer call @bp 106743 has strength 15.3; Genemark calls start at 106743 /note=SSC: 106743-106949 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_197 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 1.24837E-41 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.896, -4.963787326180191, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_197 [Mycobacterium phage ScottMcG] ],,YP_002224196,100.0,1.24837E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 106743. The start codon is GTG. /note=Coding Potential: Good coding potential in both self and host trained but there is some coding potential in the reverse direction. However, this gene is flanked by forward genes indicating that this gene is in the forward direction. /note=SD (Final) Score: -4.964. Although this is not the best final score, this start has the most favorable overlap of all the options. 
/note=Gap/overlap: -4. This indicates that there is a 4 bp overlap which is highly favorable because it indicates an operon. /note=Phamerator: 580 pham (1/15/23). This gene is conserved in both Alice (C1) and Astraea (C1). /note=Starterator: Start site 4 @ 106743. This start site was manually annotated 153/167. This agrees with both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 106743. Additionally, it had a Z-score of 1.896. /note=Function call: The function is unknown. The top three phages (Ading (C1), Alice (C1), Amataga (C1)) on phagesdb BLAST hits mark the function as unknown (E-value 2e-35). Additionally the first 5 NCBI BLAST hits also call the function as a Hypothetical protein (97>% coverage, 98>% identity, and E-values <10^-40). HHPRED provides no convincing evidence nor did anything show up for CDD. /note=Transmembrane domains: DeepTMHMM predicts it to be inside the cell with 100% probability. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with the location and function call. CDS 106946 - 108019 /gene="199" /product="gp199" /function="DNA primase" /locus tag="JulietS_199" /note=Original Glimmer call @bp 106946 has strength 8.66; Genemark calls start at 106946 /note=SSC: 106946-108019 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.148, -5.40694046485655, no F: DNA primase SIF-BLAST: ,,[DNA primase [Mycobacterium phage ScottMcG] ],,YP_002224197,100.0,0.0 SIF-HHPRED: DNA primase; Zinc Ribbon, TOPRIM, RNA POLYMERASE, DNA REPLICATION, TRANSFERASE; 2.0A {Aquifex aeolicus},,,2AU3_A,96.9188,100.0 SIF-Syn: DNA primase called for both JulietS and Amataga (Pham 64444), upstream gene has NKF in JulietS and not called in Amataga (Pham 580), has NKF in JulietS and not called in Amataga (Pham 488), all genes are from the same pham. 
/note=Gene (stop@#108019 F) /note=PECAAN Notes /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 106946. The start codon is ATG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only for GeneMarkS with some small peaks in the reverse direction for GeneMarkH. Coding potential is found in both GeneMark Self and Host however, the coding potential is not consistent in the middle of the gene, ~107450-107750. All of the coding potential is included in this region. /note=SD (Final) Score: The z score is 2.148. The final score is -5.407. /note=Gap/overlap: -31 bps gap upstream of the gene and a 4 bp gap downstream of the gene. This gene and these gaps are conserved in phages Amataga, Bread, and Darko. There is no other coding potential in these regions. /note=Phamerator: Pham: 64444. Date 01/18/23. The gene is conserved in phages Wally and Yucca which are in the same cluster as JulietS. /note=Starterator: Start site 13 is found in 175 of 270 genes in pham and was called for 161 of 246 non-draft phage genomes in the pham. It is the most annotated start site and is called 100.0% of the time when present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 106946 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: DNA primase. The top hit for HHpred has a function of DNA primase (probability: 100, E-value: 5.7e-30, % coverage: 96.9188) and other hits for DNA primase with similar values. In this pham, the Phagesdb Function Frequency also states that there is a frequency of 100% being a DNA primase (89 phages called) in subcluster C1. PhagesDB BLASTp shows many hits for phages with the function of DNA primase, all with E-values of 0. NCBI BLASTp has the top five hits with the function of DNA primase, (% coverage: 100, E-value: 0). 
There are no TmHmm hits. There are four CDD hits for DNA primase, with the first displaying these values (% identity: 18.5542, % alignment: 30.6024, % coverage: 83.7535, E-value: 1.68627e-8). /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with the above evidence that this is a real gene in the forward direction with the likely function of DNA primase. All that is missing is the synteny box. CDS 108023 - 108262 /gene="200" /product="gp200" /function="hypothetical protein" /locus tag="JulietS_200" /note=Original Glimmer call @bp 108023 has strength 8.65; Genemark calls start at 108023 /note=SSC: 108023-108262 CP: yes SCS: both ST: SS BLAST-Start: [gp196 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.60646E-49 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.622, -4.01551542808139, yes F: hypothetical protein SIF-BLAST: ,,[gp196 [Mycobacterium phage Bxz1] ],,NP_818247,100.0,1.60646E-49 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 108023 bp. The start codon is ATG, which has a high probability of being used as a start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only on the second frame, indicating that this is a forward gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. /note=SD (Final) Score: The final score is the best option at -4.016 and the z-score is the best option at 2.622. This provides strong evidence that the called start site is the real start site. /note=Gap/overlap: The gap/overlap is reasonable at 3 bp. This gap is conserved and shows up in phage Ading from cluster/subcluster C1. /note=Phamerator: Pham: 577. Date 1/19/2023. 
It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 2 in Starterator was manually annotated in 157/159 non-draft genes in this pham. Start number 2 is 108023 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 172/173 (99.4%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 108023 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top two PhagesDB BLAST hits have the function of “function unknown” (100% identity, E-value = 2e-39), and the top two NCBI BLAST hits have the function of “hypothetical protein” (100%/98.7342% identity, E-value = 1.60646e-49/7.39394e-49). Results from CDD and HHpred were irrelevant because either no results came up or unlikely results with unreasonably high e-values came up. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. DeepTMHMM agrees (1.0 probability inside throughout). Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with the location and function call. All of the evidence categories have been considered. Make sure in the Location Call section that the right start site is called; you have it as 89261 but it should be 108023 (this is probably just a copy/paste typo). Also, you don't need to include in the Starterator section "and was found in 172/173 (99.4%) of genes in the pham" as the focus should be mainly on that it's the most manually annotated. 
CDS 108225 - 109013 /gene="201" /product="gp201" /function="chaperonin, DnaJ-like" /locus tag="JulietS_201" /note=Original Glimmer call @bp 108483 has strength 11.59; Genemark calls start at 108483 /note=SSC: 108225-109013 CP: yes SCS: both-cs ST: SS BLAST-Start: [DnaJ-like chaperonin [Mycobacterium phage FrayBell]],,NCBI, q1:s1 100.0% 0.0 GAP: -38 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.648, -5.465638252962307, no F: chaperonin, DnaJ-like SIF-BLAST: ,,[DnaJ-like chaperonin [Mycobacterium phage FrayBell]],,QAY06399,99.6183,0.0 SIF-HHPRED: Co-chaperone protein HscB, mitochondrial precursor; Co-chaperone protein HscB, Structural Genomics Medical Relevance, Protein Structure Initiative, PSI-2, Center for Eukaryotic Structural Genomics; HET: MSE; 3.0A {Homo sapiens},,,3BVO_A,61.8321,99.3 SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 108483. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. Coding potential extends upstream of the suggested start site. /note=SD (Final) Score: The final score is -5.923. This is not the best final score on PECAAN. /note=Gap/overlap: There is a large gap of 220 bp. /note=Phamerator: pham: 4743. Date 1/10/2023. It is conserved; found in ArcherS7 (C) and Bangla1971 (C). /note=Starterator: Start site 6 in Starterator was manually annotated in 8/13 non-draft genes in this pham. Start 6 is 108411 in JulietS. This evidence does not agree with the site predicted by Glimmer and GeneMark. This start site also does not include the upstream coding potential. /note=Location call: Based on the above evidence, the start site is unlikely to be either of the sites called by Glimmer, Genemark, and Starterator. 
The start site is most likely at 108225 since this start site covers the left-out coding potential, and this is the only gene candidate with a gap or overlap that is less than 100bp; all other gene candidates have excessive gaps or overlaps. /note=Function call: NKF. BLASTp returned no hits. CDD returned no hits. NCBI BLAST returned no hits. HHPRED did not return any hits. /note=Transmembrane domains: DeepTMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with the function call of this annotation. The location evidence is quite confusing for this gene; I agree with this start site call, but I think you should ask the professor/TA if it's correct or not. Also please fill out starterator dropdown and synteny box. CDS 109099 - 110070 /gene="202" /product="gp202" /function="SSB protein" /locus tag="JulietS_202" /note=Original Glimmer call @bp 109120 has strength 13.57; Genemark calls start at 109120 /note=SSC: 109099-110070 CP: yes SCS: both-cs ST: SS BLAST-Start: [single strand DNA binding protein [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.822, -5.389187297999249, no F: SSB protein SIF-BLAST: ,,[single strand DNA binding protein [Mycobacterium phage Cali] ],,YP_002224645,100.0,0.0 SIF-HHPRED: gp32 single stranded DNA binding protein; Zn2+ binding subdomain, 5-stranded beta-sheet, OB fold, single-stranded DNA binding, DNA BINDING PROTEIN; 2.0A {Enterobacteria phage RB69} SCOP: b.40.4.7,,,2A1K_B,56.0372,99.6 SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: The start site in Glimmer and Genemark are the same: 109120. Start codon is ATG, which is a common start codon. /note=Coding Potential: There is coding potential present in ORF 1 of the direct sequence (as expected for a forward gene) in the Host-Trained and Self-Trained Genemark. 
Start site 109120 covers all of the coding potential. /note=SD (Final) Score: -5.389. 4th best final score but it corresponds with the start site that has most manual annotations for JulietS. /note=Gap/overlap: There is a 188 bp gap downstream of the gene and 107 bp gap upstream of the gene. Gap is reasonable because it is conserved in Izajani and LinStu and there is no coding potential present to fill in the gaps in Host-Trained and Self-Trained Genemark. /note=Phamerator: Pham 449 as of 1/24/23. It is conserved, found in Babyland (C1) and Ava3 (C1). /note=Starterator: Most annotated start site is 7 (153/165 of non-draft genes called start site 7); start site 4 (109099 bp) has 153 MA annotations in phage JulietS compared to 7 MA for start site 7 (109120). /note=Location call: Start site 4 @ 109099 bp. Even though this start site has a lower z-score and final score, Starterator has a lot of manual annotations and the length of the gene with this start site corresponds to other phages present in pham 449. /note=Function call: SSB protein. The top three PhagesDB BLAST hits have the function of ssDNA binding protein (score 675 and E-value = 0) and the top three NCBI BLAST hits also have the function of ssDNA binding protein (100% coverage, >99% identity, and E-value = 0). HHpred had one relevant hit of a ssDNA binding protein with E-value of 0, probability of 99% and coverage of 56%. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not show any TMDs, therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: I agree with the function call of this annotation as well as the location call. Please fill out the synteny box as it is a gene with a known function. 
CDS 110258 - 113653 /gene="203" /product="gp203" /function="DnaE-like DNA polymerase III (alpha)" /locus tag="JulietS_203" /note=Original Glimmer call @bp 110183 has strength 20.75; Genemark calls start at 110258 /note=SSC: 110258-113653 CP: yes SCS: both-gm ST: NA BLAST-Start: [DNA polymerase [Mycobacterium phage LinStu] ],,NCBI, q1:s26 100.0% 0.0 GAP: 187 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.970161406017234, yes F: DnaE-like DNA polymerase III (alpha) SIF-BLAST: ,,[DNA polymerase [Mycobacterium phage LinStu] ],,YP_009014768,97.8374,0.0 SIF-HHPRED: DNA POLYMERASE III ALPHA; TRANSFERASE, DNA REPLICATION, DNA POLYMERASE III ALPHA, DNA POLYMERASE III BETA, DNA POLYMERASE III EPSILON; 7.3A {ESCHERICHIA COLI K-12},,,5FKW_A,99.7347,100.0 SIF-Syn: This gene has function DnaE-like DNA polymerase III (alpha), with upstream gene ssDNA binding protein, and downstream gene RF-1 chain peptide release factor, just like in Phage Turret. /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Glimmer called the start site at 110183 with a GTG start codon, and GeneMark called the start site at 110258 with a ATG start codon. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 2. The coding potential increases near the potential start site and plateaus until the stop codon where it tapers off. The gene similarly has good coding potential with the self-trained GeneMark on ORF 2, where the coding potential rises at the start site and plateaus until the stop site. Finally, the start and stop sites also encompass all of the coding potential. /note=SD (Final) Score: The Final Score is -2.970, and the Z score is 3.062 for 110258. The final score for this start site has the best sequence match (highest final score), and also has the highest Z-score of the possible start site candidates. /note=Gap/overlap: There is a 187 base pair gap with the gene upstream, and this is slightly concerning as it is a long gap (>100bp). 
This chosen start site does not have the longest open reading frame, but the gene length will be 3396 bp, which is good. /note=Phamerator: The gene is found in Pham 57171 as of 1/23/23. The gene is conserved in other members of subcluster C1, and I used phage Stubby and Turret for comparison. /note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 64 at 110258. 95/372 non-draft genomes in the pham also call this start site, and it is the most manually annotated start site. /note=Location call: The available evidence suggests that the start site is 110258. The gene appears to be real, and this proposed start site covers all the coding potential, and has a good final score and z-score. In starterator, it is the most commonly called site and called 63.5% of the time when present. /note=Function call: The function of this protein is DnaE-like DNA polymerase III (alpha). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-values 0, and those proteins have function DnaE-like DNA polymerase III. Similarly on NCBI BLASTp, e-values are 0, and those proteins also have function DnaE-like DNA polymerase III. CDD had a hit of DNA polymerase III, alpha subunit, that agreed with BLASTp, and HHpred results were good with e-value 0 and nearly 99.9% coverage and 100% probability, also with function DNA Polymerase III Alpha subunit. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicted any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with the function and location call. In the gap section I would mention that the gap is conserved in other phages (name the phage). In addition, you need to select something for the Starterator drop down box. Otherwise this looks good. 
CDS 113646 - 114056 /gene="204" /product="gp204" /function="RF-1 peptide chain release factor" /locus tag="JulietS_204" /note=Original Glimmer call @bp 113805 has strength 16.24; Genemark calls start at 113805 /note=SSC: 113646-114056 CP: no SCS: both-cs ST: NI BLAST-Start: [RF-1 domain peptide chain release factor [Mycobacterium phage ET08] ],,NCBI, q1:s1 100.0% 9.12208E-94 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.082, -5.021806517096519, no F: RF-1 peptide chain release factor SIF-BLAST: ,,[RF-1 domain peptide chain release factor [Mycobacterium phage ET08] ],,YP_003347848,100.0,9.12208E-94 SIF-HHPRED: SIF-Syn: This gene has synteny with Grungle`s gene 195, as they are both in pham 4399. In addition to this, upstream, both genes are DNAE-like DNA polymerase III (pham 57171) and downstream they are both RecA-like DNA recombinases (pham 103), meaning that this gene has synteny. /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark list a start site of 113805, which corresponds to a start site of GTG. /note=Coding Potential: There looks to be good coding potential in this gene as both the Host-trained and the Self-trained GeneMark appear to show complete direct sequences for the entire gene. This gene shares synteny with Grungle and LolaVinca. /note=SD (Final) Score: The chosen Z-score is 2.477, while the chosen final score is -3.729. Though these are not the highest Z-scores, they are quite high and match with the suggested start site that is predicted by both GeneMark and Glimmer. In addition to that, in both Grungle and LolaVinca, there is a gap that matches with the one in JulietS as synteny is shared. /note=Gap/overlap: There is a large gap of 151 base pairs, however this gap is seen in other cluster C phages like Grungle and LolaVinca. /note=Phamerator: As of 02/07/2023, this phage was found to be a part of Pham 4399. Other phages in this pham include Grungle, and Caravan. 
/note=Starterator: As of 01/27/23, the most called start number was 4, and it was called in 8 out of the 8 non-draft genes in this pham. JulietS also has start site 4 as its most annotated start, with a start site of 113805. This matches up with the start sites that are listed by GeneMark and Glimmer. /note=Location call: Based on the evidence above, this appears to be a real gene with a start site of 113805. /note=Function call: Based on the evidence, this gene is most likely an RF-1 peptide chain release factor. The top hits on NCBI BLAST listed this gene as having the function of an RF-1 peptide chain release factor. HHpred had several hits, with coverages ranging from 85-98%, indicating that this gene is related to a polypeptide chain release factor. /note=There were two hits for CDD, one being for an RF-1 domain, while the other was for a PCRF domain. Both of these domains are found in peptide chain release factors. /note=Transmembrane domains: Deep TMHMM found that there were 0 predicted TMRs, meaning that this is not a membrane protein. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 114053 - 115141 /gene="205" /product="gp205" /function="RecA-like DNA recombinase" /locus tag="JulietS_205" /note=Original Glimmer call @bp 114023 has strength 20.43; Genemark calls start at 114053 /note=SSC: 114053-115141 CP: yes SCS: both-gm ST: SS BLAST-Start: [RecA-like DNA recombinase [Mycobacterium phage BananaFence]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.711, -3.1779018267349564, no F: RecA-like DNA recombinase SIF-BLAST: ,,[RecA-like DNA recombinase [Mycobacterium phage BananaFence]],,QWY80773,99.7238,0.0 SIF-HHPRED: Protein recA; Alpha and beta proteins (a/b, a+b), ATP-binding, Cytoplasm, DNA damage, DNA recombination, DNA repair, DNA-binding, Nucleotide-binding; 1.95A {Thermotoga maritima},,,3HR8_A,96.4088,100.0 SIF-Syn: Synteny: RecA-like DNA recombinase is called for both JulietS and StephanieG (Pham 103). The upstream gene of JulietS is not the same as the one in StephanieG. However, the downstream Holliday junction resolvase is seen in both phages. /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, call the start of this gene at different locations. Glimmer calls the start site at 114023 and GeneMark calls the start site at 114053 with the start codon of GTG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 114053 start site. /note=SD (Final) Score: The SD (Final) Score for the 114053 start site is -3.178, with a Z-score of 2.711 on PECAAN. Although these aren't the best scores, they are still decent. /note=Gap/overlap: This gene has an overlap of 34 base pairs with the previous gene. 
This is the best possible start since the other start site with the highest Z-score would have a gap of 590 base pairs, which will require the addition of another gene in this gap. /note=Phamerator: This gene belongs to pham number 103 as of 1/13/2023. The gene is conserved in phages of this cluster (C) like Ading and Amataga. Many members of this family have the function of RecA-like DNA recombinase listed as their function, so it is highly likely that this is a gene with RecA-like DNA recombinase function. /note=Starterator: Start site 25 is most often called as it was manually annotated in 151/356 non-draft genes in the pham. However, JulietS is called at Start 15 which is at 114053. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 114053. /note=Function call: RecA-like DNA recombinase. The top three phagesdb BLAST hits are function unknown (E-value <0), and the top NCBI BLAST hits also listed RecA-like DNA recombinase function for this gene (87+% coverage, 27+% identity, and E-value <10^-31). HHpred’s top hits also indicate RecA-like DNA recombinase function (100% probability, 90+% coverage, and E-value <10^-40). CDD’s top hits also indicate RecA-like DNA recombinase function (29+% identity, 80+% coverage, and E-value <10^-25). /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Luk Jarrett /note=Secondary Annotator QC: I do agree with the function call but I do not agree with the location call. I believe the start site would be start site 25 at 114053, which has 151 manual annotations, a lot more than the 6 MA of start site 15 at 114023. A -4 overlap is also better than the -34 overlap at 114023; the -4 overlap is conserved in many phages from cluster C such as Ading and Alice. The -4 overlap could indicate the gene is part of an operon. 
The most common length of the gene found in other phages of cluster C is 1089bp, which corresponds to the start site at 114053. The coding potential also seems to start around 114053. Therefore I believe the location call should be start 25 at 114053. CDS 115141 - 115521 /gene="206" /product="gp206" /function="Holliday junction resolvase" /locus tag="JulietS_206" /note=Original Glimmer call @bp 115141 has strength 16.61; Genemark calls start at 115141 /note=SSC: 115141-115521 CP: yes SCS: both ST: SS BLAST-Start: [gp202 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 5.65664E-88 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.639, -3.4076099559729456, yes F: Holliday junction resolvase SIF-BLAST: ,,[gp202 [Mycobacterium phage Bxz1] ],,NP_818253,100.0,5.65664E-88 SIF-HHPRED: HOLLIDAY-JUNCTION RESOLVASE; HYDROLASE, ENZYME, HOMOLOGOUS RECOMBINATION, HOLLIDAY JUNCTION RESOLVING ENZYME, NUCLEASE, ARCHAEA, THERMOPHILE; HET: SO4, EDO; 1.8A {SULFOLOBUS SOLFATARICUS} SCOP: c.52.1.18,,,1OB8_A,84.127,99.4 SIF-Syn: Holliday junction resolvase, upstream gene is in [pham 103] function of RecA-like DNA recombinase, downstream gene is in [pham 533] function is NKF, just like in phage DrPhinkDaddy and FoxTrotP1. /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 115141 with start codon ATG /note=Coding Potential: this ORF has good coding potential on the direct sequence (forward strand), indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site does include all of the coding potential /note=SD (Final) Score: final score of -3.408 and z-value of 2.639, which is the best final score on PECAAN /note=Gap/overlap: the overlap with the upstream gene is 1 bp, and may be an operon. This gene is conserved in several other phages of the same cluster (Alice, Grungle) and the gap does not contain coding potential and was seen in Alice and Grungle as well. 
/note=Phamerator: Pham 416 and has 183 members, Date 1/20/2023. It is conserved and found in Alice (C) and ArcherS7 (C) /note=Starterator: Start site number 205 in Starterator had the highest manual annotation in 157/168 non-draft genes in this pham. Start site 205 is at position 115141 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 115141. /note=Function call: Holliday junction resolvase. The top phagesDB BLAST hits have the same suggested function with small e-values of 1e-70 (Zalkecks, Yucca). The highest hit of NCBI BLASTp top hits agree with the same function (100%+ coverage, 99%+ identity, and e-value of 6e-88). HHPRED top hits agree with the same function with >99% probability, >80% coverage and e-value of <1.4e-11. CDD has no hit. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. Also I think it's important to mention that this is potentially an operon because the overlap is -1. CDS 115518 - 115673 /gene="207" /product="gp207" /function="hypothetical protein" /locus tag="JulietS_207" /note=Original Glimmer call @bp 115518 has strength 11.36; Genemark calls start at 115518 /note=SSC: 115518-115673 CP: yes SCS: both ST: SS BLAST-Start: [gp203 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 4.02022E-30 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.632, -3.482870966776971, yes F: hypothetical protein SIF-BLAST: ,,[gp203 [Mycobacterium phage Bxz1] ],,NP_818254,100.0,4.02022E-30 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark state the start site is 115518. Start codon is GTG. 
/note=Coding Potential: The host-trained and self-trained Genemark shows that the coding potential is covered in both the start site @ 115518 and the stop site @ 115673. /note=SD (Final) Score: The final score of this start site is -3.483 which is the least negative number of the start sites given and the z-score is 2.151 which is above the recommended 2 z-score. /note=Gap/overlap: There is a -4 gap which may indicate that this is an operon. /note=Phamerator: This gene is found in Pham 533. This gene is also found in phages Darko, Koguma, and LifeSavor which are all found in the same cluster as phage JulietS. /note=Starterator: Start site 2 is @ 115518 in JulietS and is called in 157/157 non-draft genes. This agrees with the auto annotated start site given by Glimmer and Genemark. /note=Location call: Based on the aforementioned evidence the start site is most likely @ 115518. /note=Function call: The top 3 hits on PhagesDB blast (phages ZygoTaiga, Yucca, and Wally) all have an e-value of 8e^-25 and state the function is unknown. NCBI blast shows that phage Bxz1 with an e-value of 4e^-30 and percent identity of 100% lists the function as gp203 and phage YemiJoy2021 with an e-value of 5e^-29 and percent identity of 98.04% lists that the gene is a hypothetical protein. CDD had no relevant hits. HHPred had no relevant hits. /note=Transmembrane domains: Deep TMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 115666 - 117438 /gene="208" /product="gp208" /function="HNH endonuclease" /locus tag="JulietS_208" /note=Original Glimmer call @bp 115666 has strength 9.51; Genemark calls start at 115666 /note=SSC: 115666-117438 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Mycobacterium phage Sauce] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.671, -3.4030855957987485, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Mycobacterium phage Sauce] ],,YP_010058738,99.8305,0.0 SIF-HHPRED: V-type ATP synthase alpha chain; intein, protein splicing, endonuclease, HYDROLASE; 1.56A {Thermococcus litoralis},,,7QSS_A,45.5932,99.6 SIF-Syn: HNH Endonuclease, upstream gene is in [pham 533] function of NKF, downstream gene is in [pham 74188] function is RusA-like resolvase (endonuclease), just like in phage Atlantean and Dietrick. /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site, 130958, with the start codon being ATG. /note=Coding Potential: The selected start codon covers the entire length of the gene with coding potential for both the host-trained and self-trained genemark. /note=SD (Final) Score: The gene candidate does not have the best final score, -3.791, however it does have the best Z-score 2.485, making the gene a very strong candidate. /note=Gap/overlap: The gap is -8, highly indicating that the gene may be part of an operon with the genes near it. /note=Phamerator: This gene belongs to pham 456, dated 1/13/23. It is conserved in other phages as well, such as Momo (C1) and Phlegm (C1). /note=Starterator: Start site 2 was manually annotated in 161 of the 165 non-draft genes in the pham. This start site is 130958, which agrees with the auto annotation by Glimmer and GeneMark, and further validates the start site. 
/note=Location call: Based on the agreed start site provided by both Glimmer and GeneMark, 130958, which is validated by starterator, along with the consideration that the gene is conserved in both Momo (C1) and Phlegm (C1), this gene is most likely real. /note=Function call: HNH endonuclease. HHPred had multiple significant hits, however, one hit listed as HNH endonuclease had a probability of 99.6%, an e-value of 1.2e-12, and a percent coverage of 99.6%. Additionally, the two top hits for NCBI Blast also listed the function as HNH endonuclease. However, the first hit had a percentage identity of 99.661%, a percentage coverage of 100%, and an e-value of 0, indicating that the gene’s function is an HNH endonuclease. Additionally, Phages DB lists two phages, Sauce and Audrick with e-values of 0 and a function listed as HNH endonuclease. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with the function and location call. Please remember to indicate that -8bp in gap/overlap section which should be an overlap. 
And remember to select the drop down menus CDS 117438 - 117893 /gene="209" /product="gp209" /function="RusA-like resolvase" /locus tag="JulietS_209" /note=Original Glimmer call @bp 117438 has strength 9.95; Genemark calls start at 117438 /note=SSC: 117438-117893 CP: yes SCS: both ST: SS BLAST-Start: [RusA-like Holliday junction resolvase [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 1.16325E-106 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -3.790975366313636, no F: RusA-like resolvase SIF-BLAST: ,,[RusA-like Holliday junction resolvase [Mycobacterium phage Cali] ],,YP_002224652,100.0,1.16325E-106 SIF-HHPRED: RusA ; Endodeoxyribonuclease RusA,,,PF05866.14,89.404,99.9 SIF-Syn: RusA-like resolvase (endonuclease), upstream gene is in [pham 775] function of HNH endonuclease, downstream gene is in [pham 430] function is NKF, just like in phage ShiaLabeouf and Shrimp. /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer and GeneMark both call the start at 117438, ATG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.095. It is the best final score on PECAAN, but not the original start site’s Final Score (-3.791, second best). /note=Gap/overlap: 1bp overlap. Reasonable, evidence of an operon, acceptable gene length. I didn’t choose a start site that would make a longer ORF because both Glimmer and GeneMark agreed on the current start site and all other final scores are lower than the current start site. /note=Phamerator: Pham 39717. Date 01/13/2023. It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 41 with a base pair coordinate of 117438. 
157 of 218 call site #41. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 117438 bp. /note=Function call: RusA-like resolvase; Both NCBI and PhagesDB database predicted RusA-like resolvase; top PhagesDB BLAST hits (Shaqnato_206, Phox_216, LRRHood_208), and NCBI BLAST hits (Cali, QBert, CharlieB), and HHpred top hits (PF05866.14 and 2H8E_A) have RusA description. e-value < 10^-83. 1 CDD hit (pfam05866) for Endodeoxyribonuclease RusA, RusA superfamily was found. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 117877 - 118596 /gene="210" /product="gp210" /function="hypothetical protein" /locus tag="JulietS_210" /note=Original Glimmer call @bp 117877 has strength 7.17; Genemark calls start at 117877 /note=SSC: 117877-118596 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M180_gp135 [Mycobacterium phage ArcherS7] ],,NCBI, q1:s1 100.0% 3.76366E-175 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -5.367638368103365, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M180_gp135 [Mycobacterium phage ArcherS7] ],,YP_008061445,100.0,3.76366E-175 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: GeneMark and Glimmer both call 117877 as the start site. Start codon of ATG. /note=Coding Potential: Strong coding potential present in the host and self trained GeneMark. It can be found on the first strand in the forward direction. /note=SD (Final) Score: -5.368 ; not the best score presented. Z-score of 1.976 ; best z score presented. /note=Gap/overlap: -17 overlap. Within acceptable range and the coding potential within the overlap is not strong. 
/note=Phamerator: pham 430 ; 153 of 183 phages with this gene have the 170bp length that is called. This includes phages Audrick and Babyland. /note=Starterator: Calls start 11 beginning with 117877 with 143 MA calling this start site as well. This call agrees with GeneMark and Glimmer. /note=Location call: Yes the evidence above supports that the start site is real. The most likely start site is 117877. /note=Function call: NKF ; Phagesdb BLAST calls unknown function with strong e-values of 1e-135, phages Astraea and BackyardAgain. HHPRED does not present matches with strong e-values (>90). NCBI BLAST has a 100% aligned and 100% identity for a hypothetical protein from phage ArcherS7 and 99.5% aligned and identity for another hypothetical protein with phage Shrimp. /note=Transmembrane domains: DeepTmHmm presents no transmembrane domains so it is likely not a membrane protein. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with the location and function call. CDS 118577 - 119446 /gene="211" /product="gp211" /function="hypothetical protein" /locus tag="JulietS_211" /note=Original Glimmer call @bp 118577 has strength 6.92; Genemark calls start at 118643 /note=SSC: 118577-119446 CP: yes SCS: both-gl ST: SS BLAST-Start: [gp207 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: -20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.797, -3.1379842619144376, no F: hypothetical protein SIF-BLAST: ,,[gp207 [Mycobacterium phage Bxz1] ],,NP_818258,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark call different sites; Glimmer Start is 118577; Genemark Start is 118643; The start site I am going with is 118577; start codon: ATG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 2 (indicating that this is a forward gene). The chosen start site covers this ORF. 
Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Teardrop and Turret. /note=SD (Final) Score: -3.138. This is a reasonable SD score. /note=Gap/overlap: There is an overlap of 20 bp. This overlap seems a bit larger than what is normally expected, but it is conserved in many other phages so is reasonable. /note=Phamerator: Date of investigation: 1/23/23; Pham 466; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Alice and Amataga). /note=Starterator: Yes, there is a conserved start choice. It is start number 5 with a base pair coordinate of 118577. Has 155 MA’s. Found in 170/177 (96%) of genes in pham. This start site also agrees with Glimmer’s call. E-values are good at -169. /note=Location call: Yes, the evidence suggests this gene is real. Start site 118577 is most likely. /note=Function call: NKF; CDD has one hit but the coverage is only 10%. NCBI predicted hypothetical proteins with coverages of 100%, identities over 99%, and e-values of 0. PhagesDB predicted the function “DNA binding protein” with a low e-value of -169 (phage I3) but did not predict any functions for this gene otherwise. Phages Gizmo and Grungle support the NKF conclusion since they have synteny with this gene and no known function (e-values -169). HHPred has hits, but the coverage is low (only 22%). /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Wu, Angus /note=Secondary Annotator QC: I agree with the location and function call. 
CDS 119498 - 119812 /gene="212" /product="gp212" /function="hypothetical protein" /locus tag="JulietS_212" /note=Original Glimmer call @bp 119498 has strength 4.15; Genemark calls start at 119498 /note=SSC: 119498-119812 CP: yes SCS: both ST: SS BLAST-Start: [gp208 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 5.03199E-69 GAP: 51 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.999, -4.731360067354143, no F: hypothetical protein SIF-BLAST: ,,[gp208 [Mycobacterium phage Bxz1] ],,NP_818259,100.0,5.03199E-69 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 119498 with a start codon of ATG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is on the forward strand, indicating a forward gene. The start site at 119498 covers all of the coding potential regions on the forward strand, which supports the forward direction of this gene. Additionally, strong coding potential is found in both the Host-Trained and Self-Trained GeneMark graphs, which suggests that this is a real gene. /note=SD (Final) Score: Neither the final score (SD = -4.731) nor the Z-score (1.999) is the best among the candidate start sites, so the scores alone do not clearly favor the auto-annotated start site. However, because both Glimmer and GeneMark call this start site, it remains the most probable start even without the best SD score or Z-score. /note=Gap/overlap: There is an upstream gap of 51 bp, which seems reasonable when looking at the synteny of other phages in Cluster C1 against this draft gene. It is also the smallest gap that holds synteny with other phages in Cluster C1. There is a reasonable gap of about 3 bp between this draft gene and the downstream gene, which allows the longest ORF, the 315 bp gene, to be reasonable.
/note=Phamerator: The gene was found to be in Pham 445 (01/21/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Alice, and Ewok. There were no listed functions throughout the phamily in Cluster C1. However, the base pair length was conserved at 315 bp. /note=Starterator: /note=Location call: There is a reasonable and highly conserved start site, checked on 01/21/2023, at (5, 119498), which was called by 159 of the 164 non-draft genes among the 175 total pham members. /note=Function call: The function of the gene is unknown. In the BLASTp on PhagesDB.org it has a strong match against Zeenon, with an e-value of 4e-54 and 100% positives, with both carrying unknown function. Additionally, the NCBI BLASTp also indicates an unknown function, with an e-value of 0 and a 100% match with phage Bxz1. CDD was inconclusive in providing a function. HHpred gave a function call of CurH fusion protein with a probability of 16.57% and an e-value of 250, which is too poor a score to consider. The evidence therefore indicates that the function of this gene is unknown. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains, placing the protein inside the cell. /note=Secondary Annotator Name: Davis, Kayla /note=Secondary Annotator QC: Please fill out the Starterator portion of the annotation; other than that, I agree with the location as well as the function call. In addition, cleared up a few spell check errors.
CDS 119816 - 120613 /gene="213" /product="gp213" /function="hypothetical protein" /locus tag="JulietS_213" /note=Original Glimmer call @bp 119816 has strength 15.8; Genemark calls start at 119816 /note=SSC: 119816-120613 CP: yes SCS: both ST: SS BLAST-Start: [gp212 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.141, -2.627458653341447, yes F: hypothetical protein SIF-BLAST: ,,[gp212 [Mycobacterium phage Cali] ],,YP_002224656,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 119816. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the suggested start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.627. It is the best final score on PECAAN. Z-score is 3.141. /note=Gap/overlap: Gap of 3bp. The gap is conserved in other phages (Pio, Littleton) and there are no ORFs longer than 120bp. /note=Phamerator: Pham: 483. Date 1/13/2023. Pham has 174 members and 14 phages are drafts. /note=Starterator: Start site 1 in Starterator was manually annotated in 157 of the 160 non-draft genes in the pham. Start 1 is 119816 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 119816. /note=Function call: Helix-turn-helix DNA binding domain. NCBI calls this function with a coverage of 100%, percentage identity of 99.62%, and an e-value of 0. PhagesDB did not have any significant hits. CDD had zero hits. HHpred had one hit for a Helix-Turn-Helix DNA binding protein. Probability was greater than 90%, e-value of 0.0024, and score of 46.83. Compared JulietS_177 HHpred with Rachaly_45 to confirm HTH DNA binding domain. 
/note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: I agree with the location and function call. These notes look great. Please also check phage "I3" as evidence for the function call since it has a great e-value of 1e-147. CDS 120610 - 121635 /gene="214" /product="gp214" /function="hypothetical protein" /locus tag="JulietS_214" /note=Original Glimmer call @bp 120619 has strength 10.25; Genemark calls start at 120619 /note=SSC: 120610-121635 CP: no SCS: both-cs ST: SS BLAST-Start: [gp210 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.354, -6.018071631562228, no F: hypothetical protein SIF-BLAST: ,,[gp210 [Mycobacterium phage Bxz1] ],,NP_818261,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 120,619. The start codon is GTG. /note=Coding Potential: The ORF has good coding potential on the direct sequence and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.134 is the best SD (final) score because it is the highest (least negative) value, and has the second smallest gap/overlap of all of the choices. Z-score is above 2. /note=Gap/overlap: While the 5 bp gap is one bp larger than the -4 bp overlap of start site 120610, the z score and SD scores are better for start site 120619. /note=Phamerator: Gene found in pham 5 on 1/24/23. When compared to phages Bipolarisk, Megamind, and Shrimp, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. There is synteny with other non-draft phages belonging to the same cluster. 
/note=Starterator: The start number called the most often in the published annotations is 5; it was called in 96 of the 165 non-draft genes in the pham. /note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 120619 seems most likely. /note=Function call: NKF. There were no PhagesDB BLAST hits with functions, and the top 2 NCBI BLAST hits, with 100% coverage, approximately 100% identity, and E-values close to 0, also agreed that it was NKF. There were also no hits on CDD or HHpred that fulfilled the e-value, coverage, and identity requirements. Phage SilverDipper has a hit on PhagesDB BLAST with a low e-value for glycoside hydrolase, but when the sequences of both genes are blasted against each other, there is extremely low probability and a high e-value. Not enough evidence to claim this as a glycoside hydrolase. /note=Transmembrane domains: Neither TMHMM, DeepTMHMM, nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call.
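The gap/overlap figures quoted in these notes can be checked from the coordinates alone. The following is a minimal sketch, assuming forward-strand genes with 1-based inclusive coordinates and the convention that the gap is the start minus the upstream stop minus one, with negative values read as overlaps. The function name `upstream_gap` is hypothetical and is not part of PECAAN or DNA Master.

```python
def upstream_gap(prev_stop: int, start: int) -> int:
    """Number of bases between the upstream gene's last base and this start.

    Assumption of this sketch: both genes are on the forward strand and
    coordinates are 1-based and inclusive. A negative result means the
    two genes overlap by that many bases.
    """
    return start - prev_stop - 1


# Gene 213 stops at 120613; compare the two candidate starts for gene 214.
for candidate_start in (120610, 120619):
    print(candidate_start, upstream_gap(120613, candidate_start))
```

Under these assumptions the two candidates give -4 and 5, matching the "-4 bp overlap" and "5 bp gap" discussed in the gene 214 notes.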
CDS 121685 - 122707 /gene="215" /product="gp215" /function="hypothetical protein" /locus tag="JulietS_215" /note=Original Glimmer call @bp 121685 has strength 17.23; Genemark calls start at 121685 /note=SSC: 121685-122707 CP: yes SCS: both ST: SS BLAST-Start: [gp214 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: 49 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.407, -3.8137921535956054, no F: hypothetical protein SIF-BLAST: ,,[gp214 [Mycobacterium phage Cali] ],,YP_002224658,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both Glimmer and Genemark call the start site as 121685 which is a TTG codon /note=Coding Potential: Both host trained and self trained genemark show coding potential between the start and stop sites /note=SD (Final) Score: The final score is -3.814 which is the second best SD score /note=Gap/overlap: There is a gap of 49 which is a very reasonable gap /note=Phamerator: As of 1/13/23 the Pham is 431 and it has 183 members, 15 of which are drafts including Cali and CharlieB /note=Starterator: JulietS called the most annotated start site (Start 14 @ 121685) with 160 Manual annotations /note=Location call: This is likely a real gene with a start at 121685 /note=Function call: NKF; PhagesDB hits are function unknown with e-values of 0. NCBI hits both call a hypothetical protein with (identity: 100, coverage: 100, e-value: 0) and (Identity: 99.7059, coverage: 100, e-value: 0). HHpred hits both call a domain of unknown function with e-values of 2e-18 and 7.7e-16. There are no relevant CDD hits. /note=Transmembrane domains: Deep TMHMM predicts no TMDs and that the protein is intracellular. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with the location and function call. The phamerator section needs to be reworded though. As it stands, it sounds like Cali and CharlieB are draft genomes. 
Also, I would suggest mentioning the z-score somewhere in your PECAAN notes. This can be added in the SD score or the location call section. Lastly I would suggest adding the ratio of manually annotated start sites over the total amount (160/168) in the starterator section. CDS 122707 - 122940 /gene="216" /product="gp216" /function="hypothetical protein" /locus tag="JulietS_216" /note=Original Glimmer call @bp 122677 has strength 6.98; Genemark calls start at 122707 /note=SSC: 122707-122940 CP: yes SCS: both-gm ST: SS BLAST-Start: [gp215 [Mycobacterium phage Cali] ],,NCBI, q1:s2 100.0% 3.94134E-48 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.857, -5.315750068410762, no F: hypothetical protein SIF-BLAST: ,,[gp215 [Mycobacterium phage Cali] ],,YP_002224659,98.7179,3.94134E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Glimmer calls the start site at 122677, with start codon ATG, and GeneMark calls the start site at 122707, also with start codon ATG. /note=Coding Potential: This gene has variable coding potential in the top strand, and both the start sites chosen by Glimmer and Genemark cover all the coding potential. /note=SD (Final) Score: The suggested start site by Genemark had a Z score of 1.857, with a final score of -5.316. The suggested start chosen by Glimmer had a more negative final score and a lower Z score. /note=Phamerator: Pham 521. Date 1/20/23. It is conserved; found in Ading_224 and Adlitam_222. No notes/functions were listed on Phams database. /note=Starterator: Start site 5@122677 has 1 manual annotation, and start site 7@122704 has 95 manual annotations, and start site 8 @ 122707 has 58 manual annotations. /note=Location call: Due to the suggested start site by the autoannotation, reasonable overlap, and more manual annotations than the start site called by Glimmer, the location is called at start site 122707. /note=Function call: PhagesDB data shows no information. 
HHPred results all had high e-values. The top results from NCBI BLASTp were hypothetical proteins. Thus, function is called as unknown. /note=Transmembrane domains: Deep TMHMM shows no predicted TMRs, so this gene is likely not coding for a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. However, I would mark some evidence under NCBI BLAST to support the function call. You also need to add the "gaps/overlap" line in your PECAAN notes, making sure to mention which other phages display similar gaps/overlaps. tRNA 123063 - 123137 /gene="217" /product="tRNA-Gln(ctg)" /locus tag="JULIETS_217" /note=tRNA-Gln(ctg) tRNA 123144 - 123219 /gene="218" /product="tRNA-Asn(gtt)" /locus tag="JULIETS_218" /note=tRNA-Asn(gtt) CDS 123294 - 123752 /gene="219" /product="gp219" /function="hypothetical protein" /locus tag="JulietS_219" /note=Original Glimmer call @bp 123294 has strength 17.24; Genemark calls start at 123294 /note=SSC: 123294-123752 CP: yes SCS: both ST: SS BLAST-Start: [gp214 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.31818E-108 GAP: 353 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.986, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[gp214 [Mycobacterium phage Bxz1] ],,NP_818264,100.0,2.31818E-108 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer and GeneMark both call 123294 as start. /note=Coding Potential: There is good coding potential on the forward strand, indicating this is a forward gene. /note=SD (Final) Score: The final score is the best option at -2.742 and the Z-score, also the best option, is 2.986. /note=Gap/overlap: The gap with the upstream gene (353 bp) is larger than is normally reasonable; however, since there is no coding potential within the gap region in Host-Trained GeneMark, there is no indication that a gene should be inserted to fill the gap.
The length of the gene (459 bp) is an acceptable length. /note=Phamerator: Pham number is 499 as of 1/13/23. It is conserved, found in ZygoTaiga_222 and Zalkecks_224. /note=Starterator: Start number (@123294) is 4, manually annotated 159 times. This is the same as the conserved start site. There is another start number, 1, at start site 123309. There are 183 members of the pham and 159 of the 168 non-draft genes call the same conserved start site. /note=Location call: Based on the above evidence, this is a real gene and the start site is 123294. /note=Function call: NKF. Phagesdb, NCBI Blast, CDD, and HHpred all call no known function. Despite having e-values very close to zero (5e-86 and 2.3e-108 respectively), neither phages db nor NCBI Blast call a function. CDD did not have any results and the top hits on HHpred had really terrible e-values (like 200 and 60). /note=Transmembrane domains: There are no TMDs predicted by deepTMHMM so it is not a membrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call for this gene. CDS 123749 - 124192 /gene="220" /product="gp220" /function="hypothetical protein" /locus tag="JulietS_220" /note=Original Glimmer call @bp 123749 has strength 11.79; Genemark calls start at 123749 /note=SSC: 123749-124192 CP: yes SCS: both ST: SS BLAST-Start: [gp215 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 7.76894E-105 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.58, -3.5316035464369326, yes F: hypothetical protein SIF-BLAST: ,,[gp215 [Mycobacterium phage Bxz1] ],,NP_818265,100.0,7.76894E-105 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation:Forward Gene.Glimmer called the start site at 123749, Genmark called the start site at 123749. Starting codon is GTG /note=Coding Potential:The coding potential is found in the forward frame. The gene almost covers all of the coding potential in the Self Trained GeneMark. 
The gene covers all of the coding potential in the Host Trained GeneMark. /note=SD (Final) Score: -3.532. It is the best score because it has the smallest overlap. The Z-score is 2.58, which is greater than 2, showing that the site is a good candidate for a start site. /note=Gap/overlap: -4 bp overlap, which indicates that the gene may be part of an operon. The overlap is conserved in Alice and Ading from the C1 cluster. /note=Phamerator: Pham 462, 01/19/23. The gene is conserved in phages CharlieB and Sebata. /note=Starterator: 01/13/23. Start 9 (9, 123749) is manually annotated in 157/163 non-draft genes in this pham. This evidence is also in line with the auto-annotation made by Glimmer and GeneMark. /note=Location call: It is a real gene and the start site is Start 9 at 123749, based on the evidence above. /note=Function call: NKF. The top two hits on PhagesDB BLAST (e-value = 2e-85 for phages Zeenon and ZygoTaiga) indicate unknown function. The top hits on NCBI BLAST (e-values of 8e-105 and 1e-104 for phages Bxz1 and Turret respectively, with near-100% identity and 100% coverage) also indicate an unknown function. There are no significant hits on HHpred and CDD. /note=Transmembrane domains: No transmembrane domains are predicted by TMHMM, TOPCONS, or DeepTMHMM, which indicates that the gene does not encode a membrane protein. /note=Secondary Annotator Name: Geghamyan, Knar /note=Secondary Annotator QC: I agree with the location and function call.
CDS 124192 - 124461 /gene="221" /product="gp221" /function="hypothetical protein" /locus tag="JulietS_221" /note=Original Glimmer call @bp 124192 has strength 6.58; Genemark calls start at 124192 /note=SSC: 124192-124461 CP: yes SCS: both ST: SS BLAST-Start: [gp220 [Mycobacterium phage Cali] ],,NCBI, q1:s2 100.0% 3.20845E-59 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.554, -4.0338999058337315, no F: hypothetical protein SIF-BLAST: ,,[gp220 [Mycobacterium phage Cali] ],,YP_002224662,98.8889,3.20845E-59 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 124192 bp. Start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.034. Although this is not the best SD score, because this start site has a 1 bp overlap, this may be evidence of an operon so this SD score is still acceptable. /note=Gap/overlap: -1 bp upstream overlap. This overlap is very small and reasonable and may be evidence of an operon. /note=Phamerator: Pham 549. Date 1/13/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 2 in Starterator was manually annotated in 135/156 non-draft genes in this pham. Start 2 is 124192 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 124192 bp. /note=Function call: No known function (NKF). The top two phagesDB BLAST hits are of unknown function (E-value = 3e-48), and the top three NCBI BLAST hits are also of unknown function (E-value < 9e-59, 98.88%+ identity, 100% coverage). CDD and HHpred had no significant hits. 
/note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with the location call and the function call. CDS 124518 - 124853 /gene="222" /product="gp222" /function="hypothetical protein" /locus tag="JulietS_222" /note=Original Glimmer call @bp 124518 has strength 5.91; Genemark calls start at 124518 /note=SSC: 124518-124853 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SPRINKLERS_226 [Mycobacterium phage Sprinklers] ],,NCBI, q1:s1 100.0% 2.8774E-76 GAP: 56 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.496, -3.979581593363423, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SPRINKLERS_226 [Mycobacterium phage Sprinklers] ],,QAY13483,100.0,2.8774E-76 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 124518 (ATG codon). /note=Coding Potential: Good coding potential in GeneMark Self, though coding potential in Host-trained GeneMark looks a little worse (but still present). Coding potential in this ORF is on the forward strand only and spans the length of the gene. /note=SD (Final) Score: -3.980. It is the best final score on PECAAN. /note=Gap/overlap: There is a 56bp gap with the previous gene, which is reasonable. The gap is conserved, found in phages Flabslab and Grungle. /note=Phamerator: This gene is in pham 570 (date accessed: 01/19/23). It is conserved, found in phages Flabslab and Grungle. /note=Starterator: Start site 3 at 124518 has 110 manual annotations. It is the most manually annotated gene in Starterator. /note=Location call: This gene likely starts at 124518. This is supported by a high final score, Z-score > 2, and a high number of manual annotations in Starterator. /note=Function call: NKF in PhagesDB BLASTp; top hits were phages Cali and Ronan (e-values of 1e-59). 
No convincing hits in HHPred. No hits in CDD. NKF in NCBI BLAST; top hits were phages Sprinklers (2.88e-76) and Cali (3.21e-76). /note=Transmembrane domains: No TMDs predicted in DeepTMHMM, so this protein is not a membrane protein. /note=Secondary Annotator Name: Kristianto, Luke /note=Secondary Annotator QC: I agree with the location and function call. Since the gap/overlap is above 50 bp, please discuss whether the gap/overlap is conserved as well as whether all coding potential is covered by the ORF. CDS 124932 - 125285 /gene="223" /product="gp223" /function="membrane protein" /locus tag="JulietS_223" /note=Original Glimmer call @bp 124932 has strength 4.75; Genemark calls start at 124932 /note=SSC: 124932-125285 CP: yes SCS: both ST: SS BLAST-Start: [gp218 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 5.20572E-78 GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.198, -4.6033671133628555, no F: membrane protein SIF-BLAST: ,,[gp218 [Mycobacterium phage Bxz1] ],,NP_818268,100.0,5.20572E-78 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 124932 bp. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 124932 corresponds to a Final Score of -4.603 which is the third best final score. It also has a Z-score of 2.198. /note=Gap/overlap: There is a 78 bp gap which is reasonable because the gap is conserved in other phages (Fludd, FoxtrotP1) of the same cluster and there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 517. Date 1/17/2023. It is conserved; found in Fludd (C1) and FoxtrotP1 (C1). /note=Starterator: Start site 1 in Starterator was manually annotated in 157/157 non-draft genes in this pham and is the most manually annotated start site.
Start 1 is 124932 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 124932 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Membrane protein. HHPRED calls a specific type of membrane protein in a secretion system with adequate probability (>80%) and coverage (>50%) but without a low e-value (>6.9); this was not included as part of the evidence. Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but with no known function. One hit in NCBI BLAST lists membrane protein, which agrees with the current function call. No hits in CDD. TMHMM and Deep TMHMM call 2 TMDs. /note=Transmembrane domains: TMHMM calls 2 TMDs. Deep TMHMM confirms this call. TOPCONS doesn’t call anything. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS 125236 - 125556 /gene="224" /product="gp224" /function="hypothetical protein" /locus tag="JulietS_224" /note=Original Glimmer call @bp 125236 has strength 7.99; Genemark calls start at 125335 /note=SSC: 125236-125556 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_SHAQNATO_221 [Mycobacterium phage Shaqnato]],,NCBI, q1:s1 100.0% 5.458E-69 GAP: -50 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.843, -5.056409918993686, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SHAQNATO_221 [Mycobacterium phage Shaqnato]],,QAY05151,100.0,5.458E-69 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Glimmer and GeneMark do not agree with each other; Glimmer chooses 125236 (start codon GTG) and GeneMark chooses 125335 (start codon GTG). /note=Coding Potential: Coding potential is found in both GeneMark Self and Host, on the forward strand only, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: Z-score of 1.843 (final score -5.056); not the best one, but still reasonable. /note=Gap/overlap: -50 bp. This is a large overlap; however, it is conserved in many other phages in cluster C1, such as Ading. Due to this consistency, the -50 bp overlap is acceptable. /note=Phamerator: Pham 683. Date 01/13/23. It is conserved; found in 148 other non-draft phages, such as Ading or Bread. /note= /note=Starterator: Start site 2 in Starterator was manually annotated in 124/167 non-draft genes in this pham. Start 2 is 125236 in JulietS. This evidence agrees with the site predicted by Glimmer. /note=Location call: Due to this conservation and the Starterator evidence, this is a real gene and the start is 125236. /note=Function call: No known function. The top 2 PhagesDB BLAST hits (phages Pinkcreek and Stubby) have unknown function with e-values of 6e-57, and the top NCBI BLAST hit also has no known function, with an e-value of 5.458e-69.
There are also no hits in CDD, and the highest probability in HHpred is 42.6%, which is not reliable, so overall this gene has no known function. /note=Transmembrane domains: DeepTMHMM predicts no TMDs, which means that this protein does not have transmembrane domains. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation. All of the evidence has been considered. CDS 125671 - 125832 /gene="225" /product="gp225" /function="hypothetical protein" /locus tag="JulietS_225" /note= /note=SSC: 125671-125832 CP: no SCS: neither ST: NI BLAST-Start: [gp225 [Mycobacterium phage Spud] ],,NCBI, q1:s26 100.0% 8.89024E-30 GAP: 114 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.112, -6.876055311631723, no F: hypothetical protein SIF-BLAST: ,,[gp225 [Mycobacterium phage Spud] ],,YP_002224444,67.9487,8.89024E-30 SIF-HHPRED: SIF-Syn: CDS 125837 - 126010 /gene="226" /product="gp226" /function="hypothetical protein" /locus tag="JulietS_226" /note= /note=SSC: 125837-126010 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein M181_gp125 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 2.63705E-32 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.953, -5.86583354674825, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp125 [Mycobacterium phage Gizmo] ],,YP_008061002,100.0,2.63705E-32 SIF-HHPRED: SIF-Syn: CDS 126041 - 127699 /gene="227" /product="gp227" /function="Ro-like RNA binding protein" /locus tag="JulietS_227" /note=Original Glimmer call @bp 126041 has strength 15.88; Genemark calls start at 126041 /note=SSC: 126041-127699 CP: yes SCS: both ST: SS BLAST-Start: [Ro protein [Mycobacterium phage Breeniome] ],,NCBI, q1:s1 100.0% 0.0 GAP: 30 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.811, -3.0478325568523097, yes F: Ro-like RNA binding protein SIF-BLAST: ,,[Ro protein [Mycobacterium phage Breeniome] ],,YP_009221330,99.4565,0.0 SIF-HHPRED: 60-kDa SS-A/Ro ribonucleoprotein; HEAT repeat von Willebrand Factor
A Rossmann fold MIDAS motif`, RNA BINDING PROTEIN; 1.95A {Xenopus laevis} SCOP: c.62.1.5, a.118.25.1,,,1YVR_A,98.0072,100.0 SIF-Syn: Ro-like RNA binding protein called for both JulietS and Pier (Pham 227), upstream gene has NKF in JulietS and not called in Pier (Pham 683), has NKF in JulietS and not called in Pier (Pham 535), all genes are from the same pham. /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 126041. The start codon is ATG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only, which indicates this is a forward gene. Coding potential is found in both GeneMark Self and Host. All of the coding potential is included. /note=SD (Final) Score: The z score is 2.811. The final score is -3.048 which is the highest value of all listed start sites. /note=Gap/overlap: 484 bps gap upstream of the gene and a 361 gap downstream of the gene. Pham maps indicate that part of the gap upstream of the gene is tRNA. This gene and large gaps, both upstream and downstream of the gene, are conserved in phages BackyardAgain, Catera, and Lukilu. /note=Phamerator: Pham: 227. Date 01/15/23. The gene is conserved in phages JustHall and Derek which are in the same cluster as JulietS. /note=Starterator: Start site 25 is found in 175 of 255 of genes in pham and was called for 160 of 234 non-draft phage genomes in the pham. It is the most annotated start site and is called 100.0% of the time when present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 126041 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Ro-like RNA binding protein. The top two hits for HHpred have a function of ro-like RNA binding protein (probability: 100, E-value: 5.6e-74 and 1.7e-69, % coverage: 98.0072 and 99.4565). 
In this pham, the Phagesdb Function Frequency also states that there is a frequency of 100% being a ro-like RNA binding protein (94 phages called) in subcluster C1. PhagesDB BLASTp shows many hits for phages with the function of ro-like RNA binding protein, all with E-values of 0. NCBI BLASTp has eight of the top ten hits with the function of ro-like RNA binding protein, (all % coverage: 100, E-value: 0). There is one CDD hit for a TROVE domain, Ro RNP which is likely RNA binding (% identity: 34.8083, % alignment: 510324, % coverage: 62.3188, E-value: 0). /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Luk, Jarrett /note=Secondary Annotator QC: I agree with the location and function called. CDS 128060 - 128383 /gene="228" /product="gp228" /function="hypothetical protein" /locus tag="JulietS_228" /note=Original Glimmer call @bp 128060 has strength 8.85; Genemark calls start at 128060 /note=SSC: 128060-128383 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH18_gp107 [Mycobacterium phage Lukilu] ],,NCBI, q1:s1 100.0% 1.92665E-71 GAP: 360 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.553, -3.5707972674952195, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH18_gp107 [Mycobacterium phage Lukilu] ],,YP_009597791,100.0,1.92665E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 128060 bp. The start codon is ATG, which has a high probability of being used as a start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only on the second frame, indicating that this is a forward gene. The ORF has reasonable coding potential and the called start site does capture all of the coding potential. 
/note=SD (Final) Score: The final score is the second-best option at -3.571 and the z-score is the best option at 2.553. This provides strong evidence that the called start site is the real start site. /note=Gap/overlap: The gap is very large at 360 bp. However, this gap is justified because there is no coding potential within the gap. Furthermore, this gap is conserved and shows up in phage Ading from cluster/subcluster C1. /note=Phamerator: Pham: 535. Date 1/24/2023. It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 4 in Starterator was manually annotated in 122/157 non-draft genes in this pham. Start number 4 is 128060 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 171/171 (100.0%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 128060 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top two PhagesDB BLAST hits have the function of “function unknown” (100% identity, E-value = 7e-56), and the top two NCBI BLAST hits have the function of “hypothetical protein” (100%/99.0654% identity, E-value = 1.92665e-71/5.77601e-71). Results from CDD and HHpred were uninformative: either no hits were returned or the hits had unreasonably high e-values. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. DeepTMHMM agrees (1.0 probability inside throughout). Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation; all of the evidence has been considered. 
CDS 128412 - 128651 /gene="229" /product="gp229" /function="hypothetical protein" /locus tag="JulietS_229" /note=Original Glimmer call @bp 128412 has strength 10.63; Genemark calls start at 128412 /note=SSC: 128412-128651 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M181_gp122 [Mycobacterium phage Gizmo] ],,NCBI, q1:s14 100.0% 4.03472E-51 GAP: 28 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -2.970161406017234, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp122 [Mycobacterium phage Gizmo] ],,YP_008061005,85.8696,4.03472E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 128412. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -2.970. This is the best final score on PECAAN. The z-score is the highest of all gene candidates. /note=Gap/overlap: There is a reasonable gap of 28 bp. /note=Phamerator: pham: 547. Date 1/11/2023. It is conserved; found in Alice (C) and Bangla1971 (C). /note=Starterator: Start site 4 in Starterator was manually annotated in 141/156 non-draft genes in this pham. Start 4 is 128412 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 128412. /note=Function call: NKF. BLASTp hits only included hypothetical proteins with an e-value of 6e-41. CDD returned no hits. The top NCBI BLAST hit was a hypothetical protein with an e-value of 4.03e-51. HHPRED did not return any significant hits. /note=Transmembrane domains: DeepTMHMM doesn't predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Uyemura, Antonio /note=Secondary Annotator QC: I agree with the location and function call. 
I would suggest including the z-score in either the SD score or the location section. Also, in the Function call, I'd suggest adding the e-value observed for the PhagesDB BLAST and the coverage and identity for the NCBI BLAST. CDS 128698 - 129165 /gene="230" /product="gp230" /function="hypothetical protein" /locus tag="JulietS_230" /note=Original Glimmer call @bp 128698 has strength 11.79; Genemark calls start at 128698 /note=SSC: 128698-129165 CP: yes SCS: both ST: SS BLAST-Start: [gp227 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 2.27073E-106 GAP: 46 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.965, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[gp227 [Mycobacterium phage Cali] ],,YP_002224669,100.0,2.27073E-106 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: Glimmer and GeneMark call the same start site: 128698. The start codon is GTG, which is a common start codon. /note=Coding Potential: There is coding potential present in ORF 1 of the direct sequence (as expected for a forward gene) in the Host-Trained & Self-Trained GeneMark. Start site 128698 covers all of the coding potential. /note=SD (Final) Score: -2.708. This is the best score and it corresponds to the start site reported by Glimmer and GeneMark. /note=Gap/overlap: There is a 46 bp gap upstream of the gene and a 1 bp overlap with the gene downstream. The gap is reasonable because it is conserved in phage Cali and there is no coding potential present to fill it in Host-Trained and Self-Trained GeneMark. /note=Phamerator: Pham 66992 as of 1/18/23. It is conserved, found in Colt (C1) and Darko (C1). /note=Starterator: Most annotated start site is 22; 165/338 of non-draft genes called start site 22. Start 22 is @ 128698 bp in JulietS. Evidence agrees w/ site predicted by Glimmer and GeneMark. /note=Location call: Start site 22 @ 128698 bp. /note=Function call: No known function. 
The top three PhagesDB BLAST hits have the function of hypothetical proteins (E-value = 4e-84) and the top three NCBI BLAST hits also have the function of hypothetical proteins (100% coverage, >99% identity, and E-value < 1e-105). HHpred had no relevant hits, with all E-values greater than 26. CDD had no relevant hits either. /note=Transmembrane domains: DeepTMHMM does not show any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call for this gene. CDS 129165 - 129557 /gene="231" /product="gp231" /function="hypothetical protein" /locus tag="JulietS_231" /note=Original Glimmer call @bp 129165 has strength 7.54; Genemark calls start at 129165 /note=SSC: 129165-129557 CP: yes SCS: both ST: SS BLAST-Start: [gp224 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.32154E-91 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.062, -3.3503726477288405, yes F: hypothetical protein SIF-BLAST: ,,[gp224 [Mycobacterium phage Bxz1] ],,NP_818274,100.0,1.32154E-91 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 129165, and the called start codon is GTG. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 3. The coding potential increases near the potential start site and plateaus until the stop codon where it tapers off. The gene similarly has good coding potential with the self-trained GeneMark on ORF 3, where the coding potential rises at the start site and plateaus until the stop site. Finally, the start and stop sites also encompass all of the coding potential. /note=SD (Final) Score: The Final Score is -3.350, and the Z score is 3.062. The final score for this start site has the best sequence match (highest final score), and also has the highest Z-score of the possible start site candidates. 
/note=Gap/overlap: There is a 1 base pair overlap with the gene upstream, which is acceptable because the overlap is short and suggests a shared operon. The chosen start site does not give the longest open reading frame, but the gene will still be 393 bp long, which is good. /note=Phamerator: The gene is found in Pham 660 as of 1/15/23. The gene is conserved in other members of subcluster C1, and I used phages Turret and Yassified for comparison. /note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 5 and 129165. 142/142 non-draft genomes in the pham also call this start site. /note=Location call: The available evidence suggests that the start site is 129165. The gene appears to be real, and this proposed start site covers all the coding potential, and has a good final score and z-score. It is the most commonly called site and called 100% of the time when present. /note=Function call: The function of this protein is unknown (NKF). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-values of 1e-73, and those proteins have no known function. Similarly on NCBI BLASTp, e-values are low (1e-91), and those proteins also have no known function. There were no hits on CDD, and HHpred results were not informative, as e-values were too high. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicted any TMDs; therefore, this is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. 
CDS 129557 - 129763 /gene="232" /product="gp232" /function="hypothetical protein" /locus tag="JulietS_232" /note=Original Glimmer call @bp 129557 has strength 13.78; Genemark calls start at 129557 /note=SSC: 129557-129763 CP: yes SCS: both ST: SS BLAST-Start: [gp225 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.36075E-41 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.399, -3.9718914447963414, yes F: hypothetical protein SIF-BLAST: ,,[gp225 [Mycobacterium phage Bxz1] ],,NP_818275,100.0,2.36075E-41 SIF-HHPRED: DUF6307 ; Family of unknown function (DUF6307),,,PF19826.2,48.5294,91.9 SIF-Syn: /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark state that this gene has a start site of 129557, which correlates to a start codon of ATG. /note=Coding Potential: Both the Host trained and the Self-trained GeneMark show that there is a complete direct sequence in the forward direction for the entirety of the gene.This gene shares synteny with Grungle and BackyardAgain. /note=SD (Final) Score: The chosen Z-score is 2.399, and the chosen final score is -3.972. Both of these are the highest in their respective categories, and correspond to the start site of 129557. /note=Gap/overlap: There is an overlap of 1bp as evidenced by the -1 in the gap score. This is likely evidence of an operon. /note=Phamerator: As of 02/07/23, this phage was found to be in pham 553. Other phages in this pham include DirtMonster, Matsumoto, and JPickles. /note=Starterator: As of 01/27/23, the most annotated start site is 3, which was called in 148 of the 156 non-draft phages in this pham. As for JulietS, start site 3 also was the most annotated start site, with 148 MA’s. This correlates to a start site of 129557, which matches with the GeneMark and Glimmer start site. /note=Location call: It is very likely that this is a real gene with a start site of 129557. /note=Function call: Based on the evidence, it is very likely that this protein is NKF. 
HHpred returned several hits, but all had high e-values, so a function cannot be verified through HHpred. There were 0 relevant hits on CDD. /note=The NCBI BLAST had numerous hits, mainly for hypothetical proteins, with low e-values ranging from 2e-41 to 9e-39. /note=Transmembrane domains: DeepTMHMM predicted 0 TMRs, meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function calls for this gene. CDS 129760 - 129891 /gene="233" /product="gp233" /function="hypothetical protein" /locus tag="JulietS_233" /note=Original Glimmer call @bp 129760 has strength 5.85; Genemark calls start at 129760 /note=SSC: 129760-129891 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein DRAZDYS_236 [Mycobacterium phage Drazdys] ],,NCBI, q1:s5 100.0% 2.4463E-24 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.753, -5.183665924498821, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein DRAZDYS_236 [Mycobacterium phage Drazdys] ],,AEK07037,91.4894,2.4463E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, both call the start of this gene at 129760 with the start codon of ATG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 129760 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -5.184 and the Z-score is 1.753 on PECAAN. Neither score is the best available, but given that this gene is likely part of an operon, these scores carry less weight. /note=Gap/overlap: This gene has an overlap of 4 base pairs with the previous gene, indicating that it is part of an operon. 
This is the best possible start since the other start site options would have a large overlap of 274 base pairs or a gap of 89 base pairs that would require the addition of another gene. /note=Phamerator: This gene belongs to pham number 357 as of 1/13/2023. The gene is conserved in phages of this cluster (C) like Bigswole and Basquiat. There is no function listed for almost all members of this family, so it is highly likely that this is a gene with an unknown function. /note=Starterator: Start site 7 is most often called as it was manually annotated in 180/184 non-draft genes in the pham. Start 7 is 129760 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 129760. /note=Function call: No Known Function. The top three phagesdb BLAST hits are function unknown (E-value <10^-21). NCBI BLAST top hits also have no known function for this gene (100% coverage, 91+% identity, and E-value <10^-24). HHpred’s top hits indicate a known function, but these results have a low probability, low coverage percentage, and poor e-value. Also, the other bioinformatic tools disagree, so it is most probable that this gene has no known function. CDD had no relevant hits. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation, all of the evidences have been considered CDS 129943 - 130671 /gene="234" /product="gp234" /function="glycosyltransferase" /locus tag="JulietS_234" /note=Original Glimmer call @bp 129943 has strength 12.65; Genemark calls start at 129943 /note=SSC: 129943-130671 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO60_gp107 [Mycobacterium phage CharlieB] ],,NCBI, q1:s1 100.0% 0.0 GAP: 51 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.286, -4.1462986041499, no F: glycosyltransferase SIF-BLAST: ,,[hypothetical protein KHO60_gp107 [Mycobacterium phage CharlieB] ],,YP_010057613,100.0,0.0 SIF-HHPRED: Polypeptide N-acetylgalactosaminyltransferase; GalNAc-Ts, GalNAc-T3, long-range glycosylation preference, (glyco)peptides, Molecular dynamics, specificity, enzyme kinetics, FGF23, phosphate homeostasis, TRANSFERASE; HET: EDO, NAG, UDP, NGA; 1.96A {Taeniopygia guttata},,,6S22_A,95.8678,99.9 SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 129943 with start codon ATG /note=Coding Potential: this ORF has good coding potential on the direct sequence (forward strand), indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site does include all of the coding potential /note=SD (Final) Score: final score of -4.146 and z-value of 2.286 is not the highest score, however the longest ORF start site 129925 does not have the highest score either /note=Gap/overlap: the gap with the upstream gene is 51 bp. This gene is conserved in several other phages of the same cluster (Babyland, Bangla1971) and the gap does not contain coding potential and was seen in Alice and Grungle as well. /note=Phamerator: Pham 52739 and has 161 members, Date 1/24/2023. 
It is conserved and found in Ading (C) and ArcherS7 (C) /note=Starterator: Start site number 232 in Starterator was the most manually annotated start, called in 125/147 non-draft genes in this pham. Start site 232 is at position 129943 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 129943. Its final score is not the highest, which may be due to the gap being longer than with the LORF start site, though not much coding potential is found in the gap and the start site 129943 has the highest manual annotation. /note=Function call: glycosyltransferase. The top phagesDB BLAST hits have the same suggested function with small e-values of 1e-151 (Shaqnato, Sprinklers). The top NCBI BLASTp hits agree with the same function (99%+ coverage, 99%+ identity, and e-value of <9e-180). HHPRED top hits agree with the polypeptide N-acetylgalactosaminyltransferase function with >99% probability, >83% coverage and e-value of <3.9e-22. CDD has no relevant hit. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Santilla, Matthew /note=Secondary Annotator QC: I agree with this annotation. CDS 130659 - 130961 /gene="235" /product="gp235" /function="membrane protein" /locus tag="JulietS_235" /note=Original Glimmer call @bp 130659 has strength 6.73; Genemark calls start at 130659 /note=SSC: 130659-130961 CP: yes SCS: both ST: SS BLAST-Start: [gp231 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 5.15944E-58 GAP: -13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.175, -4.299059343200891, no F: membrane protein SIF-BLAST: ,,[gp231 [Mycobacterium phage Cali] ],,YP_002224673,100.0,5.15944E-58 SIF-HHPRED: SIF-Syn: helix-turn-helix DNA binding domain, upstream gene is glycosyltransferase, downstream is glycosyltransferase, just like in phage Fludd. 
/note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Glimmer and GeneMark both agree that the start site is @ 130659. Start codon is ATG. /note=Coding Potential: The host-trained and self-trained GeneMark show that there is coding potential between the aforementioned start site and stop site. /note=SD (Final) Score: The final score of the start site @ 130659 is -4.299, which is the most negative score out of the start sites given, and the z-score is 2.175, which is above the recommended 2 for z-scores. /note=Gap/overlap: There is an overlap of 13 bp (gap of -13 bp), which is acceptable since it is within the 50 bp limit. /note=Phamerator: This gene is a part of Pham 620. This gene is also found in phages Cali, Capablanca, and Ading, which are all phages in JulietS’s cluster. /note=Starterator: Start site 1 is @ 130659 in JulietS and is the most called start site with 144/147 non-draft genes in this pham calling this start site. /note=Location call: Based on the aforementioned evidence the start site is most likely @ 130659. /note=Function call: In PhagesDB BLAST the top 3 hits are phages Sprinklers, Breeniome, and BeanWater, which all have an e-value of 2e-50 and the function listed as unknown. In NCBI BLAST the top hit is from phage Cali with an e-value of 5e-58 and percent identity of 99%, and lists the function as gp231. The second top hit in NCBI BLAST is from phage ScottMcG with an e-value of 9e-58 and percent identity of 98%, and lists the function as hypothetical protein. The third top hit in NCBI BLAST is from phage Phox with an e-value of 1e-56 and percent identity of 97%. CDD had no relevant hits. HHPred had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts 3 transmembrane domains, so this gene encodes a membrane protein. 
/note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with both the function and the location of this gene but add to information in the synteny box CDS 130958 - 131629 /gene="236" /product="gp236" /function="glycosyltransferase" /locus tag="JulietS_236" /note=Original Glimmer call @bp 130958 has strength 16.04; Genemark calls start at 130958 /note=SSC: 130958-131629 CP: yes SCS: both ST: SS BLAST-Start: [galactosyl transferase [Mycobacterium phage Breeniome] ],,NCBI, q1:s1 100.0% 1.125E-164 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.485, -3.790975366313636, yes F: glycosyltransferase SIF-BLAST: ,,[galactosyl transferase [Mycobacterium phage Breeniome] ],,YP_009221340,100.0,1.125E-164 SIF-HHPRED: CESA_like_2; CESA_like_2 is a member of the cellulose synthase superfamily. The cellulose synthase (CESA) superfamily includes a wide variety of glycosyltransferase family 2 enzymes that share the common characteristic of catalyzing the elongation of polysaccharide chains.,,,cd06427,96.4126,99.9 SIF-Syn: /note=Primary Annotator Name: Arredondo, Alexis /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site, 130958, with the start codon being ATG. /note=Coding Potential: The selected start codon covers the entire length of the gene with coding potential for both the host-trained and self-trained genemark. /note=SD (Final) Score: The gene candidate does not have the best final score, -3.791, however it does have the best Z-score 2.485, making the gene a very strong candidate. /note=Gap/overlap: The gap is -4, highly indicating that the gene may be part of an operon with the genes near it. /note=Phamerator: This gene belongs to pham 456, dated 1/13/23. It is conserved in other phages as well, such as Momo (C1) and Phlegm (C1). /note=Starterator: Start site 2 was manually annotated in 161 of the 165 non-draft genes in the pham. 
This start site is 130958, which agrees with the auto annotation by Glimmer and GeneMark, and further validates the start site. /note=Location call: Based on the agreed start site provided by both Glimmer and GeneMark, 130958, which is validated by starterator, along with the consideration that the gene is conserved in both Momo (C1) and Phlegm (C1), this gene is most likely real. /note=Function call: Glycosyltransferase. HHPred had multiple significant hits, however, one hit listed as glycosyltransferase had a probability of 99.9%, an e-value of 8.8e-21, and a percent coverage of 96.41%. Additionally, the two top hits for NCBI Blast also listed the function as glycosyltransferase. However, the first hit had a percentage identity of 100%, a percentage coverage of 100%, and an e-value of 1.125e-164, indicating that the gene’s function is glycosyltransferase. Additionally, Phages DB lists two phages, BadAgartude and BeanWater with e-values of 1e-133, with a listed function of glycosyltransferase. /note=Transmembrane domains: Deep TMHMM does not list TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with the function and location call. 
However, PhagesDB BLAST also has strong hits with relevant functions, which could be considered as evidence and added in; also, please remember to select the drop-down menus. CDS 131619 - 132335 /gene="237" /product="gp237" /function="glycosyltransferase" /locus tag="JulietS_237" /note=Original Glimmer call @bp 131619 has strength 13.46; Genemark calls start at 131652 /note=SSC: 131619-132335 CP: yes SCS: both-gl ST: SS BLAST-Start: [galactosyl transferase [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 2.09676E-175 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.51, -4.698795016966896, yes F: glycosyltransferase SIF-BLAST: ,,[galactosyl transferase [Mycobacterium phage Cali] ],,YP_002224675,100.0,2.09676E-175 SIF-HHPRED: ATP synthase subunits region ORF 6; majastridin, ATPase operon, glycosyl transferase, Rossmann fold, sulphur SAD, TRANSFERASE; HET: GOL; 1.1A {Rhodobacter blasticus},,,2NXV_B,81.9328,99.9 SIF-Syn: glycosyltransferase, upstream gene is in [pham 456] function of glycosyltransferase, downstream gene is in [pham 64461] function is PnuC-like Nicotinamide riboside transporter, just like in phage ShiaLabeouf and Shaqnato. /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer calls the start at 131619, and GeneMark calls the start at 131652, ATG. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.699 is the final score associated with the original start site, but it is the fourth best final score on PECAAN. Z-score is 2.51, which is strong. I did not choose a different gene candidate because of stronger start site, start codon, starterator, and coding potential evidence. /note=Gap/overlap: 11 bp overlap. 
Slightly large overlap to be considered part of an operon, but the most reasonable candidate. /note=Phamerator: Pham 51469. Date 01/13/2023. It is conserved, found in ShiaLabeouf and Shrimp. /note=Starterator: Yes, there is a conserved start site choice. It is start number 33 with a base pair coordinate of 131619. 148 of 263 call site #33. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 131619 bp. /note=Function call: Glycosyl transferase. The PhagesDB database and BLAST and top NCBI BLAST hits say glycosyl transferase. Top HHpred hits also say glycosyl transferase (e-value < 10^-142). 1 CDD hit (Glyco_tranf_2_2 super family) for Glycosyltransferase like family was found. /note=Transmembrane domains: Neither TMHMM or TOPCONS or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with the location and function call however NCBI evidence should be checked and make sure to add information to the synteny box CDS 132332 - 132634 /gene="238" /product="gp238" /function="PnuC-like Nicotinamide riboside transporter" /locus tag="JulietS_238" /note=Original Glimmer call @bp 132332 has strength 4.56; Genemark calls start at 132332 /note=SSC: 132332-132634 CP: yes SCS: both ST: SS BLAST-Start: [gp230 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 3.0405E-66 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.131, -4.4713484557894425, no F: PnuC-like Nicotinamide riboside transporter SIF-BLAST: ,,[gp230 [Mycobacterium phage Bxz1] ],,NP_818280,100.0,3.0405E-66 SIF-HHPRED: NMN_transporter ; Nicotinamide mononucleotide transporter,,,PF04973.15,68.0,99.5 SIF-Syn: PnuC-like nicotinamide riboside transporter, upstream gene is in [pham 51469] function of glycosyltransferase, downstream gene is in [pham 64390] function is glycosyltransferase, just like in phage Ading and Audrick. 
/note=Primary Annotator Name: Rheinhardt, Jenna /note=Auto-annotation: Glimmer and GeneMark both call start site of 132332. Start codon of GTG. /note=Coding Potential: Self and host trained both have strong coding potential. Present in the forward strand on the second lane. /note=SD (Final) Score: -4.471 ; not the best final score available. Z-score of 2.131, not the best but still strong. /note=Gap/overlap: -4 ; overlap of 4 which could indicate the presence of an operon. /note=Phamerator: pham : 64461. 143 calls have the same bp length of 303 bp. Members come from clusters C, AM, DZ, DJ, BK, and a Singleton. Several of the phages, including Ading and Adlitam, have a note that the gene is called as a PnuC-like nicotinamide riboside transporter. /note=Starterator: Calls start 21 beginning with 132332 with 148 MA calling this start site as well. This call agrees with GeneMark and Glimmer. /note=Location call: Yes, the evidence above supports that the start site is real. The most likely start site is 132332. /note=Function call: PnuC-like nicotinamide riboside transporter ; Phagesdb BLAST has multiple matches with e-values of 5e-59 that call the function PnuC-like nicotinamide riboside transporter, including phages Amataga and Astraea. HHPRED has over 65% coverage hits for a nicotinamide riboside transporter with e-values of 1.9e-12 and 4.4e-11. NCBI BLAST shows 100% alignment and identity to a PnuC-like nicotinamide riboside transporter in phage Qbert. CDD does have a hit for a nicotinamide mononucleotide transporter but the alignment is 16.29% and 74% coverage with an e-value of 0.0098. /note=Transmembrane domains: DeepTMHMM predicted 3 TMRs. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call for this gene. 
CDS 132631 - 133293 /gene="239" /product="gp239" /function="glycosyltransferase" /locus tag="JulietS_239" /note=Original Glimmer call @bp 132631 has strength 14.86; Genemark calls start at 132631 /note=SSC: 132631-133293 CP: yes SCS: both ST: SS BLAST-Start: [glycosyltransferase [Mycobacterium phage Shrimp] ],,NCBI, q1:s1 100.0% 1.32198E-159 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.908, -2.9063687850157054, yes F: glycosyltransferase SIF-BLAST: ,,[glycosyltransferase [Mycobacterium phage Shrimp] ],,AGV99934,100.0,1.32198E-159 SIF-HHPRED: Envelope protein H3; H3, vaccinia virus, poxvirus, glycosyl transferase, VIRAL PROTEIN; HET: PG0, PDO, MOH, GOL, EDO; 1.9A {Vaccinia virus (strain Western Reserve)},,,5EJ0_A,61.3636,99.8 SIF-Syn: glycosyltransferase; upstream gene is in [pham 64461] function is PnuC-like nicotinamide riboside transporter, downstream gene is in [pham 423] function varies either capsid decoration protein or NKF, just like in phages Adlitam and ArcherS7. /note=Primary Annotator Name: Tseng, Kylie /note=Auto-annotation: Glimmer and Genemark both call the same start site at 132631; start codon: GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF on the forward strand only on line 1 (indicating that this is a forward gene). The chosen start site covers this ORF. Coding potential is found in both the host and self-trained Genemark. There is synteny with non-draft phages such as Pier and Pio. /note=SD (Final) Score: -2.906. This is a reasonable SD score. /note=Gap/overlap: There is an overlap of 4 bp, suggesting that this gene could be part of an operon. There are also reasonable gaps for other start site choices, but evidence points toward this being the best option. /note=Phamerator: Date of investigation: 1/20/23; Pham 64390; Yes, the pham is in other members that belong to Cluster C, which is the cluster JulietS belongs to (i.e. phages Ading and Adlitam). 
/note=Starterator: Yes, there is a conserved start choice. It is start number 55 with a base pair coordinate of 132631. Has 152 MA’s. Found in only 169/453 (37%) of genes in pham but all other evidence points toward this being the correct start site. This start site also agrees with Glimmer and Genemark’s call. E-values are good at -127. /note=Location call: Yes, the evidence suggests this gene is real. Start site 132631 is most likely. /note=Function call: glycosyltransferase; PhagesDB and NCBI all have hits for glycosyltransferase function (supporting evidence in phages CindyLou, Colt, and Daffodil with e-values of -127 in PhagesDB; identities of above 99%, coverage of 100%, and e-values of -159 in NCBI); CDD has no hits and HHPred has hits but low coverage but other databases are providing supporting evidence. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM; therefore it is not a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. CDS 133357 - 136812 /gene="240" /product="gp240" /function="hypothetical protein" /locus tag="JulietS_240" /note=Original Glimmer call @bp 133357 has strength 19.55; Genemark calls start at 133357 /note=SSC: 133357-136812 CP: yes SCS: both ST: SS BLAST-Start: [gp232 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 0.0 GAP: 63 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.141, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[gp232 [Mycobacterium phage Bxz1] ],,NP_818282,99.9131,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aunger, Sarah /note=Auto-annotation: Glimmer and GeneMark both call the start site at 133357 with a start codon of GTG. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is in the forward strand, indicating a forward reading gene. 
The start site of #133357 covers all of the coding potential regions on the forward strand, which supports the forward direction of this gene. Additionally, coding potential is found in both Host-Trained GeneMark and Self-Trained GeneMark graphs, which suggests that this is a potential gene. /note=SD (Final) Score: Even though the final score is not the most negative (SD = -2.276), the Z-score is the highest overall score (Z-score = 3.141), indicating that the autogenerated start site is the better of the options. Because Glimmer and GeneMark both call this start site, it is more likely to be the correct start site even though it does not have the most negative SD score. /note=Gap/overlap: There is an upstream gap of 63 bp, which seems reasonable when looking at the synteny of other phages classed in the C1 cluster against this gene. There is a reasonable gap of about 39 bp between this gene and the downstream gene, which is consistent with the 3456 bp gene length. /note=Phamerator: The gene was found to be in Pham 423 (01/11/2023), which is common in Cluster C1 phages, as previously seen in Phages Ading, Alice, and Ewok. There were very few listed functions, all of which were capsid decoration protein, but not enough to call a function. However, the base pair length was conserved at 3456 bp. /note=Starterator: There is a reasonable and highly conserved start site, examined on 01/11/2023, at (4, 133357), which was called by 159 of the 168 non-draft genes among the 171 total pham members. /note=Location call: The gene and the start site are both conserved for this gene. The high coding potential in the ORF indicates a real gene placed within the operon, and the evidence thus supports a start site at #133357, consistent with other phages within the C1 group. /note=Function call: The function of the gene is a capsid decoration protein. 
In the BLASTp on PhagesDB.org, it has a strong match against Quasimodo with an e-value of 0 and 100% positives, with the function of capsid decoration protein. Additionally, the NCBI BLASTp also indicates a function of capsid decoration protein with an e-value of 0 and a 100% match with phage Quasimodo. HHpred and CDD were both inconclusive in providing a function. This indicates a high probability that the function of this gene is capsid decoration protein. /note=Transmembrane domains: Both TOPCONS and TMHMM predict no transmembrane domains, thus this gene does not encode a membrane protein. DeepTMHMM also predicts no transmembrane domains; it places the entire protein outside the cell. /note=Secondary Annotator Name: Sandhu, Muskaan /note=Secondary Annotator QC: I agree with the function call and the location call of this gene. CDS 136852 - 138246 /gene="241" /product="gp241" /function="hypothetical protein" /locus tag="JulietS_241" /note=Original Glimmer call @bp 136852 has strength 17.09; Genemark calls start at 136852 /note=SSC: 136852-138246 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_239 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 0.0 GAP: 39 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.995, -2.9334385989924945, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_239 [Mycobacterium phage ScottMcG] ],,YP_002224236,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Fernandez, Mackenzie /note=Auto-annotation: Glimmer and GeneMark agree. Both call a start of 136852. ATG start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene, and covers the suggested start site. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.933. It is the best final score on PECAAN. Z-score is 2.995. 
/note=Gap/overlap: Gap of 39bp. The gap is conserved in other phages (Shaqnato, NoodleTree) and there are no ORFs longer than 120bp. /note=Phamerator: Pham: 457. Date 1/13/2023. Pham has 178 members and 14 phages are drafts. Conserved in ValleyTerrace and BeanWater. /note=Starterator: Start site 5 in Starterator was manually annotated in 161 of the 164 non-draft genes in the pham. Start 5 is 136852 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 136852. /note=Function call: NKF. All PhagesDB hits supported unknown function with e-values of 0 (ZygoTaiga, Wally, Teardrop). NCBI also supported unknown function with e-values of 0 and coverage of 100% for ScottMcG, Breeniome, and Ading. CDD had zero hits. HHpred had no significant hits (probability lower than 80% and score was less than 50 for all hits; no significant e-values). /note=Transmembrane domains: 0 predicted TMRs from Deep TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with the location and function calls but remember to include examples of other cluster C phages in the pham. 
Additionally I would check and list the phagesDB hits and NCBI hits that call the protein function unknown or hypothetical because those will strengthen your argument CDS 138257 - 139027 /gene="242" /product="gp242" /function="hypothetical protein" /locus tag="JulietS_242" /note=Original Glimmer call @bp 138257 has strength 18.93; Genemark calls start at 138257 /note=SSC: 138257-139027 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_240 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s1 100.0% 0.0 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.152, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_240 [Mycobacterium phage ScottMcG] ],,YP_002224237,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Geghamyan, Knar /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree that the start site is 138,257. The start codon is GTG. /note=Coding Potential: The ORF has good coding potential on the direct sequence and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.253 is the best SD (final) score because it is the highest (least negative) value, and has the smallest gap of all of the choices. Z-score is above 2. /note=Gap/overlap: While there is a 10 bp gap at start site 138257, it is the smallest of all of the options and is still under the threshold of a 50bp gap. /note=Phamerator: Gene found in pham 446 on 1/24/23. When compared to phages Bipolarisk, Megamind, and Shrimp, the pham in which this gene is most commonly annotated was found to be in other members of the same cluster C. There is synteny with other non-draft phages belonging to the same cluster. /note=Starterator: Start: 1 @138257 has 160 MA`s. The start number called the most often in the published annotations is 1, it was called in 160 of the 166 non-draft genes in the pham. 
/note=Location call: The evidence supports that this is a real gene, and the potential candidate start site at 138257 seems most likely. /note=Function call: NKF. The phagesdb BLAST hits with e-value < 10^-146 had no known function, and the top 2 NCBI BLAST hits were called minor tail proteins with 100% coverage, 99%+ identity, and E-values close to 0, but there was no synteny with other phages of the same cluster (Grungle, Shrimp), meaning there isn't sufficient evidence to label this as a minor tail protein. There were also no hits on CDD or HHpred. /note=Transmembrane domains: Neither TMHMM, deep TMHMM, nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Make sure to check transmembrane domains with DEEPTMHMM. CDS 139040 - 139795 /gene="243" /product="gp243" /function="hypothetical protein" /locus tag="JulietS_243" /note=Original Glimmer call @bp 139040 has strength 20.59; Genemark calls start at 139040 /note=SSC: 139040-139795 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein LRRHOOD_241 [Mycobacterium phage LRRHood] ],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.141, -2.2763497933341483, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein LRRHOOD_241 [Mycobacterium phage LRRHood] ],,ACU41734,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Melnyk, Mattie /note=Auto-annotation: Both Genemark and Glimmer call the start at 139040 with a start codon of GTG /note=Coding Potential: Both host-trained and self-trained genemark show coding potential from the start to end /note=SD (Final) Score: The final score is -2.276 which is the best final score on PECAAN /note=Gap/overlap: There is a gap of 12 bp, which is a very reasonable gap /note=Phamerator: As of 1/13/23 the pham is 64502 and has 179 members, 14 of which are drafts. 
It includes other cluster C1 phages such as BackyardAgain and Bread. /note=Starterator: JulietS calls the most annotated start (Start: 13 @139040) and it has 162 manual annotations /note=Location call: This is likely a real gene with a start at 139040 /note=Function call: NKF; The phagesDB hits call function unknown with e-values of 1e-141. NCBI hits call hypothetical protein with (identity: 100, coverage: 100, e-value: 0) and (identity: 99.6, coverage:100, e-value: 0). There are no relevant HHpred or CDD hits. /note=Transmembrane domains: Deep TMHMM predicts no TMDs and that the protein is extracellular. /note=Secondary Annotator Name: Pham, Truc /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 139834 - 141264 /gene="244" /product="gp244" /function="lysin A" /locus tag="JulietS_244" /note=Original Glimmer call @bp 139834 has strength 15.63; Genemark calls start at 139834 /note=SSC: 139834-141264 CP: yes SCS: both ST: SS BLAST-Start: [lysin A [Mycobacterium phage CharlieB] ],,NCBI, q1:s1 100.0% 0.0 GAP: 38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.152, -2.3335289980954053, no F: lysin A SIF-BLAST: ,,[lysin A [Mycobacterium phage CharlieB] ],,YP_010057623,100.0,0.0 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase; amidase, zinc binding, cell wall degradation, endolysine, hydrolase; HET: PO4, GOL; 1.21A {Clostridium intestinale},,,6SSC_A,30.4622,99.5 SIF-Syn: This is a lysin A protein. Upstream gene is NKF, downstream gene is holin, just like in phage Megamind. /note=Primary Annotator Name: Gurunathan, Vibha /note=Auto-annotation: Both Glimmer and Genemark call the start site at 139834, at start codon GTG. /note=Coding Potential: This gene has strong coding potential in the top strand only, and the chosen start site covers all of the coding potential. 
/note=SD (Final) Score: The start site has a final score of -2.334 and a Z score of 3.152, which are the best final scores and z-scores of all potential start sites listed. /note=Gap/overlap: There is a gap of 38 bp, which is not ideal, but it is acceptable. /note=Phamerator: Pham 68651. Date 1/26/23. It is conserved; found in Ading_253 and Adlitam_250. Function was listed as a lysin A on the phams database. /note=Starterator: Start site 31@139834 has 157 manual annotations. No other start sites have manual annotations. /note=Location call: Due to the above evidence, the location is called at start site 139834. /note=Function call: PhagesDB data shows genes in the same pham having the lysin A function. NCBI blastp shows significant hits with e-values of 0 with this gene being either an endolysin or lysin A. CDD showed domain hits with Amidase_2. HHPred shows lysin A as well with low e-values. Thus, the function is called as lysin A. /note=Transmembrane domains: Deep TMHMM shows no predicted TMRs, so this protein is likely not a membrane protein. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The Pham I see is defined 68651 (Date 1/26/2023) - potentially the Pham just updated as I couldn`t open Starterator when I was QCing this gene. So look to update the Pham. For the Starterator section, make sure to include how many total manual annotations are there for that start site (i.e. 157 out of what total). 
CDS 141261 - 141671 /gene="245" /product="gp245" /function="holin" /locus tag="JulietS_245" /note=Original Glimmer call @bp 141261 has strength 19.28; Genemark calls start at 141261 /note=SSC: 141261-141671 CP: yes SCS: both ST: SS BLAST-Start: [gp237 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.37839E-87 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.986, -2.7423981586358774, yes F: holin SIF-BLAST: ,,[gp237 [Mycobacterium phage Bxz1] ],,NP_818287,99.2647,1.37839E-87 SIF-HHPRED: Phage_holin_7_1 ; Mycobacterial 2 TMS Phage Holin (M2 Hol) Family,,,PF16081.8,97.0588,100.0 SIF-Syn: Holin, upstream gene is lysin A, and downstream gene is lysin B, just like in phage McGee. /note=Primary Annotator Name: Hines, Kia /note=Auto-annotation: Glimmer and GeneMark both call 141261 as start. /note=Coding Potential: There is very good coding potential on the forward strand, indicating this is a forward gene. The chosen start site covers all of the coding potential. /note=SD (Final) Score: The final score is the best option at -2.742 and the z score is the highest at 2.986. /note=Gap/overlap: There is an overlap of -4 bp which indicates that this gene is part of an operon. The length of the gene is 411 which is an adequate length. /note=Phamerator: Pham number is 511 as of 1/08/23. It is conserved, found in Ronan_247 and ET08_237. /note=Starterator: Start number (@141261) is 1, manually annotated 157 times. This is the same as the conserved start site. There are 171 members of the pham and 157 call the same conserved start site. /note=Location call: Based on the above evidence, this is a real gene and the start site is 141261. /note=Function call: Predicted function is Holin. The top hits for non draft genes on phagesdb, NCBI Blast, and HHpred all call holin as the function of this gene with e-values of 3e-69, 1.37e-87, and 2.7e-29 respectively. /note=Transmembrane domains: deepTMHMM predicts two TMDs. 
Based on this evidence, this gene can be assumed to have a TMD and is therefore a “membrane protein”, more specifically with a function call of holin protein. The top hits on all three sites also have good percent coverage (high 90s) and probability (100%). /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation; all of the evidence has been considered. CDS 141668 - 142696 /gene="246" /product="gp246" /function="lysin B" /locus tag="JulietS_246" /note=Original Glimmer call @bp 141668 has strength 15.01; Genemark calls start at 141668 /note=SSC: 141668-142696 CP: yes SCS: both ST: SS BLAST-Start: [gp242 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.49, -6.561440757437232, no F: lysin B SIF-BLAST: ,,[gp242 [Mycobacterium phage Cali] ],,YP_002224684,100.0,0.0 SIF-HHPRED: Gene 12 protein; alpha/beta sandwich, CELL ADHESION; 2.0A {Mycobacterium phage D29},,,3HC7_A,71.6374,100.0 SIF-Syn: The function of the gene upstream is Holin and the gene downstream is a Terminase in phages CharlieB and Alice, which shows synteny with phage JulietS. /note=Primary Annotator Name: Luk, Jarrett /note=Auto-annotation: Forward gene. Glimmer called the start site at 141668, and GeneMark called the start site at 141668. Start codon is ATG. /note=Coding Potential: The coding potential is found in the forward frame. The gene covers all of the coding potential in the Self-Trained GeneMark and Host-Trained GeneMark. /note=SD (Final) Score: -6.561; this site is the best choice because it has the smallest overlap. 
Z score is 1.49, which is smaller than 2; however, the overlap is conserved among other phages from cluster C1, therefore this site would be a better candidate than other options. /note=Gap/overlap: -4, which indicates that the gene can be part of an operon. The overlap is conserved in Alice and Ading from the C1 cluster. /note=Phamerator: Pham 66928, 01/19/23; gene conserved in phages Ading and Adlitam from cluster C1. /note=Starterator: 01/13/23. Manually annotated 290/1732 non-draft in this pham. Start 156 (156, 141698) is manually called by 290 others. However, the gene is evidently part of an operon. The functions of the two genes upstream of the current gene are called holin and lysin A. It is reasonable that the gene would be part of an operon, as the function of the gene is lysin B (shown in Function call below), which usually works together with holin and lysin A. Therefore the start site at (132, 141668) with 154 MAs and an overlap of 4 bp would be a better candidate. The evidence is also in line with the auto-annotation made by Glimmer and GeneMark. /note=Location call: It is a real gene and the start site is Start 132 at 141668 based on the evidence above. /note=Function call: Lysin B. The top two hits on phagesDB BLAST (e-value = 0 for phages Stubby and Ronan) indicate the function to be lysin B. The top hits on NCBI BLAST (e-value = 0 for phages Cali and Rizal, with 99% identity and 100% coverage) also indicate the function to be lysin B. The top hit on HHPRED (e-value = 9e-34, 100% probability, and over 70% coverage) also indicates the function to be lysin B. There are no significant hits on CDD. /note=Transmembrane domains: No transmembrane domain was predicted by TMHMM, TOPCONS, or DeepTMHMM, which indicates that the gene does not encode a membrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call for this gene. 
CDS 142699 - 145722 /gene="247" /product="gp247" /function="terminase" /locus tag="JulietS_247" /note=Original Glimmer call @bp 142699 has strength 11.76; Genemark calls start at 142699 /note=SSC: 142699-145722 CP: yes SCS: both ST: SS BLAST-Start: [terminase [Mycobacterium phage Pier] ],,NCBI, q1:s1 100.0% 0.0 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.601, -3.4080892535965868, no F: terminase SIF-BLAST: ,,[terminase [Mycobacterium phage Pier] ],,AVJ48727,99.9007,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,24.429,99.9 SIF-Syn: Terminase, upstream gene is lysin B, downstream gene is NKF, like in Grungle and Ading /note=Primary Annotator Name: Okahata, Leila /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 142699 bp. Start codon ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The chosen start site includes all of the coding potential. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.408. It is the best final score on PECAAN. /note=Gap/overlap: 2 bp gap. This gap is very small and reasonable (less than 50 bp). /note=Phamerator: Pham 65633. Date 1/19/2023. It is conserved and found in Grungle, Daffodil, and ParkTD, which are all in the same cluster as JulietS (C). /note=Starterator: Start site 12 in Starterator was manually annotated in 156/168 non-draft genes in this pham. Start 12 is 142699 bp in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and likely has a start site at 142699 bp. /note=Function call: Terminase. The top two phagesDB BLAST hits have the function of a terminase (E-value = 0), and the top two NCBI BLAST hits also have the function of a terminase (E-value = 0.0, >99.90% Identity, 100% coverage). 
CDD had two significant hits, both related to the hedgehog/intein domain with a high E-value (<2.36e-06) but of low identity (<34%) and low coverage (<21.3235). The top three HHpred hits were of a terminase function (>99.82% probability, E-value < 2.9e-18, 24.1311%-24.6276% coverage). /note=Transmembrane domains: Neither TMHMM, TOPCONS, or DeepTMHMM predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call for this gene. CDS 145795 - 145923 /gene="248" /product="gp248" /function="membrane protein" /locus tag="JulietS_248" /note=Original Glimmer call @bp 145795 has strength 15.42; Genemark calls start at 145795 /note=SSC: 145795-145923 CP: yes SCS: both ST: SS BLAST-Start: [gp240 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 2.97612E-21 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.396, -3.9166971242758706, yes F: membrane protein SIF-BLAST: ,,[gp240 [Mycobacterium phage Bxz1] ],,NP_818290,100.0,2.97612E-21 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hegde, Priya /note=Auto-annotation: Glimmer and GeneMark call a start site of 145795 (ATG codon). /note=Coding Potential: Good coding potential in GeneMark Host and Self. Coding potential in this ORF is on the forward strand only. /note=SD (Final) Score: -3.917. It is the best final score on PECAAN. /note=Gap/overlap: There is a 72bp gap with the previous gene, which is reasonable. The gap is conserved, found in phages JPickles and JustHall. /note=Phamerator: This gene is in pham 525. It is conserved, found in phages JPickles and JustHall. /note=Starterator: Start 1 at 145795 has 157 manual annotations. It is the most manually annotated start site on Starterator. /note=Location call: This gene likely starts at 145795. This is supported by a high final score, Z-score > 2, and a high number of manual annotations in Starterator. 
/note=Function call: NKF in PhagesDB BLASTp; top hits were phages Ghost and Grungle (e-values of 9e-19). NKF in NCBI BLAST; top hits were phages Bxz1 (2.98e-21) and Gizmo (9.75e-21). No convincing evidence in HHPred. No CDD hits. /note=Transmembrane domains: One TMD predicted in DeepTMHMM (20 aa long), so this protein is a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. However, I would include more information under the gap/overlap section, perhaps check for similar gaps in other non-draft phages. CDS 145920 - 146237 /gene="249" /product="gp249" /function="hypothetical protein" /locus tag="JulietS_249" /note=Original Glimmer call @bp 145944 has strength 13.87; Genemark calls start at 145944 /note=SSC: 145920-146237 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein KHO59_gp083 [Mycobacterium phage Cane17] ],,NCBI, q1:s1 100.0% 1.50954E-69 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.559, -7.682388013302671, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO59_gp083 [Mycobacterium phage Cane17] ],,YP_010057397,100.0,1.50954E-69 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lee, Amber /note=Auto-annotation: Glimmer and GeneMark both call the start site at 145944 bp. The start codon is ATG. Preference was given to the most manually annotated start site at 145920 bp since the Final and Z-scores did not differ much from the start suggested by Glimmer and Genemark but the -4 overlap suggests it is part of an operon and it is the most manually annotated start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The start site 145920 corresponds to a Final Score of -7.682 which is the worst final score. It also has a Z-score of 0.559. 
This start site was still chosen because it is the most manually annotated start site, is highly conserved in the pham, and because it has a 4 bp overlap with the previous gene which suggests that it is part of an operon. /note=Gap/overlap: There is a 4 nucleotide overlap which suggests that this gene is part of an operon. This overlap is seen in other non-draft phages like Ading (C1) and Alice (C1). /note=Phamerator: pham: 529. Date 1/19/2023. It is conserved; found in Ading (C1) and Ading (C1). /note=Starterator: Start site 3 in Starterator was manually annotated in 102/157 non-draft genes in this pham and is the most manually annotated start site. Start 3 is 145920 bp in JulietS. This evidence does not agree with the site predicted by Glimmer and GeneMark but is considered the best choice. Start 3 suggests that this gene is part of an operon and the Final Score and Z-score is about the same as those of the auto-annotated start site. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 145920 bp. Starterator does not agree with Glimmer and Genemark. Starterator was given preference. /note=Function call: NKF. E-values too large and not enough coverage in HHPRED. Many strong hits (e-value ~0) are observed in PhagesDB and NCBI BLAST, but no known function. No hits in CDD. Not a membrane protein because it wasn’t called by TMHMM, Deep TMHMM, or TOPCONS. /note=Transmembrane domains: Neither TMHMM, Deep TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Pham, Truc /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
CDS 146234 - 146479 /gene="250" /product="gp250" /function="hypothetical protein" /locus tag="JulietS_250" /note=Original Glimmer call @bp 146234 has strength 13.3; Genemark calls start at 146234 /note=SSC: 146234-146479 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M182_gp094 [Mycobacterium phage Astraea] ],,NCBI, q1:s1 100.0% 1.32823E-50 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.158, -5.243579400788688, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M182_gp094 [Mycobacterium phage Astraea] ],,YP_008061716,100.0,1.32823E-50 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qi, Haocheng /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site, 146234. The start codon is ATG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host, there is only forward potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -5.244; this is not the best final score, but the other factors also support this start. /note=Gap/overlap: -4, very reasonable and likely an operon. /note=Phamerator: Pham 518. Date 01/13/23. It is conserved; found in 171 other non-draft phages, such as Ading or Bread. /note=Starterator: Start site 1 in Starterator was manually annotated in 156/157 non-draft genes in this pham. Start 1 is 146234 in JulietS. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 146234. /note=Function call: No known function. The top 2 phagesdb BLAST hits (phages ZygoTaiga and Zeenon) have unknown function with e-values of 3e-40, and the NCBI BLAST hits also have no known function with an e-value of 1e-50. There are also no hits in CDD, and the highest probability in HHpred is 79.2%, which is not reliable, so overall there is no known function for this gene. 
/note=Transmembrane domains: Neither TMHMM, TOPCONS, nor DeepTMHMM predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with the location and function of this gene. CDS 146476 - 146697 /gene="251" /product="gp251" /function="hypothetical protein" /locus tag="JulietS_251" /note=Original Glimmer call @bp 146476 has strength 11.37; Genemark calls start at 146476 /note=SSC: 146476-146697 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KHO63_gp087 [Mycobacterium phage QBert] ],,NCBI, q1:s1 100.0% 3.25223E-46 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.814, -3.0425830684175623, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KHO63_gp087 [Mycobacterium phage QBert] ],,YP_010058315,100.0,3.25223E-46 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Uyemura, Antonio /note=Auto-annotation: Glimmer and GeneMark call the start site as 146476 with a start codon of GTG. /note=Coding Potential: Coding potential is only in the forward strand, indicating it to be a forward gene. Both GeneMark self and host trained show good coding potential. /note=SD (Final) Score: -3.043; this is the best score in PECAAN. Z-score is 2.814. /note=Gap/overlap: -4bp. This is a highly favorable overlap because it is evidence of an operon. /note=Phamerator: Pham 555, 1/23/23. It is conserved and found in Ading (C1), Adlitam (C1), and Alice (C1). /note=Starterator: Start: 2 @ 146476. This was manually annotated in 156/156 non-draft genes. This agrees with Glimmer and GeneMark. /note=Location call: Based on the above data, this is a real gene and starts at 146476. /note=Function call: The function is unknown. The top three phagesdb BLAST hits (BeanWater (C1), Grungle (C1), Lota (C1)) mark the function as unknown (E-value 1e-36). Additionally, the first 2 NCBI BLAST hits also call the function as a hypothetical protein (100% coverage, >98% identity, and E-values <10^-46). 
HHPRED provides no convincing evidence nor did anything show up for CDD. /note=Transmembrane domains: DeepTMHMM predicts it to be inside the cell with 100% probability. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. However, in the Auto-Annotation section, make sure to include the specific start codons that are called (in this case GTG). Also indicate in the Gap/Overlap that the -4 may be evidence of an operon. CDS 146728 - 147249 /gene="252" /product="gp252" /function="polynucleotide kinase" /locus tag="JulietS_252" /note=Original Glimmer call @bp 146728 has strength 15.89; Genemark calls start at 146728 /note=SSC: 146728-147249 CP: yes SCS: both ST: SS BLAST-Start: [gp250 [Mycobacterium phage Spud] ],,NCBI, q1:s1 100.0% 7.29125E-123 GAP: 30 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.814, -3.3136498407041004, yes F: polynucleotide kinase SIF-BLAST: ,,[gp250 [Mycobacterium phage Spud] ],,YP_002224469,100.0,7.29125E-123 SIF-HHPRED: HAD_PNKP-C; C-terminal phosphatase domain of T4 polynucleotide kinase/phosphatase (PNKP) and related phosphatases.,,,cd07502,84.9711,99.8 SIF-Syn: Polynucleotide kinase called for both JulietS and Latch (Pham 59177), upstream gene has NKF in JulietS and not called in Latch (Pham 555), has NKF in JulietS and not called in Latch (Pham 537), all genes are from the same pham. /note=Primary Annotator Name: Barrera, Alexis /note=Auto-annotation: Glimmer and Genemark. Both call the start located at 146728. The start codon is ATG, which is common (about half of all genes have this start). /note=Coding Potential: Coding potential in this reading frame is in the forward direction only, which indicates this is a forward gene. Coding potential is found in both GeneMark Self and Host. GeneMark Self includes all atypical coding potential, however it seems that a small part of the typical coding potential is not included near the start. 
All of the coding potential is included in GeneMark Host. /note=SD (Final) Score: The z score is 2.814 and the final score is -3.314, both have the highest value out of all listed start sites. /note=Gap/overlap: 30 bps gap upstream of the gene and a 60 bp gap downstream of the gene. This gene and these gaps, both upstream and downstream of the gene, are conserved in phages Latch, Alice, and Cali. There is no other coding potential in these regions. /note=Phamerator: Pham: 59177. Date 01/15/23. The gene is conserved in phages JustHall and Khaleesi which are in the same cluster as JulietS. /note=Starterator: Start site 35 is found in 188 of 263 of genes in pham and was called for 163 of 236 non-draft phage genomes in the pham. It is the most annotated start site and is called 94.7% of the time when present. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 146728 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Polynucleotide kinase. The top two hits for HHpred have a function of polynucleotide kinase (probability: 98.8, E-value: 2.8e-18 and 1.3e-16, % coverage: 84.9711 and 88.4393). In this pham, the Phagesdb Function Frequency also states that there is a frequency of 100% being a polynucleotide kinase (90 phages called) in subcluster C1. PhagesDB BLASTp shows many hits for phages with the function of polynucleotide kinase, all with E-values of 2e-97. NCBI BLASTp has seven of the top ten hits with the function of polynucleotide kinase, (all % coverage: 100, E-value ranging from e-121-e-123). There is one CDD hit for polynucleotide kinase (% identity: 33.7931, % alignment: 48.9655, % coverage: 83.815, E-value: 1.08306e-14). /note=Transmembrane domains: DeepTMHMM predicts no TMRs. This evidence indicates that this is not a membrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function calls for this gene. 
The synteny box just needs to be filled out because a function was called for this gene. CDS 147309 - 147545 /gene="253" /product="gp253" /function="hypothetical protein" /locus tag="JulietS_253" /note=Original Glimmer call @bp 147309 has strength 10.46; Genemark calls start at 147330 /note=SSC: 147309-147545 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein M181_gp097 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 7.79255E-51 GAP: 59 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.58, -3.513874779476501, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp097 [Mycobacterium phage Gizmo] ],,YP_008061030,100.0,7.79255E-51 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kristianto, Luke /note=Auto-annotation: Glimmer calls the gene’s start site at 147309 bp. GeneMark calls the gene’s start site at 147330 bp. In both cases, the start codon is ATG, which has a high probability of being used as a start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only on the third frame, indicating that this is a forward gene. The ORF has reasonable coding potential and both called start sites do capture all of the coding potential. /note=SD (Final) Score: For the called start site 147309 bp, the final score is the best option at -3.514 and the z-score is the best option at 2.58. This provides strong evidence that this called start site is the real start site. /note=Gap/overlap: The gap/overlap is fairly large at 59 bp. However, this gap is justified because it is the smallest possible gap out of the potential start sites and there is no coding potential within the gap. Furthermore, this gap is conserved and shows up in phage Ading from cluster/subcluster C1. /note=Phamerator: Pham: 537. Date 1/24/2023. It is conserved and found in Ading (C1) and Adlitam (C1). /note=Starterator: Start number 4 in Starterator was manually annotated in 123/157 non-draft genes in this pham. 
Start number 4 is 147309 bp in phage JulietS. This is likely the start site because it was called the most and was conserved in 135/171 (78.9%) of genes in the pham. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 147309 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: NKF. The top two PhagesDB BLAST hits have the function of “function unknown” (100% identity, E-value = 2e-41), and the top two NCBI BLAST hits have the function of “hypothetical protein” (100%/98.7179% identity, E-value = 7.79255e-51/2.44218e-50). Results from CDD and HHpred were not informative: either no results came up, or only unlikely results with unreasonably high e-values came up. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. DeepTMHMM agrees (1.0 probability inside throughout). Therefore, it is likely not a membrane protein. /note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: Fix the location call in the notes to 147309; the start site itself is called correctly. I also agree with the function call. CDS 147545 - 147988 /gene="254" /product="gp254" /function="hypothetical protein" /locus tag="JulietS_254" /note=Original Glimmer call @bp 147545 has strength 4.23; Genemark calls start at 147545 /note=SSC: 147545-147988 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein M180_gp092 [Mycobacterium phage ArcherS7] ],,NCBI, q1:s1 100.0% 3.49503E-104 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.986, -3.2535385006449706, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M180_gp092 [Mycobacterium phage ArcherS7] ],,YP_008061488,100.0,3.49503E-104 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Amay /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 147545. /note=Coding Potential: Coding potential in this ORF is in the forward direction only, indicating that this is a forward gene. 
Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -3.254. This is the best final score on PECAAN. /note=Gap/overlap: There is an overlap of 1 bp, suggesting this gene may be part of an operon. /note=Phamerator: pham: 928. Date 1/11/2023. It is conserved; found in Bigswole (C) and Bangla1971 (C). /note=Starterator: Start site 2 in Starterator was manually annotated in 98/99 non-draft genes in this pham. Start 2 is 147545 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 147545. /note=Function call: NKF. BLASTp hits only included hypothetical proteins. CDD returned no hits. The top BLAST hit was a hypothetical protein with an e-value of 3.49e-104. HHPRED did not return any significant hits. /note=Transmembrane domains: DeepTMHMM doesn't predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. However, I would check DeepTMHMM to ensure this gene does not encode a membrane protein. 
CDS 147985 - 149115 /gene="255" /product="gp255" /function="serine/threonine kinase" /locus tag="JulietS_255" /note=Original Glimmer call @bp 147985 has strength 12.99; Genemark calls start at 147985 /note=SSC: 147985-149115 CP: yes SCS: both ST: SS BLAST-Start: [gp251 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.762, -3.7234890726758456, yes F: serine/threonine kinase SIF-BLAST: ,,[gp251 [Mycobacterium phage Cali] ],,YP_002224693,100.0,0.0 SIF-HHPRED: Serine/threonine-protein kinase PknA; PknA, kinase, drug target, Mtb, TRANSFERASE; 2.03A {Mycobacterium tuberculosis},,,4OW8_A,72.8723,100.0 SIF-Syn: /note=Primary Annotator Name: Sandhu, Muskaan /note=Auto-annotation: Glimmer and GeneMark call the same start site: 147985. Start codon is ATG. /note=Coding Potential: There is coding potential present in ORF 1 of the direct sequence (as expected for a forward gene) in the Host-Trained & Self-Trained Genemark. Start site 128698 covers all of the coding potential. /note=SD (Final) Score: -3.723. This is not the best final score (the best is -3.098), but it has the highest Z-score (2.762) and it corresponds to the start site reported by Glimmer and GeneMark. /note=Gap/overlap: There is a -3 bp gap upstream of the gene and a 36 bp gap downstream of the gene. The gap is reasonable because there is no coding potential present to fill in the gaps in Host-Trained and Self-Trained Genemark, and it falls within the reasonable limit of less than 50 bp. /note=Phamerator: Pham 527 as of 1/19/23. It is conserved, found in Alice (C1) and Ava3 (C1). /note=Starterator: Most annotated start site is 4; 157/157 of non-draft genes called start site 4. Start 4 is @ 147985 bp in JulietS. Evidence agrees w/ site predicted by Glimmer and GeneMark. /note=Location call: Start site 4 @ 147985 bp. /note=Function call: Serine/threonine kinase. 
The top three PhagesDB BLAST hits have the function of serine/threonine kinase (score 768 and E-value = 0) and the top three NCBI BLAST hits also have the function of serine/threonine kinase (100% coverage, >98% identity, and E-value = 0). HHpred had relevant hits for serine/threonine kinase as well (100% probability, >70% coverage, E-value < 5*10^-34). CDD had one hit for a serine-threonine specific protein kinase which did not show up on PECAAN (E-value = 1*10^-16). /note=Transmembrane domains: DeepTMHMM does not show any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with the function and location call. For the -4 gap, I believe it is a 4 bp overlap and may be considered part of an operon, which could offset the final score. CDS 149151 - 150932 /gene="256" /product="gp256" /function="hypothetical protein" /locus tag="JulietS_256" /note=Original Glimmer call @bp 149151 has strength 14.99; Genemark calls start at 149151 /note=SSC: 149151-150932 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_IOTA_259 [Mycobacterium phage Iota] ],,NCBI, q1:s1 100.0% 0.0 GAP: 35 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.295, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_IOTA_259 [Mycobacterium phage Iota] ],,QAY08507,100.0,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wu, Angus /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 149151, and the called start codon is ATG. /note=Coding Potential: The gene has good coding potential with the host-trained GeneMark on ORF 3. The coding potential increases near the potential start site and stays relatively consistent/high until the stop codon where it tapers off. The gene has even better coding potential with the self-trained GeneMark on ORF 3, where the coding potential rises at the start site and plateaus until the stop site. 
Finally, the start and stop sites also encompass all of the coding potential. /note=SD (Final) Score: The Final Score is -1.954, and the Z score is 3.295. The final score for this start site has the best sequence match (highest final score), and also has the highest Z-score of the possible start site candidates. /note=Gap/overlap: There is a 35 base pair gap with the gene upstream, and this is reasonable/acceptable because the gap isn’t too long (<100 bp). This chosen start site does not have the longest open reading frame, but the gene will still be 1782 bp long, which is good. /note=Phamerator: The gene is found in Pham 1693 as of 1/15/23. The gene is conserved in other members of subcluster C1, and I used phages Stubby and Yassified for comparison. /note=Starterator: A start site choice exists that is conserved among members of the pham, and corresponds to start site 2 and 150932. 39/50 non-draft genomes in the pham also call this start site, and it is the most manually annotated site. /note=Location call: The available evidence suggests that the start site is 150932. The gene appears to be real, and this proposed start site covers all the coding potential, and has a good final score and z-score. It is the most commonly called site and called 79% of the time when present. /note=Function call: The function of this protein is unknown (NKF). All results from PhagesDB BLASTp show hits with 100% identity, 100% alignment, and 100% coverage, with e-value 0, and those proteins have no known function. Similarly on NCBI BLASTp, e-values are 0, and while there are 2 hits with function HNH endonuclease, the vast majority of the genes have no known function. /note=There were no significant hits on CDD with good coverage, and while there were hits on HHpred, the significant hits were not consistent in function and conflict with other evidence. /note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicted any TMDs; therefore, this is not a membrane protein. 
/note=Secondary Annotator Name: Melnyk, Mattie /note=Secondary Annotator QC: I agree with the function and location call of this gene. CDS 150929 - 151195 /gene="257" /product="gp257" /function="hypothetical protein" /locus tag="JulietS_257" /note=Original Glimmer call @bp 150929 has strength 10.74; Genemark calls start at 150929 /note=SSC: 150929-151195 CP: no SCS: both ST: NI BLAST-Start: [gp253 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 2.88305E-57 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.159, -4.98464302562818, no F: hypothetical protein SIF-BLAST: ,,[gp253 [Mycobacterium phage Cali] ],,YP_002224695,100.0,2.88305E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Davis, Kayla /note=Auto-annotation: Both Glimmer and GeneMark list a start site of 150929, which corresponds to a start codon of ATG. /note=Coding Potential: There looks to be good coding potential in this gene as both the Host-trained and the Self-trained GeneMark appear to show complete direct sequences. This gene shares synteny with BadAgartude and Atlantean. /note=SD (Final) Score: The chosen Z-score is 2.159, with a final score of -4.985. Though these are not the best overall Z-score and final score, they correspond to a favorable overlap of -4. /note=Gap/overlap: There is an overlap of 4 base pairs with the previous gene. /note=Phamerator: As of 02/07/23, this gene is in pham 627. Other phages in this pham are Daffodil, CindyLou and NoodleTree. /note=Starterator: As of 01/27/23, the most annotated start site was 5. It was called in 143 out of 144 non-draft genes in the pham. JulietS has a start number of 5, with 143 MAs. This corresponds to a start site of 150929, which is the same as the start site called by Glimmer and GeneMark. /note=Location call: This is likely a real gene with a start site of 150929. /note=Function call: Based on the evidence, this protein is likely NKF. CDD turned up 0 relevant hits. 
There were several hits on HHpred; however, none of the hits had e-values low enough to claim that these functions apply to this protein. The lowest e-value was 37, which is not meaningful evidence. NCBI Blast had several high identity matches along with very low e-values (3e-57 - 2e-54 for example) that listed this as a hypothetical protein. /note=Transmembrane domains: DeepTMHMM predicted 0 TMRs, meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 151192 - 152520 /gene="258" /product="gp258" /function="adenylosuccinate synthetase PurA-like" /locus tag="JulietS_258" /note=Original Glimmer call @bp 151192 has strength 16.69; Genemark calls start at 151192 /note=SSC: 151192-152520 CP: yes SCS: both ST: SS BLAST-Start: [adenylosuccinate synthetase [Micromonospora trujilloniae]],,NCBI, q8:s2 97.5113% 1.00485E-92 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.148, -4.497860114175419, no F: adenylosuccinate synthetase PurA-like SIF-BLAST: ,,[adenylosuccinate synthetase [Micromonospora trujilloniae]],,WP_204024076,56.1033,1.00485E-92 SIF-HHPRED: c.37.1.10 (A:) Adenylosuccinate synthetase, PurA {Malaria parasite (Plasmodium falciparum) [TaxId: 5833]} | CLASS: Alpha and beta proteins (a/b), FOLD: P-loop containing nucleoside triphosphate hydrolases, SUPFAM: P-loop containing nucleoside triphosphate hydrolases, FAM: Nitrogenase iron protein-like,,,SCOP_d1p9ba_,98.19,100.0 SIF-Syn: PurA-like adenylosuccinate synthetase is called for both JulietS and StephanieG (Pham 503). Both the upstream and downstream genes to this in both phages are NKF genes. /note=Primary Annotator Name: Pham, Truc /note=Auto-annotation: The two auto-annotation algorithms, Glimmer and GeneMark, both call the start of this gene at 151192 with the start codon of GTG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
Coding potential is found both in GeneMark Self and Host and all of the ORF is included with the 151192 start site. /note=SD (Final) Score: The SD (Final) Score for this start site is -4.498 and the Z-score is 2.148 on PECAAN. Neither of these is the best possible score, however, it is still favorable. /note=Gap/overlap: This gene has an overlap of -4 base pair with the previous gene. This is the best possible start site since every other possible option would greatly increase the size of the gap, which will potentially require the addition of another gene. This gap does not indicate that this gene is part of an operon because the preferred start site has a favorable RBS score (Z-score) compared to other potential starts, but not necessarily the best. /note=Phamerator: This gene belongs to pham number 501 as of 1/13/2023. The gene is conserved in phages of this cluster (C) like FoxtrotP1 and Gabriel. Many members of this family have their function listed as a PurA-like adenylosuccinate synthetase, so there is a possibility that this gene also shares this function. /note=Starterator: Start site 6 is most often called as it was manually annotated in 157/157 non-draft genes in the pham. Start 6 is 151192 in JulietS. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 151192. /note=Function call: PurA-like adenylosuccinate synthetase. The top three phagesdb BLAST hits are PurA-like adenylosuccinate synthetase (E-value <0), and the top three NCBI BLAST hits also indicate PurA-like adenylosuccinate synthetase function for this gene (96+% coverage, 45+% identity, and E-value <10^-92). HHpred’s top hit also indicated PurA-like adenylosuccinate synthetase function (100% probability, 98% coverage, and E-value <0). CDD’s top hits also indicated PurA-like adenylosuccinate synthetase function (100% coverage, 99+% identity, and E-value <0). 
/note=Transmembrane domains: Neither DeepTMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Hegde, Priya /note=Secondary Annotator QC: I agree with the location and function call for this gene. The synteny box just needs to be filled out because a function was called for the gene. I would also check DeepTMHMM to verify there are no TMDs present per the updated guidelines. CDS 152520 - 152606 /gene="259" /product="gp259" /function="hypothetical protein" /locus tag="JulietS_259" /note= /note=SSC: 152520-152606 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein M181_gp091 [Mycobacterium phage Gizmo] ],,NCBI, q1:s1 100.0% 9.91506E-8 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -3.3503726477288405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein M181_gp091 [Mycobacterium phage Gizmo] ],,YP_008061036,100.0,9.91506E-8 SIF-HHPRED: SIF-Syn: CDS 152603 - 152839 /gene="260" /product="gp260" /function="hypothetical protein" /locus tag="JulietS_260" /note=Original Glimmer call @bp 152603 has strength 14.23; Genemark calls start at 152603 /note=SSC: 152603-152839 CP: yes SCS: both ST: SS BLAST-Start: [gp251 [Mycobacterium phage Bxz1] ],,NCBI, q1:s1 100.0% 1.75174E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.776, -5.2756940258553, no F: hypothetical protein SIF-BLAST: ,,[gp251 [Mycobacterium phage Bxz1] ],,NP_818301,100.0,1.75174E-45 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deng, Yiran /note=Auto-annotation: Glimmer and GeneMark both call the start site at 152603 with start codon ATG. /note=Coding Potential: Coding potential in this ORF is found on the forward direct sequence, and no switch in orientation is observed in either host-trained or self-trained Genemark. All of the coding potential is included in the ORF and covered by the selected start site. 
/note=SD (Final) Score: The final score is -5.276 and the z-score is 1.776, while the best final score on PECAAN belongs to the start at 152744 (final score -3.774 and z-score 2.426). /note=Gap/overlap: The gap with the upstream gene is 82 bp, which is the smallest gap. This gene is conserved in several other phages of the same cluster (Alice, Grungle); the gap does not contain coding potential and was seen in Alice and Grungle as well. /note=Phamerator: pham: 568, date 01/24/2023. It is conserved; found in Ading (C) and Alice (C). /note=Starterator: Start site number 257 in Starterator had the most manual annotations, being called in 152/154 non-draft genes in this pham. Start site 257 is at position 152603 in JulietS, which agrees with the auto-annotated site by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 152603, since it gives the smallest gap, contains the longest ORF, and has the most manual annotations. /note=Function call: NKF. None of PhagesDB BLAST, NCBI BLAST, CDD, or HHPRED shows any significant relevant hits with known functions, only significant probability or coverage to hits with unknown function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Barrera, Alexis /note=Secondary Annotator QC: I agree with the location and function call. However, I would check DeepTMHMM to ensure this gene does not encode a membrane protein. 
CDS 152839 - 152997 /gene="261" /product="gp261" /function="hypothetical protein" /locus tag="JulietS_261" /note=Original Glimmer call @bp 152836 has strength 6.74; Genemark calls start at 152836 /note=SSC: 152839-152997 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SCOTTMCG_257 [Mycobacterium phage ScottMcG] ],,NCBI, q1:s2 100.0% 1.98612E-28 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.896, -5.024743752121617, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SCOTTMCG_257 [Mycobacterium phage ScottMcG] ],,YP_002224254,98.1132,1.98612E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santilla, Matthew /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is @ 152836. Start codon is ATG. /note=Coding Potential: Both the host-trained and self-trained Genemark show that there is coding potential between the start site and the stop site. /note=SD (Final) Score: The final score of this start site is -4.964 which is not the most negative score from the given start sites, and the z-score 1.896 which is below the recommended z-score of at least 2, but it’s still close to 2. /note=Gap/overlap: The overlap in this gene is 4bp which could indicate that this is an operon. /note=Phamerator: This gene is in Pham 566. This gene is also found in phages Phox, Pier, and QBert which are all found in the same cluster as JulietS. /note=Starterator: This start site is called start site 5 and is called in 105/154 manual annotations of non-draft genes. Start site 5 in JulietS is @ 152836. /note=Location call: Based on the aforementioned evidence I believe the start site is @ 152836. /note=Function call: In PhagesDB Blast the top 3 hits are phages TinyTim, Kboogie, and Kamryn who all have e-values of 8e^-28 and state the function is unknown. 
In NCBI Blast the 2 top hits are phages ScottMcG and ChaylaJr, which have e-values of 2e-29 and 6e-29 respectively and percent identity of 100% and 98.11% respectively, and both state that this gene is a hypothetical protein. CDD has no relevant hits. HHPred had no relevant hits. /note=Transmembrane domains: DeepTMHMM did not predict any transmembrane domains, so this gene is not a membrane protein. /note=Secondary Annotator Name: Okahata, Leila /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. However, in the SD (Final) Score section, because it isn't the best score, make sure to briefly explain why it is still a valid start site and why you chose it. In the Phamerator section, make sure to include the date when you checked the phams because phams can change with time. CDS join(153166..153525;1..18) /gene="262" /product="gp262" /function="hypothetical protein" /locus tag="JulietS_262" /note=Original Glimmer call @bp 153166 has strength 12.74 /note=SSC: 153166-18 CP: yes SCS: glimmer ST: SS BLAST-Start: [GP257 [Mycobacterium phage Cali] ],,NCBI, q1:s1 100.0% 5.03729E-86 GAP: 168 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.078, -4.643862285635445, no F: hypothetical protein SIF-BLAST: ,,[GP257 [Mycobacterium phage Cali] ],,YP_002224477,100.0,5.03729E-86 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kumar, Preyasi /note=Auto-annotation: Glimmer calls the start at 153166 with start codon ATG; GeneMark did not predict a start. /note=Coding Potential: Yes, the gene has reasonable coding potential predicted within the putative ORF and the chosen start site covers all this coding potential. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in GeneMark Self. /note=SD (Final) Score: -4.644 is the final score associated with this start site, but it is the second best final score on PECAAN. Z-score is 2.078, which is strong. 
I did not choose a different gene candidate because the Glimmer start site, start codon, coding potential, and Starterator evidence all support this one. /note=Gap/overlap: 168 bp. Slightly large, but the smallest and most reasonable candidate, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 4308. Date 01/13/2023. It is conserved, found in BananaFence and Cali. /note=Starterator: Yes, there is a conserved start site choice. It is start number 1 with a base pair coordinate of 153166. 9 of 9 call site #1. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 153166 bp. /note=Function call: NKF. Neither the NCBI nor the PhagesDB database predicted a consistent function for this gene. The top PhagesDB BLAST, HHpred, and NCBI BLAST hits have unknown function/hypothetical protein (e-value < 10^-66). No conserved domains were identified for this query sequence. /note=Transmembrane domains: None of TMHMM, TOPCONS, or DeepTMHMM predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Deng, Yiran /note=Secondary Annotator QC: I agree with this annotation; all of the evidence has been considered.