CDS 41 - 379 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="Loca_1" /note=Original Glimmer call @bp 41 has strength 10.65; Genemark calls start at 41 /note=SSC: 41-379 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein JTF52_gp01 [Microbacterium phage Azizam] ],,NCBI, q1:s1 100.0% 1.20022E-73 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.068, -5.035742295039811, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein JTF52_gp01 [Microbacterium phage Azizam] ],,YP_009996611,100.0,1.20022E-73 SIF-HHPRED: DUF6279 ; Family of unknown function (DUF6279),,,PF19795.2,55.3571,42.3 SIF-Syn: Phages of the same cluster with a high percentage of identity have this gene called as NKF. /note=(AG) This gene is validated with multiple hits to pham members from other phages. The start call is from both Glimmer and GeneMark. The Starterator-suggested start site is found in 86 of 86 ( 100.0% ) of genes in the pham. Manual annotations of this start: 65 of 65. Called 100.0% of the time when present. The Z-score is reasonable because it is one of the closest to 2, and the final score is the closest to 0. There is no high-probability hit in HHpred and this is not a TM protein. It is most often annotated as NKF.
CDS 376 - 1089 /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="Loca_2" /note=Original Glimmer call @bp 376 has strength 8.9; Genemark calls start at 376 /note=SSC: 376-1089 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LEAFY_2 [Microbacterium phage Leafy]],,NCBI, q1:s1 100.0% 8.97065E-164 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.339, -4.497802523441791, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LEAFY_2 [Microbacterium phage Leafy]],,QYC54896,99.5781,8.97065E-164 SIF-HHPRED: MALTOSE ABC TRANSPORTER PERIPLASMIC PROTEIN, ENVELOPE GLYCOPROTEIN; VIRAL PROTEIN, VIRAL MEMBRANE FUSION, HAIRPIN, CHIMERA; HET: GLC, EDO; 1.95A {ESCHERICHIA COLI},,,2XZ3_A,25.3165,30.7 SIF-Syn: Phages with genes in this pham reported the gene as NKF. /note=This gene is validated with multiple hits to pham members from other phages. The start call is from both Glimmer and GeneMark. The Starterator-suggested start site is found in 84 of 84 ( 100.0% ) of genes in the pham. Manual annotations of this start: 63 of 63. Called 94.0% of the time when present. Overlap of 4 bp (gap of -4). The Z-score is reasonable because it is one of the closest to 2, and the final score is the closest to 0. There is no high-probability hit in HHpred and this is not a TM protein. It is most often annotated as NKF.
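The GAP values in these records can be reproduced from the CDS coordinates. The following is a minimal sketch, assuming the gap is the number of bases strictly between the previous stop and the next start (so a negative value is an overlap). The helper name `gap_bp` is hypothetical; the coordinates are taken from the records for genes 1 through 4.

```python
# Sketch: recompute the GAP field for adjacent forward-strand genes.
# Assumption: gap = next_start - prev_stop - 1; negative values mean overlap.

def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases strictly between two genes (hypothetical helper)."""
    return next_start - prev_stop - 1

# (name, start, stop) copied from the CDS lines for genes 1-4.
genes = [
    ("gp1", 41, 379),
    ("gp2", 376, 1089),
    ("gp3", 1092, 2552),
    ("gp4", 2677, 3804),
]

# Pair each gene with the one before it and compute the gap upstream of it.
gaps = {
    name: gap_bp(prev[2], start)
    for prev, (name, start, _stop) in zip(genes, genes[1:])
}
print(gaps)  # {'gp2': -4, 'gp3': 2, 'gp4': 124}
```

These values agree with the GAP fields recorded for genes 2, 3, and 4 (-4, 2, and 124 bp).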
CDS 1092 - 2552 /gene="3" /product="gp3" /function="terminase" /locus tag="Loca_3" /note=Original Glimmer call @bp 1092 has strength 10.51; Genemark calls start at 1092 /note=SSC: 1092-2552 CP: yes SCS: both ST: SS BLAST-Start: [terminase [Microbacterium phage Leafy]],,NCBI, q1:s1 100.0% 0.0 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.498, -4.9321243179248535, no F: terminase SIF-BLAST: ,,[terminase [Microbacterium phage Leafy]],,QYC54897,99.7942,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,93.0041,100.0 SIF-Syn: Gene is near the start of the genome, which is the predicted location of terminase genes, and there is only one predicted terminase in this genome. /note=(MA) Other phages in cluster EE have gene 3 annotated as the terminase. This makes sense because the terminase initiates packaging of the viral genome, and the gene sits right before the portal protein gene, whose product works with the terminase to package the genome. /note= /note=Dustin Edwards: Gene is near the start of the genome, which is the predicted location of terminase genes, and there is only one predicted terminase in this genome.
CDS 2677 - 3804 /gene="4" /product="gp4" /function="portal protein" /locus tag="Loca_4" /note=Original Glimmer call @bp 2650 has strength 11.35; Genemark calls start at 2746 /note=SSC: 2677-3804 CP: no SCS: both-cs ST: NI BLAST-Start: [portal protein [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 0.0 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -6.282848987771162, no F: portal protein SIF-BLAST: ,,[portal protein [Microbacterium phage Quaker] ],,YP_009996765,99.4667,0.0 SIF-HHPRED: SIF-Syn: Loca's closest relatives in NCBI BLAST for gene 4 are Leafy and Belthelas. Both have closely matching nucleotide sequences, and gene 4 is a portal protein in both.
/note=(MA) I looked at the pham report for the start site, and even though the gap is big it is the smallest listed. I then blasted the nucleotide sequence and found leafy and belthelas to be close relatives. So I looked at them in Phamerator to see if the gene sequences would match up and they did for the most part. Then I went through the other data bases and saw that there is a very high percent chance that gene 4 is the portal protein. /note= /note=(KW) Looking at the start site of gene 4 on the following phages Quaker, KayPaulus, LivingWater, BommRoasted, Sippinontea, and WolfPack, they all have high percent identities to gene 20 and all call start 20 (start site: 2677.) Start 20 is found in 62.3% of genes and called 84.9% of the time. Looking at Phamerator, all of these genes have the same DNA and protein sequence. CDS 3801 - 5390 /gene="5" /product="gp5" /function="major capsid and protease fusion protein" /locus tag="Loca_5" /note=Original Glimmer call @bp 3801 has strength 13.1; Genemark calls start at 3801 /note=SSC: 3801-5390 CP: yes SCS: both ST: SS BLAST-Start: [major capsid and protease fusion protein [Microbacterium phage TimoTea] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.117, -3.5088110057482766, yes F: major capsid and protease fusion protein SIF-BLAST: ,,[major capsid and protease fusion protein [Microbacterium phage TimoTea] ],,QBI97318,99.2424,0.0 SIF-HHPRED: Major capsid protein; Virus Procapsid particles, VIRUS; 5.2A {Enterobacteria phage HK97},,,3QPR_D,55.0095,99.9 SIF-Syn: Upstream gene is portal protein, downstream gene is head-to-tail adaptor. /note=Dustin Edwards: in the bioinformatics guide official functions list, cluster EE have this gene. /note= /note=The start site is said to be 3801 in Starterator. The start number called the most often in the published annotations is 2, it was called in 63 of the 65 non-draft genes in the pham. Start number 2 was found in 82 of 86 of genes in the pham. 
The proposed start and stop sites have coding capacity according to host-trained GeneMark. Major capsid and protease fusion protein was chosen because the function has 100% coverage on NCBI BLAST. Gene 5 in both phages Hulk and Luxx has nearly identical sequence length and score, and is in the same pham as gene 5 in Loca.
CDS 5394 - 5747 /gene="6" /product="gp6" /function="head-to-tail adaptor" /locus tag="Loca_6" /note=Original Glimmer call @bp 5394 has strength 13.1; Genemark calls start at 5394 /note=SSC: 5394-5747 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Microbacterium phage Bri160] ],,NCBI, q1:s1 100.0% 1.2466E-78 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.689, -4.552792845506871, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Microbacterium phage Bri160] ],,YP_009996641,100.0,1.2466E-78 SIF-HHPRED: Gp6; 13-membered ring, VIRAL PROTEIN; HET: MPD, MSE; 2.1A {Enterobacteria phage HK97},,,3JVO_H,94.0171,99.5 SIF-Syn: Gene appears after the major capsid protein gene and before the tail terminator gene. /note=DE: the bioinformatics guide official functions list states that to call this gene a head-to-tail adaptor, it must have an HHpred alignment to HK97 gp6, which it does. /note= /note=Starterator suggests start site 2, which places the gene from base pair 5394 to base pair 5747. This start site has been called in 65 of the 65 non-draft genes in the pham. The gene's function is a perfect match to other very closely related phages.
CDS 5744 - 6124 /gene="7" /product="gp7" /function="tail terminator" /locus tag="Loca_7" /note=Original Glimmer call @bp 5744 has strength 15.84; Genemark calls start at 5744 /note=SSC: 5744-6124 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 2.78656E-83 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.525, -6.591314556970386, no F: tail terminator SIF-BLAST: ,,[tail terminator [Microbacterium phage Quaker] ],,YP_009996768,99.2064,2.78656E-83 SIF-HHPRED: TAIL-TO-HEAD JOINING PROTEIN GP17; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_G,99.2064,97.5 SIF-Syn: (ML) The gene before this one is the head-to-tail adaptor, which connects the tail to the head. This gene also hits a minor tail protein of lambda, but the SPP1 protein that it hits is a tail terminator, and the gene is in the correct place for a tail terminator. DE: good! SPP1 gp17 /note=(Dustin Edwards) The bioinformatics guide official functions list states that a gene that aligns with SPP1 gp17 should not be called a head-to-tail connector; such genes are instead called tail terminator and major tail subunit. This gene does align with SPP1 gp17. /note= /note=(ML) Gene 7 uses start site 2, which is found in 86 of 86 (100%) of genes in the pham. The start and stop of the gene are 5744 and 6124, which is supported by GeneMark showing coding capacity from 5744 to 6124. It is called a tail terminator because NCBI BLAST has 100% coverage. Though minor tail protein was supported by HHpred with 98.7% coverage, I still call it tail terminator because PhagesDB BLAST returned multiple phages that support tail terminator as the function of the protein.
CDS 6163 - 6597 /gene="8" /product="gp8" /function="major tail protein" /locus tag="Loca_8" /note=Original Glimmer call @bp 6163 has strength 13.61; Genemark calls start at 6163 /note=SSC: 6163-6597 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Microbacterium phage Bri160] ],,NCBI, q1:s1 100.0% 5.19148E-100 GAP: 38 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.54, -5.796882727134779, no F: major tail protein SIF-BLAST: ,,[major tail protein [Microbacterium phage Bri160] ],,YP_009996643,100.0,5.19148E-100 SIF-HHPRED: Phage major tail protein, TP901-1 family; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_G,96.5278,99.3 SIF-Syn: Downstream from the tail terminator gene. /note=(KW) The start and stop on basepairs on GeneMark (6163 - 6597) matches up with the start and stop on Glimmer. The proposed start site for Loca_8 is start 2. It is in 86/93 (92.5%) of genes in Pham. /note=On Phamerator, Loca_8 looks very similar to the other phages which would suggest that it would have the same function (major tail protein.) However, it has a low Z-score of 1.54 and a large gap of 38. This gap was the smallest one which is why it was selected. CDS 6610 - 6993 /gene="9" /product="gp9" /function="minor tail protein" /locus tag="Loca_9" /note=Original Glimmer call @bp 6610 has strength 11.63; Genemark calls start at 6610 /note=SSC: 6610-6993 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein JTF51_gp09 [Microbacterium phage PaoPu] ],,NCBI, q1:s1 100.0% 2.90668E-85 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.265, -4.294054598558697, yes F: minor tail protein SIF-BLAST: ,,[hypothetical protein JTF51_gp09 [Microbacterium phage PaoPu] ],,YP_009996594,100.0,2.90668E-85 SIF-HHPRED: HK97-gp10_like ; Bacteriophage HK97-gp10, putative tail-component,,,PF04883.15,55.9055,99.1 SIF-Syn: Appropriate location in the genome for minor tail protein as compared to similar phages. 
/note=(VM) The pham report lists Loca's start as start number 6. The start was checked in Phamerator and PhagesDB (6610-6993). /note= /note=(GC) The Starterator report showed that Loca gene 9 has a start site at 6610 and a stop at 6993. I confirmed this with Phamerator, GeneMark, and PECAAN, and confirmed the function to be minor tail protein.
CDS 7007 - 7327 /gene="10" /product="gp10" /function="tail assembly chaperone" /locus tag="Loca_10" /note=Original Glimmer call @bp 7007 has strength 13.79; Genemark calls start at 7007 /note=SSC: 7007-7327 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Microbacterium phage Bri160] ],,NCBI, q1:s1 100.0% 8.47402E-68 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.501, -3.8256434426994073, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Microbacterium phage Bri160] ],,YP_009996646,100.0,8.47402E-68 SIF-HHPRED: Phage_TAC_12 ; Phage tail assembly chaperone protein, TAC,,,PF12363.11,50.9434,92.1 SIF-Syn: Located just upstream of the tape measure gene and has a PTF. /note=According to Phamerator, Starterator, and PECAAN, the start of this gene is at 7007 and the end is at 7327. This is start site number 6. GeneMark shows a block from about 7000 to about 7300; this block shape indicates an open reading frame, supporting that the gene exists, and it is close to the start and end sites given by Starterator, Phamerator, and PECAAN. The Glimmer start is 7007, which matches the other sources, and the GeneMark start is also 7007. The function of this gene is tail assembly chaperone. When I ran the protein sequence through BLASTp, it returned several related phages, which I used to make a genome map for comparison (the phages compared with Loca were Bri160, Concrete, Leafy, Lola20, McShie, and Quaker). All of these related phages had gene 10 as the tail assembly chaperone.
This was confirmed by BLAST and PECAAN. CDS join(7007..7291,7291..7443) /gene="11" /product="gp11" /function="tail assembly chaperone" /locus tag="Loca_11" /note= /note=SSC: 7007-7443 CP: yes SCS: neither ST: SS BLAST-Start: [tail assembly chaperone [Microbacterium phage Bri160] ],,NCBI, q1:s1 100.0% 8.77042E-98 GAP: -321 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.501, -3.8256434426994073, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Microbacterium phage Bri160] ],,YP_009996645,100.0,8.77042E-98 SIF-HHPRED: Phage_TAC_6 ; Phage tail assembly chaperone protein, TAC,,,PF09550.13,20.6897,95.9 SIF-Syn: there is a frameshift to produce two tail assembly chaperone proteins /note=-1 frameshift was used to add the second tail assembly chaperone protein CDS 7559 - 9664 /gene="12" /product="gp12" /function="tape measure protein" /locus tag="Loca_12" /note=Original Glimmer call @bp 7559 has strength 14.97; Genemark calls start at 7559 /note=SSC: 7559-9664 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Microbacterium phage Leafy]],,NCBI, q1:s1 100.0% 0.0 GAP: 114 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.773, -4.386181351886765, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Microbacterium phage Leafy]],,QYC54906,99.7147,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,21.826,100.0 SIF-Syn: The tape measure protein is almost always the longest gene in the genome, in the case of Loca gene_12 this seems to be the case as this gene is the longest within the entire genome. This protein corresponds directly to the length of the tail, forming the core of the tail. Some tape measure proteins have some enzymatic activity domains near the C-terminal domain, as the tape measure is injected into the cell ahead of the DNA during infection, and this activity may help cleave portions of the cell wall. 
There is one and only one per genome. /note=Both GeneMark and Glimmer predict the start site of Loca gene_12 at bp 7559. According to the pham report for Pham 98910, the most prevalent human-verified start site number for this pham is 3, with the second most prevalent being 16; however, Loca gene_12 has no start site 3, only start site 16, so start site 16 was used. According to GeneMark, the proposed start and stop sites, 7559 and 9664, have strong coding capacity, as there are no major dips between them, which means a gene is very likely to be present at this site. The chosen gene candidate has a Z-score of 2.773 and a final score of -4.386, making it the candidate with the final score closest to zero and therefore the best fit. The chosen candidate also has a spacer distance of 16 and a gap distance of 93; both values are relatively low, which further supports the candidate. The putative function, tape measure protein, is the best fit in all 3 databases (NCBI BLASTn, NCBI BLASTp, and HHpred) and is on the SEA-PHAGES approved functions list. HHpred gives a probability of >99.9% for tape measure protein, and NCBI BLAST gives a similarity of >99%, making tape measure protein the most likely function for this gene. Additionally, TMHMM and TOPCONS predict that Loca gene_12 contains 10 transmembrane helices. Phamerator also supports the location and proposed function for Loca gene_12 as the best fit.
This is because other phages within cluster EE have the same location and function for this gene in Phamerator, showing a nearly perfect fit of the location and proposed function for Loca gene_12. Lastly, the size of the gap downstream of the proposed gene is only 116 bp; additionally, there is no gap upstream, but an overlap with Loca gene_13 of 3 bp is present. This further supports the location of Loca gene_12 as the best fit, with close to 99% accuracy.
CDS 9661 - 10692 /gene="13" /product="gp13" /function="minor tail protein" /locus tag="Loca_13" /note=Original Glimmer call @bp 9661 has strength 14.49; Genemark calls start at 9661 /note=SSC: 9661-10692 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Microbacterium phage KayPaulus] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.019, -4.924073590240254, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Microbacterium phage KayPaulus] ],,AXC35371,97.0845,0.0 SIF-HHPRED: b.18.1.7 (A:) Xylan-binding domain {Clostridium thermocellum [TaxId: 1515]},,,d1h6ya_,47.8134,98.8 SIF-Syn: This gene is right after the tape measure protein, which is evidence for this being the minor tail protein as compared to similar phages. /note=Starterator shows 100% usage of start site 1, starting at base pair 9661. NCBI BLAST, PhagesDB, and HHpred show that minor tail protein is the most probable function. NCBI BLAST shows 95.3353% coverage for minor tail protein. In PhagesDB, 68% of genes in pham 9835 were called minor tail protein.
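The gene 11 record above (`join(7007..7291,7291..7443)`, described as a -1 frameshift) can be sanity-checked with simple arithmetic. The sketch below assumes 1-based inclusive coordinates, as in the CDS lines; the helper name `joined_length` is hypothetical. It shows that the second segment restarts on the last base of the first segment (a shift of -1) and that the joined length is a whole number of codons.

```python
# Sketch: check the -1 frameshift join for gene 11 (coordinates from the record).
# Assumption: coordinates are 1-based and inclusive.

def joined_length(segments):
    """Total bases covered by a join, counting re-read bases twice."""
    return sum(end - start + 1 for start, end in segments)

segments = [(7007, 7291), (7291, 7443)]

total = joined_length(segments)                # 285 + 153
shift = segments[1][0] - (segments[0][1] + 1)  # where segment 2 starts vs. the next base
codons = total // 3                            # includes the stop codon

print(total, shift, total % 3, codons)  # 438 -1 0 146
```

Base 7291 is read twice, which is what shifts translation back by one position; the unshifted gene 10 (7007 - 7327) is 321 bp, also a multiple of 3.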
CDS 10692 - 12752 /gene="14" /product="gp14" /function="minor tail protein" /locus tag="Loca_14" /note=Original Glimmer call @bp 10692 has strength 14.85; Genemark calls start at 10692 /note=SSC: 10692-12752 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Microbacterium phage McShie] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.825, -4.089061295905795, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Microbacterium phage McShie] ],,QJD53912,96.793,0.0 SIF-HHPRED: RECEPTOR-TYPE TYROSINE-PROTEIN PHOSPHATASE MU; MEMBRANE, RECEPTOR, HYDROLASE, GLYCOPROTEIN, RECEPTOR PROTEIN TYROSINE PHOSPHATASE, CELL ADHESION, TRANSMEMBRANE, PROTEIN PHOSPHATASE, EXTRACELLULAR REGION, IMMUNOGLOBULIN; HET: NAG; 3.1A {HOMO SAPIENS},,,2V5Y_A,62.5364,99.7 SIF-Syn: Appropriate location within the genome for a minor tail protein as compared to similar phages. /note=Start site 1 (position 10692 in Loca) is found in 87 of 87 ( 100.0% ) of genes in pham. Manual Annotations of this start: 68 of 68. Called 100.0% of the time when present. Good gap of -1. In cluster EE, this pham is called with 87% frequency as a minor tail protein. HHpred has 99.7% probability of being a cell adhesion protein. BLASTp query has 95.7% identity to other phage minor tail proteins. 
CDS 12754 - 13332 /gene="15" /product="gp15" /function="minor tail protein" /locus tag="Loca_15" /note=Original Glimmer call @bp 12754 has strength 16.56; Genemark calls start at 12754 /note=SSC: 12754-13332 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 1.69923E-133 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.507, -3.954790667257792, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Microbacterium phage Quaker] ],,YP_009996776,98.9583,1.69923E-133 SIF-HHPRED: Short tail fiber protein; STRUCTURAL PROTEIN; 12.0A {Enterobacteria phage T4} SCOP: i.19.1.1,,,1PDI_B,86.9792,97.4 SIF-Syn: Appropriate location within the genome for a minor tail protein as compared to similar phages. /note=According to both GeneMark and Glimmer, their prediction for the start site of Loca gene_15 was found to have a start site at 12754 basepairs. According to the Pham report for Pham 20105, the most prevalent human-verified start site number for this Pham was 7, with the second most prevalent at 6; however, Loca gene_15 has no start site at 6, only a start site at 7, thus the decision was made to use the information given for start site 7. Along with this, according to GeneMark, the proposed start and stop sites, 12754 and 13332, doesn`t have the best gene coding capacity as there is one major dip between the start and stop sites for Loca gene_15; however, the gene coding capacity is still fairly high as the rest of the space between the start and stop site is fairly flat with no dips; therefore, there is around a 75% chance a gene could be placed within this space. This gene candidate chosen has a calculated z-score of 2.507 and a calculated final score of -3.955 making it the gene candidate with a final score value closest to zero, the best-fit gene candidate. 
The chosen gene candidate also has a spacer distance of 12 and a gap distance of 1; both values are low, which further supports the candidate. The putative function, minor tail protein, is the best fit in 2 of the 3 databases (NCBI BLASTn and NCBI BLASTp, but not HHpred), and it is on the SEA-PHAGES approved functions list. HHpred not supporting the putative function is not a major flaw, since each HHpred hit matches a different function, whereas all NCBI BLAST hits match the same function, minor tail protein, with an identity of >97%, making minor tail protein the most likely function for this gene. TMHMM and TOPCONS predict that Loca gene_15 contains 0 transmembrane helices. Phamerator supports the proposed function and location for Loca gene_15 as the best fit, because other phages within cluster EE have the same location and function for this gene in Phamerator. There are a few exceptions, as some phages within the cluster place gene 15 somewhere else; however, these same phages have shifted locations for most genes within their genomes. Lastly, the size of the gap downstream of the proposed gene is only 44 bp; additionally, the gap upstream is smaller, with a size of 2 bp. This further supports the location of Loca gene_15 as the best fit, with close to 99% accuracy.
CDS 13376 - 13624 /gene="16" /product="gp16" /function="hypothetical protein" /locus tag="Loca_16" /note=Original Glimmer call @bp 13376 has strength 14.75; Genemark calls start at 13376 /note=SSC: 13376-13624 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein JTF58_gp16 [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 3.156E-42 GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.773, -3.364992052816827, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein JTF58_gp16 [Microbacterium phage Quaker] ],,YP_009996777,95.2941,3.156E-42 SIF-HHPRED: DUF6776 ; Family of unknown function (DUF6776),,,PF20567.1,53.6585,79.6 SIF-Syn: Loca_16 is placed where gene 16 is on other phages. Phages within the same pham reported this gene as no known function. /note=This gene is validated with multiple hits to pham members from other phages. The start call is from both Glimmer and GeneMark. The Starterator-suggested start site is found in 82 of 83 ( 98.8% ) of genes in the pham. Manual annotations of this start: 52 of 62. Called 89.0% of the time when present. Has the smallest gap at 43 bp. The Z-score is reasonable because it is one of the closest to 2, and the final score is the closest to 0. The highest-probability hit in HHpred is NKF and this is not a TM protein. It is most often annotated as NKF.
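Several notes quote Starterator counts in the form "found in 82 of 83 ( 98.8% )". A minimal sketch for pulling those counts out of a note and re-deriving the percentage is shown here. The regular expression and variable names are illustrative assumptions, not part of any Starterator or PECAAN tooling.

```python
import re

# Sketch: extract "found in X of Y ( Z% )" from a note and recompute Z.
# The pattern is an assumption based on how the counts are written in these notes.
PATTERN = re.compile(r"found in (\d+) of (\d+)\s*\(\s*([\d.]+)%\s*\)")

note = "found in 82 of 83 ( 98.8% )"  # fragment quoted from the gene 16 note

match = PATTERN.search(note)
found = int(match.group(1))
total = int(match.group(2))
reported = float(match.group(3))

# Recompute the percentage to one decimal place and compare with the note.
computed = round(100 * found / total, 1)
print(found, total, reported, computed)  # 82 83 98.8 98.8
```

Note that "Called 89.0% of the time when present" is a separate Starterator statistic and is not derived from these two counts.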
CDS 13648 - 14340 /gene="17" /product="gp17" /function="endolysin" /locus tag="Loca_17" /note=Original Glimmer call @bp 13648 has strength 9.34; Genemark calls start at 13648 /note=SSC: 13648-14340 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 9.25422E-164 GAP: 23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.363, -2.17469248771465, yes F: endolysin SIF-BLAST: ,,[endolysin [Microbacterium phage Quaker] ],,YP_009996778,100.0,9.25422E-164 SIF-HHPRED: Putative peptidase M23; PEPTIDASE, STRUCTURAL GENOMICS, PSI, PROTEIN STRUCTURE INITIATIVE, New York SGX Research Center for Structural Genomics, NYSGXRC; 1.9A {Pseudomonas aeruginosa PAO1},,,2HSI_B,53.0435,97.4 SIF-Syn: This is a Microbacteriophage, so there is no Lysin A unless there is a Lysin B. If there is not a Lysin B, then this gene is an Endolysin. /note=(DB) Starterator shows the start site to be #19 (13648bp - 14340bp) with 65 out of 67 non-draft phages being called and this site being found in 86 of 88 genes in the pham. Lowest gap number of 23. Starterator and PhagesDB match the pham report. Genemarks matches where the start site is suspected to be. This is a Microbacteriophage, so there is no Lysin A unless there is a Lysin B. If there isn`t a Lysin B, then this gene is an Endolysin. 
CDS 14337 - 14573 /gene="18" /product="gp18" /function="hypothetical protein" /locus tag="Loca_18" /note=Original Glimmer call @bp 14307 has strength 9.32; Genemark calls start at 14340 /note=SSC: 14337-14573 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein JTF58_gp18 [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 3.35305E-50 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.164, -5.595837784824238, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein JTF58_gp18 [Microbacterium phage Quaker] ],,YP_009996779,100.0,3.35305E-50 SIF-HHPRED: DUF2730 ; Protein of unknown function (DUF2730),,,PF10805.11,66.6667,95.9 SIF-Syn: Loca_18 is placed where gene 18 is on other phages with NKF. And sequences start to match each other. /note=(MA) Originally, I chose the first choice primarily because the start and stop numbers matched the Starterator numbers. However, the gap was very large and the other numbers could be better especially compared to the second choice. I decided to look at the Phagesdb section on PECAAN to see what phages are more similar to Loca. I then looked at the pham reports to see what the different start and stop sites were, and the suggested start site was 6 but it`s only called 36.9% of the time. And the next suggested start site which is start site 8 is called 53.0% of the time. Considering both of those percentages I felt that start site 8 was the better option. Except on PECAAN the GeneMark start site is 14340 which is for start site 9, it has a 53.3% of being present, but it is only found in 17.4% of the genes in the pham. Versus the other two possible start sites where they are both found in above 96% of the genes in the pham. So overall I concluded that start site 8 was probably the best option overall. Not to mention it has the best numbers for the other categories. Then I used NCBI Blast to check the nucleotide sequence and found a close related phages Quaker and Leafy. 
I compared them in Phamerator and saw that their gene 18 matched Loca's gene 18, so we compared the protein and nucleotide sequences and found that the second choice, rather than the original first choice, was the one that matched the sequences. /note= /note=(AG) After reviewing this gene and its components, I can confirm that 14337 (gene 14337-14573) is the most likely start site. This logic is based on GeneMark showing favorable coding capacity in the area and on the favorable data collected for this gene candidate. /note= /note=(AD) Looking at similar phages such as BoomRoasted, Bri160, Concrete, and Wolfpack, BoomRoasted and Bri160 are older phages with start site 6 called, while Concrete and Wolfpack are newer phages with start site 8 called. Start site 8 also has a better gap (-4), Z-score, and final score than start site 6, so I believe start site 8 is the better option.
CDS 14570 - 14794 /gene="19" /product="gp19" /function="hypothetical protein" /locus tag="Loca_19" /note=Original Glimmer call @bp 14570 has strength 11.01; Genemark calls start at 14570 /note=SSC: 14570-14794 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein JTF52_gp19 [Microbacterium phage Azizam] ],,NCBI, q1:s1 100.0% 3.05651E-43 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.862, -3.170856472917156, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein JTF52_gp19 [Microbacterium phage Azizam] ],,YP_009996629,100.0,3.05651E-43 SIF-HHPRED: Phage_holin_7_1 ; Mycobacterial 2 TMS Phage Holin (M2 Hol) Family,,,PF16081.8,59.4595,97.4 SIF-Syn: Cannot be called a membrane protein, as that requires multiple membrane domains and TMHMM shows 1 domain. Cannot be a holin, as holins have multiple transmembrane domains. /note=(DE) Does not have multiple membrane passes, so likely not a holin even though there is an HHpred hit for it. /note= /note=Sequence begins at start site 5 at base pair 14570. Genemark shows proper formation at the 14570 spacing.
BLASTp shows 100% query cover with phage Azizam which uses start site 5 in Starterator. Starterator also shows 97.7% usage of start site five. Starterator shows start site five to begin at bp 14570. Has best gap (-4), final score and z score among possible start sites, and it is the longest possible ORF. CDS complement (14864 - 15076) /gene="20" /product="gp20" /function="Lsr2-like DNA bridging protein" /locus tag="Loca_20" /note=Original Glimmer call @bp 15076 has strength 11.1; Genemark calls start at 15076 /note=SSC: 15076-14864 CP: yes SCS: both ST: SS BLAST-Start: [Lsr2-like DNA bridging protein [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 1.05612E-41 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.672, -3.627004739933808, yes F: Lsr2-like DNA bridging protein SIF-BLAST: ,,[Lsr2-like DNA bridging protein [Microbacterium phage Quaker] ],,YP_009996781,100.0,1.05612E-41 SIF-HHPRED: Protein lsr2; anti-parallel beta sheet, dimer, DNA BINDING PROTEIN; 1.728A {Mycobacterium tuberculosis},,,4E1P_B,84.2857,99.9 SIF-Syn: Among several DNA binding proteins at the 3` end of the genome. /note=(AG) GeneMark and Glimmer the suggested start site is 15076bp. the Z score was the closest to 2, the final score was the closest of all to 0, the Gap was only 2, and the LORF was TRUE. Called in cluster EE as lsr2-like dna bridging protein with 82% frequency. HHpred has 99.9% probability of being lsr2-like dna binding protein. BLASTp has 100% identity to lsr2-like dna bridging protein in similar phages. 
CDS complement (15079 - 15585) /gene="21" /product="gp21" /function="helix-turn-helix DNA binding domain, MerR-like" /locus tag="Loca_21" /note=Original Glimmer call @bp 15585 has strength 13.73; Genemark calls start at 15585 /note=SSC: 15585-15079 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding protein [Microbacterium phage Quaker] ],,NCBI, q1:s5 100.0% 7.52151E-118 GAP: 81 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.498, -7.945159473550231, no F: helix-turn-helix DNA binding domain, MerR-like SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Microbacterium phage Quaker] ],,YP_009996782,97.6744,7.52151E-118 SIF-HHPRED: Putative DNA-binding protein; BldC, S. coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_A,37.5,98.7 SIF-Syn: Among several DNA binding proteins at the 3' end of the genome. /note=(KW) Glimmer and GeneMark both predict the 15585 start site (Start 6), which is found in 80 of 87 ( 92.0% ) of genes in pham. Manual Annotations of this start: 39 of 68. Called 51.2% of the time when present. In 90.9% of genes, the function was called helix-turn-helix DNA binding domain, MerR-like. HHpred shows 98.7% probability of being a MerR-like DNA binding protein. 
CDS complement (15667 - 15894) /gene="22" /product="gp22" /function="helix-turn-helix DNA binding domain" /locus tag="Loca_22" /note=Original Glimmer call @bp 15894 has strength 7.7; Genemark calls start at 15894 /note=SSC: 15894-15667 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding protein [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 3.71377E-45 GAP: 509 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.937, -7.838780790165433, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Microbacterium phage Quaker] ],,YP_009996783,100.0,3.71377E-45 SIF-HHPRED: Endothelial differentiation-related factor 1; EDF1, HMBF1alpha, helix-turn-helix, Structural Genomics, NPPSFA, National Project on Protein Structural and Functional Analyses, RIKEN Structural; NMR {Homo sapiens} SCOP: l.1.1.1, a.35.1.12,,,1X57_A,98.6667,99.0 SIF-Syn: Among several DNA binding proteins at the 3' end of the genome. /note=Glimmer and GeneMark both predict the start site at 15894 (Start 5), found in 87 of 87 ( 100.0% ) of genes in pham. Manual Annotations of this start: 60 of 68. Called 89.7% of the time when present. Has a large gap of 509 bp, but it is the smallest gap among possible start sites. The genes reverse here, so the gap might contain a promoter. It is the longest predicted ORF. Cluster EE phages have a high frequency of calling the gene as helix-turn-helix DNA binding domain, and HHpred has 99.5% probability of being so. 
CDS 16404 - 16793 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="Loca_23" /note=Original Glimmer call @bp 16404 has strength 6.76; Genemark calls start at 16611 /note=SSC: 16404-16793 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein JTF58_gp23 [Microbacterium phage Quaker] ],,NCBI, q1:s1 100.0% 2.23443E-83 GAP: 509 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.974, -5.398667063434825, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein JTF58_gp23 [Microbacterium phage Quaker] ],,YP_009996784,100.0,2.23443E-83 SIF-HHPRED: DEXHc_RecG; DEXH/Q-box helicase domain of DEAD-like helicase RecG family proteins. The DEAD-like helicase RecG family is part of the DEAD-like helicases superfamily, a diverse family of proteins involved in ATP-dependent RNA or DNA unwinding.,,,cd17918,20.9302,67.2 SIF-Syn: Compared Loca to phages Bethelas, Quaker, LivingWater, Wolfpack, KayPaulus, Sippinontea, and Leafy and determined that the gene 24 function is No known function. /note=This gene is validated with multiple hits to pham members from other phages. The start call differs for Glimmer and GeneMark. The Starterator-suggested start site 3 (16404) is found in 60 of 63 ( 95.2% ) of genes in pham. Manual Annotations of this start: 45 of 48. Called 100.0% of the time when present. Has the smallest gap at 509 bp. The genes reverse here and the gap could contain a promoter. The Z-score is reasonable because it is one of the closest to 2, and the final score is the closest to 0. The highest-probability HHpred hit is a helicase domain, at a low probability of 67%. This is not a TM protein. It is most often annotated as NKF. 
CDS 16887 - 17105 /gene="24" /product="gp24" /function="helix-turn-helix DNA binding domain, MerR-like" /locus tag="Loca_24" /note=Original Glimmer call @bp 16887 has strength 12.24; Genemark calls start at 16896 /note=SSC: 16887-17105 CP: yes SCS: both-gl ST: SS BLAST-Start: [MerR-like helix-turn-helix DNA binding domain protein [Microbacterium phage PaoPu] ],,NCBI, q1:s1 100.0% 1.21676E-43 GAP: 93 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.76, -5.377477764364887, yes F: helix-turn-helix DNA binding domain, MerR-like SIF-BLAST: ,,[MerR-like helix-turn-helix DNA binding domain protein [Microbacterium phage PaoPu] ],,YP_009996609,100.0,1.21676E-43 SIF-HHPRED: Putative DNA-binding protein; BldC, S. coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_A,77.7778,99.4 SIF-Syn: /note=Glimmer and GeneMark have different predicted start sites. The start predicted in the Starterator report (10) was found in 83 of 85 ( 97.6% ) of genes in pham. Manual Annotations of this start: 63 of 66. Called 98.8% of the time when present. The proposed start and stop sites of this gene have coding capacity according to host-trained GeneMark. The function was chosen because NCBI BLASTp predicted 98.6111% identity and 100% alignment and coverage. These proteins are predicted to function as transcription regulators that mediate responses to stress. 
CDS 17102 - 17401 /gene="25" /product="gp25" /function="HNH endonuclease" /locus tag="Loca_25" /note=Original Glimmer call @bp 17102 has strength 9.59; Genemark calls start at 17102 /note=SSC: 17102-17401 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Microbacterium phage Bri160] ],,NCBI, q1:s1 100.0% 5.26481E-68 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.421, -6.050814105256152, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Microbacterium phage Bri160] ],,YP_009996660,100.0,5.26481E-68 SIF-HHPRED: HNH endonuclease domain protein; CRISPR-Cas, Cas9, HNH, RuvC, RNA-guided DNA endonuclease, cytoplasmic, Hydrolase; HET: SPD; 2.201A {Actinomyces naeslundii},,,4OGE_A,71.7172,97.8 SIF-Syn: HNH endonuclease is always either at the end or at the beginning of a genome. /note=M.L. - Gene 25 was called at start site 9, which was found in 85 of 85 genes in the pham. The start and stop sites of the gene are 17102 and 17401, supported by GeneMark, which showed coding capacity across 17102 - 17401. NCBI BLAST had 100% query cover for HNH endonuclease, while all but one of the PhagesDB BLAST hits had HNH endonuclease as the function. HHpred has 97.8% probability of being an HNH endonuclease domain.