CDS 1 - 750 /gene="1" /product="gp1" /function="ParB-like nuclease domain" /locus tag="Tonitrus_1" /note=Original Glimmer call @bp 16 has strength 17.96; Genemark calls start at 1 /note=SSC: 1-750 CP: yes SCS: both-gm ST: NI BLAST-Start: [ParB-like nuclease domain protein [Gordonia phage AnarQue]],,NCBI, q3:s11 95.5823% 7.15375E-68 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.13, -7.283435476019514, no F: ParB-like nuclease domain SIF-BLAST: ,,[ParB-like nuclease domain protein [Gordonia phage AnarQue]],,UBF41604,60.6742,7.15375E-68 SIF-HHPRED: SIF-Syn: First gene in both Tonitrus and Apex (B4) so very similar area /note=Start 16 had the strongest scores, function is based on strong blast and HHPRED results /note=Part of the thymine hyper modification system (see Wooper) /note=Start changed due to no calls in Starterator and doesn't capture all CP CDS 743 - 1165 /gene="2" /product="gp2" /function="Hypothetical Protein" /locus tag="Tonitrus_2" /note=Original Glimmer call @bp 743 has strength 13.17; Genemark calls start at 743 /note=SSC: 743-1165 CP: yes SCS: both ST: NA BLAST-Start: GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.08, -4.502863771973257, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: N/A /note=No blast results. No significant HHPRED results. Start had best scores and contained all CP. 
/note=HHPRED result clicked to demonstrate how bad the results were CDS 1162 - 2043 /gene="3" /product="gp3" /function="Tet-like J-binding protein" /locus tag="Tonitrus_3" /note=Original Glimmer call @bp 1162 has strength 14.39; Genemark calls start at 1162 /note=SSC: 1162-2043 CP: yes SCS: both ST: SS BLAST-Start: [oxidoreductase [Gordonia phage CloverMinnie] ],,NCBI, q10:s6 94.8805% 3.57207E-94 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.407, -3.9718914447963414, no F: Tet-like J-binding protein SIF-BLAST: ,,[oxidoreductase [Gordonia phage CloverMinnie] ],,QFG11149,67.4825,3.57207E-94 SIF-HHPRED: SIF-Syn: Almost all phages that have this pham have the gene as gene #2. For Tonitrus, this pham is gene #3. Still in same area of the genome. /note=Could be oxidoreductase or oxygenase, but not enough evidence to say for sure. Should be reviewed further. /note=Part of the thymine hyper modification system (see Wooper). Called Tet-like J-binding protein because of synteny with Wooper and thymine hyper modification system. 
CDS 2043 - 2273 /gene="4" /product="gp4" /function="Hypothetical Protein" /locus tag="Tonitrus_4" /note=Original Glimmer call @bp 2043 has strength 14.08; Genemark calls start at 2043 /note=SSC: 2043-2273 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Rhodococcus fascians]],,NCBI, q5:s4 89.4737% 4.53673E-26 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.082, -2.523003374675015, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Rhodococcus fascians]],,WP_052071032,60.6742,4.53673E-26 SIF-HHPRED: SIF-Syn: No good synteny noted /note=Function based on lots of hypothetical proteins in BLAST and no strong results in HHPRED. Start kept because of minimal overlap and good CP throughout CDS 2270 - 2371 /gene="5" /product="gp5" /function="Hypothetical Protein" /locus tag="Tonitrus_5" /note=Genemark calls start at 2270 /note=SSC: 2270-2371 CP: yes SCS: genemark ST: NI BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -3.769471247018311, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: No synteny. /note=Current start, 2270, is the best option as it contains all CP and the other option creates a large overlap. Starterator shows one other phage that does not share the same start as Tonitrus, which is not particularly informative. 
There are no conclusive results for function from BLAST, HHPred, or Conserved Domains. CDS 2368 - 3237 /gene="6" /product="gp6" /function="aGPT-Pplase2 domain protein" /locus tag="Tonitrus_6" /note=Original Glimmer call @bp 2368 has strength 12.82; Genemark calls start at 2368 /note=SSC: 2368-3237 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CAIB_3 [Gordonia phage CaiB]],,NCBI, q1:s1 98.9619% 2.32643E-124 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.505, -3.6907860541164537, yes F: aGPT-Pplase2 domain protein SIF-BLAST: ,,[hypothetical protein SEA_CAIB_3 [Gordonia phage CaiB]],,UOW92987,75.7895,2.32643E-124 SIF-HHPRED: SIF-Syn: Phages in the DR pham tend to have this gene as gene 3, Tonitrus has it as gene 6, so there is general synteny. /note=Current start will be kept because of good CP and scores, longest ORF, Starterator and strong BLAST results. BLAST results and HHPred point towards this gene's function being an aGPT-Pplase2 domain protein. /note=A part of the thymine hyper modification system (see Wooper), which led us to select this specific function. CDS 3234 - 3764 /gene="7" /product="gp7" /function="5-hmU DNA kinase" /locus tag="Tonitrus_7" /note=Original Glimmer call @bp 3234 has strength 12.99; Genemark calls start at 3234 /note=SSC: 3234-3764 CP: yes SCS: both ST: SS BLAST-Start: [adenylate kinase [Gordonia phage CaiB]],,NCBI, q1:s1 97.1591% 8.04008E-48 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -2.2763497933341483, yes F: 5-hmU DNA kinase SIF-BLAST: ,,[adenylate kinase [Gordonia phage CaiB]],,UOW92988,67.9558,8.04008E-48 SIF-HHPRED: SIF-Syn: Genes of same pham typically found around same area (start of genome). /note=Top BLAST results are all Q1:T1 results with this function. The identified 5-protein thymine hyper modification system assigns the 4th protein (gene 7) as 5-hmU DNA kinase. /note= /note=Specific function was selected because of synteny with Wooper and thymine hyper modification system. 
CDS 3761 - 4642 /gene="8" /product="gp8" /function="Hypothetical Protein" /locus tag="Tonitrus_8" /note=Original Glimmer call @bp 3761 has strength 8.63; Genemark calls start at 3761 /note=SSC: 3761-4642 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PBI_CLOVERMINNIE_5 [Gordonia phage CloverMinnie]],,NCBI, q1:s1 96.587% 7.04909E-79 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.9063687850157054, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein PBI_CLOVERMINNIE_5 [Gordonia phage CloverMinnie]],,QFG11152,59.3103,7.04909E-79 SIF-HHPRED: SIF-Syn: Same gene found in phages in DR cluster early in the genome. /note=May be oxidoreductase as supported by HHPRED and BLAST results. Function list mentions that it should be called hypothetical protein as there is not enough evidence for it to be called oxidoreductase. Proteins 1,3,6,7,8 match genes in Wooper that are part of an identified thymine hyper modification system, which suggests that this is the final protein in the system. CDS 4643 - 5092 /gene="9" /product="gp9" /function="helix-turn-helix DNA binding domain" /locus tag="Tonitrus_9" /note=Original Glimmer call @bp 4643 has strength 13.3; Genemark calls start at 4643 /note=SSC: 4643-5092 CP: yes SCS: both ST: NA BLAST-Start: [sigma-70 region 4 domain-containing protein [Corynebacterium sphenisci] ],,NCBI, q11:s6 86.5772% 1.83435E-16 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.585, -3.8156109669835954, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[sigma-70 region 4 domain-containing protein [Corynebacterium sphenisci] ],,WP_303354710,52.1127,1.83435E-16 SIF-HHPRED: SIF-Syn: Orpham :( /note=Start not changed due to good CP, longest ORF and scores. A couple of BLAST results indicate this is possibly a helix-turn-helix protein. HHPRED indicates that the protein forms multiple alpha helices separated by spacers of 3-4 amino acids. This matches the approved function criteria for helix-turn-helix DNA binding protein. 
CDS 5076 - 5300 /gene="10" /product="gp10" /function="helix-turn-helix DNA binding domain" /locus tag="Tonitrus_10" /note=Original Glimmer call @bp 5076 has strength 10.44; Genemark calls start at 5076 /note=SSC: 5076-5300 CP: yes SCS: both ST: NI BLAST-Start: [helix-turn-helix DNA binding domain protein [Gordonia phage Soos]],,NCBI, q9:s8 87.8378% 1.45186E-11 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.662, -3.368377069717189, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Gordonia phage Soos]],,WNM68747,60.5263,1.45186E-11 SIF-HHPRED: DUF2774 ; Protein of unknown function (DUF2774),,,PF11242.12,45.9459,98.6 SIF-Syn: Phages in cluster CP with acceptable e value exhibit genes between protein #17-20. /note=Initial BLAST and HHPRED results indicate possible HTH DNA binding domain. Further analysis of HHPRED identifies HTH as per approved function list- alignments reveal 3 alpha helices separated by 3-4 amino acid spacer sequences. CDS 5348 - 5905 /gene="11" /product="gp11" /function="membrane protein" /locus tag="Tonitrus_11" /note=Original Glimmer call @bp 5348 has strength 17.82; Genemark calls start at 5348 /note=SSC: 5348-5905 CP: yes SCS: both ST: NA BLAST-Start: GAP: 47 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.174, -2.253486910374644, yes F: membrane protein SIF-BLAST: SIF-HHPRED: Cytadhesin_P30 ; Cytadhesin P30/P32,,,PF07271.15,21.0811,71.8 SIF-Syn: Orpham :( /note=No BLAST results. No significant HHPRED results. Has transmembrane domain, so likely a membrane protein of some sort. /note=. CDS complement (5896 - 6039) /gene="12" /product="gp12" /locus tag="Tonitrus_12" /note=Original Glimmer call @bp 6039 has strength 4.88 /note=SSC: 6039-5896 CP: yes SCS: glimmer ST: NA BLAST-Start: GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.923, -6.944608131136856, no F: SIF-BLAST: SIF-HHPRED: KISc_KIF3; Kinesin motor domain, kinesins II or KIF3_like proteins. 
Kinesin motor domain, kinesins II or KIF3_like proteins.,,,cd01371,53.1915,24.6 SIF-Syn: Orpham :( /note=Bad coding potential. No BLAST or HHPRED hits. Should be reviewed for deletion. /note= /note=CDF: Look over CP again, looks very solid and suggests gene is definite. No official synteny in Phamerator, but similar reverse genes are found in other singletons. CDS 6053 - 6367 /gene="13" /product="gp13" /function="Hypothetical Protein" /locus tag="Tonitrus_13" /note=Original Glimmer call @bp 6053 has strength 8.99; Genemark calls start at 6053 /note=SSC: 6053-6367 CP: yes SCS: both ST: NA BLAST-Start: GAP: 13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.7653702713535186, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: No synteny /note=Start kept due to good CP and best RBS. BLAST and HHPRED yielded no viable results, so I ran BLAST and HHPRED myself; these also yielded nothing. /note= /note=NMB CDS 6376 - 8133 /gene="14" /product="gp14" /function="terminase" /locus tag="Tonitrus_14" /note=Original Glimmer call @bp 6376 has strength 15.64; Genemark calls start at 6376 /note=SSC: 6376-8133 CP: yes SCS: both ST: NI BLAST-Start: [terminase family protein [Rhodococcus ruber] ],,NCBI, q13:s16 95.2137% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.058, -5.200500400297053, no F: terminase SIF-BLAST: ,,[terminase family protein [Rhodococcus ruber] ],,WP_237211070,74.5331,0.0 SIF-HHPRED: Terminase, large subunit; phage defense, pattern-recognition receptor, nlr, stand, atpase, ANTIVIRAL PROTEIN; HET: ATP;{Salmonella enterica},,,8DGC_F,87.8633,100.0 SIF-Syn: Similar location to StAugustine and not too far from Pine5, both are singletons /note=Good CP for all starts, good (not best) RBS and longest ORF so start kept. Hits for large subunit but cannot call unless a small subunit is found /note=Start already does not contain all CP so shortening it would cover less of the CP /note= /note= /note=Also, continue checking the genome 
to see if a small subunit is found. CDS 8244 - 9536 /gene="15" /product="gp15" /function="Hypothetical Protein" /locus tag="Tonitrus_15" /note=Original Glimmer call @bp 8244 has strength 18.33; Genemark calls start at 8244 /note=SSC: 8244-9536 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein Namu_2691 [Nakamurella multipartita DSM 44233]],,NCBI, q267:s13 36.5116% 2.84068E-35 GAP: 110 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.242, -2.1123791669543204, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein Namu_2691 [Nakamurella multipartita DSM 44233]],,ACV79037,31.1111,2.84068E-35 SIF-HHPRED: SIF-Syn: No pham so no synteny /note=Start kept due to good CP and RBS scores, Bad HHPRED scores, Function chosen based on Blast, no TMHMM /note= /note=NMB CDS 9612 - 9896 /gene="16" /product="gp16" /function="Hypothetical Protein" /locus tag="Tonitrus_16" /note=Original Glimmer call @bp 9612 has strength 13.67; Genemark calls start at 9612 /note=SSC: 9612-9896 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Mycobacteroides abscessus]],,NCBI, q8:s16 61.7021% 2.40009E-6 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.082, -2.442961286954254, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Mycobacteroides abscessus]],,WP_100514073,21.4286,2.40009E-6 SIF-HHPRED: SIF-Syn: No known synteny due to lack of a pham /note=Start kept because CP was good for all starts and high scores for RBS, function unknown due to no TMHMM results and only Blast hits are hypothetical protein /note= /note=NMB CDS 10011 - 10355 /gene="17" /product="gp17" /function="Hypothetical Protein" /locus tag="Tonitrus_17" /note=Original Glimmer call @bp 10011 has strength 17.33; Genemark calls start at 10011 /note=SSC: 10011-10355 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Mycobacteroides abscessus] ],,NCBI, q33:s29 64.9123% 2.7749E-9 GAP: 114 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.242, -2.1123791669543204, yes F: Hypothetical 
Protein SIF-BLAST: ,,[hypothetical protein [Mycobacteroides abscessus] ],,WP_108658134,40.367,2.7749E-9 SIF-HHPRED: SIF-Syn: No synteny /note=Start kept: good CP and RBS, No good data for functions besides hypothetical protein /note= /note=NMB CDS 10418 - 10957 /gene="18" /product="gp18" /function="Hypothetical Protein" /locus tag="Tonitrus_18" /note=Original Glimmer call @bp 10418 has strength 15.41; Genemark calls start at 10418 /note=SSC: 10418-10957 CP: yes SCS: both ST: NA BLAST-Start: GAP: 62 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.242, -2.1123791669543204, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: NA (none similar enough) /note=Close to covering all coding potential, but this start cuts off 20 bp (can't push the start back because the previous start has 300+ overlap with gene 17). /note=Very little CP would be recovered if we pushed the start back, would be very inefficient. /note=PhagesDB BLAST came up with nothing of substance, all either unknown function or E-values > 1. /note=HHPRED had good probability (80-95%), but E-values were much too large to take into account. /note=Deep TMHMM came up with no hits for membrane protein. /note= /note=NMB CDS 11022 - 11258 /gene="19" /product="gp19" /function="Hypothetical Protein" /locus tag="Tonitrus_19" /note=Original Glimmer call @bp 11022 has strength 8.43; Genemark calls start at 11022 /note=SSC: 11022-11258 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Hoyosella altamirensis] ],,NCBI, q1:s1 100.0% 1.36359E-15 GAP: 64 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.242, -2.1123791669543204, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Hoyosella altamirensis] ],,WP_064442217,64.0,1.36359E-15 SIF-HHPRED: SIF-Syn: Checked with Nergal, no similarity. (different surrounding genes, different positioning, etc.) /note=Starterator calls start in 2/2 of pham. 
/note=NCBI BLAST: 3 hits on hypothetical protein, one from the other phage in the pham (Nergal) and two from the bacterium Hoyosella. /note=HHPRED has bad probability and E-values > 1 CDS 11294 - 11833 /gene="20" /product="gp20" /function="Hypothetical Protein" /locus tag="Tonitrus_20" /note=Original Glimmer call @bp 11294 has strength 13.27; Genemark calls start at 11294 /note=SSC: 11294-11833 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Prescottella equi]],,NCBI, q4:s1 92.1788% 7.16702E-14 GAP: 35 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -2.8454123590742793, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Prescottella equi]],,NKT77272,50.0,7.16702E-14 SIF-HHPRED: SIF-Syn: genes are not in similar position in Tonitrus and Finch /note=Current start contains all coding potential and has best score /note=PhagesDB BLAST: Finch has a good e value /note=HHPRED: couple of probabilities over 80% but e values are low /note=Deep TMHMM: none /note=NCBI BLAST: good e value for hypothetical protein of Rhodococcus equi CDS 11914 - 12468 /gene="21" /product="gp21" /function="Hypothetical Protein" /locus tag="Tonitrus_21" /note=Original Glimmer call @bp 11914 has strength 15.97; Genemark calls start at 11914 /note=SSC: 11914-12468 CP: yes SCS: both ST: NA BLAST-Start: [DUF732 domain-containing protein [Mycolicibacterium farcinogenes] ],,NCBI, q118:s69 36.413% 0.0375515 GAP: 80 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.174, -2.253486910374644, yes F: Hypothetical Protein SIF-BLAST: ,,[DUF732 domain-containing protein [Mycolicibacterium farcinogenes] ],,WP_221851768,27.907,0.0375515 SIF-HHPRED: DUF732 ; Protein of unknown function (DUF732),,,PF05305.18,44.0217,99.5 SIF-Syn: nothing good from Clawz. /note=DeepTMHMM found one section, could mean transmembrane protein /note=HHPRED: good prob (99) and good E-value, but bad % coverage (44) for a hypothetical protein in the DUF732 family. 
/note=NCBI BLAST also hit on a DUF732 family, but with worse identity/e-value/etc. results. CDS 12481 - 13029 /gene="22" /product="gp22" /function="Hypothetical Protein" /locus tag="Tonitrus_22" /note=Original Glimmer call @bp 12481 has strength 9.61; Genemark calls start at 12481 /note=SSC: 12481-13029 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SOOS_6 [Gordonia phage Soos]],,NCBI, q11:s12 84.6154% 1.86711E-20 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.857, -2.90473872338446, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein SEA_SOOS_6 [Gordonia phage Soos]],,WNM68736,55.4217,1.86711E-20 SIF-HHPRED: Leucine-rich repeat-containing protein 47; Ribosome Biogenesis, Pre-40S, RIBOSOME; HET: ATP, 6MZ, B8N, 4AC, OMU, MA6, OMC, PSU, OMG, A2M, MG, 7MG; 2.6A {Homo sapiens},,,6ZXG_k,18.1319,59.2 SIF-Syn: gene is not in similar position as in sting draft and dontron draft. The genes located in sting and dontron were in similar positions and between same genes. /note=PhagesDB BLAST has multiple phages that have good e values /note=HHPRED has low probability, e value, coverage /note=NCBI BLAST has a couple of hypothetical proteins from Gordonia that have high coverage and good e value /note=TMHMM: none CDS 13026 - 13466 /gene="23" /product="gp23" /function="Hypothetical Protein" /locus tag="Tonitrus_23" /note=Original Glimmer call @bp 13026 has strength 14.77; Genemark calls start at 13026 /note=SSC: 13026-13466 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.014, -2.723328252647382, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: no phages found were close enough to draw conclusions on synteny, i.e. NI. /note=Start chosen did not have the best score (was second best), but the best score cut off a bunch of CP. /note=PhagesDB BLAST came up only with itself (Tonitrus); the rest of the evaluations (HHPRED, NCBI, DTMHMM, etc.) all came up with inconclusive results. 
CDS 13470 - 13709 /gene="24" /product="gp24" /function="Hypothetical Protein" /locus tag="Tonitrus_24" /note=Original Glimmer call @bp 13470 has strength 16.76; Genemark calls start at 13470 /note=SSC: 13470-13709 CP: yes SCS: both ST: NA BLAST-Start: GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.889, -3.1896562502401133, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: NdhS ; NAD(P)H dehydrogenase subunit S,,,PF11623.12,50.6329,74.6 SIF-Syn: e-values of the matched phages in BLAST were too low to be usable in synteny. /note=Chosen start covers all CP and has the best score /note=PhagesDB BLAST results have E-values > 1. /note=HHPRED had probability (around 70%), but E-values were much too large to take into account. /note=Deep TMHMM came up with no hits for TMH. CDS 13709 - 13963 /gene="25" /product="gp25" /function="Hypothetical Protein" /locus tag="Tonitrus_25" /note=Original Glimmer call @bp 13709 has strength 6.32; Genemark calls start at 13709 /note=SSC: 13709-13963 CP: yes SCS: both ST: NA BLAST-Start: GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.904, -4.863722327770724, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: e-values of the matched phages in BLAST were too low to be usable in synteny. /note=CP is a single parabola-shaped peak spanning about 1/4 of the ORF, but it is all covered /note=Hydrogenase/urease came up in both NCBI BLAST and HHPRED, but both had poor e-values. Not usable evidence. 
CDS 14059 - 14370 /gene="26" /product="gp26" /function="membrane protein" /locus tag="Tonitrus_26" /note=Original Glimmer call @bp 14059 has strength 12.15; Genemark calls start at 14059 /note=SSC: 14059-14370 CP: no SCS: both ST: SS BLAST-Start: [DUF1360 domain-containing protein [Dietzia sp.]],,NCBI, q2:s3 93.2039% 7.04559E-17 GAP: 95 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.094, -5.778963282823847, yes F: membrane protein SIF-BLAST: ,,[DUF1360 domain-containing protein [Dietzia sp.]],,MDZ4233751,56.1905,7.04559E-17 SIF-HHPRED: DUF1360 ; Protein of unknown function (DUF1360),,,PF07098.15,84.466,99.8 SIF-Syn: gene in similar position in Soos and dontron draft. Comes before same gene. In Soos and dontron draft it is in between the same two genes /note=Chosen start does not cover CP completely but covers more CP than other possible starts. It also has the best scores. Stop ends before whole CP, which suggests that there may be a premature stop codon. /note=PhagesDB BLAST has a couple of matches with good e value /note=HHPred has one match that has a high %identity and good e value /note=NCBI BLAST: low %identity but good e value /note=Conserved Domain Database matches have high e value /note=TMHMM domain matches transmembrane, 3 TMRs called CDS 14379 - 16331 /gene="27" /product="gp27" /function="portal protein" /locus tag="Tonitrus_27" /note=Original Glimmer call @bp 14379 has strength 11.51; Genemark calls start at 14379 /note=SSC: 14379-16331 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Gordonia phage Clawz] ],,NCBI, q44:s46 82.7692% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.733, -3.6859871712346846, no F: portal protein SIF-BLAST: ,,[portal protein [Gordonia phage Clawz] ],,YP_010675470,59.1463,0.0 SIF-HHPRED: SIF-Syn: different gene number but in similar position and in between same genes in Clawz and Soos /note=Starterator: most common start not found in Tonitrus, called start found in only 3.8% of pham and called 18.5% when found. 
However, changing the start would cut off coding potential. /note= /note=HHPRED: called portal protein with 60% coverage, 100% identity, and a good e-value /note=NCBI BLAST: also called portal protein with good e-value (0) and coverage (80%), but lower alignment and identity. As a note, all the calls were close to the chosen start of the gene (evidence for not moving it). CDS 16328 - 18733 /gene="28" /product="gp28" /function="Hypothetical Protein" /locus tag="Tonitrus_28" /note=Original Glimmer call @bp 16328 has strength 14.02; Genemark calls start at 16328 /note=SSC: 16328-18733 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein RN001_016450 [Aquatica leii]],,NCBI, q28:s1 83.5206% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -3.1164791313608173, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein RN001_016450 [Aquatica leii]],,KAK4871441,74.5938,0.0 SIF-HHPRED: Phage_min_cap2 ; Phage minor capsid protein 2,,,PF06152.15,17.1036,99.5 SIF-Syn: Similar gene positioning to genes in close phages (see PhagesDB BLAST/Phamerator, etc.). Differing function calls, not conclusive. In phage Wooper this same gene has the function call capsid maturation protease. /note=Starterator: does not have the most annotated start, and all other options for the start were not called more than 2 times for the entire pham (~3%). Currently called start only found in Tonitrus, but covers all CP and other starts are not compelling enough evidence to change at this moment. /note=HHPRED: has really good e-values, but terrible coverage (~15%) for phage minor capsid protein. 
/note=NCBI BLAST: has really good e-values and coverage, but bad % identity and alignment for phage minor head protein (many are listed with this function) /note=Conserved Domain Database: hit on MuF-like protein (related to minor head protein) but bad coverage/identity /note=PhagesDB BLAST suggests capsid maturation protease (see approved function list) with many phage genes showing this same function. This is not enough to make a function call in any direction. CDS 18735 - 18884 /gene="29" /product="gp29" /function="Hypothetical Protein" /locus tag="Tonitrus_29" /note=Original Glimmer call @bp 18735 has strength 11.54; Genemark calls start at 18735 /note=SSC: 18735-18884 CP: yes SCS: both ST: NI BLAST-Start: [MULTISPECIES: hypothetical protein [Rhodococcus] ],,NCBI, q1:s1 91.8367% 2.18061E-9 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -2.7653702713535186, yes F: Hypothetical Protein SIF-BLAST: ,,[MULTISPECIES: hypothetical protein [Rhodococcus] ],,WP_161800540,70.8333,2.18061E-9 SIF-HHPRED: DUF6723 ; Family of unknown function (DUF6723),,,PF20484.2,51.0204,88.9 SIF-Syn: e-values of the matched phages in BLAST were too low to be usable in synteny. /note=No good results for NCBI BLAST or HHPred; Deep TMHMM and conserved domain N/A; therefore hypothetical protein CDS 18919 - 19125 /gene="30" /product="gp30" /function="Hypothetical Protein" /locus tag="Tonitrus_30" /note=Original Glimmer call @bp 18982 has strength 8.04; Genemark calls start at 18919 /note=SSC: 18919-19125 CP: yes SCS: both-gm ST: NA BLAST-Start: GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.305, -4.691784380744663, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: MSP1_C ; Merozoite surface protein 1 (MSP1) C-terminus,,,PF07462.15,67.6471,63.2 SIF-Syn: e-values of the matched phages in BLAST were too low to be usable in synteny. 
/note=TMHMM: N/A /note=No good hits in HHPRED and NCBI BLAST /note=Start was changed to the LORF because there was better CP, a smaller gap, and better RBS values /note=Original start was 18982; it did not include all of the CP, it had a larger gap, and worse RBS values compared to the changed start. CDS 19280 - 21067 /gene="31" /product="gp31" /function="major capsid protein" /locus tag="Tonitrus_31" /note=Original Glimmer call @bp 19280 has strength 14.7; Genemark calls start at 19280 /note=SSC: 19280-21067 CP: yes SCS: both ST: NI BLAST-Start: [virion structural protein [Gordonia phage Clawz] ],,NCBI, q19:s24 96.6387% 3.51833E-167 GAP: 154 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.174, -2.253486910374644, yes F: major capsid protein SIF-BLAST: ,,[virion structural protein [Gordonia phage Clawz] ],,YP_010675473,62.1488,3.51833E-167 SIF-HHPRED: Mediator of RNA polymerase II transcription subunit 21; Transcription, RNA polymerase II; 3.4A {Schizosaccharomyces pombe},,,5N9J_D,16.9748,67.1 SIF-Syn: in similar position in Clawz and Soos, also before same gene. 
/note=Conserved Domain: major capsid protein /note=NCBI BLAST: bad %identity /note=HHPred: low probability and bad e-value /note=Starterator says that this gene does not call the most annotated start, but the start was not changed because the selected start has the best scores /note=Function: major capsid protein was chosen over major capsid hexamer protein because the evidence for major capsid hexamer protein was not as good and Phamerator had major capsid protein as the conserved domain CDS 21138 - 21920 /gene="32" /product="gp32" /function="major capsid pentamer protein" /locus tag="Tonitrus_32" /note=Original Glimmer call @bp 21138 has strength 15.49; Genemark calls start at 21162 /note=SSC: 21138-21920 CP: yes SCS: both-gl ST: SS BLAST-Start: [MULTISPECIES: hypothetical protein [Rhodococcus] ],,NCBI, q9:s3 96.5385% 4.79766E-53 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.004, -2.6013996449736907, yes F: major capsid pentamer protein SIF-BLAST: ,,[MULTISPECIES: hypothetical protein [Rhodococcus] ],,WP_102030965,59.5142,4.79766E-53 SIF-HHPRED: DUF5830 ; Family of unknown function (DUF5830),,,PF19148.4,21.1538,23.8 SIF-Syn: in similar position in Soos and Clawz and after same gene. in 450 other. /note=TMHMM: N/A /note=BLAST: % identity of all of the hits was too low /note=HHPRED: large e-value; low probability /note=PhagesDB Function Frequency: major capsid pentamer protein /note= /note=Because of all the low hits, I don't think this gene is a hypothetical protein. Based on the PhagesDB function frequency, this gene is quite possibly a major capsid pentamer protein. 
CDS 21930 - 22409 /gene="33" /product="gp33" /function="Hypothetical Protein" /locus tag="Tonitrus_33" /note=Original Glimmer call @bp 21930 has strength 13.88; Genemark calls start at 21930 /note=SSC: 21930-22409 CP: yes SCS: both ST: NA BLAST-Start: GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.404, -3.8366550365551095, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: Translation initiation factor eif-2b-like protein; eIF2B, eIF2, guanine nucleotide exchange factor, GEF, regulatory subcomplex, regulatory subunit, translation initiation, translation; 2.55A {Chaetomium thermophilum},,,4ZEO_H,44.6541,79.2 SIF-Syn: Orpham /note=TMHMM: N/A /note=PhagesDB BLAST: closest hits were labeled "function unknown" /note=NCBI BLAST: N/A /note=HHPRED: low % probability, inconclusive /note= /note=HS: please make notes on how you confirmed your start CDS 22454 - 22666 /gene="34" /product="gp34" /function="membrane protein" /locus tag="Tonitrus_34" /note=Original Glimmer call @bp 22454 has strength 8.06; Genemark calls start at 22454 /note=SSC: 22454-22666 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein KHQ84_gp010 [Rhodococcus phage Finch] ],,NCBI, q9:s5 81.4286% 7.1528E-8 GAP: 44 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.481, -5.73456940637445, no F: membrane protein SIF-BLAST: ,,[hypothetical protein KHQ84_gp010 [Rhodococcus phage Finch] ],,YP_010059032,39.5349,7.1528E-8 SIF-HHPRED: DUF6116 ; Family of unknown function (DUF6116),,,PF19611.3,58.5714,85.7 SIF-Syn: gene is in similar position in Finch but between different genes /note=Starterator was inconclusive, start was only called by Tonitrus, but pham also only had 22 members. Tonitrus did not have most annotated start available. /note=TMHMM: 2 TMDs /note=Low % for NCBI BLAST and HHPRED; this could be because of the TMDs /note=BLAST results were inconclusive, either calling unknown function or having poor e-values/identity. No help with function call. 
/note=Overall, the best hit was from TMHMM, can categorize as membrane protein. CDS 22668 - 23153 /gene="35" /product="gp35" /function="membrane protein" /locus tag="Tonitrus_35" /note=Original Glimmer call @bp 22668 has strength 7.11; Genemark calls start at 22668 /note=SSC: 22668-23153 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Rhodococcus ruber]],,NCBI, q8:s9 95.6522% 6.20164E-27 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.704, -5.336886114444951, no F: membrane protein SIF-BLAST: ,,[hypothetical protein [Rhodococcus ruber]],,WP_048770926,54.023,6.20164E-27 SIF-HHPRED: DUF4235 ; Protein of unknown function (DUF4235),,,PF14019.10,34.7826,92.6 SIF-Syn: different gene number but in similar position and not between same genes /note=Kept start because of no overlap and starterator points to this start. /note=Deep TMHMM hits, 2 transmembrane domain /note=Phamerator says 2 transmembrane proteins /note=Phagesdb evidence for Holin Protein, but more evidence for transmembrane protein CDS 23239 - 23979 /gene="36" /product="gp36" /function="major tail protein" /locus tag="Tonitrus_36" /note=Original Glimmer call @bp 23239 has strength 17.06; Genemark calls start at 23239 /note=SSC: 23239-23979 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Gordonia phage Clawz] ],,NCBI, q3:s2 99.187% 3.82527E-105 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.082, -2.523003374675015, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Gordonia phage Clawz] ],,YP_010675478,72.0,3.82527E-105 SIF-HHPRED: L_lac_phage_MSP ; Phage tail tube protein,,,PF06488.15,72.7642,57.2 SIF-Syn: Similar in position to Phage Clawz. fairly decent synteny /note=No Transmembrane Domains /note=Evidence for function based off of phagesdb function frequency and blast function, hhpred had poor probability. 
CDS 24105 - 25664 /gene="37" /product="gp37" /function="minor tail protein" /locus tag="Tonitrus_37" /note=Original Glimmer call @bp 24105 has strength 16.45; Genemark calls start at 24105 /note=SSC: 24105-25664 CP: yes SCS: both ST: NI BLAST-Start: [tail fiber protein [Rhodococcus phage ReqiPepy6] ],,NCBI, q44:s64 91.5222% 0.0 GAP: 125 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -2.827683592113848, yes F: minor tail protein SIF-BLAST: ,,[tail fiber protein [Rhodococcus phage ReqiPepy6] ],,YP_009017619,67.2862,0.0 SIF-HHPRED: PAF_acetylesterase_like; PAF_acetylhydrolase (PAF-AH)_like subfamily of SGNH-hydrolases. Platelet-activating factor (PAF) and PAF-AH are key players in inflammation and in atherosclerosis.,,,cd01820,53.3719,99.7 SIF-Syn: Doesn`t match with any other phages in the blast matches well. Not much synteny. Gene seems to be spliced into place. /note=Calling minor tail protein for now because of the hits with other phages and NCBI blast. /note=Lots of hits on HHPred to "acetylhydrolase" and "flavodoxin-like" and "SGNH-hydrolase" but coverage is only 50% but probability is high for best hits /note=Protein sequence is rich in glycine so that works with the approved function list. Can`t confirm collagen-like protein /note=Gene is in region with other tail proteins too. 
CDS 25747 - 26508 /gene="38" /product="gp38" /function="head-to-tail adaptor" /locus tag="Tonitrus_38" /note=Original Glimmer call @bp 25747 has strength 11.5; Genemark calls start at 25747 /note=SSC: 25747-26508 CP: yes SCS: both ST: NI BLAST-Start: [head-to-tail adaptor [Gordonia phage Soos]],,NCBI, q5:s7 95.6522% 6.26741E-87 GAP: 82 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.253, -2.0895162839948163, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Gordonia phage Soos]],,WNM68776,68.1452,6.26741E-87 SIF-HHPRED: Head completion protein; Neck, Portal, T5, VIRUS, VIRAL PROTEIN; 3.2A {Escherichia phage DT57C},,,8HQO_R,83.004,99.7 SIF-Syn: Gene position is similar for directly nearby genes in Clawz and Baloo /note=Kept start because of the best RBS scores, didn`t have the most annotated start. Even though this start is only found in Tonitrus, other possible starts weren`t called by any other phages either. Starterator uninformative. /note=Hits for head-to-tail adapter on other phages /note=HHPred hits have SPP1 and Bacillus YqbG, as needed on the approved function list /note=Slightly higher HHPred hits show Portal Protein as another possible function /note= /note=MR: Great work investigating the approved function list. 
Don`t forget to include start notes as well as synteny, which can be found in starterator or in function frequency /note=JC: chosen start is only called for tonitrus (1 of 706) CDS 26505 - 27032 /gene="39" /product="gp39" /function="Hypothetical Protein" /locus tag="Tonitrus_39" /note=Original Glimmer call @bp 26535 has strength 11.28; Genemark calls start at 26526 /note=SSC: 26505-27032 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein [Corynebacterium xerosis] ],,NCBI, q1:s1 100.0% 1.3273E-38 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.179, -5.127289457927852, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Corynebacterium xerosis] ],,WP_102211785,60.8187,1.3273E-38 SIF-HHPRED: SIF-Syn: Synteny is similar to phage Clawz and Soos for the directly surrounding genes /note=Changed Start: Selected start with -4 overlap so that it covers all the CP. Best RBS scores without too much overlap. Starterator wasn`t super informative. Moved start to start 41 that isn`t chosen in any phage (instead of start 52, which is called 2% of the time when present) on the basis that all the CP would be covered at this start. /note=No evidence to call any sort of function. One Conserved domain, but it`s not very informative. 
CDS 27042 - 27359 /gene="40" /product="gp40" /function="Hypothetical Protein" /locus tag="Tonitrus_40" /note=Original Glimmer call @bp 27042 has strength 12.54; Genemark calls start at 27042 /note=SSC: 27042-27359 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SPI1_28 [Skermania phage SPI1]],,NCBI, q3:s4 75.2381% 1.81834E-4 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.415, -3.8137921535956054, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein SPI1_28 [Skermania phage SPI1]],,AKJ71689,35.0427,1.81834E-4 SIF-HHPRED: Hypothetical protein yqbF; two-domain, Bsu26130, YqbF, NESG, Structural Genomics, PSI-2, Protein Structure Initiative, Northeast Structural Genomics Consortium; NMR {Bacillus subtilis} SCOP: a.140.3.2, d.344.1.1, l.1.1.1,,,2HJQ_A,86.6667,97.0 SIF-Syn: synteny is similar to in nearby positioning with Clawz and Soos /note=Kept start based on Starterator and rbs scores /note=HHPRED: 99.7% probability for Hypothetical protein NCBI BLAST: no good hits TMHMM: N/A /note=Overall not much info for any specific function CDS 27403 - 27783 /gene="41" /product="gp41" /function="Hypothetical Protein" /locus tag="Tonitrus_41" /note=Original Glimmer call @bp 27403 has strength 9.18; Genemark calls start at 27346 /note=SSC: 27403-27783 CP: yes SCS: both-gl ST: NI BLAST-Start: [HK97 gp10 family phage protein [Hoyosella altamirensis] ],,NCBI, q2:s20 83.3333% 5.2443E-17 GAP: 43 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.49, -7.322319962392713, no F: Hypothetical Protein SIF-BLAST: ,,[HK97 gp10 family phage protein [Hoyosella altamirensis] ],,WP_083962534,44.0299,5.2443E-17 SIF-HHPRED: Minor_capsid_2 ; Minor capsid protein,,,PF11114.12,95.2381,99.4 SIF-Syn: Similar in nearby position with Clawz and Soos. /note=best HHPred match leads to minor capsid protein. 
Phagesdb matches suggest a possible tail assembly chaperone /note=HK97_gp10 suggests using Hypothetical protein for function CDS 27832 - 28347 /gene="42" /product="gp42" /function="tail assembly chaperone" /locus tag="Tonitrus_42" /note=Original Glimmer call @bp 27832 has strength 13.2; Genemark calls start at 27832 /note=SSC: 27832-28347 CP: yes SCS: both ST: NI BLAST-Start: [tail assembly chaperone [Gordonia phage Clawz] ],,NCBI, q1:s1 97.6608% 1.71943E-38 GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.004, -2.6013996449736907, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Clawz] ],,YP_010675483,59.7765,1.71943E-38 SIF-HHPRED: Phage_TAC_10 ; Phage tail assembly chaperone,,,PF10963.12,45.614,94.5 SIF-Syn: Similar in nearby synteny with Clawz and Soos /note=Kept start: good rbs scores and covers all cp /note=Called tail assembly chaperone based on HHPred and NCBI hits CDS 28395 - 28634 /gene="43" /product="gp43" /function="Hypothetical Protein" /locus tag="Tonitrus_43" /note=Original Glimmer call @bp 28395 has strength 12.45; Genemark calls start at 28395 /note=SSC: 28395-28634 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD13_gp55 [Gordonia phage Clawz] ],,NCBI, q1:s1 93.6709% 4.37513E-19 GAP: 47 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.584, -5.6626780000781585, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein PQD13_gp55 [Gordonia phage Clawz] ],,YP_010675484,69.863,4.37513E-19 SIF-HHPRED: SIF-Syn: Synteny is similar in Clawz and Soos but not at all in the DR cluster phages (after the tape measure protein instead of right before). /note=Kept start: best rbs that doesn`t cut out any coding potential /note=Starterator supports this start: doesn`t contain the most annotated start but this one is used by other phages that have it. /note=There`s no good evidence for function whatsoever. 
CDS 28674 - 33584 /gene="44" /product="gp44" /function="tape measure protein" /locus tag="Tonitrus_44" /note=Original Glimmer call @bp 28674 has strength 18.47; Genemark calls start at 28674 /note=SSC: 28674-33584 CP: yes SCS: both ST: NI BLAST-Start: [tape measure protein [Gordonia phage Soos]],,NCBI, q1:s1 100.0% 0.0 GAP: 39 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.535, -6.529896889058254, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Gordonia phage Soos]],,WNM68782,49.3927,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BF,7.15159,99.9 SIF-Syn: VGK: has synteny /note=Kept start: could go with next start and keep all coding potential, but would lower rbs scores slightly /note=Longest gene in the phage and is next to other tail proteins, HHPred and NCBI show high probability for tape measure protein. Definitely a tape measure protein. /note=Lots of conserved domains that support the function call, one transmembrane domain. CDS 33595 - 34905 /gene="45" /product="gp45" /function="minor tail protein" /locus tag="Tonitrus_45" /note=Original Glimmer call @bp 33595 has strength 16.51; Genemark calls start at 33595 /note=SSC: 33595-34905 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Clawz] ],,NCBI, q1:s1 99.5413% 4.70367E-99 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.174, -2.253486910374644, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Clawz] ],,YP_010675486,57.7103,4.70367E-99 SIF-HHPRED: ORF46; Distal tail protein, Receptor-binding protein, Phage baseplate, host adsorption apparatus, genome injection device, VIRAL PROTEIN; 3.8A {Lactococcus phage TP901-1},,,4V96_AU,35.5505,95.6 SIF-Syn: VGK: has synteny /note=> captures all coding potential /note=> found in 6 of 30 genes in pham, called 100% when present /note=>glycine-rich protein, you can call them minor tail proteins. 
/note=> should keep start /note= /note=JH: cite evidence for glycine-rich protein. CDS 34902 - 35993 /gene="46" /product="gp46" /function="minor tail protein" /locus tag="Tonitrus_46" /note=Original Glimmer call @bp 34902 has strength 18.76; Genemark calls start at 34902 /note=SSC: 34902-35993 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Soos]],,NCBI, q12:s13 96.6942% 1.74362E-61 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.656, -4.925390935265885, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Soos]],,WNM68784,47.3822,1.74362E-61 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,93.1129,99.7 SIF-Syn: VGK: has synteny /note=Kept Start: good rbs score, nice -4 overlap, covers all cp /note=Easy call for minor tail protein based on Phagesdb, HHPred, and NCBI hits. Surrounded by tail proteins. /note= /note=JH: maybe give examples of some of these values? Idk if its necessary as this is much more concise, just a thought. 
CDS 35996 - 36175 /gene="47" /product="gp47" /function="minor tail protein" /locus tag="Tonitrus_47" /note=Original Glimmer call @bp 35996 has strength 9.58; Genemark calls start at 35996 /note=SSC: 35996-36175 CP: yes SCS: both ST: NA BLAST-Start: [minor tail protein [Gordonia phage Clawz] ],,NCBI, q3:s5 94.9153% 5.59822E-9 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.641, -3.4283035164720754, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Clawz] ],,YP_010675488,52.0548,5.59822E-9 SIF-HHPRED: SIF-Syn: e-values of the matched phages in BLAST were too low to be usable in synteny JH: see 48 (i know yall are working on it ; ) /note=> Singleton, nothing to compare to /note=> good coding potential, longest ORF /note=> should keep start /note=> some glycine but not rich (short protein sequence) CDS 36214 - 36438 /gene="48" /product="gp48" /function="Hypothetical Protein" /locus tag="Tonitrus_48" /note=Original Glimmer call @bp 36214 has strength 12.36; Genemark calls start at 36220 /note=SSC: 36214-36438 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: 38 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -2.417348306996335, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: JH: just use "e-values of the matched phages in BLAST were too low to be usable in synteny. " if there are no phagesdb BLAST results with good e-values /note=This is the best start (despite not being the longest ORF). Matches for any potential functions are inconclusive with poor E scores on PhagesDB/BLAST and inconclusive on HHPred. 
/note= /note=HS: please state WHY your start is good despite Longest ORF CDS 36438 - 37013 /gene="49" /product="gp49" /function="Hypothetical Protein" /locus tag="Tonitrus_49" /note=Original Glimmer call @bp 36438 has strength 12.9; Genemark calls start at 36438 /note=SSC: 36438-37013 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Corynebacterium xerosis] ],,NCBI, q1:s1 98.9529% 3.55785E-41 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -2.9063687850157054, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Corynebacterium xerosis] ],,WP_102211795,55.814,3.55785E-41 SIF-HHPRED: SIF-Syn: Has synteny /note=Start covers all coding potential and is the longest ORF CDS 37024 - 38445 /gene="50" /product="gp50" /function="Hypothetical Protein" /locus tag="Tonitrus_50" /note=Original Glimmer call @bp 37024 has strength 13.45; Genemark calls start at 37024 /note=SSC: 37024-38445 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SAMN05443665_101738 [Actinomadura meyerae]],,NCBI, q1:s1 19.2389% 5.60298E-14 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.33, -1.9310779259753799, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein SAMN05443665_101738 [Actinomadura meyerae]],,SNT14071,17.0213,5.60298E-14 SIF-HHPRED: SIF-Syn: N/A (orpham) /note=Keeping start (best score and longest ORF). Some PhagesDB matches suggest a minor tail protein or tape measure protein function, but no E score is high enough to be considered for a function. Likely remains `function unknown` /note=JH: just make sure you go through and check if the BLAST/NCBI/HHPRED values you used/looked at are checked. (they need to have the box checked in order to be included.) 
CDS 38455 - 39078 /gene="51" /product="gp51" /function="lysin A, N-acetylmuramoyl-L-alanine amidase domain TBD" /locus tag="Tonitrus_51" /note=Original Glimmer call @bp 38455 has strength 11.12; Genemark calls start at 38455 /note=SSC: 38455-39078 CP: yes SCS: both ST: NA BLAST-Start: [N-acetylmuramoyl-L-alanine amidase [Rhodococcus aetherivorans] ],,NCBI, q1:s5 99.0338% 4.96785E-107 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.619, -4.2219893161307915, no F: lysin A, N-acetylmuramoyl-L-alanine amidase domain TBD SIF-BLAST: ,,[N-acetylmuramoyl-L-alanine amidase [Rhodococcus aetherivorans] ],,WP_051446051,80.7512,4.96785E-107 SIF-HHPRED: Bifunctional autolysin; peptidoglycan, autolysin, amidase, N-acetylmuramoyl-L-alanine amidase, HYDROLASE; HET: PEG, IMD; 1.124A {Staphylococcus aureus subsp. aureus},,,4KNK_A,79.7101,99.2 SIF-Syn: N/A (orpham) /note=> singleton, nothing to compare to according to starterator /note=> good coding potential and longest ORF /note=> should keep start /note= /note=if not a Mycobacteriophage, must have a lysin B, otherwise it is endolysin, N-acetylmuramoyl-L-alanine amidase domain so function is TBD CDS 39078 - 39962 /gene="52" /product="gp52" /function="peptidase" /locus tag="Tonitrus_52" /note=Original Glimmer call @bp 39078 has strength 12.02; Genemark calls start at 39075 /note=SSC: 39078-39962 CP: yes SCS: both-gl ST: NA BLAST-Start: [M23 family metallopeptidase [Rhodococcus sp. p52] ],,NCBI, q12:s1 94.898% 3.66245E-106 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.828, -5.927106658893579, no F: peptidase SIF-BLAST: ,,[M23 family metallopeptidase [Rhodococcus sp. p52] ],,WP_065996442,68.5315,3.66245E-106 SIF-HHPRED: Peptidase M23; membrane protein, enzyme, TRANSPORT PROTEIN; HET: ADP;{Vibrio cholerae},,,8TZL_E,47.2789,99.2 SIF-Syn: This is a singleton; not a member of a cluster /note=The start was picked to reduce the overlap between gene 52 and 51. 
Z-score and final score remained about the same from the original start. Coding potential also included all of the new selected start. A lot of hits for metallopeptidase for conserved domains were seen; could be looked in to further for specific function. CDS 39964 - 41100 /gene="53" /product="gp53" /function="hydrolase" /locus tag="Tonitrus_53" /note=Original Glimmer call @bp 39964 has strength 11.64; Genemark calls start at 39964 /note=SSC: 39964-41100 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein RN001_016453 [Aquatica leii]],,NCBI, q1:s1 100.0% 1.21118E-165 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.926, -4.881372581065431, no F: hydrolase SIF-BLAST: ,,[hypothetical protein RN001_016453 [Aquatica leii]],,KAK4871444,74.6073,1.21118E-165 SIF-HHPRED: Putative secreted protein; mycobacterium tuberculosis arabinogalactan, HYDROLASE; HET: SIN; 2.41A {Mycobacterium tuberculosis H37Rv},,,8AN0_A,91.5344,100.0 SIF-Syn: Has similar start to other phages in C2 cluster /note=Start captures all of the coding potential, has good scores, and is the suggested start in other phages. Hits some lysin domains that could point to a generalized hydrolase or a more specified hydrolase function but it doesn`t share a pham with the listed phages on the approved function list. We are not sure whether this could point to it having a more specific hydrolase function, and plan to ask about this in our Tonitrus forum questions. 
CDS 41117 - 41488 /gene="54" /product="gp54" /function="membrane protein" /locus tag="Tonitrus_54" /note=Original Glimmer call @bp 41117 has strength 12.19; Genemark calls start at 41117 /note=SSC: 41117-41488 CP: yes SCS: both ST: NA BLAST-Start: [holin [Gordonia phage ChisanaKitsune] ],,NCBI, q13:s39 70.7317% 9.91825E-13 GAP: 16 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -3.1837611541087343, yes F: membrane protein SIF-BLAST: ,,[holin [Gordonia phage ChisanaKitsune] ],,YP_010675704,30.7263,9.91825E-13 SIF-HHPRED: SIF-Syn: It was an orpham no known synteny RDJ: but if we called it a holin, would it have synteny? /note=Start has good Z/final scores and includes all CP. Had hits for two transmembrane domains in phamerator and deeptmhmm and had several hits for holin but they didn`t have strong enough alignment/percent identity to call it as that. This gene does fit the requirements to be called a holin because it hits holin in BLAST and HHPred. It also has two transmembrane domains. Evidence to call it membrane protein for now: "for now when we seem multiple possibilities for a holin gene, let`s call them membrane proteins." /note= /note=RDJ: good notes, though add something about the start; holin is a possibility; does it meet any of the criteria? CDS 41488 - 41835 /gene="55" /product="gp55" /function="tail fiber" /locus tag="Tonitrus_55" /note=Original Glimmer call @bp 41527 has strength 15.97; Genemark calls start at 41488 /note=SSC: 41488-41835 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein PQD13_gp67 [Gordonia phage Clawz] ],,NCBI, q5:s7 94.7826% 7.15534E-15 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.261, -4.130083972178869, yes F: tail fiber SIF-BLAST: ,,[hypothetical protein PQD13_gp67 [Gordonia phage Clawz] ],,YP_010675496,52.5,7.15534E-15 SIF-HHPRED: SIF-Syn: It is following the tape measure protein which is involved in tail function so this function would fit with the position of this gene in the genome. 
/note=Start has good CP and good scores. Other starts were not supported. We are calling this as a tail fiber for now but would like to have this checked over there was strong probability and e values for this in HHPred but not for a lot of hits, many hits were hypothetical protein yet. Additionally, we think this should be considered to see if it would be better to call it as a minor tail protein. There are no requirements in the Official Function list for tail fiber. CDS 41828 - 42193 /gene="56" /product="gp56" /function="Hypothetical Protein" /locus tag="Tonitrus_56" /note=Original Glimmer call @bp 41846 has strength 9.91; Genemark calls start at 41828 /note=SSC: 41828-42193 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein [Gordonia sp. PDNC005] ],,NCBI, q1:s1 99.1736% 3.10825E-39 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.484, -3.7329837339109084, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Gordonia sp. PDNC005] ],,WP_205329313,68.0328,3.10825E-39 SIF-HHPRED: BppU_IgG ; Baseplate upper protein immunoglobulin like domain,,,PF18667.5,40.4959,90.3 SIF-Syn: /note=The original start was at 41846 which shortened the gene and had significantly worse scores, so we changed the start to 41828 which had better scores, still included the coding potential, and did not have a notable effect on the function. Most of the hits for functions were either hypothetical protein or hit other things but had too low e-values. CDS 42209 - 43330 /gene="57" /product="gp57" /function="tail fiber" /locus tag="Tonitrus_57" /note=Original Glimmer call @bp 42209 has strength 17.2; Genemark calls start at 42209 /note=SSC: 42209-43330 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Gordonia sp. PDNC005] ],,NCBI, q4:s3 99.1957% 2.64435E-158 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.163, -2.627458653341447, yes F: tail fiber SIF-BLAST: ,,[hypothetical protein [Gordonia sp. 
PDNC005] ],,WP_205329312,79.4038,2.64435E-158 SIF-HHPRED: SIF-Syn: Additionally, this gene follows the tape measure gene involved in the length of the tail, though it is more removed and could be involved with tail function, which suggests the tail fiber function. /note=Has the conserved domain for a tail fiber protein, and has several strong hits for it in the NCBI blast, it also has some hits for head/capsid decoration protein, but based on the spot of this gene in the genome we think that function would be less likely. Other genomes also suggest that head/capsid decoration proteins are not common in this region. There are also many hits for hypothetical proteins but the evidence for tail fiber protein is strong enough to call it, though it could be looked into further. /note= /note=RDJ: can you explain how you determined it is a less likely spot for a head/capsid decoration protein? Did you look at other genomes to determine this? If so great! - just record. CDS 43340 - 43522 /gene="58" /product="gp58" /function="Hypothetical Protein" /locus tag="Tonitrus_58" /note=Original Glimmer call @bp 43340 has strength 11.27; Genemark calls start at 43340 /note=SSC: 43340-43522 CP: yes SCS: both ST: NA BLAST-Start: GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.619, -4.2219893161307915, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Orpham /note=There isn`t great evidence from HHPred or NCBI BLAST for function. CP includes the start and Z/final scores are good and indicate keeping the current start. /note= /note=RDJ: small gene - confirm CP in notes, other reasons for keeping. Starterator and CP dropdowns - looks like you might have been partway done with this one, so apologies. 
CDS 43532 - 44185 /gene="59" /product="gp59" /function="Hypothetical Protein" /locus tag="Tonitrus_59" /note=Original Glimmer call @bp 43532 has strength 12.83; Genemark calls start at 43532 /note=SSC: 43532-44185 CP: yes SCS: both ST: NI BLAST-Start: [hydrolase [Rhodococcus phage NiceHouse]],,NCBI, q1:s1 98.1567% 1.76314E-53 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.754, -3.196631460098011, yes F: Hypothetical Protein SIF-BLAST: ,,[hydrolase [Rhodococcus phage NiceHouse]],,QLF83408,63.8498,1.76314E-53 SIF-HHPRED: SIF-Syn: /note=HHPred did result in significant hits. Hydrolase was a hit, but the score was not significant enough to reach a conclusive function. BLAST results were not significant. CDS 44270 - 44503 /gene="60" /product="gp60" /function="helix-turn-helix DNA binding domain" /locus tag="Tonitrus_60" /note=Original Glimmer call @bp 44270 has strength 10.13; Genemark calls start at 44270 /note=SSC: 44270-44503 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein FraQA3DRAFT_4504 [Frankia sp. QA3]],,NCBI, q17:s3 71.4286% 3.18501E-8 GAP: 84 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.163, -2.9284886490054283, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[hypothetical protein FraQA3DRAFT_4504 [Frankia sp. QA3]],,EIV94725,48.6486,3.18501E-8 SIF-HHPRED: Recombination Directionality Factor RdfS; Excisionase, Recombination Directionality Factor, winged helix-turn-helix, superhelix, DNA BINDING PROTEIN; HET: GOL; 2.45A {Mesorhizobium japonicum R7A},,,8DGL_A,76.6234,99.1 SIF-Syn: The gene is found in very similar location to others in the CR cluster. /note=The longest ORF was not selected but the recommended start has good CP and decent scores that lengthen the gene. There are very high e-values that call the gene RuvC-like resolvase in HHpred and NCBI blast. One hit with Holliday junction resolvase, but the approved function list says to call it Holliday junction resolvase if there are no RuvC or RusA, which there are. 
CDS 44575 - 44943 /gene="61" /product="gp61" /function="Hypothetical Protein" /locus tag="Tonitrus_61" /note=Original Glimmer call @bp 44575 has strength 16.75; Genemark calls start at 44575 /note=SSC: 44575-44943 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP491_gp52 [Gordonia phage BoyNamedSue] ],,NCBI, q7:s6 82.7869% 5.69414E-30 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.174, -2.3158002311349737, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein PP491_gp52 [Gordonia phage BoyNamedSue] ],,YP_010653245,63.3929,5.69414E-30 SIF-HHPRED: SIF-Syn: /note=There isn`t great evidence from HHPred or NCBI BLAST to support alternative function. Start was chosen for the smallest gap, its RBS values, and coding potential. CDS 44978 - 45802 /gene="62" /product="gp62" /function="Hypothetical Protein" /locus tag="Tonitrus_62" /note=Original Glimmer call @bp 44978 has strength 17.83; Genemark calls start at 44978 /note=SSC: 44978-45802 CP: yes SCS: both ST: SS BLAST-Start: [DUF2303 family protein [Mycobacterium eburneum] ],,NCBI, q13:s22 94.1606% 1.92999E-82 GAP: 34 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.174, -2.394485424036831, yes F: Hypothetical Protein SIF-BLAST: ,,[DUF2303 family protein [Mycobacterium eburneum] ],,WP_133278476,59.7173,1.92999E-82 SIF-HHPRED: DUF2303 ; Uncharacterized conserved protein (DUF2303),,,PF10065.13,89.4161,100.0 SIF-Syn: /note=Has strong hits in HHPred, NCBI, and the Conserved Domains but all for the Conserved domain DUF2303 which has an unknown function. The first start is best because of the smaller gap. Does have coding potential. 
CDS 46084 - 46386 /gene="63" /product="gp63" /function="Hypothetical Protein" /locus tag="Tonitrus_63" /note=Original Glimmer call @bp 46084 has strength 8.81; Genemark calls start at 46084 /note=SSC: 46084-46386 CP: yes SCS: both ST: NA BLAST-Start: GAP: 281 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.819, -5.118120525712711, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SDH5_plant ; Succinate dehydrogenase subunit 5, mitochondrial,,,PF14290.10,69.0,31.6 SIF-Syn: /note=BLAST hits indicate a longer start and while it would decrease the gap, the scores got worse. We selected the current start as other starts significantly shorten gene and have worse scores. CP includes all the current start. No good evidence from HHPred or BLAST. Phagesdb BLAST results hinted towards lysin/endolysin, but evidence from other HHPred and NCBI Blast did not support this. /note= /note=RDJ: BLAST hits indicate a longer start might be in order, though the change is probably not sufficient to make the BLAST line up Q1:T1, so take a look. Or note reasons for choosing start you did? 
CDS 46421 - 47128 /gene="64" /product="gp64" /function="helix-turn-helix DNA binding domain" /locus tag="Tonitrus_64" /note=Original Glimmer call @bp 46376 has strength 8.63; Genemark calls start at 46376 /note=SSC: 46421-47128 CP: no SCS: both-cs ST: NA BLAST-Start: [helix-turn-helix domain-containing protein [Rhodococcus rhodochrous] ],,NCBI, q1:s14 45.9574% 4.55157E-29 GAP: 34 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.375, -4.547509295897855, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix domain-containing protein [Rhodococcus rhodochrous] ],,WP_192379501,60.9023,4.55157E-29 SIF-HHPRED: Hypothetical Protein; Structural Genomics, NIAID, National Institute of Allergy and Infectious Diseases, Center for Structural Genomics of Infectious; HET: SO4, MSE; 1.8A {Vibrio cholerae O1 biovar El Tor},,,4RO3_B,23.8298,98.2 SIF-Syn: orpham /note=Shows helix-turn-helix in HHPred similarly to gene 60. Hits transcriptional factors and DNA binding proteins in HHPred and BLAST; although there is proper evidence of the function is DNA binding protein, we cannot say for certain. The start for this gene has good coding potential and the best RBS with a reasonable gap. CDS 47255 - 47443 /gene="65" /product="gp65" /function="Hypothetical Protein" /locus tag="Tonitrus_65" /note=Original Glimmer call @bp 47255 has strength 22.0; Genemark calls start at 47255 /note=SSC: 47255-47443 CP: yes SCS: both ST: NA BLAST-Start: GAP: 126 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.242, -2.1123791669543204, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: yfhH hypothetical protein; Structural Genomics, unknown function, PSI, Protein Structure Initiative, Midwest Center for Structural Genomics, MCSG; 1.71A {Bacillus subtilis} SCOP: b.34.15.1, l.1.1.1,,,1SF9_A,53.2258,79.1 SIF-Syn: Orpham /note=Original start used ;LORF; No good evidence from HHPred or BLAST, probabilities were low. 
CDS 47518 - 48192 /gene="66" /product="gp66" /function="Hypothetical Protein" /locus tag="Tonitrus_66" /note=Original Glimmer call @bp 47518 has strength 18.19; Genemark calls start at 47518 /note=SSC: 47518-48192 CP: yes SCS: both ST: NA BLAST-Start: GAP: 74 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.242, -2.1123791669543204, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: PROTEIN (FLIG); FLAGELLAR MOTOR SWITCH PROTEIN, STRUCTURAL PROTEIN; 2.2A {Thermotoga maritima} SCOP: a.118.14.1,,,1QC7_A,29.4643,89.2 SIF-Syn: Orpham /note=No evidence from HHPred or BLAST. Start has coding potential and the best RBS. CDS 48273 - 48740 /gene="67" /product="gp67" /function="Hypothetical Protein" /locus tag="Tonitrus_67" /note=Original Glimmer call @bp 48273 has strength 13.26; Genemark calls start at 48273 /note=SSC: 48273-48740 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Gemmatimonadota bacterium]],,NCBI, q109:s8 28.3871% 7.40672E-11 GAP: 80 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.672, -3.3472632858563953, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Gemmatimonadota bacterium]],,MCK5650945,49.2308,7.40672E-11 SIF-HHPRED: c.37.1.1 (B:) Deoxycytidine kinase {Human (Homo sapiens) [TaxId: 9606]} | CLASS: Alpha and beta proteins (a/b), FOLD: P-loop containing nucleoside triphosphate hydrolases, SUPFAM: P-loop containing nucleoside triphosphate hydrolases, FAM: Nucleotide and nucleoside kinases,,,SCOP_d1p5zb_,8.3871,57.5 SIF-Syn: Orpham /note=No strong evidence from HHPred or NCBI BLAST. Best RBS, good coding potential /note= /note=Group 3 check: any notes about start? 
CDS 48810 - 49829 /gene="68" /product="gp68" /function="Cas4 exonuclease" /locus tag="Tonitrus_68" /note=Original Glimmer call @bp 48810 has strength 14.62; Genemark calls start at 48810 /note=SSC: 48810-49829 CP: yes SCS: both ST: NI BLAST-Start: [Cas4 exonuclease [Gordonia phage Soos]],,NCBI, q10:s4 97.3451% 7.03758E-89 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -3.1164791313608173, yes F: Cas4 exonuclease SIF-BLAST: ,,[Cas4 exonuclease [Gordonia phage Soos]],,WNM68808,62.6113,7.03758E-89 SIF-HHPRED: c.52.1.13 (A:) lambda exonuclease {Bacteriophage lambda [TaxId: 10710]} | CLASS: Alpha and beta proteins (a/b), FOLD: Restriction endonuclease-like, SUPFAM: Restriction endonuclease-like, FAM: lambda exonuclease,,,SCOP_d3sm4a_,57.5221,99.9 SIF-Syn: This gene is seen in other phages of different clusters. /note=Start was selected due to good Z/final scores and coding potential included all of the start. Contains the conserved domain PD-(D/E)XK nuclease superfamily, which indicates a Cas4 exonuclease, although it does not include the crystal structure 4R5Q_A, 41C1_A. Exonuclease function was found in Trogglehumper_74, although it was not found in the cluster of Tonitrus. 
CDS 49840 - 51597 /gene="69" /product="gp69" /function="ASCE ATPase" /locus tag="Tonitrus_69" /note=Original Glimmer call @bp 49936 has strength 13.3; Genemark calls start at 49936 /note=SSC: 49840-51597 CP: yes SCS: both-cs ST: NI BLAST-Start: [ASCE ATPase [Gordonia phage Soos]],,NCBI, q35:s4 92.1367% 2.48607E-110 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.868, -2.881875840424956, no F: ASCE ATPase SIF-BLAST: ,,[ASCE ATPase [Gordonia phage Soos]],,WNM68809,61.6766,2.48607E-110 SIF-HHPRED: DNA repair and recombination protein radB; RadB, filament formation, homologous recombination, ATPase domain, hyperthermophile, DNA BINDING PROTEIN; 2.2A {Thermococcus kodakarensis},,,2CVH_B,35.8974,99.2 SIF-Syn: Very similar genes are seen adjacent to this pham on phages Claws and Soos from the CP cluster. /note=Start changed to include all coding potential. /note=CP: Good, yes; LO: no, longest ORF introduces significant overlap; Gap: 10; RBS: best score; Starterator: not most annotated; BLAST-func: Phage Soos, ASCE ATPase, great e-val; no conserved domains; no TM domains; Function rationale: Strong BLAST matches with phage Soos and synteny showing very similar genes on either side of this gene. Reviewing criteria, this gene appears to meet the requirements. /note=RDJ: great notes! 
CDS 51613 - 51885 /gene="70" /product="gp70" /function="Hypothetical Protein" /locus tag="Tonitrus_70" /note=Original Glimmer call @bp 51613 has strength 11.25; Genemark calls start at 51613 /note=SSC: 51613-51885 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_SOOS_81 [Gordonia phage Soos]],,NCBI, q1:s1 88.8889% 7.51345E-8 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.672, -3.284949965096066, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein SEA_SOOS_81 [Gordonia phage Soos]],,WNM68811,47.3684,7.51345E-8 SIF-HHPRED: Cys_rich_KTR ; Cysteine-rich KTR,,,PF14205.10,54.4444,86.4 SIF-Syn: Similar genes are seen adjacent to this pham in Soos and Clawz from the CP cluster. /note=CP: good, yes; LO: yes; Gap: 15; RBS: best score; BLAST-start: Q1:T1; Starterator: called when present (most annotated); BLAST-function: hypothetical protein; Conserved domain: none; HHPred: Zinc ribbon domain, 86.2%, Q1:T1; CDS 51892 - 52272 /gene="71" /product="gp71" /function="HNH endonuclease" /locus tag="Tonitrus_71" /note=Original Glimmer call @bp 51892 has strength 5.88; Genemark calls start at 51892 /note=SSC: 51892-52272 CP: yes SCS: both ST: NI BLAST-Start: [HNH endonuclease signature motif containing protein [Corynebacterium sp. LK19] ],,NCBI, q1:s1 84.9206% 1.7804E-19 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.683, -4.31216594647988, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease signature motif containing protein [Corynebacterium sp. 
LK19] ],,WP_147718968,49.5652,1.7804E-19 SIF-HHPRED: d.4.1.8 (A:513-673) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Actinomyces naeslundii [TaxId: 1115803]} | CLASS: Alpha and beta proteins (a+b), FOLD: His-Me finger endonucleases, SUPFAM: His-Me finger endonucleases, FAM: HNH domain from CRISPR-associated protein Cas9,,,SCOP_d4ogca2,60.3175,96.9 SIF-Syn: N/A /note=CP: good; LO: yes; Gap: 6; RBS: best score; BLAST-start: q1:t2; Starterator: not MA, called 1 of 558 times when present (0.2%); BLAST-func: HNH endonuclease; No TM domains; prelim func call: HNH endonuclease: meets criteria- contains HNNH sequence with less than 30 aa between (HNNH approved by forum to qualify for HNH proteins). /note=Function Rationale: HNH endonuclease is being called for this gene due to numerous BLAST calls that match it, as well as it meeting the requirements for this function. /note=RDJ: mention what the criteria were? Does it have an HNH in the sequence? CDS complement (52262 - 52447) /gene="72" /product="gp72" /function="Hypothetical Protein" /locus tag="Tonitrus_72" /note=Original Glimmer call @bp 52447 has strength 0.14 /note=SSC: 52447-52262 CP: yes SCS: glimmer ST: NA BLAST-Start: GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.675, -5.6853845463436965, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: c.69.1.12 (A:) automated matches {Pseudomonas fluorescens [TaxId: 294]} | CLASS: Alpha and beta proteins (a/b), FOLD: alpha/beta-Hydrolases, SUPFAM: alpha/beta-Hydrolases, FAM: Haloperoxidase,,,SCOP_d3ia2a_,65.5738,43.4 SIF-Syn: /note=Hypothetical protein with not much else pointing to it existing other than the good CP. /note=Start kept because good CP, High final score, High Z score. Start: 52447, End: 52262. Longest possible ORF. 
/note=RDJ: yes a little odd reverse gene in the middle of forward genes, but CP is strong CDS 52521 - 52982 /gene="73" /product="gp73" /function="Hypothetical Protein" /locus tag="Tonitrus_73" /note=Original Glimmer call @bp 52521 has strength 16.72; Genemark calls start at 52521 /note=SSC: 52521-52982 CP: yes SCS: both ST: NA BLAST-Start: GAP: 73 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.174, -2.604595770381943, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: DUF6429 ; Domain of unknown function (DUF6429),,,PF20008.3,26.1438,65.3 SIF-Syn: Orpham /note=Neither HHPred, Blast, nor anything else points to a function. /note=Not longest ORF but good CP, High final score, High Z score. Start: 52521, Stop: 52982 /note=Group 2: Notes on start? CDS 52979 - 53323 /gene="74" /product="gp74" /function="Hypothetical Protein" /locus tag="Tonitrus_74" /note=Original Glimmer call @bp 52979 has strength 13.02; Genemark calls start at 52979 /note=SSC: 52979-53323 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Prescottella equi] ],,NCBI, q7:s11 85.0877% 6.62356E-8 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -3.769471247018311, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Prescottella equi] ],,NKS56207,40.1709,6.62356E-8 SIF-HHPRED: DUF1480 ; Protein of unknown function (DUF1480),,,PF07351.17,16.6667,23.0 SIF-Syn: /note=No function. Nothing points to it. /note=Start kept because good CP, Good final and Z score. Start: 52979, End: 53323. Not longest ORF but least amount of overlap. Longer ORFs have a large amount of overlap. /note=MS: mention start and other information that is available.
CDS 53310 - 53537 /gene="75" /product="gp75" /function="Hypothetical Protein" /locus tag="Tonitrus_75" /note=Original Glimmer call @bp 53310 has strength 8.36; Genemark calls start at 53316 /note=SSC: 53310-53537 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.084, -4.556919718793868, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: M14_PaCCP-like; Peptidase M14-like domain of ATP/GTP binding proteins and cytosolic carboxypeptidases similar to Pseudomonas aerugnosa CCP (PaCCP).,,,cd06234,93.3333,36.8 SIF-Syn: /note=CP: good, all included; LO: no, but longest doesn`t add any coding potential and has a large overlap; Gap: -14; RBS: best scores; no conserved domains or transmembrane domains; no significant HHPred hits; no BLAST hits /note= /note=RDJ: great notes!; correct NA Starterator for orpham /note=MCS: Just don`t forget to check off boxes, like the first hit, as evidence for things like HHPred and Blast even if they are not significant. 
CDS 53537 - 53983 /gene="76" /product="gp76" /function="RusA-like resolvase" /locus tag="Tonitrus_76" /note=Original Glimmer call @bp 53537 has strength 12.9; Genemark calls start at 53537 /note=SSC: 53537-53983 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Nocardia cyriacigeorgica] ],,NCBI, q20:s15 85.1351% 5.74969E-20 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.163, -2.2763497933341483, yes F: RusA-like resolvase SIF-BLAST: ,,[hypothetical protein [Nocardia cyriacigeorgica] ],,WP_163822396,47.619,5.74969E-20 SIF-HHPRED: d.79.6.1 (A:) automated matches {Escherichia coli [TaxId: 562]} | CLASS: Alpha and beta proteins (a+b), FOLD: Bacillus chorismate mutase-like, SUPFAM: Holliday junction resolvase RusA, FAM: Holliday junction resolvase RusA,,,SCOP_d2h8ea_,91.2162,99.7 SIF-Syn: /note=CP: good, all included; LO: no, but longest doesn`t add any coding potential and has a large overlap; Gap: -1; RBS: best scores; no conserved domains or transmembrane domains /note= /note=RDJ: explain/give some summary of why Starterator is NI on this one. There are many members in this pham, so it`s unexpected that it would NI. Does this gene meet the criteria in the official functions list for RusA-like resolvase? How? CDS 54004 - 54357 /gene="77" /product="gp77" /function="Hypothetical Protein" /locus tag="Tonitrus_77" /note=Original Glimmer call @bp 54004 has strength 15.48; Genemark calls start at 54004 /note=SSC: 54004-54357 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein SEA_SOOS_5 [Gordonia phage Soos]],,NCBI, q14:s10 77.7778% 2.57914E-5 GAP: 20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -3.1164791313608173, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein SEA_SOOS_5 [Gordonia phage Soos]],,WNM68735,46.875,2.57914E-5 SIF-HHPRED: DUF1773 ; MUG2-like C-terminal domain,,,PF08593.14,45.2991,52.7 SIF-Syn: Group 2: Any synteny? Orpham? 
/note=CP: good, all included; LO: no, but longest doesn`t add any coding potential and has a large overlap; Gap: 20; RBS: best scores; no conserved domains or transmembrane domains; no significant HHPred hits; /note= /note=AG: I believe we are supposed to check items off from HHPred and BLAST even if they are not significant to prove that we checked them. Nothing is checked off. CDS 54338 - 54619 /gene="78" /product="gp78" /function="Hypothetical Protein" /locus tag="Tonitrus_78" /note=Original Glimmer call @bp 54338 has strength 7.33; Genemark calls start at 54338 /note=SSC: 54338-54619 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Amycolatopsis sp. CA-230715] ],,NCBI, q10:s12 82.7957% 1.38343E-13 GAP: -20 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.925, -3.4175091270247986, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Amycolatopsis sp. CA-230715] ],,WP_215547145,55.5556,1.38343E-13 SIF-HHPRED: DUF6083 ; Family of unknown function (DUF6083),,,PF19561.3,83.871,99.6 SIF-Syn: /note=CP: good, all included; LO: yes; Gap: -20; RBS: best scores; no conserved domains or transmembrane domains CDS 54622 - 55155 /gene="79" /product="gp79" /function="Hypothetical Protein" /locus tag="Tonitrus_79" /note=Original Glimmer call @bp 54622 has strength 15.06; Genemark calls start at 54622 /note=SSC: 54622-55155 CP: yes SCS: both ST: NA BLAST-Start: GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -2.7653702713535186, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: DUF4290 ; Domain of unknown function (DUF4290),,,PF14123.10,71.7514,62.0 SIF-Syn: /note=Maintain currently selected start. Good coding capacity. Not the longest ORF, but it has a good RBS score and a smaller gap of 2. No NCBI BLAST results. No conserved or transmembrane domains. Function: Hypothetical protein. There is not enough evidence to assign a specific function. This gene is also an orpham.
/note= /note=AG: forgot to select the drop-down for coding potential and Starterator evidence. I believe we are supposed to check items off from HHPred and BLAST even if they are not significant to prove that we checked them. Nothing is checked off. CDS 55218 - 55484 /gene="80" /product="gp80" /function="Hypothetical Protein" /locus tag="Tonitrus_80" /note=Original Glimmer call @bp 55218 has strength 15.28; Genemark calls start at 55218 /note=SSC: 55218-55484 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein KNU75_gp036 [Mycobacterium phage Jeeves] ],,NCBI, q1:s1 95.4545% 0.00805376 GAP: 62 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.082, -2.583959800616441, yes F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein KNU75_gp036 [Mycobacterium phage Jeeves] ],,YP_010104383,45.0549,0.00805376 SIF-HHPRED: DUF3938 ; Protein of unknown function (DUF3938),,,PF13074.10,21.5909,25.5 SIF-Syn: No. It is quite an unlikely site in a phage genome for a tail protein. /note=The currently selected start is the best because it has the best RBS score, a gap (as opposed to the huge overlap of the first start), and good coding capacity. /note= /note=Phagesdb BLAST lists phages with "no function". NCBI BLAST hits phage Jeeves with a hypothetical protein function (% identity was low, q1:t1). HHPred shows 7 hits with some different functions, and all have very low probability (one of the hits was a protein of unknown function). /note= /note=RB: forgot to select the drop-down for coding potential and Starterator evidence. I believe we are supposed to check items off from HHPred and BLAST even if they are not significant to prove that we checked them. Nothing is checked off. Also, the start you selected is not the LO.
CDS 55481 - 55687 /gene="81" /product="gp81" /function="membrane protein" /locus tag="Tonitrus_81" /note=Original Glimmer call @bp 55481 has strength 10.03; Genemark calls start at 55469 /note=SSC: 55481-55687 CP: no SCS: both-gl ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.018, -4.629584883725491, no F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Orpham /note=The currently selected start is the best because it has the best RBS score, a smaller overlap than the first two starts, and good coding capacity (whereas the first two do not). Longest possible ORF: No. /note= No conserved domain. Transmembrane domain: Domain match. /note=Since this gene has a domain match, it is assigned the function of a membrane protein. /note= /note=RDJ: if transmembrane domain identified, the function should be changed to membrane protein CDS 55689 - 56165 /gene="82" /product="gp82" /function="Hypothetical Protein" /locus tag="Tonitrus_82" /note=Original Glimmer call @bp 55689 has strength 7.92; Genemark calls start at 55689 /note=SSC: 55689-56165 CP: yes SCS: both ST: NI BLAST-Start: [MULTISPECIES: hypothetical protein [Rhodococcus] ],,NCBI, q6:s4 54.4304% 6.71738E-7 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.683, -3.7892872011995418, yes F: Hypothetical Protein SIF-BLAST: ,,[MULTISPECIES: hypothetical protein [Rhodococcus] ],,WP_094638090,45.8716,6.71738E-7 SIF-HHPRED: Protein lsr2; anti-parallel beta sheet, dimer, DNA BINDING PROTEIN; 1.728A {Mycobacterium tuberculosis},,,4E1P_A,22.7848,82.2 SIF-Syn: Orpham... /note=The currently selected start is the best. Its RBS score comes close to that of the first start listed above, but it has a better Z-score and a smaller gap (1 bp). /note=Couldn`t confirm the start from Starterator because gene 82 is an orpham. /note=No transmembrane domain, no conserved domain. /note=Function: Hypothetical protein (evidence from NCBI BLAST result and its e-value).
/note=*HHPred hit had a high e-value: 6.9 /note=RDJ: great notes! CDS 56162 - 56554 /gene="83" /product="gp83" /function="Hypothetical Protein" /locus tag="Tonitrus_83" /note=Original Glimmer call @bp 56162 has strength 12.29; Genemark calls start at 56162 /note=SSC: 56162-56554 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Rhodococcus zopfii]],,NCBI, q28:s112 39.2308% 5.24302E-9 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.851, -7.109811946098469, no F: Hypothetical Protein SIF-BLAST: ,,[hypothetical protein [Rhodococcus zopfii]],,MDV2477137,19.1617,5.24302E-9 SIF-HHPRED: SIF-Syn: Group 2: Orpham? /note=CP: good, all included; LO: no, but longest gave worse scores, a large overlap, and didn`t add any coding potential; Gap: -4; RBS: close to best score; no conserved domains or transmembrane domains; no significant HHPred hits CDS 56551 - 57285 /gene="84" /product="gp84" /function="Hypothetical Protein" /locus tag="Tonitrus_84" /note=Original Glimmer call @bp 56551 has strength 8.44; Genemark calls start at 56551 /note=SSC: 56551-57285 CP: yes SCS: both ST: SS BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.371, -3.90407005001345, yes F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Yes, these DNA binding domains are often found near the end of genomes. RDJ: great notes! maybe include what you are basing that on? /note=CP: good, all included; LO: no, but longest doesn`t add any coding potential; Gap: -4; RBS: close to best score; no conserved domains or transmembrane domains; HHPred: insignificant results CDS 57282 - 57599 /gene="85" /product="gp85" /function="Hypothetical Protein" /locus tag="Tonitrus_85" /note=Original Glimmer call @bp 57282 has strength 9.63; Genemark calls start at 57282 /note=SSC: 57282-57599 CP: yes SCS: both ST: NA BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.1, -4.460666092178802, no F: Hypothetical Protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Group 2: orpham?
/note=CP: good, all included; LO: no, but longest doesn`t add any coding potential and has a large overlap; Gap: -4; RBS: close to best score; no conserved domains or transmembrane domains; no significant HHPred hits; no NCBI BLAST hits /note= /note=RB: I believe we are supposed to check items off from HHPred and BLAST even if they are not significant to prove that we checked them. Nothing is checked off.