CDS 1634 - 1840
  /gene="1"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_1"
  /note=Original Glimmer call @bp 1634 has strength 4.21; Genemark calls start at 1634
  /note=SSC: Start = 1634, Stop = 1840. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.933 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 207 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Great CP: Start: 1634 End: 1840. Great CP, longest ORF, best z and final score. Not pointed to by PhagesDB, NCBI Blast, Conserved Domain Database, and DeepTMHMM.
CDS 1900 - 2256
  /gene="2"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_2"
  /note=Original Glimmer call @bp 1900 has strength 13.5; Genemark calls start at 1900
  /note=SSC: Start = 1900, Stop = 2256. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 357 bp is not the longest possible ORF. GAP: 59 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 3, Function= function unknown, EValue= 1.0E-11. NCBIBLAST= . HHPRED= . CDD= .
  /note=Great CP: Start: 1900 End: 2256. Great CP, not longest ORF, best z and final score that keeps length and no overlap. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM.
CDS 2339 - 2482
  /gene="3"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_3"
  /note=Original Glimmer call @bp 2363 has strength 6.07; Genemark calls start at 2321
  /note=SSC: Start = 2339, Stop = 2482. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.776 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 144 bp is not the longest possible ORF. GAP: 82 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 4, Function= function unknown, EValue= 2.0E-12. NCBIBLAST= . HHPRED= . CDD= .
  /note=Terrible CP, not great final or z score: Start: 1634 End: 1840. Changed to Start: 2339 End: 2482. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM. Even with the start change, still pointed to by nothing. Z score: 2.776. Final score: -3.769.
CDS 2572 - 2733
  /gene="4"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_4"
  /note=Original Glimmer call @bp 2572 has strength 5.94; Genemark calls start at 2572
  /note=SSC: Start = 2572, Stop = 2733. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.254 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 162 bp is the longest possible ORF. GAP: 89 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Great CP: Start: 2572 End: 2733. Good CP, longest ORF, best z and final score. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM.
CDS 2736 - 2969
  /gene="5"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_5"
  /note=Original Glimmer call @bp 2736 has strength 4.52; Genemark calls start at 2736
  /note=SSC: Start = 2736, Stop = 2969. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.471 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 234 bp is the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Great CP: Start: 2736 End: 2969. Good CP, longest ORF, best z and final score. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM.
CDS 2985 - 3578
  /gene="6"
  /product="glycosylase"
  /function="glycosylase"
  /locus tag="Dorin_6"
  /note=Original Glimmer call @bp 2985 has strength 1.24; Genemark calls start at 2985
  /note=SSC: Start = 2985, Stop = 3578. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 594 bp is the longest possible ORF. GAP: 15 bp. ST: SS=NA. F: glycosylase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 5, Function= function unknown, EValue= 1.0E-111. NCBIBLAST= PhageName= hypothetical protein FH39_gp10 [Mycobacterium phage Phantastic] >gb|AHY27152.1| hypothetical protein PBI_PHANTASTIC_89 [Mycobacterium phage Phantastic], Coverage= 99.4924, SubjectRange= 1:191, QueryRange= 1:196, EValue= 3.93284E-53. HHPRED= . CDD= .
  /note=Great CP: Start: 2985 End: 3578. Great CP, longest ORF, best z and final score. NCBI Blast points to hypothetical protein and glycosylase, but PhagesDB Blast hits only hypothetical protein, so it is called hypothetical protein. Few hits in PhagesDB to glycosylase, but at extremely low frequency.
  /note=HHPred has some okay hits; called due to the same hits in Shagrat, where the gene was called.
CDS 3657 - 4325
  /gene="7"
  /product="Lsr2-like DNA bridging protein"
  /function="Lsr2-like DNA bridging protein"
  /locus tag="Dorin_7"
  /note=Original Glimmer call @bp 3657 has strength 11.31; Genemark calls start at 3657
  /note=SSC: Start = 3657, Stop = 4325. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.427 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 669 bp is the longest possible ORF. GAP: 78 bp. ST: SS=NA. F: Lsr2-like DNA bridging protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 9, Function= function unknown, EValue= 1.0E-8. NCBIBLAST= . HHPRED= Accession= 2KNG_A, Description= Protein lsr2; DNA-binding domain, Immune response, DNA BINDING PROTEIN; NMR {Mycobacterium tuberculosis}, Probability= 95.8. Coverage= 15.3153, SubjectRange= 11:45, QueryRange= 11:119. CDD= Accession= pfam11774, Coverage= 31.0811, SubjectRange= 46:104, QueryRange= 46:115, EValue= 8.61863E-5.
  /note=Great CP: Start: 3657 End: 4325. Great CP, longest ORF, high z and final score, not the highest but close in both categories. Not pointed to by HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM. Hit by PhagesDB function but at very low frequency; hit in PhagesDB Blast to NiceHouse genes 9 and 290.
  /note=Called due to convincing conserved domain matching with some okay HHPred hits.
CDS 4392 - 4514
  /gene="8"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_8"
  /note=Original Glimmer call @bp 4392 has strength 5.95; Genemark calls start at 4392
  /note=SSC: Start = 4392, Stop = 4514. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 123 bp is the longest possible ORF. GAP: 66 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Great CP: Start: 4392 End: 4514. Great CP, longest ORF, best z and final score. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM.
CDS complement (4835 - 5182)
  /gene="9"
  /product="membrane protein"
  /function="membrane protein"
  /locus tag="Dorin_9"
  /note=Original Glimmer call @bp 5182 has strength 6.18; Genemark calls start at 5182
  /note=SSC: Start = 5182, Stop = 4835. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 348 bp is the longest possible ORF. GAP: 28 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Great CP: Start: 5182 End: 4835. Great CP, longest ORF, best z and final score. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM.
  /note=Called membrane protein due to presence of a transmembrane domain on TMHMM.
CDS complement (5211 - 5894)
  /gene="10"
  /product="lysin B"
  /function="lysin B"
  /locus tag="Dorin_10"
  /note=Original Glimmer call @bp 5894 has strength 3.42; Genemark calls start at 5894
  /note=SSC: Start = 5894, Stop = 5211. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.617 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 684 bp is the longest possible ORF. GAP: 30 bp. ST: SS=NA. F: lysin B. FS: PHDBLAST= PhageName= Weasels2, ProteinNumber= 31, Function= lysin B, EValue= 4.0E-32. NCBIBLAST= PhageName= lysin B [Rhodococcus phage Weasels2] >gb|AOZ63621.1| lysin B [Rhodococcus phage Weasels2], Coverage= 96.0352, SubjectRange= 1:218, QueryRange= 1:220, EValue= 1.96328E-36. HHPRED= Accession= 5W95_B, Description= Conserved membrane protein of uncharacterised function; PEG, Complex, HYDROLASE; HET: 1PE; 1.723A {Mycobacterium tuberculosis}, Probability= 99.9. Coverage= 98.6784, SubjectRange= 23:278, QueryRange= 23:226. CDD= .
  /note=Start: 5894 End: 5211. Great CP, longest ORF, best z and final score. PhagesDB function frequency has some hits to Lysin B, but they are not very frequent. PhagesDB Blast has good hits for Lysin B. HHPred has good hits, but for Serine Hydrolase. NCBI Blast hits Lysin B. There is some evidence for both Lysin B and Serine Hydrolase. Lysin B has two domains, a peptidoglycan binding domain and a serine hydrolase domain. If it were to be called a serine hydrolase, it would have only the serine hydrolase domain. Both PECAAN and Phamerator show that no domains are present. On further investigation of the PhagesDB Blast hits for Lysin B, the Grayson, Weasels2, and Peregrin genes whose functions are Lysin B appeared to have no domain hits either. For this reason, I will be calling it Lysin B.
CDS complement (5925 - 6131)
  /gene="11"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_11"
  /note=Original Glimmer call @bp 6131 has strength 4.08; Genemark calls start at 6131
  /note=SSC: Start = 6131, Stop = 5925. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.438 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 207 bp is the longest possible ORF. GAP: -11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein [Roseibium sp.] >gb|MBO6858359.1| hypothetical protein [Roseibium sp.], Coverage= 82.3529, SubjectRange= 4:59, QueryRange= 4:61, EValue= 0.0125468. HHPRED= . CDD= .
  /note=Great CP: Start: 6131 End: 5925. Great CP, longest ORF, best z and final score. PhagesDB function frequency points to Lsr2-like DNA bridging protein, but at very low frequency (36 and lower). PhagesDB Blast points to NiceHouse, function still unknown. No HHPred hits, NCBI points to nothing, Conserved Domain values point to nothing.
CDS complement (6121 - 6390)
  /gene="12"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_12"
  /note=Original Glimmer call @bp 6390 has strength 5.26; Genemark calls start at 6390
  /note=SSC: Start = 6390, Stop = 6121. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.264 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 270 bp is not the longest possible ORF. GAP: -26 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Hit by nothing.
CDS complement (6365 - 6604)
  /gene="13"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_13"
  /note=Original Glimmer call @bp 6604 has strength 6.98; Genemark calls start at 6604
  /note=SSC: Start = 6604, Stop = 6365. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.528 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 240 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein BJD55_gp111 [Gordonia phage Yvonnetastic] >gb|AMS02655.1| hypothetical protein SEA_YVONNETASTIC_111 [Gordonia phage Yvonnetastic], Coverage= 92.4051, SubjectRange= 1:76, QueryRange= 1:73, EValue= 4.14308E-5. HHPRED= . CDD= .
  /note=Start: 6604 End: 6365. Great CP, longest ORF, best z and final score, but contains a very small overlap. Not pointed to by PhagesDB, HHPred, NCBI Blast, Conserved Domain Database, and DeepTMHMM. Good CP.
CDS complement (6601 - 6987)
  /gene="14"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_14"
  /note=Original Glimmer call @bp 6987 has strength 8.58; Genemark calls start at 6987
  /note=SSC: Start = 6987, Stop = 6601. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.947 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 387 bp is not the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein MM2B0307_0317 [Mycobacteroides abscessus subsp. bolletii 2B-0307], Coverage= 89.8438, SubjectRange= 8:120, QueryRange= 8:118, EValue= 5.10227E-12. HHPRED= . CDD= .
  /note=Start: 6987 End: 6601. The CP looks good for the suggested start but isn't there for some of the overlap starts suggested above; not longest ORF, best z and final score. PhagesDB function frequency hit nothing. PhagesDB Blast had no hits. No HHPred hits and no NCBI hits.
CDS complement (6980 - 7294)
  /gene="15"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_15"
  /note=Original Glimmer call @bp 7225 has strength 2.74; Genemark calls start at 7294
  /note=SSC: Start = 7294, Stop = 6980. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.576 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 315 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein SEA_MCWOLFISH_74 [Mycobacterium phage McWolfish] >gb|AZF96509.1| hypothetical protein SEA_KHALEESI_75 [Mycobacterium phage Khaleesi] >gb|WNT45849.1| hypothetical protein SEA_PURDUEPETE_76 [Mycobacterium phage PurduePete], Coverage= 93.2692, SubjectRange= 52:153, QueryRange= 52:99, EValue= 2.39945E-35. HHPRED= . CDD= .
  /note=Not pointed to by PhagesDB, Phamerator, HHPred, or Blast; no conserved domains. Longest ORF. Start: 7294 End: 6980. Almost best z and final score, very close. The better scores make it way shorter. Good CP.
CDS complement (7294 - 7491)
  /gene="16"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_16"
  /note=Original Glimmer call @bp 7491 has strength 4.47; Genemark calls start at 7491
  /note=SSC: Start = 7491, Stop = 7294. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.7 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 198 bp is not the longest possible ORF. GAP: -14 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Start: 7491 End: 7294. The CP is good, not longest ORF, not the best z and final score, but using the best would shorten the gene and not use all of the coding potential. PhagesDB function frequency hit nothing. PhagesDB Blast had no hits. No HHPred hits and no NCBI hits.
CDS complement (7478 - 7762)
  /gene="17"
  /product="membrane protein"
  /function="membrane protein"
  /locus tag="Dorin_17"
  /note=Original Glimmer call @bp 7762 has strength 6.78; Genemark calls start at 7762
  /note=SSC: Start = 7762, Stop = 7478. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.796 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 285 bp is not the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Not pointed to by PhagesDB, Phamerator, HHPred, or Blast; no conserved domains. Not best z and final score, but better values change the length significantly (way shorter). Start: 7762 End: 7478. Good CP.
  /note=Called membrane protein due to presence of a membrane domain on TMHMM, and checked with SOSUI as per the guidelines to confirm with another check when only one membrane domain is found.
CDS complement (7759 - 8046)
  /gene="18"
  /product="membrane protein"
  /function="membrane protein"
  /locus tag="Dorin_18"
  /note=Original Glimmer call @bp 8046 has strength 5.17; Genemark calls start at 8046
  /note=SSC: Start = 8046, Stop = 7759. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 288 bp is the longest possible ORF. GAP: 8 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Start: 8046 End: 7759. The CP looks good, longest ORF, not the best z and final score, but it lengthens the gene. PhagesDB function frequency hit nothing. PhagesDB Blast had no hits. No HHPred hits and no NCBI hits.
  /note=TMHMM showed membrane protein; check with SOSUI confirmed.
CDS complement (8055 - 8441)
  /gene="19"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_19"
  /note=Original Glimmer call @bp 8441 has strength 9.23; Genemark calls start at 8441
  /note=SSC: Start = 8441, Stop = 8055. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.104 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 387 bp is not the longest possible ORF. GAP: -20 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Peregrin, ProteinNumber= 31, Function= function unknown, EValue= 2.0E-14. NCBIBLAST= PhageName= hypothetical protein PBI_GRAYSON_33 [Rhodococcus phage Grayson], Coverage= 98.4375, SubjectRange= 1:123, QueryRange= 1:126, EValue= 1.41757E-14. HHPRED= Accession= PF20215.2, Description= DUF6575 ; Family of unknown function (DUF6575), Probability= 99.9. Coverage= 96.875, SubjectRange= 5:120, QueryRange= 5:127. CDD= .
  /note=Start: 8441 End: 8055. The CP looks good, not longest ORF, best z and final score. Causes a small overlap, but it lengthens the gene and has a way better z and final score than the next suggested start. PhagesDB function frequency hit tape measure protein at 100 percent in subcluster CR1. However, PhagesDB Blast had hits that pointed to it being an unknown function, and so did the HHPred hit and the NCBI hit.
CDS complement (8422 - 8712)
  /gene="20"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_20"
  /note=Original Glimmer call @bp 8712 has strength 6.55; Genemark calls start at 8712
  /note=SSC: Start = 8712, Stop = 8422. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.695 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 291 bp is the longest possible ORF. GAP: -23 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= .
  /note=Not pointed to by PhagesDB, Phamerator, HHPred, or Blast; no conserved domains. Good CP, longest ORF. Start: 8712 End: 8422. Best z and final score.
CDS complement (8690 - 9157)
  /gene="21"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_21"
  /note=Original Glimmer call @bp 9157 has strength 6.78; Genemark calls start at 9157
  /note=SSC: Start = 9157, Stop = 8690. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.177 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 468 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein [Nocardia farcinica], Coverage= 95.4839, SubjectRange= 158:304, QueryRange= 158:149, EValue= 0.00286316. HHPRED= . CDD= .
  /note=Not pointed to by PhagesDB, Phamerator, HHPred, or Blast; no conserved domains. Longest ORF, good CP, best z and final score.
CDS complement (9154 - 9384)
  /gene="22"
  /product="Hypothetical Protein"
  /function="Hypothetical Protein"
  /locus tag="Dorin_22"
  /note=Original Glimmer call @bp 9384 has strength 4.51; Genemark calls start at 9384
  /note=SSC: Start = 9384, Stop = 9154. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 231 bp is the longest possible ORF. GAP: 6 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 31, Function= function unknown, EValue= 2.0E-10. NCBIBLAST= PhageName= hypothetical protein FDI69_gp031 [Rhodococcus phage Trina] >gb|ASZ74848.1| hypothetical protein SEA_TRINA_31 [Rhodococcus phage Trina], Coverage= 93.4211, SubjectRange= 1:62, QueryRange= 1:71, EValue= 4.95371E-10. HHPRED= . CDD= .
  /note=Start: 9384 End: 9154. The CP looks good, longest ORF, best z and final score. PhagesDB function frequency hit nothing. PhagesDB Blast had hits to unknown function, and so did NCBI Blast.
CDS complement (9391 - 9528) /gene="23" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_23" /note=Original Glimmer call @bp 9528 has strength 8.28; Genemark calls start at 9528 /note=SSC: Start = 9528, Stop = 9391. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.658 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 138 bp is the longest possible ORF. GAP: 18 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Not pointed too by PhagesDB, Phamerator, HHPred, Blast, no conserved domains. Good CP, Longest ORF, Best Z and final score. CDS complement (9547 - 9720) /gene="24" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_24" /note=Genemark calls start at 9720 /note=SSC: Start = 9720, Stop = 9547. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.027 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 174 bp is the longest possible ORF. GAP: 102 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 35, Function= function unknown, EValue= 6.0E-8. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_35 [Rhodococcus phage NiceHouse], Coverage= 98.2456, SubjectRange= 1:60, QueryRange= 1:56, EValue= 5.44162E-8. HHPRED= . CDD= . /note=Start: 9720 End: 9547. The CP is interesting and doesn`t cover a ton but without it, it would leave a large gap, longest ORF, best z and final score. PhagesDb function frequency hit nothing. Phages DB blast had one hit to unknown function. No HHPred hits, NCBI had one hit to unknown function. CDS complement (9823 - 10932) /gene="25" /product="hydrolase" /function="hydrolase" /locus tag="Dorin_25" /note=Original Glimmer call @bp 10932 has strength 4.3; Genemark calls start at 10932 /note=SSC: Start = 10932, Stop = 9823. (Reverse). CP: Does contain all GeneMarkHost capacity. 
SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1110 bp is not the longest possible ORF. GAP: 78 bp. ST: SS=NA. F: hydrolase. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 37, Function= hydrolase, EValue= 3.0E-46. NCBIBLAST= PhageName= DUF4185 domain-containing protein [Labilithrix sp.], Coverage= 99.729, SubjectRange= 113:449, QueryRange= 113:369, EValue= 3.6285E-84. HHPRED= Accession= 8HHV_D, Description= endo-alpha-D-arabinanase; D-arabinan, anomer-retaining glycoside hydrolase, HYDROLASE; HET: GOL; 1.6A {Microbacterium arabinogalactanolyticum}, Probability= 100.0. Coverage= 99.458, SubjectRange= 8:325, QueryRange= 8:369. CDD= . /note=HHPred and Phagesdbblast call it a hydrolase. NCBI blast calls it something else but identity and alignment is terrible. While it has a conserved domain, coverage and identity is terrible. Start is 10932, end: 9823. Reverse. CDS complement (11011 - 11739) /gene="26" /product="ThyX-like thymidylate synthase" /function="ThyX-like thymidylate synthase" /locus tag="Dorin_26" /note=Original Glimmer call @bp 11739 has strength 6.48; Genemark calls start at 11739 /note=SSC: Start = 11739, Stop = 11011. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 729 bp is the longest possible ORF. GAP: 7 bp. ST: SS=NA. F: ThyX-like thymidylate synthase. FS: PHDBLAST= PhageName= Peregrin, ProteinNumber= 41, Function= ThyX-like thymidylate synthase, EValue= 4.0E-76. NCBIBLAST= PhageName= ThyX-like thymidylate synthase [Rhodococcus phage Peregrin], Coverage= 98.7603, SubjectRange= 1:238, QueryRange= 1:239, EValue= 5.90518E-93. 
HHPRED= Accession= 6J61_B, Description= Flavin-dependent thymidylate synthase; Thymidylate synthase, pyrimidine nucleotide biosynthetic pathway, C-terminal domain, Structural Genomics, TRANSFERASE; HET: FAD, PO4; 2.5A {Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)}, Probability= 100.0. Coverage= 95.4545, SubjectRange= 6:220, QueryRange= 6:239. CDD= Accession= pfam02511, Coverage= 84.7107, SubjectRange= 4:185, QueryRange= 4:228, EValue= 2.49367E-30. /note=Start: 11739 End: 11011. The CP is good, longest ORF, best z and final score. PhagesDb function frequency hit to thyx-like thymidylate synthase on many subclusters. Phages DB blast had hits to thyx-like thymidylate synthase as well. So did HHPred and NCBI blast. It also contained conserved domain of Thymidylate synthase. CDS complement (11747 - 12313) /gene="27" /product="glycosyltransferase" /function="glycosyltransferase" /locus tag="Dorin_27" /note=Original Glimmer call @bp 12292 has strength 6.66; Genemark calls start at 12313 /note=SSC: Start = 12313, Stop = 11747. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.898 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 567 bp is not the longest possible ORF. GAP: 105 bp. ST: SS=NA. F: glycosyltransferase. FS: PHDBLAST= PhageName= Grayson, ProteinNumber= 46, Function= function unknown, EValue= 6.0E-45. NCBIBLAST= PhageName= transglycosylase [Rhodococcus phage Weasels2] >gb|AOZ63636.1| transglycosylase-like domain protein [Rhodococcus phage Weasels2], Coverage= 98.4043, SubjectRange= 1:179, QueryRange= 1:185, EValue= 2.29852E-48. HHPRED= Accession= 4OW1_W, Description= Resuscitation-promoting factor RpfC; Resuscitation Promoting Factor, Peptidoglycan, Transglycosylase, HYDROLASE; 1.9A {Mycobacterium tuberculosis} SCOP: d.2.1.0, Probability= 99.2. Coverage= 43.0851, SubjectRange= 4:82, QueryRange= 4:110. 
CDD= Accession= pfam06737, Coverage= 38.2979, SubjectRange= 4:75, QueryRange= 4:101, EValue= 4.56842E-40. /note=Start: 12313 End: 11746. The CP is good, second longest ORF, second best z and final score. It is not the first start because the starterator calls it in some other phages and never calls the first start in all of the other phages. The second start was also listed in two other MA`s. Phages DB blast and the NCBI Blast point towards it being a hypothetical protein. The HHPred is very conflicting with each other and DeepTHMM isn`t enough to call it anything else. /note=Transglycosylase is the common denominator, called CDS complement (12419 - 12646) /gene="28" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_28" /note=Original Glimmer call @bp 12646 has strength 9.46; Genemark calls start at 12646 /note=SSC: Start = 12646, Stop = 12419. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.766 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 228 bp is not the longest possible ORF. GAP: 44 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Not pointed too by PhagesDB, Phamerator, HHPred, Blast, no conserved domains. CDS complement (12691 - 13467) /gene="29" /product="methyltransferase" /function="methyltransferase" /locus tag="Dorin_29" /note=Original Glimmer call @bp 13467 has strength 9.51; Genemark calls start at 13467 /note=SSC: Start = 13467, Stop = 12691. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.674 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 777 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: methyltransferase. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 47, Function= methyltransferase, EValue= 8.0E-95. 
NCBIBLAST= PhageName= methyltransferase [Rhodococcus phage Trina] >gb|ASZ74864.1| methyltransferase [Rhodococcus phage Trina], Coverage= 98.4496, SubjectRange= 4:256, QueryRange= 4:258, EValue= 5.23738E-116. HHPRED= Accession= PF05050.16, Description= Methyltransf_21 ; Methyltransferase FkbM domain, Probability= 99.4. Coverage= 58.9147, SubjectRange= 1:169, QueryRange= 1:226. CDD= Accession= TIGR01444, Coverage= 50.7752, SubjectRange= 4:142, QueryRange= 4:204, EValue= 1.37119E-24. /note=Start: 13467 End: 12691. The CP is good, longest ORF, almost best z score but best final score. PhagesDb function frequency hit to methyltransferase on many subclusters. Phages DB blast had hits to methyltransferase as well. So did HHPred and NCBI blast. It also contained conserved domain of methyltransferase. CDS complement (13464 - 13838) /gene="30" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_30" /note=Original Glimmer call @bp 13838 has strength 8.12; Genemark calls start at 13838 /note=SSC: Start = 13838, Stop = 13464. (Reverse). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 1.573 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 375 bp is not the longest possible ORF. GAP: -17 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 30, Function= function unknown, EValue= 2.0E-68. NCBIBLAST= PhageName= hypothetical protein PBI_GRAYSON_52 [Rhodococcus phage Grayson], Coverage= 93.5484, SubjectRange= 12:126, QueryRange= 12:124, EValue= 1.80712E-6. HHPRED= Accession= 2Q00_B, Description= Orf c02003 protein; P95883, NESG, SSO2109, Structural Genomics, PSI-2, Protein Structure Initiative, Northeast Structural Genomics Consortium, UNKNOWN FUNCTION; 2.4A {Sulfolobus solfataricus P2}, Probability= 83.1. Coverage= 25.0, SubjectRange= 89:120, QueryRange= 89:71. CDD= . /note=The currently selected start is the best because it has high coding potential. 
Smallest overlap. Contains all the coding capacity However, it does not have the longest ORF. /note= /note=No conserved domain. No transmembrane domain. /note=NBCI blasts to Gene 52 in Phage Grayson and Gene 51 in phage Peregrin of cluster CB. Both of these genes have function as hypothetical protein. CDS complement (13822 - 13935) /gene="31" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_31" /note=Original Glimmer call @bp 13935 has strength 6.76 /note=SSC: Start = 13935, Stop = 13822. (Reverse). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.745 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 114 bp is not the longest possible ORF. GAP: 122 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 31, Function= function unknown, EValue= 4.0E-13. NCBIBLAST= . HHPRED= Accession= 4Z61_C, Description= Somatic embryogenesis receptor kinase 2; hormone receptor, complex, TRANSFERASE; HET: TYS, NAG; 2.75A {Daucus carota}, Probability= 81.1. Coverage= 91.8919, SubjectRange= 1:35, QueryRange= 1:35. CDD= . /note=The currently selected start is the best because it has good coding potential. Smallest gap. However, it does not have the longest ORF. /note= /note=No conserved domain. Has one transmembrane protein (domain match). /note=No NBCI blasts results. /note=Phagesdb Blast: good e value for Francesca with unknown function /note=HHPRED: no good hits, low probability, coverage and large e value CDS 14058 - 14216 /gene="32" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_32" /note=Original Glimmer call @bp 14058 has strength 2.14 /note=SSC: Start = 14058, Stop = 14216. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.275 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 159 bp is not the longest possible ORF. GAP: 122 bp. ST: SS=NA. 
F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 32, Function= function unknown, EValue= 2.0E-21. NCBIBLAST= . HHPRED= Accession= 7R7C_b, Description= Nucleolar GTP-binding protein 1; ribosome biogenesis, DEAD-box ATPases, methyltransferase, nucleolus, RIBOSOME;{Saccharomyces cerevisiae BY4741}, Probability= 30.9. Coverage= 48.0769, SubjectRange= 14:37, QueryRange= 14:52. CDD= . /note=The currently selected start is the best because it has high coding potential. Smallest gap. Does contain all the coding capacity. However, it does not have the longest ORF. /note=No conserved domain. No transmembrane domain. /note=No NCBI BLAST results. /note=HHPRED: no good hits; low probability and coverage, large e-value. CDS complement (14437 - 15081) /gene="33" /product="serine hydrolase" /function="serine hydrolase" /locus tag="Dorin_33" /note=Original Glimmer call @bp 15081 has strength 6.84; Genemark calls start at 15081 /note=SSC: Start = 15081, Stop = 14437. (Reverse). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 645 bp is the longest possible ORF. GAP: 119 bp. ST: SS=NA. F: serine hydrolase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 33, Function= function unknown, EValue= 1.0E-128. NCBIBLAST= PhageName= cutinase family protein [Prescottella agglutinans] >gb|MDH6285068.1| cutinase [Prescottella agglutinans], Coverage= 94.3925, SubjectRange= 2:198, QueryRange= 2:206, EValue= 5.98719E-70. HHPRED= Accession= 7CW1_B, Description= Cutinase-like enzyme; cutinase-like enzyme, biodegradable plastic degrading enzyme, alpha/beta hydrolase fold, hydrolase; HET: CAD; 1.7A {Pseudozyma antarctica}, Probability= 99.8. Coverage= 75.7009, SubjectRange= 2:182, QueryRange= 2:198. CDD= . /note=The currently selected start is the best because it has high coding potential. Smallest gap. Contains all the coding capacity.
It also has the longest ORF. /note=Although PhagesDB BLAST hits some phages (e.g., Jflix2 and Shagrat) annotated with the serine hydrolase function, this gene cannot be assigned the serine hydrolase function because it does not have the serine hydrolase domain, which is a requirement for that function to be assigned. /note=HHPRED: Hits have good probability but the e-value is not the best. /note=Serine hydrolase is the best match. Cutinase would not make sense in a phage, but it is an enzyme with an active site similar to that of serine hydrolase, so hits are likely that CDS 15201 - 15545 /gene="34" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_34" /note=Original Glimmer call @bp 15201 has strength 4.74; Genemark calls start at 15201 /note=SSC: Start = 15201, Stop = 15545. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.014 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 345 bp is the longest possible ORF. GAP: 119 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 55, Function= function unknown, EValue= 5.0E-21. NCBIBLAST= PhageName= hypothetical protein FDI69_gp055 [Rhodococcus phage Trina] >gb|ASZ74871.1| hypothetical protein SEA_TRINA_55 [Rhodococcus phage Trina], Coverage= 94.7368, SubjectRange= 3:110, QueryRange= 3:114, EValue= 1.38372E-25. HHPRED= . CDD= . /note=Hits from PhagesDB and NCBI BLAST. All point to hypothetical proteins. Longest ORF, but not best Z or final score. Any change to better values would significantly shorten the gene. Start: 15201, End: 15545. CDS 15542 - 15817 /gene="35" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_35" /note=Original Glimmer call @bp 15542 has strength 10.32; Genemark calls start at 15542 /note=SSC: Start = 15542, Stop = 15817. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.849 is the highest start score.
SCS: Start is called by Glimmer and is called by Genemark. LO: 276 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Start: 15542 End: 15817. The CP is good, longest ORF, best z and final score. No hits on Phages DB or NCBI Blast, HHPred or Conserved Domains. CDS 15814 - 16821 /gene="36" /product="glycosyltransferase" /function="glycosyltransferase" /locus tag="Dorin_36" /note=Original Glimmer call @bp 15814 has strength 6.29; Genemark calls start at 15814 /note=SSC: Start = 15814, Stop = 16821. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.086 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1008 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: glycosyltransferase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 36, Function= function unknown, EValue= 0.0. NCBIBLAST= PhageName= glycosyltransferase [Rhodococcus phage NiceHouse], Coverage= 96.7164, SubjectRange= 1:323, QueryRange= 1:324, EValue= 2.28511E-130. HHPRED= Accession= cd03809, Description= GT4_MtfB-like; glycosyltransferases MtfB, WbpX, and similar proteins. This family is most closely related to the GT4 family of glycosyltransferases., Probability= 100.0. Coverage= 94.6269, SubjectRange= 1:361, QueryRange= 1:319. CDD= Accession= COG0438, Coverage= 72.8358, SubjectRange= 149:377, QueryRange= 149:326, EValue= 4.43126E-9. /note=Most annotated start. Called 100% of the time when present. Has the smallest overlap. Good coding potential. No transmembrane domain. /note=NCBI BLAST results show a function of glycosyltransferase in phage NiceHouse and phage Trina.
/note=Conserved domain: GT4_MtfB. /note=HHPRED shows good e-values and 100% probability for GT4_MtfB, which is closely related to the GT4 family of glycosyltransferases. CDS 16985 - 18946 /gene="37" /product="ribonucleotide reductase" /function="ribonucleotide reductase" /locus tag="Dorin_37" /note=Original Glimmer call @bp 16985 has strength 8.96; Genemark calls start at 16985 /note=SSC: Start = 16985, Stop = 18946. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1962 bp is not the longest possible ORF. GAP: 163 bp. ST: SS=NA. F: ribonucleotide reductase. FS: PHDBLAST= PhageName= Paradiddles, ProteinNumber= 38, Function= ribonucleotide reductase, EValue= 0.0. NCBIBLAST= PhageName= ribonucleoside-triphosphate reductase [bacterium], Coverage= 97.5498, SubjectRange= 5:642, QueryRange= 5:644, EValue= 0.0. HHPRED= Accession= cd01676, Description= RNR_II_monomer; Class II ribonucleotide reductase, monomeric form. Ribonucleotide reductase (RNR) catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides., Probability= 100.0. Coverage= 96.1715, SubjectRange= 1:658, QueryRange= 1:641. CDD= Accession= cd01676, Coverage= 90.8116, SubjectRange= 9:636, QueryRange= 9:613, EValue= 0.0. /note=Not the longest ORF nor the most annotated start; however, the start is called in 95.7% of genes in the pham, and this start is also shared with the other member of the CG cluster, Francesca. The function of ribonucleotide reductase was called due to extremely strong BLAST matches with several other phages, namely the majority of phages in the BE1 cluster. CDS 19049 - 19627 /gene="38" /product="HNH endonuclease" /function="HNH endonuclease" /locus tag="Dorin_38" /note=Original Glimmer call @bp 19007 has strength 4.36 /note=SSC: Start = 19049, Stop = 19627. (Forward). CP: Does contain all GeneMarkHost capacity.
SD: ZScore 2.606 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 579 bp is not the longest possible ORF. GAP: 102 bp. ST: SS=NA. F: HNH endonuclease. FS: PHDBLAST= . NCBIBLAST= PhageName= HNH endonuclease [Fontibacillus phaseoli] >gb|RCX22946.1| HNH endonuclease [Fontibacillus phaseoli], Coverage= 82.8125, SubjectRange= 3:182, QueryRange= 3:160, EValue= 1.41231E-12. HHPRED= Accession= 3M7K_A, Description= restriction endonuclease PacI; HNH restriction endonuclease, beta-beta-alpha-metal active site, 8 base-pair rare cutter, HYDROLASE-DNA complex; HET: SO4; 1.92A {Pseudomonas alcaligenes}, Probability= 99.2. Coverage= 85.9375, SubjectRange= 1:132, QueryRange= 1:167. CDD= Accession= pfam14279, Coverage= 23.9583, SubjectRange= 1:47, QueryRange= 1:156, EValue= 3.72839E-6. /note=Start: 91658 End: 91795. The CP is good, longest ORF, best z score, best final score. No significant hits to anything; HHPred hits had high e-values and are insignificant; no DeepTMHMM hits or conserved domains. /note=Has the HNH motif over a 30 AA span, as per the guidelines for calling HNH. CDS 19932 - 21698 /gene="39" /product="lysin A" /function="lysin A" /locus tag="Dorin_39" /note=Original Glimmer call @bp 19836 has strength 9.97; Genemark calls start at 19836 /note=SSC: Start = 19932, Stop = 21698. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 1.488 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 1767 bp is not the longest possible ORF. GAP: 304 bp. ST: SS=NA. F: lysin A. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 51, Function= lysin A, EValue= 1.0E-113. NCBIBLAST= PhageName= lysin A [Rhodococcus phage NiceHouse], Coverage= 61.5646, SubjectRange= 105:461, QueryRange= 105:363, EValue= 7.18345E-126. HHPRED= Accession= PF08924.15, Description= Rv2525c_GlyHyd-like ; Rv2525c-like, glycoside hydrolase-like domain, Probability= 99.7.
Coverage= 30.102, SubjectRange= 1:184, QueryRange= 1:579. CDD= Accession= cd06418, Coverage= 31.4626, SubjectRange= 21:212, QueryRange= 21:580, EValue= 2.41544E-15. /note=The start does not contain all coding capacity due to the fact that the coding capacity extends beyond the first available start. The longest ORF was not chosen due to the fact that a start does not exist in that position. The protein has been named a lysin A due to significant hits with BLAST as well as meeting the requirements of also having a lysin B. CDS 22041 - 22766 /gene="40" /product="serine protease" /function="serine protease" /locus tag="Dorin_40" /note=Original Glimmer call @bp 22041 has strength 7.04; Genemark calls start at 22041 /note=SSC: Start = 22041, Stop = 22766. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 726 bp is not the longest possible ORF. GAP: 342 bp. ST: SS=NA. F: serine protease. FS: PHDBLAST= PhageName= Madraxi_Draft, ProteinNumber= 3, Function= function unknown, EValue= 9.0E-18. NCBIBLAST= PhageName= serine protease [Rhodococcus sp. T7] >gb|KAF0960058.1| hypothetical protein MLGJGCBP_06810 [Rhodococcus sp. T7], Coverage= 99.1701, SubjectRange= 1:223, QueryRange= 1:240, EValue= 1.14617E-40. HHPRED= Accession= SCOP_d3w94a_, Description= b.47.1.2 (A:) automated matches {Oryzias latipes [TaxId: 8090]} | CLASS: All beta proteins, FOLD: Trypsin-like serine proteases, SUPFAM: Trypsin-like serine proteases, FAM: Eukaryotic proteases, Probability= 99.7. Coverage= 75.1037, SubjectRange= 22:233, QueryRange= 22:230. CDD= . /note=Not most annotated start, called in 100% of genes with this start. Not longest ORF, but lengthening adds no coding potential. This protein has been called as a hypothetical protein as there is not enough evidence to suggest a specific known function. 
/note= /note=Many matches for serine protease CDS 22827 - 24338 /gene="41" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_41" /note=Original Glimmer call @bp 22827 has strength 8.18; Genemark calls start at 22827 /note=SSC: Start = 22827, Stop = 24338. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 0.059 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1512 bp is the longest possible ORF. GAP: 60 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 55, Function= function unknown, EValue= 6.0E-83. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_55 [Rhodococcus phage NiceHouse], Coverage= 99.8012, SubjectRange= 1:502, QueryRange= 1:502, EValue= 1.62887E-88. HHPRED= Accession= 3IPF_A, Description= uncharacterized protein; Q251Q8_DESHY, NESG, DhR8c, Structural Genomics, PSI-2, Protein Structure Initiative, Northeast Structural Genomics Consortium, unknown function; 1.988A {Desulfitobacterium hafniense}, Probability= 37.9. Coverage= 6.75944, SubjectRange= 15:49, QueryRange= 15:177. CDD= . /note=Most annotated start, called 100% of the time when present (called in every gene in the pham). The protein has been called as a hypothetical protein due to a lack of information that suggests a specific known function. CDS 24362 - 24709 /gene="42" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_42" /note=Original Glimmer call @bp 24362 has strength 9.77; Genemark calls start at 24362 /note=SSC: Start = 24362, Stop = 24709. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.21 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 348 bp is the longest possible ORF. GAP: 23 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Weasels2, ProteinNumber= 76, Function= function unknown, EValue= 1.0E-11. 
NCBIBLAST= PhageName= hypothetical protein [Oxalobacteraceae bacterium], Coverage= 80.0, SubjectRange= 19:103, QueryRange= 19:108, EValue= 1.0116E-17. HHPRED= Accession= PF08861.14, Description= DUF1828 ; Domain of unknown function DUF1828, Probability= 93.3. Coverage= 70.4348, SubjectRange= 7:84, QueryRange= 7:100. CDD= Accession= PRK08560, Coverage= 42.6087, SubjectRange= 79:126, QueryRange= 79:98, EValue= 0.00466047. /note=Not most annotated start, but called 100% of the time when present. The protein has been called a hypothetical protein as there is not any information that seems to suggest a specific known function. /note= /note=Changing start would greatly decrease coding potential captured. CDS 24709 - 26421 /gene="43" /product="portal protein" /function="portal protein" /locus tag="Dorin_43" /note=Original Glimmer call @bp 24709 has strength 9.72; Genemark calls start at 24709 /note=SSC: Start = 24709, Stop = 26421. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.08 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1713 bp is not the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: portal protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 69, Function= portal protein, EValue= 1.0E-157. NCBIBLAST= PhageName= portal protein [Rhodococcus phage Trina] >gb|ASZ74883.1| portal protein [Rhodococcus phage Trina], Coverage= 95.9649, SubjectRange= 5:562, QueryRange= 5:570, EValue= 0.0. HHPRED= Accession= 6TE9_A, Description= Phage portal protein, HK97 family; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus}, Probability= 100.0. Coverage= 66.1404, SubjectRange= 47:396, QueryRange= 47:468. CDD= Accession= TIGR01540, Coverage= 56.8421, SubjectRange= 2:306, QueryRange= 2:408, EValue= 2.80673E-28. /note=Not longest ORF, but chosen to reduce overlap. Not most annotated start, but called 100% of the time when present. 
The function has been called as a portal protein due to high synteny, BLAST, and HHPred hits. CDS 26479 - 27768 /gene="44" /product="capsid maturation protease" /function="capsid maturation protease" /locus tag="Dorin_44" /note=Original Glimmer call @bp 26479 has strength 6.29; Genemark calls start at 26479 /note=SSC: Start = 26479, Stop = 27768. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 0.926 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1290 bp is the longest possible ORF. GAP: 57 bp. ST: SS=NA. F: capsid maturation protease. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 71, Function= capsid maturation protease, EValue= 3.0E-74. NCBIBLAST= PhageName= head maturation protease [Rhodococcus phage Trina] >gb|ASZ74885.1| capsid maturation protease [Rhodococcus phage Trina], Coverage= 96.0373, SubjectRange= 17:434, QueryRange= 17:424, EValue= 1.19923E-79. HHPRED= . CDD= Accession= COG5271, Coverage= 39.627, SubjectRange= 3804:3949, QueryRange= 3804:414, EValue= 4.87296E-4. /note=Not most annotated start, but seen in NiceHouse and Francesca. It is also the only start that includes all coding potential. While the BLAST hits and synteny seem to suggest a capsid maturation protease, a lack of certain features in the HHPred hits means it cannot be determined as such at the moment. /note=Can be called based on synteny and the large number of strong phage hits from NCBI. I also reran HHPred in the online software and found capsid maturation protease hits. CDS 27836 - 28846 /gene="45" /product="major capsid protein" /function="major capsid protein" /locus tag="Dorin_45" /note=Original Glimmer call @bp 27836 has strength 6.84; Genemark calls start at 27848 /note=SSC: Start = 27836, Stop = 28846. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark.
LO: 1011 bp is the longest possible ORF. GAP: 67 bp. ST: SS=NA. F: major capsid protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 72, Function= major capsid protein, EValue= 1.0E-111. NCBIBLAST= PhageName= virion structural protein [Rhodococcus phage Trina] >gb|ASZ74886.1| major capsid protein [Rhodococcus phage Trina], Coverage= 99.7024, SubjectRange= 3:333, QueryRange= 3:336, EValue= 1.65652E-139. HHPRED= Accession= 6TSU_A4, Description= Major capsid protein Rcc01687; "capsid", "jelly roll", "spike", "HK97", VIRUS; 3.42A {Rhodobacter capsulatus DE442}, Probability= 100.0. Coverage= 97.3214, SubjectRange= 86:385, QueryRange= 86:333. CDD= . /note=Most annotated start, called 100% of the time when present, and seen in 93 of 101 genes within the pham. Major capsid protein has been called for this gene as there are several very strong hits to it within HHPred and BLAST, as well as high synteny with Weasels2, NiceHouse, and Trina, which all call this gene a major capsid protein. CDS 28904 - 29332 /gene="46" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_46" /note=Original Glimmer call @bp 28904 has strength 8.52; Genemark calls start at 28904 /note=SSC: Start = 28904, Stop = 29332. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.415 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 429 bp is the longest possible ORF. GAP: 57 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 73, Function= function unknown, EValue= 4.0E-15. NCBIBLAST= PhageName= hypothetical protein FDI69_gp073 [Rhodococcus phage Trina] >gb|ASZ74887.1| hypothetical protein SEA_TRINA_73 [Rhodococcus phage Trina], Coverage= 95.0704, SubjectRange= 1:134, QueryRange= 1:142, EValue= 1.31919E-14.
HHPRED= Accession= SCOP_d1e7la1, Description= a.140.4.1 (A:104-157) Recombination endonuclease VII, C-terminal and dimerization domains {Bacteriophage T4 [TaxId: 10665]} | CLASS: All alpha proteins, FOLD: LEM/SAP HeH motif, SUPFAM: Recombination endonuclease VII, C-terminal and dimerization domains, FAM: Recombination endonuclease VII, C-terminal and dimerization domains, Probability= 97.4. Coverage= 28.169, SubjectRange= 11:46, QueryRange= 11:49. CDD= . /note=Start is not most annotated but is the only start that contains all coding capacity. There is not enough evidence to suggest a known function for the protein. CDS 29329 - 30198 /gene="47" /product="head-to-tail adaptor" /function="head-to-tail adaptor" /locus tag="Dorin_47" /note=Original Glimmer call @bp 29329 has strength 9.5; Genemark calls start at 29329 /note=SSC: Start = 29329, Stop = 30198. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.941 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 870 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: head-to-tail adaptor. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 63, Function= head-to-tail adaptor, EValue= 1.0E-42. NCBIBLAST= PhageName= head-to-tail adaptor [Rhodococcus phage NiceHouse], Coverage= 98.6159, SubjectRange= 3:293, QueryRange= 3:289, EValue= 6.66631E-49. HHPRED= Accession= 7Z4W_l, Description= Head completion protein gp15; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis}, Probability= 99.0. Coverage= 63.3218, SubjectRange= 1:102, QueryRange= 1:286. CDD= . /note=Not most annotated start, but called in all members of the CG cluster. This protein has been named a head-to-tail adaptor due to significant hits with phage NiceHouse as well as meeting the requirement of having HHPred hits to SPP1 gp15 and HK97 gp6.
CDS 30199 - 30618 /gene="48" /product="head-to-tail stopper" /function="head-to-tail stopper" /locus tag="Dorin_48" /note=Original Glimmer call @bp 30199 has strength 8.55; Genemark calls start at 30199 /note=SSC: Start = 30199, Stop = 30618. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.557 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 420 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: head-to-tail stopper. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 64, Function= head-to-tail stopper, EValue= 2.0E-30. NCBIBLAST= PhageName= head-to-tail stopper [Rhodococcus phage NiceHouse], Coverage= 99.2806, SubjectRange= 1:139, QueryRange= 1:138, EValue= 6.11846E-36. HHPRED= Accession= 6TE9_E, Description= Stopper protein Rcc01689; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus}, Probability= 93.3. Coverage= 58.2734, SubjectRange= 1:80, QueryRange= 1:86. CDD= . /note=Not most annotated start; however, when present it is called 100% of the time (seen in Francesca, NiceHouse, and Trina). The head-to-tail stopper call was made due to hits with NiceHouse as well as meeting the criterion of having HHPred hits to SPP1 gp16. CDS 30618 - 31415 /gene="49" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_49" /note=Original Glimmer call @bp 30618 has strength 9.93; Genemark calls start at 30618 /note=SSC: Start = 30618, Stop = 31415. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.509 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 798 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 49, Function= function unknown, EValue= 1.0E-152.
NCBIBLAST= PhageName= hypothetical protein FDI69_gp076 [Rhodococcus phage Trina] >gb|ASZ74890.1| hypothetical protein SEA_TRINA_76 [Rhodococcus phage Trina], Coverage= 98.8679, SubjectRange= 7:265, QueryRange= 7:265, EValue= 6.92112E-44. HHPRED= Accession= PF12685.11, Description= SpoIIIAH ; SpoIIIAH-like protein, Probability= 55.9. Coverage= 16.9811, SubjectRange= 106:153, QueryRange= 106:263. CDD= . /note=The currently selected start is the best because it has high coding potential. Smallest overlap. Has the longest ORF. Contains all the coding capacity. /note=No conserved domain. No transmembrane domain. /note=NCBI BLAST hits Gene 65 in phage NiceHouse and Gene 76 in phage Trina of cluster CE. Both of these genes are annotated as hypothetical proteins. CDS 31412 - 31951 /gene="50" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_50" /note=Original Glimmer call @bp 31412 has strength 8.43; Genemark calls start at 31412 /note=SSC: Start = 31412, Stop = 31951. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.386 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 540 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 50, Function= function unknown, EValue= 1.0E-103. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_66 [Rhodococcus phage NiceHouse], Coverage= 89.3855, SubjectRange= 4:161, QueryRange= 4:162, EValue= 6.7011E-29. HHPRED= Accession= 6TE9_F, Description= Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus}, Probability= 47.6. Coverage= 27.3743, SubjectRange= 41:94, QueryRange= 41:107. CDD= . /note=The currently selected start is the best because it has high coding potential. Smallest overlap. Has the longest ORF. Contains all the coding capacity. /note=No conserved domain. No transmembrane domain.
/note=NCBI BLAST hits Gene 66 in phage NiceHouse and Gene 77 in phage Trina of cluster CE. Both of these genes are annotated as hypothetical proteins. CDS 32041 - 32715 /gene="51" /product="major tail protein" /function="major tail protein" /locus tag="Dorin_51" /note=Original Glimmer call @bp 32041 has strength 10.85; Genemark calls start at 32041 /note=SSC: Start = 32041, Stop = 32715. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 675 bp is the longest possible ORF. GAP: 89 bp. ST: SS=NA. F: major tail protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 67, Function= major tail protein, EValue= 1.0E-51. NCBIBLAST= PhageName= major tail protein [Rhodococcus phage NiceHouse], Coverage= 98.6607, SubjectRange= 1:207, QueryRange= 1:221, EValue= 8.16349E-64. HHPRED= Accession= 6XGR_M, Description= YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.}, Probability= 99.0. Coverage= 92.4107, SubjectRange= 2:266, QueryRange= 2:208. CDD= . /note=Most annotated start, called 100% of the time when present. Due to high synteny with NiceHouse, Trina, and Bmoc, as well as BLAST hits, we have opted to call this protein a major tail protein. CDS 32756 - 33280 /gene="52" /product="tail assembly chaperone" /function="tail assembly chaperone" /locus tag="Dorin_52" /note=Original Glimmer call @bp 32756 has strength 11.41; Genemark calls start at 32783 /note=SSC: Start = 32756, Stop = 33280. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.021 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 525 bp is not the longest possible ORF. GAP: 40 bp. ST: SS=NA. F: tail assembly chaperone. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 79, Function= tail assembly chaperone, EValue= 1.0E-25.
NCBIBLAST= PhageName= hypothetical protein FDI69_gp079 [Rhodococcus phage Trina] >gb|ASZ74893.1| tail assembly chaperone [Rhodococcus phage Trina], Coverage= 91.3793, SubjectRange= 3:159, QueryRange= 3:172, EValue= 4.25761E-28. HHPRED= Accession= PF11836.12, Description= Phage_TAC_11 ; Phage tail tube protein, GTA-gp10, Probability= 97.1. Coverage= 77.0115, SubjectRange= 5:90, QueryRange= 5:156. CDD= . /note=Not most annotated start, but start called 100% of the time when present. Not longest ORF, but choosing LORF does not add any coding potential and thus is not necessary. This has a slippery sequence that needs to be added - the first region starts at 32756 and then slips at 33265 into the second region, which stops at 33600. /note= /note=Compared NiceHouse gene 69 (corresponding gene/function) with this gene to find location of slippery sequence. Based on comparison, slippery sequence is found at base 33267 (the fourth A in the GGGAAAA sequence). The conserved protein sequence leading up to frameshift was NDPGK. CDS 33606 - 40205 /gene="53" /product="tape measure protein" /function="tape measure protein" /locus tag="Dorin_53" /note=Original Glimmer call @bp 33606 has strength 7.64; Genemark calls start at 33606 /note=SSC: Start = 33606, Stop = 40205. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.617 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 6600 bp is the longest possible ORF. GAP: 325 bp. ST: SS=NA. F: tape measure protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 70, Function= tape measure protein, EValue= 0.0. NCBIBLAST= PhageName= tape measure protein [Rhodococcus phage NiceHouse], Coverage= 66.3938, SubjectRange= 859:2309, QueryRange= 859:2197, EValue= 2.07628E-154. HHPRED= Accession= 7ZHJ_g, Description= Pore-forming tail tip protein pb2; Bacteriophage, Siphophage, T5, baseplate, VIRAL PROTEIN; 3.53A {Escherichia phage T5}, Probability= 99.5. 
Coverage= 11.5507, SubjectRange= 42:311, QueryRange= 42:380. CDD= Accession= TIGR01760, Coverage= 15.0068, SubjectRange= 1:317, QueryRange= 1:499, EValue= 5.60138E-22. /note=Fantastic coding potential. Not most annotated start, but called 100% of time when present. Due to high synteny with NiceHouse and Trina, matches with several other phages, as well as length, we have opted to call this gene a tape measure protein. CDS 40198 - 40644 /gene="54" /product="minor tail protein" /function="minor tail protein" /locus tag="Dorin_54" /note=Original Glimmer call @bp 40198 has strength 8.52; Genemark calls start at 40198 /note=SSC: Start = 40198, Stop = 40644. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 447 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: minor tail protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 71, Function= minor tail protein, EValue= 1.0E-29. NCBIBLAST= PhageName= minor tail protein [Rhodococcus phage NiceHouse], Coverage= 94.5946, SubjectRange= 3:135, QueryRange= 3:148, EValue= 8.62635E-34. HHPRED= Accession= PF20458.2, Description= DUF6711 ; Family of unknown function (DUF6711), Probability= 99.9. Coverage= 91.8919, SubjectRange= 4:134, QueryRange= 4:148. CDD= . /note=Not most annotated start, but when present called 100% of the time. We have opted to call this protein a minor tail protein due to high syntenic and e-value matches with other phages as well as several other adjacent genes seeming to also be minor tail proteins. CDS 40648 - 46005 /gene="55" /product="minor tail protein" /function="minor tail protein" /locus tag="Dorin_55" /note=Original Glimmer call @bp 40648 has strength 5.39; Genemark calls start at 40648 /note=SSC: Start = 40648, Stop = 46005. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.296 is not the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 5358 bp is the longest possible ORF. GAP: 3 bp. ST: SS=NA. F: minor tail protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 83, Function= minor tail protein, EValue= 0.0. NCBIBLAST= PhageName= minor tail protein [Rhodococcus phage Trina] >gb|ASZ74897.1| minor tail protein [Rhodococcus phage Trina], Coverage= 64.4818, SubjectRange= 285:1338, QueryRange= 285:1511, EValue= 0.0. HHPRED= . CDD= Accession= pfam05345, Coverage= 2.46499, SubjectRange= 1:49, QueryRange= 1:1686, EValue= 1.30129E-4. /note=Not most annotated start, called 100% of time when present. Due to extremely high synteny as well as several matches with e-values that are very small (some being 0.0), this protein has been deemed a minor tail protein. CDS 46034 - 46444 /gene="56" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_56" /note=Original Glimmer call @bp 46034 has strength 7.38; Genemark calls start at 46034 /note=SSC: Start = 46034, Stop = 46444. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.559 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 411 bp is the longest possible ORF. GAP: 28 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 84, Function= function unknown, EValue= 2.0E-10. NCBIBLAST= PhageName= hypothetical protein FDI69_gp084 [Rhodococcus phage Trina] >gb|ASZ74898.1| hypothetical protein SEA_TRINA_84 [Rhodococcus phage Trina], Coverage= 98.5294, SubjectRange= 2:111, QueryRange= 2:136, EValue= 2.86115E-8. HHPRED= Accession= 6RAO_H, Description= Afp9; Anti-feeding prophage, secretion system, AFP, contractile, VIRUS LIKE PARTICLE; 3.1A {Serratia entomophila}, Probability= 66.6. Coverage= 29.4118, SubjectRange= 89:128, QueryRange= 89:135. CDD= . /note=Start found in both Dorin and Francesca, not most annotated but only one phage has most annotated start in this pham. 
All signs point to a hypothetical protein (or rather a lack of signs point to a hypothetical protein). CDS 46441 - 47238 /gene="57" /product="minor tail protein" /function="minor tail protein" /locus tag="Dorin_57" /note=Original Glimmer call @bp 46441 has strength 7.55; Genemark calls start at 46441 /note=SSC: Start = 46441, Stop = 47238. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.861 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 798 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: minor tail protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 74, Function= minor tail protein, EValue= 8.0E-77. NCBIBLAST= PhageName= minor tail protein [Rhodococcus phage NiceHouse], Coverage= 100.0, SubjectRange= 1:264, QueryRange= 1:265, EValue= 1.68164E-95. HHPRED= . CDD= . /note=Start found in 71 of 98 genes in pham (72.4%). Due to high synteny and matches with phages NiceHouse and Trina, we have opted to call this protein a minor tail protein. CDS 47239 - 51780 /gene="58" /product="minor tail protein" /function="minor tail protein" /locus tag="Dorin_58" /note=Original Glimmer call @bp 47239 has strength 4.51; Genemark calls start at 47239 /note=SSC: Start = 47239, Stop = 51780. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.419 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 4542 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: minor tail protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 58, Function= function unknown, EValue= 0.0. NCBIBLAST= PhageName= minor tail protein [Rhodococcus phage NiceHouse], Coverage= 99.6695, SubjectRange= 5:1472, QueryRange= 5:1512, EValue= 0.0. HHPRED= . CDD= Accession= smart00060, Coverage= 5.08923, SubjectRange= 1:78, QueryRange= 1:331, EValue= 1.81581E-4. /note=The selected start has good coding potential and the best z/final scores. 
HHPred suggests a variety of functions, however after looking at the approved function list, none fit the criteria of this gene. NCBI Blast suggests a minor tail protein. CDS 51799 - 52401 /gene="59" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_59" /note=Original Glimmer call @bp 51799 has strength 7.94; Genemark calls start at 51799 /note=SSC: Start = 51799, Stop = 52401. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 603 bp is the longest possible ORF. GAP: 18 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 59, Function= function unknown, EValue= 1.0E-118. NCBIBLAST= . HHPRED= . CDD= . /note=The start has a good CP, gap, and RBS scores. There were varying hits in HHpred and no hits in NCBI Blast, suggesting this is a hypothetical protein. /note= /note=Change HHPred to hits to nothing. CDS 52434 - 52649 /gene="60" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_60" /note=Original Glimmer call @bp 52434 has strength 5.95; Genemark calls start at 52434 /note=SSC: Start = 52434, Stop = 52649. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.121 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 216 bp is the longest possible ORF. GAP: 32 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 60, Function= function unknown, EValue= 4.0E-25. NCBIBLAST= . HHPRED= . CDD= . /note=Selected start has the best z/final scores and coding potential is good. There are not significant e-value scores to support a function in HHPred. There were no results from NCBI and Deep TMHMM. This gene could potentially be deleted. /note= /note=Good CP, might want to look over this one again. 
CDS 52650 - 53081 /gene="61" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_61" /note=Original Glimmer call @bp 52650 has strength 5.6; Genemark calls start at 52650 /note=SSC: Start = 52650, Stop = 53081. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.678 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 432 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 61, Function= function unknown, EValue= 4.0E-86. NCBIBLAST= PhageName= hypothetical protein FDI69_gp089 [Rhodococcus phage Trina] >gb|ASZ74903.1| hypothetical protein SEA_TRINA_89 [Rhodococcus phage Trina], Coverage= 99.3007, SubjectRange= 4:145, QueryRange= 4:143, EValue= 8.27157E-51. HHPRED= Accession= 7C96_A, Description= RxLR effector protein Avh6; Complex, Inhibitor, Self ubiquitination, negative regulatory of Plant immunity, IMMUNE SYSTEM; HET: MSE; 2.51A {Phytophthora sojae (strain P6497)}, Probability= 81.8. Coverage= 17.4825, SubjectRange= 1:26, QueryRange= 1:35. CDD= . /note=Went with original start, RBS values and CP were good. There were no hits for a specific function CDS 53149 - 53907 /gene="62" /product="metallophosphoesterase" /function="metallophosphoesterase" /locus tag="Dorin_62" /note=Original Glimmer call @bp 53149 has strength 6.28; Genemark calls start at 53149 /note=SSC: Start = 53149, Stop = 53907. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.409 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 759 bp is not the longest possible ORF. GAP: 67 bp. ST: SS=NA. F: metallophosphoesterase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 62, Function= function unknown, EValue= 1.0E-151. 
NCBIBLAST= PhageName= metallophosphoesterase [Rhodococcus phage NiceHouse], Coverage= 99.2064, SubjectRange= 1:255, QueryRange= 1:250, EValue= 7.10729E-80. HHPRED= Accession= 2A22_B, Description= vacuolar protein sorting 29; VACUOLAR PROTEIN SORTING PROTEIN, ALPHA-BETA-BETA-ALPHA SANDWICH, Structural Genomics, Structural Genomics Consortium, SGC, PROTEIN TRANSPORT; 2.198A {Cryptosporidium parvum} SCOP: d.159.1.7, l.1.1.1, Probability= 99.8. Coverage= 93.6508, SubjectRange= 23:202, QueryRange= 23:238. CDD= . /note=We need to revisit this to check whether there is a holin prior to this gene. It might match a metallophosphatase function, but we need to check the functions of some earlier genes first, so we will come back to it next class to check further. CDS 53910 - 54074 /gene="63" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_63" /note=Original Glimmer call @bp 53910 has strength 6.0; Genemark calls start at 53910 /note=SSC: Start = 53910, Stop = 54074. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.976 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 165 bp is the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 63, Function= function unknown, EValue= 1.0E-25. NCBIBLAST= . HHPRED= Accession= PF03672.17, Description= UPF0154 ; Uncharacterised protein family (UPF0154), Probability= 80.9. Coverage= 66.6667, SubjectRange= 9:43, QueryRange= 9:44. CDD= . /note=Selected start has the best z/final scores. There are not significant e-value scores to support a function in HHPred. There were no results from NCBI and Deep TMHMM. CDS 54064 - 54876 /gene="64" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_64" /note=Original Glimmer call @bp 54064 has strength 6.48; Genemark calls start at 54064 /note=SSC: Start = 54064, Stop = 54876. (Forward). 
CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.003 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 813 bp is not the longest possible ORF. GAP: -11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 64, Function= function unknown, EValue= 1.0E-156. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_80 [Rhodococcus phage NiceHouse], Coverage= 96.2963, SubjectRange= 5:264, QueryRange= 5:263, EValue= 2.06764E-70. HHPRED= Accession= cd19437, Description= lipocalin_apoD-like; apolipoprotein D and similar proteins. Human apolipoprotein D (ApoD) is a small glycoprotein associated with high density lipoproteins (HDL) in plasma., Probability= 47.8. Coverage= 17.4074, SubjectRange= 107:153, QueryRange= 107:202. CDD= . /note=We think this start does have a larger overlap, and we considered the other start at 54121, which would have similar scores, but it wouldn't cover all the coding potential. Most hits were hypothetical protein or had too small e-values. There was one hit for phosphoesterase, but it was the only one and there are no guidelines on the function spreadsheet for how to call this one; we think this could be looked into again. /note= /note=MR: It says it matches with a gene on Trina, but looking at the phamerator it is not called in either Trina or NiceHouse, so I would go with no CDS 54891 - 55136 /gene="65" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_65" /note=Original Glimmer call @bp 54891 has strength 8.7; Genemark calls start at 54873 /note=SSC: Start = 54891, Stop = 55136. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.104 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 246 bp is not the longest possible ORF. GAP: 14 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 65, Function= function unknown, EValue= 2.0E-39. 
NCBIBLAST= PhageName= hypothetical protein [Gordonia alkanivorans] >gb|MDJ0010088.1| hypothetical protein [Gordonia alkanivorans] >gb|MDJ0495722.1| hypothetical protein [Gordonia alkanivorans], Coverage= 97.5309, SubjectRange= 1:77, QueryRange= 1:79, EValue= 5.25556E-10. HHPRED= Accession= PF04531.17, Description= Phage_holin_1 ; Bacteriophage holin, Probability= 87.7. Coverage= 71.6049, SubjectRange= 10:71, QueryRange= 10:70. CDD= . /note=Selected start has the best z/final scores as well as good CP. NCBI Blast suggests good e-scores and hits for hypothetical protein. No significant hits from HHPred. /note= /note=MR: membrane domain matches, membrane protein CDS 55174 - 55680 /gene="66" /product="helix-turn-helix DNA binding domain" /function="helix-turn-helix DNA binding domain" /locus tag="Dorin_66" /note=Original Glimmer call @bp 55207 has strength 2.05; Genemark calls start at 55216 /note=SSC: Start = 55174, Stop = 55680. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.002 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 507 bp is the longest possible ORF. GAP: 37 bp. ST: SS=NA. F: helix-turn-helix DNA binding domain. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 66, Function= function unknown, EValue= 3.0E-87. NCBIBLAST= PhageName= helix-turn-helix DNA-binding domain protein [Rhodococcus phage NiceHouse], Coverage= 85.119, SubjectRange= 4:146, QueryRange= 4:154, EValue= 1.28576E-47. HHPRED= Accession= SCOP_d6hn7b1, Description= a.6.1.5 (B:1-69) automated matches {Escherichia phage [TaxId: 10710]} | CLASS: All alpha proteins, FOLD: Putative DNA-binding domain, SUPFAM: Putative DNA-binding domain, FAM: Terminase gpNU1 subunit domain, Probability= 98.6. Coverage= 37.5, SubjectRange= 2:57, QueryRange= 2:123. CDD= . 
/note=We changed the start from 55207 to 55174, which is an earlier start, has better scores, is the longest ORF, includes the coding potential, and preserves the helix-turn-helix function. This had strong hits for DNA-binding protein, and upon looking at the HHPred results of the new protein sequence it continued to show the signs of helix-turn-helix. CDS 55677 - 55997 /gene="67" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_67" /note=Original Glimmer call @bp 55677 has strength 6.13; Genemark calls start at 55677 /note=SSC: Start = 55677, Stop = 55997. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.891 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 321 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 67, Function= function unknown, EValue= 1.0E-52. NCBIBLAST= PhageName= hypothetical protein FDI69_gp094 [Rhodococcus phage Trina] >gb|ASZ74908.1| hypothetical protein SEA_TRINA_94 [Rhodococcus phage Trina], Coverage= 86.7924, SubjectRange= 1:87, QueryRange= 1:92, EValue= 1.79388E-13. HHPRED= Accession= PF19619.3, Description= DUF6124 ; Family of unknown function (DUF6124), Probability= 86.3. Coverage= 55.6604, SubjectRange= 11:75, QueryRange= 11:70. CDD= . /note=Selected start has the best z/final scores. There are no significant hits in HHPRED. NCBI Blast result with good e-values towards a hypothetical protein. CDS 55990 - 56154 /gene="68" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_68" /note=Original Glimmer call @bp 55978 has strength 6.95; Genemark calls start at 55990 /note=SSC: Start = 55990, Stop = 56154. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.849 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 165 bp is not the longest possible ORF. GAP: -8 bp. 
ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 68, Function= function unknown, EValue= 1.0E-25. NCBIBLAST= . HHPRED= Accession= SCOP_d1unnc1, Description= d.240.1.1 (C:243-351) DNA polymerase IV {Escherichia coli [TaxId: 562]} | CLASS: Alpha and beta proteins (a+b), FOLD: Lesion bypass DNA polymerase (Y-family), little finger domain, SUPFAM: Lesion bypass DNA polymerase (Y-family), little finger domain, FAM: Lesion bypass DNA polymerase (Y-family), little finger domain, Probability= 46.2. Coverage= 70.3704, SubjectRange= 1:39, QueryRange= 1:51. CDD= . /note=We changed the start from 55978 to 55990 to lower the overlap, since it was considerable, and since this improved scores and did not affect the coding potential. Also, the start that we changed it to is the start that is called in Francesca, which was further evidence to support this change. Additionally, this did not affect the function, as there were no strong hits for any particular function prior. CDS 56156 - 56491 /gene="69" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_69" /note=Original Glimmer call @bp 56156 has strength 9.84; Genemark calls start at 56156 /note=SSC: Start = 56156, Stop = 56491. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.429 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 336 bp is the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 69, Function= function unknown, EValue= 2.0E-60. NCBIBLAST= . HHPRED= Accession= 4EAH_C, Description= Formin-like protein 3; ATP binding, cytoskeleton, formin, FMNL3, actin, PROTEIN BINDING; HET: ATP, ACT; 3.4A {Oryctolagus cuniculus}, Probability= 73.4. Coverage= 54.0541, SubjectRange= 329:389, QueryRange= 329:98. CDD= . /note=Selected start has the best z/final scores as well as good CP. HHPred did not result in significant hits. 
NCBI blast did not produce any hits. CDS 56475 - 57782 /gene="70" /product="DnaB-like dsDNA helicase" /function="DnaB-like dsDNA helicase" /locus tag="Dorin_70" /note=Original Glimmer call @bp 56475 has strength 9.93; Genemark calls start at 56475 /note=SSC: Start = 56475, Stop = 57782. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.851 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1308 bp is the longest possible ORF. GAP: -17 bp. ST: SS=NA. F: DnaB-like dsDNA helicase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 70, Function= function unknown, EValue= 0.0. NCBIBLAST= PhageName= DNA helicase [Rhodococcus phage Trina] >gb|ASZ74911.1| DNA helicase [Rhodococcus phage Trina], Coverage= 96.7816, SubjectRange= 1:422, QueryRange= 1:421, EValue= 0.0. HHPRED= Accession= 4ZC0_A, Description= Replicative DNA helicase; Helicase ATPase DNA replication, dodecamer, hydrolase; HET: TBR; 6.7A {Helicobacter pylori}, Probability= 100.0. Coverage= 94.4828, SubjectRange= 45:505, QueryRange= 45:413. CDD= Accession= COG0305, Coverage= 68.046, SubjectRange= 122:415, QueryRange= 122:390, EValue= 1.18909E-18. /note=Does not contain the most annotated start, although called 100% of the time when present. The selected start has good CP and has the best z/final scores. HHPred results had good e-scores and 100 probability for DNA helicase. Upon further research, DnaB-helicase was concluded due to conserved domains and synteny with phage NiceHouse. CDS 57795 - 58076 /gene="71" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_71" /note=Original Glimmer call @bp 57795 has strength 4.67; Genemark calls start at 57795 /note=SSC: Start = 57795, Stop = 58076. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.179 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 282 bp is the longest possible ORF. GAP: 12 bp. ST: SS=NA. 
F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 71, Function= function unknown, EValue= 5.0E-50. NCBIBLAST= PhageName= hypothetical protein HWB75_gp192 [Streptomyces phage Annadreamy] >gb|AXG66175.1| hypothetical protein SEA_ANNADREAMY_54 [Streptomyces phage Annadreamy] >gb|QGH79387.1| hypothetical protein SEA_LIMPID_54 [Streptomyces phage Limpid], Coverage= 97.8495, SubjectRange= 1:90, QueryRange= 1:91, EValue= 1.83286E-12. HHPRED= Accession= 4F54_A, Description= Uncharacterized protein; PF13590 family protein, DUF4136, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-BIOLOGY; HET: SO4, MLY, MSE; 1.6A {Bacteroides thetaiotaomicron}, Probability= 85.3. Coverage= 55.914, SubjectRange= 30:89, QueryRange= 30:55. CDD= . /note=Start was not changed; it had good scores, good coding potential, and is the most annotated start for the gene's pham. /note=Evidence presented itself as hypothetical protein CDS 58073 - 59086 /gene="72" /product="DNA primase" /function="DNA primase" /locus tag="Dorin_72" /note=Original Glimmer call @bp 58073 has strength 5.78; Genemark calls start at 58073 /note=SSC: Start = 58073, Stop = 59086. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.64 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1014 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: DNA primase. FS: PHDBLAST= PhageName= Weasels2, ProteinNumber= 99, Function= DNA primase, EValue= 2.0E-77. NCBIBLAST= PhageName= DNA primase [Rhodococcus phage Weasels2] >gb|AOZ63688.1| DNA primase [Rhodococcus phage Weasels2], Coverage= 97.9229, SubjectRange= 1:327, QueryRange= 1:337, EValue= 2.95772E-94. HHPRED= Accession= 2AU3_A, Description= DNA primase; Zinc Ribbon, TOPRIM, RNA POLYMERASE, DNA REPLICATION, TRANSFERASE; 2.0A {Aquifex aeolicus}, Probability= 100.0. Coverage= 94.362, SubjectRange= 7:350, QueryRange= 7:335. 
CDD= Accession= TIGR01391, Coverage= 71.8101, SubjectRange= 37:329, QueryRange= 37:290, EValue= 4.2871E-23. /note=Strong hits for DNA primase on HHPred. Function list does not mention requirements to call its function. Selected start had decent scores; however, if it were changed, the gaps would become much larger. CP was good. /note= /note=Changing start would reduce captured coding potential, good call. CDS 59139 - 59948 /gene="73" /product="SSB protein" /function="SSB protein" /locus tag="Dorin_73" /note=Original Glimmer call @bp 59139 has strength 7.66; Genemark calls start at 59139 /note=SSC: Start = 59139, Stop = 59948. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 810 bp is the longest possible ORF. GAP: 52 bp. ST: SS=NA. F: SSB protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 74, Function= function unknown, EValue= 1.0E-161. NCBIBLAST= PhageName= ERF family ssDNA binding protein [Rhodococcus phage NiceHouse], Coverage= 93.3085, SubjectRange= 5:236, QueryRange= 5:258, EValue= 7.91278E-42. HHPRED= Accession= 8GME_B, Description= Single-stranded DNA-binding protein; T4, gp32, Dda, complex, DNA BINDING PROTEIN-DNA complex; 4.98A {Tequatrovirus T4}, Probability= 99.9. Coverage= 80.6691, SubjectRange= 7:240, QueryRange= 7:228. CDD= . /note=Contains the most annotated start, but does not call it. Called 50% of the time when present. Coding potential is good. HHPred has high probability and e-values for single stranded binding protein. Function list claims this is the preferred term over SSDNA. CDS 59997 - 60470 /gene="74" /product="endonuclease VII" /function="endonuclease VII" /locus tag="Dorin_74" /note=Original Glimmer call @bp 59997 has strength 1.82; Genemark calls start at 59997 /note=SSC: Start = 59997, Stop = 60470. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 0.373 is not the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 474 bp is the longest possible ORF. GAP: 48 bp. ST: SS=NA. F: endonuclease VII. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 75, Function= function unknown, EValue= 2.0E-91. NCBIBLAST= PhageName= endonuclease VII [Rhodococcus phage NiceHouse], Coverage= 82.8026, SubjectRange= 2:124, QueryRange= 2:134, EValue= 6.90078E-23. HHPRED= Accession= 3GOX_A, Description= Restriction endonuclease Hpy99I; ENDONUCLEASE-DNA COMPLEX, RESTRICTION ENZYME, HPY99I, PSEUDOPALINDROME, HYDROLASE-DNA COMPLEX; HET: 1PE; 1.5A {Helicobacter pylori}, Probability= 99.7. Coverage= 78.9809, SubjectRange= 70:199, QueryRange= 70:136. CDD= Accession= pfam02945, Coverage= 45.2229, SubjectRange= 10:81, QueryRange= 10:133, EValue= 3.09497E-9. /note=The scores for both possible starts were not great; however, the given start's gap was the smaller of the two. We took a look at CP, and if we were to change the start, the gene would be much smaller and the gap would be bigger. Because of this, we decided to keep the suggested start despite poor scores. There are strong hits for endonuclease VII which match up with other annotated phages (NiceHouse). CDS 60424 - 64227 /gene="75" /product="DnaE-like DNA polymerase III (alpha)" /function="DnaE-like DNA polymerase III (alpha)" /locus tag="Dorin_75" /note=Original Glimmer call @bp 60424 has strength 5.85; Genemark calls start at 60424 /note=SSC: Start = 60424, Stop = 64227. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 3804 bp is the longest possible ORF. GAP: -47 bp. ST: SS=NA. F: DnaE-like DNA polymerase III (alpha). FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 76, Function= function unknown, EValue= 0.0. 
NCBIBLAST= PhageName= DnaE-like DNA polymerase III [Rhodococcus phage NiceHouse], Coverage= 99.2897, SubjectRange= 1:1250, QueryRange= 1:1261, EValue= 0.0. HHPRED= Accession= 2HPI_A, Description= DNA polymerase III alpha subunit; Pol-beta-like Nucleotidyltransferase fold, TRANSFERASE; 3.0A {Thermus aquaticus}, Probability= 100.0. Coverage= 98.5793, SubjectRange= 2:1109, QueryRange= 2:1250. CDD= Accession= COG0587, Coverage= 83.3465, SubjectRange= 3:876, QueryRange= 3:1060, EValue= 0.0. /note=Does not call most annotated start, but called 100 percent of the time when present. HHpred has good probability and NCBI Blast has good e-values. Both identify the function as a DNA-helicase. Upon further research in SEA-PHAGES, determined it as a dnaE, due to conserved domains containing multiple dnaE helicase domains. CDS 64241 - 64429 /gene="76" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_76" /note=Original Glimmer call @bp 64202 has strength 4.83; Genemark calls start at 64202 /note=SSC: Start = 64241, Stop = 64429. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.454 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 189 bp is not the longest possible ORF. GAP: 13 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Start was changed to 64241 because it decreased the gap while maintaining relatively the same scores. CP is good; No hits in NCBI Blast, evidence presents itself as hypothetical protein. CDS 64426 - 64758 /gene="77" /product="MazG-like nucleotide pyrophosphohydrolase" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="Dorin_77" /note=Original Glimmer call @bp 64426 has strength 10.45; Genemark calls start at 64426 /note=SSC: Start = 64426, Stop = 64758. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.196 is not the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 333 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: MazG-like nucleotide pyrophosphohydrolase. FS: PHDBLAST= PhageName= Weasels2, ProteinNumber= 104, Function= nucleotide pyrophosphohydrolase, EValue= 3.0E-28. NCBIBLAST= PhageName= MazG-like pyrophosphatase [Rhodococcus phage Weasels2] >gb|AOZ63693.1| nucleotide pyrophosphohydrolase [Rhodococcus phage Weasels2], Coverage= 100.0, SubjectRange= 1:110, QueryRange= 1:110, EValue= 2.25111E-33. HHPRED= Accession= 2Q73_B, Description= Hypothetical protein; MazG, Vibrio, NTP-PPase, HYDROLASE; 1.8A {Vibrio sp. DAT722} SCOP: a.204.1.0, Probability= 99.4. Coverage= 90.0, SubjectRange= 1:97, QueryRange= 1:103. CDD= Accession= cd11541, Coverage= 81.8182, SubjectRange= 1:90, QueryRange= 1:97, EValue= 4.74281E-20. /note=The start looks good, has a small/reasonable overlap, is the longest ORF, has good coding potential, and really good scores. This gene has strong hits for the MazG-like nucleotide pyrophosphohydrolase in HHPred as well as the NCBI blast, and has good e-values for each respectively. Why not list the hits in PhagesDB BLAST if they support your function? Make sure you have everything that hits the function listed. CDS 64755 - 65762 /gene="78" /product="RecA-like DNA recombinase" /function="RecA-like DNA recombinase" /locus tag="Dorin_78" /note=Original Glimmer call @bp 64755 has strength 10.25; Genemark calls start at 64755 /note=SSC: Start = 64755, Stop = 65762. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.849 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1008 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: RecA-like DNA recombinase. FS: PHDBLAST= PhageName= Moab, ProteinNumber= 65, Function= RecA-like DNA recombinase, EValue= 9.0E-68. 
NCBIBLAST= PhageName= RecA-like DNA recombinase [Streptomyces phage Gilson] >gb|QQV92432.1| RecA-like DNA recombinase [Streptomyces phage MeganTheeKilla] >gb|QZE11203.1| RecA-like DNA recombinase [Streptomyces phage Forrest] >gb|QZE11430.1| RecA-like DNA recombinase [Streptomyces phage Jada] >gb|URQ04679.1| RecA-like DNA recombinase [Streptomyces phage Emma1919] >gb|AZU97143.1| RecA-like DNA recombinase [Streptomyces phage Gilson], Coverage= 97.6119, SubjectRange= 2:335, QueryRange= 2:329, EValue= 2.21957E-80. HHPRED= Accession= 3HR8_A, Description= Protein recA; Alpha and beta proteins (a/b, a+b), ATP-binding, Cytoplasm, DNA damage, DNA recombination, DNA repair, DNA-binding, Nucleotide-binding; 1.95A {Thermotoga maritima}, Probability= 100.0. Coverage= 98.5075, SubjectRange= 10:340, QueryRange= 10:333. CDD= Accession= cd00983, Coverage= 82.3881, SubjectRange= 60:324, QueryRange= 60:328, EValue= 2.67779E-33. /note=Does not contain the most annotated start, but called 100% of the time when present. The selected start has the best z/final scores as well as good coding potential. HHPred and Blast results produced good hits for RecA recombinase which good probability and significant e-values. This function aligns with the one in the sea-phages approved function list. /note= /note=NCBI and conserved domains have hits, why not list them? CDS 65752 - 66084 /gene="79" /product="Holliday junction resolvase" /function="Holliday junction resolvase" /locus tag="Dorin_79" /note=Original Glimmer call @bp 65752 has strength 7.4; Genemark calls start at 65749 /note=SSC: Start = 65752, Stop = 66084. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.543 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 333 bp is not the longest possible ORF. GAP: -11 bp. ST: SS=NA. F: Holliday junction resolvase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 80, Function= function unknown, EValue= 3.0E-59. 
NCBIBLAST= PhageName= Holliday junction resolvase [Rhodococcus phage Weasels2] >gb|AOZ63695.1| holliday junction resolvase [Rhodococcus phage Weasels2], Coverage= 99.0909, SubjectRange= 1:109, QueryRange= 1:109, EValue= 4.12776E-37. HHPRED= Accession= 7BGS_A, Description= Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: SO4, MSE; 2.5A {Thermus thermophilus phage 15-6}, Probability= 99.5. Coverage= 90.0, SubjectRange= 5:145, QueryRange= 5:101. CDD= . /note=Does not have the most annotated start in starterator. Has the best RBS scores with a realistic gap. Has good CP. Has many hits for holliday junction resolvase in HHPred and NCBI BLAST. CDS 66081 - 66413 /gene="80" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_80" /note=Original Glimmer call @bp 66081 has strength 8.73; Genemark calls start at 66081 /note=SSC: Start = 66081, Stop = 66413. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.057 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 333 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 81, Function= function unknown, EValue= 7.0E-57. NCBIBLAST= PhageName= hypothetical protein FDI69_gp107 [Rhodococcus phage Trina] >gb|ASZ74921.1| hypothetical protein SEA_TRINA_107 [Rhodococcus phage Trina], Coverage= 96.3636, SubjectRange= 4:108, QueryRange= 4:110, EValue= 2.33876E-17. HHPRED= Accession= 6L81_C, Description= Gamma-tubulin complex component 5; gamma tubulin complex, microprotein, microtubule, TRANSLATION; 2.19650999049A {Homo sapiens}, Probability= 88.0. Coverage= 57.2727, SubjectRange= 22:81, QueryRange= 22:70. CDD= . /note=Does not contain most annotated start, called 100 percent of the time. There is a small gap, but if changed start would worsen the final score and z-score. 
HHPred results produce insignificant results to Gamma-tubulin complex with high e-values. Significant Blast results towards hypothetical protein. CDS 66468 - 67226 /gene="81" /product="Cas4 exonuclease" /function="Cas4 exonuclease" /locus tag="Dorin_81" /note=Original Glimmer call @bp 66468 has strength 8.51; Genemark calls start at 66468 /note=SSC: Start = 66468, Stop = 67226. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.736 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 759 bp is the longest possible ORF. GAP: 54 bp. ST: SS=NA. F: Cas4 exonuclease. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 82, Function= function unknown, EValue= 1.0E-146. NCBIBLAST= PhageName= Dna2/Cas4 domain-containing protein [Alphaproteobacteria bacterium], Coverage= 92.4603, SubjectRange= 31:259, QueryRange= 31:248, EValue= 1.52673E-34. HHPRED= Accession= cd09637, Description= Cas4_I-A_I-B_I-C_I-D_II-B; CRISPR/Cas system-associated protein Cas4. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and associated Cas proteins comprise a system for heritable host defense by prokaryotic cells against phage and other foreign DNA., Probability= 99.8. Coverage= 78.1746, SubjectRange= 1:177, QueryRange= 1:235. CDD= Accession= PHA01622, Coverage= 79.7619, SubjectRange= 12:199, QueryRange= 12:235, EValue= 9.12202E-10. /note=Has good CP and RBS scores. The chosen start has the smallest gap out of the options, only found in Dorin and Francesca but called in both. Gene has a lot of hits for Cas4 and has conserved domains that align with this function. CDS 67223 - 67765 /gene="82" /product="RuvC-like resolvase" /function="RuvC-like resolvase" /locus tag="Dorin_82" /note=Original Glimmer call @bp 67223 has strength 7.9; Genemark calls start at 67223 /note=SSC: Start = 67223, Stop = 67765. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.749 is not the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 543 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: RuvC-like resolvase. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 109, Function= RuvC-like resolvase, EValue= 2.0E-38. NCBIBLAST= PhageName= RuvC-like resolvase [Rhodococcus phage Trina] >gb|ASZ74923.1| RuvC-like resolvase [Rhodococcus phage Trina], Coverage= 98.8889, SubjectRange= 2:179, QueryRange= 2:180, EValue= 1.25867E-46. HHPRED= Accession= SCOP_d6lw3a_, Description= c.55.3.6 (A:) automated matches {Pseudomonas aeruginosa [TaxId: 287]} | CLASS: Alpha and beta proteins (a/b), FOLD: Ribonuclease H-like motif, SUPFAM: Ribonuclease H-like, FAM: RuvC resolvase, Probability= 99.9. Coverage= 97.2222, SubjectRange= 1:152, QueryRange= 1:176. CDD= Accession= COG0817, Coverage= 93.3333, SubjectRange= 1:145, QueryRange= 1:172, EValue= 5.94865E-5. /note=Does not contain most annotated start, but called 66.7 percent of the time when present. There is a gap, but changing the start would worsen the final score and z-score. HHPred hits point to RuvC resolvase and Holliday junction. Blast results produced significant hits towards RuvC. Function call was RuvC due to synteny with NiceHouse. CDS 67762 - 67899 /gene="83" /product="helix-turn-helix DNA binding domain" /function="helix-turn-helix DNA binding domain" /locus tag="Dorin_83" /note=Original Glimmer call @bp 67762 has strength 3.06; Genemark calls start at 67762 /note=SSC: Start = 67762, Stop = 67899. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.912 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 138 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: helix-turn-helix DNA binding domain. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 84, Function= function unknown, EValue= 2.0E-20. 
NCBIBLAST= PhageName= helix-turn-helix DNA-binding domain protein [Rhodococcus phage NiceHouse], Coverage= 95.5556, SubjectRange= 5:47, QueryRange= 5:44, EValue= 7.8806E-7. HHPRED= Accession= PF11242.12, Description= DUF2774 ; Protein of unknown function (DUF2774), Probability= 96.3. Coverage= 77.7778, SubjectRange= 2:37, QueryRange= 2:41. CDD= . /note=This start has the best scores and not too much of an overlap. It doesn't have the best coding potential, but based on the scores, and given the fact that it would not be improved with a different start, we think that the lesser coding potential can still be overlooked. We got a significant hit for helix-turn-helix and checked this with the protein sequence on HHPred, which did show the helix-turn-helix structure. CDS 67896 - 68411 /gene="84" /product="thymidylate kinase" /function="thymidylate kinase" /locus tag="Dorin_84" /note=Original Glimmer call @bp 67896 has strength 5.7; Genemark calls start at 67896 /note=SSC: Start = 67896, Stop = 68411. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.057 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 516 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: thymidylate kinase. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 99, Function= thymidylate kinase, EValue= 8.0E-24. NCBIBLAST= PhageName= thymidylate kinase [Rhodococcus phage NiceHouse], Coverage= 95.3216, SubjectRange= 3:168, QueryRange= 3:166, EValue= 2.94183E-26. HHPRED= Accession= 4YER_A, Description= ABC transporter ATP-binding protein; PF00005 family protein, P-loop containing nucleoside triphosphate hydrolases fold, Structural Genomics, Joint Center for Structural Genomics; HET: MSE, ADP; 2.35A {Thermotoga maritima}, Probability= 99.5. Coverage= 91.8129, SubjectRange= 31:206, QueryRange= 31:158. CDD= Accession= PRK06217, Coverage= 16.3743, SubjectRange= 1:28, QueryRange= 1:28, EValue= 1.02722E-5. 
/note=Does not contain most annotated start, but called 100 percent of the time when present. Considered changing the start to improve the final score, but that would enlarge the gap and decrease the z-score. Significant hits for thymidylate kinase. Synteny with NiceHouse also supports this function. CDS 68392 - 68544 /gene="85" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_85" /note=Original Glimmer call @bp 68392 has strength 8.35; Genemark calls start at 68392 /note=SSC: Start = 68392, Stop = 68544. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.902 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 153 bp is the longest possible ORF. GAP: -20 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_100 [Rhodococcus phage NiceHouse], Coverage= 90.0, SubjectRange= 1:45, QueryRange= 1:45, EValue= 2.41245E-13. HHPRED= Accession= PF06676.15, Description= DUF1178 ; Protein of unknown function (DUF1178), Probability= 99.2. Coverage= 80.0, SubjectRange= 1:55, QueryRange= 1:41. CDD= Accession= smart00834, Coverage= 68.0, SubjectRange= 1:41, QueryRange= 1:34, EValue= 0.00510992. /note=The start has good scores; although it has a large overlap, it is still the better option. We considered changing the start, but that would have increased the gap significantly and shortened the gene considerably. It has no strong hits for any particular function. CDS 68544 - 68846 /gene="86" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_86" /note=Original Glimmer call @bp 68544 has strength 7.02; Genemark calls start at 68544 /note=SSC: Start = 68544, Stop = 68846. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.288 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 303 bp is not the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. 
FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 101, Function= function unknown, EValue= 7.0E-22. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_101 [Rhodococcus phage NiceHouse], Coverage= 90.0, SubjectRange= 2:91, QueryRange= 2:94, EValue= 1.43907E-24. HHPRED= Accession= 6A7K_B, Description= Tlr0636 protein; NADH dehydrogenase-like complex, NDH-1, cyclic electron flow (CEF), Ferredoxin, ELECTRON TRANSPORT; 1.9A {Thermosynechococcus elongatus (strain BP-1)}, Probability= 90.7. Coverage= 62.0, SubjectRange= 9:72, QueryRange= 9:99. CDD= . /note=Does not have the most annotated start, however called 100% of the time when present. Selected start has the best z/final scores. No significant hits resulted from HHPRED and NCBI Blast resulted in significant hits for hypothetical protein. CDS 68839 - 69357 /gene="87" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_87" /note=Original Glimmer call @bp 68839 has strength 11.64; Genemark calls start at 68839 /note=SSC: Start = 68839, Stop = 69357. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 519 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 88, Function= function unknown, EValue= 6.0E-93. NCBIBLAST= PhageName= hypothetical protein FDI69_gp115 [Rhodococcus phage Trina] >gb|ASZ74929.1| hypothetical protein SEA_TRINA_115 [Rhodococcus phage Trina], Coverage= 96.5116, SubjectRange= 3:168, QueryRange= 3:170, EValue= 2.09986E-25. HHPRED= Accession= PF11242.12, Description= DUF2774 ; Protein of unknown function (DUF2774), Probability= 97.2. Coverage= 20.9302, SubjectRange= 4:40, QueryRange= 4:53. CDD= . /note=Has good coding potential, is the longest ORF, and has the best scores. It has no hits for any particular function. 
CDS 69354 - 71168 /gene="88" /product="terminase" /function="terminase" /locus tag="Dorin_88" /note=Original Glimmer call @bp 69354 has strength 5.66; Genemark calls start at 69354 /note=SSC: Start = 69354, Stop = 71168. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.959 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1815 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: terminase. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 116, Function= terminase, EValue= 1.0E-162. NCBIBLAST= PhageName= terminase [Rhodococcus phage Trina] >gb|ASZ74930.1| terminase [Rhodococcus phage Trina], Coverage= 99.6689, SubjectRange= 4:590, QueryRange= 4:604, EValue= 0.0. HHPRED= Accession= 6Z6D_A, Description= Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97}, Probability= 100.0. Coverage= 88.0795, SubjectRange= 15:497, QueryRange= 15:552. CDD= Accession= COG4626, Coverage= 74.6689, SubjectRange= 60:464, QueryRange= 60:492, EValue= 1.63488E-11. /note=Does not contain most annotated start, but called 100 percent of the time when present. HHPred and NCBI Blast suggest terminase large subunit. Additionally, this fits the conserved domain and synteny with NiceHouse. /note=Terminase called due to no subunit being found in this genome and in the similar genomes with synteny of this region CDS 71168 - 71725 /gene="89" /product="HNH endonuclease" /function="HNH endonuclease" /locus tag="Dorin_89" /note=Original Glimmer call @bp 71168 has strength 3.77; Genemark calls start at 71168 /note=SSC: Start = 71168, Stop = 71725. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 558 bp is not the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: HNH endonuclease. 
FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 104, Function= HNH endonuclease, EValue= 5.0E-55. NCBIBLAST= PhageName= HNH endonuclease [Rhodococcus phage NiceHouse], Coverage= 87.5676, SubjectRange= 15:173, QueryRange= 15:170, EValue= 6.80725E-64. HHPRED= Accession= 7ENH_A, Description= CRISPR-associated endonuclease Cas9; Inhibitor, Complex, VIRAL PROTEIN; HET: NI; 2.097A {Staphylococcus aureus}, Probability= 97.8. Coverage= 36.2162, SubjectRange= 40:109, QueryRange= 40:80. CDD= Accession= pfam01844, Coverage= 27.5676, SubjectRange= 1:47, QueryRange= 1:77, EValue= 3.15133E-7. /note=Has the most annotated start but does not call it. That start would give the longest ORF, but the current start has less of an overlap, so we have decided to keep the current start. This gene has significant hits for HNH endonuclease and contains the conserved domains for it. Additionally, we saw at least one H-N-H stretch within 30 amino acids in the protein sequence, which confirms this function. CDS 71727 - 72179 /gene="90" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_90" /note=Original Glimmer call @bp 71727 has strength 10.08; Genemark calls start at 71727 /note=SSC: Start = 71727, Stop = 72179. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.196 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 453 bp is not the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 91, Function= function unknown, EValue= 6.0E-82. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_105 [Rhodococcus phage NiceHouse], Coverage= 90.0, SubjectRange= 8:142, QueryRange= 8:143, EValue= 1.19015E-10. HHPRED= Accession= PF17686.5, Description= DUF5534 ; Family of unknown function (DUF5534), Probability= 54.9. Coverage= 53.3333, SubjectRange= 68:160, QueryRange= 68:141. CDD= . 
/note=Chosen start has good CP and RBS scores and has good final/z-scores. No conserved domains were found and the gene did not have any significant hits for the function. tRNA 72254 - 72327 /gene="91" /product="tRNA-Glu(ttc)" /locus tag="DORIN_91" /note=tRNA-Glu(ttc) CDS 72328 - 72768 /gene="92" /product="HNH endonuclease" /function="HNH endonuclease" /locus tag="Dorin_92" /note=Original Glimmer call @bp 72268 has strength 3.67; Genemark calls start at 72268 /note=SSC: Start = 72328, Stop = 72768. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.165 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 441 bp is not the longest possible ORF. GAP: 148 bp. ST: SS=NA. F: HNH endonuclease. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Start was changed due to scores and also synteny with Francesca when looking for tRNAs; CP is good. Evidence points to HNH endonuclease, confirmed as per guidelines by finding H-N-H within a 30 amino acid span tRNA 72769 - 72843 /gene="93" /product="tRNA-Gly(gcc)" /locus tag="DORIN_93" /note=tRNA-Gly(gcc) tRNA 72908 - 72981 /gene="94" /product="tRNA-Glu(ctc)" /locus tag="DORIN_94" /note=tRNA-Glu(ctc) tRNA 73046 - 73117 /gene="95" /product="tRNA-Pro(ggg)" /locus tag="DORIN_95" /note=tRNA-Pro(ggg) tRNA 73130 - 73202 /gene="96" /product="tRNA-Gly(tcc)" /locus tag="DORIN_96" /note=tRNA-Gly(tcc) CDS 73233 - 73658 /gene="97" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_97" /note=Original Glimmer call @bp 73233 has strength 8.32; Genemark calls start at 73233 /note=SSC: Start = 73233, Stop = 73658. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.766 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 426 bp is not the longest possible ORF. GAP: 464 bp. ST: SS=NA. F: Hypothetical Protein. 
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 98, Function= function unknown, EValue= 4.0E-78. NCBIBLAST= PhageName= hypothetical protein [Mycolicibacterium fortuitum] >gb|MCV7141669.1| hypothetical protein [Mycolicibacterium fortuitum] >gb|MDV7195623.1| hypothetical protein [Mycolicibacterium fortuitum] >gb|MDV7209272.1| hypothetical protein [Mycolicibacterium fortuitum] >gb|MDV7231141.1| hypothetical protein [Mycolicibacterium fortuitum] >gb|MDV7262718.1| hypothetical protein [Mycolicibacterium fortuitum], Coverage= 64.539, SubjectRange= 15:103, QueryRange= 15:101, EValue= 3.11014E-6. HHPRED= Accession= PF09943.13, Description= DUF2175 ; Uncharacterized protein conserved in archaea (DUF2175), Probability= 78.2. Coverage= 34.0426, SubjectRange= 2:37, QueryRange= 2:93. CDD= . /note=The selected start is called 100% of the time when present and has good final/z-scores. Good coding potential that includes the selected start. HHPred yielded no significant hits. NCBI Blast produced good e-values declaring it as a hypothetical protein. CDS 73655 - 73822 /gene="98" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_98" /note=Original Glimmer call @bp 73655 has strength 3.49; Genemark calls start at 73655 /note=SSC: Start = 73655, Stop = 73822. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.913 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 168 bp is not the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 99, Function= function unknown, EValue= 3.0E-26. NCBIBLAST= . HHPRED= Accession= 7CI2_D, Description= MbCpf1; anti-CRISPR, CRISPR-Cas, AcrVA2, Cpf1, Cas12a, MbCpf1, MbCas12a, Acr, HYDROLASE; 2.8A {Moraxella bovoculi}, Probability= 19.6. Coverage= 23.6364, SubjectRange= 14:27, QueryRange= 14:25. CDD= . /note=Chosen start has good CP and the best RBS scores. 
Only got one function hit in HHPred, for a CRISPR hydrolase, and it is not significant. CDS 73815 - 74132 /gene="99" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_99" /note=Original Glimmer call @bp 73815 has strength 5.37; Genemark calls start at 73815 /note=SSC: Start = 73815, Stop = 74132. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.835 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 318 bp is not the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 100, Function= function unknown, EValue= 1.0E-58. NCBIBLAST= PhageName= hypothetical protein GCM10025732_47720 [Glycomyces mayteni], Coverage= 97.1429, SubjectRange= 8:113, QueryRange= 8:103, EValue= 7.21498E-19. HHPRED= Accession= 6V1W_A, Description= DNA packaging protein; viral packaging motor, terminase, RNase H fold, VIRAL PROTEIN; NMR {Bacillus phage phi29}, Probability= 74.3. Coverage= 71.4286, SubjectRange= 29:111, QueryRange= 29:105. CDD= . /note=Does not have the most annotated start, but called 100% of the time when present. CP is good and contains the selected start. Z/final scores are good as well. HHPred yielded no significant hits. NCBI Blast produced good e-values declaring it as a hypothetical protein. CDS 74202 - 74468 /gene="100" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_100" /note=Original Glimmer call @bp 74208 has strength 5.24; Genemark calls start at 74208 /note=SSC: Start = 74202, Stop = 74468. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 267 bp is the longest possible ORF. GAP: 69 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 101, Function= function unknown, EValue= 4.0E-47. 
NCBIBLAST= PhageName= hypothetical protein [Candidatus Solirubrobacter pratensis], Coverage= 55.6818, SubjectRange= 28:76, QueryRange= 28:58, EValue= 1.25146E-9. HHPRED= Accession= 2LT4_A, Description= Transcriptional regulator, CarD family; CdnL, CarD, TRCF-RID, PF02559, RNA polymerase, TRANSCRIPTION; NMR {Myxococcus xanthus}, Probability= 88.2. Coverage= 27.2727, SubjectRange= 2:26, QueryRange= 2:25. CDD= . /note=The start was changed to increase the z/final scores. The gap was slightly increased, but the scores were significantly better. The selected start is called 100% of the time when present and shares this start with Francesca. The selected start has good CP with good z/final scores. HHPred yielded no significant hits. NCBI Blast yielded no values. tRNA 74523 - 74596 /gene="101" /product="tRNA-Asn(gtt)" /locus tag="DORIN_101" /note=tRNA-Asn(gtt) tRNA 74741 - 74811 /gene="102" /product="tRNA-Tyr(gta)" /locus tag="DORIN_102" /note=tRNA-Tyr(gta) tRNA 74935 - 75006 /gene="103" /product="tRNA-Trp(cca)" /locus tag="DORIN_103" /note=tRNA-Trp(cca) tRNA 75018 - 75089 /gene="104" /product="tRNA-Thr(cgt)" /locus tag="DORIN_104" /note=tRNA-Thr(cgt) tRNA 75122 - 75197 /gene="105" /product="tRNA-Leu(tag)" /locus tag="DORIN_105" /note=tRNA-Leu(tag) CDS 75225 - 75422 /gene="106" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_106" /note=Original Glimmer call @bp 75225 has strength 8.43; Genemark calls start at 75225 /note=SSC: Start = 75225, Stop = 75422. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 198 bp is the longest possible ORF. GAP: 756 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 107, Function= function unknown, EValue= 3.0E-30. NCBIBLAST= . 
HHPRED= Accession= 1SMP_I, Description= ERWINIA CHRYSANTHEMI INHIBITOR; COMPLEX (METALLOPROTEASE-INHIBITOR), COMPLEX (METALLOPROTEASE-INHIBITOR) complex; 2.3A {Serratia marcescens} SCOP: b.61.2.1, Probability= 90.0. Coverage= 36.9231, SubjectRange= 63:86, QueryRange= 63:56. CDD= . /note=The selected start is called 100% of the time when present and shares a similar start with Francesca. Coding potential includes the start and has good final/z scores. HHPred yielded a bad hit for ERWINIA CHRYSANTHEMI INHIBITOR. NCBI Blast produced no values. tRNA 75431 - 75502 /gene="107" /product="tRNA-Met(cat)" /locus tag="DORIN_107" /note=tRNA-Met(cat) tRNA 75733 - 75806 /gene="108" /product="tRNA-Lys(ctt)" /locus tag="DORIN_108" /note=tRNA-Lys(ctt) tRNA 75997 - 76069 /gene="109" /product="tRNA-Lys(ttt)" /locus tag="DORIN_109" /note=tRNA-Lys(ttt) tRNA 76190 - 76262 /gene="110" /product="tRNA-Arg(cct)" /locus tag="DORIN_110" /note=tRNA-Arg(cct) CDS 76292 - 76540 /gene="111" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_111" /note=Original Glimmer call @bp 76292 has strength 5.9; Genemark calls start at 76292 /note=SSC: Start = 76292, Stop = 76540. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 249 bp is not the longest possible ORF. GAP: 869 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 112, Function= function unknown, EValue= 1.0E-41. NCBIBLAST= . HHPRED= Accession= PF18181.5, Description= SLATT_1 ; SMODS and SLOG-associating 2TM effector domain 1, Probability= 83.1. Coverage= 67.0732, SubjectRange= 30:79, QueryRange= 30:58. CDD= . /note=This start has the best scores. It doesn't have the best coding potential, but it follows a region of very low to no coding potential except for one small gene, so in comparison it is still better than any alternative. 
It does not have strong hits for any particular function. TMHMM shows membrane domains. CDS 76540 - 76779 /gene="112" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_112" /note=Original Glimmer call @bp 76540 has strength 3.84; Genemark calls start at 76540 /note=SSC: Start = 76540, Stop = 76779. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.597 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 240 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 113, Function= function unknown, EValue= 2.0E-43. NCBIBLAST= PhageName= hypothetical protein [Nocardioides sp. cx-169] >gb|MCD4535664.1| hypothetical protein [Nocardioides sp. cx-169], Coverage= 70.8861, SubjectRange= 6:58, QueryRange= 6:62, EValue= 0.0365569. HHPRED= Accession= PF12731.11, Description= Mating_N ; Mating-type protein beta 1, Probability= 82.6. Coverage= 59.4937, SubjectRange= 42:86, QueryRange= 42:75. CDD= . /note=Does not contain the most annotated start, however is called 100% of the time when present. Good CP and the selected start has good z/final scores. HHPred yielded no significant hits. NCBI Blast produced good e-values declaring it as a hypothetical protein. CDS 76776 - 77033 /gene="113" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_113" /note=Original Glimmer call @bp 76776 has strength 5.49; Genemark calls start at 76776 /note=SSC: Start = 76776, Stop = 77033. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.196 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 258 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 114, Function= function unknown, EValue= 7.0E-44. NCBIBLAST= . 
HHPRED= Accession= 4WSF_A, Description= Serine/threonine-protein phosphatase 4 regulatory subunit 3; phosphatase EVH1 domain, signaling protein; 1.501A {Drosophila melanogaster}, Probability= 43.4. Coverage= 45.8824, SubjectRange= 18:56, QueryRange= 18:57. CDD= . /note=The selected start is called 100% of the time when present and shares this start with Francesca. The selected start has good CP with good z/final scores. HHPred yielded no significant hits. NCBI Blast yielded no values. tRNA 77054 - 77128 /gene="114" /product="tRNA-Gln(ctg)" /locus tag="DORIN_114" /note=tRNA-Gln(ctg) tRNA 77202 - 77275 /gene="115" /product="tRNA-His(gtg)" /locus tag="DORIN_115" /note=tRNA-His(gtg) CDS 77276 - 77656 /gene="116" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_116" /note=Original Glimmer call @bp 77276 has strength 6.79; Genemark calls start at 77276 /note=SSC: Start = 77276, Stop = 77656. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.784 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 381 bp is the longest possible ORF. GAP: 242 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 117, Function= function unknown, EValue= 3.0E-75. NCBIBLAST= . HHPRED= Accession= cd20471, Description= Tudor_Agenet_FMR1_rpt1; first Tudor-like Agenet domain found in synaptic functional regulator FMR1 and similar proteins., Probability= 68.2. Coverage= 24.6032, SubjectRange= 7:40, QueryRange= 7:53. CDD= . /note=Kept start. Best rbs scores that also cover all cp. /note=No good functional matches CDS 77649 - 78056 /gene="117" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_117" /note=Original Glimmer call @bp 77649 has strength 6.9; Genemark calls start at 77649 /note=SSC: Start = 77649, Stop = 78056. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.166 is not the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 408 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 115, Function= function unknown, EValue= 9.0E-73. NCBIBLAST= . HHPRED= Accession= 1G2C_P, Description= FUSION PROTEIN (F); membrane fusion, pneumovirus, HRSV, Viral protein; 2.3A {Human respiratory syncytial virus}, Probability= 64.6. Coverage= 18.5185, SubjectRange= 7:32, QueryRange= 7:80. CDD= . /note=Kept start. -8 overlap but it's the only start that covers all cp. Best rbs scores. /note=No good functional matches CDS 78150 - 78458 /gene="118" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_118" /note=Original Glimmer call @bp 78150 has strength 5.53; Genemark calls start at 78150 /note=SSC: Start = 78150, Stop = 78458. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 309 bp is the longest possible ORF. GAP: 93 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Peregrin, ProteinNumber= 123, Function= function unknown, EValue= 0.003. NCBIBLAST= . HHPRED= Accession= PF18145.5, Description= SAVED ; SMODS-associated and fused to various effectors sensor domain, Probability= 81.9. Coverage= 31.3725, SubjectRange= 158:191, QueryRange= 158:76. CDD= . /note=Kept start. Covers all cp and best rbs scores. /note=No clear function call. tRNA 78465 - 78558 /gene="119" /product="tRNA-Ser(gct)" /locus tag="DORIN_119" /note=tRNA-Ser(gct) CDS 78574 - 78813 /gene="120" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_120" /note=Original Glimmer call @bp 78559 has strength 7.38; Genemark calls start at 78559 /note=SSC: Start = 78574, Stop = 78813. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.136 is not the highest start score. 
SCS: Start is not called by Glimmer and is not called by Genemark. LO: 240 bp is not the longest possible ORF. GAP: 115 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Changed start. New start has much better rbs scores while still covering all cp. /note=No clear function call. (still no clear function call even with the original start) CDS 78886 - 80877 /gene="121" /product="rIIA-like protein" /function="rIIA-like protein" /locus tag="Dorin_121" /note=Original Glimmer call @bp 78886 has strength 4.6; Genemark calls start at 78886 /note=SSC: Start = 78886, Stop = 80877. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.264 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1992 bp is the longest possible ORF. GAP: 72 bp. ST: SS=NA. F: rIIA-like protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 156, Function= rIIA-like protein, EValue= 1.0E-75. NCBIBLAST= PhageName= RIIA lysis inhibitor [Rhodococcus phage Trina] >gb|ASZ74953.1| rIIA-like protein [Rhodococcus phage Trina], Coverage= 98.9442, SubjectRange= 1:637, QueryRange= 1:656, EValue= 2.26084E-81. HHPRED= . CDD= . /note=Kept start. Best rbs scores and is the only start that covers all cp. /note=No clear function call. Hit to rIIA-like protein, but the coverage and identity of the hit are too low to consider calling it. The next gene has low hits to rIIB-like protein, but those also aren't strong enough evidence to call it. These two genes are back-to-back with their stop and start (gap of 0). Used separate HHpred blast run using the UniProt-SwissProt-viral database according to the approved function list and got a clear hit to rIIA-like protein. Called off of synteny with other phages that have rIIA and rIIB domains, as well as the evidence. 
CDS 80878 - 81957 /gene="122" /product="rIIB-like protein" /function="rIIB-like protein" /locus tag="Dorin_122" /note=Original Glimmer call @bp 80878 has strength 6.21; Genemark calls start at 80878 /note=SSC: Start = 80878, Stop = 81957. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1080 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: rIIB-like protein. FS: PHDBLAST= PhageName= Peregrin, ProteinNumber= 117, Function= rIIB-like protein, EValue= 5.0E-53. NCBIBLAST= PhageName= rIIB-like protein [Rhodococcus phage Peregrin], Coverage= 99.4429, SubjectRange= 1:360, QueryRange= 1:357, EValue= 9.97898E-59. HHPRED= Accession= PF19062.4, Description= DUF5758 ; Family of unknown function (DUF5758), Probability= 94.5. Coverage= 19.4986, SubjectRange= 1:98, QueryRange= 1:207. CDD= . /note=Kept start. Best rbs scores and any other start would lose cp coverage. /note=No clear function call. See the notes for the previous gene (104); they are very similar. This gene has more hits to rIIB-like protein than 104 has to rIIA-like protein. Three conserved domains that match with phamerator. /note=Used separate HHpred blast run using the UniProt-SwissProt-viral database according to the approved function list and got a clear hit to rIIB-like protein. Called off of evidence and synteny with rIIA-like protein in this and other phages. CDS 81972 - 82130 /gene="123" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_123" /note=Original Glimmer call @bp 81972 has strength 4.71; Genemark calls start at 81972 /note=SSC: Start = 81972, Stop = 82130. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 159 bp is the longest possible ORF. GAP: 14 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . 
NCBIBLAST= PhageName= hypothetical protein FDH04_gp122 [Rhodococcus phage Weasels2] >gb|AOZ63711.1| hypothetical protein SEA_WEASELS2_122 [Rhodococcus phage Weasels2], Coverage= 90.3846, SubjectRange= 2:48, QueryRange= 2:49, EValue= 3.247E-9. HHPRED= . CDD= . /note=Kept start. Best rbs scores and has the most cp coverage. (no earlier start to cover a tiny bit more cp that is cut off) /note=No clear function call CDS 82218 - 82379 /gene="124" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_124" /note=Original Glimmer call @bp 82218 has strength 9.87; Genemark calls start at 82218 /note=SSC: Start = 82218, Stop = 82379. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.104 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 162 bp is the longest possible ORF. GAP: 87 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Kept start. Covers all cp and has best rbs scores. /note=No clear function call. tRNA 82433 - 82504 /gene="125" /product="tRNA-Thr(tgt)" /locus tag="DORIN_125" /note=tRNA-Thr(tgt) tRNA 82601 - 82673 /gene="126" /product="tRNA-Thr(ggt)" /locus tag="DORIN_126" /note=tRNA-Thr(ggt) CDS 82694 - 83350 /gene="127" /product="DNA methyltransferase" /function="DNA methyltransferase" /locus tag="Dorin_127" /note=Original Glimmer call @bp 82694 has strength 6.3; Genemark calls start at 82694 /note=SSC: Start = 82694, Stop = 83350. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.236 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 657 bp is not the longest possible ORF. GAP: 314 bp. ST: SS=NA. F: DNA methyltransferase. FS: PHDBLAST= PhageName= Grayson, ProteinNumber= 224, Function= DNA methyltransferase, EValue= 2.0E-81. 
NCBIBLAST= PhageName= DNA methyltransferase [Rhodococcus phage Grayson], Coverage= 97.2477, SubjectRange= 2:213, QueryRange= 2:216, EValue= 7.52048E-86. HHPRED= Accession= SCOP_d3ubta_, Description= c.66.1.26 (A:) automated matches {Haemophilus aegyptius [TaxId: 197575]} | CLASS: Alpha and beta proteins (a/b), FOLD: S-adenosyl-L-methionine-dependent methyltransferases, SUPFAM: S-adenosyl-L-methionine-dependent methyltransferases, FAM: C5 cytosine-specific DNA methylase, DCM, Probability= 99.6. Coverage= 65.5963, SubjectRange= 1:160, QueryRange= 1:147. CDD= Accession= cd00315, Coverage= 57.7982, SubjectRange= 2:138, QueryRange= 2:130, EValue= 1.00132E-12. /note=Kept Start. Best rbs scores and covers all cp. Starterator supports this start as Dorin does not have the most annotated start. /note=All evidence with HHPred, NCBI, and with phagesdb blast points to a DNA Methyltransferase. /note= /note=*Don`t forget to mention the Conserved Domain hits, and check off the applicable ones - MCS* tRNA 83359 - 83430 /gene="128" /product="tRNA-Gln(ttg)" /locus tag="DORIN_128" /note=tRNA-Gln(ttg) CDS 83432 - 83785 /gene="129" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_129" /note=Original Glimmer call @bp 83432 has strength 2.84; Genemark calls start at 83432 /note=SSC: Start = 83432, Stop = 83785. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.292 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 354 bp is not the longest possible ORF. GAP: 81 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein [Enterococcus sp.], Coverage= 100.0, SubjectRange= 1:126, QueryRange= 1:117, EValue= 2.89458E-11. HHPRED= . CDD= . /note=Kept start. Best rbs scores and covers all cp. /note=No clear evidence for function, only some NCBI blast hits to hypothetical protein. 
CDS 83808 - 83945 /gene="130" /product="nothingburger" /function="nothingburger" /locus tag="Dorin_130" /note=Original Glimmer call @bp 83808 has strength 0.08 /note=SSC: Start = 83808, Stop = 83945. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 1.319 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 138 bp is not the longest possible ORF. GAP: 22 bp. ST: SS=NA. F: nothingburger. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Kept start to avoid overlap. No cp, bad rbs scores, very poor glimmer score, orpham, tiny gene. /note=No clear function call. /note=Should probably delete this gene. tRNA 83953 - 84025 /gene="131" /product="tRNA-Cys(gca)" /locus tag="DORIN_131" /note=tRNA-Cys(gca) CDS 84037 - 84222 /gene="132" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_132" /note=Original Glimmer call @bp 84037 has strength 9.61; Genemark calls start at 84037 /note=SSC: Start = 84037, Stop = 84222. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 186 bp is not the longest possible ORF. GAP: 91 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein PBI_PEREGRIN_124 [Rhodococcus phage Peregrin], Coverage= 75.4098, SubjectRange= 6:51, QueryRange= 6:55, EValue= 1.55068E-5. HHPRED= Accession= SCOP_d1no1a1, Description= a.179.1.1 (A:2-67) Replisome organizer (g39p helicase loader/inhibitor protein) {Bacteriophage Spp1 [TaxId: 10724]} | CLASS: All alpha proteins, FOLD: Replisome organizer (g39p helicase loader/inhibitor protein), SUPFAM: Replisome organizer (g39p helicase loader/inhibitor protein), FAM: Replisome organizer (g39p helicase loader/inhibitor protein), Probability= 79.2. Coverage= 49.1803, SubjectRange= 36:66, QueryRange= 36:45. CDD= . /note=Kept start. Covers all cp and has best rbs scores. 
/note=No clear function call. CDS 84222 - 84497 /gene="133" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_133" /note=Original Glimmer call @bp 84222 has strength 1.06; Genemark calls start at 84222 /note=SSC: Start = 84222, Stop = 84497. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.439 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 276 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= SCOP_d1eara1, Description= b.107.1.1 (A:1-74) Urease metallochaperone UreE, N-terminal domain {Bacillus pasteurii [TaxId: 1474]} | CLASS: All beta proteins, FOLD: Urease metallochaperone UreE, N-terminal domain, SUPFAM: Urease metallochaperone UreE, N-terminal domain, FAM: Urease metallochaperone UreE, N-terminal domain, Probability= 93.4. Coverage= 24.1758, SubjectRange= 52:74, QueryRange= 52:28. CDD= . /note=Kept start. Best rbs scores that cover all cp. -1 overlap. /note=No clear function call. Many hits to "Urease metallochaperone UreE" and evidence to support this function call but it`s not on the approved function list. tRNA 84507 - 84581 /gene="134" /product="tRNA-Ile(gat)" /locus tag="DORIN_134" /note=tRNA-Ile(gat) CDS 84608 - 84871 /gene="135" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_135" /note=Original Glimmer call @bp 84608 has strength 7.5; Genemark calls start at 84608 /note=SSC: Start = 84608, Stop = 84871. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.438 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 264 bp is not the longest possible ORF. GAP: 110 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 135, Function= function unknown, EValue= 1.0E-44. NCBIBLAST= . 
HHPRED= Accession= cd16377, Description= 23S_rRNA_IVP_like; 23S rRNA-intervening sequence protein and similar proteins. A family of functionally uncharacterized bacterial proteins, some of which are encoded by an atypically large intervening sequence present within some 23S rRNA genes., Probability= 87.4. Coverage= 49.4253, SubjectRange= 65:108, QueryRange= 65:48. CDD= . /note=Kept start. Best rbs scores that cover all cp. /note=No clear function call. tRNA 84881 - 84979 /gene="136" /product="tRNA-Ser(tga)" /locus tag="DORIN_136" /note=tRNA-Ser(tga) tRNA 85097 - 85169 /gene="137" /product="tRNA-Val(tac)" /locus tag="DORIN_137" /note=tRNA-Val(tac) tRNA 85171 - 85244 /gene="138" /product="tRNA-Met(cat)" /locus tag="DORIN_138" /note=tRNA-Met(cat) CDS 85348 - 85530 /gene="139" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_139" /note=Original Glimmer call @bp 85348 has strength 4.13; Genemark calls start at 85282 /note=SSC: Start = 85348, Stop = 85530. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.077 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 183 bp is not the longest possible ORF. GAP: 476 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= PF17968.5, Description= Tlr3_TMD ; Toll-like receptor 3 trans-membrane domain, Probability= 65.2. Coverage= 10.0, SubjectRange= 27:33, QueryRange= 27:52. CDD= . /note=Changed start for better rbs scores. No cp to cover and starterator isn`t helpful, but this start matches with Dorin`s draft start. /note=No clear function call. CDS 85531 - 86094 /gene="140" /product="HNH endonuclease" /function="HNH endonuclease" /locus tag="Dorin_140" /note=Original Glimmer call @bp 85531 has strength 7.05; Genemark calls start at 85531 /note=SSC: Start = 85531, Stop = 86094. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.902 is the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 564 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: HNH endonuclease. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 139, Function= function unknown, EValue= 1.0E-109. NCBIBLAST= PhageName= HNH endonuclease family protein [Streptomyces bambusae] >gb|MCB5166257.1| HNH endonuclease family protein [Streptomyces bambusae], Coverage= 88.2353, SubjectRange= 55:225, QueryRange= 55:187, EValue= 5.10247E-39. HHPRED= Accession= PF14410.10, Description= GH-E ; HNH/ENDO VII superfamily nuclease with conserved GHE residues, Probability= 94.5. Coverage= 35.2941, SubjectRange= 1:68, QueryRange= 1:141. CDD= Accession= pfam07510, Coverage= 51.8717, SubjectRange= 2:102, QueryRange= 2:149, EValue= 9.4304E-8. /note=Kept start. Good rbs scores, covers all cp, matches with Francesca on starterator. /note= /note=Some hits to HNH-endonuclease, as per the guidelines and linked papers, a HNN sequence was found within a 14 AA span, which is evidence of an HNH endonuclease CDS 86103 - 86528 /gene="141" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_141" /note=Original Glimmer call @bp 86103 has strength 8.85; Genemark calls start at 86103 /note=SSC: Start = 86103, Stop = 86528. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.438 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 426 bp is not the longest possible ORF. GAP: 8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= 2FUP_A, Description= hypothetical protein PA3352; Structural genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-2, biosynthetic protein; HET: MSE; 1.48A {Pseudomonas aeruginosa} SCOP: a.47.5.1, Probability= 60.9. Coverage= 31.2057, SubjectRange= 8:52, QueryRange= 8:135. CDD= . /note=Kept start. Best rbs scores that cover all cp. 
Starterator matches with Francesca /note=No clear function call. Some hits to serine integrase in phages db but the e-values aren`t good enough to use for evidence. CDS 86525 - 86815 /gene="142" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_142" /note=Genemark calls start at 86525 /note=SSC: Start = 86525, Stop = 86815. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.457 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 291 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 141, Function= function unknown, EValue= 5.0E-52. NCBIBLAST= PhageName= hypothetical protein KNU44_gp112 [Mycobacterium phage CicholasNage] >gb|QBP29888.1| hypothetical protein SEQ_HALENA_114 [Mycobacterium phage Halena] >gb|QDK04108.1| hypothetical protein SEA_AVADAKEDAVRA_114 [Mycobacterium phage AvadaKedavra] >gb|QGJ93126.1| hypothetical protein SEA_ZARIA_116 [Mycobacterium phage Zaria] >gb|QWT30632.1| hypothetical protein SEA_ROSE5_115 [Mycobacterium phage Rose5] >gb|UEM46389.1| hypothetical protein SEA_ENCELADUS_111 [Mycobacterium phage Enceladus] >gb|WMI34701.1| hypothetical protein SEA_CALM_119 [Mycobacterium phage Calm], Coverage= 57.2917, SubjectRange= 1:55, QueryRange= 1:55, EValue= 8.39315E-5. HHPRED= Accession= 4IJJ_B, Description= Putative C4-type zinc finger protein, DksA/TraR family; DksA fold, transcription factor, RNA polymerase, disulfide bond, HYDROLASE; HET: SO4; 3.25A {Pseudomonas aeruginosa}, Probability= 87.4. Coverage= 83.3333, SubjectRange= 37:122, QueryRange= 37:84. CDD= . /note=Kept start. Covers all cp, better rbs scores, -4 overlap. /note=No clear function call. /note=Glimmer doesn`t seem to call this gene. /note=Assorted hits with decent probability to Zinc finger/binding protein, but terrible e-values so can`t be used as proper evidence. 
CDS complement (86741 - 86938) /gene="143" /product="nothing burger" /function="nothing burger" /locus tag="Dorin_143" /note=Original Glimmer call @bp 86938 has strength 5.15 /note=SSC: Start = 86938, Stop = 86741. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.749 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 198 bp is not the longest possible ORF. GAP: 30 bp. ST: SS=NA. F: nothing burger. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Kept start, okay rbs scores. /note=Odd reverse gene in the middle of forward genes, no clear function, overlaps with previous gene by 73bp, other starts would overlap significantly with following gene, and genemark doesn`t call it. This gene likely should be deleted. CDS 86969 - 87190 /gene="144" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_144" /note=Original Glimmer call @bp 86969 has strength 8.25; Genemark calls start at 86969 /note=SSC: Start = 86969, Stop = 87190. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 222 bp is the longest possible ORF. GAP: 30 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= PF11690.12, Description= DUF3287 ; Protein of unknown function (DUF3287), Probability= 74.8. Coverage= 76.7123, SubjectRange= 33:89, QueryRange= 33:58. CDD= . /note=Kept start, some cp is cut off because it is before this gene`s earliest start, okay rbs scores. /note=No clear function call. CDS 87528 - 87950 /gene="145" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_145" /note=Original Glimmer call @bp 87528 has strength 7.71; Genemark calls start at 87528 /note=SSC: Start = 87528, Stop = 87950. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 423 bp is the longest possible ORF. GAP: 337 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= excalibur calcium-binding domain-containing protein [Streptomyces sp. DSM 42041] >gb|MDT0377277.1| excalibur calcium-binding domain-containing protein [Streptomyces sp. DSM 42041], Coverage= 92.1429, SubjectRange= 27:143, QueryRange= 27:137, EValue= 5.29689E-8. HHPRED= Accession= 5J8T_A, Description= Choline binding protein; Excalibur, Choline-binding Protein L, Pneumococcal Adhesion, hydrolase; HET: CA; NMR {Streptococcus pneumoniae}, Probability= 95.4. Coverage= 28.5714, SubjectRange= 6:47, QueryRange= 6:140. CDD= Accession= pfam05901, Coverage= 22.8571, SubjectRange= 4:36, QueryRange= 4:137, EValue= 5.74957E-11. /note=Kept start. Covers all cp, best rbs scores, starterator matches with Francesca. /note=No clear function call. /note=*Many hits to excalibur calcium-binding domain-containing protein, but it`s not an approved function* tRNA 87993 - 88066 /gene="146" /product="tRNA-Ala(tgc)" /locus tag="DORIN_146" /note=tRNA-Ala(tgc) tRNA 88100 - 88173 /gene="147" /product="tRNA-Asp(gtc)" /locus tag="DORIN_147" /note=tRNA-Asp(gtc) CDS 88197 - 88442 /gene="148" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_148" /note=Original Glimmer call @bp 88197 has strength 6.01; Genemark calls start at 88197 /note=SSC: Start = 88197, Stop = 88442. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.438 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 246 bp is the longest possible ORF. GAP: 246 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein PBI_GRAYSON_127 [Rhodococcus phage Grayson], Coverage= 97.5309, SubjectRange= 1:79, QueryRange= 1:79, EValue= 4.41848E-7. 
HHPRED= Accession= 4F98_A, Description= hypothetical protein; PF10976 family protein, DUF2790, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-BIOLOGY; HET: MSE; 1.26A {Pseudomonas aeruginosa}, Probability= 93.2. Coverage= 61.7284, SubjectRange= 17:59, QueryRange= 17:68. CDD= . /note=Kept start. Covers all cp, best rbs scores. /note=No clear function call. CDS 88443 - 88613 /gene="149" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_149" /note=Original Glimmer call @bp 88443 has strength 8.16; Genemark calls start at 88443 /note=SSC: Start = 88443, Stop = 88613. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.427 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 171 bp is the longest possible ORF. GAP: 0 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= PF18536.5, Description= DUF5623 ; Domain of unknown function (DUF5623), Probability= 65.5. Coverage= 60.7143, SubjectRange= 36:71, QueryRange= 36:40. CDD= . /note=Kept start. Perfect lineup with previous gene, going for better rbs scores would cut off cp. CDS 88606 - 88848 /gene="150" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_150" /note=Original Glimmer call @bp 88606 has strength 8.48; Genemark calls start at 88606 /note=SSC: Start = 88606, Stop = 88848. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.438 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 243 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein PBI_PEREGRIN_127 [Rhodococcus phage Peregrin], Coverage= 81.25, SubjectRange= 6:69, QueryRange= 6:67, EValue= 6.18365E-7. HHPRED= . CDD= . /note=Kept start. Only other start covers almost no cp, but this start has a -8 gap (overlap). 
/note=No clear function call. CDS complement (88951 - 89091) /gene="151" /product="nothing burger" /function="nothing burger" /locus tag="Dorin_151" /note=Original Glimmer call @bp 89091 has strength 5.92; Genemark calls start at 89091 /note=SSC: Start = 89091, Stop = 88951. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.313 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 141 bp is the longest possible ORF. GAP: 501 bp. ST: SS=NA. F: nothing burger. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Kept start. No cp, odd reverse gene, orpham. Should probably delete this gene. /note=No clear function call. tRNA 89138 - 89210 /gene="152" /product="tRNA-Pro(tgg)" /locus tag="DORIN_152" /note=tRNA-Pro(tgg) tRNA 89453 - 89528 /gene="153" /product="tRNA-Phe(gaa)" /locus tag="DORIN_153" /note=tRNA-Phe(gaa) CDS 89593 - 90255 /gene="154" /product="HNH endonuclease" /function="HNH endonuclease" /locus tag="Dorin_154" /note=Original Glimmer call @bp 89389 has strength 3.48; Genemark calls start at 89593 /note=SSC: Start = 89593, Stop = 90255. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.003 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 663 bp is not the longest possible ORF. GAP: 501 bp. ST: SS=NA. F: HNH endonuclease. FS: PHDBLAST= . NCBIBLAST= PhageName= HNH endonuclease signature motif containing protein [uncultured Corynebacterium sp.], Coverage= 90.9091, SubjectRange= 26:223, QueryRange= 26:209, EValue= 1.6543E-38. HHPRED= Accession= PF19575.3, Description= HTH_58 ; Helix-turn-helix domain, Probability= 97.2. Coverage= 20.0, SubjectRange= 29:68, QueryRange= 29:45. CDD= . /note=Changed start. This start has better rbs scores, covers all cp, and is called by genemark. The suggested start doesn`t appear or appear properly on genemark`s graph. 
/note=Strong matches to HNH endonuclease, and found HNNH within a 22 AA span (although it does have 2 HKH domains in the portion of the gene that was cut off as well, interesting) CDS 90326 - 90640 /gene="155" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_155" /note=Original Glimmer call @bp 90326 has strength 7.4; Genemark calls start at 90326 /note=SSC: Start = 90326, Stop = 90640. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.427 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 315 bp is the longest possible ORF. GAP: 70 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=good z score and will cover all CP /note=starterator calls start 2 but that does not have a good z score and will shorten length /note=no good HHPRED hits no NCBI hits CDS 90650 - 91006 /gene="156" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_156" /note=Original Glimmer call @bp 90650 has strength 5.31; Genemark calls start at 90650 /note=SSC: Start = 90650, Stop = 91006. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.756 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 357 bp is the longest possible ORF. GAP: 9 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 157, Function= function unknown, EValue= 1.0E-68. NCBIBLAST= PhageName= hypothetical protein FDI69_gp219 [Rhodococcus phage Trina] >gb|ASZ74967.1| hypothetical protein SEA_TRINA_183 [Rhodococcus phage Trina], Coverage= 52.5424, SubjectRange= 21:82, QueryRange= 21:75, EValue= 0.0024702. HHPRED= . CDD= . /note=Kept start. Best rbs scores and covers all cp. /note=No clear function call. 
CDS 90969 - 91235 /gene="157" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_157" /note=Original Glimmer call @bp 90969 has strength 7.19; Genemark calls start at 90975 /note=SSC: Start = 90969, Stop = 91235. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.543 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 267 bp is not the longest possible ORF. GAP: -38 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 154, Function= function unknown, EValue= 2.0E-45. NCBIBLAST= PhageName= hypothetical protein [Prescottella equi], Coverage= 96.5909, SubjectRange= 13:95, QueryRange= 13:88, EValue= 7.03181E-13. HHPRED= . CDD= . /note=good z score and covers all CP /note=called 100% of times when present /note=good NCBI hits no good HHPRED hits CDS 91228 - 91365 /gene="158" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_158" /note=Original Glimmer call @bp 91228 has strength 0.05; Genemark calls start at 91228 /note=SSC: Start = 91228, Stop = 91365. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.596 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 138 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 159, Function= function unknown, EValue= 5.0E-21. NCBIBLAST= . HHPRED= . CDD= . /note=Kept start. Best rbs scores and covers all cp, but -8 overlap. /note=No clear function call. CDS 91376 - 91681 /gene="159" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_159" /note=Original Glimmer call @bp 91376 has strength 7.18; Genemark calls start at 91376 /note=SSC: Start = 91376, Stop = 91681. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 306 bp is the longest possible ORF. GAP: 10 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 160, Function= function unknown, EValue= 3.0E-54. NCBIBLAST= . HHPRED= . CDD= . /note=decent z score but will cover all CP /note=called 100% of times when present /note=no good HHPRED hits and no NCBI hits tRNA 91819 - 91891 /gene="160" /product="tRNA-Arg(tct)" /locus tag="DORIN_160" /note=tRNA-Arg(tct) CDS 91948 - 92229 /gene="161" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_161" /note=Genemark calls start at 91948 /note=SSC: Start = 91948, Stop = 92229. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.104 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 282 bp is the longest possible ORF. GAP: 266 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 162, Function= function unknown, EValue= 8.0E-51. NCBIBLAST= PhageName= hypothetical protein [Clostridium perfringens], Coverage= 97.8495, SubjectRange= 2:89, QueryRange= 2:93, EValue= 7.65335E-18. HHPRED= . CDD= . /note=Start chosen because it is called 100% of the time, and contains the most coding potential of the starts. /note= /note= /note=PhagesDB and NCBI BLAST results suggest a shared hypothetical protein. CDS complement (91921 - 92232) /gene="162" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_162" /note=Original Glimmer call @bp 92232 has strength 4.76 /note=SSC: Start = 92232, Stop = 91921. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.495 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 312 bp is not the longest possible ORF. GAP: 34 bp. ST: SS=NA. F: Hypothetical Protein. 
FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 158, Function= function unknown, EValue= 2.0E-53. NCBIBLAST= . HHPRED= . CDD= . /note=decent z score and will cover all CP /note=no starterator ORPHAM /note=no good HHPRED hits no NCBI hits CDS 92267 - 92443 /gene="163" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_163" /note=Original Glimmer call @bp 92267 has strength 6.7; Genemark calls start at 92267 /note=SSC: Start = 92267, Stop = 92443. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.438 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 177 bp is not the longest possible ORF. GAP: 34 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 163, Function= function unknown, EValue= 3.0E-30. NCBIBLAST= . HHPRED= . CDD= . /note=Start chosen because it`s called 100% of the time, both in the CG cluster and in general, and covers most coding potential. /note= /note=Hypothetical protein is shared with fellow CG phage Francesca CDS 92545 - 92682 /gene="164" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_164" /note=Original Glimmer call @bp 92545 has strength 8.2; Genemark calls start at 92545 /note=SSC: Start = 92545, Stop = 92682. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 138 bp is the longest possible ORF. GAP: 101 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 164, Function= function unknown, EValue= 9.0E-20. NCBIBLAST= PhageName= hypothetical protein [Nocardia nova], Coverage= 68.8889, SubjectRange= 4:34, QueryRange= 4:38, EValue= 0.0395953. HHPRED= . CDD= . 
/note=not the best z score but other starts will shorten length /note=no good HHPRED hits decent NCBI hits /note=called 100% of times when present CDS 92694 - 92933 /gene="165" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_165" /note=Original Glimmer call @bp 92694 has strength 9.29; Genemark calls start at 92694 /note=SSC: Start = 92694, Stop = 92933. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.104 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 240 bp is not the longest possible ORF. GAP: 11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 165, Function= function unknown, EValue= 1.0E-36. NCBIBLAST= . HHPRED= . CDD= . /note=Start chosen because it is called 100% of the time by both CG phages and covers all coding potential. /note= /note=PhagesDB results show a similar hypothetical protein in fellow CG phage Francesca. tRNA 93028 - 93104 /gene="166" /product="tRNA-Leu(taa)" /locus tag="DORIN_166" /note=tRNA-Leu(taa) CDS 93126 - 93341 /gene="167" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_167" /note=Original Glimmer call @bp 93126 has strength 9.05; Genemark calls start at 93126 /note=SSC: Start = 93126, Stop = 93341. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.276 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 216 bp is the longest possible ORF. GAP: 192 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 164, Function= function unknown, EValue= 8.0E-34. NCBIBLAST= . HHPRED= . CDD= . 
/note=not the best z score but will cover all coding potential /note=starterator calls start 2 but start 1 will be a tiny bit longer with a slightly better final score /note=no good HHPRED hits and no NCBI hits CDS 93358 - 93870 /gene="168" /product="nucleotidyl transferase" /function="nucleotidyl transferase" /locus tag="Dorin_168" /note=Original Glimmer call @bp 93358 has strength 2.94; Genemark calls start at 93358 /note=SSC: Start = 93358, Stop = 93870. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.998 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 513 bp is the longest possible ORF. GAP: 16 bp. ST: SS=NA. F: nucleotidyl transferase. FS: PHDBLAST= PhageName= Peregrin, ProteinNumber= 241, Function= tRNA nucleotidyltransferase, EValue= 3.0E-34. NCBIBLAST= PhageName= nucleotidyltransferase [Rhodococcus phage Weasels2] >gb|AOZ63827.1| nucleotidyltransferase [Rhodococcus phage Weasels2], Coverage= 98.2353, SubjectRange= 2:167, QueryRange= 2:169, EValue= 4.63082E-40. HHPRED= Accession= 3C18_C, Description= Nucleotidyltransferase-like protein; ZP_00538802.1, nucleotidyltransferase-like protein, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-2, TRANSFERASE; HET: MSE, GOL; 1.9A {Exiguobacterium sibiricum}, Probability= 98.1. Coverage= 54.7059, SubjectRange= 18:111, QueryRange= 18:95. CDD= Accession= pfam10127, Coverage= 99.4118, SubjectRange= 18:205, QueryRange= 18:170, EValue= 8.64775E-4. /note=Start chosen because it`s the longest ORF, covers all coding potential, and is called by 100% of phages in the CG cluster. /note= /note=NCBI and PhagesDB BLAST evidence points to this being a nucleotidyltransferase. CDS 93872 - 94147 /gene="169" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_169" /note=Original Glimmer call @bp 93872 has strength 4.97; Genemark calls start at 93872 /note=SSC: Start = 93872, Stop = 94147. 
(Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.335 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 276 bp is not the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 166, Function= function unknown, EValue= 5.0E-52. NCBIBLAST= PhageName= hypothetical protein [Candidatus Methanoperedens sp.], Coverage= 80.2198, SubjectRange= 1:63, QueryRange= 1:73, EValue= 1.41286E-4. HHPRED= . CDD= . /note=good z score and will cover all coding potential /note=called 100% of times when start 10 present /note=decent NCBI hits to call function tRNA 94188 - 94261 /gene="170" /product="tRNA-Arg(acg)" /locus tag="DORIN_170" /note=tRNA-Arg(acg) tRNA 94386 - 94461 /gene="171" /product="tRNA-Leu(caa)" /locus tag="DORIN_171" /note=tRNA-Leu(caa) CDS 94489 - 94776 /gene="172" /product="thioredoxin" /function="thioredoxin" /locus tag="Dorin_172" /note=Original Glimmer call @bp 94489 has strength 7.68; Genemark calls start at 94489 /note=SSC: Start = 94489, Stop = 94776. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 288 bp is not the longest possible ORF. GAP: 341 bp. ST: SS=NA. F: thioredoxin. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 173, Function= function unknown, EValue= 1.0E-48. NCBIBLAST= PhageName= thioredoxin family protein [Verrucomicrobiota bacterium], Coverage= 78.9474, SubjectRange= 48:125, QueryRange= 48:79, EValue= 6.4181E-8. HHPRED= Accession= 3QOU_A, Description= protein ybbN; thioredoxin-like fold, tetratricopeptide repeat, lysine dimethylation, PROTEIN BINDING; HET: MLY; 1.8A {Escherichia coli}, Probability= 99.7. Coverage= 94.7368, SubjectRange= 27:112, QueryRange= 27:91. CDD= . /note=Start is called by both CG phages, has best scores and captures all coding potential. 
/note= /note=HHPred and NCBI BLAST suggest this gene is a thioredoxin CDS 94763 - 95044 /gene="173" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_173" /note=Original Glimmer call @bp 94763 has strength 5.48; Genemark calls start at 94763 /note=SSC: Start = 94763, Stop = 95044. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.248 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 282 bp is the longest possible ORF. GAP: -14 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 174, Function= function unknown, EValue= 3.0E-51. NCBIBLAST= . HHPRED= . CDD= . /note=covers all coding potential but there is overlap. switching starts will not cover all CP so start should be kept /note=good z score /note=start called 100% of time when present /note=no good HHPRED hits no NCBI hits CDS 95053 - 95193 /gene="174" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_174" /note=Genemark calls start at 95053 /note=SSC: Start = 95053, Stop = 95193. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.427 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 141 bp is the longest possible ORF. GAP: 8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 175, Function= function unknown, EValue= 4.0E-20. NCBIBLAST= . HHPRED= . CDD= . /note=good z score and captures all coding potential /note=start 1 called 100% of times when present /note=no good HHPRED hits and no NCBI hits CDS 95195 - 95365 /gene="175" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_175" /note=Original Glimmer call @bp 95195 has strength 8.61; Genemark calls start at 95195 /note=SSC: Start = 95195, Stop = 95365. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is the highest start score. 
SCS: Start is called by Glimmer and is called by Genemark. LO: 171 bp is the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 176, Function= function unknown, EValue= 5.0E-27. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_187 [Rhodococcus phage NiceHouse], Coverage= 87.5, SubjectRange= 7:55, QueryRange= 7:51, EValue= 3.24604E-8. HHPRED= . CDD= . /note=Start was chosen for good z and final scores, being called 100% of the time in the CG cluster, and for covering all coding potential. /note= /note=Analysis of e-scores suggests a chance that this gene could encode a chorismate mutase domain of P-protein, but more investigation is needed. CDS 95452 - 95598 /gene="176" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_176" /note=Original Glimmer call @bp 95452 has strength 9.59; Genemark calls start at 95362 /note=SSC: Start = 95452, Stop = 95598. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.537 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 147 bp is not the longest possible ORF. GAP: 86 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 177, Function= function unknown, EValue= 4.0E-22. NCBIBLAST= . HHPRED= . CDD= . /note=Kept start because it covers all coding potential. /note= /note=Shares this hypothetical protein with fellow CG phage Francesca CDS 95721 - 96350 /gene="177" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_177" /note=Original Glimmer call @bp 95721 has strength 7.6; Genemark calls start at 95721 /note=SSC: Start = 95721, Stop = 96350. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.276 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 630 bp is the longest possible ORF. GAP: 122 bp. ST: SS=NA. F: membrane protein.
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 178, Function= function unknown, EValue= 1.0E-120. NCBIBLAST= . HHPRED= . CDD= . /note=Start chosen because it`s the most called start and covers all coding potential. /note= /note=Only BLAST result is fellow CG phage Francesca. tRNA 96380 - 96454 /gene="178" /product="tRNA-Ile(tat)" /locus tag="DORIN_178" /note=tRNA-Ile(tat) CDS 96483 - 97028 /gene="179" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_179" /note=Original Glimmer call @bp 96765 has strength 4.97; Genemark calls start at 96483 /note=SSC: Start = 96483, Stop = 97028. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.225 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 546 bp is not the longest possible ORF. GAP: 132 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Changed start to cover all cp by deleting gene at start 96626 (currently gene 146 until it is deleted or start changed). This start has good rbs scores. /note=No clear function call, TMHMM says membrane protein CDS 96626 - 96739 /gene="180" /product="nothing burger" /function="nothing burger" /locus tag="Dorin_180" /note=Original Glimmer call @bp 96626 has strength 1.07 /note=SSC: Start = 96626, Stop = 96739. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 1.078 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 114 bp is not the longest possible ORF. GAP: -403 bp. ST: SS=NA. F: nothing burger. FS: PHDBLAST= . NCBIBLAST= . HHPRED= . CDD= . /note=Hardly any cp is covered by this gene, the rbs scores are bad, it`s an orpham, and Genemark doesn`t call this gene. /note=Should delete this gene.
By deleting this gene, gene 145 could extend to fill all of its cp and have a much better start CDS 97039 - 98076 /gene="181" /product="RNA ligase" /function="RNA ligase" /locus tag="Dorin_181" /note=Original Glimmer call @bp 97039 has strength 6.97; Genemark calls start at 97039 /note=SSC: Start = 97039, Stop = 98076. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1038 bp is the longest possible ORF. GAP: 299 bp. ST: SS=NA. F: RNA ligase. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 194, Function= RNA ligase, EValue= 2.0E-57. NCBIBLAST= PhageName= RNA ligase [Rhodococcus phage NiceHouse], Coverage= 89.5652, SubjectRange= 1:318, QueryRange= 1:309, EValue= 1.43357E-63. HHPRED= Accession= 5TT6_A, Description= T4 RNA ligase 1; metal catalysis, covalent nucleotidyltransferase, lysyl-AMP, LIGASE; HET: ATP; 2.187A {Enterobacteria phage T4}, Probability= 100.0. Coverage= 93.3333, SubjectRange= 26:369, QueryRange= 26:332. CDD= Accession= pfam09511, Coverage= 52.1739, SubjectRange= 1:221, QueryRange= 1:233, EValue= 3.69216E-19. /note=Start chosen because it covers all coding potential and is called by the majority of phams that contain this start. /note= /note=Evidence from HHPred, NCBI BLAST, and PhagesDB BLAST suggest this is an RNA ligase. CDS complement (98073 - 98333) /gene="182" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_182" /note=Original Glimmer call @bp 98333 has strength 6.87; Genemark calls start at 98459 /note=SSC: Start = 98333, Stop = 98073. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.422 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 261 bp is not the longest possible ORF. GAP: 133 bp. ST: SS=NA. F: Hypothetical Protein. 
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 182, Function= function unknown, EValue= 3.0E-45. NCBIBLAST= . HHPRED= . CDD= . /note=Start selected for good z and final scores as well as covering all of the coding potential. /note= /note=Only significant BLAST result is fellow CG phage Francesca CDS complement (98467 - 99537) /gene="183" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_183" /note=Original Glimmer call @bp 99537 has strength 3.82; Genemark calls start at 99537 /note=SSC: Start = 99537, Stop = 98467. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.08 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1071 bp is the longest possible ORF. GAP: 14 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= NiceHouse, ProteinNumber= 195, Function= hydrolase, EValue= 4.0E-16. NCBIBLAST= PhageName= hydrolase [Rhodococcus phage NiceHouse], Coverage= 48.0337, SubjectRange= 1:165, QueryRange= 1:171, EValue= 7.23762E-12. HHPRED= . CDD= . /note=good z score and covers all coding potential /note=called 100% when present /note=HHPred was run separately and gave some good hits for head fiber. It is found among minor tail proteins, meaning it could be called that, but no clear evidence CDS complement (99552 - 101456) /gene="184" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_184" /note=Original Glimmer call @bp 101456 has strength 4.33; Genemark calls start at 101456 /note=SSC: Start = 101456, Stop = 99552. (Reverse). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.091 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1905 bp is not the longest possible ORF. GAP: 63 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Stormageddon, ProteinNumber= 26, Function= minor tail protein, EValue= 2.0E-65. 
NCBIBLAST= PhageName= hydrolase [Arthrobacter phage Qui] >gb|QED11527.1| minor tail protein [Arthrobacter phage Qui] >gb|QOC56359.1| minor tail protein [Arthrobacter phage Paella], Coverage= 55.205, SubjectRange= 69:447, QueryRange= 69:594, EValue= 2.22503E-39. HHPRED= Accession= 3QC7_A, Description= Head fiber protein; supercoiled triple repeating helix-turn-helix, VIRAL PROTEIN; 1.52A {Bacillus phage phi29}, Probability= 96.2. Coverage= 12.3028, SubjectRange= 8:136, QueryRange= 8:112. CDD= . /note=Chosen because it has the best z and final scores, it`s the most called start, and covers most of the coding potential. /note=Phagesdb Blast: good e value for Francesca and DX cluster phages. But this gene is not in those phages. /note=HHPRED: no good hits /note=NCBI Blast: good e value but low % identity and coverage, only one hit for minor tail protein /note=Could maybe be called a minor tail protein, but not great evidence CDS complement (101520 - 102644) /gene="185" /product="minor tail protein" /function="minor tail protein" /locus tag="Dorin_185" /note=Original Glimmer call @bp 102644 has strength 10.42; Genemark calls start at 102644 /note=SSC: Start = 102644, Stop = 101520. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 1125 bp is the longest possible ORF. GAP: 24 bp. ST: SS=NA. F: minor tail protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 185, Function= function unknown, EValue= 0.0. NCBIBLAST= PhageName= tail fiber protein [Gordonia phage Clawz] >gb|QKY79981.1| hypothetical protein SEA_CLAWZ_69 [Gordonia phage Clawz], Coverage= 95.4545, SubjectRange= 1:356, QueryRange= 1:357, EValue= 1.13685E-83. HHPRED= Accession= 8I4M_j, Description= Fiber protein(gp 28) of the cyanophage P-SCSP1u; Whole virus, Capsid, cyanophage, T7-like virus, VIRUS; 3.81A {Prochlorococcus phage P-SCSP1u}, Probability= 88.4.
Coverage= 12.0321, SubjectRange= 523:577, QueryRange= 523:64. CDD= Accession= COG5301, Coverage= 44.6524, SubjectRange= 189:355, QueryRange= 189:326, EValue= 1.44205E-6. /note=covers all cp but decent z score /note=Phagesdb Blast: good e value with phages from CE cluster and Francesca /note=HHPRED: no good hits /note=NCBI Blast: pretty good hits for hypothetical protein but low % identity CDS complement (102669 - 103490) /gene="186" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_186" /note=Original Glimmer call @bp 103490 has strength 8.55; Genemark calls start at 103490 /note=SSC: Start = 103490, Stop = 102669. (Reverse). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.132 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 822 bp is not the longest possible ORF. GAP: 98 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 206, Function= function unknown, EValue= 8.0E-29. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_198 [Rhodococcus phage NiceHouse], Coverage= 99.6337, SubjectRange= 1:252, QueryRange= 1:272, EValue= 1.01798E-26. HHPRED= Accession= PF18667.5, Description= BppU_IgG ; Baseplate upper protein immunoglobulin like domain, Probability= 67.6. Coverage= 17.5824, SubjectRange= 4:69, QueryRange= 4:267. CDD= . /note=Chose this start because it`s the most called within the CG cluster, it covers most of the coding potential, and has the best z and final scores. /note=Phagesdb Blast: good e value for CE cluster phages and Francesca with unknown function /note=HHPRED: no good hits /note=NCBI Blast: good e value but low % identity and coverage for hypothetical protein functions CDS 103589 - 103837 /gene="187" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_187" /note=Original Glimmer call @bp 103589 has strength 5.1; Genemark calls start at 103589 /note=SSC: Start = 103589, Stop = 103837.
(Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.264 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 249 bp is the longest possible ORF. GAP: 98 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 187, Function= function unknown, EValue= 3.0E-29. NCBIBLAST= . HHPRED= Accession= PF10486.13, Description= PI3K_1B_p101 ; Phosphoinositide 3-kinase gamma adapter protein p101 subunit, Probability= 58.7. Coverage= 30.4878, SubjectRange= 160:184, QueryRange= 160:38. CDD= . /note=Phagesdb Blast: good e value for francesca with unknown function /note=HHPRED: no good hits, low probability, coverage and bad e value CDS 103839 - 104069 /gene="188" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_188" /note=Original Glimmer call @bp 103839 has strength 9.78; Genemark calls start at 103839 /note=SSC: Start = 103839, Stop = 104069. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.947 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 231 bp is not the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 188, Function= function unknown, EValue= 7.0E-38. NCBIBLAST= . HHPRED= Accession= 5ZDH_Q, Description= Type II secretion system lipoprotein; Pilotin, Secretin, PROTEIN TRANSPORT; 3.2A {Escherichia coli O78:H11 (strain H10407 / ETEC)}, Probability= 65.2. Coverage= 35.5263, SubjectRange= 75:102, QueryRange= 75:70. CDD= . /note=Chosen because it`s the most called start, has the best z and final scores, and covers the majority of the coding potential. /note=HHPRED: no good hits, low probability, coverage and high e value /note=Only significant BLAST match is the fellow CG phage Francesca. 
CDS 104071 - 104262 /gene="189" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_189" /note=Original Glimmer call @bp 104071 has strength 3.13; Genemark calls start at 104071 /note=SSC: Start = 104071, Stop = 104262. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.947 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 192 bp is not the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 189, Function= function unknown, EValue= 2.0E-26. NCBIBLAST= . HHPRED= Accession= PF14017.10, Description= DUF4233 ; Protein of unknown function (DUF4233), Probability= 91.6. Coverage= 76.1905, SubjectRange= 56:95, QueryRange= 56:58. CDD= . /note=good z score and will cover all coding potential /note=no good hits for HHPRED /note= /note=TMHMM shows membrane domains CDS 104265 - 105167 /gene="190" /product="polynucleotide kinase" /function="polynucleotide kinase" /locus tag="Dorin_190" /note=Original Glimmer call @bp 104265 has strength 5.31; Genemark calls start at 104265 /note=SSC: Start = 104265, Stop = 105167. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.313 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 903 bp is the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: polynucleotide kinase. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 190, Function= function unknown, EValue= 1.0E-174. NCBIBLAST= PhageName= polynucleotide kinase [Rhodococcus phage NiceHouse], Coverage= 99.3333, SubjectRange= 4:301, QueryRange= 4:300, EValue= 5.06173E-106. HHPRED= Accession= 4XRP_A, Description= Pnkp1; RNA repair, kinase, phosphatase, methyltransferase, ligase, PROTEIN BINDING; HET: SO4, GOL, PO4; 3.3A {Capnocytophaga gingivalis}, Probability= 100.0. Coverage= 99.6667, SubjectRange= 11:312, QueryRange= 11:300. 
CDD= Accession= PHA02530, Coverage= 100.0, SubjectRange= 2:300, QueryRange= 2:300, EValue= 0.0. /note=Selected this start because it`s in the majority of the phages in this pham and is called almost always when present, is the longest ORF, and covers the majority of the coding potential. /note=Phagesdb Blast: good e value for Muntaha, NiceHouse and Francesca. Muntaha and NiceHouse have polynucleotide kinase function. /note=HHPRED: high probability and coverage, good e value for polynucleotide kinase function /note=NCBI Blast: not so high % identity but high coverage and good e value for polynucleotide kinase from NiceHouse /note=conserved domain database: really good e value, high coverage, not so high probability for polynucleotide kinase /note=these suggest that this gene`s function is polynucleotide kinase CDS 105188 - 105457 /gene="191" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_191" /note=Original Glimmer call @bp 105188 has strength 6.95; Genemark calls start at 105188 /note=SSC: Start = 105188, Stop = 105457. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.027 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 270 bp is the longest possible ORF. GAP: 20 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 191, Function= function unknown, EValue= 5.0E-43. NCBIBLAST= . HHPRED= Accession= PF17255.6, Description= EbsA ; EbsA-like protein, Probability= 82.7. Coverage= 95.5056, SubjectRange= 16:95, QueryRange= 16:87. CDD= . /note=decent z score will cover all coding potential /note=no good HHPRED or NCBI hits /note=HHPRED: high probability, coverage but bad e value /note=TMHMM shows membrane domains CDS 105454 - 105621 /gene="192" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_192" /note=Genemark calls start at 105454 /note=SSC: Start = 105454, Stop = 105621. (Forward).
CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.225 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 168 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 192, Function= function unknown, EValue= 6.0E-22. NCBIBLAST= . HHPRED= Accession= PF11772.12, Description= EpuA ; DNA-directed RNA polymerase subunit beta, Probability= 85.0. Coverage= 65.4545, SubjectRange= 3:41, QueryRange= 3:39. CDD= . /note=Selected start because it is the most called and longest ORF, and encapsulates most of the coding potential. /note= /note=Only significant BLAST match is fellow CG phage Francesca, also with a hypothetical protein. /note=TMHMM shows membrane domains CDS 105635 - 105832 /gene="193" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_193" /note=Original Glimmer call @bp 105635 has strength 4.08; Genemark calls start at 105635 /note=SSC: Start = 105635, Stop = 105832. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.196 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 198 bp is the longest possible ORF. GAP: 13 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 193, Function= function unknown, EValue= 1.0E-33. NCBIBLAST= . HHPRED= Accession= 3NSW_B, Description= Excretory-secretory protein 2; Ancylostoma ceylanicum, hookworm, excretory-secretory protein, merohedral twinning, immunomodulator, netrin domain, IMMUNE SYSTEM; HET: EPE; 1.75A {Ancylostoma ceylanicum}, Probability= 38.0. Coverage= 55.3846, SubjectRange= 66:103, QueryRange= 66:50. CDD= . 
/note=best z score out of possible starts, covers all coding potential /note=no good hits on HHPRED and NCBI CDS 105890 - 106039 /gene="194" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_194" /note=Original Glimmer call @bp 105890 has strength 9.6; Genemark calls start at 105890 /note=SSC: Start = 105890, Stop = 106039. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 150 bp is the longest possible ORF. GAP: 57 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 194, Function= function unknown, EValue= 2.0E-20. NCBIBLAST= . HHPRED= Accession= PF04999.17, Description= FtsL ; Cell division protein FtsL, Probability= 93.4. Coverage= 69.3878, SubjectRange= 36:70, QueryRange= 36:35. CDD= . /note=Start chosen because it`s the most called and encapsulates most coding potential, and is longest ORF. /note=no good hits for HHPRED or NCBI Blast /note=Only significant BLAST match is fellow CG phage Francesca, also for a hypothetical protein. CDS 106053 - 106217 /gene="195" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_195" /note=Original Glimmer call @bp 106053 has strength 4.05; Genemark calls start at 106053 /note=SSC: Start = 106053, Stop = 106217. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.276 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 165 bp is not the longest possible ORF. GAP: 13 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 195, Function= function unknown, EValue= 2.0E-23. NCBIBLAST= . HHPRED= Accession= PF08114.15, Description= PMP1_2 ; ATPase proteolipid family, Probability= 86.2. Coverage= 44.4444, SubjectRange= 12:36, QueryRange= 12:53. CDD= .
/note=Called start change because the cp does not cover the entire ORF, and a longer start would cause overlap. Starterator and z-scores support. /note=Starterator calls most annotated start (though only a 2 member pham with Francesca and Dorin) /note=good z score and covers all coding potential /note=bad e value on HHPRED and no NCBI hits /note=TMHMM shows membrane domains CDS 106217 - 106447 /gene="196" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_196" /note=Original Glimmer call @bp 106217 has strength 6.02; Genemark calls start at 106217 /note=SSC: Start = 106217, Stop = 106447. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 231 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 196, Function= function unknown, EValue= 5.0E-37. NCBIBLAST= . HHPRED= Accession= PF20556.2, Description= DUF6768 ; Family of unknown function (DUF6768), Probability= 87.5. Coverage= 68.4211, SubjectRange= 43:103, QueryRange= 43:53. CDD= . /note=Start was chosen because it is the most called on starterator (though Francesca and Dorin are the only members), encapsulated all coding potential, and is the longest ORF. /note=Z-scores are good /note=Only significant BLAST match was the related phage, Francesca. /note=NCBI Blast has no hits /note=HHPRED matches all had poor e-values, not significant. /note=TMHMM shows membrane domains CDS 106517 - 107539 /gene="197" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_197" /note=Original Glimmer call @bp 106517 has strength 7.1; Genemark calls start at 106517 /note=SSC: Start = 106517, Stop = 107539. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.947 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark.
LO: 1023 bp is the longest possible ORF. GAP: 69 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 197, Function= function unknown, EValue= 0.0. NCBIBLAST= PhageName= hypothetical protein FDI69_gp197 [Rhodococcus phage Trina] >gb|ASZ74989.1| hypothetical protein SEA_TRINA_210 [Rhodococcus phage Trina], Coverage= 98.2353, SubjectRange= 1:341, QueryRange= 1:334, EValue= 1.52979E-116. HHPRED= Accession= PF06067.15, Description= DUF932 ; Domain of unknown function (DUF932), Probability= 99.9. Coverage= 68.2353, SubjectRange= 1:220, QueryRange= 1:325. CDD= . /note=Does not have the most annotated start, but called 100% of time when present. (Same called start as Francesca and Trina, as has held true for many genes) /note=Good z-score, covers all CP, and is longest possible ORF /note=PhagesDB Blast has hits with several similar phages (Trina, Peregrin, Weasels2, and NiceHouse) though all called hypothetical protein. /note=good HHPRED and NCBI hits to call for unknown protein function CDS 107598 - 108119 /gene="198" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_198" /note=Original Glimmer call @bp 107598 has strength 9.62; Genemark calls start at 107598 /note=SSC: Start = 107598, Stop = 108119. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.438 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 522 bp is the longest possible ORF. GAP: 58 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 198, Function= function unknown, EValue= 2.0E-90. NCBIBLAST= . HHPRED= Accession= PF09868.13, Description= DUF2095 ; Uncharacterized protein conserved in archaea (DUF2095), Probability= 43.4. Coverage= 8.67052, SubjectRange= 74:89, QueryRange= 74:166. CDD= .
/note=Chosen start has best z-score, covers all cp, is the LPO, and starterator has no evidence to change (2 member pham, both call this start, Francesca is the only other phage present...) /note= /note=PhagesDB BLAST was not informative-only significant e-score was the related phage Francesca, and for a similar hypothetical protein. /note=HHPRED had hits, but all were poor calls with bad e-values/coverage. /note=NCBI Blast had no hits CDS 108121 - 108468 /gene="199" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_199" /note=Original Glimmer call @bp 108121 has strength 10.6; Genemark calls start at 108121 /note=SSC: Start = 108121, Stop = 108468. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.509 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 348 bp is not the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 199, Function= function unknown, EValue= 9.0E-63. NCBIBLAST= . HHPRED= Accession= PF09941.13, Description= DUF2173 ; Uncharacterized conserved protein (DUF2173), Probability= 54.6. Coverage= 34.7826, SubjectRange= 75:105, QueryRange= 75:112. CDD= . /note=Start called 100% of time when present (Dorin and Francesca both call and are the only members of the pham) /note=Best z-score and covers all CP. Not longest possible orf, but going longer will not cover more CP and will cause overlap with another gene. /note=no good HHPRED hits and no NCBI at all CDS 108492 - 108935 /gene="200" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_200" /note=Original Glimmer call @bp 108492 has strength 3.97; Genemark calls start at 108492 /note=SSC: Start = 108492, Stop = 108935. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.979 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 444 bp is the longest possible ORF. 
GAP: 23 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 200, Function= function unknown, EValue= 2.0E-77. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_210 [Rhodococcus phage NiceHouse], Coverage= 82.3129, SubjectRange= 1:121, QueryRange= 1:133, EValue= 1.26559E-7. HHPRED= Accession= 8HFS_E, Description= Mannose-specific PTS system, IIC component; antibiotic resistance antimicrobial peptides mannose phosphotransferase system man-PTS bacteriocins non-pediocin-like/class IId; HET: MAN; 2.98A {Lactococcus lactis subsp. lactis (strain KF147)}, Probability= 94.5. Coverage= 95.2381, SubjectRange= 127:255, QueryRange= 127:144. CDD= . /note=Not the most annotated start, but called 100% of the time it is present (at this point only Dorin and Francesca call it, but the orpham only has 4 members...) /note=Encapsulates all coding potential, and is the longest ORF. /note= /note=PhagesDB Blast has hits with Francesca (CG cluster) and both members of the CE cluster. All called hypothetical protein /note=HHPRED had no convincing hits, all e-values/confidence were too low to be considered /note=NCBI Blast had a hit on a hypothetical protein call found in the NiceHouse Rhodococcus phage. /note= /note=TMHMM shows 4 membrane domains, calling membrane protein CDS 108945 - 109226 /gene="201" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_201" /note=Original Glimmer call @bp 108945 has strength 4.98; Genemark calls start at 108945 /note=SSC: Start = 108945, Stop = 109226. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 282 bp is the longest possible ORF. GAP: 9 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 201, Function= function unknown, EValue= 2.0E-49. 
NCBIBLAST= PhageName= hypothetical protein FDI69_gp193 [Rhodococcus phage Trina] >gb|ASZ74993.1| hypothetical protein SEA_TRINA_214 [Rhodococcus phage Trina], Coverage= 82.7957, SubjectRange= 16:88, QueryRange= 16:93, EValue= 1.75348E-11. HHPRED= Accession= 8P66_B, Description= Clp protease ClpC,Heat shock survival AAA family ATPase ClpK; ATPase associated with diverse cellular activities (AAA), protein aggregation, molecular chaperone, stress, 70 kilodalton heat shock; HET: ZN; NMR {Pseudomonas aeruginosa}, Probability= 89.7. Coverage= 33.3333, SubjectRange= 5:38, QueryRange= 5:58. CDD= . /note=Does not have most annotated start, but has the second most called start called 100% when present (3/6 chose each start) /note=Same start as Francesca and Trina which have been similar throughout. /note=good z score and coding potential for called start, covers all CP /note=no good HHPRED hits but good e value hits on NCBI to call hypothetical protein (from a Bailinhaonella thermotolerans and Rhodococcus Phages.) CDS 109238 - 109510 /gene="202" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_202" /note=Original Glimmer call @bp 109238 has strength 6.3; Genemark calls start at 109238 /note=SSC: Start = 109238, Stop = 109510. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.196 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 273 bp is the longest possible ORF. GAP: 11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Dorin_Draft, ProteinNumber= 199, Function= function unknown, EValue= 2.0E-46. NCBIBLAST= . HHPRED= Accession= 6EJP_A, Description= Yop proteins translocation protein U; AUTOCLEAVAGE, TYPE III SECRETION SYSTEM, INHIBITOR, TRANSPORT PROTEIN; HET: PO4, B8E; 2.48A {Yersinia pestis}, Probability= 73.8. Coverage= 21.1111, SubjectRange= 8:27, QueryRange= 8:77. CDD= . /note=No BLAST results of any kind /note=HHPred hits show no significant similarity. 
/note= /note=Chose this start because it encapsulates all coding potential and is the longest ORF. (It is an orpham, so there is no evidence for a different start...) CDS 109497 - 109661 /gene="203" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_203" /note=Original Glimmer call @bp 109497 has strength 8.58; Genemark calls start at 109497 /note=SSC: Start = 109497, Stop = 109661. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.344 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 165 bp is not the longest possible ORF. GAP: -14 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 202, Function= function unknown, EValue= 6.0E-24. NCBIBLAST= . HHPRED= Accession= PF11036.12, Description= YqgB ; Virulence promoting factor, Probability= 47.3. Coverage= 18.5185, SubjectRange= 27:37, QueryRange= 27:17. CDD= . /note=No good HHPRED or NCBI hits /note=There is a 14 bp overlap (GAP: -14 bp), but the start is kept to preserve coding potential. Good z score CDS 109703 - 110161 /gene="204" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_204" /note=Original Glimmer call @bp 109703 has strength 10.45; Genemark calls start at 109703 /note=SSC: Start = 109703, Stop = 110161. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.947 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 459 bp is the longest possible ORF. GAP: 41 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 203, Function= function unknown, EValue= 5.0E-84. NCBIBLAST= PhageName= hypothetical protein FDI69_gp192 [Rhodococcus phage Trina] >gb|ASZ74994.1| hypothetical protein SEA_TRINA_215 [Rhodococcus phage Trina], Coverage= 100.0, SubjectRange= 1:150, QueryRange= 1:152, EValue= 2.09636E-24.
HHPRED= Accession= cd04191, Description= Glucan_BSP_MdoH; Glucan_BSP_MdoH catalyzes the elongation of beta-1,2 polyglucose chains of glucan., Probability= 95.8. Coverage= 80.9211, SubjectRange= 1:119, QueryRange= 1:137. CDD= . /note=PhagesDB Blast: good e values for CE cluster phages, with unknown function /note=HHPRED: high probability and coverage, but bad e value /note=NCBI Blast: low % identity, high coverage, and good e value for hypothetical protein hits CDS 110154 - 110552 /gene="205" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_205" /note=Original Glimmer call @bp 110154 has strength 8.73; Genemark calls start at 110154 /note=SSC: Start = 110154, Stop = 110552. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.902 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 399 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 204, Function= function unknown, EValue= 8.0E-75. NCBIBLAST= PhageName= hypothetical protein FDI69_gp191 [Rhodococcus phage Trina] >gb|ASZ74995.1| hypothetical protein SEA_TRINA_216 [Rhodococcus phage Trina], Coverage= 97.7273, SubjectRange= 15:143, QueryRange= 15:131, EValue= 4.65905E-5. HHPRED= Accession= 2DJ6_C, Description= hypothetical protein PH0634; 6-pyruvoyl tetrahydrobiopterin synthase (PTPS), Structural Genomics, NPPSFA, National Project on Protein Structural and Functional Analyses, RIKEN; 2.1A {Pyrococcus horikoshii} SCOP: d.96.1.0, Probability= 57.1. Coverage= 12.1212, SubjectRange= 17:32, QueryRange= 17:92. CDD= .
/note=Starterator: the only other phage that calls the chosen start is Francesca /note=PhagesDB Blast: strong e value for Francesca /note=HHPRED: all hits have low probability and coverage and bad e values /note=NCBI Blast: low % identity but good coverage and an e value on the order of 1E-5, function of hypothetical protein CDS 110509 - 110682 /gene="206" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_206" /note=Original Glimmer call @bp 110509 has strength 3.6 /note=SSC: Start = 110509, Stop = 110682. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.765 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 174 bp is the longest possible ORF. GAP: -44 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 205, Function= function unknown, EValue= 3.0E-29. NCBIBLAST= . HHPRED= Accession= SCOP_d2akla2, Description= g.41.3.5 (A:3-40) Hypothetical protein PA0128, N-terminal domain {Pseudomonas aeruginosa [TaxId: 287]} | CLASS: Small proteins, FOLD: Rubredoxin-like, SUPFAM: Zinc beta-ribbon, FAM: PhnA zinc-binding domain, Probability= 90.3. Coverage= 43.8596, SubjectRange= 3:28, QueryRange= 3:38. CDD= . /note=GeneMark: cp is not good, but the chosen start covers all coding potential. If the start is changed, there is no gap, but it cuts the length a lot and doesn't cover all cp. /note=PhagesDB Blast: good e value for Francesca /note=HHPRED: good probability, low coverage and bad e values CDS 110663 - 111040 /gene="207" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_207" /note=Original Glimmer call @bp 110663 has strength 6.77; Genemark calls start at 110663 /note=SSC: Start = 110663, Stop = 111040. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.19 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 378 bp is not the longest possible ORF. GAP: -20 bp. ST: SS=NA. F: Hypothetical Protein.
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 206, Function= function unknown, EValue= 1.0E-69. NCBIBLAST= PhageName= hypothetical protein FDI69_gp189 [Rhodococcus phage Trina] >gb|ASZ74997.1| hypothetical protein SEA_TRINA_218 [Rhodococcus phage Trina], Coverage= 95.2, SubjectRange= 2:119, QueryRange= 2:121, EValue= 5.28711E-10. HHPRED= Accession= PF14445.10, Description= Prok-RING_2 ; Prokaryotic RING finger family 2, Probability= 65.5. Coverage= 41.6, SubjectRange= 3:55, QueryRange= 3:97. CDD= . /note=Starterator: the 110657 start has better z and final scores; however, Starterator shows the same start as Francesca and NiceHouse /note=PhagesDB Blast: strong e values for Francesca and Trina /note=HHPRED: bad e value and coverage, and does not have high probability /note=NCBI Blast: low % identity, good coverage and good e value, for hits of hypothetical protein CDS 111111 - 111533 /gene="208" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_208" /note=Original Glimmer call @bp 111099 has strength 11.8; Genemark calls start at 111111 /note=SSC: Start = 111111, Stop = 111533. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.036 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 423 bp is not the longest possible ORF. GAP: 70 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= 3C8L_B, Description= FtsZ-like protein of unknown function; Structural genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-2, unknown function; HET: SO4, IMD, MSE, GOL; 1.22A {Nostoc punctiforme}, Probability= 23.0. Coverage= 17.1429, SubjectRange= 5:29, QueryRange= 5:133. CDD= . /note=Start was changed; the chosen start has better z and final scores. Also, changing the start does not change the amount of cp that is covered.
/note=Phagesdb Blast: good e value for francesca and trina /note=HHPRED: called function has bad probability, coverage and e value CDS 111629 - 111814 /gene="209" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_209" /note=Original Glimmer call @bp 111629 has strength 5.95; Genemark calls start at 111629 /note=SSC: Start = 111629, Stop = 111814. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 186 bp is the longest possible ORF. GAP: 95 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 208, Function= function unknown, EValue= 3.0E-30. NCBIBLAST= . HHPRED= Accession= PF04697.17, Description= Pinin_SDK_N ; pinin/SDK conserved region, Probability= 82.7. Coverage= 49.1803, SubjectRange= 1:31, QueryRange= 1:31. CDD= . /note=phagesdb Blast: good e score for francesca /note=HHPRED: bad %identity, coverage and e value CDS 111817 - 112053 /gene="210" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_210" /note=Genemark calls start at 111817 /note=SSC: Start = 111817, Stop = 112053. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.586 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 237 bp is the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 209, Function= function unknown, EValue= 1.0E-41. NCBIBLAST= PhageName= hypothetical protein FDJ30_gp122 [Streptomyces phage BillNye] >gb|AVD99311.1| hypothetical protein SEA_BILLNYE_134 [Streptomyces phage BillNye] >gb|QBZ72394.1| hypothetical protein SEA_CIRCINUS_135 [Streptomyces phage Circinus], Coverage= 96.1538, SubjectRange= 4:79, QueryRange= 4:75, EValue= 4.80157E-6. 
HHPRED= Accession= PF10879.12, Description= DUF2674 ; Protein of unknown function (DUF2674), Probability= 55.6. Coverage= 41.0256, SubjectRange= 17:41, QueryRange= 17:39. CDD= . /note=PhagesDB Blast: good e value for Francesca and phages from BK2 cluster that are not from Rhodococcus /note=HHPRED: bad probability, coverage and e value /note=NCBI Blast: bad % identity, good coverage and decent e value for function of hypothetical protein CDS 112050 - 112334 /gene="211" /product="WhiB family transcription factor" /function="WhiB family transcription factor" /locus tag="Dorin_211" /note=Original Glimmer call @bp 112050 has strength 4.59; Genemark calls start at 112050 /note=SSC: Start = 112050, Stop = 112334. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.027 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 285 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: WhiB family transcription factor. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 210, Function= function unknown, EValue= 1.0E-52. NCBIBLAST= PhageName= WhiB family transcription factor [Rhodococcus phage NiceHouse], Coverage= 97.8723, SubjectRange= 7:98, QueryRange= 7:93, EValue= 8.66069E-21. HHPRED= Accession= 6ONO_A, Description= Transcription regulator WhiB1; Iron-sulfur cluster, transcription regulation, redox-sensing, TRANSCRIPTION; HET: PEG, SF4, MSE; 1.85A {Mycobacterium tuberculosis H37Rv}, Probability= 99.8. Coverage= 74.4681, SubjectRange= 1:74, QueryRange= 1:72. CDD= Accession= pfam02467, Coverage= 61.7021, SubjectRange= 1:58, QueryRange= 1:61, EValue= 1.60393E-6.
/note=Starterator shows this start is only present in Dorin and Francesca, but it is called 100% when present, good scores /note=PhagesDB: good e values for phages from CE and CB cluster with a function of WhiB family transcription factor /note=HHPRED: good probability, coverage and e value for Transcription regulator WhiB1 /note=NCBI Blast: low % identity, good coverage and e-value for function of WhiB family transcription factor /note=Conserved Domain Database: low % identity and coverage, but pretty good e value for Transcription factor WhiB CDS complement (111769 - 112098) /gene="212" /product="Delete this one too" /function="Delete this one too" /locus tag="Dorin_212" /note=Original Glimmer call @bp 112098 has strength 1.32 /note=SSC: Start = 112098, Stop = 111769. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.843 is the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 330 bp is the longest possible ORF. GAP: 216 bp. ST: SS=NA. F: Delete this one too. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= 3MHS_C, Description= SAGA-associated factor 11; Multi-protein complex, Hydrolase-transcription regulator-protein binding complex, Acetylation, Cytoplasm, Isopeptide bond, Nucleus, Phosphoprotein, Ubl conjugation; 1.89A {Saccharomyces cerevisiae}, Probability= 28.5. Coverage= 23.8532, SubjectRange= 5:31, QueryRange= 5:97. CDD= . /note=*Delete Gene /note=NCBI Blast: no good hits, all bad probability, coverage and e value /note=This gene is deleted because it is an orpham and has too much overlap with other genes. It also has significantly lower cp than the genes it overlaps CDS 112315 - 112617 /gene="213" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_213" /note=Original Glimmer call @bp 112315 has strength 4.35; Genemark calls start at 112315 /note=SSC: Start = 112315, Stop = 112617. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.005 is the highest start score.
SCS: Start is called by Glimmer and is called by Genemark. LO: 303 bp is the longest possible ORF. GAP: 216 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 211, Function= function unknown, EValue= 7.0E-52. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_219 [Rhodococcus phage NiceHouse], Coverage= 76.0, SubjectRange= 15:87, QueryRange= 15:100, EValue= 7.83792E-12. HHPRED= Accession= 6T9K_B, Description= Transcription factor SPT20; Coactivator, Transcription, Histone acetyltransferase, Histone deubiquitinase, GENE REGULATION; 3.3A {Saccharomyces cerevisiae (strain ATCC 204508 / S288c)}, Probability= 74.1. Coverage= 76.0, SubjectRange= 119:217, QueryRange= 119:89. CDD= . /note=phagesdb Blast: low e value for francesca /note=HHPRED: not that good probability and coverage, bad e value /note=NCBI Blast: low % identity, pretty high coverage but bad e value CDS 112610 - 112798 /gene="214" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_214" /note=Original Glimmer call @bp 112610 has strength 6.18; Genemark calls start at 112610 /note=SSC: Start = 112610, Stop = 112798. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.824 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 189 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 212, Function= function unknown, EValue= 7.0E-30. NCBIBLAST= . HHPRED= Accession= 7Q21_V, Description= Actinobacterial supercomplex, subunit C (AscC); MEMBRANE PROTEIN, CRYO-EM, RESPIRATORY SUPERCOMPLEX, ACTINOBACTERIA, ELECTRON TRANSPORT; HET: 9XX, TWT, MQ9, HEC, TRD, HAS, PLM, 9YF, HEM, CDL, 7PH; 3.0A {Corynebacterium glutamicum ATCC 13032}, Probability= 60.3. Coverage= 25.8065, SubjectRange= 48:66, QueryRange= 48:39. CDD= . 
/note=PhagesDB Blast: only good e value for Francesca /note=HHPRED: bad probability, coverage and e value CDS 112878 - 113237 /gene="215" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_215" /note=Original Glimmer call @bp 112878 has strength 9.66; Genemark calls start at 112878 /note=SSC: Start = 112878, Stop = 113237. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.946 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 360 bp is the longest possible ORF. GAP: 79 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 213, Function= function unknown, EValue= 3.0E-61. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_221 [Rhodococcus phage NiceHouse], Coverage= 76.4706, SubjectRange= 1:91, QueryRange= 1:91, EValue= 1.28854E-19. HHPRED= Accession= PF10828.12, Description= DUF2570 ; Protein of unknown function (DUF2570), Probability= 97.9. Coverage= 83.1933, SubjectRange= 3:97, QueryRange= 3:108. CDD= . /note=PhagesDB Blast: good e value for phages Francesca, Trina, NiceHouse but no called function /note=HHPRED: high probability and decent coverage, but bad e value /note=NCBI Blast: good e values and coverage, but low % identity for hypothetical protein on Rhodococcus phages (NiceHouse, Trina) /note=1 membrane domain called here, checked with SOSUI CDS 113227 - 113382 /gene="216" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_216" /note=Original Glimmer call @bp 113227 has strength 5.06; Genemark calls start at 113227 /note=SSC: Start = 113227, Stop = 113382. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.086 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 156 bp is the longest possible ORF. GAP: -11 bp. ST: SS=NA. F: Hypothetical Protein.
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 214, Function= function unknown, EValue= 2.0E-23. NCBIBLAST= . HHPRED= Accession= cd22266, Description= AcrIE1; Anti-CRISPR type I subtype E1. AcrIE1 (also known as AcrE1) is an anti-CRISPR (Acr) protein which binds as a homodimer to and inactivates the CRISPR-associated helicase/nuclease Cas3 protein., Probability= 21.3. Coverage= 23.5294, SubjectRange= 11:23, QueryRange= 11:24. CDD= . /note=Starterator: not the most annotated, only called 2/25 times (the other is Francesca). Uncommon, but only possible start... /note=Z-scores are fine...covers all cp and is longest possible ORF /note=PhagesDB Blast only hits on Francesca /note=HHPRED has no good hits (all bad evalue/coverage/etc) /note=NCBI Blast got nothing CDS 113375 - 113632 /gene="217" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_217" /note=Original Glimmer call @bp 113375 has strength 7.44; Genemark calls start at 113375 /note=SSC: Start = 113375, Stop = 113632. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.166 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 258 bp is not the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 215, Function= function unknown, EValue= 5.0E-42. NCBIBLAST= . HHPRED= Accession= 7KDF_B, Description= NUF2 isoform 1,NUF2 isoform 1; Stu2, tension sensing, Ndc80, kinetochore, CELL CYCLE; HET: SO4; 2.72A {Saccharomyces cerevisiae}, Probability= 70.9. Coverage= 96.4706, SubjectRange= 43:158, QueryRange= 43:85. CDD= . /note=Most called start in starterator, but pham only has Francesca and Dorin. /note=Not longest possible, but covers all the cp and has no major overlap with previous gene (as a longer start would).
/note=PhagesDB blast only hits on Francesca /note=HHPRED has no good hits (poor evalue) /note=NCBI Blast has no hits /note=TMHMM shows 2 membrane domains CDS 113650 - 113808 /gene="218" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_218" /note=Original Glimmer call @bp 113650 has strength 4.04; Genemark calls start at 113650 /note=SSC: Start = 113650, Stop = 113808. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 159 bp is the longest possible ORF. GAP: 17 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Weasels2, ProteinNumber= 209, Function= function unknown, EValue= 0.14. NCBIBLAST= . HHPRED= Accession= 3V4V_D, Description= Integrin beta-7; Cell Adhesion, MAdCAM-1, Membrane; HET: BMA, NAG, MAN, 0DU, CA; 3.1A {Homo sapiens}, Probability= 63.0. Coverage= 26.9231, SubjectRange= 482:496, QueryRange= 482:31. CDD= . /note=Orpham...but called start covers all cp, has a good z-score, and is LORF. /note=No good hits for PhagesDB blast (not even Francesca) /note=HHPRED has no good hits (poor evalue) /note=NCBI Blast has no hits CDS 113818 - 114294 /gene="219" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_219" /note=Original Glimmer call @bp 113818 has strength 4.4; Genemark calls start at 113884 /note=SSC: Start = 113818, Stop = 114294. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.814 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 477 bp is the longest possible ORF. GAP: 9 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 216, Function= function unknown, EValue= 1.0E-90. 
NCBIBLAST= PhageName= hypothetical protein FDI69_gp177 [Rhodococcus phage Trina] >gb|ASZ75009.1| hypothetical protein SEA_TRINA_230 [Rhodococcus phage Trina], Coverage= 56.962, SubjectRange= 16:87, QueryRange= 16:105, EValue= 1.73424E-5. HHPRED= Accession= PF07093.15, Description= SGT1 ; SGT1 protein, Probability= 23.6. Coverage= 32.2785, SubjectRange= 109:151, QueryRange= 109:54. CDD= . /note=Starterator has most called start (pham of Francesca and Dorin only, but both called) /note=Covers all cp, scores are fine, and is the LORF. /note=PhagesDB Blast has good hits on Francesca, Trina, and NiceHouse (all call function unknown) /note=HHPRED has no good hits (evalue > 200...) /note=NCBI has evalues on the edge of o.k., but poor identity and coverage (hypothetical Rhodococcus phage gene from Trina) CDS 114305 - 114634 /gene="220" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_220" /note=Original Glimmer call @bp 114305 has strength 5.41; Genemark calls start at 114305 /note=SSC: Start = 114305, Stop = 114634. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 330 bp is the longest possible ORF. GAP: 10 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 217, Function= function unknown, EValue= 3.0E-59. NCBIBLAST= . HHPRED= Accession= 6GW6_B, Description= Xre antitoxin; Toxin antitoxin type II system, Xre, Cro repressor, RES domain, NAD+ glycohydrolase, NADase, toxin; HET: GOL, IMD; 2.205A {Pseudomonas putida KT2440}, Probability= 89.4. Coverage= 75.2294, SubjectRange= 65:144, QueryRange= 65:106. CDD= . 
/note=Most called start in starterator, pham of 2 (Francesca and Dorin only) but both called this start /note=Covers all cp, zscore is good, LORF /note=PhagesDB Blast only hits on Francesca /note=HHPRED has no good hits /note=NCBI Blast has no hits CDS 114634 - 115263 /gene="221" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_221" /note=Original Glimmer call @bp 114634 has strength 7.06; Genemark calls start at 114634 /note=SSC: Start = 114634, Stop = 115263. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 630 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 218, Function= function unknown, EValue= 3.0E-93. NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_151 [Rhodococcus phage NiceHouse], Coverage= 86.1244, SubjectRange= 1:170, QueryRange= 1:180, EValue= 6.94483E-14. HHPRED= Accession= 3ZG9_A, Description= PENICILLIN-BINDING PROTEIN 4; PENICILLIN-BINDING PROTEIN; HET: GOL, DXF; 1.804A {LISTERIA MONOCYTOGENES}, Probability= 59.4. Coverage= 11.0048, SubjectRange= 24:47, QueryRange= 24:90. CDD= . /note=Most annotated start with Francesca and Dorin both calling the same start (2 phage pham). /note=Covers all cp, overlap of -1, best zscores, and LORF /note=PhagesDB Blast has hits on Francesca and NiceHouse /note=HHPRED has no good hits /note=NCBI Blast has a good evalue for hypothetical Rhodococcus Phage, but % identity is too low... CDS 115256 - 115444 /gene="222" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_222" /note=Original Glimmer call @bp 115256 has strength 6.76; Genemark calls start at 115256 /note=SSC: Start = 115256, Stop = 115444. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.451 is not the highest start score.
SCS: Start is called by Glimmer and is called by Genemark. LO: 189 bp is not the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 219, Function= function unknown, EValue= 2.0E-26. NCBIBLAST= . HHPRED= Accession= cd16250, Description= EFh_DTNB; EF-hand-like motif found in beta-dystrobrevin. Beta-dystrobrevin, also termed dystrobrevin beta (DTN-B), is a dystrophin-related protein that is restricted to non-muscle tissues and is abundantly expressed in brain, lung, kidney, and liver., Probability= 86.4. Coverage= 82.2581, SubjectRange= 49:100, QueryRange= 49:60. CDD= . /note=Starterator has most called start, pham is only Francesca and Dorin, but both called the same start /note=Not LORF, but called start has better scores and no major overlap, covers all CP /note=PhagesDB Blast only has a hit on Francesca /note=HHPRED has no good hits (poor evalues) /note=NCBI has no hits. CDS 115460 - 115741 /gene="223" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_223" /note=Original Glimmer call @bp 115616 has strength 4.18; Genemark calls start at 115460 /note=SSC: Start = 115460, Stop = 115741. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.749 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 282 bp is the longest possible ORF. GAP: 15 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 220, Function= function unknown, EValue= 1.0E-17. NCBIBLAST= . HHPRED= Accession= PF15471.10, Description= TMEM171 ; Transmembrane protein family 171, Probability= 85.8. Coverage= 55.914, SubjectRange= 20:74, QueryRange= 20:55. CDD= . /note=Changed start to cover all CP and lengthen the gene. Shorten gap. Now LORF, though zscores are not the best. 
/note=PhagesDB Blast only has a match with Francesca /note=HHPRED hits all had poor evalues, but good identity/etc, function calls of Transmembrane proteins /note= and NCBI Blast have no good hits /note=Deep TMHMM has 2 TMR hits CDS 115745 - 116227 /gene="224" /product="DprA-like DNA processing chain A" /function="DprA-like DNA processing chain A" /locus tag="Dorin_224" /note=Original Glimmer call @bp 115745 has strength 6.23; Genemark calls start at 115745 /note=SSC: Start = 115745, Stop = 116227. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.313 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 483 bp is the longest possible ORF. GAP: 3 bp. ST: SS=NA. F: DprA-like DNA processing chain A. FS: PHDBLAST= PhageName= Enceladus, ProteinNumber= 82, Function= DprA-like DNA processing chain A, EValue= 1.0E-17. NCBIBLAST= PhageName= DprA-like DNA processing chain A [Mycobacterium phage Enceladus], Coverage= 46.25, SubjectRange= 12:83, QueryRange= 12:158, EValue= 1.95891E-18. HHPRED= Accession= SCOP_d3uqza2, Description= c.129.1.4 (A:72-282) DNA processing protein A (DprA) {Pneumococcus (Streptococcus pneumoniae) [TaxId: 1313]} | CLASS: Alpha and beta proteins (a/b), FOLD: MCP/YpsA-like, SUPFAM: MCP/YpsA-like, FAM: SLOG domain from DNA-Processing proteins, Probability= 99.4. Coverage= 98.75, SubjectRange= 36:177, QueryRange= 36:159. CDD= . /note=Does not have the most annotated start. Only Dorin and Francesca called this start...2/57 in pham but called 100% when present. /note=Covers all CP, not the best zscores but passable, LORF /note=PhagesDB Blast has a hit on Francesca and phages from L1 cluster (AppleTree2, DirkDirk, etc.). Some call unknown function and others call DprA-like DNA processing chain A /note=HHPRED: lots of good hits for DNA Processing Protein A /note=NCBI Blast has a bunch of good hits for hypothetical protein and a few for DprA-like DNA processing chain A.
CDS 116237 - 116500 /gene="225" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_225" /note=Original Glimmer call @bp 116237 has strength 6.79; Genemark calls start at 116237 /note=SSC: Start = 116237, Stop = 116500. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.756 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 264 bp is the longest possible ORF. GAP: 9 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 222, Function= function unknown, EValue= 1.0E-42. NCBIBLAST= . HHPRED= Accession= 6QX9_A1, Description= Splicing factor 3A subunit 1,Splicing factor 3A subunit 1,Splicing factor 3A subunit 1; RNP complex, splicing, RNA, protein, spliceosome; HET: IHP, M7M, GTP; 3.28A {Homo sapiens}, Probability= 63.8. Coverage= 89.6552, SubjectRange= 159:257, QueryRange= 159:85. CDD= . /note=Most annotated start in starterator (pham is only Francesca and Dorin), both called this start. /note=LORF, good cp and coverage, zscores are best in the group /note=PhagesDB blast only hit on Francesca /note=HHPRED had no good hits /note=NCBI blast had no hits CDS 116500 - 116673 /gene="226" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_226" /note=Original Glimmer call @bp 116500 has strength 4.41; Genemark calls start at 116500 /note=SSC: Start = 116500, Stop = 116673. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.714 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 174 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 223, Function= function unknown, EValue= 3.0E-27. NCBIBLAST= . 
HHPRED= Accession= SCOP_d2fm9a1, Description= a.257.1.1 (A:49-263) Cell invasion protein SipA, N-terminal domain {Salmonella typhimurium [TaxId: 90371]} | CLASS: All alpha proteins, FOLD: SipA N-terminal domain-like, SUPFAM: SipA N-terminal domain-like, FAM: SipA N-terminal domain-like, Probability= 84.3. Coverage= 28.0702, SubjectRange= 197:213, QueryRange= 197:23. CDD= . /note=Dorin and Francesca called the same start; gap is okay, with no bad overlap. Longest possible ORF and covers all cp. Z scores good. /note=PhagesDB Blast: only good hit with Francesca /note=HHPRED: no good hits as bad e values /note=NCBI Blast has no hits CDS 116697 - 116822 /gene="227" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_227" /note=Original Glimmer call @bp 116697 has strength 3.67 /note=SSC: Start = 116697, Stop = 116822. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.607 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 126 bp is not the longest possible ORF. GAP: 23 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 224, Function= function unknown, EValue= 1.0E-20. NCBIBLAST= . HHPRED= Accession= cd16819, Description= SP-RING_PIAS2; SP-RING finger found in protein inhibitor of activated STAT protein 2 (PIAS2) and similar proteins., Probability= 84.4. Coverage= 21.9512, SubjectRange= 40:49, QueryRange= 40:36. CDD= . /note=Dorin and Francesca call the same start. Longest possible ORF and covers all cp, also has best scores. /note=PhagesDB Blast: hits Francesca with good e value /note=HHPRED: no good hits as bad e value /note=NCBI Blast: no hits CDS 116869 - 117027 /gene="228" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_228" /note=Original Glimmer call @bp 116869 has strength 5.48 /note=SSC: Start = 116869, Stop = 117027. (Forward). CP: Does contain all GeneMarkHost capacity.
SD: ZScore 2.077 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 159 bp is the longest possible ORF. GAP: 46 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 225, Function= function unknown, EValue= 3.0E-24. NCBIBLAST= . HHPRED= Accession= PF06785.15, Description= UPF0242 ; Uncharacterised protein family (UPF0242) N-terminus, Probability= 52.6. Coverage= 30.7692, SubjectRange= 1:17, QueryRange= 1:42. CDD= . /note=Has the most annotated start, pham of 2, both Dorin and Francesca called. /note=Covers all cp, but has poor coding coverage overall (only low-level stuff with a spike in the middle) /note=Best z-score and LORF /note=HHPRED has no good hits, all have poor evalues/coverage/etc. /note=NCBI Blast has no hits /note=Deep TMHMM turned up nothing. CDS 117088 - 117813 /gene="229" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_229" /note=Original Glimmer call @bp 117088 has strength 9.73; Genemark calls start at 117088 /note=SSC: Start = 117088, Stop = 117813. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.565 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 726 bp is the longest possible ORF. GAP: 60 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 226, Function= function unknown, EValue= 1.0E-140. NCBIBLAST= . HHPRED= Accession= 3IOX_A, Description= AgI/II; alpha helix, PPII helix, supersandwich fold, surface adhesin, Cell wall, Peptidoglycan-anchor, CELL ADHESION; HET: PMS; 1.8A {Streptococcus mutans}, Probability= 23.7. Coverage= 12.0332, SubjectRange= 447:476, QueryRange= 447:153. CDD= . /note=2 member pham (Dorin and Francesca), both called this start. /note=Selected start has the best zscore, covers all CP, and is the LORF.
/note=PhagesDB Blast turns up a match to Francesca and Gilgamesh (both called hypothetical protein) /note=HHPRED had no good hits (evalue ~300) /note=NCBI Blast had no hits /note=Deep TMHMM had no hits CDS 117825 - 118046 /gene="230" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_230" /note=Original Glimmer call @bp 117825 has strength 7.78; Genemark calls start at 117825 /note=SSC: Start = 117825, Stop = 118046. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 222 bp is not the longest possible ORF. GAP: 11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 227, Function= function unknown, EValue= 8.0E-38. NCBIBLAST= . HHPRED= Accession= PF14584.10, Description= DUF4446 ; Protein of unknown function (DUF4446), Probability= 84.4. Coverage= 39.726, SubjectRange= 84:114, QueryRange= 84:51. CDD= . /note=Starterator called most annotated start, pham has only Francesca and Dorin, both called it. /note=Called start has the strongest zscores, and covers the majority of the CP. The longer start (not called here) would only increase length by 15 bp and cover slightly more cp. However, its scores are much worse... /note=HHPRED has no good hits (poor evalues) /note=NCBI Blast has no hits /note=Phages DB Blast had no hits besides Francesca CDS 118055 - 118540 /gene="231" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_231" /note=Original Glimmer call @bp 118055 has strength 10.27; Genemark calls start at 118055 /note=SSC: Start = 118055, Stop = 118540. (Forward). CP: Does not contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 486 bp is the longest possible ORF. GAP: 8 bp. ST: SS=NA. F: Hypothetical Protein. 
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 228, Function= function unknown, EValue= 7.0E-93. NCBIBLAST= . HHPRED= Accession= cd16400, Description= ParB_Srx_like_nuclease; ParB/Srx_like nuclease and putative transcriptional regulators related to SbnI. This family contains a Pyrococcus Furiosus enzyme reported to have DNA nuclease activity and resembles the N-terminal domain of ParB proteins of the parABS bacterial chromosome partitioning system., Probability= 96.6. Coverage= 42.8571, SubjectRange= 19:70, QueryRange= 19:136. CDD= . /note=Suggested start covers most coding potential and is the longest possible ORF. /note=Phagesdb Blast: only good e-value is for Francesca /note=HHPRED has no good hits and NCBI Blast does not have any hits CDS 118540 - 118683 /gene="232" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_232" /note=Original Glimmer call @bp 118540 has strength 8.56; Genemark calls start at 118540 /note=SSC: Start = 118540, Stop = 118683. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.196 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 144 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 229, Function= function unknown, EValue= 4.0E-21. NCBIBLAST= . HHPRED= Accession= PF21184.1, Description= HAT1_C_fung ; Fungal HAT1, C-terminal, Probability= 70.7. Coverage= 25.5319, SubjectRange= 11:23, QueryRange= 11:22. CDD= . /note=Chosen start covers all coding potential and is the longest possible ORF. /note=Phagesdb Blast only has a good hit for Francesca. HHPRED has no good hits as e-values are bad and NCBI Blast has no hits. CDS 118676 - 118888 /gene="233" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_233" /note=Original Glimmer call @bp 118676 has strength 6.93; Genemark calls start at 118676 /note=SSC: Start = 118676, Stop = 118888.
(Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.068 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 213 bp is the longest possible ORF. GAP: -8 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 230, Function= function unknown, EValue= 2.0E-33. NCBIBLAST= . HHPRED= Accession= 2KBW_B, Description= BH3-interacting domain death agonist; Mcl-1, Bid_BH3, complex, Alternative splicing, Apoptosis, Cytoplasm, Developmental protein, Differentiation, Membrane, Mitochondrion, Nucleus, Phosphoprotein, Polymorphism, Transmembrane; NMR {Homo sapiens}, Probability= 68.2. Coverage= 27.1429, SubjectRange= 3:22, QueryRange= 3:21. CDD= . /note=Starterator calls most annotated start. 2 member pham (Francesca and Dorin) and both called this start. /note=Called start has the best zscores, is LORF, and covers all the cp /note=PhagesDB Blast only has a hit on Francesca (which called hypothetical protein) /note=HHPRED had no good hits (poor evalues) /note=NCBI Blast had no hits CDS 118891 - 119103 /gene="234" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_234" /note=Original Glimmer call @bp 118891 has strength 7.35; Genemark calls start at 118891 /note=SSC: Start = 118891, Stop = 119103. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.78 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 213 bp is the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 231, Function= function unknown, EValue= 6.0E-35. NCBIBLAST= . HHPRED= Accession= PF11967.12, Description= RecO_N ; Recombination protein O N terminal, Probability= 84.0. Coverage= 40.0, SubjectRange= 2:30, QueryRange= 2:29. CDD= . /note=Starterator called start is most called, 2/2, only other member of pham is Francesca. 
/note=Suggested start is LORF, has best scores, contains all CP, same start called in Francesca. /note= /note=BLAST and HHPRED had no significant results, so protein called hypothetical. CDS 119103 - 119795 /gene="235" /product="DNA methyltransferase" /function="DNA methyltransferase" /locus tag="Dorin_235" /note=Original Glimmer call @bp 119103 has strength 6.69; Genemark calls start at 119103 /note=SSC: Start = 119103, Stop = 119795. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.949 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 693 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: DNA methyltransferase. FS: PHDBLAST= PhageName= Ruotula, ProteinNumber= 92, Function= DNA methyltransferase, EValue= 2.0E-76. NCBIBLAST= PhageName= DNA methyltransferase [Mycobacterium phage Ruotula], Coverage= 97.3913, SubjectRange= 3:210, QueryRange= 3:224, EValue= 4.60151E-94. HHPRED= Accession= SCOP_d3ubta_, Description= c.66.1.26 (A:) automated matches {Haemophilus aegyptius [TaxId: 197575]} | CLASS: Alpha and beta proteins (a/b), FOLD: S-adenosyl-L-methionine-dependent methyltransferases, SUPFAM: S-adenosyl-L-methionine-dependent methyltransferases, FAM: C5 cytosine-specific DNA methylase, DCM, Probability= 99.5. Coverage= 56.087, SubjectRange= 1:160, QueryRange= 1:133. CDD= Accession= COG0270, Coverage= 46.087, SubjectRange= 1:119, QueryRange= 1:106, EValue= 1.76345E-14. /note=Selected start had best scores, LORF, contained all CP. /note=Starterator: does not have the most annotated start, found in 12/204 /note=Multiple blast and HHPRED hits with good E-values signaling a DNA methyltransferase function. Some hits were Q1:T1 /note=Conserved domain database: checked, but no good scores (evalue was fine, but other scores excluded it from use) /note=Approved functions list: no further action/check needed. 
It is on the approved functions list CDS 119795 - 120073 /gene="236" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_236" /note=Original Glimmer call @bp 119855 has strength 4.99; Genemark calls start at 119795 /note=SSC: Start = 119795, Stop = 120073. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.187 is not the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 279 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 233, Function= function unknown, EValue= 2.0E-49. NCBIBLAST= . HHPRED= Accession= 7SPP_C, Description= VNAR 2C02; RBD, VIRAL PROTEIN, VNAR, VIRAL PROTEIN-IMMUNE SYSTEM complex; HET: NAG, EDO; 1.96A {Severe acute respiratory syndrome coronavirus 2}, Probability= 78.8. Coverage= 16.3043, SubjectRange= 100:115, QueryRange= 100:60. CDD= . /note=SS (second start) did not contain all CP, and had worse scores. LORF is only start that has all CP and has best scores. Start changed to LORF /note= /note=PhagesDB Blast has good hits on Francesca, the rest are peripheral. /note=HHPred: no good scores /note=NCBI Blast: no hits /note=Deep TMHMM: 2 TMR CDS 120079 - 120228 /gene="237" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_237" /note=Original Glimmer call @bp 120079 has strength 10.87; Genemark calls start at 120079 /note=SSC: Start = 120079, Stop = 120228. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 150 bp is not the longest possible ORF. GAP: 5 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 234, Function= function unknown, EValue= 2.0E-20. NCBIBLAST= . 
HHPRED= Accession= 7AR7_c, Description= Transmembrane protein; Complex-I Arabidopsis, ELECTRON TRANSPORT; HET: T7X, 8Q1, UQ9, PTY, LMN, FMN, NDP, SF4, PC7, PSF, PGT;{Arabidopsis thaliana}, Probability= 80.0. Coverage= 59.1837, SubjectRange= 17:46, QueryRange= 17:41. CDD= . /note=The two starts are very close together, both would encapsulate all the CP, but selected start was supported by starterator (2/2 in pham), and has better RBS scores. /note=Function is unknown due to lack of significant results from HHPred and BLAST. /note= /note=TMHMM has one membrane domain, checked with SOSUI to confirm CDS 120235 - 120441 /gene="238" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_238" /note=Original Glimmer call @bp 120235 has strength 12.68; Genemark calls start at 120235 /note=SSC: Start = 120235, Stop = 120441. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.328 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 207 bp is the longest possible ORF. GAP: 6 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 235, Function= function unknown, EValue= 2.0E-32. NCBIBLAST= . HHPRED= Accession= 6DM9_C, Description= DHD15_extended_A; Computational design, heterodimer, coiled-coil, DE NOVO PROTEIN; HET: FME; 2.25A {synthetic construct}, Probability= 78.2. Coverage= 48.5294, SubjectRange= 19:52, QueryRange= 19:40. CDD= . /note=Has most annotated start (2/2 in pham with Francesca) /note=Start kept due to good CP and full CP coverage, Longest ORF, and best scores. /note=Function cannot be called due to lack of significant HHPred and BLAST results. CDS 120735 - 121214 /gene="239" /product="SprT-like protease" /function="SprT-like protease" /locus tag="Dorin_239" /note=Genemark calls start at 120735 /note=SSC: Start = 120735, Stop = 121214. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. 
SCS: Start is not called by Glimmer and is called by Genemark. LO: 480 bp is the longest possible ORF. GAP: 293 bp. ST: SS=NA. F: SprT-like protease. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 237, Function= function unknown, EValue= 7.0E-95. NCBIBLAST= PhageName= SprT-like protease [Mycobacterium phage SirPhilip] >gb|ASR85292.1| SprT-like protease [Mycobacterium phage SirPhilip], Coverage= 85.5346, SubjectRange= 45:175, QueryRange= 45:136, EValue= 7.59587E-36. HHPRED= Accession= 6MDW_A, Description= SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: FLC, MLZ, ADP; 1.5A {Homo sapiens}, Probability= 99.8. Coverage= 89.3082, SubjectRange= 22:192, QueryRange= 22:147. CDD= Accession= pfam10263, Coverage= 69.8113, SubjectRange= 4:131, QueryRange= 4:125, EValue= 2.38213E-11. /note=Does not have the most annotated start...only found in 5/391 in pham and called in 2/5. NI /note=Phagesdb Blast: several hits in similar phages, Francesca, JF1, and SirPhilip. Called functions in the latter 2 were consistent with SprT-like protease. /note=HHPRED had several hits that included a "Metallopeptidase Zymogen" function. (also a hit for SprT-like protease) /note=NCBI Blast: all hits for SprT-like protease /note=NCBI Blast is stronger evidence of function, will go with this call /note=Approved functions list has nothing to add to this, no more requirements were necessary to call. CDS 121224 - 121532 /gene="240" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_240" /note=Genemark calls start at 121224 /note=SSC: Start = 121224, Stop = 121532. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 309 bp is the longest possible ORF. GAP: 9 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 238, Function= function unknown, EValue= 1.0E-58.
NCBIBLAST= PhageName= hypothetical protein SEA_NICEHOUSE_245 [Rhodococcus phage NiceHouse], Coverage= 94.1176, SubjectRange= 2:97, QueryRange= 2:99, EValue= 1.45402E-30. HHPRED= Accession= 1KRX_A, Description= NITROGEN REGULATION PROTEIN NR(I); two component signal transduction, receiver domain, BeF3, phosphorylation, Bacterial nitrogen regulatory protein, SIGNALING PROTEIN; HET: BEF; NMR {Salmonella typhimurium} SCOP: c.23.1.1, Probability= 98.7. Coverage= 94.1176, SubjectRange= 4:107, QueryRange= 4:98. CDD= . /note=Does not have the most annotated start, but is called 100% when present. /note=Start kept due to good coverage of good CP, best scores and longest ORF. /note=HHPRED has borderline hits for a variety of functions /note=No function called due to lack of significant BLAST and HHPred results. None are particularly convincing /note=NCBI Blast: Good evalues for hypothetical proteins in a variety of phages. (hosts included Rhodococcus, Gordonia, and Streptomyces) /note=Phagesdb Blast includes hits with the DD and BK1 phams CDS complement (120603 - 121535) /gene="241" /product="nothingslider" /function="nothingslider" /locus tag="Dorin_241" /note=Original Glimmer call @bp 121535 has strength 4.33 /note=SSC: Start = 121535, Stop = 120603. (Reverse). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.883 is not the highest start score. SCS: Start is called by Glimmer and is not called by Genemark. LO: 933 bp is the longest possible ORF. GAP: 58 bp. ST: SS=NA. F: nothingslider. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 236, Function= function unknown, EValue= 0.0. NCBIBLAST= . HHPRED= Accession= PF19538.3, Description= DUF6062 ; Family of unknown function (DUF6062), Probability= 18.6. Coverage= 6.45161, SubjectRange= 56:76, QueryRange= 56:281. CDD= . /note=Start chosen due to being longest ORF, decent scores, and covering all CP. No function can be called due to lack of significant HHpred and BLAST results.
/note= /note=- Almost no CP here and it overlaps with two other forward genes with good CP. In Francesca, we were planning on deleting this gene... (- Drew and Simon) /note= /note=JMH: this is the SS, wrong box selected CDS 121594 - 122166 /gene="242" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_242" /note=Original Glimmer call @bp 121594 has strength 6.64; Genemark calls start at 121594 /note=SSC: Start = 121594, Stop = 122166. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.104 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 573 bp is not the longest possible ORF. GAP: 58 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Yara, ProteinNumber= 85, Function= function unknown, EValue= 2.0E-15. NCBIBLAST= PhageName= hypothetical protein [Pseudonocardia sp. C8] >gb|MBC3189465.1| hypothetical protein [Pseudonocardia sp. C8], Coverage= 93.6842, SubjectRange= 23:192, QueryRange= 23:183, EValue= 8.1267E-28. HHPRED= Accession= PF09629.14, Description= YorP ; YorP protein, Probability= 90.9. Coverage= 36.3158, SubjectRange= 5:67, QueryRange= 5:159. CDD= . /note=Calls most annotated start in starterator /note=Don't call LORF because of starterator start and advantage in RBS scores /note=Start kept due to good RBS, captures all CP, and suggested start /note=Strong support for hypothetical protein from Blast results /note=No TMHMM results CDS 122302 - 122571 /gene="243" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_243" /note=Original Glimmer call @bp 122302 has strength 7.33; Genemark calls start at 122302 /note=SSC: Start = 122302, Stop = 122571. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.769 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 270 bp is the longest possible ORF. GAP: 135 bp. ST: SS=NA. F: Hypothetical Protein.
FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 240, Function= function unknown, EValue= 1.0E-48. NCBIBLAST= PhageName= hypothetical protein KHQ85_gp017 [Gordonia phage Skog] >gb|QIG58169.1| hypothetical protein SEA_SKOG_17 [Gordonia phage Skog], Coverage= 100.0, SubjectRange= 3:98, QueryRange= 3:89, EValue= 5.91598E-11. HHPRED= Accession= PF21617.1, Description= CV_2116-like ; Uncharacterized protein CV_2116-like, Probability= 86.9. Coverage= 55.0562, SubjectRange= 10:61, QueryRange= 10:79. CDD= . /note=Has the most annotated start, supports it despite worse RBS scores. /note=Start chosen due to being longest ORF, covering all CP and Starterator. /note=Function unknown due to a lack of HHpred or BLAST hits with called functions. CDS 122573 - 122971 /gene="244" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_244" /note=Original Glimmer call @bp 122573 has strength 9.42; Genemark calls start at 122573 /note=SSC: Start = 122573, Stop = 122971. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.194 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 399 bp is the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Waits, ProteinNumber= 48, Function= function unknown, EValue= 0.008. NCBIBLAST= . HHPRED= Accession= PF19698.3, Description= DUF6197 ; Family of unknown function (DUF6197), Probability= 99.8. Coverage= 91.6667, SubjectRange= 5:139, QueryRange= 5:131. CDD= . /note=suggested start is LORF, has good scores, and only start that contains all CP. Same start is called in Francesca. Chosen start called 2 out of 2 in starterator. /note=Phagesdb Blast good hit only with francesca. 
HHPRED good hit for hypothetical protein CDS 123030 - 123254 /gene="245" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_245" /note=Original Glimmer call @bp 123030 has strength 6.48; Genemark calls start at 123030 /note=SSC: Start = 123030, Stop = 123254. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.341 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 225 bp is the longest possible ORF. GAP: 58 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Odette_Draft, ProteinNumber= 241, Function= function unknown, EValue= 2.0E-4. NCBIBLAST= . HHPRED= Accession= PF07494.15, Description= Reg_prop ; Two component regulator propeller, Probability= 87.3. Coverage= 18.9189, SubjectRange= 10:24, QueryRange= 10:29. CDD= . /note=suggested start is LORF, has good scores, and only start that contains all CP. Same start is called in Francesca. Start called 2 out of 2 in starterator. /note=Phagesdb Blast good hit only with francesca. HHPRED: no good hits as bad e values. CDS 123256 - 123435 /gene="246" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_246" /note=Original Glimmer call @bp 123256 has strength 4.22; Genemark calls start at 123256 /note=SSC: Start = 123256, Stop = 123435. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.586 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 180 bp is the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 243, Function= function unknown, EValue= 1.0E-30. NCBIBLAST= . HHPRED= Accession= PF18067.5, Description= Lipase_C ; Lipase C-terminal domain, Probability= 88.9. Coverage= 33.8983, SubjectRange= 19:39, QueryRange= 19:23. CDD= . /note=suggested start is LORF, has good scores, and only start that contains all CP. Same start is called in Francesca. 
Start called 2 out of 2 in starterator. /note=No significant hits, leading to a hypothetical protein call for function. CDS 123493 - 123735 /gene="247" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_247" /note=Original Glimmer call @bp 123493 has strength 7.93; Genemark calls start at 123493 /note=SSC: Start = 123493, Stop = 123735. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 243 bp is the longest possible ORF. GAP: 57 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Cinna, ProteinNumber= 80, Function= RNA polymerase sigma factor, EValue= 6.2. NCBIBLAST= . HHPRED= Accession= PF04695.17, Description= Pex14_N ; Pex14 N-terminal domain, Probability= 74.1. Coverage= 22.5, SubjectRange= 27:45, QueryRange= 27:63. CDD= . /note=suggested start is LORF, has good scores, and only start that contains all CP. Same start is called in Francesca. Start called two out of two in starterator. /note= /note=No significant hits, leading to calling hypothetical protein as function. Some hits had RNA Polymerase sigma factor as function, but these hits all had poor E-values. CDS 123740 - 123925 /gene="248" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_248" /note=Original Glimmer call @bp 123749 has strength 6.58; Genemark calls start at 123749 /note=SSC: Start = 123740, Stop = 123925. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.636 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 186 bp is not the longest possible ORF. GAP: 4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 245, Function= function unknown, EValue= 6.0E-29. NCBIBLAST= . HHPRED= Accession= PF14155.10, Description= DUF4307 ; Domain of unknown function (DUF4307), Probability= 90.3. 
Coverage= 50.8197, SubjectRange= 2:31, QueryRange= 2:36. CDD= Accession= pfam14155, Coverage= 52.459, SubjectRange= 1:32, QueryRange= 1:36, EValue= 2.41783E-5. /note=Start chosen due to covering all CP and evidence from Francesca in Starterator. Function cannot be called due to a lack of HHpred and BLAST results. CDS 123929 - 124156 /gene="249" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_249" /note=Original Glimmer call @bp 123929 has strength 7.57; Genemark calls start at 123929 /note=SSC: Start = 123929, Stop = 124156. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.912 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 228 bp is the longest possible ORF. GAP: 3 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 246, Function= function unknown, EValue= 2.0E-37. NCBIBLAST= . HHPRED= Accession= PF15361.10, Description= RIC3 ; Resistance to inhibitors of cholinesterase homologue 3, Probability= 25.3. Coverage= 38.6667, SubjectRange= 120:149, QueryRange= 120:31. CDD= . /note=Start kept: good RBS, captures all CP. chosen start found 2 out of 2 in starterator. /note=no good hits for HHPRED and NCBI Blast. CDS 124153 - 124587 /gene="250" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_250" /note=Original Glimmer call @bp 124153 has strength 3.7; Genemark calls start at 124153 /note=SSC: Start = 124153, Stop = 124587. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.503 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 435 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Trina, ProteinNumber= 137, Function= Ro-like RNA binding protein, EValue= 0.95. NCBIBLAST= . 
HHPRED= Accession= 7M2G_A, Description= Interleukin-2; CYTOKINE IL-2 mutein human, CYTOKINE; 1.79A {Homo sapiens} SCOP: a.26.1.2, Probability= 56.5. Coverage= 47.2222, SubjectRange= 60:131, QueryRange= 60:114. CDD= . /note=Suggested start is LORF, has good scores, and is only start that contains all CP. Also start has synteny with Francesca. Starterator: chosen start found 2 out of 2 in genes /note=No good BLAST or HHPRED hits were found, leading to protein being called as hypothetical protein. CDS 124619 - 124789 /gene="251" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_251" /note=Original Glimmer call @bp 124631 has strength 3.91; Genemark calls start at 124631 /note=SSC: Start = 124619, Stop = 124789. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.864 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 171 bp is not the longest possible ORF. GAP: 31 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 248, Function= function unknown, EValue= 2.0E-23. NCBIBLAST= . HHPRED= Accession= 3C3W_B, Description= TWO COMPONENT TRANSCRIPTIONAL REGULATORY PROTEIN DEVR; RESPONSE REGULATOR, TWO-COMPONENT REGULATORY SYSTEM, DNA-BINDING PROTEIN, TUBERCULOSIS, Transcription, Transcription regulation; 2.2A {Mycobacterium tuberculosis}, Probability= 22.6. Coverage= 46.4286, SubjectRange= 190:216, QueryRange= 190:29. CDD= . /note=Start chosen because it covers all CP, and has okay scores. Suggested start in starterator does not cover all coding potential though it has better scores. Function cannot be called due to lack of HHpred and BLAST results. CDS 124806 - 125129 /gene="252" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_252" /note=Original Glimmer call @bp 124806 has strength 6.15; Genemark calls start at 124806 /note=SSC: Start = 124806, Stop = 125129. (Forward). 
CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.922 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 324 bp is the longest possible ORF. GAP: 16 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 249, Function= function unknown, EValue= 1.0E-60. NCBIBLAST= . HHPRED= Accession= 7WHG_G, Description= Lokiarchaeota gelsolin (2DGel); Asgard, gelsolin, actin, filament, STRUCTURAL PROTEIN; HET: HIC, ADP; 3.25A {Oryctolagus cuniculus}, Probability= 97.7. Coverage= 54.2056, SubjectRange= 288:334, QueryRange= 288:93. CDD= . /note=Start kept: good RBS. all CP covered, suggested starterator start (called 2 out of 2). /note=One very minor potential to call a structural protein but not strong enough to call /note=No TMHMM results CDS 125138 - 125365 /gene="253" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_253" /note=Original Glimmer call @bp 125138 has strength 6.46; Genemark calls start at 125138 /note=SSC: Start = 125138, Stop = 125365. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 228 bp is the longest possible ORF. GAP: 8 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 250, Function= function unknown, EValue= 6.0E-38. NCBIBLAST= . HHPRED= Accession= PF18935.4, Description= DUF5683 ; Family of unknown function (DUF5683), Probability= 77.6. Coverage= 45.3333, SubjectRange= 9:44, QueryRange= 9:54. CDD= . /note=Start chosen due to longest ORF, good scores, and good CP and good CP coverage. suggest start found 2 out of 2 genes. Lack of HHPred and BLAST evidence means we cannot call the function at this time. 
/note=TMHMM calls 2 membrane domains CDS 125436 - 125747 /gene="254" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_254" /note=Original Glimmer call @bp 125436 has strength 6.64; Genemark calls start at 125436 /note=SSC: Start = 125436, Stop = 125747. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 312 bp is not the longest possible ORF. GAP: 70 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 251, Function= function unknown, EValue= 6.0E-54. NCBIBLAST= . HHPRED= Accession= cd19176, Description= SET_SETD3; SET domain found in SET domain-containing protein 3 (SETD3) and similar proteins., Probability= 56.4. Coverage= 26.2136, SubjectRange= 2:30, QueryRange= 2:100. CDD= . /note=Start kept due to best scores and covering all CP, but it is not the longest possible ORF. Chosen start is called 2 out of 2 in starterator. Function called as hypothetical protein due to lack of significant HHpred or BLAST results. CDS 125852 - 125998 /gene="255" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_255" /note=Original Glimmer call @bp 125852 has strength 6.0; Genemark calls start at 125852 /note=SSC: Start = 125852, Stop = 125998. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 147 bp is not the longest possible ORF. GAP: 104 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= SCOP_d1st6a6, Description= a.24.9.1 (A:647-718) Vinculin {Chicken (Gallus gallus) [TaxId: 9031]} | CLASS: All alpha proteins, FOLD: Four-helical up-and-down bundle, SUPFAM: alpha-catenin/vinculin-like, FAM: alpha-catenin/vinculin, Probability= 75.9. Coverage= 83.3333, SubjectRange= 24:70, QueryRange= 24:43. CDD= .
/note=Contains all CP, has best scores. Same start called in Francesca. /note=HHPRED, NCBI Blast, Phagesdb Blast are all inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 126014 - 126187 /gene="256" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_256" /note=Original Glimmer call @bp 126014 has strength 13.26; Genemark calls start at 126014 /note=SSC: Start = 126014, Stop = 126187. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.189 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 174 bp is the longest possible ORF. GAP: 15 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 253, Function= function unknown, EValue= 4.0E-26. NCBIBLAST= . HHPRED= Accession= SCOP_d6gwka_, Description= b.38.1.0 (A:) automated matches {Caulobacter vibrioides [TaxId: 190650]} | CLASS: All beta proteins, FOLD: Sm-like fold, SUPFAM: Sm-like ribonucleoproteins, FAM: automated matches, Probability= 79.1. Coverage= 45.614, SubjectRange= 20:46, QueryRange= 20:37. CDD= . /note=Has the most annotated start, pham of only Francesca and Dorin. /note=Suggested start chosen is longest ORF, covers CP, and has ok scores. /note=Function can't be called due to lack of significant HHpred and BLAST results. CDS 126198 - 126422 /gene="257" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_257" /note=Original Glimmer call @bp 126198 has strength 12.89; Genemark calls start at 126198 /note=SSC: Start = 126198, Stop = 126422. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 225 bp is not the longest possible ORF. GAP: 10 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 254, Function= function unknown, EValue= 4.0E-34. NCBIBLAST= .
HHPRED= Accession= PF11305.12, Description= DUF3107 ; Protein of unknown function (DUF3107), Probability= 76.8. Coverage= 36.4865, SubjectRange= 22:49, QueryRange= 22:52. CDD= . /note=Suggested start chosen due to small gap and no overlap, good scores and covers all CP. Starterator only has Dorin and Francesca, but both call the same start. /note=HHPRED has low probabilities and high e-values, NCBI has no results, and PhagesDB blast doesn`t have good results, so function is hypothetical protein /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 126422 - 126868 /gene="258" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_258" /note=Original Glimmer call @bp 126422 has strength 8.88; Genemark calls start at 126422 /note=SSC: Start = 126422, Stop = 126868. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.661 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 447 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Wakanda, ProteinNumber= 123, Function= function unknown, EValue= 3.0E-8. NCBIBLAST= PhageName= hypothetical protein EVB79_052 [Rhizobium phage RHph_N3_13] >gb|QIG69878.1| hypothetical protein F67_I3_11_052 [Rhizobium phage RHph_I3_11], Coverage= 87.8378, SubjectRange= 12:136, QueryRange= 12:141, EValue= 2.04053E-11. HHPRED= Accession= PF11181.12, Description= YflT ; Heat induced stress protein YflT domain, Probability= 57.3. Coverage= 13.5135, SubjectRange= 1:21, QueryRange= 1:107. CDD= . 
/note=Starterator only had Dorin and Francesca but they both called the same start /note=HHPRED, NCBI Blast, and Phagesdb are all inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 126871 - 127044 /gene="259" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_259" /note=Original Glimmer call @bp 126871 has strength 4.63; Genemark calls start at 126871 /note=SSC: Start = 126871, Stop = 127044. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.997 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 174 bp is not the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 256, Function= function unknown, EValue= 1.0E-25. NCBIBLAST= . HHPRED= Accession= 6T8M_B, Description= Prolyl 4-hydroxylase subunit alpha; Prolyl Hydroxylase, Oxygen Sensing, 2-oxoglutarate and iron dependent oxygenase, OXIDOREDUCTASE; HET: OGA, GOL; 2.02A {Dictyostelium discoideum}, Probability= 62.2. Coverage= 78.9474, SubjectRange= 160:224, QueryRange= 160:54. CDD= . /note=CP good, contained by suggested start. Hard to distinguish between first 2 starts but second start has slightly better scores. /note=HHPRED, NCBI Blast, and Phagesdb Blast are all inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 127041 - 127331 /gene="260" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_260" /note=Original Glimmer call @bp 127041 has strength 6.16; Genemark calls start at 127041 /note=SSC: Start = 127041, Stop = 127331. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.912 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 291 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Magritte, ProteinNumber= 139, Function= function unknown, EValue= 1.6. NCBIBLAST= . 
HHPRED= Accession= 6TFL_D, Description= RNA-binding protein Lsm; Lsm, SmAP, RNA-chaperon, RNA BINDING PROTEIN; HET: CA, URI; 2.397A {Halobacterium salinarum R1}, Probability= 60.3. Coverage= 13.5417, SubjectRange= 49:62, QueryRange= 49:90. CDD= . /note=Start has best scores, LORF, only start that contains all CP. Starterator only has Dorin and Francesca, but both call the same start. /note=HHPRED did not have any good results; probability low and e-values high /note=PhagesDB blast also had no good results /note=NCBI Blast had no results. /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 127433 - 127594 /gene="261" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_261" /note=Original Glimmer call @bp 127433 has strength 4.85; Genemark calls start at 127433 /note=SSC: Start = 127433, Stop = 127594. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.185 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 162 bp is the longest possible ORF. GAP: 101 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Rabbitrun, ProteinNumber= 83, Function= function unknown, EValue= 1.0E-10. NCBIBLAST= PhageName= hypothetical protein L3Y19_gp079 [Gordonia phage Neville] >gb|AXQ64448.1| hypothetical protein SEA_NEVILLE_79 [Gordonia phage Neville], Coverage= 86.7924, SubjectRange= 7:52, QueryRange= 7:47, EValue= 1.53677E-11. HHPRED= Accession= PF13451.10, Description= zf-trcl ; Probable zinc-ribbon domain, Probability= 91.3. Coverage= 75.4717, SubjectRange= 2:42, QueryRange= 2:43. CDD= . 
/note=Same SS as Francesca, though within a larger pham /note=Phagesdb Blast had some low hits for "function unknown" /note=HHPRED had some decent evidence for two different proteins (not enough evidence to call this gene either one, though) /note=NCBI Blast is inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 127598 - 127942 /gene="262" /product="membrane protein" /function="membrane protein" /locus tag="Dorin_262" /note=Original Glimmer call @bp 127763 has strength 2.34; Genemark calls start at 127610 /note=SSC: Start = 127598, Stop = 127942. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.235 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 345 bp is the longest possible ORF. GAP: 3 bp. ST: SS=NA. F: membrane protein. FS: PHDBLAST= PhageName= Patbob_Draft, ProteinNumber= 289, Function= function unknown, EValue= 0.55. NCBIBLAST= . HHPRED= Accession= PF11044.12, Description= TMEMspv1-c74-12 ; Plectrovirus spv1-c74 ORF 12 transmembrane protein, Probability= 69.3. Coverage= 23.6842, SubjectRange= 1:28, QueryRange= 1:84. CDD= . /note=CP is not consistent, but LORF is only start that contains most CP. All scores are poor. Start changed from 127763 to LORF due to CP. Starterator not available. /note=Terrible results for HHPRED and no results for NCBI Blast. /note=Conserved Domain: N/A /note=Deep TMHMM: 2 hits for a membrane protein /note= /note=RAN: changed function to membrane protein because deep TMHMM had 2 hits for membrane proteins CDS 127932 - 128171 /gene="263" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_263" /note=Original Glimmer call @bp 127932 has strength 5.87; Genemark calls start at 127932 /note=SSC: Start = 127932, Stop = 128171. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.346 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. 
LO: 240 bp is the longest possible ORF. GAP: -11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= cd21388, Description= GAT_STAM; non-canonical GAT domain found in metazoan signal transducing adapter molecules (STAMs) and similar proteins., Probability= 74.1. Coverage= 89.8734, SubjectRange= 5:77, QueryRange= 5:78. CDD= . /note=No starterator data /note=HHPRED, NCBI Blast, and Phagesdb Blast are all inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 128171 - 128356 /gene="264" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_264" /note=Original Glimmer call @bp 128171 has strength 7.02; Genemark calls start at 128171 /note=SSC: Start = 128171, Stop = 128356. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.353 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 186 bp is the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 260, Function= function unknown, EValue= 6.0E-32. NCBIBLAST= . HHPRED= Accession= PF08590.14, Description= DUF1771 ; Domain of unknown function (DUF1771), Probability= 41.6. Coverage= 65.5738, SubjectRange= 11:51, QueryRange= 11:50. CDD= . /note=Start chosen because it is the longest ORF, has best scores and covers all CP. Starterator only has Francesca and Dorin, but both call the same start. /note=NCBI blast had no results. HHPRED had low probabilities and very high e-values. No good results for PhagesDB blast either, so function is hypothetical protein. /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 128322 - 128546 /gene="265" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_265" /note=Original Glimmer call @bp 128322 has strength 4.95 /note=SSC: Start = 128322, Stop = 128546. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.678 is the highest start score. 
SCS: Start is called by Glimmer and is not called by Genemark. LO: 225 bp is the longest possible ORF. GAP: -35 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Peregrin, ProteinNumber= 247, Function= function unknown, EValue= 0.002. NCBIBLAST= . HHPRED= Accession= SCOP_d1m36a1, Description= g.37.1.2 (A:3-33) Monocytic leukemia zinc finger protein Moz {Human (Homo sapiens) [TaxId: 9606]} | CLASS: Small proteins, FOLD: beta-beta-alpha zinc fingers, SUPFAM: beta-beta-alpha zinc fingers, FAM: C2HC finger, Probability= 94.8. Coverage= 28.3784, SubjectRange= 4:26, QueryRange= 4:68. CDD= . /note=Start kept because it captures all CP and has a solid RBS; it is also the current suggested start in Starterator /note=HHPRED has decent evidence for Zinc Finger protein /note=NCBI Blast and Phagesdb Blast are inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 128550 - 128915 /gene="266" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_266" /note=Original Glimmer call @bp 128550 has strength 8.34; Genemark calls start at 128550 /note=SSC: Start = 128550, Stop = 128915. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.642 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 366 bp is the longest possible ORF. GAP: 3 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein [Streptomyces sp. CS081A] >gb|PVC73505.1| hypothetical protein DBP18_14255 [Streptomyces sp. CS081A], Coverage= 89.2562, SubjectRange= 57:156, QueryRange= 57:112, EValue= 2.71147E-10. HHPRED= Accession= PF19698.3, Description= DUF6197 ; Family of unknown function (DUF6197), Probability= 99.8. Coverage= 96.6942, SubjectRange= 8:140, QueryRange= 8:118. CDD= .
/note=Does not call the most annotated start /note=HHPRED has good evidence for "family of unknown function" /note=NCBI Blast and Phagesdb Blast are inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 128893 - 129231 /gene="267" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_267" /note=Original Glimmer call @bp 128893 has strength 9.96; Genemark calls start at 128893 /note=SSC: Start = 128893, Stop = 129231. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.155 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 339 bp is not the longest possible ORF. GAP: -23 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Madraxi_Draft, ProteinNumber= 81, Function= function unknown, EValue= 4.0E-8. NCBIBLAST= PhageName= hypothetical protein [Planosporangium mesophilum] >gb|NJC85354.1| hypothetical protein [Planosporangium mesophilum] >dbj|GII23181.1| hypothetical protein Pme01_27780 [Planosporangium mesophilum], Coverage= 97.3214, SubjectRange= 19:131, QueryRange= 19:110, EValue= 1.68523E-8. HHPRED= Accession= PF19698.3, Description= DUF6197 ; Family of unknown function (DUF6197), Probability= 99.9. Coverage= 99.1071, SubjectRange= 8:136, QueryRange= 8:112. CDD= . /note=First two starts are very close, have overlap, have similar scores. Hard to distinguish between these 2 starts. Only reason second start is chosen is because it is most common start in PHAM. /note=Starterator only has Francesca and Dorin, but both call the same start /note=One good hit on HHPRED for protein of unknown function: the rest of HHPRED results are bad. No good results from NCBI blast /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 129228 - 129557 /gene="268" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_268" /note=Original Glimmer call @bp 129228 has strength 8.46; Genemark calls start at 129228 /note=SSC: Start = 129228, Stop = 129557. 
(Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.194 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 330 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= PhageName= hypothetical protein GOOTI_034_00110 [Gordonia otitidis NBRC 100426], Coverage= 70.6422, SubjectRange= 6:84, QueryRange= 6:81, EValue= 5.32865E-4. HHPRED= Accession= PF19913.3, Description= WCOB ; Wolframin C-terminal OB-fold domain, Probability= 81.6. Coverage= 14.6789, SubjectRange= 29:45, QueryRange= 29:73. CDD= . /note=Starterator only has Dorin and Francesca but they both call the same start /note=HHPRED, NCBI Blast, and Phagesdb Blast are all inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 129544 - 129828 /gene="269" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_269" /note=Original Glimmer call @bp 129544 has strength 7.0; Genemark calls start at 129544 /note=SSC: Start = 129544, Stop = 129828. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.755 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 285 bp is the longest possible ORF. GAP: -14 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 265, Function= function unknown, EValue= 2.0E-49. NCBIBLAST= . HHPRED= Accession= 1WV8_A, Description= hypothetical protein TTHA1013; Hypothetical, STRUCTURAL GENOMICS, Unknown function, novel fold, RIKEN Structural Genomics/Proteomics Initiative, RSGI; 2.2A {Thermus thermophilus} SCOP: d.304.1.1, Probability= 91.0. Coverage= 36.1702, SubjectRange= 12:46, QueryRange= 12:60. CDD= . /note=Start chosen due to being longest ORF, covering all CP, and best scores. 
/note=Starterator only has Francesca and Dorin, but both call the same start /note=Function is hypothetical protein due to lack of significant HHPRED results (high e-value) and no BLAST results. /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 129828 - 129989 /gene="270" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_270" /note=Genemark calls start at 129828 /note=SSC: Start = 129828, Stop = 129989. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.106 is the highest start score. SCS: Start is not called by Glimmer and is called by Genemark. LO: 162 bp is not the longest possible ORF. GAP: -1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 266, Function= function unknown, EValue= 2.0E-26. NCBIBLAST= . HHPRED= Accession= 3BL4_B, Description= Uncharacterized protein; Structural genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-2, unknown function; HET: SO4, MSE; 2.2A {Arthrobacter sp.}, Probability= 78.8. Coverage= 32.0755, SubjectRange= 103:120, QueryRange= 103:30. CDD= . /note=Start kept due to covering all CP and ok scores; not enough evidence to change start. /note=Starterator only has Francesca and Dorin, but both call the same start /note=Function called hypothetical protein because no NCBI blast results and very weak HHPRED results. /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 130013 - 130234 /gene="271" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_271" /note=Original Glimmer call @bp 130013 has strength 3.83; Genemark calls start at 130013 /note=SSC: Start = 130013, Stop = 130234. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.642 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 222 bp is not the longest possible ORF. GAP: 23 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . 
NCBIBLAST= PhageName= hypothetical protein SEA_TROGGLEHUMPER_89 [Rhodococcus phage Trogglehumper], Coverage= 73.9726, SubjectRange= 1:68, QueryRange= 1:54, EValue= 1.95252E-4. HHPRED= Accession= PF08274.16, Description= YjdM_Zn_Ribbon ; PhnA Zinc-Ribbon, Probability= 91.9. Coverage= 39.726, SubjectRange= 7:29, QueryRange= 7:41. CDD= . /note=Starterator only has Dorin and Francesca but they both call the same start /note=Phagesdb and NCBI Blast are inconclusive /note=HHPRED has some good evidence for Zinc-Ribbon, but it`s not enough evidence to call this gene that protein /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 130231 - 130557 /gene="272" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_272" /note=Original Glimmer call @bp 130231 has strength 8.87; Genemark calls start at 130231 /note=SSC: Start = 130231, Stop = 130557. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 1.987 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 327 bp is the longest possible ORF. GAP: -4 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Keelan, ProteinNumber= 26, Function= function unknown, EValue= 2.0E-5. NCBIBLAST= PhageName= hypothetical protein MARCHEWKA_03470 [Brevundimonas phage vB_BpoS-Marchewka], Coverage= 73.1481, SubjectRange= 17:96, QueryRange= 17:99, EValue= 1.18922E-4. HHPRED= Accession= PF19698.3, Description= DUF6197 ; Family of unknown function (DUF6197), Probability= 99.9. Coverage= 96.2963, SubjectRange= 6:141, QueryRange= 6:106. CDD= . /note=Suggested Start is LORF, good scores, has all CP, and has good synteny with Francesca. Not much difference between first and second starts when it comes to any of these factors (besides LORF and synteny) /note=Starterator only has Francesca and Dorin, but both call the same start /note=Only hits were for hypothetical proteins, therefore function is called as hypothetical protein.
/note=HHPRED suggests hypothetical protein. NCBI doesn`t have good results /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 130559 - 130756 /gene="273" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_273" /note=Original Glimmer call @bp 130559 has strength 9.95; Genemark calls start at 130559 /note=SSC: Start = 130559, Stop = 130756. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.276 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 198 bp is the longest possible ORF. GAP: 1 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Tourach, ProteinNumber= 101, Function= function unknown, EValue= 7.9. NCBIBLAST= PhageName= type IIA topoisomerase [Bacillus phage Mgbh1] >gb|AMQ66727.1| type IIA topoisomerase [Bacillus phage Mgbh1], Coverage= 78.4615, SubjectRange= 3:58, QueryRange= 3:52, EValue= 1.51383E-5. HHPRED= Accession= PF05573.16, Description= NosL ; NosL, Probability= 81.9. Coverage= 83.0769, SubjectRange= 84:132, QueryRange= 84:56. CDD= . /note=Suggested start is LORF, has best scores, contains all CP. /note=Starterator only has Francesca and Dorin, but both call the same start /note=Most hits were for hypothetical proteins. One hit was Q2:T3 for a topoisomerase, but E-value was high. /note=HHPRED has low probabilities and high e-values. No good results for NCBI blast /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 130746 - 130955 /gene="274" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_274" /note=Original Glimmer call @bp 130746 has strength 9.85; Genemark calls start at 130746 /note=SSC: Start = 130746, Stop = 130955. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.196 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 210 bp is not the longest possible ORF. GAP: -11 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . 
HHPRED= Accession= 2L02_B, Description= Uncharacterized protein; Structural Genomics, NORTHEAST STRUCTURAL GENOMICS CONSORTIUM (NESG), PSI-2, Protein Structure Initiative, Unknown function; NMR {Bacteroides thetaiotaomicron}, Probability= 97.1. Coverage= 92.7536, SubjectRange= 2:63, QueryRange= 2:65. CDD= . /note=Starterator only has Dorin and Francesca but they both call the same start. /note=NCBI Blast and Phagesdb Blast are inconclusive. /note=HHPRED has good evidence for multiple different proteins but there isn`t enough evidence to call this gene one of the options. /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 130979 - 131206 /gene="275" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_275" /note=Original Glimmer call @bp 130979 has strength 7.08; Genemark calls start at 130979 /note=SSC: Start = 130979, Stop = 131206. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.427 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 228 bp is the longest possible ORF. GAP: 23 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Sporto, ProteinNumber= 73, Function= DNA helicase, EValue= 0.95. NCBIBLAST= . HHPRED= Accession= PF11950.12, Description= DUF3467 ; Protein of unknown function (DUF3467), Probability= 70.8. Coverage= 37.3333, SubjectRange= 53:85, QueryRange= 53:59. CDD= . /note=Start is LORF and contains all CP. /note=Starterator only has Francesca and Dorin, but both call the same start /note=BLAST has no hits. HHPRED yields no significant results because of low probability and high e-values. Phage function frequency and Phage internal BLAST suggest possible helicase function, but no hard evidence. CDS 131178 - 131390 /gene="276" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_276" /note=Original Glimmer call @bp 131193 has strength 1.67; Genemark calls start at 131199 /note=SSC: Start = 131178, Stop = 131390. 
(Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.668 is the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 213 bp is not the longest possible ORF. GAP: -29 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= CheeseTouch, ProteinNumber= 44, Function= function unknown, EValue= 8.0. NCBIBLAST= . HHPRED= Accession= PF10209.13, Description= DUF2340 ; Uncharacterized conserved protein (DUF2340), Probability= 28.5. Coverage= 45.7143, SubjectRange= 43:74, QueryRange= 43:48. CDD= . /note=Selected start contains all of the coding potential. /note=HHPRED has low probability and high e-values and BLAST has no hits. /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 131427 - 131882 /gene="277" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_277" /note=Original Glimmer call @bp 131427 has strength 9.44; Genemark calls start at 131427 /note=SSC: Start = 131427, Stop = 131882. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.912 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 456 bp is not the longest possible ORF. GAP: 36 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 273, Function= function unknown, EValue= 4.0E-81. NCBIBLAST= . HHPRED= Accession= 7NRC_A, Description= GCN1; Ribosome, Disome, GCN1, Translation, GAAC, ISR, Rbg2, Gir2; HET: 5CT; 3.9A {Saccharomyces cerevisiae S288C}, Probability= 69.8. Coverage= 90.7285, SubjectRange= 185:329, QueryRange= 185:140. CDD= .
/note=Starterator only has Dorin and Francesca but they both call the same start /note=BLAST, HHPRED, and Phagesdb Blast are inconclusive /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 133882 - 134088 /gene="278" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_278" /note=Original Glimmer call @bp 133882 has strength 4.21; Genemark calls start at 133882 /note=SSC: Start = 133882, Stop = 134088. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.933 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 207 bp is the longest possible ORF. GAP: 1999 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= PF10367.13, Description= Vps39_2 ; Vacuolar sorting protein 39 domain 2, Probability= 89.6. Coverage= 22.0588, SubjectRange= 92:107, QueryRange= 92:18. CDD= . /note=Starterator only contains Dorin and Francesca but they both call the same start /note=BLAST, HHPRED, and Phagesdb Blast present inconclusive data. /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 134148 - 134504 /gene="279" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_279" /note=Original Glimmer call @bp 134148 has strength 13.5; Genemark calls start at 134148 /note=SSC: Start = 134148, Stop = 134504. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 357 bp is not the longest possible ORF. GAP: 59 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= Lizz, ProteinNumber= 51, Function= DnaE-like DNA polymerase III, EValue= 2.1. NCBIBLAST= . HHPRED= Accession= PF15569.10, Description= Imm40 ; Immunity protein 40, Probability= 50.1. Coverage= 59.322, SubjectRange= 15:93, QueryRange= 15:82. CDD= . /note=Start contains all CP. 
/note= /note=HHPRED and BLAST hits have unacceptable e values that do not suggest a specific function. /note=Starterator only has Francesca and Dorin but both call the same start /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 134587 - 134730 /gene="280" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_280" /note=Original Glimmer call @bp 134611 has strength 6.07; Genemark calls start at 134569 /note=SSC: Start = 134587, Stop = 134730. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.776 is not the highest start score. SCS: Start is not called by Glimmer and is not called by Genemark. LO: 144 bp is not the longest possible ORF. GAP: 82 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= 3JB9_W, Description= Pre-mRNA-splicing factor cdc5; Spliceosome, U2/U5/U6, Lariat, RNA BINDING PROTEIN-RNA complex; HET: GDP, ADP; 3.6A {Schizosaccharomyces pombe 972h-} SCOP: j.138.1.1, Probability= 64.6. Coverage= 46.8085, SubjectRange= 729:751, QueryRange= 729:47. CDD= . /note=Start was changed to include all of the CP, even though z and final scores are better in the latter. /note=No informative hits provided by BLAST, HHPRED, or Phagesdb Blast. /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 134820 - 134981 /gene="281" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_281" /note=Original Glimmer call @bp 134820 has strength 5.94; Genemark calls start at 134820 /note=SSC: Start = 134820, Stop = 134981. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.254 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 162 bp is the longest possible ORF. GAP: 89 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= PhageName= BillNye, ProteinNumber= 131, Function= rIIB-like protein, EValue= 1.6. NCBIBLAST= .
HHPRED= Accession= PF17064.9, Description= QVR ; Quiver family u-PAR/Ly-6-like domain, Probability= 83.9. Coverage= 73.5849, SubjectRange= 1:34, QueryRange= 1:44. CDD= . /note=Start contains all CP /note= /note=HHPRED and PhagesDB blast only yield hits with low confidence e values and coverage. NCBI BLAST yields no hits. /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 134984 - 135217 /gene="282" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_282" /note=Original Glimmer call @bp 134984 has strength 4.52; Genemark calls start at 134984 /note=SSC: Start = 134984, Stop = 135217. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.471 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 234 bp is the longest possible ORF. GAP: 2 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= PF08489.15, Description= DUF1743 ; Domain of unknown function (DUF1743), Probability= 59.6. Coverage= 32.4675, SubjectRange= 34:56, QueryRange= 34:56. CDD= . /note=Start kept due to good RBS, captures all CP, starterator only had Dorin and Francesca but both called the same start /note=No information on function because *HHPRED, NCBI Blast, and Phagesdb Blast were all inconclusive* /note=Conserved Domains: N/A /note=Deep TMHMM: N/A CDS 135233 - 135826 /gene="283" /product="glycosylase" /function="glycosylase" /locus tag="Dorin_283" /note=Original Glimmer call @bp 135233 has strength 1.24; Genemark calls start at 135233 /note=SSC: Start = 135233, Stop = 135826. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 594 bp is the longest possible ORF. GAP: 15 bp. ST: SS=NA. F: glycosylase. FS: PHDBLAST= PhageName= MK4, ProteinNumber= 76, Function= glycosylase, EValue= 3.0E-43. 
NCBIBLAST= PhageName= glycosylase [Mycobacterium phage MK4], Coverage= 99.4924, SubjectRange= 1:191, QueryRange= 1:196, EValue= 4.71913E-52. HHPRED= Accession= 3FHG_A, Description= N-glycosylase/DNA lyase; ogg, helix-hairpin-helix, glycosylase, 8-oxoguanine, 8-oxoG, SsOGG, DNA damage, DNA repair, Glycosidase, Hydrolase, Lyase, Multifunctional enzyme, Nuclease; HET: SO4, GOL; 1.9A {Sulfolobus solfataricus}, Probability= 96.8. Coverage= 40.1015, SubjectRange= 126:206, QueryRange= 126:197. CDD= . /note=Start kept same due to covering all CP and being longest ORF. Based on BLAST data and HHPred, glycosylase was called /note=Conserved Domain: N/A /note=Deep TMHMM: N/A CDS 135905 - 136573 /gene="284" /product="Lsr2-like DNA bridging protein" /function="Lsr2-like DNA bridging protein" /locus tag="Dorin_284" /note=Original Glimmer call @bp 135905 has strength 11.31; Genemark calls start at 135905 /note=SSC: Start = 135905, Stop = 136573. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 2.427 is not the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 669 bp is the longest possible ORF. GAP: 78 bp. ST: SS=NA. F: Lsr2-like DNA bridging protein. FS: PHDBLAST= PhageName= Francesca_Draft, ProteinNumber= 6, Function= function unknown, EValue= 1.0E-123. NCBIBLAST= . HHPRED= Accession= 2KNG_A, Description= Protein lsr2; DNA-binding domain, Immune response, DNA BINDING PROTEIN; NMR {Mycobacterium tuberculosis}, Probability= 95.8. Coverage= 15.3153, SubjectRange= 11:45, QueryRange= 11:119. CDD= Accession= pfam11774, Coverage= 31.0811, SubjectRange= 46:104, QueryRange= 46:115, EValue= 8.61863E-5. /note=Chosen start is the LORF and contains all CP. /note= /note=Function frequency in PhagesDB and HHPRED suggest possible Lsr2 protein identity, confirmed with synteny. 
CDS 136640 - 136762 /gene="285" /product="Hypothetical Protein" /function="Hypothetical Protein" /locus tag="Dorin_285" /note=Original Glimmer call @bp 136640 has strength 5.95; Genemark calls start at 136640 /note=SSC: Start = 136640, Stop = 136762. (Forward). CP: Does contain all GeneMarkHost capacity. SD: ZScore 3.243 is the highest start score. SCS: Start is called by Glimmer and is called by Genemark. LO: 123 bp is the longest possible ORF. GAP: 66 bp. ST: SS=NA. F: Hypothetical Protein. FS: PHDBLAST= . NCBIBLAST= . HHPRED= Accession= PF21042.1, Description= Stu2_CTS ; Stu2, C-terminal segment, Probability= 38.6. Coverage= 52.5, SubjectRange= 2:23, QueryRange= 2:38. CDD= . /note=Start kept due to good RBS, CP, and current suggested start /note=No clues from HHPRED, NCBI Blast, or Phagesdb Blast that point to a function because they were all inconclusive /note=Good CP /note=Conserved Domains: N/A /note=Deep TMHMM: N/A