CDS 86 - 538 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="IttyBittyPiggy_1" /note=Original Glimmer call @bp 86 has strength 11.76; Genemark calls start at 86 /note=SSC: 86-538 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 1.9885E-93 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -2.0949393225970705, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Amyev] ],,YP_010677704,96.0,1.9885E-93 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_C,53.3333,98.6 SIF-Syn: Terminase small subunit, no gene upstream, downstream gene is terminase large subunit, just like in phages Amyev and Ascela. /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: Both Glimmer and GeneMark agree on a start site at bp 86 with an ATG (Met) start codon. /note=Coding Potential: Gene has great coding potential within the putative ORF, as indicated by Self and Host GeneMark maps with high levels of typical coding potential. The start site does cover all of the coding potential, as it is placed to the left of the beginning of the high peaks of the coding potential and the stop site is placed to the right on the maps. /note=SD (Final) Score: The original start site has the best SD score (least negative) at -2.095 and is not part of an operon due to the large gap preceding the start site. /note=Gap/overlap: Since this is the first gene in the genome, there is no gap/overlap with a preceding gene. The earliest proposed start site (at bp 86) corresponds to the first potential start codon identified. The auto-annotated start site provides the longest reasonable ORF of all the options with an acceptable gene length (453 bp). /note=Phamerator: As of 01/13/2024, this gene is found in pham 133530. 
Other genes of members in the AZ1 cluster are present in this pham as well. Some of these phages include JuneStar, Kaylissa, London, and Mudpuppy. /note=Starterator: There is a reasonable start site for which the genes in this pham (133530) are conserved at start site 40 (which is at bp 86 for IttyBittyPiggy_1). There are 152 non-draft members in this pham and an additional 43 draft members. Of the 152 non-draft members, 44 of them call start site 40. The Starterator program is informative, as few members in this pham are drafts and it provides the number of manual annotations for which the other non-draft genes have called start site 40. /note=Location call: The gathered evidence suggests that this is a real gene with the original start site @ bp 86 being correct due to its complete encompassing of the coding potential, its creation of the longest reasonable ORF of all of the start site candidates, its ideal RBS and Z-score statistics, and its consistency with the Starterator report. /note=Function call: Predicted function is terminase small subunit, based on several hits from PhagesDB with e-values around 1e-76 and several hits from NCBI blastp with 100% query coverage and 90% sequence identity and e-values around 1e-93 (Amyev_1 and Iter_1). HHpred also had 2 hits with >46% coverage, and e-values around 1e-6. CDD returned no conserved domains for this gene. /note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. /note=Secondary Annotator Name: Indiresan, Neeti /note=Secondary Annotator QC: I agree with your location call and function call based on the evidence provided. All boxes are filled out and evidence is checked. 
CDS 535 - 2232 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="IttyBittyPiggy_2" /note=Original Glimmer call @bp 535 has strength 11.89; Genemark calls start at 535 /note=SSC: 535-2232 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.08, -4.394706008439538, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage Cassia]],,WGH21075,98.4071,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,92.9204,100.0 SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is identified as 535 by both GeneMark and Glimmer. /note=Coding Potential: Based on Host-trained GeneMark and Self-Trained GeneMark there is coding potential found in the forward direction by both systems because the region of coding falls between the predicted start site and the stop site. /note=SD (Final) Score: The Z score is 2.08 and the Final Score is -4.395. This is the highest Z score and the least negative Final score. /note=Gap/overlap: The gap is -4bp, indicating that there is some overlap between it and the previous gene. /note=Phamerator: This gene is in pham 130372. Other members of the cluster AZ1 are present in this pham including AEgle, Community, and Joemato. /note=Starterator: On 1/17/23 the pham is 130372. There are 1353 members in this pham. This gene contains start site 61 which is also called by 53 other members in the pham 92% of the time it was present. /note=Location call: Start site is likely 535 /note=Function call: Terminase, large subunit. The highest ranked phages in HHPred have the function of terminase, large subunit protein with an e value of 3.6e-38 at 100% probability. 
In NCBI BLAST the top phages also call terminase, large subunit protein with 96.46% identity. The CDD hits also have terminase, large subunit with an e-value of 9.01428e-15 at 10.98% identity. /note=Transmembrane domains: None. This makes sense because according to HHpred and CDD this is a terminase protein. These proteins are used to bind DNA and would not have TMRs. /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: I have QC'ed this gene and agree with the primary annotator; however: /note= /note=>SD: This specifically asks about the final score. I would also be cautious with your wording, because the score documented in these notes is not the 'least negative final score'. The least negative is at start 1276, which also has the better Z score. However, I agree with this pick because the -4 bp overlap indicates it is part of an operon. /note=>GAP: You mentioned that there is a -4 bp gap, but you didn't mention that this suggests the gene is part of an operon. There was also no mention that this gene is conserved in other phages, when it is in at least two (Adolin and Dr.Manhattan). /note=>Phamerator: Just the wording in this section. Is this gene conserved in other genomes/phamilies? I'm assuming that is what you meant, but it's always best to be specific and leave no room for assumption. /note=>Starterator: I believe you were actually supposed to record the date you accessed Phamerator and put the pham there, because Starterator tells you where in the pham every possible start site is and what has been manually annotated, and Phamerator is constantly updated and changing. I would also mention whether the start called by Starterator agrees with the Glimmer and GeneMark call, and mention what the start site is on your gene. Is site 61 start 535? /note=>Location call: I would mention that, based on the evidence above, the likely start site is 535.
/note=>Function call: I would also mark and mention the PhagesDB BLAST hits. It's not on page 1 but on page 10, with great E-values. Aside from these things everything else is good! /note=>Also! Since this is a called function, make sure the synteny box is filled out :)
CDS 2272 - 3651 /gene="3" /product="gp3" /function="portal protein" /locus tag="IttyBittyPiggy_3" /note=Original Glimmer call @bp 2272 has strength 12.29; Genemark calls start at 2272 /note=SSC: 2272-3651 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Janeemi]],,NCBI, q1:s1 99.7821% 0.0 GAP: 39 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -1.953940808934884, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Janeemi]],,UVK63524,97.1554,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_B,91.0675,100.0 SIF-Syn: Portal protein, upstream gene is terminase, large subunit, downstream gene is capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin, shares synteny with Adumb2043. /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 2272 which has a start codon of ATG and is the LORF. /note=Coding Potential: Both Host Trained and Self Trained GeneMarks demonstrate very strong coding potential throughout the entirety of the suggested range of 2272 - 3651. /note=SD (Final) Score: The final score for the selected gene is the least negative (-1.954) and has the highest Z-score (3.209). /note=Gap/overlap: The selected start has the smallest gap of the suggested starts, at 39 bp, and gives the LORF. /note=Phamerator: As of 1/15/24, Phamerator calls the pham as 131677. There are 1918 members in this pham and it includes genes from clusters A, BD, BL etc. The vast majority of these proteins are finalized as portal proteins such as 20ES and 40AC which are both in cluster A. 
/note=Starterator: As of 1/15/24, Starterator calls the pham as 131677 which has 1916 members, 154 are drafts. The selected start site is not the most annotated, and is found in 1.8% of genes in this pham. /note=Location call: There is strong evidence for the realness of this gene and that it starts at 2272. There is strong synteny for this gene with many finalized genes such as Cassia and Ascela. Additionally, all of the coding potential seems to be included within the range. /note=Function call: There is very strong evidence for the function of this gene to be a portal protein. Both PhagesDB and NCBI Blast demonstrated numerous hits with e-values of 0, with Janeemi and ObiToo being the top two hits on each. CDD demonstrates 1 hit with a portal protein with 88% coverage and an e-value of 1.42e-41. HHPRED has numerous hits as well, with the grand majority being portal proteins, including the first two with 100% probability, 7Z4W_B and PD05133.18. /note=Transmembrane domains: There is no evidence of transmembrane properties. /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: Evidence presented looks good, and I think the function call of portal protein looks accurate. 
CDS 3670 - 5721 /gene="4" /product="gp4" /function="capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin" /locus tag="IttyBittyPiggy_4" /note=Original Glimmer call @bp 3670 has strength 12.09; Genemark calls start at 3670 /note=SSC: 3670-5721 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Tuck]],,NCBI, q1:s1 100.0% 0.0 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.054, -2.417348306996335, yes F: capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin SIF-BLAST: ,,[capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Tuck]],,WAB10780,87.9941,0.0 SIF-HHPRED: d.166.1.1 (A:265-550) automated matches {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]} | CLASS: Alpha and beta proteins (a+b), FOLD: ADP-ribosylation, SUPFAM: ADP-ribosylation, FAM: ADP-ribosylating toxins,,,SCOP_d4dv8a1,34.1142,99.5 SIF-Syn: Gene 4 in IttyBittyPiggy aligns with gene 4 in Adolin (cluster AZ) and both have the same function; gene 3 in both phages are portal proteins. Gene 4 in IttyBittyPiggy aligns with gene 4 in Percival (cluster EH) and have similar functions (capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin in IttyBittyPiggy vs ADP-ribosyltransferase domain and MuF-like fusion protein). Gene 3 in both phages are portal proteins. /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation start source: Glimmer and Genemark both call start at 3670. /note=Coding Potential: Good coding potential is present in Self-Genemark and HostGenemark. It is present throughout the entire reading frame and only in the forward strand. /note=SD (Final) Score: -2.417 this is the largest final score among the start sites. /note=Gap/overlap: Gap is 18 which is reasonable and is the smallest gap amongst the start sites. 
/note=Phamerator: Pham as of 1/17/24 is 2324 and includes 55 other members mostly in the AZ cluster with some in EH (18 of the other members are drafts). /note=Starterator: Start site number 3 is called for this gene and is the most annotated in Starterator. It was called in 32/35 non-draft genomes. /note=Location call: Most likely start site is 3670 since it was called by both programs and has the highest z-score (3.054) and final score (-2.417), and was most annotated in Starterator. Gene also has the smallest gap and likely start codon. /note=Function call: Most likely function is capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin. Good hits on PhagesDB BLAST with e-values ranging from 0 to 5e-53. Good hits on HHpred for similar proteins with e-values ranging from 8.3e-10 to 0.066. Many good hits on NCBI BLAST ranging from 0 to 5e-29. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note=Secondary Annotator Name: Salinas, Juan /note=Secondary Annotator QC: I agree with the primary annotator. The start site at 3670 makes the most sense given the appropriate gap and the most favorable z-score and final score. I also agree with the function call given the strong evidence in BLAST and HHPRED. The e-value of 0 is convincing and provides strong evidence for this call. 
CDS 5769 - 6128 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="IttyBittyPiggy_5" /note=Original Glimmer call @bp 5769 has strength 10.84; Genemark calls start at 5769 /note=SSC: 5769-6128 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_5 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 2.28988E-75 GAP: 47 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_5 [Arthrobacter phage Cassia]],,WGH21078,97.479,2.28988E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tran, Michelle /note=Auto-annotation: Both Glimmer and GeneMark call a start site for this gene at 5769. /note=Coding Potential: The coding potential for this gene is indicated on both the host-trained and self-trained GeneMark in the assigned ORF. This coding potential is only found on the forward strand, which supports the annotation of this as a forward gene. /note=SD (Final) Score: The final score for this start site is -1.954, which is the best available start site for this gene on PECAAN. /note=Gap/overlap: There is a 47-bp gap between the start site of this gene and the stop site of the upstream gene, which is a reasonable size. It is also the smallest available gap for this start site available on PECAAN. /note=Phamerator: As of January 17, 2024, this gene is under pham 133748. This is conserved in other phages within cluster AZ1, such as Adumb2043 and Adolin. /note=Starterator: Start site 12, which corresponds to 5769 in IttyBittyPiggy, was manually annotated in 2/41 of the non-draft genes in the pham. This evidence would agree with the auto-annotations predicted by Glimmer and GeneMark but is noted to not be the most common start site among phages in that pham. /note=Location call: Based on the evidence above, this gene is most likely real with a start site of 5769. /note=Function call: NKF. 
Both PhagesDB BLAST and NCBI BLAST yield strong results supporting the lack of known function (e-value <10e-60) for this protein. The HHPred results were unusable because all of them were too weak (e-value >1). The Conserved Domain Database yielded no results. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. It is impossible to determine the protein’s function from there because it currently has NKF. /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: I agree with the annotator's start site. They may choose to elaborate on certain areas of evidence, e.g. identify factors that led to this start site call, mention Z-score in final score notes, and explain reasoning on Starterator. GM box needs to be checked off. I agree with this annotator's function call.
CDS 6246 - 6791 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="IttyBittyPiggy_6" /note=Original Glimmer call @bp 6246 has strength 14.66; Genemark calls start at 6246 /note=SSC: 6246-6791 CP: yes SCS: both ST: SS BLAST-Start: [head scaffolding protein [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 100.0% 3.99237E-100 GAP: 117 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.975, -3.3503726477288405, yes F: scaffolding protein SIF-BLAST: ,,[head scaffolding protein [Arthrobacter phage Lizalica] ],,YP_010677571,90.5028,3.99237E-100 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,59.6685,98.4 SIF-Syn: Compared with phage Amyev, the aligned genes are both scaffolding proteins. The upstream gene has no listed function; however, the downstream genes are both major capsid proteins. /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 6246. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.350. It is the best final score on PECAAN. /note=Gap/overlap: 117 bp gap, which is quite big. Coding potential in the gap is weak, so there are no signs of a new gene. /note=Phamerator: pham: 1850. Date 1/12/24. It is conserved and found in Asa16_6 (AZ) and Emotion_5 (AZ), in addition to many other phages from various clusters. /note=Starterator: Start site 19 in Starterator was manually annotated in 48 out of 51 genes in this pham. Start 19 agrees with the site predicted by Glimmer and GeneMark (6246). /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 6246. /note=Function call: Scaffolding Protein. The third and fifth PhagesDB BlastP hits have the function as scaffolding protein (Tuck_7: 5e-85 and Lizalica_6: 1e-83). The top NCBI Blast hit has the function call of a scaffolding protein with a query cover of 100% (e-value: 4e-100 and 86.74% identity) and the second hit is also a scaffolding protein with a query cover of 100% (e-value: 1e-97 and 84.53% identity). HHpred had a hit for unknown function with an e-value of 4.6e-7; however, the second hit is for a scaffold protein with a probability of 98.43% and a somewhat higher e-value of 0.000019. CDD does not produce any hits, so there is no available data. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kibria, Kamille /note=Secondary Annotator QC: Additional synteny with Tuck and Lizalica. I agree with the primary annotation. 
CDS 6819 - 7766 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="IttyBittyPiggy_7" /note=Original Glimmer call @bp 6819 has strength 14.33; Genemark calls start at 6819 /note=SSC: 6819-7766 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage ObiToo]],,NCBI, q1:s1 99.6825% 0.0 GAP: 27 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.099, -4.625844833826125, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage ObiToo]],,WGH21183,95.5556,0.0 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_B,93.0159,100.0 SIF-Syn: /note=Primary Annotator Name: To, Nathan /note=Auto-annotation: Both Glimmer and Genemark call this start site at 6819 with start codon ATG. /note=Coding Potential: Both Genemark Self and Host show coding potential in this ORF on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -4.626, this is the best final score available. /note=Gap/overlap: Gap of 27 bp. This is reasonable and the smallest gap possible; it also allows for the largest ORF. /note=Phamerator: As of 1/17/24, Phamerator calls the pham 228. This pham contains 297 members and has phages from numerous clusters including other AZ phages. Every function call that is present is for major capsid protein. /note=Starterator: As of 1/13/24, Starterator calls the pham 228. This pham contains 296 members, 43 of which are drafts. The selected start site is the most annotated, called 100% of the time when present. /note=Location call: The gathered evidence suggests that this is a real gene, with start site @6819. This gene has good coding potential, and does not have large gaps before or after it. The start site 6819 seems most likely due to both Glimmer and Genemark calling it, its good Z score and final score, and a start codon of ATG. 
Starterator also provides good evidence for it, with it being the most annotated site and being called 100% of the time when present. /note=Function call: Predicted function is major capsid protein, based on multiple NCBI and PhagesDB BLAST hits with predicted function major capsid protein and extremely low E-values (1e-163 for major capsid protein on phage Crewmate). HHpred also makes calls for major capsid protein with good probability, e-value, and coverage (100 probability, 2.4e-26 e-value and 93.0159 coverage for hit). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Dweik, Qaiss /note=Secondary Annotator QC: I agree with the location and function call of this gene's annotation. Everything looks great!
CDS 7843 - 8253 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="IttyBittyPiggy_8" /note=Original Glimmer call @bp 7843 has strength 11.72; Genemark calls start at 7843 /note=SSC: 7843-8253 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Wildwest]],,NCBI, q1:s1 100.0% 3.80791E-77 GAP: 76 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -2.0162541296952132, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Wildwest]],,WNO26028,91.3043,3.80791E-77 SIF-HHPRED: Head completion protein gp15; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_l,79.4118,99.1 SIF-Syn: When compared to phage Adumb2043, the aligned gene is listed as a head-to-tail adaptor. The upstream gene has no function listed, and the downstream gene for Adumb2043 is a major capsid protein. /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Both Glimmer and GeneMark call this gene. They agree on start site 7843. /note=Coding Potential: All of the coding potential is covered by this start site. Coding potential is in the forward direction. 
/note=SD (Final) Score: The final score is -2.016. It is the best final score on PECAAN. /note=Gap/overlap: There is a 76 base pair gap. Since this gap is over 50 bp, this may be notable. However, there does not appear to be any space for a missing gene, or a gene that should be deleted. /note=Phamerator: As of 1/16, the pham listed is 131852. There are 179 other members in this pham. /note=Starterator: The start site 7843 has 47/144 manual annotations and is found in 42.7% of genes in the pham and is called 89.5% of the time when present. /note=Location call: Based on the above evidence, this appears to be a real gene with a start site of 7843. /note=Function call: The top 21 NCBI Blastp hits show a head-to-tail adaptor protein, with e-values ranging from 3.80791e-77 (top hit) to 1.30725e-60 (21st hit). There are no results from CDD. Two of the top three results from HHpred are listed as head completion proteins, with e-values of 1.7e-8 and 7.9e-7 respectively. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: I agree with the start site and functional call based on the evidence and HHpred and CDD hits.
CDS 8267 - 8377 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="IttyBittyPiggy_9" /note=Genemark calls start at 8267 /note=SSC: 8267-8377 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein HOU52_gp09 [Arthrobacter phage Yang] ],,NCBI, q1:s1 97.2222% 6.83801E-12 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.054, -2.417348306996335, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp09 [Arthrobacter phage Yang] ],,YP_009815627,86.4865,6.83801E-12 SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: GeneMark calls the start site at 8267 but Glimmer does not call a location. The codon is ATG. 
/note=Coding Potential: There is good coding potential shown in GeneMark with the suggested start site. The coding potential covers >80% of the region between the start and stop sites. /note=SD (Final) Score: The suggested start site at 8267 has a final score of -2.417, which is significantly better than the alternative, which is -5.323. /note=Gap/overlap: The gap with the suggested start site is 13 bp, which is not large enough to indicate a missing gene. This start site also provides a length of 111 bp which, albeit rather short, provides the LORF. /note=Phamerator: As of 1/16/24, this gene is in pham 132161. All the genes in this pham are part of the AZ cluster. Such genes include Lego_9, Powerpuff_9, and Yang_9. /note=Starterator: The most annotated start site is site number 1, which is called in 30/30 non-draft genes in the pham. This gene calls this start site and the base pair location is 8267. /note=Location call: The gene seems to be real, albeit on the shorter end, which is a pattern among genes in this pham. The Starterator data and the final score back up the proposed start site. /note=Function call: PhagesDB BLAST has hits with VResidence and Yang at low e-values but both of these genes have NKF. These same genes result in hits with NCBI BLAST but are marked as hypothetical proteins. HHpred has no hits. /note=Transmembrane domains: There are no recorded TMRs for this gene. /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: I have reviewed the evidence gathered above and agree with the function call of NKF. 
CDS 8374 - 8727 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="IttyBittyPiggy_10" /note=Original Glimmer call @bp 8374 has strength 15.27; Genemark calls start at 8374 /note=SSC: 8374-8727 CP: yes SCS: both ST: NI BLAST-Start: [head-to-tail stopper [Arthrobacter phage Cassia]],,NCBI, q2:s1 99.1453% 8.50685E-65 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.821, -2.9063687850157054, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Cassia]],,WGH21083,95.6897,8.50685E-65 SIF-HHPRED: Stopper protein Rcc01689; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_E,94.8718,99.7 SIF-Syn: Head-to-tail stopper, exhibits both downstream and upstream synteny with Adeline, Adumb2043, and Amyev, all of which belong to subcluster AZ1 and Pham 133718. /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 8374. /note=Coding Potential: High coding potential in the forward direction as indicated by the direct sequence on both GeneMark Host and GeneMark S outputs. However, there is also coding potential in the reverse direction as indicated by the complementary sequence on both GeneMark Host and GeneMark S outputs. /note=SD (Final) Score: -2.906. This is the only, and thus also least negative, final score listed on PECAAN. /note=Gap/overlap: There is an overlap of 4 base pairs, indicating that this gene is part of an operon. /note=Phamerator: pham: 133718. Date: 1/11/2024. This gene shows extensive synteny with phages that are also members of the AZ1 subcluster, as well as Pham 133718, such as Adolin, Adumb2043, and Amyev. /note=Starterator: Start site 8 was the most annotated, called in 43 of the 51 non-draft genes in this pham. However, this start site was not manually annotated for this phage gene. /note=Location call: Based on the above evidence, this is a real gene and the start site is 8374. 
/note=Function call: Head-to-tail stopper, as called by the majority of significant non-draft hits from PhagesDB Blast (as of 1/15/2024, with e-values < 3e-46), NCBI Blast (as of 1/11/2024, with comparisons to phages Cassia, Crewmate, and ObiToo, with > 80% identity, > 97% coverage, > 86% aligned, and e-values < 3.8e-58), and HHPRED (as of 1/11/2024, probability 99.7%, coverage 94.87%, and e-value of 3.6e-15). CDD was irrelevant. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: I agree with the location call and the function call. I would just clarify the Starterator section: why was start 8 not chosen for this gene? Is it not present? What was the chosen start site? // 1/23/24 I have updated notes to remedy this error.
CDS 8739 - 9041 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="IttyBittyPiggy_11" /note=Original Glimmer call @bp 8739 has strength 9.15; Genemark calls start at 8739 /note=SSC: 8739-9041 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp11 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 99.0% 1.8842E-44 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp11 [Arthrobacter phage Adumb2043] ],,YP_010677921,87.1287,1.8842E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Both Glimmer and Genemark. Both call the start at 8739. /note=Coding Potential: Good coding potential is found both in GeneMark Self and Host. Our start and stop are outside of the coding potential. Coding potential is in the forward direction, so we can say that our gene is a forward gene. /note=SD (Final) Score: -2.034. It is the best final score on PECAAN because it is the least negative value. /note=Gap/overlap: Gap of 11 bp. 
This is a reasonable gap that remains conserved in other phages (Adolin and Adumb2043). Acceptable gene length of 303 bp. /note=Phamerator: pham: 135424. Date 1/17/24. It is conserved; found in Adumb2043 and Cassia (AZ1). /note=Starterator: Start number 24 in Starterator was manually annotated in 50/159 non-draft genes in this pham. Start 37 is 8739 in IttyBittyPiggy with 48 manual annotations. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the good coding potential and the small e-value, this gene is real and the most likely start site is 8739. Our findings also show synteny because of the similar position as well as similar sizes of genes (IttyBittyPiggy11 - 303 bp, Adolin 11 - 308 bp, Amyev11 - 305 bp). /note=Function call: Function unknown. The top two PhagesDB BLASTp hits have function unknown (E-value <10^-36), and the top two NCBI BLAST hits have function unknown (81%+ identity, and E-value <10^-44). The HHpred hits cannot be used to assign a function because they have very large e-values (0.012 and 14). CDD did not have any domain hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. It predicts that the protein is found on the inside of the membrane. /note=Secondary Annotator Name: Tran, Michelle /note=Secondary Annotator QC: I agree with the location and function calls, but the pham has changed and the relevant information needs to be updated. 
CDS 9041 - 9454 /gene="12" /product="gp12" /function="tail terminator" /locus tag="IttyBittyPiggy_12" /note=Original Glimmer call @bp 9041 has strength 12.9; Genemark calls start at 9041 /note=SSC: 9041-9454 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Nitro]],,NCBI, q1:s1 99.2701% 9.44155E-84 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.503, -3.430952136556091, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Nitro]],,WNN93968,94.8529,9.44155E-84 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,94.1606,99.3 SIF-Syn: Synteny is present with phages such as Adumb2043 and Crewmate, both of which call the gene as a tail terminator. /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 12 (stop@9454F) /note=Coding Potential: Good coding potential is present and called by both host and self-trained GeneMark. Genemark and Glimmer both call the start site as 9041. There are also no switches in gene orientation. This gene is the LORF, with a length of 414 bp. /note=SD (Final) Score: The start codon is GTG. The final score of -3.431 and z-score of 2.503 indicate a good RBS score. The original site of 9041 is thereby kept. /note=Gap/overlap: There is a 1 bp overlap (gap of -1) with the upstream gene, so there is no room for an additional gene. /note=Phamerator: As of 1/17/24, the gene is in pham 133659. There are a total of 96 members, 28 of which are drafts. /note=Starterator: As of 1/17/24, the most annotated start is 8, which is called by 35 of 68 non-draft genes. The start position is listed as 9041, which was also the auto-annotated start. /note=Location call: Strong coding potential is found between the start site of 9041 and stop site of 9454. The start site is supported by both Glimmer and GeneMark. Additionally, the final score of -3.431 and z-score of 2.503 are optimal values. This gene is the LORF. 
As of 1/17/24, the phamerator and starterator both call the start site at 9041 with start number 8. There is a 1 bp overlap (gap of -1), so there is no room for an additional gene. Therefore, this is a real gene and the start site is at 9041. /note=Function call: BLASTp hits with significant e-values, such as Nitro (7e-67) and Ascela (4e-66), indicate tail terminator as the function. CDD did not have any domain hits. HHPred had hits that indicate a tail-to-head joining protein, tail terminator protein, and prophage LABDALM01 antigen B. Holistically, this protein has a tail terminator function. /note=Transmembrane domains: According to DeepTMHMM, this is not a transmembrane protein. There are no predicted TMRs. /note=Secondary Annotator Name: Kim, Abby /note=Secondary Annotator QC: I agree with all of the annotation. The only things I would work on are filling out the synteny box and being more specific with the functional call. Include specific e-values and which hits. CDS 9467 - 10018 /gene="13" /product="gp13" /function="major tail protein" /locus tag="IttyBittyPiggy_13" /note=Original Glimmer call @bp 9467 has strength 18.79; Genemark calls start at 9467 /note=SSC: 9467-10018 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 9.52728E-123 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.219, -2.0720764396375664, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Cassia]],,WGH21086,96.7213,9.52728E-123 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_M,95.6284,98.5 SIF-Syn: /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation start source: Both Glimmer and Genemark call a start site of 9467. The auto-annotated start codon is ATG. 
/note=Coding Potential: There is high coding potential starting just outside of the start site and ending at 10018. More than 90% of the gene is covered by the start site. /note=SD (Final) Score: Out of all of the options, the final score of -2.072 is the best (least negative) score and belongs to the selected start site. The z-score is 3.219, which is over 2 and another good indicator that we have the correct start site. /note=Gap/overlap: There is a 12 base pair gap between the upstream gene and the start of our gene. This is very reasonable, especially compared with the other start site's gap of 255 base pairs. The length of the gene is 552 base pairs, which is on the larger side but not unreasonable by any means. /note=Phamerator: As of 1/20/24 at 4:30 EST, the gene is part of pham 120341. This phage is part of sub-cluster AZ1, and there are many other phages (32) within the cluster giving a 42 percent frequency. I used phages Cassia and Adumb for comparison as they are AZ1 phages like IttyBittyPiggy. The function call on Phamerator was a major tail subunit protein, which is on the approved function list. /note=Starterator: The start site 9467 is conserved for 100/102 non-draft phage genomes within the cluster. Start number 8 corresponds to base pair start site 9467, and this start site has 100 manual annotations, making it the best option. /note=Location call: The evidence, much of which is strong (such as the Starterator output), leads me to believe that the start site is 9467. This is definitely a real gene based on the coding potential in the forward direction. /note=Function call: Evidence from NCBI BLASTp and PhagesDB BLAST points to a gene that codes for a major tail protein. The e-values from PhagesDB BLAST are very low, close to 10^-100, and the probability from NCBI BLASTp is over 90% for some hits. /note=Transmembrane domains: There are no predicted transmembrane domains, so this is most likely not a transmembrane protein. 
/note=Secondary Annotator Name: Nathan, Joseph /note=Secondary Annotator QC: I have QCed this annotation and I agree with the location call due to the convincing z-score and final score favoring this start as well as having a reasonable start codon and LORF with good coding potential. I also agree with the function call, with strong blast and HHpred hits pointing to this gene being a major tail protein. CDS 10108 - 10371 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="IttyBittyPiggy_14" /note=Original Glimmer call @bp 10108 has strength 16.19; Genemark calls start at 10108 /note=SSC: 10108-10371 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Cassia]],,NCBI, q3:s4 97.7011% 9.1043E-52 GAP: 89 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.219, -1.9310779259753799, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Cassia]],,WGH21087,96.5909,9.1043E-52 SIF-HHPRED: SIF-Syn: There is good synteny however it appears that modifications may be needed to this gene since it appears there may be a frameshift mutation due to the coding potential observed in both genes 14 and 15. A meeting with the instructional team will be held to discuss this matter. /note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Glimmer identifies the start site at 10108. GeneMark identifies the start site to be 10108. Both start sites match, indicating a greater likelihood of this being the correct start site. /note=Coding Potential: There is good coding potential observed within the first open reading frame for this gene on the GeneMarkS output report and GeneMarkHmm. This is supportive evidence that this gene is real and the coding potential on the forward confirms this is a forward gene. After discussion with the instructional team, revisions may need to be made to the gene itself with DNA master. 
A meeting will be held with the instructional team on Tuesday 1/23 to modify this gene. /note=SD (Final) Score: The SD (final) score is -1.931, the best score observed, and this start has more supporting evidence than the other candidates. The z-score is 3.219, which is the best of those called. /note=Gap/overlap: This gene has an 89 bp gap with the upstream gene. This gap was investigated and there is minimal coding potential observed within this gap. The instructional team will be reviewing this. We suspect that there may be a translational frameshift. /note=Phamerator: #133703 observed on 1/19/24. This pham has many members, predominantly from cluster AZ. Some of the genes within this pham come from the phages Adolin and Amyev. Many of the genes within this pham are tail assembly chaperones. /note=Starterator: 1/19/24 The start site that was called the most was start 9, which occurred 1 of 82 times in the AZ cluster. There are no non-draft phage annotations that were called. /note=Location call: Based on the above evidence, it appears this is a real gene and the start site is 10108. /note=Function call: Based on the presented evidence, the functional call for this gene is tail assembly chaperone. There are multiple significant e-values demonstrated by PhagesDB blastp data for tail assembly chaperone. There are several NCBI hits indicative of tail assembly chaperones but no CDD hits. The HHPRED data demonstrates evidence for the tail assembly chaperone. /note=Transmembrane domains: There are no transmembrane domains observed. All the graph data supports inside signals, indicating that this gene is not a transmembrane component. /note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: I agree with the above evidence presented. All categories have been thoroughly considered. 
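The suspected frameshift can be sanity-checked with simple coordinate arithmetic. The sketch below is illustrative only and is not part of the PECAAN or DNA Master output: the helper names are hypothetical, and the coordinates are copied from the gene 14 record (10108 - 10371) and the gene 15 record (join(10108..10365,10365..10706)). It assumes 1-based inclusive coordinates, so a segment's length is stop - start + 1, and it checks that each feature length is a multiple of three.

```python
# Illustrative sketch; helper names are hypothetical, coordinates come from the records.
# Assumes 1-based inclusive coordinates, as used in the CDS lines of this file.

def segment_length(start, stop):
    """Length in bp of one inclusive segment."""
    return stop - start + 1

def feature_length(segments):
    """Total length in bp of a CDS given as a list of (start, stop) segments."""
    return sum(segment_length(start, stop) for start, stop in segments)

GENE_14 = [(10108, 10371)]                  # CDS 10108 - 10371
GENE_15 = [(10108, 10365), (10365, 10706)]  # CDS join(10108..10365,10365..10706)

for name, segments in (("gene 14", GENE_14), ("gene 15", GENE_15)):
    length = feature_length(segments)
    print(name, length, "bp; multiple of 3:", length % 3 == 0)
```

Base 10365 appears in both segments of the join, so it is counted twice, which is consistent with a -1 slip in which one base is read twice.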
CDS join(10108..10365,10365..10706) /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="IttyBittyPiggy_15" /note= /note=SSC: 10108-10706 CP: yes SCS: neither ST: NA BLAST-Start: [tail assembly chaperone [Arthrobacter phage Yang] ],,NCBI, q3:s5 98.995% 3.32329E-130 GAP: -264 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.219, -1.9310779259753799, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Yang] ],,YP_009815632,94.5274,3.32329E-130 SIF-HHPRED: SIF-Syn: This gene has synteny with Cluster AZ1 phages Adolin and Berrie. The major tail protein is upstream and the tape measure protein is downstream. /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: This gene was manually added by Dr. Freise. There was strong coding potential in the area and evidence of a frameshift commonly seen within the tail assembly chaperone. The gene was assigned to pham #134004 as of 01/23/24; however, this pham only contains draft genomes. The selected start site at 10108 has the best z-score and final score available, 3.219 and -1.931, respectively. The coding potential starts in the first ORF and shifts to the third ORF. /note=Location call: The start site is 10108, based on the above evidence. /note=Function call: Tail Assembly Chaperone. There are several significant non-draft BLASTp hits that have called the function as Tail Assembly Chaperone, the top two non-draft hits being from phages Yang and Cassia with E values of -48 and -46, respectively. Yang and Cassia are also the top two hits on NCBI BLAST, with E values of -58 and -56. There were no CDD hits. There were two hits on HHPred with significant E values, but the top hit was a domain of unknown function and the second top hit was a Bacteriophage Gp15 protein without a listed function. /note=Transmembrane domains: There are no TMDs, so this is not a membrane protein. 
/note=Secondary Annotator Name: Ryan Laureano /note=Secondary Annotator QC: I agree with this annotation. CDS 10723 - 13035 /gene="16" /product="gp16" /function="tape measure protein" /locus tag="IttyBittyPiggy_16" /note=Original Glimmer call @bp 10723 has strength 10.93; Genemark calls start at 10723 /note=SSC: 10723-13035 CP: yes SCS: both ST: SS BLAST-Start: [tail length tape measure protein [Arthrobacter phage Crewmate] ],,NCBI, q1:s1 100.0% 0.0 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -2.86135216970947, yes F: tape measure protein SIF-BLAST: ,,[tail length tape measure protein [Arthrobacter phage Crewmate] ],,YP_010678268,91.9271,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BF,78.1818,99.6 SIF-Syn: Tape measure protein, upstream gene is tail assembly chaperone, downstream is minor tail protein, just like in Adolin (AZ1) and DrManhattan (AZ1) /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: Both Glimmer and Genemark call the start site at 10,723 /note=Coding Potential: Coding potential in host and self genemark covers the length of the entire gene in the forward direction only /note=SD (Final) Score: -2.861. Best final score, very good z-score of 3.209. /note=Gap/overlap: There is a 16 base pair gap. This is reasonable. /note=Phamerator: Pham 133481 as of 1/13/24. This gene is conserved and is found in 233/286 non-draft phages in clusters AZ, EE, AK mainly. Synteny. /note=Starterator: 404 error report. Server numbers do not match, so therefore there is no Starterator analysis yet as of 1/13/24. Update; Starterator report as of 1/23/24. Start site 6. Found in 20.7% of genes in this pham. 40/2238 manual annotations have been made of this start site, mainly those in the AZ cluster. /note=Location call: The start site of this gene is very likely 10,723 /note=Function call: Tape Measure Protein. 
Blastp had Obitoo (e=0.0) and Crewmate (e=0.0) as best hits with TMP function. NCBI Blastp had many hits, top two being Crewmate (e=0.0, 84% identity, 100% query cover) and Adumb2043 (e=0.0, 83% identity, 95% query cover) for TMP function. CDD had TMP_3 superfamily (8.74 e-32) as a specific hit. HHpred had hit for 6V8I_BF (e=4.6e-16, 99% probability) for tape measure protein. /note=Transmembrane domains: None /note=Secondary Annotator Name: Ryan, Kaitlin /note=Secondary Annotator QC: I agree with all of the evidence presented above and see no changes to be made at this time. The functional call is supported by significant and relevant evidence, and the location call is accurate based on supporting evidence. CDS 13035 - 14447 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="IttyBittyPiggy_17" /note=Original Glimmer call @bp 13092 has strength 9.41; Genemark calls start at 13035 /note=SSC: 13035-14447 CP: no SCS: both-gm ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.899, -3.2535385006449706, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Adolin]],,QHB36599,83.2272,0.0 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,93.8298,99.9 SIF-Syn: /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Glimmer calls a start site of 13092 and GeneMark calls a start site of 13035 which is the LORF. /note=Coding Potential: Both the host trained and self trained show coding potential that is consistent with the 13035 (GeneMark) start site. The Glimmer start site excludes a small region with coding potential. /note=SD (Final) Score: The GeneMark start site (13035) has the highest Z-score (2.899) and a Final score that is closest to zero (-3.254), relative to the other start sites. 
/note=Gap/overlap: The GeneMark start site has the smallest gap (-1) relative to the other start sites, and this is indicative of an operon. /note=Phamerator: As of 1/18/23, this gene belongs to the 88754 pham, which contains 6 members (4 final, 2 drafts). /note=Starterator: Start site 1 was manually annotated in 4/4 of the non-draft genes. Start site 1 corresponds to the start site at 13035 in IttyBittyPiggy. /note=Location call: This is a real forward gene with a start site of 13035. The coding potential, SD scores, minimal gap, and Starterator report support this start site. /note=Function call: This gene appears to be a Minor Tail Protein. NCBI Blast and PhagesDB blast show accurate hits to DrSierra, Adolin, and DrManhattan (all with e-values of 0.0), and these hits support the minor tail protein function call. HHpred obtained several hits which suggest the function is that of a distal tail protein. CDD obtained less accurate hits. /note=Transmembrane domains: TmHmm suggests that there are zero TMDs. /note=Secondary Annotator Name: SANCHEZ, KAYLA NICOLE /note=Secondary Annotator QC: I agree with the information provided; however, I would include that the coding potential is found in the forward regions, which makes it a forward gene. I would also include that the better final scores and z scores are associated with the start provided by GeneMark. I would include that the pham is conserved in other phages like DrSierra. Start site 3 is correlated to 13035. I would not check off the third HHpred hit because it has a very low coverage percentage (25%). Also don't forget to check off the coding capacity box. Also fill out synteny box. 
CDS 14441 - 15652 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="IttyBittyPiggy_18" /note=Original Glimmer call @bp 14438 has strength 12.15; Genemark calls start at 14438 /note=SSC: 14441-15652 CP: no SCS: both-cs ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: -7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.065, -4.697746622201849, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Adolin]],,QHB36600,90.0744,0.0 SIF-HHPRED: SIF-Syn: This gene has synteny with DrManhattan, Adolin, and more phages in the same cluster. /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Glimmer and GeneMark both call the start site at 14438. The start codon at this position is ATG. /note=Coding Potential: There is reasonable coding potential in the second ORF on the Host-Trained GeneMark. The putative ORF does cover all coding potential. There is also reasonable coding potential on the Self-Trained GeneMark from PhagesDB. /note=SD (Final) Score: With a final score of -4.347, it is not the best SD score, but is reasonable enough to be a possible ribosome binding site. The Z score of the site is 2.065. /note=Gap/overlap: This start site has the longest open reading frame, and an overlap of 10 base pairs. The length is acceptable. While the start site at 14441 may have a smaller overlap and a similar length, the start site of GTG is unlikely compared to the ATG at 14438. /note=Phamerator: Pham 133656, Date: 1/17/24. This gene is conserved; it is found in several AZ and AK Cluster phages such as Adolin and Albanese. /note=Starterator: This pham has 91 non-draft members. Start site 5 in Starterator was found in 97/98 genes in the pham(including draft genomes), and manually called 83.5% (76/91) when present in non-draft genomes. Start 5 has coordinates (5, 14438). This leads me to believe that start site 5 at 14438 is the most conserved and most likely true start site. 
/note=Location call: Based on the above reasons, this is a real gene and the most likely start site is at 14438. /note=Function call: Several PhagesDB Blast hits suggest the function to be a minor tail protein compared to phages like DrManhattan and Adolin with e-values of 0 and the same cluster and pham as IttyBittyPiggy. HHPred did not seem to show any collagen or glycine rich domains, but the hits seemed to point to it being a tail protein. However, NCBI Blast had several hits calling it a minor tail protein with low e-values of 0, which satisfies the requirement for the mention of “minor tail protein” as part of its functional assignment. This leads me to believe that this is not part of the synthetic domain of the minor tail protein, but is still part of the minor tail protein. /note=Transmembrane domains: DeepTmHmm does not predict any trans-membrane domains. /note=Secondary Annotator Name: Hon, Darren /note=Secondary Annotator QC: There is good coding potential present in the selected ORF. Final score and z-score are both valid values. It is noted that this is the LORF. The gap of 10 is reasonable. As of 1/23/24, the gene is conserved and found in pham 133656. This is called in both phamerator and starterator, and 14438 is likely the true start site. According to NCBI Blast results, there is evidence to suggest it being a minor tail protein. The hits from HHPred support this claim, having strong hits that call it the tail protein. Therefore, the function is correctly noted as the minor tail protein. The only thing missing is marking the evidence for Phagesdb BLAST. 
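The gap and overlap figures quoted in these notes can be reproduced from the CDS coordinates alone. The following is a minimal sketch under the assumption that the gap is computed as start minus the upstream stop minus 1 on 1-based inclusive coordinates, with negative values meaning overlap; the function name `gap_bp` is hypothetical and not a PECAAN function. The coordinates are taken from the records above.

```python
# Illustrative sketch; gap_bp is a hypothetical helper, not part of PECAAN.
# Assumption: gap = start - upstream_stop - 1, so negative values are overlaps.

def gap_bp(upstream_stop, start):
    """Bases between the upstream stop and this start (negative means overlap)."""
    return start - upstream_stop - 1

GENE_17_STOP = 14447  # CDS 13035 - 14447

# Two candidate starts discussed for gene 18
print(gap_bp(GENE_17_STOP, 14438))  # -10: the 10 bp overlap described for 14438
print(gap_bp(GENE_17_STOP, 14441))  # -7: matches "GAP: -7 bp gap" for 14441

# Other adjacent pairs in this section
print(gap_bp(9454, 9467))    # 12: gene 12 stop to gene 13 start
print(gap_bp(16660, 16657))  # -4: gene 19 stop to gene 20 start
```

Under this convention, genes 11 and 12 share base 9041, so the reported gap is -1.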
CDS 15653 - 16660 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="IttyBittyPiggy_19" /note=Original Glimmer call @bp 15653 has strength 11.92; Genemark calls start at 15653 /note=SSC: 15653-16660 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Wildwest]],,NCBI, q1:s1 100.0% 0.0 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.466, -3.649482460397077, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Wildwest]],,WNO26039,88.0239,0.0 SIF-HHPRED: Putative tail fiber protein; Complex, VIRAL PROTEIN; 3.8A {Ralstonia phage GP4},,,8JOV_4,37.0149,96.8 SIF-Syn: phams 133656 appear before and pham 88988 appear after for this genome and Adolin (AZ1) as well as WildWest (AZ1) /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Both GeneMark and Glimmer call the start site at 15653, which also has the best Z-score and final score of the start sites present. /note=Coding Potential: the coding potential from the Host-trained covers the gene completely but the Self-trained GeneMark seems like there is some reverse coding potential in the 16000 region which raises some suspicion. /note=SD (Final) Score: -3.649 which is the best of the bunch /note=Gap/overlap: 0 /note=Phamerator: Phamerator shows that most of the genes in this pham have similar lengths at around 1000 bp /note=Starterator: Starterator shows that other AZ1 phages have the same start 6 as this gene, showing 11 of the 13 being called manually. /note=Location call: I agree with the auto annotation site of 15653 for this gene due to the Z-score and Final score being the best, the coding potential being complete through the gene, and synteny observed with other AZ1 phages with Adolin being a clear example. 
/note=Function call: HHPred shows a high probability that this is a minor tail protein, and NCBI shows high coverage and alignment for minor tail proteins from other phages, including AZ1 phage WildWest. /note=Transmembrane domains: No transmembrane domains present. /note=Secondary Annotator Name: Claire, Monjov /note=Secondary Annotator QC: The evidence agrees with what the primary annotator stated. The start site is appropriate along with the function call. I agree with the primary annotator's conclusions. CDS 16657 - 17733 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="IttyBittyPiggy_20" /note=Original Glimmer call @bp 16657 has strength 11.0; Genemark calls start at 16657 /note=SSC: 16657-17733 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Wildwest]],,NCBI, q1:s1 100.0% 9.3265E-144 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Wildwest]],,WNO26040,76.9444,9.3265E-144 SIF-HHPRED: SIF-Syn: minor tail protein, upstream gene is minor tail protein just like in phage Adolin and DrManhattan. /note=Primary Annotator Name: Indiresan, Neeti /note=Auto-annotation: Both Glimmer and Genemark, start site 16657, start codon ATG /note=Coding Potential: The gene shows coding potential in the forward direction according to both host and self. The chosen start site covers this coding potential. /note=SD (Final) Score: -2.505. It is the best RBS final score on PECAAN. /note=Gap/overlap: 4 bp overlap. This indicates that the gene may be part of an operon, which is further supported by the fact that the overlap sequence is ATGA. The 4 bp overlap is also conserved in other AZ1 cluster genomes such as in phage Adolin. /note=Phamerator: pham: 88988. Date 01/18/2024. It is conserved, found in Adolin (AZ) and DrManhattan (AZ). 
/note=Starterator: Start site 1 in starterator was manually annotated in 4/4 non-draft genes in this pham. Start site 1 is 16657 in IttyBittyPiggy. This evidence agrees with the start site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 16657. /note=Function call: Minor tail protein. All non-draft phagesDB BLASTp hits had the function minor tail protein (e-values < 10^-12). The top 2 NCBI BLASTp hits had the function of minor tail protein (coverage 57-100%, identity > 63%, e-value < 10^-97). There were no informative hits in HHPred and no hits in CDD. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Labib, Youstina /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. No changes needed. CDS 17803 - 18075 /gene="21" /product="gp21" /function="membrane protein" /locus tag="IttyBittyPiggy_21" /note=Original Glimmer call @bp 17803 has strength 10.05; Genemark calls start at 17803 /note=SSC: 17803-18075 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 9.42775E-51 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.112, -2.2364030944336752, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Cassia]],,WGH21095,93.3333,9.42775E-51 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is in pham #88988 and the downstream gene is in pham #132106. This is just like the phage Adolin, another member of the AZ1 cluster. /note=Primary Annotator Name: Kalliomaa, Kira /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 17803. /note=Coding Potential:Coding potential in this Open Reading Frame (ORF) is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and GeneMark Host. 
/note=SD (Final) Score: -2.236. This is the best final score on PECAAN. /note=Gap/overlap: 69 bp. This is a somewhat large gap but is reasonable because other phages (Adolin and DrManhattan) conserve the gap. There is also no coding potential in the gap that could indicate a new gene. /note=Phamerator: 132066. Date Accessed: 01/17/2024. It is conserved; found in Adolin, Lego, and YesChef (AZ1). /note=Starterator: Start site 14 in Starterator was manually annotated in 13/21 non-draft genes in this pham (132066). Start 14 is 17803 in IttyBittyPiggy. This evidence agrees with the site predicted in both Glimmer and Genemark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 17803. /note=Function call: Multiple hits on NCBI Blastp have the suggested function membrane protein, with e-values ranging from 9.42775e-51 to 1.10262e-46, identity from 88.764% to 88.8889%, and coverage of 100%. This is also supported by two different programs, Deep TMHMM and SOSUI, predicting one membrane domain. /note=Transmembrane domains: 1; Deep TMHMM predicts just one TMD. SOSUI also predicts one membrane domain. Based on this evidence, this gene can be assumed to have one real TMD and therefore is a membrane protein. /note= /note=Secondary Annotator Name: Woodward, Lauren /note=Secondary Annotator QC: I agree with both the location call and the function call, based on the logic presented above. Make sure to fill out the synteny box since this gene has an assigned function. 
CDS 18085 - 18336 /gene="22" /product="gp22" /function="membrane protein" /locus tag="IttyBittyPiggy_22" /note=Original Glimmer call @bp 18085 has strength 14.59; Genemark calls start at 18085 /note=SSC: 18085-18336 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Nitro]],,NCBI, q4:s3 96.3855% 6.66904E-41 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.309, -3.8366550365551095, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Nitro]],,WNN93979,92.6829,6.66904E-41 SIF-HHPRED: SIF-Syn: Membrane protein, upstream gene is in pham 132066, downstream gene is an endolysin in pham 124466, just like in non-draft AZ1 phage Nitro /note=Primary Annotator Name: Potter, Sofia /note=Auto-annotation: Glimmer and GeneMark both call the start at 18085. ATG is the start codon for this site. /note=Coding Potential: Both GeneMark Self and GeneMark Host show consistent, significant coding potential within the forward strand between the predicted start and stop sites for this gene, with no overlap and no coding potential in the reverse strand. /note=SD (Final) Score: The final score for the selected start site is -3.837, which is by far the most positive of the options on PECAAN. /note=Gap/overlap: The selected start site has a gap of 9 bp, which is highly preferable against the two other potential start sites` gaps of 60 and 168. /note=Phamerator: As of January 19, 2024, this gene is in pham 132106. This pham is widely conserved among non-draft members of AZ1, including but not limited to phages Adolin, Ascela, Cassia, Elezi, Eraser, Powerpuff, Tuck, and YesChef. /note=Starterator: Of the 38 non-draft members of cluster AZ1, start site 8 was called in 26 of these phages, making it the most annotated start. Start site 8 corresponds to 18085 bp in IttyBittyPiggy. /note=Location call: All aforementioned evidence supports this being a real gene, with a start site at 18085 bp. 
There is agreement among GeneMark, Glimmer, and Starterator for this start site. /note=Function call: CDD produced no hits on the protein sequence for this gene, and HHPred returned some results matching phage holins, but with extremely high e-values; the lowest e-value was 24. Top PhagesDB BLAST non-draft hits are for proteins marked as "function unknown." NCBI BLASTp returned extremely close hits (e values of 7e-41, 5e-39, and 1e-38, query covers from 96-100%, percent identity 83-91%) for membrane proteins in fellow AZ1 phages Nitro, Amyev, Elezi, and Crewmate. /note=Transmembrane domains: DeepTMHMM predicts two transmembrane domains, the first being 21 amino acids long and the second being 19 amino acids long. This fits the SEA-PHAGES criteria to call this gene as a membrane protein, given that there is at least one TMD, and both TMDs are between 17-22 amino acids long. /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: I agree with this annotation. The Starterator and Phamerator reports are very strong, and coding potential is strong with no potential in the reverse direction. Transmembrane domains meet SEA-PHAGES qualifications. CDS 18356 - 19831 /gene="23" /product="gp23" /function="endolysin" /locus tag="IttyBittyPiggy_23" /note=Original Glimmer call @bp 18356 has strength 10.28; Genemark calls start at 18356 /note=SSC: 18356-19831 CP: yes SCS: both ST: NI BLAST-Start: [endolysin [Arthrobacter phage London] ],,NCBI, q1:s1 99.1853% 0.0 GAP: 19 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.209, -2.4811409279978642, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage London] ],,QOP64328,84.3373,0.0 SIF-HHPRED: Endolysin; amidase-2 domain, HYDROLASE; HET: ZN; 2.27A {Staphylococcus phage GH15},,,4OLS_A,39.7149,99.2 SIF-Syn: There is synteny with phage Amyev. It is a final draft phage. This gene is also called as an endolysin. 
/note=Primary Annotator Name: Salinas, Juan Carlos /note=Auto-annotation: Both Glimmer and GeneMark call this start site at 18356. This makes sense given that it has the most favorable Z-score (3.209) and final score (-2.481). At this start site, this gene is also the LORF and has the start codon ATG, both of which are favorable characteristics. /note=Coding Potential: There is strong coding potential in the forward direction only. This is evidence that this gene is well supported. The entire ORF at this start site covers the coding potential. /note=SD (Final) Score: -2.481. This is the highest final score and the most favorable, suggesting this is the best sequence match. The Z-score is 3.209. /note=Gap/overlap: 19bp. The gap at this start site is reasonable given that no gene likely fits in between. This start site gives the smallest gap of all the predictions on PECAAN. /note=Phamerator: 124466 as of 1/22/2024. There are 28 members in this pham, 19 of which are non-drafts. This gene is conserved across all members, primarily of clusters AZ and FC. The approximate length ranges from 1401 - 1671bp long. /note=Starterator: Start site 8 was manually annotated in 16/19 of the non-draft genes. Start site 8 corresponds to the start site at 18356 in IttyBittyPiggy. /note=Location call: Considering all of the evidence available, I would confidently call this start site at 18356. This makes sense given the strong coding potential within the ORF. Additionally, it has the most convincing final score and z-score. Moreover, it reduces the gap the most. Starterator provided additional convincing information, revealing that this start site is conserved across many other non-draft genes such as those in Adolin, DrManhattan, and Tallboi. /note=Function call: Endolysin. BLAST provided many phages whose function has been called as an endolysin at this gene with an e-value of 0. Additionally, the syntenic gene in Amyev has also been called an endolysin. 
HHPRED had many significant hits. The most convincing is a call as a peptidoglycan recognition protein with e-value 5.4e-14 and probability 99.58, among others. /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm. /note=Secondary Annotator Name: Chamorro, Marco /note=Secondary Annotator QC: I agree with the primary annotator's location call and function call. The only thing that needs to be fixed is the Phamerator section because the date is missing. CDS 19968 - 20585 /gene="24" /product="gp24" /function="deoxynucleoside monophosphate kinase" /locus tag="IttyBittyPiggy_24" /note=Original Glimmer call @bp 19968 has strength 17.16; Genemark calls start at 19968 /note=SSC: 19968-20585 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 1.19529E-139 GAP: 136 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.442961286954254, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage Cassia]],,WGH21097,97.561,1.19529E-139 SIF-HHPRED: DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE; TRANSFERASE, PHOSPHOTRANSFERASE; HET: DGP, OCS; 2.0A {Enterobacteria phage T4} SCOP: c.37.1.1,,,1DEK_A,92.6829,99.8 SIF-Syn: Synteny expressed with Adolin, Amyev, and Adumb2043, all of which are nondraft phage. Deoxynucleoside monophosphate kinase, upstream gene is endolysin, downstream is NKF , just like in phage Amyev. /note=Primary Annotator Name: Zaragoza, Evelin /note=Auto-annotation: GeneMark and Glimmer. Both call the start at 19968, which is reasonable because it is the LORF and the start site has a large Z-score (statistical analysis of all final scores of that gene) of 2.975, where a more positive number above 2 is better. The final score is the highest at -2.443, which suggests the best sequence match to the Shine-Dalgarno sequence (likelihood ribosome binds to it). The start codon, ATG, is common. Length is acceptable at 618 bp. 
/note=Coding Potential: Gene likely exists as it is greater than 120bp with this start site. Chosen start site (19968) has most of the coding potential (in Self GeneMark and Host GeneMark) in direct sequence with the ORF showing reasonable potential. Does not overlap with surrounding genes. No significant overlapping coding potential on the complementary sequence. /note=SD (Final) Score: Final score is -2.443 with the z-score being 2.975. The z-score being over 2 is favorable and the final score is the highest out of all the other possibilities so this suggests this start site is likely. /note=Gap/overlap: There is a large gap with the gene to the left of 136 bp. Such a gap is conserved in other phage such as Adumb2043. However, there is very little coding potential to warrant the addition of a new gene, and the start site cannot be pushed back further. The high z-score and high final score support this conclusion as well. /note=Phamerator: The pham number is 97531 as of 01/05/2024. It is conserved and found in non-draft phages such as AEgle, AGrandiflora, AbbeyMikolon, Adolin, and Adumb2043, which are all part of the cluster AZ. IttyBittyPiggy is in cluster AZ and subcluster AZ1. /note=Starterator: Start site 51 in Starterator was manually annotated in 49/191 non-draft genes in pham 97531. However, IttyBittyPiggy does not have start site 51 as a possibility and thus it is not called. This evidence suggests that the site predicted by Glimmer and GeneMark (which corresponds to start site 46 in Starterator) is still the best option. Other candidate starts for IttyBittyPiggy_24 include: (134, 20352) and (156, 20469). However, these are not the best options since, as stated above, all these start sites have lower final scores and would not encompass more of the coding potential. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 19968. The gene does not need to be deleted because the coding potential indicates that there is a gene there and the start site called for this particular gene has most of the coding potential. /note=Function call: Deoxynucleoside monophosphate kinase. The top three phagesdb BLAST hits with known function have this same function (e-value <10^-90), and the top three NCBI BLAST hits also have this function (E-value<10^-115, coverage >95%, identity >80%). HHpred has several significant hits for deoxynucleoside monophosphate kinase (>85% coverage, E-values<10^-16, probability >90%). CDD had one hit with accession number PHA02575 for deoxynucleoside monophosphate kinase (E-value 7.85211e-22 and coverage 87.3171%). /note=Transmembrane domains: DeepTMHMM predicts no TMDs. /note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I agree with the primary annotator's annotation of this gene. CDS 20700 - 21278 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="IttyBittyPiggy_25" /note=Original Glimmer call @bp 20700 has strength 11.36; Genemark calls start at 20700 /note=SSC: 20700-21278 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp25 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 5.55422E-117 GAP: 114 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.302, -6.595517435460291, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp25 [Arthrobacter phage Yang] ],,YP_009815643,94.2708,5.55422E-117 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kibria, Kamille /note=Auto-annotation: Glimmer and GeneMark both call the start site at 20700. /note=Coding Potential: Self-trained and Host-trained GeneMark show reasonable coding potential, consistent with the start site predicted by GeneMark and Glimmer. All of the coding potential is contained within the start and stop site. 
/note=SD (Final) Score: -6.596. Not the best score among the candidates but is still reasonable. /note=Gap/overlap: 114. Smallest gap amongst candidates but still a large gap. Coding potential maps do not suggest to add a gene upstream. /note=Phamerator: Pham 1819 as of 1/18/24. There are 78 members in this pham. Conserved in the AZ and EH cluster. Length is generally between 400-600 bp. /note=Starterator: The start number called most often is 22. This was called by IttyBittyPiggy, corresponding to the start site at 20700. It was manually annotated in 38 of the 53 non draft genes in the pham. /note=Location call: The gene is real with a likely start site at 20700. /note=Function call: NKF. From PhagesDB Blast, significant hits were generated, providing good evidence of protein sequence similarity. Yang_25, Cassia_25, Iter_25 had sufficient identity and e values. These gene products were also part of the same pham and cluster. These were all “no known function.” NCBI pBLAST corroborated the significant hits, with 100% query cover and over 90% alignment in Yang_25 and Cassia_25. Identities and e values were more than sufficient to use as evidence. Significant hits were “hypothetical proteins.” CDD generated no hits at all. No significant hits in HHPRED (e values >100). There is also synteny with Yang, Cassia, Crewmate, and Iter. /note=Transmembrane domains: No TMHs predicted in TmHmm so not a membrane protein. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: From reviewing the evidence I agree with the primary annotators findings that the start site is 20700 based off of autoannotation and the synteny it shares with the phams around it. also there is not enough evidence for any function and the hits coming up seem to align with NKF that other phages have for similar nucleotide sequences. 
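The GAP values recorded in the records above are consistent with taking the start of a gene, subtracting the stop of the upstream gene, and subtracting one more, so that a negative result means an overlap. A minimal Python sketch of that arithmetic, assuming inclusive 1-based coordinates on the forward strand; the helper name `gap_bp` is hypothetical and is not part of PECAAN or DNA Master:

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases strictly between two forward genes; a negative value is an overlap."""
    return start - upstream_stop - 1


# Coordinates copied from the CDS lines above (hypothetical helper, real data).
print(gap_bp(19831, 19968))  # gene 23 stop -> gene 24 start
print(gap_bp(20585, 20700))  # gene 24 stop -> gene 25 start
```

These two calls reproduce the 136 bp and 114 bp gaps recorded for genes 24 and 25.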
CDS 21487 - 22314 /gene="26" /product="gp26" /function="exonuclease" /locus tag="IttyBittyPiggy_26" /note=Original Glimmer call @bp 21487 has strength 15.29; Genemark calls start at 21502 /note=SSC: 21487-22314 CP: yes SCS: both-gl ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 0.0 GAP: 208 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.054, -2.2763497933341483, yes F: exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Amyev] ],,YP_010677730,99.2727,0.0 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_C,84.0,99.8 SIF-Syn: Exonuclease, upstream gene has no known function (just like in phages Amyev and Lego), downstream gene is HNH endonuclease (no synteny in any other phage directly after this exonuclease gene). However, the gene directly downstream of the HNH endonuclease gene is a nucleoside deoxyribosyltransferase (function most prevalent in AZ1 phages directly after the exonuclease gene). /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: Both Glimmer and GeneMark agree on a start codon of GTG but disagree on the start site (Glimmer calls bp 21487 while GeneMark calls bp 21502). /note=Coding Potential: Gene has great coding potential within the putative ORF, as indicated by Self and Host GeneMark maps with high levels of typical coding potential. The start site does cover all of the coding potential, as it is placed to the left of the beginning of the high peaks of the coding potential and the stop site is placed to the right on the maps. /note=SD (Final) Score: The original start site (@ bp 21487) has the best (least negative) SD score at -2.276 and has the highest Z-score (3.054). 
/note=Gap/overlap: The gap between this gene's original start site (@ bp 21487) and the preceding one is 208 bp, which isn't very reasonable, but isn't indicative of any missing gene in its place, as there is no coding potential in any forward ORF in the gap. This proposed start site also creates the longest reasonable ORF of all the options and the length is acceptable (828 bp). /note=Phamerator: As of 01/13/2024, this gene is found in pham 130576. Some of the members present in this pham are also part of the AZ1 cluster. Some of these phages include Berrie, Cassia, DrManhattan, DrSierra, JohnDoe, and Pumpkins. /note=Starterator: There is a reasonable start site for which the genes in this pham (130576) are conserved at start site 43 (which is at bp 21487 for IttyBittyPiggy_26). There are 127 non-draft members in this pham and an additional 38 draft members. Of the 127 non-draft members, 53 of them call start site 43. The Starterator program is informative, as few members in this pham are drafts and it provides the number of manual annotations for which the other non-draft genes have called start site 43. /note=Location call: The gathered evidence suggests that this is a real gene with the original start site @ bp 21487 being correct due to its complete encompassing of the coding potential, its creation of the longest reasonable ORF of all the start site candidates, its ideal RBS and Z-score statistics, and its consistency with the Starterator report. /note=Function call: Predicted function is exonuclease, based on several hits from PhagesDB with e-values around 1e-156 and several hits from NCBI blastp with 100% query coverage and 97% sequence identity and e-values around 0 (Amyev_27 and Lego_26). HHpred also had 2 hits with >80% coverage, and e-values less than 1e-15. CDD returned no conserved domains for this gene. /note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. 
/note=Secondary Annotator Name: Indiresan, Neeti /note=Secondary Annotator QC: I agree with your location call. I don`t think that the gap is an issue, as it seems to be conserved in other members of the cluster. I agree with your function call but I think that there is better evidence from HHPred that you can check instead. All boxes are filled out and evidence is checked. CDS 22317 - 22682 /gene="27" /product="gp27" /function="HNH endonuclease" /locus tag="IttyBittyPiggy_27" /note=Original Glimmer call @bp 22317 has strength 10.23; Genemark calls start at 22317 /note=SSC: 22317-22682 CP: no SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage VroomVroom]],,NCBI, q1:s2 100.0% 6.40116E-44 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.802, -3.90505122954242, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage VroomVroom]],,WIC90180,72.4409,6.40116E-44 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is identified as 22317 by both GeneMark and Glimmer. /note=Coding Potential: Based on Host-trained GeneMark and Self-Trained GeneMark there is coding potential found in the forward direction by both systems because the region of coding falls between the predicted start site and the stop site. /note=SD (Final) Score: The Z score is 2.802 and the Final Score is -3.905. This is the highest Z score and the least negative Final score. /note=Gap/overlap: The gap is 2bp. /note=Phamerator: This gene is in pham 133563. Other members of the cluster AZ1 are present in this pham including AEgle, Community, and Joemato. /note=Starterator: On 1/17/23 the pham is 133563. There are 156 members in this pham. This gene contains start site 40 which is also present by 2 other members in the pham and called 50% of the time it was present. The gene does not contain the most annotated start site for this pham. 
/note=Location call: Start site is likely 22317 /note=Function call: HNH Endonuclease. The highest ranked phages in HHPred have the function of HNH endonuclease with an e value of 6.9e-19 at 99.8% probability. In NCBI BLAST the top phages also call HNH endonuclease protein with 59.84% identity. The CDD hits also have HNH endonuclease with an e-value of 3.56351e-11 at 47.82% identity. /note=Transmembrane domains: None. This makes sense because according to HHpred and CDD this is an HNH Endonuclease protein. These proteins would not have TMRs. /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: I agree with this location and functional call, however some of the evidence categories need to be interpreted/corrected: /note=>Gap: as part of the annotation manual it says to not just list a value, but to interpret it. So you would want to add that the 2bp gap is acceptable because it is below the recommended 50bp limit. /note=>Phamerator, you would want to have the date you accessed Phamerator, as the pham numbers frequently change over time. I am assuming that the genes you have listed are the ones that were used for comparison/are conserved? /note=>Starterator: I would double check this note, because it was mentioned that Starterator did not have an agreeable start site with what was manually annotated for this pham, but you checked off on the top "starterator box" that it is the suggested start site. So I would double check that area and adjust the box to fit what your notes say. /note=>location call: because we have to interpret what we write, you would want to say that based on the evidence gathered, the likely start site is : ____. vs. just saying the start site without any interpretation. /note=>Function call: Also make sure you check the boxes on PECAAN that you are using as evidence. Since you are using HHpred, CDD and NCBI Blast, make sure to check those boxes as functional evidence. 
Finally, make sure that the synteny box is filled out since you`ve called a function! CDS 22679 - 23044 /gene="28" /product="gp28" /function="nucleoside deoxyribosyltransferase" /locus tag="IttyBittyPiggy_28" /note=Original Glimmer call @bp 22679 has strength 7.33; Genemark calls start at 22703 /note=SSC: 22679-23044 CP: yes SCS: both-gl ST: SS BLAST-Start: [nucleoside deoxyribosyltransferase [Arthrobacter phage Elezi]],,NCBI, q1:s1 100.0% 7.59474E-64 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.207, -4.190451406743478, yes F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside deoxyribosyltransferase [Arthrobacter phage Elezi]],,QNJ56529,93.4426,7.59474E-64 SIF-HHPRED: SIF-Syn: Nucleoside deoxyribosyltransferase, upstream gene is HNH endonuclease, downstream gene is LAGLIDADG, shares synteny with Adumb2043 /note=Auto-annotation: Glimmer calls the start as 22679 which has a start codon of GTG and is the LORF, but GeneMark calls it at 22703 with ATG as the start codon. /note=Coding Potential: There is strong coding potential for the range with the start site called by GeneMark on both Host and Self Trained, and a little coding potential at the start site called by Glimmer on the Self Trained GeneMark. /note=SD (Final) Score: The selected suggested gene has the least negative final score (-4.190) and the highest Z-score (2.207). /note=Gap/overlap: The selected gene which starts at 22679 has an overlap of -4 which is indicative of an operon and is the LORF. /note=Phamerator: As of 1/15/24, Phamerator calls the pham number as 131745. There are 348 members in this pham from clusters such as E, AS, AZ, etc. Examples include finalized genes such as those found in Abidatro (AS) and Adolin (AZ). /note=Starterator: As of 1/15/24, Starterator calls the pham number as 131745. The pham has 348 members, 49 of which are drafts. IBP does not have the most annotated start site and instead calls start 33 (start@22679) which is found in 7.8% of genes in this pham. 
/note=Location call: There is strong evidence for the realness of this gene and that it starts at 22679. There is strong synteny with Ascela and Cassia. Additionally, all of the coding potential seems to be contained between this range. /note=Function call: There is extremely compelling evidence for the function of this gene to be a nucleoside deoxyribosyltransferase based on hits found in both PhagesDB and NCBI Blast. PhagesDB lists YesChef and Powerpuff with e-values of 5e-61 both with this function. NCBI Blast also showed strong hits with phages Elezi (100% coverage, 7e-64) and Powerpuff (98%, 9e-64) both with this function. HHPRED and CDD did not show any significant evidence. /note=Transmembrane domains: There is no evidence of any TMHs. /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: Remember to fill out the box below Pham Starterator at the top of the page, and the All GM Coding Capacity box, along with the synteny box since a specific function was called. Other than that, I agree that the start call for 22679 looks good, and the function call looks good based on synteny with similar phages, too. 
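The notes above interpret gap values in a fairly consistent way: overlaps of 1 or 4 bp are read as indicative of an operon, gaps under the 50 bp limit quoted by one secondary annotator are treated as acceptable, and larger gaps prompt a check of coding potential. A rough Python sketch of that reasoning follows; the function name `describe_gap` and the exact label strings are hypothetical, and the thresholds are taken from the wording of these notes rather than from any official tool:

```python
def describe_gap(gap: int) -> str:
    """Label a gap value roughly the way the annotation notes interpret it."""
    if gap in (-1, -4):
        # The notes treat 1 bp and 4 bp overlaps as indicative of an operon.
        return "short overlap, possibly an operon"
    if gap < 0:
        return "overlap, check alternative starts"
    if gap < 50:
        # 50 bp is the limit quoted by a secondary annotator in these notes.
        return "acceptable gap"
    return "large gap, check coding potential for a missing gene"


# Gap values recorded for genes 27, 28, and 26 in this genome.
for gene, gap in [(27, 2), (28, -4), (26, 208)]:
    print(gene, gap, describe_gap(gap))
```

A label from a sketch like this is only a prompt for the annotator; the notes still weigh it against coding potential, Starterator, and synteny.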
CDS 23044 - 23448 /gene="29" /product="gp29" /function="LAGLIDADG endonuclease" /locus tag="IttyBittyPiggy_29" /note=Original Glimmer call @bp 23044 has strength 11.9; Genemark calls start at 23044 /note=SSC: 23044-23448 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Crewmate] ],,NCBI, q1:s1 100.0% 8.58993E-90 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.493, -3.513874779476501, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Crewmate] ],,YP_010678283,99.2537,8.58993E-90 SIF-HHPRED: RRNA intron-encoded endonuclease; protein-DNA complex, LAGLIDADG, homing, endonuclease, DNA recognition, HYDROLASE-DNA COMPLEX; 2.5A {Vulcanisaeta distributa},,,3E54_A,77.6119,99.6 SIF-Syn: Gene 29 in IttyBittyPiggy aligns with gene 32 in Crewmate (cluster AZ) and both have the same function; gene 28 lines up with 31 and both are nucleoside ribosyltransferase; gene 30 lines up with 33 and both are recombination directionality factor. Gene 29 in IttyBittyPiggy aligns with gene 28 in Crewmate (cluster AZ) and both have the same function; gene 28 lines up with 27 and both are nucleoside ribosyltransferase; gene 30 lines up with 29 and both are recombination directionality factor. /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: Glimmer and Genemark both call start at 23044. /note=Coding Potential: Good coding potential is present in Self-Genemark and HostGenemark. It is present throughout the entire reading frame and only in the forward strand. /note=SD (Final) Score: -3.514 this is the largest final score among the start sites. /note=Gap/overlap: This start site has the smallest gap which is -1 which is indicative of an operon. /note=Phamerator: Pham as of 1/19/24 is 129258 and includes 78 other members mostly in the AZ cluster with some in EH and FP (23 of the other members are drafts). 
/note=Starterator: Start site number 22 is called for this gene and is the most annotated in Starterator. It was called in 39/53 non-draft genomes and is 100% of the time when present. /note=Location call: Most likely start site is 23044 since it was called by both programs and has the highest z-score (2.493) and final score (-3.514), and was most annotated in Starterator. Gene also has the smallest gap and likely start codon. /note=Function call: Most likely function is LAGLIDADG endonuclease. Good hits on phagesdb BLAST for this function with e-values ranging from 4e-75 to 3e-06. Good hits on HHpred with e-values ranging 5.3e-14 to 0.15. Many good hits on NCBI BLAST ranging from 5e-88 to 3e-09. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note= /note=Secondary Annotator Name: Salinas, Juan /note=Secondary Annotator QC: I agree with the start site called by the primary annotator. The final score and z score are the most convincing, given that this start site also has the most reasonable gap of -1. The evidence from starterator is also very favorable. I also agree with the function call given the evidence supported by HHPRED, BLAST and NCBI BLAST. 
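The Starterator notes report support for a start as a count out of the non-draft genes in the pham (for example 39/53 for this gene). Converting those counts to percentages makes records easier to compare. A small Python sketch, where `called_fraction` is a hypothetical helper and not a Starterator function:

```python
def called_fraction(called: int, non_draft: int) -> float:
    """Percent of non-draft genes whose manual annotation uses a given start."""
    if non_draft <= 0:
        raise ValueError("non_draft must be positive")
    return round(100 * called / non_draft, 1)


# Counts copied from the Starterator notes in this file.
print(called_fraction(39, 53))   # gene 29, start site 22
print(called_fraction(53, 127))  # gene 26, start site 43
```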
CDS 23578 - 24291 /gene="30" /product="gp30" /function="recombination directionality factor" /locus tag="IttyBittyPiggy_30" /note=Original Glimmer call @bp 23578 has strength 15.52; Genemark calls start at 23578 /note=SSC: 23578-24291 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 9.82461E-163 GAP: 129 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.493, -4.103700314387452, yes F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Iter] ],,URQ05018,98.3122,9.82461E-163 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.4,87.7637,100.0 SIF-Syn: Recombination directionality factor; upstream gene is pham 129258, downstream is pham 135578, just like in phages TforTroy and Tbone /note=Primary Annotator Name: Tran, Michelle /note=Auto-annotation: Both Glimmer and GeneMark predict the start site for this gene as 23578. /note=Coding Potential: The coding potential for this gene is indicated on both the host-trained and self-trained GeneMark in the assigned ORF. This coding potential is only found on the forward strand, which supports the annotation of this as a forward gene. /note=SD (Final) Score: The final score for this start site is -4.104, which is the best available final score for this gene on PECAAN. /note=Gap/overlap: There is a 129-bp gap between this gene’s start site and the stop site for the upstream gene. This is a somewhat large gap, but this is the smallest available gap (and largest available ORF) for this gene on PECAAN, and the coding potential on GeneMark does not support moving the start site elsewhere. /note=Phamerator: As of January 19, 2024, this gene is under pham 848. This is conserved in other phages in cluster AZ1, such as Amyev and TforTroy. /note=Starterator: Start site 39, which corresponds to 23578 in IttyBittyPiggy, was manually annotated in 58/121 of the non-draft genes in the pham. 
This would match with the results predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, the gene is most likely real and has a start site at 23578. /note=Function call: Recombination directionality factor. Both Phagesdb BLAST and NCBI BLAST yield extremely strong hits supporting this result (e-value <10e-125). HHpred has a strong hit supporting this (e-value: 2.9e-33), but all of the other hits are extremely weak (e-value >100). There was no available data in the Conserved Domain Database. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: I agree with this annotator`s start site and function call. They may choose to elaborate on certain areas e.g. auto-annotation, SD. GM box should be checked. CDS 24291 - 24428 /gene="31" /product="gp31" /function="membrane protein" /locus tag="IttyBittyPiggy_31" /note=Original Glimmer call @bp 24291 has strength 12.81; Genemark calls start at 24291 /note=SSC: 24291-24428 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 1.21334E-15 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.08, -4.455662434380964, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Amyev] ],,YP_010677734,88.8889,1.21334E-15 SIF-HHPRED: SIF-Syn: Comparing with phage Amyev, the aligned gene and downstream gene has no listed function, however the upstream gene is both a recombination directionality factor. /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 24291. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.456. It is the best final score on PECAAN. /note=Gap/overlap: 1 bp overlap which is very small. 
The overlap is small enough to be negligible. /note=Phamerator: pham: 135578. Date 1/22/24. It is conserved and found in DrManhattan_29 (AZ) and Tweety19_31 (AZ), in addition to many other phages from various clusters /note=Starterator: Start site 4 in Starterator was manually annotated in 31 out of 31 genes in this pham. Start 4 agrees with the site predicted by Glimmer and GeneMark 24291. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 24291. /note=Function call: Membrane Protein. The fourth and fifth phagesDB BlastP hits have the function as NKF (Amyev_31: 5e-15 and Lizalica_30: 8e-15). The top NCBI Blast hit has the function call of a membrane protein with a query cover of 100% (e-value: 1e-15 and 73.33% identity) and the second hit is also a membrane protein with a query cover of 100% (e-value: 2e-15 and 75.56% identity). Based on phagesDB and NCBI Blast, it is more likely to be a membrane protein because of the high query coverage and lower e-values. HHpred had a hit for Lipopolysaccharide assembly protein A domain, however the e-value of 35 is too high and the probability is very low at 55.26%. When running CDD, it does not produce any hits so there is no available data. /note=Transmembrane domains: DeepTMHMM predicts 1 TMD with the length of 45, therefore it is a membrane protein. /note=Secondary Annotator Name: Kibria, Kamille /note=Secondary Annotator QC: Pham as of 1/22/24 is 135578. Other than this I agree with the primary annotation. 
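Several notes in this file reason from gene length, such as the 120 bp minimum cited for gene 24 and the stated lengths of 618 bp and 828 bp. The length follows directly from the CDS coordinates. A minimal Python sketch, assuming inclusive coordinates and assuming the stop coordinate includes the stop codon, as in GenBank-style CDS features; `orf_length`, `protein_length`, and `MIN_GENE_BP` are hypothetical names:

```python
MIN_GENE_BP = 120  # minimum length cited in the gene 24 coding-potential note


def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward CDS with inclusive coordinates."""
    return stop - start + 1


def protein_length(start: int, stop: int) -> int:
    """Amino acids encoded, excluding the stop codon (assumed to be included in the CDS)."""
    length = orf_length(start, stop)
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return length // 3 - 1


# Gene 31 (24291 - 24428), the short membrane protein above.
print(orf_length(24291, 24428), protein_length(24291, 24428))
print(orf_length(24291, 24428) > MIN_GENE_BP)
```

Under these assumptions gene 31 is 138 bp and encodes 45 amino acids, so it clears the 120 bp minimum.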
CDS 24503 - 24853 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="IttyBittyPiggy_32" /note=Original Glimmer call @bp 24503 has strength 18.0; Genemark calls start at 24503 /note=SSC: 24503-24853 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_33 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 2.80601E-76 GAP: 74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_33 [Arthrobacter phage Cassia]],,WGH21106,98.2759,2.80601E-76 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: To, Nathan /note=Auto-annotation: Both Glimmer and Genemark call this start site at 24503 with start codon GTG. /note=Coding Potential:Both Genemark Self and Host show coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -2.505, this is the best final score available. /note=Gap/overlap: Gap of 74, this is slightly large but not unreasonable and the smallest gap possible, it also allows for the largest ORF. /note=Phamerator: As of 1/17/24, phamerator calls the pham 132100. This pham contains 55 members and has phages from only the AZ cluster. No function calls are present. /note=Starterator: As of 1/13/24, starterator calls the pham 132100. This pham contains 55 members, 19 of which are drafts. The selected start site is the most annotated, called 100% of the time when present. /note=Location call:The gathered evidence suggests that this is a real gene, with start site @24503. This gene has good coding potential, and does not have large gaps before or after it. The start site 24503 seems most likely due to both Glimmer and Genemark calling it and its good Z score and final score. Starterator also provides good evidence for it, with it being the most annotated site and being called 100% of the time when present. 
/note=Function call: No known function, based on multiple NCBI and PhagesDB BLASTs with unknown function and low e-values (2e-60 for phage Cassia). Neither HHpred nor CDD makes any convincing calls. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Dweik, Qaiss /note=Secondary Annotator QC: I agree with the location and function call for this gene's annotation. However, the coding capacity and Starterator drop-downs should be selected. Also, I think other evidence for the PhagesDB BLAST should be selected instead of the draft genomes (TforTroy_Draft and Pumpkins_Draft) and some evidence should be selected for the NCBI BLAST. CDS 24853 - 25107 /gene="33" /product="gp33" /function="NrdH-like glutaredoxin" /locus tag="IttyBittyPiggy_33" /note=Original Glimmer call @bp 24853 has strength 12.59; Genemark calls start at 24853 /note=SSC: 24853-25107 CP: yes SCS: both ST: SS BLAST-Start: [glutaredoxin [Arthrobacter phage Yang] ],,NCBI, q5:s9 95.2381% 2.5377E-35 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.042, -4.457038081200389, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[glutaredoxin [Arthrobacter phage Yang] ],,YP_009815651,75.0,2.5377E-35 SIF-HHPRED: SIF-Syn: Compared to phage Yang, neither the upstream nor downstream genes have a listed function, but the aligned gene has a function of NrdH-like glutaredoxin. /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Both Glimmer and GeneMark call this gene. They agree on start site 24853. /note=Coding Potential: All of the coding potential is covered by this start site. Coding potential is in the forward direction. /note=SD (Final) Score: The final score is -4.457. This is the best final score listed. /note=Gap/overlap: There is a 1 base pair overlap. Given how small this overlap is, it does not appear to provide any notable evidence for or against this gene. /note=Phamerator: As of 1/18/24, the gene belongs to pham 133401. 
There are 819 other members of this pham. /note=Starterator: The start site of 24853 is found in 0.5% of genes in this pham. There are 2 out of 741 manual annotations of this start and it is called 100% of the time when present. This start site does not provide the longest ORF. The start site with the longest ORF has a final score of -7.84 and a 256 base pair overlap. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 24853. /note=Function call: The top five non-draft hits from Phagesdb Blastp are listed as NrdH-like glutaredoxin. The top hit, Yang, has an e-value of 2E-28 and identities = 74%. The top two HHpred hits also list glutaredoxin like protein NRDH and some derivatives. The top result has an e-value of 1.6E-10. The top seven CDD hits list glutaredoxin-like protein NrdH (e-value = 1.81E-19) or a similar/related protein, including NrdH-redoxin family (e-value = 9.39E-18), and glutaredoxin-like protein (e-value = 1.25E-13). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: I agree with the start site call and the function call. Based on the evidence and auto-annotation the start site is likely 24853 and the most likely function is NrdH-like glutaredoxin. 
CDS 25104 - 25307 /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="IttyBittyPiggy_34" /note=Original Glimmer call @bp 25104 has strength 15.71; Genemark calls start at 25104 /note=SSC: 25104-25307 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD88_gp34 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 3.01007E-37 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.054, -2.417348306996335, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp34 [Arthrobacter phage Amyev] ],,YP_010677737,95.5224,3.01007E-37 SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: GeneMark and Glimmer both call the start site at 25104 and the codon is ATG. /note=Coding Potential: There is good coding potential and the suggested start site covers all of it. /note=SD (Final) Score: The suggested start site has a final score of -2.417, which is significantly better (less negative) than the other two suggested start sites, which have -6 and -7. /note=Gap/overlap: The suggested start site has a 4 bp overlap (gap of -4), which may indicate that the gene is part of an operon. This is a small and reasonable overlap compared to the other start sites, which have large overlaps at -61 and -205. /note=Phamerator: As of 1/16/24, this gene is in pham 130856. It is in cluster AZ. Some genes that share the same pham and cluster include Lego_34, Phives_36, and Yang_34. /note=Starterator: The most published annotated start number is 19 which is called in 30/32 non-draft genomes in the pham. IttyBittyPiggy calls this start number which corresponds to 25104. /note=Location call: This gene is real as it has good coding potential and good length. The start site suggested is backed by Starterator. It is not the LORF but has the best Z-score, final score, and smallest overlap. Thus the start site is at 25104. /note=Function call: PhagesDB BLAST has hits with Amyev and JohnDoe at low e-values, but both of these genes have NKF. 
These same genes result in hits with NCBI BLAST but are marked as hypothetical proteins. HHpred has no hits with low e-values. /note=Transmembrane domains: There are no recorded TMRs. /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: I have reviewed the evidence gathered above and agree with the function call of NKF. CDS 25304 - 25516 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="IttyBittyPiggy_35" /note=Original Glimmer call @bp 25304 has strength 10.5; Genemark calls start at 25304 /note=SSC: 25304-25516 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE17_gp40 [Arthrobacter phage Crewmate] ],,NCBI, q1:s1 100.0% 3.72459E-33 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.347, -3.7560019881956337, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE17_gp40 [Arthrobacter phage Crewmate] ],,YP_010678291,90.0,3.72459E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 25304. /note=Coding Potential: There is high coding potential in the forward direction as indicated by only the direct sequence on both GeneMark Host and GeneMark S outputs. /note=SD (Final) Score: -3.756. This is the best and least negative final score on PECAAN. /note=Gap/overlap: There is an overlap of 4 base pairs, suggesting that this gene is part of an operon. /note=Phamerator: Pham: 10329. Date: 1/11/2024. This gene also shows extensive synteny with phages in the same subcluster (AZ1) and pham (10329), such as Ascela, Crewmate, and Iter. /note=Starterator: Start site 6 was the most annotated in 2 out of 4 of the non-draft genes in this pham. This is also called as the most annotated start for IttyBittyPiggy with 2 manual annotations. /note=Location call: Based on the above evidence, this is a real gene and the start site is 25304. 
/note=Function call: Function unknown, based on the majority of significant non-draft hits from PhagesDB blast (phages Crewmate, ObiToo, e-values < 3e-27) and NCBI Blast (phages Crewmate, ObiToo, > 81% identity, > 87% aligned, > 94% coverage, e-values < 6.1e-32), as of 1/11/2024. HHPRED and CDD were irrelevant. /note=Transmembrane domains: Deep TMHMM does not predict any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: I agree with the function and location calls. Would maybe elaborate on the starterator section, but it is sufficient. CDS 25503 - 25862 /gene="36" /product="gp36" /function="Holliday junction resolvase" /locus tag="IttyBittyPiggy_36" /note=Original Glimmer call @bp 25503 has strength 13.01; Genemark calls start at 25503 /note=SSC: 25503-25862 CP: no SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 1.0593E-76 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.339, -3.8362837638757656, yes F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage Cassia]],,WGH21109,97.479,1.0593E-76 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: SO4, MSE; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_A,89.916,99.7 SIF-Syn: Holliday junction resolvase, upstream gene is NKF, downstream is DNA primase/helicase, just like in phage Yang. Upstream gene has different pham number to Yang. /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Both Glimmer and Genemark. Both call the start at 25503. /note=Coding Potential: Good coding potential is found both in GeneMark Self and Host. However, the start and stop sites are within the coding potential. Coding potential is in the forward region so we can say that our gene is a forward gene. 
/note=SD (Final) Score: -3.836. It is the best final score on PECAAN because it is the least negative value. /note=Gap/overlap: Overlap of 14bp. This overlap remains fairly conserved in other phages like Tbone with an overlap of 3bp. Acceptable gene length of 360bp. /note=Phamerator: pham: 135496. Date 1/11/24. It is conserved; found in Yang and KeAlii (AZ1). /note=Starterator: Start number 19 in Starterator was manually annotated in 27/28 non-draft genes in this pham. Start 19 is 25503 in IttyBittyPiggy and has 27 manual annotations. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the good coding potential and the small e-value, this gene is real and the most likely start site is 25503. Our findings also show synteny because of the similar position as well as similar sizes of genes (IttyBittyPiggy36 - 360 bp, Yang11 - 359 bp). However, this synteny is not always conserved in other phages, which could be a cause for concern or could mean an upstream gene is not accurate. /note=Function call: Holliday junction resolvase. The top three PhagesDB BLASTp non-draft hits have the function of Holliday junction resolvase (E-value <10^-56), and the top two NCBI BLAST hits are Holliday junction resolvase (90%+ identity, 100% coverage, E-value <10^-72). The first HHpred hit is also Holliday junction resolvase with a 99+% probability and e-value <10^-16. CDD did not have any domain hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. It predicts that the protein is found on the inside of the membrane. /note=Secondary Annotator Name: Tran, Michelle /note=Secondary Annotator QC: I agree with the location and function calls, but the pham has changed and the relevant information needs to be updated. 
CDS complement (25829 - 25993) /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="IttyBittyPiggy_37" /note= /note=SSC: 25993-25829 CP: no SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_OBITOO_41 [Arthrobacter phage ObiToo]],,NCBI, q1:s1 96.2963% 1.51744E-22 GAP: 201 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.906, -4.820269098574683, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBITOO_41 [Arthrobacter phage ObiToo]],,WGH21217,85.4545,1.51744E-22 SIF-HHPRED: SIF-Syn: CDS 26195 - 28678 /gene="38" /product="gp38" /function="DNA primase/helicase" /locus tag="IttyBittyPiggy_38" /note=Original Glimmer call @bp 26195 has strength 12.53; Genemark calls start at 26195 /note=SSC: 26195-28678 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage ObiToo]],,NCBI, q1:s1 100.0% 0.0 GAP: 201 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.975, -2.442961286954254, yes F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage ObiToo]],,WGH21219,95.8788,0.0 SIF-HHPRED: Putative primase C962R; polymerase, primase, PrimPol, Helicase, DNA BINDING PROTEIN; HET: ANP;{African swine fever virus BA71V},,,8IQI_C,54.5345,100.0 SIF-Syn: Synteny is present with Crewmate and Eraser, both calling the gene as a DNA primase/helicase. /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 37 (stop@28678F) /note=Coding Potential: Good coding potential is present and called by both host and self-trained GeneMark. Both Glimmer and Genemark call the start site, 26195. No switches in gene orientation are present. The length of the gene is 2484 bp (26195 to 28678). /note=SD (Final) Score: The start codon is ATG. The final score of -2.443 and z-score of 2.975 indicate a good RBS score. The called start site of 26195 is kept. /note=Gap/overlap: There is a large gap of 201 bp. However, the pham maps show that non-draft phages have the same gap. 
There is also synteny present between this and non-draft phages. Therefore, this gap is negligible. /note=Phamerator: As of 1/17/24, the gene is in pham 85082. There are a total of 155 members, 36 of which are drafts. /note=Starterator: As of 1/17/24, the most annotated start is start number 44. The start position is listed at 26195, which was also the auto-annotated start. /note=Location call: Strong coding potential is found between the start site of 26195 and stop site of 28678. The start site is supported by both Glimmer and GeneMark. Additionally, the final score of -2.443 and z-score of 2.975 are optimal values. As of 1/17/24, Phamerator and Starterator both call the start site at 26195 with start number 44. There is a gap of 201 bp, but as demonstrated by synteny in the pham maps, other non-draft genes have this gap. Therefore, this is a real gene and the start site is at 26195. /note=Function call: CDD did not indicate any hits. Blastp returned many hits with the function DNA primase/helicase, such as Crewmate and Cassia. HHPred returned many hits with DNA primase function at 100% probability, including 8IQI_C and 8APM_B, with e-values of 3.5e-35 and 1.1e-31 respectively. Therefore, it is concluded that the function of this protein is a DNA primase/helicase. /note=Transmembrane domains: DeepTMHMM predicts no TMRs, indicating that it is not a transmembrane protein. /note=Secondary Annotator Name: Kim, Abby /note=Secondary Annotator QC: I agree with this annotation. Just make sure to fill out the synteny box and be more specific about which hits and e-values support the function call. Make sure to check the boxes for the evidence below in regards to the hits. 
CDS 28689 - 28799 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="IttyBittyPiggy_39" /note=Genemark calls start at 28689 /note=SSC: 28689-28799 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQE13_gp38 [Arthrobacter phage Elezi] ],,NCBI, q1:s2 100.0% 1.02139E-12 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.219, -1.9310779259753799, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp38 [Arthrobacter phage Elezi] ],,YP_010678016,89.1892,1.02139E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation start source: There is no evidence from Glimmer and only a start site from GeneMark. GeneMark calls a start site of 28689. There is only one annotated start site that we will have to use; we don't have any more to compare to. The start codon that corresponds to the singular start site is an ATG codon. /note=Coding Potential: There is some coding potential but it doesn't completely cover the gene. It starts at the correct start site but only covers the first 60 percent or so of the gene. Not conclusive evidence in either direction. /note=SD (Final) Score: The final score is -1.931, which is relatively favorable in most cases. Since we can't compare it with other start sites, it doesn't mean much. The z-score is 3.219, which is another good score because it is over 2, and we can use that as evidence that the gene is real along with the start site being in the correct position. /note=Gap/overlap: There is a 10 base pair gap, which is not unreasonable. The gap is not large enough to leave a gene missing. The gene is 111 base pairs long, which is on the shorter side but not conclusive evidence to disregard it. /note=Phamerator: As of 1/20/24 at 4:55pm EST, this gene is part of pham 85720. All of the other phages that were part of this pham were AZ cluster phages as well. There is nothing there to indicate a function. 
/note=Starterator: 14/37 non-draft genome phages within the cluster called the same start site of 10. This corresponds to start site 28689. /note=Location call: Based on the information, even though there wasn't much at all, the start site 28689 seems like a very possible start site for the gene. The gene is on the shorter side but the coding potential is a good indicator of the gene. /note=Function call: For now, all evidence found cannot give us a function call. I believe the gene is real but there isn't any evidence to point to a function so for now, it will be classified as NKF. /note=Transmembrane domains: There were no hits for a transmembrane protein so most likely this is an internal gene coding protein. /note=Secondary Annotator Name: Nathan, Joseph /note=Secondary Annotator QC: I have QCed this annotation and I agree with the location call due to the convincing z-score and final score favoring this start as well as having a reasonable start codon and LORF with good coding potential. Additionally, it is the only start site available to select. I also agree with the function call, the blast hits have NKF and no HHpred calls are convincing, so NKF seems to be the current best call. CDS 28784 - 28939 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="IttyBittyPiggy_40" /note=Original Glimmer call @bp 28784 has strength 11.04; Genemark calls start at 28784 /note=SSC: 28784-28939 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE18_gp38 [Arthrobacter phage DrSierra] ],,NCBI, q5:s3 92.1569% 4.82729E-15 GAP: -16 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.856, -5.435213932249347, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE18_gp38 [Arthrobacter phage DrSierra] ],,YP_010678363,71.6981,4.82729E-15 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Glimmer identifies the start site at 28784. GeneMark identifies the start site to be 28784. 
Both start sites match, indicating a greater likelihood of this being the correct start site. /note=Coding Potential: There is good coding potential observed within the second open reading frame for this gene on the GeneMarkS output report and GeneMarkHmm. This is supportive evidence that this gene is real, and the coding potential on the forward strand confirms this is a forward gene. /note=SD (Final) Score: The SD (final) score is -5.435; this is the only call. The z-score is 1.856, which is the only score reported. /note=Gap/overlap: This gene has a 16bp overlap with the upstream gene. This is minor and no modification is needed. /note=Phamerator: #4064 observed on 1/19/24. This has many members within its pham predominantly from the cluster AZ. Some of these genes within this pham come from the phages Adolin and Lego. Many of the genes within this pham have no known function. /note=Starterator: The start site that was called the most was start 2 which occurred 22 of 27 times in the AZ cluster. There are no non-draft phage annotations that were called. /note=Location call: Based on the above evidence, it appears this is a real gene and the start site is 28784. /note=Function call: Based on the lack of functional evidence it appears that this gene has no known function (NKF). There are no significant e-values demonstrated by PhagesDB blastp data besides NKF. There are no significant NCBI or CDD hits. The HHPRED data demonstrates no significant functions. /note=Transmembrane domains: There are no transmembrane domains observed. All the graph data supports inside signals indicating that this gene is not a transmembrane component. /note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: I agree with the above evidence. All categories have been thoroughly considered. Only modification would be that the synteny box does not need to be completed with NKF. 
CDS 29144 - 31009 /gene="41" /product="gp41" /function="DNA polymerase I" /locus tag="IttyBittyPiggy_41" /note=Original Glimmer call @bp 29144 has strength 15.01; Genemark calls start at 29144 /note=SSC: 29144-31009 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 0.0 GAP: 204 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Cassia]],,WGH21113,98.7118,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_B,96.7794,100.0 SIF-Syn: DNA polymerase I, upstream gene is in pham #28939, downstream is in pham #31182, just like in phage Tbone, another member of cluster AZ1. /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer and GeneMark both mark the start site as 29144. /note=Coding Potential: There is coding potential in the second ORF of the forward direction in both Host and Self-trained GeneMark, indicating that this is a forward gene. /note=SD (Final) Score: The z-score and final score are the best available, at 2.975 and -2.505 respectively. /note=Gap/overlap: There is a gap of 204 base pairs between this and the preceding gene, but there is no coding potential in that gap. /note=Phamerator: As of 1/16/24, this gene is in pham #133396. There are 1546 non-draft members of this pham, from a wide variety of clusters. The most common clusters present are A and B. /note=Starterator: 860/1546 non-draft genomes called start site 230 as the start site. IttyBittyPiggy’s auto-annotation called this as the start site. /note=Location call: Based on the above evidence I agree with the auto-annotated start site at 29144. 
/note=Function call: DNA polymerase I; BLASTp had several hits with e-values of zero that called the function as DNA pol I; the top two were for phages Cassia and Tbone. NCBI BLAST yielded that same result. The top hit for CDD was DNA polymerase family A with an e-value on the order of e-68. The top hit on HHPred was also for a DNA polymerase with an e-value on the order of e-68. /note=Transmembrane domains: There are no TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Laureano, Ryan /note=Secondary Annotator QC: I agree with this annotation. CDS 31021 - 31182 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="IttyBittyPiggy_42" /note=Original Glimmer call @bp 31021 has strength 5.29; Genemark calls start at 31021 /note=SSC: 31021-31182 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_OBITOO_47 [Arthrobacter phage ObiToo]],,NCBI, q1:s6 100.0% 7.33601E-27 GAP: 11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.288, -4.980654393880506, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_OBITOO_47 [Arthrobacter phage ObiToo]],,WGH21223,91.3793,7.33601E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: Both Glimmer and Genemark call the start at 31,021. /note=Coding Potential: Both host and self Genemark show significant coding potential in the forward direction only. /note=SD (Final) Score: -4.981. This is the best final score. Z-score is 2.288. /note=Gap/overlap: There is an 11 base pair gap, which is reasonable. /note=Phamerator: Pham 85757 as of 1/13/24. 55 members, 19/55 are drafts. Conserved among members of the AZ and K clusters. /note=Starterator: Start 7 found in 54/54 of genes in pham. 13/26 manual annotations of this start site. Matches both Glimmer and Genemark prediction. /note=Location call: The start site of this gene is likely 31,021. /note=Function call: NKF. Blastp's two best hits are Crewmate (9e-25) and ObiToo (9e-25) for no known function. CDD has no hits. 
Two best hits for NCBI Blastp are Crewmate (7e-27, 96% identity, 100% query cover) and ObiToo (1e-26, 96% identity, 100% query cover) for hypothetical protein. No significant HHpred hits (lowest E-value was 20). /note=Transmembrane domains: None /note=Secondary Annotator Name: Ryan, Kaitlin /note=Secondary Annotator QC: I agree with all of the evidence presented above. Both the functional and location calls seem accurate and well-supported by significant evidence. CDS 31175 - 31477 /gene="43" /product="gp43" /function="DNA ligase" /locus tag="IttyBittyPiggy_43" /note=Original Glimmer call @bp 31175 has strength 14.36; Genemark calls start at 31175 /note=SSC: 31175-31477 CP: no SCS: both ST: SS BLAST-Start: [DNA ligase [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 3.82385E-59 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.970161406017234, yes F: DNA ligase SIF-BLAST: ,,[DNA ligase [Arthrobacter phage Yang] ],,YP_009815660,96.0,3.82385E-59 SIF-HHPRED: d.142.2.2 (A:) Adenylation domain of NAD+-dependent DNA ligase {Enterococcus faecalis [TaxId: 1351]} | CLASS: Alpha and beta proteins (a+b), FOLD: ATP-grasp, SUPFAM: DNA ligase/mRNA capping enzyme, catalytic domain, FAM: Adenylation domain of NAD+-dependent DNA ligase,,,SCOP_d3ba9a_,59.0,99.2 SIF-Syn: /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Both Glimmer and GeneMark call the same start site of (31175). /note=Coding Potential: The self-trained and the host trained show coding potential that is consistent with the 31175 start site. Coding potential is in the forward direction. /note=SD (Final) Score: The 31175 start site is the only possible starting point for this gene. It has a Z-score of 2.975 and a final score of -2.970. This is a GTG codon. /note=Gap/overlap: There is a gap of -8 which is indicative of an operon. /note=Phamerator: As of 1/18/24, this gene is part of pham #119091, with 56 total members and 23 draft phages. 
/note=Starterator: Start site #19 was the most annotated in 15/33 of the non-draft phages. This corresponds to 31175. /note=Location call: The start site of this gene is 31175. This is based on the evidence from the starterator report, Glimmer, GeneMark, and the coding potential. /note=Function call: This is a DNA Ligase. NCBI Blast obtained hits to Yang and Cassia with E-values of 3.8e-59 and 1.2e-56 respectively, and PhagesDB blast obtained strong hits to Yang and Cassia with e-values of 6e-48 and 2e-48. Both support the DNA Ligase function call. HHpred obtained 13 hits above a 1e-9 e-value threshold and all of these hits are DNA ligase. Additionally CDD obtained a hit that supports the DNA ligase function call. /note=Transmembrane domains: TmHmm predicts 0 TMDs. /note=Secondary Annotator Name: SANCHEZ, KAYLA NICOLE /note=Secondary Annotator QC: I agree with the calls made however I would not check off the HHpred hit because it has a very low coverage percentage (2%). Also check off the coding box and fill in synteny. CDS 31474 - 31779 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="IttyBittyPiggy_44" /note=Original Glimmer call @bp 31474 has strength 13.9; Genemark calls start at 31474 /note=SSC: 31474-31779 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp43 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 1.46138E-60 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.714, -5.080915947802713, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp43 [Arthrobacter phage Yang] ],,YP_009815661,97.0297,1.46138E-60 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 31474. The start codon is ATG. /note=Coding Potential: There is reasonable coding potential in the putative ORF in the Host Trained GeneMark and the auto-annotated start site’s ORF does cover all coding potential. 
There is also coding potential on the Self-Trained GeneMark. /note=SD (Final) Score: The SD score of the auto-annotated start site is -5.081, which is the most negative score. The Z score may be lower than the score at the other start site at 31702, but only the 31474 start site encompasses all coding potential. /note=Gap/overlap: The overlap is 4 base pairs, which suggests the possibility that this gene may be a part of an operon. The other start site seems highly unlikely because it does not encompass all coding potential and has a much shorter length. /note=Phamerator: Pham 965, Date 1/17/24. This gene is conserved, it is found in several members of the pham with a similar length. It is mostly in clusters AZ and EB. /note=Starterator: The start site at 31474 is the most annotated start site, and is found in 126 of 139 genes in the pham. It was manually annotated 98/106 times, and is called 99.2% of the time when present. The base pair coordinates for this start site are (36, 31474). /note=Location call: Based on the above evidence, I believe that this gene is real and starts at the suggested start site at 31474. /note=Function call: PhagesDB Blast has several hits among phages in the same cluster, but they all have no known function. HHPred was not informative because of very high e-values, and NCBI Blast had no informative hits except for some hypothetical proteins without function data. Based on the data, I think this gene has no known function. /note=Transmembrane domains: DeepTmHmm had no transmembrane domain hits. /note=Secondary Annotator Name: Hon, Darren /note=Secondary Annotator QC: There is good coding potential within the autoannotated start site. The final score is good, but the z-score is not. However, comparing to other start sites, this LORF would be the better of the two choices. There is a small overlap which is negligible. As of 1/23/24, the gene is conserved. 
As demonstrated by both Phamerator and Starterator, start site of 31474 is the most annotated start site, indicating that it is the accurate start site. Phagesdb BLAST indicates strong e-values for hits to other phages, all suggesting NKF. This is supported by no good hits from HHPred. NCBI Blast also returns strong hits that indicate NKF. Therefore, this gene is correctly marked as NKF. Phagesdb BLAST should have some phages marked as evidence for NKF. CDS 31970 - 32758 /gene="45" /product="gp45" /function="DNA binding protein" /locus tag="IttyBittyPiggy_45" /note=Original Glimmer call @bp 31970 has strength 17.46; Genemark calls start at 31970 /note=SSC: 31970-32758 CP: no SCS: both ST: NI BLAST-Start: [DNA binding protein [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 6.13691E-163 GAP: 190 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.878, -2.6453814847322845, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage Cassia]],,WGH21117,95.057,6.13691E-163 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta', TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,96.5649,100.0 SIF-Syn: Shares synteny with Simpson, ObiToo, and JohnDoe, with pham 133469 after and pham 965 preceding /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Both GeneMark and Glimmer call the same start site of 31970. This is not the LORF but has the best Final and Z-score of -2.645 and 2.878 respectively. /note=Coding Potential: The Host trained GeneMark shows full coverage of coding potential throughout the gene and some coding potential starting outside the start site. /note=SD (Final) Score: -2.645, which is the best (least negative) of the group /note=Gap/overlap: 190 bp gap /note=Phamerator: Phamerator shows the gene as conserved between multiple clusters and length is fairly close as well. /note=Starterator: Starterator shows this is the most annotated start site, and it is called 94.2% of the time when present. 
This is consistent with many other AZ1 clustered phages. /note=Location call: I agree with the auto annotation call of 31970 because it has the best final and Z scores of -2.645 and 2.878 respectively. It also shares synteny with Simpson, ObiToo, and JohnDoe (AZ1). /note=Function call: Based on HHPred and NCBI BLAST, the function of the gene appears to be a DNA binding protein. HHPred and BLASTp hits share high percent coverage, but they say RNA binding sigma factor, which is listed as a do-not-use term in the function database. BLASTp displays helix-turn-helix binding domains, so that is what I believe it to be. /note=Transmembrane domains: no transmembrane domains /note=Secondary Annotator Name: Claire, Monjov /note=Secondary Annotator QC: The start site at the top is not a great start site, and I agree with the primary annotator's decision to go with number 2. There is better coding potential, z score, and RBS scores with this start site. For function, there is strong evidence to point towards a helix-turn-helix DNA binding domain protein. I agree with the primary annotator's findings. CDS 32844 - 33215 /gene="46" /product="gp46" /function="helix-turn-helix DNA binding domain" /locus tag="IttyBittyPiggy_46" /note=Original Glimmer call @bp 32844 has strength 12.66; Genemark calls start at 32844 /note=SSC: 32844-33215 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Arthrobacter phage Cassia]],,NCBI, q1:s1 99.187% 2.89934E-72 GAP: 85 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.714, -5.160958035523474, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Arthrobacter phage Cassia]],,WGH21118,90.6977,2.89934E-72 SIF-HHPRED: Putative uncharacterized protein; DNA BINDING PROTEIN; NMR {Hyperthermus butylicus},,,2LVS_A,65.0406,98.6 SIF-Syn: helix-turn-helix DNA binding domain. Upstream gene is a helix-turn-helix binding domain, similar to DNA binding protein in Cassia and DrSierra. 
Downstream gene is SprT-like protease, just like in Cassia and DrSierra. /note=Primary Annotator Name: Indiresan, Neeti /note=Auto-annotation: Both Glimmer and Genemark, start site 32844, start codon ATG /note=Coding Potential: The gene shows coding potential in the forward direction according to both host and self. The chosen start site covers this coding potential. /note=SD (Final) Score: -5.161. It is not the best RBS final score on PECAAN. The start site 33024 has the best final score of -3.292, however it does not cover all of the coding potential. /note=Gap/overlap: 85 bp gap. This is somewhat large, but it is conserved in other genes such as in phage Cassia. /note=Phamerator: pham: 133449. Date 01/18/2024. It is conserved, found in Abidatro (AS) and Amyev (AZ). /note=Starterator: Start site 71 in Starterator was called the most. It was manually annotated in 220/346 non-draft genes in this pham. Start site 71 was not called in IttyBittyPiggy. 32844 is start site 79, which was manually annotated in 30 genes and called as the start site by Starterator. This evidence agrees with Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32844. /note=Function call: Helix-turn-helix DNA binding domain. The top PhagesDB BLASTp hit had the function helix-turn-helix DNA binding domain protein (e-value 10^-57) and the remaining top hits were DNA binding proteins and some helix-turn-helix DNA binding domains (e-values < 10^-52). Three out of the top 5 NCBI BLASTp hits had the function of helix-turn-helix DNA binding domain protein (coverage > 97%, identity > 81.67%, e-value < 10^-66). The remaining 2 hits were a helix-turn-helix DNA binding protein and a DNA binding protein. HHPred had some hits with the functions of DNA binding proteins and helix-turn-helix domains. In addition, HHPred showed that IttyBittyPiggy contains the HTH motif. 
There were no informative hits in CDD (high e-value and coverage < 35%). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Labib, Youstina /note=Secondary Annotator QC: I agree with the functional call of the primary annotator, but it appears the gap is large and there are better z-score options. Please confirm the start site selected matches the coding potential. CDS 33285 - 33881 /gene="47" /product="gp47" /function="SprT-like protease" /locus tag="IttyBittyPiggy_47" /note=Original Glimmer call @bp 33285 has strength 13.65; Genemark calls start at 33285 /note=SSC: 33285-33881 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage ObiToo]],,NCBI, q1:s1 100.0% 3.95905E-139 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.878, -2.707694805492614, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage ObiToo]],,WGH21228,98.9899,3.95905E-139 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: FLC, MLZ, ADP; 1.5A {Homo sapiens},,,6MDW_A,51.5152,99.5 SIF-Syn: SprT-like protease, upstream gene is in pham #135217 and the downstream gene is in pham #2921. This is just like in the phage JohnDoe which is also part of the AZ1 cluster. /note=Primary Annotator Name: Kalliomaa, Kira /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 33285. /note=Coding Potential: Coding potential in this Open Reading Frame (ORF) is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and GeneMark Host. /note=SD (Final) Score: -2.708. This is the best final score on PECAAN. /note=Gap/overlap: 69 bp. This is a somewhat large gap but in the end is reasonable due to other phages (Adolin and DrManhattan) conserving the gap. There is also no coding potential in the gap that could indicate a new gene. /note=Phamerator: 1210. 
Date Accessed: 01/17/2024 It is conserved; found in Adolin, DrManhattan, Lego, and YesChef (AZ1). /note=Starterator: Start site 48 in Starterator was manually annotated in 44/83 non-draft genes in this pham (1210). Start 48 is 33285 in IttyBittyPiggy. This evidence agrees with the site predicted in both Glimmer and Genemark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 33285. /note=Function call: There were multiple PhagesDB BLAST hits that had suggested SprT-like protease. PhagesDB Blast had multiple small e-values ranging from e-109 to 4e-44. HHPRED had 99.5% probability, 51.5152% coverage, and an e-value of 8e-14. /note=Transmembrane domains: 0; DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note= /note=Secondary Annotator Name: Woodward, Lauren /note=Secondary Annotator QC: I agree with both the location call and the function call, based on the logic presented above. Make sure to fill out the synteny box, since this gene has an assigned function. CDS 34002 - 34160 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="IttyBittyPiggy_48" /note=Original Glimmer call @bp 34002 has strength 20.93; Genemark calls start at 34002 /note=SSC: 34002-34160 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ITER_47 [Arthrobacter phage Iter]],,NCBI, q1:s1 100.0% 9.09446E-25 GAP: 120 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_47 [Arthrobacter phage Iter]],,URQ05035,100.0,9.09446E-25 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Potter, Sofia /note=Auto-annotation: Glimmer and GeneMark both call the start at 34002. GTG is the start codon for this site. 
/note=Coding Potential: There is a small sliver of coding potential in GeneMark that falls just upstream of the autoannotated start site, but the coding potential is almost entirely contained within the region between the start and stop site. The other two suggested start sites on PECAAN are even further downstream by 100 bp, so the autoannotated start site still seems to best capture the coding potential present. /note=SD (Final) Score: The final score is -2.443, the best final score on PECAAN. /note=Gap/overlap: The gap for the autoannotated start site is 120 bp, which is not ideal, but still preferable to the gaps from the other two start sites on PECAAN, which are 222 and 234 bp. There does not seem to be coding potential within this gap. /note=Phamerator: As of January 19, 2024, this gene is in pham 2921. This pham is conserved throughout other AZ1 phages with non-draft genomes, including but not limited to Nitro, Tbone, Powerpuff, and YesChef. /note=Starterator: In all 26 non-draft genes in pham 2921, start number 10 was called. Start number 10 corresponds to 34002 bp in IttyBittyPiggy. /note=Location call: Based on the provided evidence, it is safe to conclude that the start site for this gene is at 34002. /note=Function call: NKF. The top ten hits on NCBI BLASTp, with e values ranging from 4e-04 to 9e-25, are for hypothetical proteins in Arthrobacter phages. The protein sequence for this gene shares 100% identity with a hypothetical protein in Arthrobacter phage Iter, which is also in subcluster AZ1. CDD does not return with any hits. There are no significant hits from HHpred, with the top result matching to Phospholamban, but with an extremely high and insignificant e value of 17. The most significant Phagesdb BLAST hits are all for proteins with unknown function. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains. /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: I agree with this annotation. 
Starterator and Phamerator are strong, and there is coding potential in the forward direction. Gap is not ideal, but there isn't coding potential to fill that gap. NKF is an accurate function call based on the evidence. CDS 34220 - 34564 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="IttyBittyPiggy_49" /note=Original Glimmer call @bp 34220 has strength 20.77; Genemark calls start at 34220 /note=SSC: 34220-34564 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ITER_48 [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 1.32641E-71 GAP: 59 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.675, -3.0713502170045657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_48 [Arthrobacter phage Iter] ],,URQ05036,95.614,1.32641E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Salinas, Juan Carlos /note=Auto-annotation: Both Glimmer and GeneMark call this start site at 34220. /note=Coding Potential: The coding potential covers the entire open reading frame in both Host-trained and Self-trained GeneMark and is suggestive that a real gene is present in this region. The coding potential only exists in the forward direction, suggesting this is a forward gene. /note=SD (Final) Score: -3.071. This is the highest final score, more favorable than those of all other predicted start sites. /note=Gap/overlap: 59. This gap is reasonable as there is no coding potential that could diminish this gap more. /note=Phamerator: pham: 132118 as of 1/16/2024. There are 55 members in this pham, 39 of which are non-draft genes. It is conserved in cluster AZ, more specifically in Adolin (final), Aegle (final), and Amyev. /note=Starterator: The most annotated start site is start site 15, annotated in 19 of the 39 non-draft genes. 
This start site is found and called by IttyBittyPiggy, and corresponds to start site 34220. /note=Location call: I would call this start site at 34220 given the evidence above. All other start sites predicted have large gaps that would overlap with the majority of the nearby gene. The final score and z-score are the most favorable at this start site. The conservation of this start site in other members of the same pham is the most convincing evidence. /note=Function call: NKF. No significant evidence is available to make a function call. BLAST reveals genes in the same pham have been called ‘unknown function’. All other available data, such as that of HHPRED and NCBI are not significant. /note=Transmembrane domains: There are no transmembrane domains, suggesting that this is not a transmembrane protein. /note=Secondary Annotator Name: Chamorro, Marco /note=Secondary Annotator QC: I agree with the primary annotator`s location call and function call. CDS 34701 - 36119 /gene="50" /product="gp50" /function="serine integrase" /locus tag="IttyBittyPiggy_50" /note=Original Glimmer call @bp 34701 has strength 10.91; Genemark calls start at 34698 /note=SSC: 34701-36119 CP: yes SCS: both-gl ST: NI BLAST-Start: [serine integrase [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 0.0 GAP: 136 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.754, -3.732968838111421, no F: serine integrase SIF-BLAST: ,,[serine integrase [Arthrobacter phage Cassia]],,WGH21121,98.0932,0.0 SIF-HHPRED: INTEGRASE; HYDROLASE, SERINE RECOMBINASE, UNIDIRECTIONAL, SITE-SPECIFIC RECOMBINATION; 2.15A {STREPTOMYCES PHAGE PHIC31},,,4BQQ_B,63.9831,100.0 SIF-Syn: Synteny expressed with Adolin and Adumb2043, both of which are nondraft phage. Serine integrase, upstream gene is NKF, downstream is NKF, just like in phage Adumb2043. /note=Primary Annotator Name: Zaragoza, Evelin /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start site at 34701 and GeneMark calls it at 34698. 
34701 appears to be the best option since it optimizes length, the z-score is above 2, and the final score is the second highest out of all the options. /note=Coding Potential: Coding potential falls within the frame. Gene likely exists as it is greater than 120bp with this start site. Does not overlap with surrounding genes. There is coding potential on the complementary sequence, but it is not comparable to the coding potential in the direct sequence. /note=SD (Final) Score: Final score is -3.733 with the z-score being 2.754. The z-score being over 2 is favorable and the final score is one of the highest out of all the other possibilities so this suggests this start site is likely. /note=Gap/overlap: There is a large gap with the gene to the left of 136 bp. Such a gap is conserved in other phage such as Adumb2043. There is very little coding potential in this gap and so a new gene should likely not be added. /note=Phamerator: The pham number is 133437 as of 01/13/2024. It is conserved and found in non-draft phages such as AEgle, Adolin, and Adumb2043, which are all part of the cluster AZ. IttyBittyPiggy is in cluster AZ and subcluster AZ1. /note=Starterator: Start site 71 in Starterator was manually annotated in 123/386 non-draft genes in pham 133437. However, IttyBittyPiggy does not have start site 71 as a possibility and thus it is not called. Such evidence suggests that the start site listed above is still the best option when weighing final score, z-score, gap/overlap, gene length, and synteny. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 34701.The gene does not need to be deleted because the coding potential proves that there needs to be a gene there and the start site called for this particular gene has all of the coding potential. /note=Function call: Serine integrase (a subfamily of serine site-specific recombinases). 
The top three phagesdb BLAST hits (non-draft) with known function have this same function (e-value= 0), and three top NCBI BLAST hits also have this function (E-value=0, coverage=100%, identity>90%). HHpred has two significant hits for a type of integrase (>60% coverage, E-value<10^-32, probability =100%). NCBI CDD had one hit with accession number COG1961 for site-specific DNA recombinase (E-value = 7.67545e-28 and coverage > 44%). /note=Transmembrane domains: DeepTMHMM predicts no TMDs. /note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I agree with the primary annotator`s annotation of this gene. CDS 36368 - 36658 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="IttyBittyPiggy_51" /note=Original Glimmer call @bp 36368 has strength 10.53; Genemark calls start at 36368 /note=SSC: 36368-36658 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_49 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 8.60358E-53 GAP: 248 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_49 [Arthrobacter phage Cassia]],,WGH21122,96.9072,8.60358E-53 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kibria, Kamille /note=Auto-annotation: Glimmer and GeneMark both called the start site at 36368. /note=Coding Potential: Self trained and Host Trained GeneMark shows reasonable coding potential, and show a start site predicted by GeneMark and Glimmer. All of the coding potential is contained within the start and stop site. /note=SD (Final) Score: -2.505. No other candidates. /note=Gap/overlap: 248. This is a large gap. However, there is not good coding potential in between to add a gene between the upstream gene in Host Trained GeneMark. There is potential coding potential upstream of this gene in Self Trained GeneMark but the respective ORF would not make sense to add. /note=Phamerator: Pham 133767 as of 1/18/24. 
There are 60 members in this pham. Conserved in the AZ cluster. /note=Starterator: Report generated on 1/18/24. The start number called most often in published annotations is 13. This was called in 34 of 36 non draft genes. IttyBittyPiggy called this start site, corresponding to a start site at 36368. This matches the auto annotation. /note=Location call: I am a little skeptical because of the large gap upstream and the start codon GTG. But the rest of the evidence shows that this gene is likely real with a start site at 36368. /note=Function call: NKF. PhagesDB pBLAST generated significant hits. Cassia_49, ObiToo_55, and Crewmate_56 generated significant hits and are all in the AZ cluster. There is also synteny with Yang and Cassia (AZ cluster phages). These are all NKF genes. In NCBI pblast, significant hits for hypothetical proteins were generated for Cassia, Yang, ObiToo, and Crewmate. No hits in CDD. No significant HHPred hits (e values >30). /note=Transmembrane domains: No TMH predicted by TmHmm so not a membrane protein. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: I agree with the primary annotator. The gap seems a little off-putting, but based on synteny with Nitro and other AZ1 phages I believe that the start site is 36368, as the primary annotator has stated. Also, based on the limited evidence, the function seems to be NKF. 
CDS 36661 - 36891 /gene="52" /product="gp52" /function="RNA binding protein" /locus tag="IttyBittyPiggy_52" /note=Original Glimmer call @bp 36661 has strength 14.48; Genemark calls start at 36661 /note=SSC: 36661-36891 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_ITER_53 [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 6.66732E-43 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.821, -2.9063687850157054, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_53 [Arthrobacter phage Iter] ],,URQ05041,94.7368,6.66732E-43 SIF-HHPRED: Asl2047 protein; HFQ, SM, RNA-BINDING PROTEIN, SRNA, TRANSLATIONAL REGULATION, RNA BINDING PROTEIN; 2.31A {Nostoc sp.},,,3HFN_A,96.0526,96.7 SIF-Syn: CDS 36888 - 37079 /gene="53" /product="gp53" /function="RNA binding protein" /locus tag="IttyBittyPiggy_53" /note=Original Glimmer call @bp 36888 has strength 14.3; Genemark calls start at 36888 /note=SSC: 36888-37079 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp52 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 1.72212E-28 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.388, -3.8116689268127657, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein HOU52_gp52 [Arthrobacter phage Yang] ],,YP_009815670,90.4762,1.72212E-28 SIF-HHPRED: Asl2047 protein; HFQ, SM, RNA-BINDING PROTEIN, SRNA, TRANSLATIONAL REGULATION, RNA BINDING PROTEIN; 2.31A {Nostoc sp.},,,3HFN_A,93.6508,97.4 SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is identified as 36888 by both GeneMark and Glimmer. /note=Coding Potential: Based on Host-trained GeneMark and Self-Trained GeneMark there is coding potential found in the forward direction because the region of coding falls between the predicted start site and the stop site. /note=SD (Final) Score: The Z score is 2.388 and the Final Score is -3.812. This is the highest Z score and the least negative Final score. 
/note=Gap/overlap: The gap is -4bp indicating that there is some overlap. /note=Phamerator: This gene is in pham 135762. Other members of the cluster AZ1 are present in this pham including AEgle, Community, and Joemato. /note=Starterator: On 1/21/23 the pham is 135762. There are 28 members in this pham. This gene contains start site 5 which is also present in 27 other members in the pham. The gene contains the most annotated start site for this pham. /note=Location call: Start site is likely 36888 /note=Function call: RNA Binding Protein. The highest ranked phages in HHPred have the function of RNA Binding Protein with an e value of 0.006 at 97.4% probability. In NCBI BLAST the top phages also call RNA binding protein with an e-value of 1.72212e-28. The CDD hits also have RNA Binding protein with an e-value of 0.00965573 at 16.43% identity. /note=Transmembrane domains: None. This makes sense because according to HHpred and CDD this is an RNA Binding protein. These proteins would not have TMRs. /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: I agree with this location and functional call, however some of the evidence categories need to be interpreted/corrected: /note=>Gap: as part of the annotation manual it says to not just list a value, but to interpret it. So you would want to add that the 4bp overlap is acceptable because it is most likely part of an operon. /note=>Phamerator: you would want to have the date you accessed phamerator, as the pham numbers frequently change over time. I am assuming but the genes you have listed are the ones that were for comparison/are conserved? Make sure those are listed because you want to have documented what final genomes you compared to/were conserved as part of your evidence. /note=>Starterator: How many genes were manually annotated? Is start site 5 the start site that was also called by Glimmer and GeneMark? 
/note=>location call: because we have to interpret what we write, you would want to say that based on the evidence gathered, the likely start site is : ____. vs. just saying the start site without any interpretation. /note=>Function call: Also make sure that the synteny box is filled out since you've called a function! CDS 37076 - 37516 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="IttyBittyPiggy_54" /note=Original Glimmer call @bp 37076 has strength 15.53; Genemark calls start at 37076 /note=SSC: 37076-37516 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_52 [Arthrobacter phage Cassia]],,NCBI, q1:s1 94.5205% 1.30446E-32 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.388, -3.8116689268127657, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_52 [Arthrobacter phage Cassia]],,WGH21125,67.9688,1.30446E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 37076 with the start codon as ATG. /note=Coding Potential: The Host Trained Gene Mark demonstrates strong coding potential for the indicated range of 37076 - 37516, however the Self Trained shows a severe dip ~37250 but there is no other coding potential in this area on other frames so this discrepancy can be ignored. /note=SD (Final) Score: The selected suggested gene has the least negative final score (-3.812) and the highest Z-score (2.388). /note=Gap/overlap: The selected gene has an overlap of -4 which is indicative of an operon and is the LORF. /note=Phamerator: As of 1/15/24, Phamerator calls the pham number as 86120. There are 36 members (15 drafts) in this pham all of which are in cluster AZ such as Yang and YesChef. /note=Starterator: As of 1/15/24, Starterator calls the pham number as 86120. There are 36 members, 15 of which are drafts. The most annotated start site was called in 20/21 of the non-drafts, including IBP, which is 37076. 
/note=Location call: There is strong evidence that this gene is real and the start site is 37076, including the auto annotated call and the synteny shown in the finalized genes in Yang and Cassia. Additionally, all of the coding potential is captured between the suggested start and stop site. /note=Function call: There are numerous hits on PhagesDB of finalized genes with strong e-values, all with functions listed as unknown, such as Yang (1e-28) and Cassia (6e-32). CDD also demonstrated strong hits, especially with those of Cassia and Yang, both with 94% coverage and functions listed as hypothetical proteins. HHPRED showed no significant hits. /note=Transmembrane domains: There are no predicted TMHs /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: Don't forget to fill out the Starterator and All GM Coding Capacity boxes! Other than that, I think the location call looks good, and I agree based on (lack of) evidence that the function call should be NKF. CDS 37513 - 37716 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="IttyBittyPiggy_55" /note=Original Glimmer call @bp 37513 has strength 11.36; Genemark calls start at 37513 /note=SSC: 37513-37716 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE17_gp59 [Arthrobacter phage Crewmate] ],,NCBI, q4:s3 92.5373% 7.53048E-28 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.326, -5.972387497942108, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE17_gp59 [Arthrobacter phage Crewmate] ],,YP_010678310,68.2353,7.53048E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-Annotation: Glimmer and Genemark both call start at 37513. /note=Coding Potential: Good coding potential is present in Self-Genemark and HostGenemark. It is present throughout the entire reading frame and only in the forward strand. 
/note=SD (Final) Score: -5.972. This is not the largest final score among the start sites, but the start with the largest final score has an overlap of 274 which is unreasonable. /note=Gap/overlap: This start site has the smallest gap which is -4 which is indicative of an operon. /note=Phamerator: Pham as of 1/19/24 is 4936 and includes 18 other members all in the AZ cluster (3 of the other members are drafts). /note=Starterator: Start site number 10 is called for this gene and is the most annotated in Starterator. It was called in 10/14 non-draft genomes and called 100% of the time when present. /note=Location call: Most likely start site is 37513 since it was called by both programs and has the smallest gap, even though the z-score (1.326) and final score (-5.972) weren’t the highest. This start was also most annotated in Starterator. /note=Function call: No known function. No good hits for any proteins with known function in PhagesDB, NCBI BLAST or HHpred. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note= /note=Secondary Annotator Name: Salinas, Juan /note=Secondary Annotator QC: I agree with the calls made by the annotator. The evidence available, particularly when looking at the gap, shows that the start at 37513 is the most favorable. The function call is NKF due to the lack of available information in the software available. 
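The GAP values in these records can be reproduced from the CDS coordinates alone. The following is a minimal Python sketch, not part of the PECAAN output: the function name gap_bp is hypothetical, and it assumes forward-strand genes with 1-based inclusive coordinates, so a negative result means the genes overlap.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Bases strictly between two forward genes; negative means overlap."""
    return start - prev_stop - 1

# (gene, start, stop) copied from the CDS lines of genes 52-55
cds = [
    (52, 36661, 36891),
    (53, 36888, 37079),
    (54, 37076, 37516),
    (55, 37513, 37716),
]

# Gap of each gene relative to the gene immediately upstream of it
gaps = {nxt[0]: gap_bp(prev[2], nxt[1]) for prev, nxt in zip(cds, cds[1:])}
print(gaps)
```

Under these assumptions, genes 53, 54, and 55 each come out at -4, matching the GAP fields recorded for them.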
CDS 37709 - 38119 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="IttyBittyPiggy_56" /note=Original Glimmer call @bp 37709 has strength 15.9; Genemark calls start at 37709 /note=SSC: 37709-38119 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_56 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 99.2647% 1.65694E-41 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.878, -2.7863799983944713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_56 [Arthrobacter phage Janeemi]],,UVK63576,67.3611,1.65694E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Tran, Michelle /note=Auto-annotation: Both Glimmer and GeneMark predict the start site for this gene at 37709. /note=Coding Potential: The coding potential for this gene is indicated on both the host-trained and self-trained GeneMark in the assigned ORF. This coding potential is only found on the forward strand, which supports the annotation of this as a forward gene. /note=SD (Final) Score: The final score for this start site is -2.786, which is the best and only available final score for this gene available on PECAAN. /note=Gap/overlap: There is an 8-bp overlap between the start site of this gene and the stop site of the upstream gene. This is a reasonably-sized overlap. /note=Phamerator: As of January 19, 2024, this gene is under pham 86029. This is conserved in other phages in cluster AZ1, such as Tbone and ObiToo. /note=Starterator: Start site 8, which corresponds to 37709, was manually annotated in all of the non-draft genes in the pham (26/26). This would match with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence shown above, this gene is most likely real and has a start site at 37709. /note=Function call: NKF. Both Phagesdb BLAST and NCBI BLAST yield strong results supporting the lack of known function (e-value <10e-35) for this protein. 
The single HHPred result was unusable because of it being too weak (e-value >1). The Conserved Domain Database yielded no results. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. It is impossible to determine the protein’s function from there because it currently has NKF. /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: I agree with this annotator's call for the start site and function. They may choose to elaborate further on the evidence for this start site, but that is not necessary as this is the only start site called. They may choose to mention Z-score when discussing final score. Please check off your Starterator and GM box as well. Good job! CDS 38132 - 38326 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="IttyBittyPiggy_57" /note=Original Glimmer call @bp 38132 has strength 11.9; Genemark calls start at 38132 /note=SSC: 38132-38326 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp57 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 3.98647E-33 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.219, -2.0720764396375664, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp57 [Arthrobacter phage Yang] ],,YP_009815675,96.875,3.98647E-33 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 38132. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.072. It is the best final score on PECAAN. /note=Gap/overlap: 12 bp gap which is very small. There is no coding potential in the gap that might be a new gene as well. /note=Phamerator: pham: 135301. Date 1/19/24. 
It is conserved and found in Adolin_58 (AZ) and Adumb2043_53 (AZ), in addition to many other phages from various clusters /note=Starterator: Does not call the most annotated start site. Manually annotated in 8 out of 29 genes in this pham (Start site 38132). Start site 59 is not possible due to how small it makes the length of the gene. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 38132. /note=Function call: NKF. The fourth and sixth phagesDB BlastP hits have the function as NKF (Yang_57: 1e-27 and ObiToo_61: 2e-24). The top 2 NCBI Blast hits have the function call as a hypothetical protein (1st hit: 4e-33 and 100% coverage) (2nd hit: 7e-24 and 95.1% coverage). HHpred had a hit for Herpesvirus UL56 protein, however the e-value of 190 is too high and the probability is very low at 22.52%. When running CDD, it does not produce any hits so there is no available data. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kibria, Kamille /note=Secondary Annotator QC: There is a starterator report as of 1/22/24. IttyBittyPiggy does not call the most annotated start site but there are 8 manual annotations of start 29 (corresponding to 38132).The other start site (59) is not a plausible start site because it makes the length of the gene very small. All manual annotations of this start site were in AZ1 which provides good evidence. I agree with the rest of the annotation. 
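Gene length arguments such as the one above (rejecting a start that makes the gene very small) rest on simple coordinate arithmetic. The following is a minimal Python sketch, not part of the annotation tools: the function name cds_length is hypothetical, and it assumes 1-based inclusive coordinates with the stop codon included in the CDS range.

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS given 1-based inclusive coordinates."""
    return stop - start + 1

# Coordinates copied from the CDS lines for genes 1 and 57
for gene, start, stop in [(1, 86, 538), (57, 38132, 38326)]:
    length = cds_length(start, stop)
    # A complete ORF spans a whole number of codons
    assert length % 3 == 0
    print(gene, length, length // 3)
```

For gene 1 this gives 453 bp, the length quoted in that gene's Gap/overlap note; gene 57 comes out at 195 bp, or 65 codons including the stop.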
CDS 38329 - 38547 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="IttyBittyPiggy_58" /note=Original Glimmer call @bp 38329 has strength 19.41; Genemark calls start at 38329 /note=SSC: 38329-38547 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE10_gp56 [Arthrobacter phage Tbone] ],,NCBI, q14:s5 81.9444% 0.0409216 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.196, -4.213314289702982, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE10_gp56 [Arthrobacter phage Tbone] ],,YP_010677827,55.7377,0.0409216 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: To, Nathan /note=Auto-annotation: Both Glimmer and Genemark call this start site at 38329 with start codon GTG. /note=Coding Potential:Both Genemark Self and Host show coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -4.213, this is the best final score available. /note=Gap/overlap: Gap of 2, this is reasonable and the smallest gap possible without a significant overlap. /note=Phamerator: As of 1/17/24, phamerator calls the pham 23827. This pham contains 3 members and has phages from only the AZ cluster. No function calls are present. /note=Starterator: As of 1/13/24, starterator calls the pham 23827. This pham contains 3 members, 3 of which are drafts. The selected start site is the most annotated, called 100% of the time when present. /note=Location call:The gathered evidence suggests that this is a real gene, with start site @38329. This gene has good coding potential, and does not have large gaps before or after it. The start site 38329 seems most likely due to both Glimmer and Genemark calling it and its good Z score and final score. Starterator also provides good evidence for it, with it being the most annotated site and being called 100% of the time when present. 
/note=Function call: No known function, based on multiple NCBi and PhagesDB BLASTs with unknown function and low e-values (.006 for phage Tbone and lower calls for draft genomes). HHpred or CDD do not make any convincing calls. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 38667 - 38981 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="IttyBittyPiggy_59" /note=Original Glimmer call @bp 38667 has strength 20.59; Genemark calls start at 38667 /note=SSC: 38667-38981 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD80_gp57 [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 100.0% 7.63051E-61 GAP: 119 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.727, -3.0248543014571307, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD80_gp57 [Arthrobacter phage Lizalica] ],,YP_010677622,92.381,7.63051E-61 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Both Glimmer and GeneMark call this gene.They agree on start site 38667. /note=Coding Potential: This start site covers nearly all coding potential. Coding potential is in the forward direction. A very slightly longer ORF would resolve this issue, but would result in a start site with a poor final score. /note=SD (Final) Score: The final score is -3.025. This is the best final score on PECAAN. /note=Gap/overlap: There is a 119 base pair gap. /note=Phamerator: As of 1/18/24, this gene belongs to pham 87428. This pham has 12 members. /note=Starterator: The start site is found in 91.7% of genes in the pham and has 4/5 manual annotations. It is called 100% of the time when present. /note=Location call: Based on the above evidence, particularly Starterator and final scores, this is likely a real gene with a start site at 38667. /note=Function call: Every Phagesdb Blastp search, with one exception, showed unknown function. 
The top two non-draft hits have e-values of 8E-58 and 1E-24 respectively. The results from HHpred have very poor e-values, with the top hit having an e-value of 10. For this reason, these results should not be used confidently as evidence. No results were available from CDD. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: I agree with the start site and functional call. Based on the auto-annotation and the Z score and Final score 38667 seems to be the start site. The functional call also seems to be the most likely. CDS 39043 - 39240 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="IttyBittyPiggy_60" /note=Original Glimmer call @bp 39043 has strength 21.2; Genemark calls start at 39043 /note=SSC: 39043-39240 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_60 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 1.93868E-30 GAP: 61 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_60 [Arthrobacter phage Cassia]],,WGH21133,86.1538,1.93868E-30 SIF-HHPRED: SIF-Syn: NKF /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: GeneMark and Glimmer both call the start site at 39043 and the codon is ATG. /note=Coding Potential: There is good coding potential and the suggested start site covers all of it. /note=SD (Final) Score: The suggested start site has a final score of -2.505 which is significantly lower than the other start site which is -5. /note=Gap/overlap: The gap from this suggested start site is 61. This is rather large but not a cause for concern as there is no coding potential in this gap nor does a comparison with other phages suggest a missing gene. This start site provides the LORF. /note=Phamerator: As of 1/17/24, this gene is in pham 86088. 
Genes that share the same pham and cluster (AZ) include Lego_59 and Snek_59. /note=Starterator: The most published annotated start number is 7 which is called in 14/24 non-draft genomes in the pham. IttyBittyPiggy calls this start number which corresponds to 39043. /note=Location call: This gene is real as it has good coding potential and good length. The start site suggested is backed by Starterator. It is the LORF and has the best Z-score, final score, and smallest overlap. Thus the start site is at 39043. /note=Function call: PhagesDB BLAST has hits to Cassia and Adumb2043 at low e-values, but both of these genes have NKF. The same genes appear as NCBI BLAST hits but are marked as hypothetical proteins. HHpred has no hits with low e-values. /note=Transmembrane domains: There are no recorded TMRs. /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: I have reviewed the gathered evidence above and agree with the function call of NKF. CDS 39364 - 39705 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="IttyBittyPiggy_61" /note=Original Glimmer call @bp 39364 has strength 18.16; Genemark calls start at 39364 /note=SSC: 39364-39705 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_61 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 9.38592E-53 GAP: 123 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.878, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_61 [Arthrobacter phage Cassia]],,WGH21134,94.6429,9.38592E-53 SIF-HHPRED: SIF-Syn: No known function, exhibits both downstream and upstream synteny with phage genomes in the same subcluster (AZ1) and pham (132128), such as Amyev, Adumb2043, and Adolin. /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Both Glimmer and GeneMark call the start site as 39364. 
/note=Coding Potential: High coding potential in the forward direction only as indicated by the direct sequence from both GeneMark Host and GeneMarkS outputs. /note=SD (Final) Score: -2.708. This is the best and least negative final score on PECAAN. /note=Gap/overlap: There is a gap of 123 base pairs. This is the second smallest gap on PECAAN, but has the most supportive evidence in addition to the gap (i.e. best Z-score and best final score). There is also no coding potential in the gap indicative of a new gene. /note=Phamerator: Pham: 132128. Date: 1/11/2024. This gene shows extensive synteny with members of the same pham (132128), cluster (AZ), and subcluster (AZ1), such as Amyev, Adumb2043, and Adolin. /note=Starterator: Start site 18 was the most annotated in 25 of the 34 non-draft genes in this pham. IttyBittyPiggy also called this as the most annotated start site and has 25 manual annotations for it. /note=Location call: Based on the above evidence, this is a real gene and the start site is 39364. /note=Function call: Function unknown, based on the majority of non-draft significant hits from PhagesDB Blast (phages Cassia, Tbone, Warda, Ascela, and Iter, e-values < 3e-41) and NCBI Blast (phages Cassia, Ascela, and Iter, > 69% identity, > 77% aligned, > 88% coverage, and e-values < 1.07e-46), as of 1/11/2024. HHPRED and CDD were irrelevant. /note=Transmembrane domains: Deep TMHMM does not predict any TMDs; therefore, this is not a transmembrane protein. /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: I agree with the function and location calls. Would maybe elaborate on the starterator section, but it is sufficient. 
CDS 39776 - 40090 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="IttyBittyPiggy_62" /note=Original Glimmer call @bp 39776 has strength 15.68; Genemark calls start at 39776 /note=SSC: 39776-40090 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_62 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 5.42124E-66 GAP: 70 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_62 [Arthrobacter phage Cassia]],,WGH21135,97.1154,5.42124E-66 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Both Glimmer and Genemark. Both call the start at 39776. /note=Coding Potential: Good coding potential is found both in GeneMark Self and Host. Our start and stop is outside of the coding potential. Coding potential is in the forward region so we can say that our gene is a forward gene. /note=SD (Final) Score: -2.505. It is the best final score on PECAAN because it is the lowest negative value. It is also the only final score available. /note=Gap/overlap: Gap of 70bp. Though the gap is larger than the 50 bp, this is still a reasonable gap to have because it is conserved in other genes like DrSierra. Acceptable gene length of 315bp. /note=Phamerator: pham: 135558. Date 1/11/24. It is conserved; found in VResidence and Cassia (AZ1). /note=Starterator: Start number 32 in Starterator was manually annotated in 163/315 non-draft genes in this pham. Start 32 is 39776 in IttyBittyPiggy and has 163 manual annotations. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the good coding potential and the small e-value, this gene is real and the most likely start site is 39776. Our findings also show synteny because of the similar position as well as similar sizes of genes (IttyBittyPiggy61 - 315 bp, VResidence62 - 314 bp, DrSierra59 - 311 bp). 
/note=Function call: Function unknown. The top two PhagesDB BLASTp non-draft hits have the function unknown (E-value <10^-50) and the top two NCBI BLAST hits are hypothetical proteins (86%+ identity, 100% coverage, and E-value <10^-60). The HHpred hits cannot be used to support a function call because they have very large e-values (2.3 and 5). CDD did not have any domain hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. It predicts that the protein is found on the inside of the membrane. /note=Secondary Annotator Name: Tran, Michelle /note=Secondary Annotator QC: I agree with the location and function calls, but the pham has changed and the relevant information needs to be updated. CDS 40083 - 40214 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="IttyBittyPiggy_63" /note=Original Glimmer call @bp 40083 has strength 11.15; Genemark calls start at 40083 /note=SSC: 40083-40214 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD80_gp62 [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 100.0% 6.41936E-10 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD80_gp62 [Arthrobacter phage Lizalica] ],,YP_010677627,74.4186,6.41936E-10 SIF-HHPRED: SIF-Syn: Synteny is present with phages such as London and Lego. /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 62 (stop@40214F) /note=Coding Potential: Host and self-trained GeneMark indicate good coding potential downstream of the start site at 40083, a site called by both Glimmer and GeneMark. The length is 132 bp and is the LORF. No switch in gene orientation is present. /note=SD (Final) Score: The start codon is ATG. The final score of -3.095 and z-score of 2.975 indicate a good RBS score. It is also the only ORF available. The called start site of 40083 is kept. 
/note=Gap/overlap: There is an overlap of -8, which is valid, demonstrated by synteny with genes from final phage genomes such as Amyev and Adumb2043. /note=Phamerator: As of 1/19/24, the gene is in pham 133794. There are a total of 54 members, 22 of which are drafts. /note=Starterator: As of 1/19/24, the most annotated start is start number 6. The start position is listed at 40083, which was also the auto-annotated start. /note=Location call: There is strong coding potential present between start site 40083 and stop site 40214, demonstrated by both host and self-trained GeneMark. The gene itself is 132 bp, the LORF, and has a final score of -3.095 and z-score of 2.975, both of which indicate a good RBS score. This is also the only ORF available. An overlap of -8 is present. Synteny is present between this gene and other non-draft genes. As of 1/19/24, both Phamerator and Starterator call the start site as 40083, with start number 6. Thus, it is concluded that this is a real gene with a start site at 40083. /note=Function call: According to the blastp hits, other genes with valid e-values have unknown functions. The only comparable, non-draft gene is from Lizalica, which noted this gene as function unknown. HHPred indicated no hits that had good e-values. CDD did not have any domain hits. Therefore, this gene currently has function unknown. /note=Transmembrane domains: According to Deep TMHMM, this is a transmembrane protein. There is 1 predicted TMR, with a length of 19 amino acids, from amino acids 9 to 27. /note=Secondary Annotator Name: Kim, Abby /note=Secondary Annotator QC: I agree with this annotation. Just make sure to check the box for the TMHMM for evidence below. 
CDS 40207 - 40560 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="IttyBittyPiggy_64" /note=Original Glimmer call @bp 40207 has strength 18.86; Genemark calls start at 40207 /note=SSC: 40207-40560 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE13_gp62 [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 5.91523E-65 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp62 [Arthrobacter phage Elezi] ],,YP_010678040,90.678,5.91523E-65 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation start source: Both Glimmer and GeneMark call a start site of 40207. For that start site, the start codon is an ATG. /note=Coding Potential: For this gene, the coding potential starts just after the start site and covers most of the gene. This is a good indicator that the gene is real because 80% of the gene is covered in the forward direction. /note=SD (Final) Score: The final score is -3.095 which is the highest value amongst the other start site candidates. The z-score is 2.975 which is over 2 and something we look for. These indicate that the start site selected is a very good candidate. /note=Gap/overlap: There is an 8 base pair overlap which is not very long. This is within the acceptable range and is a very reasonable overlap. The length is 354 base pairs which is what we would expect for a normal. These are good indicators that the gene is real and the start site is correct. /note=Phamerator: As of 1/20/24, this gene is part of pham 133778. There is a total of 59 members within this pham, and many are a part of cluster AZ like our phage. I used phages like Adumb and Adolin to compare. There was no function call for this particular gene. /note=Starterator: Out of the 41 non-draft phage genomes within the cluster, 24 called a start site of 9. This corresponds to start site 40207. 
/note=Location call: Based on the z-score, final score and Starterator, I believe that the most probable start site is 40207. This gene is very likely a real gene based on the plentiful evidence pointing in that direction. /note=Function call: Though I believe the gene is real, there is little evidence to point towards a function. The gene likely has a function, but we do not have the resources and evidence to claim one. For now, we will label it NKF. /note=Transmembrane domains: There were no hits for transmembrane proteins so it is most likely not one. /note=Secondary Annotator Name: Nathan, Joseph /note=Secondary Annotator QC: I have QCed this annotation and I agree with the location call due to the convincing z-score and final score (the best available options for both) favoring this start as well as having a reasonable start codon and LORF with good coding potential. I also agree with the function call; the BLAST hits all show NKF and there are no convincing HHpred calls, so NKF seems to be the best option. CDS 40560 - 40736 /gene="65" /product="gp65" /function="membrane protein" /locus tag="IttyBittyPiggy_65" /note=Original Glimmer call @bp 40560 has strength 9.48; Genemark calls start at 40560 /note=SSC: 40560-40736 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Adumb2043] ],,NCBI, q3:s2 94.8276% 1.61365E-27 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.005, -4.553144366458975, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Adumb2043] ],,YP_010677973,92.9825,1.61365E-27 SIF-HHPRED: SIF-Syn: There is good synteny observed in upstream and downstream directions when compared across multiple final phage genomes. /note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Glimmer identifies the start site at 40560. GeneMark identifies the start site to be 40560. Both start sites match, indicating a greater likelihood of this being the correct start site. 
/note=Coding Potential: There is good coding potential observed within the sixth open reading frame for this gene on the GeneMarkS output report and GeneMarkHmm. This is supportive evidence that this gene is real and the coding potential on the forward strand confirms this is a forward gene. /note=SD (Final) Score: The SD (final) score is -4.553; this is the best observed score, and it has more supportive evidence than the other calls. The z-score is 2.005 which is the best of those called. /note=Gap/overlap: This gene has a 1 bp overlap with the upstream gene. This is a very small overlap and poses no concerns. /note=Phamerator: #133794 observed on 1/19/24. This has many members within its pham predominantly from the cluster AZ. Some of these genes within this pham come from the phages Adolin and Cassia. Many of the genes within this pham have no known function. /note=Starterator: 1/19/24 The start site that was called the most was start 6 which occurred 49 of 54 times in the AZ cluster. Many non-draft phage annotations were called. /note=Location call: Based on the above evidence, it appears this is a real gene and the start site is 40560. /note=Function call: Based on the presented evidence there is an indication that this is a transmembrane protein. There are multiple significant e-values demonstrated by PhagesDB blastp data for NKF. There are several NCBI hits indicative of NKF but no CDD hits. The HHPRED data demonstrates no significant evidence for the tail assembly chaperone. The PhagesDB data contradicts the transmembrane findings indicating that this gene may indeed have a function as a transmembrane protein. /note=Transmembrane domains: There is evidence of transmembrane domains observed. All the graph data supports inside signals indicating that this gene could potentially be a transmembrane component. There are inside, inner, and outside signals observed. 
/note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: I agree with the above evidence. All of the categories have been thoroughly considered. CDS 40729 - 40893 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="IttyBittyPiggy_66" /note=Original Glimmer call @bp 40729 has strength 13.07; Genemark calls start at 40768 /note=SSC: 40729-40893 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_64 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 9.07569E-21 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_64 [Arthrobacter phage Cassia]],,WGH21137,83.3333,9.07569E-21 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer marks the start site as 40729 and GeneMark marks the start site as 40768. /note=Coding Potential: There is coding potential in the first ORF of the reverse direction in both Host and Self-trained GeneMark, indicating that this is a forward gene. /note=SD (Final) Score: The z-score and final score are the best available, at 2.975 and -2.505 respectively. /note=Gap/overlap: There is an overlap of 8 base pairs with this and the preceding gene. This is a reasonably sized overlap. /note=Phamerator: As of 1/16/24, this gene is in pham #2353. There are 34 non-draft members of this pham, and they all belong to phages in cluster AZ. /note=Starterator: 33/34 non-draft genomes called start site 10 as the start site, but IttyBittyPiggy does not have this start site. IttyBittyPiggy’s auto-annotation called start site 10, which corresponds to 40729. /note=Location call: Based on the above evidence I agree with the auto-annotated start site at 40729. /note=Function call: NKF; BLASTp and NCBI BLAST had significant hits, but none had known functions. There were no significant hits for CDD or HHpred. 
/note=Transmembrane domains: There are no TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Laureano, Ryan /note=Secondary Annotator QC: I agree with this annotation. CDS 40890 - 41003 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="IttyBittyPiggy_67" /note=Original Glimmer call @bp 40890 has strength 16.0; Genemark calls start at 40890 /note=SSC: 40890-41003 CP: yes SCS: both ST: NA BLAST-Start: [membrane protein [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 4.14728E-9 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.975, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Amyev] ],,YP_010677768,83.7838,4.14728E-9 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: 40,890 is the start site called by both Glimmer and GeneMark. /note=Coding Potential: Both host and self GeneMark have coding potential in the forward direction only. /note=SD (Final) Score: -2.584. Best final score with a z-score of 2.975. /note=Gap/overlap: There is a -4 base pair overlap. It is likely that the start codon overlaps with the upstream stop codon, suggesting an operon. /note=Phamerator: Pham 133847 as of 1/13/24. There are 44 members in the AZ cluster, 28 are non-draft phages. Conserved. /note=Starterator: 404 error report. Server numbers do not match, so there is no Starterator analysis yet as of 1/13/24. /note=Location call: It is likely the start site of this gene is 40,890. /note=Function call: NKF. BLASTp's two best hits are Pumpkins (2e-10) and TforTroy (2e-10) for no known function. NCBI BLASTp has one hit, Amyev, for membrane protein (4e-09, 100% query cover, 75% identity); no CDD hits; no significant HHpred hits (lowest e-value was 2.8). /note=Transmembrane domains: None. There is only a mark for a signal peptide. 
/note=Secondary Annotator Name: Ryan, Kaitlin /note=Secondary Annotator QC: I agree with all of the above evidence; the function and location calls are well supported by significant hits. One thing to maybe note is that a 4 base pair overlap is typical of an operon; you could mention that in your notes if you wanted to. Also, make sure to fill out the Starterator menu as NA. CDS 41000 - 41344 /gene="68" /product="gp68" /function="HNH endonuclease" /locus tag="IttyBittyPiggy_68" /note=Original Glimmer call @bp 40991 has strength 6.7; Genemark calls start at 41000 /note=SSC: 41000-41344 CP: no SCS: both-gm ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Janeemi]],,NCBI, q4:s2 97.3684% 7.11277E-54 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.044, -4.3913063680686815, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Janeemi]],,UVK63587,88.3929,7.11277E-54 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Glimmer calls a start site of 40991 and GeneMark calls a start site of 41000. /note=Coding Potential: Both the self-trained and the host-trained GeneMark show coding potential that is consistent with the 41000 start site. /note=SD (Final) Score: The 41000 start site has the best final score of -4.391 and the best Z-score of 2.044. /note=Gap/overlap: The 41000 start site minimizes the gap (-4) and indicates that this is an operon. /note=Phamerator: As of 1/18, this gene is part of pham #133755. There are 63 members and 23 draft phages. /note=Starterator: Start site 18 was the most annotated in 10/40 non-draft phages. Start site 18 does not have a corresponding start site in IttyBittyPiggy. Start site 17 (41000) was MA in 3/40 non-draft phages. /note=Location call: The start site is 41000 for this forward gene. The evidence that supports this call is the final score, z-score, coding potential, and minimal gap. /note=Function call: HNH Endonuclease. 
PhagesDB BLAST obtained strong hits to Janeemi and Tuck with e-values of 2e-53 and 2e-43. NCBI BLAST obtained strong hits to Janeemi and DrManhatthan, both HNH Endonuclease, with e-values of 8.5e-54 and 4.4e-41. CDD obtained no hits. HHpred did not obtain accurate hits. /note=Transmembrane domains: TmHmm predicts 0 TMDs. /note=Secondary Annotator Name: SANCHEZ, KAYLA NICOLE /note=Secondary Annotator QC: I agree with the claims. Don't forget to check off boxes that show HNH endonuclease as evidence except for HHpred. Also fill out synteny box and GM coding drop down. CDS 41649 - 41834 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="IttyBittyPiggy_69" /note=Original Glimmer call @bp 41649 has strength 9.04; Genemark calls start at 41649 /note=SSC: 41649-41834 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein HOU52_gp67 [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 5.47441E-27 GAP: 304 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.209, -2.86135216970947, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU52_gp67 [Arthrobacter phage Yang] ],,YP_009815685,88.7097,5.47441E-27 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 41649. The start codon is an ATG. /note=Coding Potential: There is reasonable coding potential in the third open reading frame on the Host Trained GeneMark. On the Self Trained GeneMark, the coding potential appears to start before the start site, but there is no start site that can be used to fully encompass all of the coding potential. /note=SD (Final) Score: While the longest open reading frame is at 41643, which also has a less negative final score, I still think the start site at 41649 is better because of the ATG start codon. The small differences in gap, length and spacer are insignificant. 
/note=Gap/overlap: While the gap of 304 base pairs is large, there is no coding potential in the gap nor a start and stop codon to suggest the addition of a gene. /note=Phamerator: Pham 130864, 1/18/2024. 44 members in the pham. The other members of the pham are all from the AZ cluster and have similar length genes, which suggests that this gene is conserved. /note=Starterator: Start site #21 is manually called in 15/27 non-draft genomes, and is found in 29/44 genes in the pham. This start site’s coordinates are (21, 41649). This appears to be the most conserved and most auto-annotated start site. /note=Location call: I believe this is a real gene and the most likely start site is at 41649. /note=Function call: Phagesdb has several hits, but it does not call a single protein consistently. Phagesdb BLAST has several hits with low e-values, but they all have no known functions. HHpred had several hits, but the e-values from 6.1 to >50 leads me to believe that this data is insignificant. NCBI Blast also did not have any informative hits. I think this gene has no known function. /note=Transmembrane domains: No transmembrane domains were predicted by DeepTmHmm. /note=Secondary Annotator Name: Hon, Darren /note=Secondary Annotator QC: There is good coding potential according to both host and self-trained GeneMark. However, I agree with the annotator’s note that there is no start site that encompasses the entire coding potential, as there is a small upwards region of the coding potential that is not contained. Additionally, the large gap is accurately described, as there are no coding potential found between the genes, supported by synteny with non-draft phages on pham maps. As of 1/23/24, pham 130864 is called, and supported by both phamerator and starterator. The autoannotated start site at 41649 is also correct, thus supporting the claim that this is the correct start site despite not encompassing the coding potential holistically. 
Phagesdb BLAST have phages with solid e-values indicating NKF. HHPred supports this claim as it does not have any strong hits. NCBI BLAST also support it as an hypothetical protein as demonstrated by the best hits. Thus, the function is correctly called NKF. The phamerator and starterator needs to be marked as ss (suggested start). The GM coding capacity needs to be selected as no. Phagesdb BLAST and NCBI BLAST needs to have some phages and genes marked as evidence. CDS 41834 - 42145 /gene="70" /product="gp70" /function="HNH endonuclease" /locus tag="IttyBittyPiggy_70" /note=Original Glimmer call @bp 41834 has strength 4.85 /note=SSC: 41834-42145 CP: yes SCS: glimmer ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage DrSierra] ],,NCBI, q1:s1 97.0874% 7.556E-50 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.186, -4.233889550688795, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage DrSierra] ],,YP_010678391,94.0,7.556E-50 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,68.932,97.3 SIF-Syn: Shares synteny with ObiToo and Nitro, both are AZ1 phages that this pham is the last gene in the genome and has pham 130864 preceding it /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Only Glimmer called the start site for the gene at 41834, LORF and ATG start codon Best final score of -4.234, but not the best Z-score of 2.186 /note=Coding Potential: Coding potential is there but does not cover the entire gene. 
/note=SD (Final) Score: -4.234 /note=Gap/overlap: -1 /note=Phamerator: Phamerator shows similar function and length of the gene across many phages including AZ /note=Starterator: does not have the most annotated start site but uses one that is called 15.5% of the time and when called manual annotations are at 96.5% /note=Location call: With the data present I would agree with the auto annotated start site of 41834, the final score of -4.234 while not great is the best of the possible starts present and even with the second best Z-score of 2.186 it is still within the suggested range and with the gap being the smallest at -1 it seems to be the best start site. also with the length of phamerators genes for the pham being similar it appears that this start site is correct. /note=Function call: Looking at HHPred and BLAST it is evident that this gene is an HNH endonuclease based off of the probability and percent coverage. the length as stated in the function assignment document needs to be over 30 bp which it is. BLAST shows a large percentage of alignment as well which further points to HNH endonuclease. /note=Transmembrane domains: No transmembrane domain /note=Secondary Annotator Name: Claire, Monjov /note=Secondary Annotator QC: The start site called by the primary annotator seems reasonable. The evidence I found aligns with theirs. For function, many other genes within the same pham also called for a HNH endonuclease which is strong evidence that the function call is also correct. I agree with the primary annotators findings.