CDS 67 - 516 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="PinkFriday_1" /note=Original Glimmer call @bp 67 has strength 7.76; Genemark calls start at 1 /note=SSC: 67-516 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FDH62_gp01 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 5.97146E-104 GAP: 0 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.288, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp01 [Arthrobacter phage Pumancara] ],,YP_009602867,100.0,5.97146E-104 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer (start at 67) and GeneMark (start at 1). /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -1.954 /note=Gap/overlap: None /note=Phamerator: Pham 1198 Date 10/31/2023. It is conserved; found in Lasagna (AK) and Cholula (AK). /note=Starterator: Start site 10 in Starterator was manually annotated in 77/78 non-draft genes in this pham. Start 10 is 67 in PinkFriday. This evidence agrees with the site predicted by Glimmer but not GeneMark. /note=Location call: Although Glimmer and GeneMark disagree, based on the above evidence the most likely start site is 67 bp. /note=Function call: NKF. The top three phagesdb BLAST hits have unknown function (E-value <8e-79), and the top 2 NCBI BLAST hits have hypothetical function (100/99% coverage, 100/99%+ identity, and E-value <3e-103). HHpred and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMRs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: I think you may need to add a bit more information to the function call section. For example, do HHpred and CDD tell you that corresponding genes in other phages are also NKF? 
(just provide a bit more detail on why its NKF). Also, I think maybe a bit more information on the transmembrane domains. You can see the sample pecaan notes section in the annotation lab manual for more information. HHPRED doesn`t seem very helpful and phagesDB blast seems to give all "unknown function" results, so I would agree that this gene has NKF. Also make sure to check off evidence on Pecaan that supports your call. CDS 428 - 1993 /gene="2" /product="gp2" /function="terminase" /locus tag="PinkFriday_2" /note=Original Glimmer call @bp 599 has strength 12.93; Genemark calls start at 599 /note=SSC: 428-1993 CP: yes SCS: both-cs ST: SS BLAST-Start: [terminase [Arthrobacter phage Sergei] ],,NCBI, q1:s1 100.0% 0.0 GAP: -89 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.735, -3.2575878860394365, no F: terminase SIF-BLAST: ,,[terminase [Arthrobacter phage Sergei] ],,YP_010050416,99.8081,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,88.8676,100.0 SIF-Syn: Terminase, upstream gene has NKF, downstream is portal and MuF-like fusion protein, just like in phage Aledel. /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 599, but the start site was manually changed to 428 due to a better Z-score (2.735) and FS (-3.258) (599 has a Z-score of 0.934 and an FS of -6.908), better synteny with genes in other phages, and better coverage of coding potential. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score is the best option at -3.258 and the z score is the highest at 2.735. 
/note=Gap/overlap: The overlap with the upstream gene is 89 bp, which is a bit large, but is seen in several other phages with the same gene such as CristinaYang and Aledel. /note=Phamerator: pham: 120147. Date 10/31/2023. It is conserved; found in Aledel (AK), CristinaYang (AK), and Daiboju (AK). /note=Starterator: Start number 49 corresponds to 428 in PinkFriday. It was manually annotated as the start site at the highest frequency (75 times). This start site was not the most conserved start site (153) and does not agree with the site predicted by Glimmer and GeneMark. The most conserved start site, however, is only called in 146/1114 (13.1%) of the pham. Other phages in this pham and in the same cluster (AK) such as Aledel and CristinaYang, however, also have start number 49 as the most frequently manually annotated start site. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 428. /note=Function call: All of the top 40 phagesdb BLAST and NCBI BLAST hits have the function of terminase protein with an E-value of 0. CDD had a hit for a domain that corresponds to the large subunit of a terminase protein with an E-value of 7.92e-22. Additionally, HHpred had multiple hits corresponding to a terminase. The top 7 had 100% probability, E-values <2e-30, and coverage >75%, and all correlated to a terminase. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Soan, Jessica Hyunsil /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 2012 - 4039 /gene="3" /product="gp3" /function="portal protein" /locus tag="PinkFriday_3" /note=Original Glimmer call @bp 2012 has strength 8.47; Genemark calls start at 2012 /note=SSC: 2012-4039 CP: yes SCS: both ST: SS BLAST-Start: [portal and MuF-like fusion protein [Arthrobacter phage Temper16] ],,NCBI, q1:s1 100.0% 0.0 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.675, -3.595227487224058, no F: portal protein SIF-BLAST: ,,[portal and MuF-like fusion protein [Arthrobacter phage Temper16] ],,ASZ74317,99.7037,0.0 SIF-HHPRED: Portal protein; portal, HK97, bacteriophage, packaging, VIRAL PROTEIN;{Hendrixvirus},,,8CEZ_K,54.8148,100.0 SIF-Syn: Portal protein; upstream gene is capsid maturation protease, downstream gene is terminase, just like in phage Eunoia. /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Both Glimmer and GeneMark; both call the start codon 2012 /note=Coding Potential: The ORF has strong coding potential, particularly in the forward direction. Coding potential is found in GeneMark and Host. /note=SD (Final) Score: The best final score on PECAAN is -3.472. This is not the final score of the selected ORF. The final score of the ORF of the selected start site is -3.595. /note=Gap/overlap: 18bp. The gap is fairly minimal and also, there is very strong synteny with other phages and there is no coding potential in the gap that could indicate another new gene. /note=Phamerator: 121335. Date 11/01/2023. It is found in Albanese (AK) and Aledel (AK). /note=Starterator: Start site 8 was manually annotated in 77/164 non-draft phages in this pham. Start 8 is 2012 in Pink Friday. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 2012. /note=Functional call: Portal protein. The top two phagesDB BLAST have the function of portal protein (E-value of 0.0). 
The NCBI BLAST’s top two hits also have this function (99% identity, E-value of 0.0, 100% coverage). HHPred had a hit for portal protein with 100% probability, 54.8148% coverage, and e-value of 1.2e-36. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: I agree with the primary annotator’s start call and evidence. These notes can be improved by including the fact that the best Z score is not the Z score of the selected ORF (the Z score of this ORF should also be listed). /note=11/17 I agree with the primary annotator’s function call of portal protein. Phagesdb and NCBI blasts both support this with the top hits being portal proteins. There is also a conserved domain for phage portal protein. A review of pham maps shows good synteny (the primary annotator needs to fill this box). May want to refresh HHPred on PECAAN to select stronger evidence. CDS 4058 - 4834 /gene="4" /product="gp4" /function="Capsid maturation protease" /locus tag="PinkFriday_4" /note=Original Glimmer call @bp 4058 has strength 13.73; Genemark calls start at 4064 /note=SSC: 4058-4834 CP: yes SCS: both-gl ST: SS BLAST-Start: [head maturation protease [Arthrobacter phage Sergei] ],,NCBI, q1:s1 100.0% 0.0 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.645, -3.368377069717189, yes F: Capsid maturation protease SIF-BLAST: ,,[head maturation protease [Arthrobacter phage Sergei] ],,YP_010050418,99.6124,0.0 SIF-HHPRED: Peptidase_S78 ; Caudovirus prohead serine protease,,,PF04586.21,64.3411,99.8 SIF-Syn: Capsid maturation protease, upstream gene is major capsid protein, downstream is a portal protein, just like in phage Nancia. /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation start source: Glimmer and GeneMark. Glimmer calls the start at 4058 and GeneMark calls it at 4064. 
/note=Coding Potential: All of the coding potential is located within the ORF and is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -3.368 /note=Gap/overlap: 18 bp /note=Phamerator: It belongs to pham 106150 (as of 10/31/23). It is conserved; found in OMalley and Nancia as well. /note=Starterator: Start site 44 was manually annotated in 167 non-draft genes. Start 44 is located at 4058 in PinkFriday. This agrees with the site predicted by Glimmer. /note=Location call: Based on the evidence above, this is a real gene with a likely start site of 4058. /note=Function call: Predicted function is head maturation protease, based on hits from NCBI BLAST. The most significant hits had >99% query cover, high identity percentage, and an e-value of 0.0, and encoded for capsid/head maturation proteases. CDD hits returned maturation proteases with extremely low e-values, and HHpred returned significant hits, with more than 5 having high quality alignment (90%+) and coding for proteases with low e-values (<10e-15). Given this evidence, I hypothesize that the gene encodes for a capsid maturation protease. /note=Transmembrane domains: There are no transmembrane domains found for this gene, which is to be expected considering it must be fully within the cell to carry out its function. /note=Secondary Annotator Name: Akkinepally, Mrudula /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 4834 - 6081 /gene="5" /product="gp5" /function="major capsid protein" /locus tag="PinkFriday_5" /note=Original Glimmer call @bp 4834 has strength 15.3; Genemark calls start at 4834 /note=SSC: 4834-6081 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage Sergei] ],,NCBI, q1:s1 99.759% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.055, -3.095100142625534, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage Sergei] ],,YP_010050419,99.5181,0.0 SIF-HHPRED: Major capsid protein; HK97-like fold, capsid size redirection, major capsid protein, VIRUS; 4.0A {Staphylococcus aureus},,,7RWZ_C,99.5181,100.0 SIF-Syn: Major capsid protein, upstream gene is capsid maturation protein, downstream is NKF, just like in phage Bodacious. /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: Both Glimmer and Genemark say 4834, ATG /note=Coding Potential: The gene has reasonable coding potential predicted in the forward ORF and the start site appears to cover all of the coding potential /note=SD (Final) Score: -3.095, this was the best final score /note=Gap/overlap: -1, this is a good overlap. /note=Phamerator: Pham 104749, date: 11/1/23, it is conserved, found in Albanese (AK) and Aledel (AK) /note=Starterator: Start site 36 was manually annotated in 135 out of 241 genes in this pham. Start 36 is 4834 in PinkFriday, this matches the start site predicted by Glimmer and Genemark /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 4834 /note=Function call: PhagesDB BLASTp and NCBI BLASTp results show several phages with percentage identity values that are almost 100% and e-value 0.0 and they all call the gene function major capsid protein. CDD shows that there is one significant hit within a major capsid protein domain of the gene and an e-value of 9.68e-32 which suggests the protein has a major capsid function. 
HHPred has at least two significant hits (with e-values less than 5.7e-31), with one hit being associated with major capsid protein structure. Overall, HHPred and CDD hits can be used as evidence to reinforce function calls made by PhagesDB BLASTp and NCBI BLASTp and the gene could be for a major capsid protein. /note=Transmembrane domains: DeepTMHMM predicts 1 TMD; however, it is 9 aa long, which is outside of the acceptable range of 17-22, therefore it is not a membrane protein. This makes sense because the capsid is not involved in entry/exit of the host cell. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: Both Glimmer and GeneMark. Both call the start site 4834 (codon: ATG). /note=Coding Potential: The gene has high coding potential in ORFs of the forward strands. The start site suggested by both Glimmer and GeneMark covers all of the coding potential. /note=SD (Final) Score: The SD score for the start site called by both Glimmer and GeneMark has the best final score of -3.095 and z-score at 3.055. The z-score indicates that ribosome binding to this site is above average as compared to other RBS scores present for the gene, although it still has low binding affinity to ribosomes. This makes it the best possible start site in terms of SD score. Since there is only an overlap of one base pair (gap value = -1), the RBS score is a relevant factor to assess the start site. /note=Gap/overlap: There is a 1 base pair overlap between the gene and another upstream gene. This means the site isn’t a part of an operon and it’s a reasonable overlap since it isn’t any greater than -4 (which indicates that it’s neither a part of a gene or operon and reduces the space needed for a promoter, increasing suspicion). The length of the gene is 1,248 bps which is greater than 120 bps (the minimum length needed for adequate coding of a protein). 
/note=Phamerator: The pham the gene is found in is 104749 as of 11/2/23. The pham in which the gene is conserved is in other members of the same cluster as PinkFriday (AK). Phages used for comparison include Aledel_5 (AK), Eunoia_5 (AK), HunterDalle_5 (AK), OMalley_5 (AK), Riovina_5 (AK), and Vulture_5 (AK). The Pham Maps/ Phamerator data doesn’t call a function of this gene, but calls the function for a similar gene present in other phages of the same cluster (Pumancara and Pterodactyl). The function that calls for the similar genes is “major capsid protein”. This function was called consistently in other phages of the same cluster (AK). /note=Starterator: The reasonable start site, that is conserved among the members of the pham to which the gene belongs, is site 36 (at base pair coordinate 4834 for PinkFriday). There are 262 total members in this pham of which 22 are drafts. One of the drafts includes PinkFriday. 135 non- draft members out of the 241 non-draft genes call the most conserved start site. The Starterator is informative. It indicates that the start site called by both Glimmer and GeneMark is not only the most conserved in a large number of members in the pham (containing the target gene), but that it’s also been manually annotated 135 times as a start site for the gene. This builds greater authenticity to the site being the actual start site. /note=Location call: The gene examined is a real gene as it has high coding potential on the forward strands it’s transcribed on. The gene is also conserved on Phamerator in other phages. Both Glimmer and GeneMark calls as well as Starterator data suggest that the site 4834 is the start site of the gene. This start site covers all of the coding potential, is the most conserved in other phages of the same pham as the gene (and the same cluster as PinkFriday), and has a high number (135) of manual annotations which call it as a start site. 
Considering all of this information, the site 4834 is the best start site candidate. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp results show there are multiple phages (at least 4) with high percentage identity values (>99%) and low E-values (0), which all call the function of the protein product of the gene as “major capsid protein”. NCBI BLASTp results also show high query coverage values (>99%) for 2 of the phages, also with low E-values (0). CDD shows that there is one significant hit within a major capsid protein domain of the gene (E-value = 9.68e-32), which suggests the protein has a major capsid function. HHPred has at least two significant hits (E-values <5.7e-31) with one hit being associated with major capsid protein structure. Overall, HHPred and CDD hits can be used as evidence to reinforce function calls made by PhagesDB BLASTp and NCBI BLASTp and the protein could have “a major capsid protein” function. /note=Transmembrane domains: There is one transmembrane domain that has a length of 9 amino acids. The TMD is noted as TMhelix, which indicates it’s not a signal peptide. The TMD is not an adequate length, so the protein product cannot be called a membrane protein. This makes sense in the context of the function considering the capsid generally lacks contact with any host bacterial membranes as opposed to the tail proteins. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note= /note=Note for primary annotator (overall gist): Great job, I agree with the function call. Some suggestions: One more piece of evidence needs to be checked for NCBI BLASTp. Also, HHPred needs two pieces of evidence checked (since there are some significant hits) as well as for CDD. 
CDS 6162 - 6335 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="PinkFriday_6" /note=Original Glimmer call @bp 6162 has strength 16.47; Genemark calls start at 6162 /note=SSC: 6162-6335 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp06 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 6.76949E-31 GAP: 80 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.757, -3.2123487306667524, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp06 [Arthrobacter phage Pumancara] ],,YP_009602872,100.0,6.76949E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hanna Bidzan /note=Auto-annotation: Glimmer and GeneMark. Start site 6162. ATG /note=Coding Potential: Good coding potential with the start site covering all the coding potential in the forward strand. There is also high activity with the line going over 0.5. /note=SD (Final) Score: The original start site is 6162 with an RBS final score of -3.212. It would be better if it was less negative; however, it is the best compared to other starts. /note=Gap/overlap: There is an 80 bp gap and this is the longest reasonable ORF with no forward or reverse switch. There are no gaps upstream or downstream that cannot be filled by changing the start site of the gene. /note=Location call: Given the agreed start site, coding potential, and synteny, it can be concluded that the start site of 6162 is the most likely site. /note=Phamerator: As of 10/31/23, the gene is within pham 114924. It is part of cluster AK along with 82 other members in this cluster including Suppi 10 and Puppers 6. /note=Starterator: There is a highly probable start site 14 @ 6162. There are 90 other non-draft genes with the start site at 14 that are also within cluster AK. /note=Location call: This gene is a real gene and has a start site at base pair 6162. /note=Function call: No, there is no known function. 
None of the genes had any functions listed since they were all unknown/hypothetical. There was a strong hit as listed for Pumancara, however it still isn`t sufficient enough evidence to suggest that our gene has a function. It also had a 100% identity, but again since none of the genes listed had known function or super great e values, the function of this gene is unknown until we gather further evidence. NKF /note=Transmembrane domains: No TMDs by DeepTMHMM /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation for the location call and function call. All of the evidence categories have been considered. Perhaps, include which specific evidence categories had insufficient strength to be counted. CDS 6345 - 6926 /gene="7" /product="gp7" /function="head-to-tail adaptor" /locus tag="PinkFriday_7" /note=Original Glimmer call @bp 6345 has strength 6.47; Genemark calls start at 6345 /note=SSC: 6345-6926 CP: yes SCS: both ST: SS BLAST-Start: [head-tail adaptor [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.63617E-137 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.918, -5.186767100221219, no F: head-to-tail adaptor SIF-BLAST: ,,[head-tail adaptor [Arthrobacter phage Pumancara] ],,YP_009602873,100.0,1.63617E-137 SIF-HHPRED: Adaptor protein Rcc01688; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_C,98.9637,99.9 SIF-Syn: Head-to-tail adaptor, upstream gene is NKF in pham 124306, downstream is head-to-tail stopper, just like in phage Temper16. /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: GeneMark and Glimmer both agree on the same start site, 6345. Start codon is GTG. /note=Coding Potential: Within the putative ORF the gene has a reasonable coding potential predicted by GeneMark. The chosen start site covers all of the coding potential. /note=SD (Final) Score: The SD score is -5.187. 
While it is not the most positive option, it is more positive than most of the other first 10 options for start sites. /note=Gap/overlap: The upstream gap of 9 base pairs is reasonable. The start site chosen results in the longest ORF. The resulting length of the gene, 582 base pairs, is reasonable. /note=Phamerator: As of 10/30/2023, this gene is in pham 121199. 82 phages in the same cluster as PinkFriday, cluster AK, also have a gene in this Pham however many other phage clusters also have this gene’s Pham present. /note=Starterator: As of 11/6/23, start site 14 is present in 77 genes in this pham and called in 47 non-draft phage annotations, or 63.6% of the time when the start is present. The called start site is located at base pair 6345 in PinkFriday. All of 46 other genes with start site 14 called are in the same cluster as PinkFriday (cluster AK). /note=Location call: Taken together, the gathered evidence suggests that the gene is real and that the start site for this gene is at base pair 6345. /note=Function call: PhagesDB and NCBI BLAST have hits with the suggested function head-to-tail adaptor protein with small e values of <10^-136. HHPRED has hits that correspond to unique SEA-PHAGES requirements for this gene. Has alignment with phage HK97 gp6 with 98.8% probability, 96.89% coverage, and e-value of 9.1e-8. /note=Transmembrane domains: There are no TMDs predicted by DeepTMHMM. /note= /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: Glimmer and Genemark. Both call the start site at 6345. /note=Coding Potential: There is good coding potential for this ORF in the forward strand. The predicted start site contains all the coding potential. Coding potential is found in both Genemark Self and Host. /note=SD (Final) Score: -5.187. This is not the best final score as there are several others with less negative scores. 
However, there are only a few with better final scores and the correct start site doesn’t necessarily need to have the best final score. /note=Gap/overlap: There is a 9 bp gap between this gene and the gene before. This is a relatively small gap and this gap is the smallest of the different suggested start sites and also results in the longest gene. The gap also seems to be conserved in other phages (Aledel, Pterodactyl). No coding potential in the gap. /note=Phamerator: Pham: 121199 /note=Starterator: Start site 18 in Starterator was manually annotated in 47 of 419 non-draft genes in this pham. All of these genes are in the same cluster as PinkFriday as well. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 6345. /note=Function call: PhagesDB and NCBI had hits for head to tail adapter with low e values (1e-107). HHPred had several relevant hits such as one for an adapter protein (prob: 99.9, coverage: 98.96,e value: 1.2e-22) and head completion protein (prob: 99.4, coverage: 92.746, e value: 2.1e-11.) /note=Transmembrane domains: There were no TMDs predicted by DeepTMHMM. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with the conclusions of this annotator so far. However, I found an inconsistency in the starterator data. I found 47/114 as the number of non-draft phages with the suggested start number for this gene instead of 114. I think the 114 comes from start number 17. /note=I agree with the function call of Head-to-tail adapter. 
CDS 6926 - 7264 /gene="8" /product="gp8" /function="head-to-tail stopper" /locus tag="PinkFriday_8" /note=Original Glimmer call @bp 6926 has strength 12.79; Genemark calls start at 6926 /note=SSC: 6926-7264 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage Supakev]],,NCBI, q1:s1 100.0% 6.88706E-77 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.13, -4.532304881730868, no F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Supakev]],,AZS10386,99.1071,6.88706E-77 SIF-HHPRED: Stopper protein Rcc01689; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_E,97.3214,99.7 SIF-Syn: Head-to-tail stopper, upstream gene is head-to-tail adaptor in pham 124101, downstream is NKF, as is in phage KingBob. /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Both Glimmer and GeneMark are utilized, they both agree at start site #6926, with the start codon ATG. /note=Coding Potential: Within both the Host-Trained and Self-Trained coding potential graphs, the gene’s ORF contains substantial coding potential activity and the start site does cover all the coding potential. /note=SD (Final) Score: The SD score (for the Glimmer start site) is -4.532 (which is not the best). However, this score is an acceptable value as it is a unique number and can be considered suggestive of the presence of a credible ribosome binding site. *This gene may be transcribed as polycistronic due to a -1 overlap, in this case it has a Z-score of 2.13. /note=Gap/overlap: This gene’s gap value is “-1” which is a reasonable value. The gene length for Glimmer’s start site is also acceptable (339). There are alternative start site candidates that would result in larger ORFs but this results in enlarged overlap values (e.g. -257, -433). Due to this, these alternative candidates are dismissed. /note=Phamerator: As of October 31st, 2023 this gene is found within Pham: 121341. 
The pham is conserved with other members within its cluster, including phage BigMack’s gene 8 and phage Eunoia’s gene 8. In addition, the phamerator database did have a highly consistent function called for this gene: head-to-tail stopper protein. /note=Starterator: Yes, there is a reasonable start site that is highly conserved within the pham. The conserved start site is site number 25, which corresponds to base pair coordinate 6926 for PinkFriday. There are approximately 170 members in this pham and 130/159 non-draft genes call site #25. /note=Location call: I believe this is a real gene due to its conservation within its pham group and its reasonable coding potential. The gene’s potential start site candidate #36/6926 seems the most reasonable. /note=Function call: The predicted function for this gene is the head-to-tail stopper protein. This is based on pieces of evidence such as the various phagesDB BLAST hits listed with the same function; all the hits have small e values ranging from 1e-59 to 6e-60. In addition, phagesDB function frequency for the head-to-tail stopper protein within subcluster AK is 82%. Also, NCBI BLAST hits for the head-to-tail stopper have high query coverage (100%), and percent identity at 98.2%. /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains. /note=Secondary Annotator Name: JACOBS, SARISHA MALAIKA /note=Secondary Annotator QC: I agree with the start site (6926) called by the primary annotator, but seeing as the starterator has updated over this weekend, please refresh and confirm the Pham group and the start number of the start site. I would also list the final score of the ORF that you selected, not just the best final score. /note=11/17 There is strong evidence to suggest that this protein is a head-to-tail stopper (I would agree with the primary annotator). The Phagesdb and NCBI blasts show hits for the head-to-tail stopper and there is good synteny with other phages. 
The only thing I am concerned about is that on the SEA phages approved function list it says that in order for this protein to be called " SPP1 16 (5A21 chain E or F in the macromolecular complex) or Bacillus protein yqbH." I would double check that one of these crystal structures have been called. CDS 7264 - 7551 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="PinkFriday_9" /note=Original Glimmer call @bp 7264 has strength 7.45; Genemark calls start at 7264 /note=SSC: 7264-7551 CP: no SCS: both ST: SS BLAST-Start: [neck protein [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.48636E-59 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.053, -4.551528886052257, no F: hypothetical protein SIF-BLAST: ,,[neck protein [Arthrobacter phage Pumancara] ],,YP_009602875,100.0,1.48636E-59 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: Both Glimmer and GeneMark call and agree on the start site at bp 7264 with a start codon of ATG (Met). /note=Coding Potential: Gene has reasonable coding potential within the putative ORF, as indicated by Self and Host GeneMark maps with high levels of typical coding potential. The start site does not cover all of the coding potential, as it is placed slightly to the right of a high peak of coding potential on the Self and Host GeneMark maps. /note=SD (Final) Score: The original start site has the best SD score (least negative) and is not part of an operon due to the -1 bp overlap with the preceding gene, meaning the RBS (SD) score is an effective determinant of choosing the start site. /note=Gap/overlap: The overlap with the preceding gene (-1 bp) is reasonable and is not indicative of this gene being part of an operon. This proposed start site also creates the gene with the longest ORF out of all the alternative start site candidates. The proposed length of the gene (288 bp) is acceptable as well. /note=Phamerator: As of 11/01/2023, this gene is found in pham 121229. 
Other genes of members in the AK cluster are present in this pham as well. Some of these phages include Albanese, Aledel, AppleCider, Beethoven, Bennie, and Korra. There was no function called for this gene. /note=Starterator: There is a reasonable start site for which the genes in this pham (121229) are conserved at start site 62 (which is at bp 7264 for PinkFriday_9). There are 314 non-draft members in this pham and an additional 36 draft members. Of the 314 non-draft members, 122 of them call start site 62. The Starterator program is informative, as very few members in this pham are drafts and it provides the number of manual annotations for which the other non-draft genes have called start site 62. /note=Location call: The gathered evidence suggests that this is a real gene with the original start site @ bp 7264 being correct due to its (almost) complete encompassing of the coding potential, very little overlap with the preceding gene (-1 bp), creation of the longest possible ORF of all of the start site candidates, and consistency with the Starterator report. /note=Function call: Hits from NCBI and PhagesDB BLASTp, both of which had 2 hits with high query coverage (100%), high % identity (>93%), and relatively low E-values (<1e-42) indicate that this gene codes for a neck protein. However, this gene shall remain under NKF because neither CDD or HHpred called any significant hits and no TMDs were found. /note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: /note=Primary Annotator Name: Dweik, Qaiss. /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 7264 (codon: ATG). /note=Coding Potential: The gene does have a high, reasonable coding potential on the forward strands for which it’s transcribed. 
The start site chosen by both Glimmer and GeneMark doesn’t completely cover all the coding potential. /note=SD (Final) Score: The SD score for the suggested start site by Glimmer and GeneMark is the highest and best SD score (-4.552) and z-score (2.053). This indicates that, although there is poor binding of the ribosomes with the potential start site, it is a higher than average RBS score as compared to other sites on the gene. The RBS score is relevant to the data since there is only a 1 bp overlap between the gene and an upstream gene. The suggested start site of the gene isn’t a part of an operon so RBS can be used to properly measure its ribosome binding. /note=Gap/overlap: There is a 1 base pair overlap between the gene and another upstream gene. This means the site isn’t a part of an operon and it's a reasonable overlap since it isn’t any greater than -4 (which would indicate that it’s neither a part of a gene nor an operon and reduces the space needed for a promoter, increasing suspicion). The length of the gene is 288 bp (7264 to 7551, inclusive), which is greater than 120 bp (the minimum length needed for adequate coding of a protein). A more upstream start site could be considered so that more of the coding potential is covered by the start site through extension of the ORF. /note=Phamerator: The pham the gene is found in is 121229 as of 11/2/23. The gene is conserved in other members of the same cluster as PinkFriday (AK). Phages used for comparison include Albanese_13 (AK), Aledel_9 (AK), and AppleCider_13 (AK). The Pham Maps/Phamerator data doesn’t call a function for this gene. /note=Starterator: The reasonable start site that is conserved among the members of the pham to which the gene belongs is site 62 (at base pair coordinate 7264 for PinkFriday). There are 350 total members in this pham, of which 36 are drafts. One of the drafts is PinkFriday. 122 of the 314 non-draft members call the most conserved start site.
The Starterator report is informative. It indicates that the start site called by both Glimmer and GeneMark is not only the most conserved in a large number of members in the pham (containing the target gene), but that it’s also been manually annotated 122 times as a start site for the gene. /note=Location call: The gene examined is a real gene, as it has high coding potential on the forward strand it’s transcribed from. The gene is also conserved on Phamerator in other phages. Both Glimmer and GeneMark calls as well as Starterator data suggest that site 7264 is the start site of the gene. Also, it's the most conserved start site in other phages of the same pham as the gene (and the same cluster as PinkFriday), and it has a high number (122) of manual annotations which call it as a start site. Although much evidence supports 7264 as the actual start site, it doesn’t cover all of the coding potential when examined through self- and host-trained GeneMark. Considering this, the start site is a good candidate and could act as a fully functioning start site, but a more upstream start site could be the actual start site. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp results show multiple phages (at least 4) with high percent identity (>93%) and low E-values (<1e-7). PhagesDB BLASTp significant hits call the function of the gene as unknown, but some NCBI BLASTp hits call the function as a “neck protein”. However, this function isn’t on the SEA-PHAGES list of approved functions. NCBI BLASTp results also show high query coverage (100%) for 2 of the phages, along with low E-values (<1e-7). CDD and HHPred show no significant hits. /note=Transmembrane domains: There are no transmembrane domains predicted by DeepTMHMM. The protein product cannot be called as a membrane protein.
/note=Secondary Annotator Name: Samudrala, Vaishnavi /note= /note=note to primary annotator: Update the synteny box CDS 7548 - 7982 /gene="10" /product="gp10" /function="tail terminator" /locus tag="PinkFriday_10" /note=Original Glimmer call @bp 7554 has strength 5.6; Genemark calls start at 7554 /note=SSC: 7548-7982 CP: no SCS: both-cs ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 2.7029E-99 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.002, -6.844600918460983, no F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Pumancara] ],,YP_009602876,99.3056,2.7029E-99 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: Start site predicted to be 7554 by both Glimmer and GeneMark /note=Coding Potential: There is good coding potential in the forward ORF only, found in both Glimmer and GeneMark /note=SD (Final) Score: -3.987 (not the best score) /note=Gap/overlap: 2, fairly small, indicating the gene may be part of an operon. /note=Phamerator: The gene pham is 121351 and the date was 10/28/23. This is conserved as it is found in Daiboju (AK) and Vulture (AK) /note=Starterator: Start site 14 is identified in Starterator and is found in 61 of the 140 non-draft genes. In PinkFriday, start 14 is 7554, which is the same as what was identified by GeneMark and Glimmer. /note=Location call: Based on the above evidence this is a real gene and the start site is 7554. /note=Function call: Tail terminator. The highest ranked phages in BLAST have the function of tail terminator with an e-value lower than 8e-76. In NCBI BLAST the top phages also have a function call of tail terminator (98.59% coverage, 9e-98 e-value). In CDD the top-ranked hit was a recombinase protein; however, the HHpred hits were tail terminator proteins (99.6%, 5.2e-13 e-value) /note=Transmembrane domains: None.
This makes sense because according to HHpred and CDD this is a tail terminator protein, and based on that call there would be no transmembrane region. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with the calls of this annotator. I agree with the function call of tail terminator. The evidence for PhagesDB and NCBI blast should be checked and the synteny box needs to be filled out. /note= /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: Glimmer and Genemark both agree. They call the start at 7554. /note=Coding Potential: Substantial coding potential in this ORF is found on the forward strand. Coding potential is found on both Genemark Self and Host. /note=SD (Final) Score: -3.987. This is the best final score. Not the longest ORF however. /note=Gap/overlap: Overlap: 4bp. This overlap is a reasonable length. /note=Phamerator: Pham: 122509. Date 11/8/23. It is conserved. Found in Albanese_14 (AK) and Aledel_10 (AK). /note=Starterator: Start site 17 in Starterator was manually annotated in 79/194 non-draft genes in this pham. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7554. /note=Function call: PhagesDB and NCBI blast have several hits for tail terminator with low e values. HHPred also has a hit for a tail terminator (prob: 99.6, coverage: 90.85, e value: 5.2e-13). CDD has no relevant hits. /note=Transmembrane domains: There were no TMDs predicted by DeepTMHMM. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with the calls of this annotator. I agree with the function call of tail terminator. The evidence for PhagesDB and NCBI blast should be checked and the synteny box needs to be filled out.
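The length and gap/overlap figures quoted in these notes follow from the feature coordinates alone. A minimal sketch, assuming the convention that appears to match the GAP values in this file (gap = start minus the previous stop minus 1, so a negative value is an overlap); the function names are hypothetical and are not part of PECAAN:

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene with inclusive, 1-based coordinates."""
    return stop - start + 1


def gap_to_upstream(prev_stop: int, start: int) -> int:
    """Bases between two adjacent genes; a negative value means they overlap."""
    return start - prev_stop - 1


# Coordinates taken from the CDS lines for genes 9, 10, and 11.
print(gene_length(7264, 7551))      # gene 9 length: 288
print(gap_to_upstream(7551, 7548))  # gene 9 -> gene 10: -4
print(gap_to_upstream(7982, 7979))  # gene 10 -> gene 11: -4
```

Under this assumed convention, a gene that starts on the same base where the previous gene stops is reported as -1.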
CDS 7979 - 8884 /gene="11" /product="gp11" /function="major tail protein" /locus tag="PinkFriday_11" /note=Original Glimmer call @bp 7979 has strength 14.79; Genemark calls start at 7979 /note=SSC: 7979-8884 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.645, -3.368377069717189, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Pumancara] ],,YP_009602877,100.0,0.0 SIF-HHPRED: Major capsid protein; polyvalent staphylococcal bactoriophage, Myoviridae, tail sheath, tail contraction, virus like particle; 3.8A {Staphylococcus phage 812},,,5LII_P,25.2492,98.1 SIF-Syn: Major tail protein, upstream gene is tail terminator, downstream is tail assembly chaperone, just like in phage BigMack. /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Glimmer and GeneMark called the same start site at 7979, with the start codon listed as ATG. /note=Coding Potential: Both Host Trained Gene Mark and the Self Trained Gene Mark showed good coding potential and there were strong similarities in both (including a large dip around 8450, however this dip is insignificant at this time relative to the rest of the gene’s potential). The chosen start site covers all the coding potential. /note=SD (Final) Score: This gene has the least negative final score listed, as -3.368. /note=Gap/overlap: The -4 overlap is indicative of an operon and it is of the LORF. /note=Phamerator: The pham number as of 10/29/23 is 121196. This gene is conserved in phages Aledel, CristinaYang, and AppleCider, all in the same cluster (AK) as PinkFriday. The function call for this gene is major tail protein, and it is consistent between Phamerator and the phams database. /note=Starterator: Start site 37 was manually annotated in 147/447 non-draft genes in this pham. Start site 37 is at position 7979 in PinkFriday. 
This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Evidence suggests that the called start site is correct at 7979. Based on the Glimmer/GeneMark start call, the -4 gap, the z score above 2 and the least negative final score, there is strong evidence for the realness of this gene. /note=Function call: Predicted function is a major tail protein based on hits from NCBI BLASTp and phagesDB, both of which had hits with high percent identity (99%) and low e-values (0), especially with Arthrobacter phage Pumancara. There is some HHpred data that supports the call of this gene as a major tail protein. The first is a good hit to a Staphylococcus phage, with a probability of 98% and an e-value of 0.000085. /note=Transmembrane domains: Based on DeepTMHMM there are no hits for a membrane protein. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: Might want to double-check on the Starterator start site again because I got something different. I have the start site to be 6. You might also want to write out that there is 0 transmembrane domain instead of N/A. Other than that, it looks great! CDS 8971 - 9393 /gene="12" /product="gp12" /function="tail assembly chaperone" /locus tag="PinkFriday_12" /note=Original Glimmer call @bp 8962 has strength 9.37; Genemark calls start at 8962 /note=SSC: 8971-9393 CP: no SCS: both-cs ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 6.49322E-93 GAP: 86 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.151, -4.487636275856737, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Pumancara] ],,YP_009602879,98.5714,6.49322E-93 SIF-HHPRED: SIF-Syn: Surrounding genes of CristinaYang and WonderBoy have synteny. Gene 11 lines up with gene 11 of PinkFriday for both and all three code for major tail protein.
CristinaYang and WonderBoy have gene 12 overlap with their gene 13 and both are tail assembly chaperones. Gene 12 and 14 of PinkFriday overlap with gene 13 (14 does not show synteny). Gene 15 of PinkFriday lines up with gene 14 of CristinaYang and WonderBoy and all code for tape measure protein. /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: 8962 start site: Glimmer and Genemark agree /note=Coding Potential: Coding potential is found in Self-GeneMark and Host-GeneMark, and is only on the forward strand with no large gaps. /note=SD (Final) Score: -5.536, not the best final score, but other scores are around the same /note=Gap/overlap: 77, smallest gap among start sites, and gene is conserved /note=Phamerator: The pham number as of Nov 1, 2023 is 85377. The gene is conserved in 82 other phages, all in the AK cluster. Gene conserved in CristinaYang and WonderBoy /note=Starterator: There are 78 non-draft members of this Pham (82 total phages). 52/78 non-draft members call start site 10, which correlates to a start site of 8971. Start 8 at 8962, called by 11/78 non-draft phages, is not the most annotated in Starterator, but it was chosen because it had the smallest gap and a more likely start codon than the most annotated start. /note=Location call: Evidence points to this being a real gene with the most likely start site being 8962, which was called by Glimmer and Genemark. /note=Function call: Tail assembly chaperone is included on approved functions list. Good hits on both NCBI and Phagesdb. No good hits on CDD or HHpred. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note= /note=Secondary Annotator Name: Aguirre, Austin /note=Secondary Annotator QC: I agree on this location call, but I was confused on the date they called their Phamerator. I am only able to call the Pham from 10/28/23, but they called the pham on 10/28/23?
I also noticed that the coding potential graphs contained upstream coding potential in different forward frames, but there was no direct overlap in the coding potential of the frame containing the coding potential for this gene. 11/16/23 update: Overall I agree with the call as there is very strong evidence that I noted down for tail assembly protein; however, the PECAAN notes could be more thorough and explain the evidence. Synteny notes must also be filled out. CDS join(8971..9366,9366..9743) /gene="13" /product="gp13" /locus tag="PinkFriday_13" /note= /note=SSC: 8971-9743 CP: no SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: -423 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.151, -4.487636275856737, no F: SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Pumancara] ],,YP_009602878,99.2218,0.0 SIF-HHPRED: SIF-Syn: CDS 9750 - 12308 /gene="14" /product="gp14" /function="tape measure protein" /locus tag="PinkFriday_14" /note=Original Glimmer call @bp 9750 has strength 8.5; Genemark calls start at 9750 /note=SSC: 9750-12308 CP: yes SCS: both ST: SS BLAST-Start: [tail length tape measure protein [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.98, -3.5088110057482766, yes F: tape measure protein SIF-BLAST: ,,[tail length tape measure protein [Arthrobacter phage Pumancara] ],,YP_009602880,98.9437,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,11.385,99.9 SIF-Syn: tape measure protein, upstream gene has tail assembly chaperone, downstream is minor tail protein, just like in phage Pumancara /note=Primary Annotator Name: Estampa, Julia /note=Auto-annotation: Glimmer and Genemark both call the gene and indicate that the start site is at 9750 bp. The start codon is ATG. 
Host-Trained and Self-Trained GeneMark both reflect relatively high coding potential that is consistent with the ORF. The chosen start site covers all the coding potential. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. /note=SD (Final) Score: The SD score is -3.509 and is the best from the list, and this start site includes the LORF. /note=Gap/overlap: There is a 6 bp gap upstream of the gene, which is small and reasonable. The length of the gene (2559 bp) is acceptable. /note=Phamerator: As of 11/11/23, the pham number is 124121. There are 6 clusters represented in this pham: EH, EE, AK, EC, AZ1, AZ2. Tape measure protein is the function called most frequently in this pham. /note=Starterator: The start number called the most often in the published annotations is 19 and was called in 99 of the 190 non-draft genes in the Pham. However, PinkFriday was called for start number 3, which was manually annotated in 78 of the 190 non-draft genes, all from cluster AK. Start number 3 was found in 83 of 209 (39.7%) of genes in pham and was called 100.0% of the time when present. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Gathered evidence suggests this is a real gene with good coding potential and that the strongest candidate for the start site is 9750 bp. This start site is consistent with the ORF and the 6 bp gap. /note=Function call: tape measure protein. The majority of phagesDB BLAST top hits revealed “tape measure protein” for phages with E-values ranging from 0 to 1e-115. Top PhagesDB BLAST sequence alignment hit scores are 1576 and 1566 for most phages belonging in the same pham 124121 and AK cluster. NCBI BLAST hits revealed similar results suggesting tape measure protein. Accession number for top phage hits: YP_009602880 and YP_010050428.
The top CDD domain hits were accession number COG5412, phage-related protein [Mobilome: prophages, transposons], with an e-value of 1.70e-16, and accession number PHA01351, putative minor structural protein, with an e-value of 2.56e-04. In HHPred, the top hit suggested tape measure protein function with a probability of 99.9% and E-value of 2.7e-15. /note=Transmembrane domains: Number of predicted TMHs returned is 0, suggesting it’s not a membrane protein. /note=Secondary Annotator Name: Sotelo, Jessie /note=Secondary Annotator QC: I agree with this annotation so far. However, it is missing PECAAN notes for Phamerator and Starterator. The pham Starterator box needs an option selected as well. The function and synteny boxes need to be filled out and relevant evidence from Blast, CDD, and HHPred needs to be checked. Based on the evidence I would call this a tape measure protein. /note= /note=Primary Annotator Name: Estampa, Julia /note=Auto-annotation: Glimmer and Genemark both agree. They call the start at 9750. /note=Coding Potential: Substantial coding potential in this ORF is found on the forward strand. Coding potential is found on both Genemark Self and Host. /note=SD (Final) Score: -3.509. This is the best final score on PECAAN and this ORF is the longest. /note=Gap/overlap: Gap: 6 bp. This is a reasonable gap as it is very small and conserved in other phages. /note=Phamerator: Pham: 121305. Date: 11/2/23. It is conserved. Found in DreamTeam_14 and Aledel_14. /note=Starterator: Start site 3 in Starterator was manually annotated in 78/190 non-draft genes in this pham. All the genes with this start site are from the same cluster (AK). This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 9750. /note=Function call: PhagesDB and NCBI blast have several hits for tape measure protein with low e values (0).
HHPred also contains many hits for tape measure protein. One of the top hits has prob: 99.7, coverage: 86.03, e value: 2.2e-8. CDD has no relevant hits. /note=Transmembrane domains: There were no TMDs predicted on DeepTMHMM. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: I agree with this annotation so far. However, it is missing PECAAN notes for Phamerator and Starterator. The pham Starterator box needs an option selected as well. The function and synteny boxes need to be filled out and relevant evidence from Blast, CDD, and HHPred needs to be checked. Based on the evidence I would call this a tape measure protein. CDS 12305 - 14440 /gene="15" /product="gp15" /function="minor tail protein" /locus tag="PinkFriday_15" /note=Original Glimmer call @bp 12305 has strength 10.68; Genemark calls start at 12305 /note=SSC: 12305-14440 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Christian]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.731, -5.293288977131869, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Christian]],,ASR83399,84.1515,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein, upstream gene is minor tail protein as well, just like in phage Aledel. Downstream gene is tape measure protein for PinkFriday and Aledel. /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and GeneMark both agree on the start site of 12305. /note=Coding Potential: Coding potential in this ORF is almost exclusively on the forward strand, indicating this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -5.293. This is the second best final score on PECAAN. Although there is a start site with a better final score, it isn’t by much. Therefore this start site is still the preferred one since it has a longer ORF and less overlap.
/note=Gap/overlap: There is an overlap of 4 bp, which is very small and therefore does not give cause for concern. This overlap size can also be an indication of an operon. /note=Phamerator: Pham: 2058. Date: 10/30/2023. It is conserved; found in all 42 non-draft genes in this pham, and all are in the same cluster (AK), for example PitaDog_15 and Urla_15. /note=Starterator: Start site 1 in Starterator was manually annotated in 42/42 non-draft genes in this pham. The start site agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 12305. /note=Function call: Minor tail protein. The top two BLAST hits on both phagesDB and NCBI had the function of minor tail protein (all E-values of 0.0). HHpred had no relevant hits. CDD had a hit for a phage tail protein. The protein sequence had many G's, supporting it being glycine rich. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Alice Yao /note=Secondary Annotator QC: I think the pham map for PinkFriday is updated now so you might want to look at it again to correct the synteny box too. Also might want to check the Starterator because the numbers changed. Other than that, everything looks good!
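The glycine-content observation in the gene 15 function call can be put into numbers. A minimal sketch, assuming the protein is available as a one-letter amino acid string; the function names and the toy sequence are hypothetical (the sequence is made up for illustration and is not the gp15 product), and no cutoff for "glycine rich" is implied:

```python
def glycine_fraction(protein: str) -> float:
    """Fraction of residues that are glycine (one-letter code G)."""
    seq = protein.strip().upper()
    if not seq:
        raise ValueError("empty protein sequence")
    return seq.count("G") / len(seq)


def longest_glycine_run(protein: str) -> int:
    """Length of the longest uninterrupted stretch of glycine residues."""
    best = current = 0
    for residue in protein.strip().upper():
        current = current + 1 if residue == "G" else 0
        best = max(best, current)
    return best


# Toy sequence for illustration only (an assumption, not real data).
toy = "MAGGGSGGTGA"
print(f"{glycine_fraction(toy):.2f}", longest_glycine_run(toy))
```

Reporting both the overall fraction and the longest run separates a protein that is merely glycine rich from one that contains a polyglycine stretch.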
CDS 14431 - 15588 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="PinkFriday_16" /note=Original Glimmer call @bp 14431 has strength 11.97; Genemark calls start at 14431 /note=SSC: 14431-15588 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage HunterDalle] ],,NCBI, q1:s1 100.0% 0.0 GAP: -10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.714, -4.068367735406665, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage HunterDalle] ],,YP_009602636,91.1688,0.0 SIF-HHPRED: Protein gp18; NP_465809.1, prophage tail protein gp18, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative; HET: MLY, MSE; 1.7A {Listeria monocytogenes EGD-e},,,3GS9_A,98.1818,99.7 SIF-Syn: minor tail protein, upstream gene is minor tail protein, downstream is a minor tail protein, just like in phage Aledel. /note=Primary Annotator Name: Vazquez, Eunice /note=Auto-annotation: Glimmer and Genemark. Both call the start at 14431. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating this is a forward gene. /note=SD (Final) Score: -4.068 is the final score, but it is not the best final score on PECAAN. The z score is 2.714. /note=Gap/overlap: -10 bp overlap. This is a reasonable overlap considering it is not 50 bp or more. /note=Phamerator: 120381. As of October 25th 2023. It is conserved; found in Omalley, Vulture and Herb. All in cluster AK /note=Starterator: 76/96 non-draft members call start site 5, which correlates to a start site of 14431 bp. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 14431. /note=Function Call: Minor tail protein. The top two BLAST hits on both phagesDB and NCBI had the function of minor tail protein. All e-values of 0. HHpred had no relevant hits. CDD had no relevant hits.
/note=Transmembrane domains: There were no TMDs predicted on DeepTMHMM. /note=Secondary Annotator Name: Tubeileh, Shareef Ashraf /note=Secondary Annotator QC: Make sure to note in the first table whether this gene contains all GM coding capacity. Also make sure to fill in the synteny box. You can reference other finished genes to see how they write their synteny boxes. On PECAAN, make sure to check the boxes that agree/provide evidence for your function call. I do agree that this is a minor tail protein, as seen by HHpred results. Also, add evidence as to why this is a minor tail protein (see the sample PECAAN notes for this), for example whether HHpred or BLASTp supports it. CDS 15588 - 16619 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="PinkFriday_17" /note=Original Glimmer call @bp 15588 has strength 11.17; Genemark calls start at 15588 /note=SSC: 15588-16619 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage HunterDalle] ],,NCBI, q1:s1 99.7085% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.549, -3.5707972674952195, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage HunterDalle] ],,YP_009602637,91.4956,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein, upstream and downstream are also possibly minor tail proteins as shown in phage Potatoes. /note=PECAAN Notes /note=Primary Annotator Name: Aguirre, Austin. /note=Auto-annotation: Glimmer and GeneMark both call the start at 15588. Start codon called is ATG. /note=Coding Potential: Strong coding potential was completely covered by the start site of 15588. However, I did notice some coding potential in the reverse strand between 16100 and 16000, which corresponds to a dip in coding potential on the forward strand. This could indicate a gene within my current gene. /note=Z-score: Z score of 2.549 with a final score of -3.571; this is a very good z score that supports this gene call. /note=Gap/overlap: -1, could be an operon.
This gene has the ideal start site, as no other start site has a gap this ideal. /note=Phamerator: 15588, 10/28/23. Start site is highly conserved with 88 manual annotations in Pham. /note=Starterator: Start site 1, 15588, manually annotated in 88 of 99 non-draft genes. This evidence does agree with Glimmer and GeneMark. /note=Location call: Based on evidence, this is a real gene with start site at 15588. There is strong evidence that this gene has a highly annotated and conserved start site based on the Pham data, which directly relates to the auto-annotation data. Additionally, this start site contains all the coding potential for this gene, which supports the argument for this gene's start site. /note=Function call: Minor tail protein /note=On PhagesDB and NCBI BLASTp, both found hits of similar genes that were determined to be minor tail proteins with an e value of 0. The strongest piece of evidence was found in the protein sequence, where polyglycine residues were found, indicating that this is a minor tail protein. HHPred called a long tail fiber distal unit, but the e-value was not very significant at 0.85. /note=Transmembrane domains: The absence of TMDs makes sense in the context of the minor tail protein. As the protein is most likely on the outside of the membrane instead, this makes sense, as the minor tail protein might serve enzymatic or cell recognition purposes. /note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: After reviewing Blastp, CDD, and HHPred hits, I agree with the primary annotator's function call of minor tail protein.
CDS 16619 - 17698 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="PinkFriday_18" /note=Original Glimmer call @bp 16619 has strength 11.42; Genemark calls start at 16619 /note=SSC: 16619-17698 CP: yes SCS: both ST: SS BLAST-Start: [virion structural protein [Arthrobacter phage Gisselle] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.134, -3.1837611541087343, yes F: minor tail protein SIF-BLAST: ,,[virion structural protein [Arthrobacter phage Gisselle] ],,YP_010049877,93.3148,0.0 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,47.3538,99.1 SIF-Syn: In other phages, such as Beethoven and AppleCider, the two genes ahead of this gene are also minor tail proteins. The two genes after this gene tend to be unassigned functions. For PinkFriday, the two genes before are one minor tail protein and one unassigned/NKF. The two genes after are one minor tail protein and one unassigned/NKF. This shows little synteny but could also be updated later. /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and GeneMark call and agree that the gene start site is at 16619. /note=Coding Potential: This ORF does show decent coding potential for the forward strand only on both the Host and Self Genemark. The start site is also included in the coding potential. /note=SD (Final) Score: -3.184 (This is the best on PECAAN). /note=Gap/overlap: There is a slight overlap of 1 bp (-1), but this overlap is shared with other phages in the same cluster (ChewChew and BigMack). It also suggests that this may be a part of an operon. /note=Phamerator: 103376 10/31/2023. It is conserved and found in BigMack (AK) and ChewChew (AK) /note=Starterator: Start site 4 was manually annotated in 57/93 non-draft genes in this pham. Start 4 is 16619 in PinkFriday.
Genemark and Glimmer support this. /note=Location call: Based on the evidence, this is a real gene, and the start site agreed on by Glimmer, Genemark, and Starterator is 16619. /note=Function call: The phagesdb BLASTp shows lots of hits for minor tail proteins. The NCBI BLASTp also shows hits for minor tail protein, but it also shows hits for virion structural protein. There were no CDD hits, but HHpred showed multiple hits from the PDB database for RBP (receptor binding protein) from a Lactococcus phage. I believe that the function of this ORF is a minor tail protein that recognizes a receptor on the surface of the bacterial cell and binds to it. /note=Transmembrane domains: DeepTMHMM does not predict any TMD; therefore, this is not a membrane protein. /note=Secondary Annotator Name: Vazquez, Eunice /note=Secondary Annotator QC: /note=Auto-annotation: The gene is called by both Glimmer and GeneMark at the start site 16619. The start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential. The chosen start site covers all this coding potential. /note=SD (Final) Score: The final score is the best compared to all other start sites. The z score is also good. /note=Gap/overlap: Very small overlap. /note=The function call looks good to me, the synteny box is filled out, and I also agree that it is not a transmembrane protein.
CDS 17691 - 18134 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="PinkFriday_19" /note=Original Glimmer call @bp 17691 has strength 3.81; Genemark calls start at 17691 /note=SSC: 17691-18134 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 2.26031E-98 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.472, -4.322809268821859, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Pumancara] ],,YP_009602885,100.0,2.26031E-98 SIF-HHPRED: SIF-Syn: Minor tail protein, the upstream gene is also a minor tail protein like in AppleCider, and the downstream gene is a membrane protein. /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Glimmer and GeneMark. Both call the gene at 17691. /note=Coding Potential: Coding potential in this ORF is only on the forward strand, meaning it's a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: The final score is -4.323. This is the best final score given on PECAAN. /note=Gap/overlap: The gap between the upstream gene and this gene is -8. This is fine because it is conserved in other similar phages, and the overlap is not too large. /note=Phamerator: The pham for this gene is 99370. Run on 10/28/2023. It is conserved as it is found in both Albanese (AK) and AppleCider (AK). /note=Starterator: Start site 19 was manually annotated in 78/79 non-draft genes. In another phage, AppleCider, start site 19 is at 17691. Both Glimmer and GeneMark call at this site. This correlates to other phages with the same start site. /note=Location call: The most likely start site is 17691 since it is called by both programs, so it's likely a real gene. /note=Function call: This is likely a minor tail protein gene. 6/7 of the top results on NCBI are minor tail proteins. The e-values for these results are all less than 3e-55, meaning that they are very similar in sequence.
On PhagesDB, the functions are labeled as unknown for the top hits, so this neither confirms nor contradicts the NCBI results. HHpred had a 63.1% probability match, but the function is unknown. CDD had no data available. /note=Transmembrane domains: There are four transmembrane regions that appear from results on TMHMM. Therefore, this is likely a membrane protein. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree with this location and functional call. The only thing I would note is that the synteny box may need to be updated since the downstream protein has now been annotated to be a membrane protein. /note= /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Glimmer and GeneMark. Both call the gene at 17691. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.323. It is the best final score on PECAAN. The Z-score is also the best at 2.472. /note=Gap/overlap: The overlap between the start of this gene and the end of the previous one is 8 bp. While not large, it is a relatively uncommon overlap length. It is, however, relatively conserved in other phages of the same cluster such as CristinaYang and Daijobu, both of which are cluster AK. /note=Phamerator: pham: 99370. Date 11/3/2023. It is conserved; found in Aledel, CristinaYang, and Daijobu, all of which are in cluster AK. /note=Starterator: Start site 19 was manually annotated in 78 of the 79 non-draft genes in the pham. Start 19 is 17691 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 17691. /note=Function call: 17/20 of the top hits in NCBI BLASTp correspond to a minor tail protein and all have e-values less than 4e-20. 
All of these hits also correspond to an Arthrobacter phage. All BLASTp PhagesDB hits came back as unknown function. CDD returned no hits and all hits in HHpred had e-values greater than 130. All things considered, this is likely a minor tail protein. /note=Transmembrane domains: DeepTMHMM predicts four TMDs. Based on this evidence this gene is likely a “membrane protein”, but the functional call of minor tail protein is more specific. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I agree with this location and functional call. CDS 18134 - 18460 /gene="20" /product="gp20" /function="membrane protein" /locus tag="PinkFriday_20" /note=Original Glimmer call @bp 18134 has strength 8.16; Genemark calls start at 18134 /note=SSC: 18134-18460 CP: yes SCS: both ST: SS BLAST-Start: [hol-like chemotaxis [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 8.00579E-71 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.049, -3.3633184689272606, yes F: membrane protein SIF-BLAST: ,,[hol-like chemotaxis [Arthrobacter phage Pumancara] ],,YP_009602886,100.0,8.00579E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Yao, Alice /note=Auto-annotation: Both Glimmer and GeneMark call the gene at 18134. Good Z score (3.049 > 2). The final score (-3.363) is the least negative among the candidates and is therefore the best. /note=Coding Potential: There is high coding potential in this ORF in the forward strand. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.363 (Best option since it is the least negative among the gene candidates). /note=Overlap: -1 (There is a slight overlap but it should not be a huge concern) but because it is a -1 gap, this could also be an operon. /note=Phamerator: Pham 1160. Date: 10/31/2023. It is conserved. Found in Beethoven and AppleCider. /note=Starterator: Start Site 5 in Starterator was manually annotated in 63/79 non-draft genes in this pham. Start 5 is 6305 in Aledel. 
The stop site agrees with Glimmer and GeneMark. /note=Location call: Based on everything that was stated above, this is probably a real gene and it most likely has the start site of 18134. /note=Function call: This cannot be determined because CDD and HHpred do not have relevant hits. There was also nothing in the Conserved Domain Database in PECAAN. /note=Transmembrane domains: DeepTMHMM predicts just one TMD. Based on this evidence this gene can be assumed to have a real TMD and is, therefore, a “membrane protein”, but we cannot determine precisely the function of this gene based on CDD or HHpred. /note=Secondary Annotator Name: TUBEILEH, SHAREEF ASHRAF /note=Secondary Annotator QC: I agree with the primary annotation considering that evidence from most sources like HHpred and PhagesDB seems inconclusive. Since this protein seems to have one TMD, then it can be classified as a membrane protein. CDS 18457 - 19320 /gene="21" /product="gp21" /function="endolysin" /locus tag="PinkFriday_21" /note=Original Glimmer call @bp 18457 has strength 12.19; Genemark calls start at 18457 /note=SSC: 18457-19320 CP: yes SCS: both ST: SS BLAST-Start: [amidase [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.782, -3.925516065825927, yes F: endolysin SIF-BLAST: ,,[amidase [Arthrobacter phage Pumancara] ],,YP_009602887,99.6516,0.0 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase amiD; ZINC AMIDASE, PGRP, Peptidoglycan Recognizing Protein, AmpD, N-ACETYLMURAMYL-L-ALANINE AMIDASE, Cell wall biogenesis/degradation, Hydrolase, Lipoprotein, Membrane, Metal-binding; HET: GOL, AH0; 1.75A {Escherichia coli},,,3D2Y_A,42.1603,99.3 SIF-Syn: Holin protein immediately downstream and endolysin function is conserved in Pumancara, Lucy, Aledel, and Albanese. /note=Primary Annotator Name: Wang, Jordan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 18457. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -3.926. There is a better final score for a start site at 18673. /note=Gap/overlap: -4. Conserved by other phages such as Aledel and Bennie, and also indicative of an operon. /note=Phamerator: pham: 85367. Date 11/01/2023. It is conserved; found in Albanese (AK) and Aledel (AK). /note=Starterator: Start site 3 in Starterator was manually annotated in 78/78 non-draft genes in this pham. Start 3 is 18457 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 18457. /note=Function call: Endolysin. The top three phagesdb BLAST hits have the function of endolysin (E-value <10^-151), and 5 out of 5 top NCBI BLAST hits also have the function of endolysin. (100% coverage, 99%+ identity, and E-value =0). HHpred had a hit for an endolysin protein with 99.3% probability, 42% coverage, and E-value of 1.4e-10. CDD suggested family PGRP with an E value of 3.75e-12. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: Looking over all of the evidence I agree with the assessment provided. This is a real gene with the start site of 18457. Note: Make sure to fill out the Starterator drop down menu and coding potential. 
CDS 19339 - 19839 /gene="22" /product="gp22" /function="holin" /locus tag="PinkFriday_22" /note=Original Glimmer call @bp 19339 has strength 8.38; Genemark calls start at 19339 /note=SSC: 19339-19839 CP: yes SCS: both ST: SS BLAST-Start: [holin [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 6.20835E-113 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.225, -4.190250106893126, yes F: holin SIF-BLAST: ,,[holin [Arthrobacter phage Pumancara] ],,YP_009602888,100.0,6.20835E-113 SIF-HHPRED: SIF-Syn: Holin protein, upstream gene is endolysin, downstream is DNA binding protein, just like in phage Bodacious. /note=Primary Annotator Name: Richard, Ketan /note=Auto-annotation: Glimmer and GeneMark both call the start site at 19339 /note=Coding Potential: Coding potential in this ORF is on the forward strand primarily, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.190. It is the best final score on PECAAN. /note=Gap/overlap: The gap associated with the best final score is 18 base pairs. This is ultimately reasonable because the gap is conserved in another phage (AppleCider) and this is a small enough gap that it is unlikely that a gene is present in the gap. /note=Phamerator: pham: 85369. Date 10/28/2023. It is conserved; found in Bodacious (AK) and Huntingdon (AK). /note=Starterator: Start site 9 in Starterator was manually annotated in 8/78 genes in this pham. Start 9 is 19339 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 19339. /note=Function call: Holin protein. The top phagesdb BLAST hits have the function of holin protein (E-value <4e-61), and a majority of the NCBI BLAST hits also have the function of holin protein. (100% coverage, 100% identity, and E-value <5e-100). 
With 100% identity, we can conclude that the function is holin. CDD and HHpred information were not helpful since the E-values were so high. /note=Transmembrane domains: DeepTMHMM predicts 3 TMDs, therefore it is a membrane protein which makes sense for holin. /note=Secondary Annotator Name: Sass, Arielle /note=Secondary Annotator QC: /note= /note=Primary Annotator Name: Richard, Ketan Leonard /note=Auto-annotation: Glimmer and Genemark are in agreement that the start site is at 19,339. The start codon for this ORF is ATG. /note=Coding Potential: Within the putative ORF the gene has significant coding potential predicted by GeneMark. The chosen start site covers the entire extent of the coding potential. /note=SD (Final) Score: The SD score is -4.190, which is the most positive option of the ten first start sites presented in PECAAN. /note=Gap/overlap: The upstream gap of 18 base pairs is reasonable and the start site chosen results in the longest ORF. The resulting length of the gene, 501 base pairs, is reasonable. /note=Phamerator: As of 11/2/2023, this gene is in pham 85369. This Pham has 78 non-draft members, all of which are also in cluster AK such as Albanese (AK) and Aledel (AK). /note=Starterator: Start site 9 is conserved in 22 of 82 genes in this pham and called 40.9% when present, or 8/22 of the genes. The start site is located at base pair 19339 in PinkFriday. /note=Location call: Taken together, the gathered evidence suggests that the gene is real and that the start site for this gene is at base pair 19339. /note=Function call: PhagesDB BLASTp has hits with the suggested function holin protein with small e-values of <10^-89, and NCBI BLASTp has hits with the same suggested function with e-values <10^-110. There were no HHPRED hits with low enough e-values to consider. /note=Transmembrane domains: DeepTMHMM predicted 3 transmembrane domains, one that is 18 amino acids long, one that is 21 amino acids long, and one that is 20 amino acids long. 
This is consistent with the predicted function of a holin protein since holins are embedded in the bacterial cell membrane. /note=Secondary Annotator Name: Sass, Arielle CDS 20015 - 20323 /gene="23" /product="gp23" /function="helix-turn-helix DNA binding domain" /locus tag="PinkFriday_23" /note=Original Glimmer call @bp 20015 has strength 14.23; Genemark calls start at 20015 /note=SSC: 20015-20323 CP: yes SCS: both ST: SS BLAST-Start: [HTH DNA binding protein [Arthrobacter phage Pumancara] ],,NCBI, q1:s3 100.0% 6.16087E-69 GAP: 175 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.202, -2.274496637415597, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[HTH DNA binding protein [Arthrobacter phage Pumancara] ],,YP_009602889,98.0769,6.16087E-69 SIF-HHPRED: ComR; Streptococcus, Competence, Quorum sensing, ComR, TRANSCRIPTION REGULATOR; 2.9A {Streptococcus suis (strain 05ZYH33)},,,5FD4_B,68.6274,98.5 SIF-Syn: helix-turn-helix DNA binding protein, upstream gene is holin, downstream gene is NKF, but has Pham number 124339, just like in phage Glenn. /note=Primary Annotator Name: Zamora, Alexandra /note=Auto-annotation: Auto-annotation for both Glimmer and Genemark; Both agree the start site is at 20015; Start codon is called at ATG /note=Coding Potential: The gene has reasonable coding potential predicted; The chosen start site covers all this coding potential. /note=SD (Final) Score: -2.274. The SD score for the chosen start site is the best /note=Gap/overlap: Gap: 175bp. The gap between the upstream gene is reasonable; This start site was chosen over one with a longer reading frame because there is no coding potential within this gap region; The length of the gene is acceptable given the auto-annotated/chosen start site. /note=Phamerator: The pham number for this gene on October 28, 2023 is 85404. The pham in which this gene is conserved is also in other members of the AK cluster, for example it is present in phages Scuttle and Cholula. 
/note=Starterator: There is a reasonable start site for this gene that is conserved among members of the pham this gene belongs to. The start site number is 4 and the base pair coordinates for it are 20015. The starterator report supports the location call of this gene. /note=Location call: The gathered evidence suggests that the correct start site is ATG at site 20015. The gene is a real gene based on its good coding potential. /note=Function call: helix-turn-helix DNA binding protein. The top two phagesdb BLAST hits have the function of helix-turn-helix DNA binding domain protein (e-value 2e-53), and the top 3 NCBI BLAST hits also have the function of DNA binding protein. (<96% identities, <10e-65 e-values). CDD had a relevant hit for domain binding protein with an e-value of 6.4e-5 and 100% coverage. HHpred also had two relevant hits, one for a DNA binding domain and the other for a transcription regulator. Both hits had over a 98% probability and e-values smaller than 1e-5. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Richard, Ketan /note=Secondary Annotator QC: /note=Auto-annotation: Glimmer and GeneMark both call the start site to be 20015 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. Coding potential is extremely high, which is a good indicator that the start and end sites form a gene. /note=SD (Final) Score: -2.274. It is the best final score on PECAAN. /note=Gap/overlap: The gap is about 175 base pairs, but there are not all earlier start sites that would fill the gap and the gap is conserved in other final phages like Bodacious. /note=Phamerator: pham: 85404. 10/28/23. The pham is conserved with other phages in the same cluster like Bodacious. /note=Starterator: start number: 4. 
This is not the most annotated start number, but it is found in over 20% of the phages in the pham, so it is not unusual. /note=Location call: I agree with Glimmer and GeneMark that this gene is real and the start site is 20015. /note=Function call: helix-turn-helix DNA binding protein. The top two phagesdb BLAST hits, Temper and Sergei, have the function of helix-turn-helix DNA binding domain protein (e-value 2e-53), and the top NCBI BLAST hits also have the function of DNA binding protein. (<96% identities, <10e-65 e-values). CDD had a relevant hit for domain binding protein with an e-value of 6.4e-5 and 100% coverage. HHpred also had two relevant hits, one for a DNA binding domain and the other for a transcription regulator. Both hits had over a 98% probability and e-values smaller than 1e-5. All of the e-values, % coverage, and final phages suggest that the function is helix-turn-helix DNA binding protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. CDS 20316 - 20621 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="PinkFriday_24" /note=Original Glimmer call @bp 20349 has strength 4.7; Genemark calls start at 20316 /note=SSC: 20316-20621 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein FDH62_gp24 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 9.77495E-68 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.733, -5.578026979270037, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp24 [Arthrobacter phage Pumancara] ],,YP_009602890,100.0,9.77495E-68 SIF-HHPRED: SIF-Syn: Displays synteny with at least 2 other non-draft phage genomes (Bennie and Christian). Genes upstream and downstream also have the same function call (helix-turn-helix DNA binding protein upstream and DNA helicase downstream). /note=Primary Annotator Name: Soan, Jessica Hyunsil /note=Auto-annotation start source: Glimmer calls the start at 20349. 
GeneMark calls the start at 20316. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host within the putative ORF. The chosen start site does include all of the coding potential. /note=SD (Final) Score: Score: -3.544 is the best final score on PECAAN; however, it does not correspond to the best start site. Gene is not organized into an operon. The final score for the start site called by GeneMark is -5.578. The z-score for this chosen start site is 1.733. This is not the best z-score but it is not a poor score (Z < 1.6). /note=Gap/overlap: Gap is 25 or -8 based on the two possible start sites, which is not significant enough to predict the presence of a new gene. /note=Phamerator: PinkFriday_24 is found in Pham 85398 as of 10/31/23. There are 80 total members in Pham 85398, 2 are drafts. PinkFriday_24 is conserved in other members of the same cluster AK. In fact, all phages within Pham 85398 are in cluster AK. BigMack_24, Bodacious_24, and Christian_24 are three examples of genes that were compared to PinkFriday_24. There were no functions called for this gene. A gene length of 306 is also the closest to the gene lengths of other phages in the same pham. /note=Starterator: (Start: 8 @20316 has 66 MAs) is conserved among most members of the pham. There are 80 total members with 2 draft members. Start 8 (20316) is found in 80 of 80 ( 100.0% ) of genes in pham. Manual Annotations of this start: 66 of 78. Called 83.8% of time when present. The length of the gene is acceptable. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 20316 bp. There is stronger evidence supporting GeneMark’s start. /note=Function call: Multiple phagesDB BLAST hits have the suggested function “function unknown” with small e-values of 2e-54 to 4e-53. 
Both NCBI BLAST and Phagesdb BLAST conclude a function equivalent to a “hypothetical protein” or of “unknown function”. This does not mean the protein is not real, just the function has not been determined. Draft proteins were not used. /note=HHPRED has hits that do not correspond to unique SEA-PHAGES requirements for this gene. Has alignment with a few phages with low probability (<70%). CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I have QC’ed this location and functional call and agree with the first annotator. /note= /note=Primary Annotator Name: Soan, Jessica Hyunsil /note=Auto-annotation start source: Glimmer calls the start at 20349. GeneMark calls the start at 20316. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The highest FS is -3.544 (start site 20379), but does not correspond to the start site called by GeneMark or by Glimmer. Between the start sites called by Glimmer and GeneMark, GeneMark’s call of 20316 has a better FS (GeneMark: -5.578; Glimmer: -6.569) and better Z-Score (GeneMark: 1.733; Glimmer: 1.488). GeneMark’s 20316 call produces an acceptable Z-Score greater than 1.6 (1.733). /note=Gap/overlap: The gap for the start site called by GeneMark is -8 and the gap for the start site called by Glimmer is 25. Neither is an incredibly common gap, but the gap of -8 appears to be more acceptable, as it is conserved in other phages of the same cluster (AK), such as Albanese and Daijobu. Neither gap would call for the addition of a new gene. /note=Phamerator: pham: 122715. Date 11/3/2023. It is conserved; found in Aledel, CristinaYang, and Daijobu, all of which are in cluster AK. 
/note=Starterator: Start number 8 corresponds to 20316 in PinkFriday. It was manually annotated as the start site at the highest frequency (66 times). All genes in this pham have this start site and 66/78 of the non-draft genomes call it. This evidence agrees with the start site called by GeneMark (20316). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 20316. /note=Function call: All BLASTp PhagesDB hits with significant e-values correspond to function unknown and all BLASTp NCBI hits correspond to hypothetical proteins. CDD returned no hits and all HHpred hits have e-values >14 and probabilities <73%. As such, the function of this protein cannot be determined at this time. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Giusti, Alessia /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 20614 - 22011 /gene="25" /product="gp25" /function="DNA helicase" /locus tag="PinkFriday_25" /note=Original Glimmer call @bp 20761 has strength 4.78; Genemark calls start at 20674 /note=SSC: 20614-22011 CP: yes SCS: both-cs ST: SS BLAST-Start: [DNA helicase [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.055, -2.442961286954254, yes F: DNA helicase SIF-BLAST: ,,[DNA helicase [Arthrobacter phage Pumancara] ],,YP_009602891,99.5699,0.0 SIF-HHPRED: DNA repair protein RAD5; DNA binding protein, DNA damage tolerance, Helicase, Snf2 family; 3.3A {Kluyveromyces lactis NRRL Y-1140},,,6L8O_A,98.4946,100.0 SIF-Syn: DNA Helicase, one of the upstream genes is a helix-turn-helix binding domain just like in phages Herb and KittyKat /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer (start at 20761) and GeneMark (start at 20674). 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.529 /note=Gap/overlap: -8 bp gap /note=Phamerator: Pham 118822 Date 10/31/2023. It is conserved. Found in KittyKat (AK) and Herb (AK). /note=Starterator: Start site 118 in Starterator was manually annotated in 67/219 non-draft genes in this pham. Start 118 is 20614 in PinkFriday. This evidence disagrees with the site predicted by Glimmer (20761) and GeneMark (20674). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 20614. /note=Function call: DNA Helicase. 9 out of the top 10 phagesdb BLAST hits have function DNA Helicase (E-value = 0), and the top 2 NCBI BLAST hits have DNA Helicase function. (99% coverage, 98%+ identity, and E-value = 0). HHpred and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: I agree that the gene is real with a likely start site of 20614, due to the Starterator info and the fact that it results in the longest reasonable ORF. CDS 22169 - 22357 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="PinkFriday_26" /note=Original Glimmer call @bp 22169 has strength 8.05; Genemark calls start at 22169 /note=SSC: 22169-22357 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp26 [Arthrobacter phage Pumancara] ],,NCBI, q1:s29 100.0% 1.01416E-36 GAP: 157 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.298, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp26 [Arthrobacter phage Pumancara] ],,YP_009602892,68.8889,1.01416E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation start source: Glimmer and GeneMark. 
Both call the start at 22169. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -1.993. It is the best final score on PECAAN. The Z-score is also the best at 3.298. /note=Gap/overlap: Gap: 157bp. Large, but ultimately reasonable because the gap is conserved in other phages (Makoto, Maria1952, and Temper16) and there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 85643. Date 11/1/2023. It is conserved; found in Aledel, CristinaYang, and Daijobu, all of which are in cluster AK. /note=Starterator: Start site 18 was manually annotated in 34 of the 52 non-draft genes in the pham. Start 18 is 22169 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 22169. /note=Function call: All 76 PhagesDB BLAST hits were returned as function unknown and all NCBI BLAST hits were returned as “hypothetical protein.” There were no CDD hits and HHpred returned hits that had E-values >15 and probabilities <70%. Some had decent coverage (>80%), but had E-values >15, so were unusable. As such, the function of this gene cannot be determined at this time. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Dweik, Qaiss /note=Secondary Annotator QC: I agree with the primary annotator's statements regarding the validity of the gene and the start site position at bp 22169. The primary annotations are brief but well thought-out and all signs point to a start site at bp 22169. I also agree with not assigning any function to this gene, as no hits with defined functions were produced on PhagesDB or NCBI and no significant hits were found on HHpred/CDD. 
Also, no TMDs were found, meaning this gene's function cannot be designated as a membrane protein. CDS 22354 - 22554 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="PinkFriday_27" /note=Original Glimmer call @bp 22354 has strength 2.65 /note=SSC: 22354-22554 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein FDH58_gp28 [Arthrobacter phage HunterDalle] ],,NCBI, q5:s1 93.9394% 1.30077E-24 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.417, -4.615406096948978, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH58_gp28 [Arthrobacter phage HunterDalle] ],,YP_009602648,82.2581,1.30077E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Glimmer calls the start site 22354. GeneMark does not call the gene. /note=Coding Potential: There is reasonable coding potential throughout the gene, across different open reading frames /note=SD (Final) Score: -4.615. It is the best final score on PECAAN. /note=Gap/overlap: Overlap: 4bp. This overlap is acceptable and indicates the gene is most likely part of an operon. /note=Phamerator: 87101. Date 11/01/2023. It is found in Eunoia (AK) and Aledel (AK). /note=Starterator: Start site 4 was manually annotated in 7/14 non-draft phages in this pham. Start 4 is 22354 in PinkFriday. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 22354. /note=Function call: Function unknown. PhagesDB and NCBI BLAST hits are to unknown/hypothetical proteins with e-values ranging from 1e-24 to 3e-10. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note= /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: Annotations look accurate and I agree with the start site call. Pham number is 87107 as of 11/3/23. I agree with the function call and see no known hits. 
/note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: 22354 start site: Genemark and Glimmer agree /note=Coding Potential: coding potential found in self-GeneMark and Host-GeneMark and is only on the forward strand with no large gaps. Potential is spread across different reading frames /note=SD (Final) Score: -4.615 is largest among the options /note=Gap/overlap: Overlap is reasonable -4, and is indicative of an operon /note=Phamerator: The pham number as of Nov 3, 2023 is 87107. The gene is conserved in 14 other phages, all in the AK cluster. Conserved in Eunoia and Aledel. /note=Starterator: There are 14 non-draft members of this Pham (15 total phages). 7/14 non-draft members call start site 4, which correlates to a start site of 22354. Most called was start site 4. /note=Location call: Evidence points to this being a real gene with the most likely start site being 22354 which was called by Glimmer and Genemark. Starterator had most annotated at start 4 which was chosen /note=Function call: No known function. PhagesDB and NCBI BLAST only have hits with hypothetical proteins with e-values ranging from 1e-24 to 3e-10. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 22551 - 23180 /gene="28" /product="gp28" /function="hypothetical protein" /locus tag="PinkFriday_28" /note=Original Glimmer call @bp 22551 has strength 5.22; Genemark calls start at 22551 /note=SSC: 22551-23180 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp28 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 8.14646E-139 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.055, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp28 [Arthrobacter phage Pumancara] ],,YP_009602894,94.7368,8.14646E-139 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation: Glimmer and Genemark are in agreement that the start site is at 22551. The start codon for this ORF is ATG. /note=Coding Potential: Within the putative ORF the gene has significant coding potential predicted by GeneMark. The chosen start site covers the extent of the coding potential. /note=SD (Final) Score: The SD score is -2.443, which is the most positive option of the two possible start sites presented in PECAAN. /note=Gap/overlap: The overlap of 4 base pairs is favorable and indicates the gene may be in an operon. The start site chosen results in the longest ORF. The resulting length of the gene, 630 base pairs, is reasonable. /note=Phamerator: As of 11/2/2023, this gene is in pham 1061. This Pham has 90 non-draft members, most of which are also in cluster AK such as Albanese (AK), and 15 of which were in cluster EJ such as SBlackberry (EJ). /note=Starterator: Start site 9 was the most annotated start. It is conserved in 82 of 90 non-draft genes in the pham, and called in 76 of the 82 non-draft genes in which it was present. The start site is located at base pair 22551 in PinkFriday. /note=Location call: Taken together, the gathered evidence suggests that the gene is real and that the start site for this gene is at base pair 22551. 
/note=Function call: Given the data, I hypothesize that there is no known function at this time. There was conflicting results concerning the function from PhagesDB and inconclusive (hypothetical protein) function on NCBI BLAST. The fact that no significant hits could be returned on HHpred or CDD makes it difficult to determine a function for this protein. /note=Transmembrane domains: There are no TMDs present. /note=Secondary Annotator Name: Sass, Arielle /note=Secondary Annotator QC: /note=Function call: There is no known function for this gene, based on hits from PhagesDB and NCBI BLASTp with high % identity (>90%), and low E-values (<10^-116).
There were no HHPRED hits with low enough e-values to consider. /note=Transmembrane domains: There are no TMDs predicted by DeepTMHMM /note=Secondary Annotator Name: Sass, Arielle CDS 23188 - 23679 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="PinkFriday_29" /note=Original Glimmer call @bp 23188 has strength 7.48; Genemark calls start at 23188 /note=SSC: 23188-23679 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp29 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.68126E-114 GAP: 7 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.202, -2.9617282384803714, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp29 [Arthrobacter phage Pumancara] ],,YP_009602895,100.0,1.68126E-114 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: both Glimmer and Genemark say 23188, ATG /note=Coding Potential: The gene has reasonable coding potential predicted in the forward ORF and the start site appears to cover all of the coding potential /note=SD (Final) Score: -2.962, this was the best score in PECAAN /note=Gap/overlap: 7, this is a small gap which is within the acceptable range /note=Phamerator: 44621, date 11/1/23, it is conserved, found in Beethoven (AK) and Albanese (AK) /note=Starterator: Start site 8 was manually called, annotated in 6 out of 77 genes in this pham, start number 8 is 23188 in Pink Friday, this matches the start site predicted by Glimmer and Genemark /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 23188 /note=Function call: NKF /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Garcia, Isabella /note=Secondary Annotator QC: I agree with the annotation. This is a real gene with a likely start site of 23188. 
CDS 23660 - 24220 /gene="30" /product="gp30" /function="RuvC-like resolvase" /locus tag="PinkFriday_30" /note=Original Glimmer call @bp 23660 has strength 9.86; Genemark calls start at 23672 /note=SSC: 23660-24220 CP: yes SCS: both-gl ST: SS BLAST-Start: [RuvC-like resolvase [Arthrobacter phage Albanese]],,NCBI, q1:s1 97.8495% 7.77695E-115 GAP: -20 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.173, -4.826259462263871, yes F: RuvC-like resolvase SIF-BLAST: ,,[RuvC-like resolvase [Arthrobacter phage Albanese]],,UYL86760,90.8108,7.77695E-115 SIF-HHPRED: NtMOC1; chloroplast, resolvase, PLANT PROTEIN; 2.02A {Nicotiana tabacum},,,6KVN_A,86.5591,99.8 SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Glimmer has a start site of 23660 and Genemark has a start site of 23672. They also list different start codons, ATG and GTG. This could indicate that our gene is not real. /note=Coding Potential: Good coding potential. Start site covers coding potential. /note=SD (Final) Score: The best RBS final score is -4.495 and the z-value is 2.08. This start site at 24128 has the best RBS score. /note=Gap/overlap: There is a 93bp gap. The start at 24128 does not give the longest ORF, however; the start at 23660 does. /note=Location call: I think it is best to use the start site of 24128 since it has the best RBS and SD score. /note=Phamerator: As of 10/31/23, this gene is a part of pham 114949 and a part of cluster AK along with 77 other phages such as Suppi 34 and Scuttle 34. /note=Starterator: There is a highly probable start site, number 22, at base pair position 23660, shared with 85 of 88 non-draft genes /note=Location call: This is a real gene with a start site at base pair 23660 /note=Function call: According to the BLAST findings, there appears to be insufficient evidence to establish a hypothesis regarding the function. All genes were identified as having an unknown function with relatively low scores.
Nevertheless, Arthrobacter phage Aledel exhibited a notably high e-score, suggesting a potential function. Additional research is needed to explore this further. NKF /note=Transmembrane domains: No TMDs by DeepTMHMM /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: I do not agree with the primary annotator's start site in the notes section (may have been a typo). I called start at 23660. The annotator may want to refresh Starterator and recheck the number of members in the pham. They also may want to check the starting codons of the start sites suggested by Glimmer and Genemark. I would also recommend including that the final score of the ORF they selected is not the best final score listed on PECAAN. /note=11/17: I agree with the primary annotator that there is insufficient evidence to call a function for this protein. While there are hits in HHPred for an endodeoxyribonuclease, there is insufficient evidence when comparing synteny and analyzing blasts that this is the function shared with other phages. Therefore I cannot confidently call a function. The only thing I would change in the primary annotator's notes is that phage Aledel's version of this protein is also a hypothetical protein and does not suggest there is a function.
CDS 24285 - 24500 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="PinkFriday_31" /note=Original Glimmer call @bp 24285 has strength 14.08; Genemark calls start at 24285 /note=SSC: 24285-24500 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KDJ06_gp32 [Arthrobacter phage Sergei] ],,NCBI, q1:s1 100.0% 1.04627E-39 GAP: 64 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.161, -4.4057176022952405, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KDJ06_gp32 [Arthrobacter phage Sergei] ],,YP_010050446,95.7747,1.04627E-39 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: Glimmer and Genemark are in agreement that the start site is at 24285. The start codon for this ORF is ATG. /note=Coding Potential: Within the putative ORF the gene has a reasonable coding potential predicted by GeneMark. The chosen start site covers the extent of the coding potential. /note=SD (Final) Score: The SD score is -4.406, which is the most positive option of the three possible start sites presented in PECAAN. /note=Gap/overlap: The upstream gap of 64 base pairs is reasonable. The start site chosen results in the longest ORF. The resulting length of the gene, 216 base pairs, is reasonable. /note=Phamerator: As of 10/30/2023, this gene is in pham 85405. This Pham has 75 non-draft members, all of which are also in cluster AK such as Albanese (AK) and Aledel (AK). /note=Starterator: Start site 3 was conserved and called in 69 of 75 non-draft genes in this Pham. The start site is located at base pair 24285 in PinkFriday. /note=Location call: Taken together, the gathered evidence suggests that the gene is real and that the start site for this gene is at base pair 24285. /note=Function call: There is no known function for this gene, based on hits from PhagesDB and NCBI BLASTp with high query coverage (100%), high % identity (>90%), and low E-values (<10^-38).
There were no HHPRED hits with low enough e-values to consider. /note=Transmembrane domains: There are no TMDs predicted by DeepTMHMM. /note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 24558 - 24713 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="PinkFriday_32" /note=Original Glimmer call @bp 24558 has strength 20.14; Genemark calls start at 24558 /note=SSC: 24558-24713 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VALLEJO_36 [Arthrobacter phage Vallejo] ],,NCBI, q1:s1 100.0% 2.42095E-26 GAP: 57 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.298, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VALLEJO_36 [Arthrobacter phage Vallejo] ],,AOT24126,98.0392,2.42095E-26 SIF-HHPRED: SIF-Syn: NKF, upstream the gene is a protein with NKF, the same case downstream, which is alike the gene order within phage ChewChew /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Both auto-annotation programs Glimmer and GeneMark are used. They also both agree on start coordinates 24558 with codon ATG. /note=Coding Potential: Within both the coding potential graphs provided by Host-Trained and Self-Trained GeneMark, there are high consistent peaks throughout the gene’s putative ORF. In addition, the start site does cover all the coding potential. /note=SD (Final) Score: The SD score is -2.011, which is not only an optimal score (a low negative number), but it is also the best score from all the candidate start sites. Therefore there is credible evidence for this site to be considered strong for ribosome binding. /note=Gap/overlap: Based on the start site 24558, this gene has a gap score of 57. This means there is a 57 base pair gap between it and other genes which is considered reasonable. The only other start site candidate creates a smaller ORF (156 v 111) and doubles the gene gap to 102. 
/note=Phamerator: As of October 31st, 2023 this gene is found within Pham: 2185. The pham is conserved with other members within its cluster, phage BigMack’s gene 32 and phage Beethoven’s gene 36 both have Pham numbers 2185. Unfortunately, the phamerator does not have a function called for this gene. /note=Starterator: Yes, there is a reasonable start site that is highly conserved. The start site number is 6, which corresponds to base pair coordinate 24558. 38/40 nondraft genes call site #6. /note=Location call: I believe this is a real gene due to its conservation within its pham group and its reasonable coding potential. Start site candidate 24558 seems most reasonable. /note=Function call: There is no predicted function for this gene. All CDD and HHpred hits were assigned no function or were unreliable. Therefore no function could be assigned (NKF). /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains predicted by TMHMM. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note=Secondary Annotator QC: /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 24558 (codon: ATG). /note=Coding Potential: The gene does have a high, reasonable coding potential on the forward strand, from which it is transcribed. The start site chosen by both Glimmer and GeneMark completely covers all the coding potential. /note=SD (Final) Score: The SD score for the suggested start site by Glimmer and GeneMark is the highest and best SD score (-2.011) and z-score (3.298). The RBS score is irrelevant to the data, however, since there is a 57 bp gap between the gene and an upstream gene. /note=Gap/overlap: There is a 57 base pair gap between the gene and another upstream gene. This means the site is neither a part of another gene nor of an operon and reduces the space needed for a promoter, increasing suspicion.
The length of the gene, however, is 156 bp, which exceeds the minimum length needed for adequate coding of a protein (greater than 120 bp). Since the start site chosen by both Glimmer and GeneMark covers all of the coding potential, has the best SD scores, and allows for an adequate gene length, no alternative start site needs to be considered. /note=Phamerator: The pham the gene is found in is 2185 as of 11/3/23. The pham is conserved in other members of the same cluster as PinkFriday (AK). Phages used for comparison include Beethoven_36 (AK), Bennie_32 (AK), BigMack_32 (AK), Bodacious_33 (AK), and ChewChew_33 (AK). The Pham Maps/ Phamerator data doesn’t call a function of this gene. /note=Starterator: The reasonable start site, that is conserved among the members of the pham to which the gene belongs, is site 6 (at base pair coordinate 24558 for PinkFriday). There are 42 total members in this pham of which 2 are drafts. One of the drafts includes PinkFriday. 36 non-draft members out of the 40 non-draft genes call the most conserved start site. Starterator is only partly informative: although it indicates that the start site called by both Glimmer and GeneMark is the most conserved in the pham and has been manually annotated as a start site 36 times, nearly all of the members of the pham have an identical start site. More members of the pham would need to be tested in Starterator for it to be more informative. /note=Location call: The gene examined is a real gene as it has high coding potential on the forward strand, from which it is transcribed. The gene is also conserved on Phamerator in other phages. Both Glimmer and GeneMark suggest that the site 24558 is the start site of the gene. It's the most conserved start site in other phages of the same pham as the gene (and the same cluster as PinkFriday) and it covers all of the coding potential on the forward strand.
Although the Starterator data also supports, through 36 manual annotations, that the 24558 site is the actual start site, the pham of the gene being tested lacks diversity and contains few phages, so the data is uninformative. The RBS score is also irrelevant and cannot be interpreted without further testing or experimentation. Overall, start site 24558 is the best start site candidate. /note=Function call: Both PhagesDB BLASTp and NCBI BLASTp results show there are multiple phages (at least 4) with high percentage identity values (>92%) and low E-values (<1e-7). PhagesDB BLASTp significant hits call the function of the gene as unknown. NCBI BLASTp results also show high query coverage values (100%) for 2 of the phages, again with low E-values (<1e-7). CDD and HHPred show no significant hits. /note=Transmembrane domains: There are no transmembrane domains predicted by DeepTMHMM. The protein product cannot be called as a membrane protein. /note=Secondary Annotator Name: Samudrala, Vaishnavi /note= /note=Note to primary annotator: I agree with all of your calls, great work! Just make sure to add information about the significant hits you had for BLASTp results from the PhagesDB and NCBI databases in your function call section.
CDS 24885 - 25187 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="PinkFriday_33" /note=Original Glimmer call @bp 24885 has strength 5.03; Genemark calls start at 24885 /note=SSC: 24885-25187 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEDEL_33 [Arthrobacter phage Aledel] ],,NCBI, q1:s1 100.0% 2.27118E-61 GAP: 171 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.038, -2.5588120788329394, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEDEL_33 [Arthrobacter phage Aledel] ],,AZF98657,99.0,2.27118E-61 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: Both Glimmer and GeneMark call and agree on the start site at bp 24885 with a start codon of ATG (Met). /note=Coding Potential: Gene has great coding potential within the putative ORF, as indicated by Self and Host GeneMark maps with high levels of typical coding potential. The start site does cover all of the coding potential, as it is placed to the left of the beginning of the high peaks of the coding potential and the stop site is placed to the right on the maps. /note=SD (Final) Score: The original start site has the best SD score (least negative) at -2.559 and is not part of an operon due to the large gap preceding the start site. /note=Gap/overlap: The gap between this gene and the preceding one (171 bp) is not reasonable and may be indicative of a missing gene that has yet to be annotated. However, this proposed start site does create the gene with the longest ORF out of all the alternative start site candidates. The proposed length of the gene (303 bp) is acceptable as well. /note=Phamerator: As of 11/01/2023, this gene is found in pham 123276. All of the members present in this pham (123276) are part of the AK cluster. Some of these phages include Aledel, Chridison, Eunoia, HunterDalle, OMalley, and Vulture. There was no function called for this gene. 
/note=Starterator: There is a reasonable start site for which the genes in this pham (123276) are conserved at start site 1 (which is at bp 24885 for PinkFriday_33). There are 8 non-draft members in this pham and an additional 2 draft members. Of the 8 non-draft members, 5 of them call start site 1. The Starterator program is mostly informative, as very few members in this pham are drafts and it provides the number of manual annotations for which the other non-draft genes have called start site 1. However, there are very few members within this pham, which is indicative of a lack of synteny across different phages and their clusters. /note=Location call: The gathered evidence suggests that this is a real gene with the original start site @ bp 24885 being correct (out of the options listed) due to its complete encompassing of the coding potential, its creation of the longest possible ORF of all of the start site candidates, the fact that it provides the smallest gap between this gene and the preceding one out of all options, and its consistency with the Starterator report. However, there likely is a gene that hasn’t been identified just upstream of this gene, as there is a 171 bp gap. /note=Function call: No program has returned any informative results as to the function of this gene and no TMDs were found. /note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: Everything is good!
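The gene lengths and gap/overlap values quoted in these notes can be rechecked from the CDS coordinates alone. The following is a minimal Python sketch, assuming the coordinates are 1-based and inclusive as in the CDS lines above; the helper names `gene_length` and `gap_to_upstream` are hypothetical and are not part of PECAAN or any other tool named here.

```python
# Hypothetical helpers for rechecking length and gap values.
# Assumption: CDS coordinates are 1-based and inclusive.

def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene spanning start..stop inclusive."""
    return stop - start + 1

def gap_to_upstream(upstream_stop: int, start: int) -> int:
    """Number of bp between two genes; a negative value is an overlap."""
    return start - upstream_stop - 1

# (gene number, start, stop) copied from the CDS lines for gp31 to gp34.
cds = [
    (31, 24285, 24500),
    (32, 24558, 24713),
    (33, 24885, 25187),
    (34, 25184, 25528),
]

# Pair each gene with the gene immediately upstream of it.
report = []
for (_, _, prev_stop), (gene, start, stop) in zip(cds, cds[1:]):
    report.append((gene, gene_length(start, stop), gap_to_upstream(prev_stop, start)))

for gene, length, gap in report:
    print(f"gp{gene}: length {length} bp, gap {gap} bp")
```

Under these assumptions the sketch gives a 57 bp gap for gp32, a 171 bp gap and 303 bp length for gp33, and a -4 overlap for gp34, in line with the GAP fields recorded for those genes.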
CDS 25184 - 25528 /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="PinkFriday_34" /note=Original Glimmer call @bp 25184 has strength 9.41; Genemark calls start at 25184 /note=SSC: 25184-25528 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KDJ06_gp34 [Arthrobacter phage Sergei] ],,NCBI, q1:s1 100.0% 1.04257E-71 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.185, -4.274735973818826, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KDJ06_gp34 [Arthrobacter phage Sergei] ],,YP_010050448,95.614,1.04257E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is predicted to be 25184 in both Glimmer and GeneMark /note=Coding Potential: There is good coding potential as determined by Glimmer and GeneMark evidenced by the gene only having one ORF in the forward direction. /note=SD (Final) Score: -4.275 which is not the best option but is acceptable. /note=Gap/overlap: -4 which indicates that it is part of an operon. /note=Phamerator: The pham number is 1198 on 10/28/23. This is conserved as it is found in BigMack and Bennie. /note=Starterator: Start site 6 is annotated 62 of 78 times. This number is 25184 in PinkFriday which agrees with Glimmer and Genemark. /note=Location call: This is a real gene and the start site is 25184. /note=Function call: NKF. The top ranked BLAST genes all have no known functions. This was also the case in NCBI BLAST. There were no CDD hits and the HHpred hits had a very high e score of .31. /note=Transmembrane domains: Based on the fact that there is no function call for this gene, it makes sense that there is no distinctive transmembrane protein. /note=Secondary Annotator Name: Richard, Ketan /note=Secondary Annotator QC: /note=Auto-annotation: Both Glimmer and GeneMark mark the start site as 25184 /note=Coding Potential: There is very high coding potential and all of the coding potential is within the predicted start and stop sites.
/note=SD (Final) Score: -4.275. This is the best final score for this gene. /note=Gap/overlap: There is an overlap of -4 which likely indicates that this gene is part of an operon. /note=Phamerator: 1182. 10/28/23. This pham is found in multiple other phages part of the cluster like Bodacious and Huntingdon. /note=Starterator: start number: 6. This start number is part of the most annotated start number and calls the start site to be 25184. Almost 80% of the phages in the pham have this start number. /note=Location call: Based on all the information, I agree with Glimmer and GeneMark that the start site is 25184 and it is a real gene. /note=Function call: The top ranked BLAST genes all have no known functions. This was also the case in NCBI BLAST. The CDD and HHpred hits had extremely high e-values, so they are of no use. So, no known function is the best call. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. CDS 25525 - 25770 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="PinkFriday_35" /note=Original Glimmer call @bp 25525 has strength 10.4; Genemark calls start at 25525 /note=SSC: 25525-25770 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp35 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.16556E-40 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.055, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp35 [Arthrobacter phage Pumancara] ],,YP_009602901,88.8889,1.16556E-40 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Glimmer and GeneMark called the same start site at 25525, with the start codon listed as ATG. 
/note=Coding Potential: Both Host Trained and Self Trained Gene Mark showed good coding potential; there are no countering potentials in the reverse frames, and the suggested start and stop sites are outside of the indicated coding potential (coding potential is covered). There is a slight divot shown on the Host Trained, however it doesn’t dip below 0.5 and is relatively insignificant with respect to the entire potential area. /note=SD (Final) Score: This gene has the least negative final score listed, as -2.584 /note=Gap/overlap: The -4 overlap is indicative of an operon and it is of the LORF. /note=Phamerator: The pham number as of 10/29/23 is 121487. This gene is conserved in phages Aledel, CristinaYang, and AppleCider, all in the same cluster as PinkFriday (AK). The function call is unknown. /note=Starterator: The start site 4 was manually annotated in 75/75 non-draft genes in this pham. Start site 4 is position 25525 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark /note=Location call: Evidence suggests that 25525 is the correct start call. Based on Glimmer/GeneMark/Starterator start call, the -4 gap, z-score above 2 and the least negative final score, there is strong support for the realness of this gene. /note=Function call: The function of this gene is still unknown, but there are good hits on both NCBI BLASTp and phagesDB BLASTp with high identity percentage (85%) and low e-values (1e-40), especially with that of Arthrobacter phage Pumancara. HHpred and CDD did not show good hits with any known function, as there were no CDD hits and those on HHpred had a lowest e-value of 44. /note=Transmembrane domains: N/A /note=Secondary Annotator Name: AKKINEPALLY, MRUDULA /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator.
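Starterator evidence in these notes is quoted as "N of M" counts, which are easier to compare across genes as percentages. The following is a small Python sketch; the helper name `called_percent` is hypothetical, and the counts are copied from the gp34 and gp35 Starterator notes above.

```python
# Hypothetical helper: percentage of non-draft genes that call a start site.

def called_percent(called: int, total: int) -> float:
    """Return 100 * called / total, rejecting impossible counts."""
    if total <= 0 or not 0 <= called <= total:
        raise ValueError("counts must satisfy 0 <= called <= total and total > 0")
    return 100.0 * called / total

# (called, total) counts as quoted in the Starterator notes.
starterator_counts = {
    "gp34, start site 6": (62, 78),
    "gp35, start site 4": (75, 75),
}

for label, (called, total) in starterator_counts.items():
    print(f"{label}: {called}/{total} = {called_percent(called, total):.1f}%")
```

For gp34 this gives about 79.5%, consistent with the secondary annotator's "almost 80%" remark.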
CDS 25767 - 26216 /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="PinkFriday_36" /note=Original Glimmer call @bp 25941 has strength 10.36; Genemark calls start at 25767 /note=SSC: 25767-26216 CP: yes SCS: both-gm ST: NA BLAST-Start: [hypothetical protein FDH62_gp36 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.1643E-99 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.781, -3.0194634687155197, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp36 [Arthrobacter phage Pumancara] ],,YP_009602902,99.3289,1.1643E-99 SIF-HHPRED: DUF732 ; Protein of unknown function (DUF732),,,PF05305.18,51.0067,99.3 SIF-Syn: /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: start sites disagree 25767 start site Genemark, and 25941 Glimmer /note=Likely Genemark site is correct /note=Coding Potential: Coding potential found in Self-GeneMark and Host-GeneMark and is only in the forward gene with no large gaps. /note=SD (Final) Score: -3.019 is largest final score among start sites /note=Gap/overlap: Overlap is reasonable -4, and is indicative of an operon /note=Phamerator: The pham number as of Nov 1, 2023 is 100691. Gene is orpham /note=Starterator: N/A /note=Location call: Evidence points to this being a real gene since without this gene there would be a large gap. The most likely start site is 25767 which was called by Genemark. Glimmer call has considerably lower z-score and final score. /note=Function call: No known function. No good hits with any proteins with known function in PhagesDB, NCBIblast or HHpred. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note= /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: Complete and correct /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: Glimmer calls the start site at 25941, but GeneMark calls it at 25767. 
Coding potential leans towards GeneMark’s call. The start codon is called as ATG. /note=Coding Potential: There is reasonable coding potential found in both Self and Host Trained Gene Marks. Both contain a large dip around 25900. /note=SD (Final) Score: This gene has the least negative final score, as -3.019. /note=Gap/overlap: The -4 overlap is indicative of an operon, and it is of the LORF /note=Phamerator: As of 11/03/23, this gene is the only member of pham 100691; it is an orpham. /note=Starterator: N/A /note=Location call: Evidence points to the realness of this gene and its start site being at GeneMark’s call of 25767 as it has the best overlap, z-score, and final score and it covers the entirety of the coding potential. /note=Function call: NKF /note=Transmembrane domains: N/A CDS 26216 - 26458 /gene="37" /product="gp37" /function="membrane protein" /locus tag="PinkFriday_37" /note=Original Glimmer call @bp 26225 has strength 8.01; Genemark calls start at 26216 /note=SSC: 26216-26458 CP: no SCS: both-gm ST: SS BLAST-Start: [hypothetical protein FDH62_gp37 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 2.83385E-52 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.914, -2.740768097004632, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein FDH62_gp37 [Arthrobacter phage Pumancara] ],,YP_009602903,100.0,2.83385E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Samudrala, Vaishnavi /note=Auto-annotation: Glimmer and GeneMark. Have conflicting start site predictions as Glimmer predicts the site being at 26225 and GeneMark predicts it at 26216. The starting codon for both sites is ATG. /note=Coding Potential: Self-trained GeneMark suggests high coding potential near the stop site while the host-trained GeneMark suggests there could be a gap in coding potential upstream of the stop site leading to some inconclusive results.
Overall, there could be coding potential since the host-trained GeneMark doesn’t provide enough information to reject the self-trained GeneMark results. Both the GeneMark and Glimmer predicted start sites cover all of the coding potential but the GeneMark start site is more upstream. It is more likely the GeneMark start site covers all of the gene potential as opposed to the Glimmer start site (which could overlap with some of the coding potential). /note=SD (Final) Score: The start site suggested by GeneMark has the best RBS final score (-2.741) and a z-score of 2.914. This indicates that there is high binding affinity of the ribosomes with the GeneMark called start site. It’s also higher than average when compared to other RBS values of other start sites on the gene. Overall, the RBS score is irrelevant as there is a 1 bp overlap between the gene and an upstream gene, suggesting the start site is a part of an operon. /note=Gap/overlap: There is a 1 base pair overlap between the gene and another upstream gene. The start site can be reasonably interpreted to be a part of an operon. The length of the gene with the suggested Glimmer start site is 234 bp (26225 to 26458), which exceeds the minimum length needed for adequate coding of a protein (greater than 120 bp). However, since the chosen start site by GeneMark covers all of the coding potential, has the best SD scores, and also allows for an adequate ORF or gene length of 243 bp (26216 to 26458), it is the better start site candidate. /note=Phamerator: The pham the gene is found in is 85375 as of 10/31/23. As of 11/7/23: Phages DB shows that this gene’s pham is 85375. The pham that this gene belongs to is present in other members of the cluster that PinkFriday belongs to (cluster AK). The phages used for comparison include Albanese_40 (AK), Aledel_37 (AK), Chridison_36 (AK), Daiboju_37 (AK), Eunoia_37 (AK). There are no functions called for this gene in the Pham Map/ Phamerator.
/note=Starterator: There is a reasonable conserved start site among members of the pham to which the gene belongs at site 2. PinkFriday doesn’t have a base pair coordinate for the site since the phage doesn’t contain start site 2. The pham contains 81 members other than PinkFriday and 4 of them are drafts. 53 of the total 78 non-draft members call the most conserved start site. As of 11/7/23: Phages DB shows that this pham has 83 members of which 5 are drafts. /note=Location call: The gene is real since it has high coding potential, a reasonable length, and has strong blast hit values. There are also no sudden changes in the direction that the gene is transcribed or significant overlaps between genes being transcribed in the forward and reverse directions. The gene is also conserved in the Phamerator. The GeneMark called start site at 26216 seems to be the best start site candidate since it covers all of the coding potential, has a higher RBS final score (-2.741) (which could be better assessed as a potential start site through its z-value, which is quite high at around 2.914), and the site is shown on Starterator to have more manual annotations associated with it. /note=Function call: Both PhagesDB and NCBI BLASTp hits show that at least four other phages have low e-values (<1e-7) and high identity values (>81%) to support the conclusion that the gene has no known function. Also, the NCBI BLASTp hits show that the two phages with the strongest hits have high query coverage values (>98.7%). Neither HHPred nor CDD gives significant hits. Deep TMHMM analysis, however, provides evidence that the protein product from this gene has a transmembrane domain and so it could be called a “membrane protein”. /note=Transmembrane domains: There is one transmembrane domain that has a length of 17 amino acids. The TMD is noted as TMhelix, which indicates it is not a signal peptide. In this way, the protein can be called a “membrane protein”.
There was a previous function call of “NKF” made before the TMD analysis. This suggests that this information doesn’t conflict with previous predictions of the protein’s function. /note=Secondary Annotator Name: Wang, Jordan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function and potential presence in the membrane. CDS 26442 - 26642 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="PinkFriday_38" /note=Original Glimmer call @bp 26442 has strength 7.3; Genemark calls start at 26469 /note=SSC: 26442-26642 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FDH62_gp38 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.06866E-37 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.569, -3.546978408003887, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp38 [Arthrobacter phage Pumancara] ],,YP_009602904,96.9697,1.06866E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Glimmer and GeneMark call two different start sites. Glimmer calls 26442 with a start codon of ATG while GeneMark calls 26469 with a start codon of GTG. Both start codons are relatively common. /note=Coding Potential: All coding potential is included within Glimmer’s called start site for both host-trained and self-trained GeneMark. All coding potential is also included within GeneMark’s called start site for both host-trained and self-trained GeneMark. The coding potential in both versions of GeneMark is reasonable. There is no reasonable coding potential on the reverse strand in host-trained and self-trained GeneMark. /note=SD (Final) Score: Glimmer’s start site has a final score of -3.547 and a z-score over 2. GeneMark’s start site has a final score of -5.201 and a z-score over 2. Since Glimmer’s called start site has a final score closer to 0 and a larger z-score than GeneMark’s, a start site of 26442 is more likely. 
/note=Gap/overlap: The gap of -17 from Glimmer’s called start site is conserved in other AK phages including AppleCider and ChewChew. This makes Glimmer’s called start site more likely. /note=Phamerator: pham: 119032. Date 11/10/2023. It is well-conserved in phages from cluster AK. Two examples of AK phages where this pham is conserved are AppleCider (AK) and ChewChew (AK). /note=Starterator: Start site 10 is not the most manually annotated start site in this pham. It is manually annotated in 10 of 72 phages; however, it is called 100% of the time when it is present. Start site 10 corresponds with start site 26442 in PinkFriday which agrees with the start site predicted by Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 26442. /note=Function call: NKF. On phagesDB BLASTp, the top two hits have significant e-values < 10e-6 (Pumancara’s e-value is 4e-29 and Daiboju’s e-value is 2e-24) for unknown function proteins. NCBI BLASTp confirms these unknown function protein hits for Pumancara (100% coverage, 97% alignment, and e-value of 1.07e-37) and Daiboju (100% coverage, 89% alignment, and e-value of 2.7e-31). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts no TMDs. So, it is not a membrane protein. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: /note=Auto-annotation: Both the auto-annotation programs call this gene but Glimmer calls the start site at 26442 with codon ATG and GeneMark calls this gene with start site at 26469 and start codon GTG. /note=Coding Potential: The gene does possess good coding potential on both the Host-Trained and Self-Trained graphs; however, the activity doesn’t peak until around the GeneMark start site, but there is atypical activity around the Glimmer start site. 
Overall there is enough evidence within this putative ORF. /note=SD (Final) Score: This gene’s potential start site called by Glimmer possesses the best SD score with ‘-3.547’ and it also has a Z-score of 2.569 (which is the best from all the potential candidates). /note=Gap/overlap: For the auto-annotated Glimmer start site, the gene has a -17 overlap, and this site also contains the longest ORF. The next best gene start site (the auto-annotated GeneMark site) has a 10 base pair gap and as a result has a slightly smaller ORF. For these reasons, the Glimmer start site is preferred. /note=Phamerator: As of November 3rd, 2023 this gene is found within Pham 119032. It is conserved in other phages within the subcluster like Daiboju and Herb. There was no function called for any of the genes within this pham. /note=Starterator: Yes, there is a reasonable start site that is highly conserved which is start site 12 found within 56 of 72 nondraft genes. However, PinkFriday’s start site is number 10 (corresponds to base pair 26442), which is conserved in 10 of 72 nondraft genes. /note=Location call: I believe this is a real gene due to its conservation within its pham group and its reasonable coding potential. The gene’s potential start site candidate #10/26442 seems the most reasonable. /note=Function call: The predicted function for this gene is unknown. No conclusion could be drawn because there were no CDD hits and the HHpred hits were uninformative. There were hits from phagesDB BLASTp with reasonable e-values but they were all associated with no known function. Therefore this gene’s function remains unknown. /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. 
CDS 26620 - 27570 /gene="39" /product="gp39" /function="Cas4 exonuclease" /locus tag="PinkFriday_39" /note=Original Glimmer call @bp 26620 has strength 14.13; Genemark calls start at 26620 /note=SSC: 26620-27570 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: -23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.116, -2.9669382204036308, yes F: Cas4 exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Pumancara] ],,YP_009602905,100.0,0.0 SIF-HHPRED: PDDEXK_1 ; PD-(D/E)XK nuclease superfamily,,,PF12705.11,90.5063,99.9 SIF-Syn: The upstream and downstream genes of other phage genomes are not called and the upstream and downstream genes for pinkfriday are NKF. /note=Primary Annotator Name: Sotelo, Jessie /note=Auto-annotation: Glimmer and GeneMark both agree on the start site of 26620. /note=Coding Potential: Coding potential in this ORF is primarily on the forward strand, with little evidence of coding potential on the reverse strand. Therefore this is likely a forward gene. Coding potential is found on both Genemark Self and Host. /note=SD (Final) Score: -2.967. It is the best final score on PECAAN. /note=Gap/overlap: Overlap: 23 bp. No gap and minimal overlap. /note=Phamerator: Pham: 122637. Date: 11/8/23. It is conserved in many AK phages such as Aledel_39 (AK), Daiboju_39 (AK), and Eunoia_39 (AK). /note=Starterator: Start site 22 in Starterator was manually annotated in 15/107 non-draft genes in this pham. This evidence agrees with the start site that was predicted in Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26620. /note=Function call: Cas 4 Family exonuclease. The top BLAST hits on Phagesdb and NCBI have the function of Cas 4 family exonuclease or exonuclease. There were significantly more hits with Cas 4 family exonuclease and all had very low e values (0). 
CDD had a hit for a nuclease family protein within the Cas4 superfamily. HHPRED had a hit for a CRISPR associated exonuclease. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Richard, Ketan /note=Secondary Annotator QC: /note= /note=Auto-annotation: Both Glimmer and GeneMark call the start site to be 26620. /note=Coding Potential: The coding potential is found primarily on the forward strand and all of the coding potential is within the predicted start and stop sites. /note=SD (Final) Score: -2.967. This is the best final score for the gene. /note=Gap/overlap: There is an overlap of -23 but the other start sites have very large gaps and this overlap is found in other phages from the same cluster like Bodacious. /note=Phamerator: 122637. 10/28/23. This pham is found in other phages from the same cluster like Bodacious. /note=Starterator: Start site 66 in Starterator was manually annotated in 77/292 non-draft genes in this pham. This evidence agrees with the start site that was predicted in Glimmer and Genemark. /note=Location call: Based on the evidence, I agree with Glimmer and GeneMark that the start site is 26620. /note=Function call: Cas 4 Family exonuclease. The top 2 BLAST hits on Phagesdb and NCBI have the function of Cas 4 family exonuclease or exonuclease and they have e-values of 0 (Pumancara and Vulture). CDD had a hit for a nuclease family protein within the Cas4 superfamily. HHPRED had a hit for a CRISPR associated exonuclease. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 27570 - 27842 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="PinkFriday_40" /note=Original Glimmer call @bp 27570 has strength 8.27; Genemark calls start at 27570 /note=SSC: 27570-27842 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp40 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 8.89255E-59 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.925, -4.324286579155733, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp40 [Arthrobacter phage Pumancara] ],,YP_009602906,100.0,8.89255E-59 SIF-HHPRED: SIF-Syn: NKF, upstream gene is Cas4 exonuclease, downstream is RecA-like DNA recombinase, just like in phage ChewChew. /note=Primary Annotator Name: Vazquez, Eunice /note=Auto-annotation: Both Glimmer and Genemark call the gene and they agree with the start site at 27570 bp. /note=Coding Potential: Coding potential in this ORF is found on the forward strand only, indicating this is a forward gene. /note=SD (Final) Score: -4.324. It is the best final score on PECAAN. The z score is 2.925. /note=Gap/overlap: The overlap is -1, indicating that there is a favorable overlap. /note=Phamerator: 116403. This is the pham number as of October 25th 2023. The gene is conserved in phages Herb, Kingbob, and Temper16, all in cluster AK. The function for all of these is unknown. /note=Starterator: 68/78 non-draft members call start site 4, which correlates to a start site of 121478 bp. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 27570 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: NKF. The top two BLAST hits on phagesDB had NKF with e-values of 4e-49 and 9e-46. The top two hits of NCBI had hypothetical function with e-values of 8.89255e-59 and 9.19382e-55. HHpred had no relevant hits. CDD had no relevant hits. /note=Transmembrane domains: There were no TMDs predicted on DeepTMHMM. 
/note=Secondary Annotator Name: Zamora, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. /note=Auto-annotation start source: Glimmer and Genemark. Both call the start at 27570. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that it is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.324. It is the best final score on PECAAN. /note=Gap/overlap: Overlap: -1 bp. Overall indicates the gene is likely located in an operon. This is an ideal overlap for a phage gene. /note=Phamerator: no pham number found. /note=Starterator: no starterator report. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 27570. /note=Function call: no known function. The top two phagesdb BLAST hits have no known function with e-values 4e-49 and 9e-46. The top three NCBI BLAST hits also had “hypothetical function” with >94% identity and e-values <9e-54. HHpred had no relevant hits. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 27951 - 29000 /gene="41" /product="gp41" /function="RecA-like DNA recombinase" /locus tag="PinkFriday_41" /note=Original Glimmer call @bp 27918 has strength 14.76; Genemark calls start at 27918 /note=SSC: 27951-29000 CP: yes SCS: both-cs ST: SS BLAST-Start: [RecA-like DNA recombinase [Arthrobacter phage Daiboju]],,NCBI, q1:s1 100.0% 0.0 GAP: 108 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.733, -3.120979338716238, no F: RecA-like DNA recombinase SIF-BLAST: ,,[RecA-like DNA recombinase [Arthrobacter phage Daiboju]],,AXH43968,98.2709,0.0 SIF-HHPRED: RecA protein; Recombination, Radioresistance, DNA-repair, ATPase, DNA-binding protein, DNA BINDING PROTEIN; HET: AGS; 2.5A {Deinococcus radiodurans} SCOP: c.37.1.11, d.48.1.1,,,1XP8_A,66.1891,99.6 SIF-Syn: Rec-A like DNA recombinase, genes upstream and downstream are both NKF. /note=Primary Annotator Name: Aguirre, Austin /note=Auto-annotation: Glimmer and GeneMark both call the start site at 27918 with a start codon of ATG. /note=Coding Potential: Coding potential is found on the forward strand in only one frame. A start at 27918 encompasses all of the gene’s coding potential. While 27951 also has strong coding potential, Glimmer and GeneMark both call 27918, which is favored over a TTG start site. /note=SD (Final) Score: -5.570 with a Z score of 2 which supports this call. /note=Gap/overlap: 75 bp. Slightly large, but the gap is not big enough to cause concern. /note=Phamerator: Gene is found in the Pham group 121371 based on data 11/01/23. Phamerator did not show any specific conservation of this gene. This pham has 134 members. There are a few other genes in this Pham belonging to AK, but there is no majority. /note=Starterator: The most common start site was 34, and it was called in 26 of the 125 non-draft genes in the Pham. PinkFriday did not call this start number. The most annotated start in PinkFriday was 19 @27951, but the auto-annotated start number and position was Start: 17 @27918. 
Overall, starterator was not very informative for this gene. Phages in this group include phages Pumancara and Herb. /note=Location call: Based on the above evidence, 27918 is the start site. The other possible start site, 27951, has a TTG codon which is rare, despite this start site having 20 manual annotations compared to the six found for 27918. Additionally, 27918 is the longest ORF and has a gap of 75 which is way better than 100. Finally, Glimmer and Genemark both called the start site of 27918, and that is more significant than the amount of manual annotations. /note=Function call: Rec-A like DNA recombinase /note=Gene was called as a Rec-A recombinase on phagesDB with an e-value of 0 and 360/360 identities with phage Pumancara. Gene was called again in phagesDB as a Rec-A recombinase with an e-value of 0. /note=NCBIp called a Sak4-like ssDNA annealing protein, but this function is similar to RecA and is not on the approved function list. No CDD hits, but Hpred also called Protein RecA. /note=Transmembrane domains: Not a TMD. This protein is found internally, and this makes sense as its purpose is to repair DNA. /note=Secondary Annotator Name: Soan, Jessica Hyunsil /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note= /note=TA Notes: After investigating this function, the HHpred hit did not contain the N or C terminal domain structures which are required to have RecA-like DNA recombinase. SEA-PHAGES advises to call it as an ASCE ATPase. This function is not approved within PECAAN so called it as AAA ATPase. 
-JS CDS 29060 - 29572 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="PinkFriday_42" /note=Original Glimmer call @bp 29060 has strength 14.2; Genemark calls start at 29060 /note=SSC: 29060-29572 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp42 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 4.28533E-121 GAP: 59 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.095, -4.81615282661272, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp42 [Arthrobacter phage Pumancara] ],,YP_009602908,100.0,4.28533E-121 SIF-HHPRED: DUF669 ; Protein of unknown function (DUF669),,,PF05037.17,78.8235,99.7 SIF-Syn: /note=Primary Annotator Name: Jacobs, Sarisha /note=Auto-annotation: Glimmer and GeneMark both call the start at 29060. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found on both GeneMark Host and Phagesdb self. /note=SD (Final) Score: -4.816 (the best final score on PECAAN). /note=Gap/overlap: 59 bp gap. This is a medium-sized gap that is conserved and seen in other phages of the same cluster. There is also no coding potential in the gap. /note=Phamerator: 10/31/2023 pham 114883. It is conserved (113 members) and found in BigMack and Bodacious. /note=Starterator: Start site 12 was manually annotated in 107/107 non-draft genes in this pham. Start 12 is 29060 in PinkFriday, and Genemark and Glimmer support this. /note=Location call: Based on the information above, this gene is a real gene and has a start site at 29060. Evidence from Starterator, Glimmer, and Genemark supports this. /note=Function call: Based on the BLASTp results, this gene is found in multiple phages. However, as in all the other phages, there is no function assigned to this gene. There were no CDD results, and the only significant HHpred result was a structure from PDB that had no known function. 
All other hits were insignificant and could not be called as a function. This gene has no known function. /note=Transmembrane domains: Deep TMHMM does not predict any TMD, therefore this is not a membrane protein. /note=Secondary Annotator Name: Zamora, Alexandra /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 29060. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.816. This is the best final score on PECAAN. /note=Gap/overlap: Gap: 59 bp. A little high, but reasonable considering there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 114883. Date 11/10/23. It is conserved; found in Dino (AK) and Scuttle (AK). /note=Starterator: Start site 12 in Starterator was manually annotated in 107/107 non-draft genes in this pham. Start 12 is 30109 in Scuttle. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 29060. /note=Function call: no known function. The top two phagesdb BLAST hits have unknown function (e-values: 9e-97) and the top three NCBI BLAST hits also have the function “hypothetical protein” (>95% coverage and e-values <2e-115). HHpred had no relevant hits. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS 29646 - 32279 /gene="43" /product="gp43" /function="DNA primase/polymerase/helicase" /locus tag="PinkFriday_43" /note=Original Glimmer call @bp 29646 has strength 11.27; Genemark calls start at 30108 /note=SSC: 29646-32279 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA primase/polymerase/helicase [Arthrobacter phage Supakev] ],,NCBI, q1:s1 100.0% 0.0 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.404, -3.893834241316366, no F: DNA primase/polymerase/helicase SIF-BLAST: ,,[DNA primase/polymerase/helicase [Arthrobacter phage Supakev] ],,AZS10420,98.4036,0.0 SIF-HHPRED: DNA primase; Helicase, DNA binding, AMPPNP, REPLICATION; HET: ANP; 3.1A {Staphylococcus aureus},,,7OM0_C,41.3911,100.0 SIF-Syn: DNA primase/polymerase/helicase, upstream gene is NKF, downstream gene is also NKF, just like in Albanese and AppleCider. /note=Primary Annotator Name: Tubeileh, Shareef /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start at 29646. GeneMark calls the start at 30108. /note=Coding Potential: This ORF appears to be a real gene since all of the coding potential is inside of the start and stop sites, and the complementary strand (reverse strand) does not have coding potential. /note=SD (Final) Score: The final score chosen is -3.894, which corresponds to the most annotated start site. /note=Gap/overlap: There is a somewhat large gap (73 bp), but this is conserved in other phages (Albanese), and there is no coding potential in this gap. /note=Phamerator: Pham is 917, as of 10/28/2023. It is conserved, found in AppleCider (AK) and Albanese (AK). /note=Starterator: Start site 8 is manually annotated in 73/107 non-draft phages. In Albanese, start site 8 is also the chosen start site, but at 31153. /note=Location call: This is likely a real gene, and it is called at 29646 by Glimmer. /note=Function call: DNA primase/polymerase/helicase. 
A lot of top hits on PhagesDB have this function of DNA primase/polymerase/helicase (E-value of 0), and some NCBI BLAST hits also have this function. (100% coverage, 95%+ identity, and E-value of 0). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Yao, Alice /note=Secondary Annotator QC: Don’t forget to click the boxes you think support your case. (Hint: Do this for function call). Other than that, it’s good! CDS 32664 - 32894 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="PinkFriday_44" /note=Original Glimmer call @bp 32664 has strength 8.57; Genemark calls start at 32664 /note=SSC: 32664-32894 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp44 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 5.25408E-48 GAP: 384 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.361, -4.811481410788036, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp44 [Arthrobacter phage Pumancara] ],,YP_009602910,100.0,5.25408E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wang, Jordan /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 32664. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -4.811. There is a better final score of -4.125 at start site 32748. /note=Gap/overlap: 384 bp gap. Relatively large but reasonable due to its conserved nature (Temper 16, Pumancara) and the lack of coding potential in that region. /note=Phamerator: pham: 85395. Date 11/01/2023. It is conserved; found in Albanese (AK) and Aledel (AK). /note=Starterator: Start site 11 in Starterator was manually annotated in 46/76 non-draft genes in this pham. Start 11 does not correspond to a start site in PinkFriday. However Start 12 corresponds to start site 32664. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32664. /note=Function call: NKF /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Diaz, Sebastian /note=Secondary Annotator QC: /note= /note=Auto-annotation: Both Glimmer and GeneMark call this gene with start site at 32664 and start codon ATG. /note=Coding Potential: The gene possesses strong coding potential throughout its putative ORF in both Host-Trained and Self-Trained graphs, and the start site covers the entirety of the coding potential. /note=SD (Final) Score: This gene’s potential start site (called by Glimmer and GeneMark) has an SD score of -4.811. This is the second best score; the best score belongs to an ORF that is about 100 bp smaller than the one called by the auto-annotation programs, so that start site candidate is dismissible. This makes the auto-annotated start site the next best and reasonable enough to be considered a credible ribosome binding site. /note=Gap/overlap: For the auto-annotated start site, the gene’s /note=Phamerator: As of November 1st, 2023 this gene is found within Pham 85395. It is conserved in other phages within the subcluster like Joann and Misaeng. The function called was DNA helicase for other members of the pham. /note=Starterator: Yes, there is a reasonable start site that is highly conserved which is start site 11 found within 54 of 80 nondraft genes. However, PinkFriday’s start site is number 12 (corresponds to base pair 32664), which is conserved in 26 of 80 nondraft genes. /note=Location call: I believe this is a real gene due to its conservation within its pham group and its reasonable coding potential. The gene’s potential start site candidate #12/32664 seems the most reasonable. /note=Function call: Unfortunately, this gene’s function could not be called. 
This is due to the lack of CDD hits, uninformative HHPRED hits with low coverage (40%) and low probability (63%). In addition phagesDB BLASTp hits possessed decent e-values (6e-40 from phage Pumancara) but these genes also had no function assigned. As a result no function can be assigned to this gene. /note=Transmembrane domains: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. CDS 32897 - 34996 /gene="45" /product="gp45" /function="DNA polymerase I" /locus tag="PinkFriday_45" /note=Original Glimmer call @bp 32897 has strength 14.24; Genemark calls start at 32897 /note=SSC: 32897-34996 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.625, -3.4892599424135016, no F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Pumancara] ],,YP_009602911,99.7139,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_B,98.8555,100.0 SIF-Syn: DNA polymerase I protein, upstream gene is NKF and the pham is 85395 just like in phage Bodacious, downstream is hydrolase in PinkFriday and HNH endonuclease in Bodacious, but both have the same pham of 124268, so there is synteny. /note=Primary Annotator Name: Richard, Ketan /note=Auto-annotation: Glimmer and GeneMark both call the start site at 32897 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The start site is also outside the coding potential which is another positive sign. /note=SD (Final) Score: -3.489. This start site has a very favorable final score. 
/note=Gap/overlap: The gap associated with this start site is 2 base pairs which is small and does not predict the presence of an operon. This is ultimately reasonable because the gap is conserved in another phage (AppleCider) and this is a small enough gap that it is unlikely that a gene is present in the gap. /note=Phamerator: pham: 101038. Date 10/28/2019. It is conserved; found in Bodacious (AK) and Huntingdon (AK). /note=Starterator: Start site 28 in Starterator was manually annotated in 80/107 genes in this pham. Start 28 is 32897 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32897. /note=Function call: DNA polymerase I. All of the final phagesdb BLAST hits have the function of DNA polymerase I with E-values of 0.0 or approximately 0.0, and all of the NCBI BLAST hits also have the function of DNA polymerase I. (100% coverage, 99%+ identity, and E-value ~0.0). HHpred had multiple hits for members of the DNA polymerase family over 99% probability, over 98% coverage, and E-value of 0.5.5e-64. CDD had one hit in the polymerase family with an E-value of 0. All of this is a good rationale to suggest that the protein is DNA polymerase I. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Aves, Alexandra /note=Secondary Annotator QC: Complete and correct /note=Primary Annotator Name: RICHARD, KETAN LEONARD /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 32897. The called start codon is ATG. /note=Coding Potential: Both Self Trained and Host Trained GeneMarks demonstrate good coding potential in the second forward frame. Additionally, both demonstrate similar variable dips 33800-34100. The start site covers the entirety of the coding potential. 
/note=SD (Final) Score: This gene has the least negative final score (-3.489). /note=Gap/overlap: There is a gap of 2 which does not indicate an operon, however this gap is negligible and most likely does not need to be filled with a gene. /note=Phamerator: As of 11/1/23, Phamerator calls this gene to be a member of pham 101038. This pham includes many members of the same cluster as PinkFriday (AK), such as Aledel and Bodacious, both of which are final phages. /note=Starterator: As of 10/28/23, start site 28 was the most annotated and was called in 80/107 non-draft genes in the pham, including PinkFriday. Start site 28 is position 32897 in PinkFriday, which agrees with the site predicted by Glimmer and GeneMark. /note=Location call: The evidence above suggests that this gene is real and the start site is most likely 32897 as it is supported by Glimmer, GeneMark and Starterator, and covers all coding potential. /note=Function call: DNA Pol I /note=Transmembrane domains: N/A CDS 35018 - 35407 /gene="46" /product="gp46" /function="HNH endonuclease" /locus tag="PinkFriday_46" /note=Original Glimmer call @bp 35018 has strength 4.52; Genemark calls start at 35018 /note=SSC: 35018-35407 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Glenn] ],,NCBI, q1:s3 97.6744% 2.24073E-71 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.199, -4.897952686918946, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Glenn] ],,YP_009602605,83.5821,2.24073E-71 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,61.2403,98.0 SIF-Syn: HNH endonuclease, upstream gene is DNA polymerase I, downstream gene is Pur-A like adenylosuccinate synthetase, just like in phage Glenn. 
/note=Primary Annotator Name: Zamora, Alexandra /note=Auto-annotation: Both Glimmer and Genemark showed auto-annotations for this gene; Both agree on the same start site at 35018; The called start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Good coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.898. The SD score is not the best, yet very reasonable to suggest the presence of a credible ribosome binding site /note=Gap/overlap: The gap upstream of the gene is 21 bp which is reasonable given the bp gap limit; this start site is the most reasonable because it provides the shortest gap from the upstream gene, has the longest ORF, and contains a good Z-score and RBS score; The length of the gene is acceptable given the auto-annotated start site and the chosen start site. /note=Phamerator: As of November 3, 2023, this gene is found in pham 121417. There are a few other members of the AK cluster that also belong to this pham, examples are ChewChew and Kittykat. /note=Starterator: There is a reasonable start site for this gene that is conserved among other members of pham 121417. The start site number is 37 and the base pair coordinates for it are 35018. 8 out of 107 members call this as the most conserved start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 35018. /note=Function call: HNH endonuclease. The top two phagesdb BLAST hits have the function of HNH endonuclease protein (e-value <1e^-69) and the top four NCBI BLAST hits also have the function of HNH endonuclease protein. (82% identities, and e-value <1e^-69). HHpred had two significant hits for hydrolase proteins. Both hits had a probability higher than 97%, coverage higher than 55%, and e-values lower than 0.0001. CDD had no relevant hits. The gene follows the requirements of an HNH endonuclease protein. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Kathiravan, Anoushka /note=Secondary Annotator QC: Based on the evidence I would somewhat agree that this is a real gene and the most likely start site is 35018. However I am unable to open the Starterator and according to the notes only 8 out of 107 members call this start site most conserved which is fairly low. Note: I would troubleshoot Starterator to reevaluate the evidence. CDS 35476 - 36609 /gene="47" /product="gp47" /function="adenylosuccinate synthetase PurA-like" /locus tag="PinkFriday_47" /note=Original Glimmer call @bp 35476 has strength 10.04; Genemark calls start at 35488 /note=SSC: 35476-36609 CP: yes SCS: both-gl ST: SS BLAST-Start: [2-aminooxy adenylosuccinate synthetase [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: 68 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.116, -2.9669382204036308, yes F: adenylosuccinate synthetase PurA-like SIF-BLAST: ,,[2-aminooxy adenylosuccinate synthetase [Arthrobacter phage Pumancara] ],,YP_009602912,98.939,0.0 SIF-HHPRED: c.37.1.10 (A:) Adenylosuccinate synthetase, PurA {Malaria parasite (Plasmodium falciparum) [TaxId: 5833]} | CLASS: Alpha and beta proteins (a/b), FOLD: P-loop containing nucleoside triphosphate hydrolases, SUPFAM: P-loop containing nucleoside triphosphate hydrolases, FAM: Nitrogenase iron protein-like,,,SCOP_d1p9ba_,98.4085,100.0 SIF-Syn: There is synteny. This gene is conserved in at least two other genes of BigMack and Bodacious. Genes upstream and downstream also have the same function call ((HNH endonuclease and NKF upstream). /note=Primary Annotator Name: Jessica Soan /note=Auto-annotation start source: Glimmer calls the start site at 35476. GeneMark calls the start site at 35488. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.967. The SD score for this start site is the best of all the candidate start sites listed on PECAAN. /note=Gap/overlap: Gap: 68 bp. This gap is >30 bp, so it is larger than preferred. There is very slight coding potential in this gap, but it is in the reverse direction. This gap is relatively conserved in other phages (Bodacious, Juboset). /note=Phamerator: This gene is found in Pham 1002 as of 11/1/23. There are 102 total members in Pham 1002, 6 of which are drafts. This gene is conserved in other members of the same cluster (AK). Albanese_49 (AK), Daiboju_46 (AK), and GreenHearts_50 (AK) are three examples of genes that were compared to PinkFriday. Function: PurA-like adenylosuccinate synthetase was repeatedly called in other phages of the same pham and cluster. /note=Starterator: (Start: 12 @35476 has 15 MAs). Start #12 was manually annotated 15 times for cluster AK. Found in 18/102 (17.6%) of genes in pham. Called 94.4% of time when present. PinkFriday_48 does not have the “most annotated” start #11. This evidence supports the start site called by Glimmer. /note=Location call: Considering all of the evidence above, this is a real gene with a start site at 35476 bp. There is stronger evidence supporting Glimmer’s start call. Start 35476 contains all the coding potential within the corresponding ORF. /note=Function call: The top three phagesdb BLAST hits have the function of PurA-like adenylosuccinate synthetase, and the NCBI BLAST hits also have the function of PurA-like adenylosuccinate synthetase (100% coverage, 97.6127%+ identity, and E-value 0). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Wang, Jordan Jeffrey /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function and potential presence in the membrane. CDS 36612 - 37316 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="PinkFriday_48" /note=Original Glimmer call @bp 36612 has strength 6.15; Genemark calls start at 36612 /note=SSC: 36612-37316 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp47 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.65515E-142 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.845, -5.0535707005373816, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp47 [Arthrobacter phage Pumancara] ],,YP_009602913,95.0,1.65515E-142 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Akkinepally, Mrudula /note=Auto-annotation: Glimmer and GeneMark (start at 36612). /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -5.054 /note=Gap/overlap: 2 bp gap with previous gene /note=Phamerator: Pham 85380 Date 10/31/2023. It is conserved; found in Lasagna (AK) and Herb (AK). /note=Starterator: Start site 12 in Starterator was manually annotated in 46/78 non-draft genes in this pham. Start 12 is not in PinkFriday. This evidence disagrees with the site predicted by Glimmer (36612) and GeneMark (36612). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36612. /note=Function call: NKF. The top three phagesdb BLAST hits have unknown function (E-value <4e-98), and the top 2 NCBI BLAST hits have hypothetical function (89% and 83% coverage, 89% and 73% identity, and E-value <2e-121). HHpred and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Garza, Dom /note=Secondary Annotator QC: Overall, I think your functional call looks good, but I believe that you still need to check 2-3 `good` hits for PhagesDB Blast. These pieces of evidence are important for supporting this being a real gene. Double check the Annotation Manual to make sure this is correct. CDS 37316 - 37528 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="PinkFriday_49" /note=Original Glimmer call @bp 37316 has strength 9.57; Genemark calls start at 37316 /note=SSC: 37316-37528 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp48 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.86231E-42 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.484, -3.9971770988089914, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp48 [Arthrobacter phage Pumancara] ],,YP_009602914,100.0,1.86231E-42 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Giusti, Alessia /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 37316. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: FS: -3.997; Z-score: 2.484; Neither score is the best in PECAAN, but those with better scores create gaps that are too large that are neither conserved in other phages nor cover all of the necessary coding potential. /note=Gap/overlap: Gap: -1. This implies that it could be part of an operon. /note=Phamerator: pham: 1191. Date 11/1/2023. It is conserved; Found in CristinaYang, Aledel, and Daiboju, all of which are in cluster AK. /note=Starterator: Start site 8 in Starterator corresponds to 37316 in Pink Friday. It was manually annotated as the correct start site at the highest frequency (15 times). 
This start site was not the most conserved start site (7), but it does agree with the site predicted by both Glimmer and GeneMark. Daiboju and Aledel, both in cluster AK, also call start site 8. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 37316. /note=Function call: All PhagesDB BLAST hits came back as function unknown and all NCBI BLAST hits came back as “hypothetical protein.” A CDD hit came back that matched to a hydrolase, but its E-value was borderline (1.18e-03) and no other software pulled up strong hits to a hydrolase. Additionally, all HHpred hits returned with E-values >17 and probabilities <80. As such, the function of this protein cannot be determined at this time. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Joo, Hannah /note=Secondary Annotator QC: I agree with this annotation for the location call and function call. All of the evidence categories have been considered. CDS 37525 - 38451 /gene="50" /product="gp50" /function="dUTPase" /locus tag="PinkFriday_50" /note=Original Glimmer call @bp 37525 has strength 13.03; Genemark calls start at 37525 /note=SSC: 37525-38451 CP: yes SCS: both ST: SS BLAST-Start: [dUTPase [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.549, -4.4158953075094765, no F: dUTPase SIF-BLAST: ,,[dUTPase [Arthrobacter phage Pumancara] ],,YP_009602915,100.0,0.0 SIF-HHPRED: DUTPase; jelly-roll, hydrolase; HET: DUP; 2.1A {Staphylococcus phage 11} SCOP: b.85.4.0,,,4GV8_C,47.7273,99.9 SIF-Syn: dUTPase. Upstream gene is NKF, downstream gene is NKF, just like in phage Pumancara. /note=Primary Annotator Name: Joo, Hannah /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the start site at 37525 bp.
/note=Coding Potential: The gene has high coding potential, and the chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.416. This is the final score for the best start site. /note=Gap/overlap: Overlap: 4 bp. The overlap is reasonable and indicates that the gene is most likely part of an operon. /note=Phamerator: Pham 1154. Date 11/01/2023. It is found in Eunoia (AK) and Aledel (AK). /note=Starterator: Start site 10 was manually annotated in 77/81 non-draft phages in this pham. Start 10 is 37525 in PinkFriday. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the start site is most likely at 37525. /note=Function call: dUTPase. The top two phagesdb BLAST hits have dUTPase listed as the gene function and have e-values of 1e-144 and 1e-133. 5 out of 5 of the top NCBI BLAST hits also have dUTPase listed and have small e-values ranging from 6.80e-29 to 3.26e-20 (41.5584% coverage, 37.2093% identity, e-value 6.80e-20). HHpred has a strong hit for dUTPase with 99.92% probability, 47.7273% coverage, and an e-value of 7.4e-23. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Hernandez, Sarah /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. Synteny box needs to be filled out, just include info from the Pham maps.
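The Gap/overlap values in these records can be cross-checked against the CDS coordinates. The following is a minimal sketch, assuming the convention that appears to hold throughout this file (gap = downstream start minus upstream end minus 1, with negative values meaning an overlap); the function name `gap_bp` is hypothetical and is not part of PECAAN or any annotation tool.

```python
def gap_bp(upstream_end: int, downstream_start: int) -> int:
    """Bases between two adjacent features given as 1-based inclusive coordinates.

    Assumed convention: a positive result is a gap, a negative result is an
    overlap of that many bases.
    """
    return downstream_start - upstream_end - 1


# Coordinates copied from the CDS lines in this file.
gap_47 = gap_bp(35407, 35476)  # gene 46 ends at 35407, gene 47 starts at 35476
gap_50 = gap_bp(37528, 37525)  # gene 49 ends at 37528, gene 50 starts at 37525

print(gap_47, gap_50)
```

For a gene on the complement strand, the same arithmetic applies to the right-hand coordinate of the gene and the left-hand coordinate of the next feature, for example `gap_bp(38774, 38905)` for gene 51.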
CDS complement (38502 - 38774) /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="PinkFriday_51" /note=Original Glimmer call @bp 38774 has strength 14.15; Genemark calls start at 38774 /note=SSC: 38774-38502 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp50 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.15741E-58 GAP: 130 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.055, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp50 [Arthrobacter phage Pumancara] ],,YP_009602916,100.0,1.15741E-58 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Garcia, Isabella /note=Auto-annotation start source: Glimmer and GeneMark, both with call at 38774 /note=Coding Potential: All of the coding potential is contained within the ORF with the chosen start site. /note=SD (Final) Score: -2.443 /note=Gap/overlap: 130bp–There is not any coding potential found within the gap, nor do other phages in this pham have genes in this gap. /note=Phamerator: Belongs to pham 64663 (11/17/23). Both Albanese and Aledel, members of the AK cluster as well, belong to this same pham. There is no listed function for these genes. /note=Starterator: Start site 8 is conserved in 54 of the 72 non-draft genomes in the pham, which corresponds to the bp coordinate of 38774, our chosen start site. /note=Location call: The gene is a real gene with a likely start site of 38774, based on the evidence above. /note=Function call: This gene has no known function. There were not any CDD hits and HHpred only returned insignificant results that did not provide evidence for any function. /note=Transmembrane domains: No transmembrane domains were found, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Tubeileh, Shareef /note=Secondary Annotator QC: I agree with the primary annotator about this gene having an unknown function. 
Hits from BlastP and HHPred were insignificant, which supports this function call. I think you may need to select the "All GM Coding Capacity" if it includes all coding potential or not. CDS complement (38905 - 39138) /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="PinkFriday_52" /note=Original Glimmer call @bp 39138 has strength 12.06; Genemark calls start at 39180 /note=SSC: 39138-38905 CP: yes SCS: both-gl ST: SS BLAST-Start: [HTH DNA binding domain protein [Arthrobacter phage GreenHearts] ],,NCBI, q1:s1 100.0% 2.99547E-47 GAP: 125 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.298, -1.9310779259753799, yes F: hypothetical protein SIF-BLAST: ,,[HTH DNA binding domain protein [Arthrobacter phage GreenHearts] ],,YP_010049975,97.4026,2.99547E-47 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hernandez, Sarah /note=Auto-annotation: Both call different starts, Glimmer: 39138, GeneMark: 39180, TTG /note=Coding Potential: This gene has a reasonable coding potential predicted within the reverse ORF, and the chosen start site does cover all of it /note=SD (Final) Score: -1.931, this was the best final score on PECAAN /note=Gap/overlap: 125, overall a large gap however the gap is conserved in other phages (Dino, Fluke) and there is no coding potential present in the gap /note=Phamerator: 121454, date 11/1/23, it is conserved, found in Beethoven (AK) and Albanese (AK) /note=Starterator: Start site 7 was manually annotated in 57 out of 94 genes in this pham. Start number 7 is 39138 in PinkFriday, this matches the start site predicted by Glimmer /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 39138 /note=Function call: NKF /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note= /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Secondary Annotator QC: Annotations look accurate and I agree with the start site call. 
For function call why was it called as NKF? I saw good hits on NCBI blast for helix-turn-helix DNA binding protein which is an approved function. Also saw on CDD with 6.54e-03 /note=Secondary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: start sites disagree 39180 start site Genemark with ATG codon, and 39138 Glimmer with TTG codon. Likely Glimmer site is correct /note=Coding Potential: Coding potential found in Self-GeneMark and Host-GeneMark and is only in the reverse gene with conserved gap. /note=SD (Final) Score: -1.931, largest final score /note=Gap/overlap: 125, large gap is conserved in other phages /note=Phamerator: The pham number as of Nov 3, 2023 is 121454. The gene is conserved in 95 other phages, all in the AK cluster. Gene conserved in Beethoven and Albanese /note=Starterator: There are 89 non-draft members of this Pham (95 total phages). 57/89 non-draft members call start site 7, which correlates to a start site of 39138. Start 7 seems most likely. /note=Location call: Evidence points to this being a real gene with the most likely start site being 39138 which was called by Glimmer and is most annotated in Starterator /note=Function call: Helix-turn-helix DNA binding protein. PhagesDB and NCBI BLAST hits with e-values ranging from 5e-11 to 3e-47. No hits on HHpred with e-values less than 1. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
CDS complement (39264 - 39842) /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="PinkFriday_53" /note=Original Glimmer call @bp 39842 has strength 9.98; Genemark calls start at 39842 /note=SSC: 39842-39264 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEDEL_52 [Arthrobacter phage Aledel] ],,NCBI, q1:s1 98.4375% 3.75704E-89 GAP: 161 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.298, -1.9310779259753799, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEDEL_52 [Arthrobacter phage Aledel] ],,AZF98676,89.0625,3.75704E-89 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bidzan, Hanna /note=Auto-annotation: Glimmer and genemark both have a start site of 39842 however the start codons vary between ATG and GTG. /note=Coding Potential: High activity and no overlaps, there appears to be good coding potential. /note=SD (Final) Score: The RBS final score is -1.931 and the z score is 3.298. This is the best rbs value compared to the other genes and the best z value compared to the others /note=Gap/overlap: There is a 161bp gap which is rather large, but not too large. It is also the longest ORF. There are no gaps upstream or downstream that cannot be filled by changing the start site of the gene. /note=Location call: Yes, I would keep the original start site given the coding potential, its z score and rbs score being the best for that start site and the gene being the longest ORF. /note=Phamerator: As of 11/01/23, this gene is found within pham 85281. It is part of cluster AK along with 78 other phages such as Beethoven 64, Applcider 64, and Korra 56 /note=Starterator: Highly probably start site at start 5 position 39842. 
There are 71 of 90 non draft genes also with the same start site within cluster AK /note=Location call: This is a real gene and has a start site at basepair 39842 /note=Function call: No known function /note=Transmembrane domains: no TMDs by DeepTMHMM /note=Secondary Annotator Name: Dweik, Qaiss /note=Secondary Annotator QC: I agree with the primary annotator`s statements regarding the validity of the gene and the start site position at bp 39842. The primary annotations are brief but well thought-out and all signs point to a start site at bp 39842. I also agree with not assigning any function to this gene, as no hits with defined functions were produced on PhagesDB or NCBI and no significant hits were found on HHpred/CDD. Also, no TMDs were found, meaning this gene`s function cannot be designated as a membrane protein. CDS complement (40004 - 40552) /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="PinkFriday_54" /note=Original Glimmer call @bp 40552 has strength 13.02; Genemark calls start at 40552 /note=SSC: 40552-40004 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ALEDEL_53 [Arthrobacter phage Aledel] ],,NCBI, q1:s1 100.0% 7.61215E-102 GAP: 104 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.055, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEDEL_53 [Arthrobacter phage Aledel] ],,AZF98677,86.6667,7.61215E-102 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sass, Arielle /note=Auto-annotation: Glimmer and Genemark are in agreement that the start site is at 40552. The start codon for this ORF is ATG. /note=Coding Potential: Within the putative ORF the gene has significant coding potential predicted by GeneMark. The chosen start site covers the extent of the coding potential. /note=SD (Final) Score: The SD score is-2.443, which is the most positive option of the six possible start sites presented in PECAAN. 
/note=Gap/overlap: The upstream gap of 104 base pairs is large however Host-trained and self-trained GeneMark do not show any upstream coding potential within the 104 bp gap, and the gap is conserved in other related phages. The start site chosen results in the second longest ORF. The resulting length of the gene, 549 base pairs, is reasonable. /note=Phamerator: As of 11/1/2023, this gene is in pham 1273. This Pham has 76 non-draft members, all of which are also in cluster AK such as Albanese (AK) and Aledel (AK). /note=Starterator: Start site 5 was the most annotated start and was called in 66 of 73 non-draft genes in this Pham. The start site is located at base pair 40552 in PinkFriday. /note=Location call: Taken together, the gathered evidence suggests that the gene is real and that the start site for this gene is at base pair 40552. /note=Function call: There is no known function for this gene, based on hits from PhagesDB and NCBI BLASTp. The top 4 PhagesDB BLASTp hits had low e-values of 1e-79 and top 6 NCBI BLASTp hits had e-values ≤ 3e-89 and query coverage of 100%. HHPRED had no significant hits. /note=Transmembrane domains: There are no TMDs predicted by DeepTMHMM. /note=Secondary Annotator Name: Wang, Jordan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator regarding function and potential presence in the membrane. 
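The gene length quoted in a note can be checked directly from the feature coordinates. This is a small sketch, assuming 1-based inclusive coordinates as printed on the CDS lines; `feature_length_bp` is a hypothetical helper name, not a function from any annotation package.

```python
def feature_length_bp(left: int, right: int) -> int:
    """Length in bp of a feature given 1-based inclusive coordinates."""
    return right - left + 1


# Gene 54 is annotated as complement (40004 - 40552).
length_54 = feature_length_bp(40004, 40552)

# A complete ORF should be a whole number of codons (stop codon included).
codons_54, remainder_54 = divmod(length_54, 3)

print(length_54, codons_54, remainder_54)
```

The strand does not matter for the length; only the left and right coordinates are used.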
CDS complement (40657 - 41019) /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="PinkFriday_55" /note=Original Glimmer call @bp 40875 has strength 8.45; Genemark calls start at 41019 /note=SSC: 41019-40657 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein SEA_ALEDEL_54 [Arthrobacter phage Aledel] ],,NCBI, q1:s1 100.0% 7.13079E-83 GAP: 243 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.298, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ALEDEL_54 [Arthrobacter phage Aledel] ],,AZF98678,100.0,7.13079E-83 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Diaz, Sebastian /note=Auto-annotation: Both Glimmer and GeneMark are utilized in this auto-annotation. However Glimmer calls the start site at 40875 (with codon GTG) while GeneMark calls the site at 41019 (with codon ATG). /note=Coding Potential: Within both the coding potential activity graphs provided by Self-Trained and Host-Trained GeneMark. There is enough activity on the graphs within the putative ORF to suggest a great amount of coding potential, the start site (in the reverse direction) does cover all the coding potential. /note=SD (Final) Score: The SD score for Glimmer start site is -2.112, which is the second best value from all the start site candidates. This score is still an acceptable value as it is a low unique number and can be considered suggestive of the presence of a credible ribosome binding site. Interestingly, GeneMark`s call for the start site has an SD score of -1.993 (the best score). /note=Gap/overlap: The gap value for Glimmer is a large “387” which is reasonable. There are other start sites that create larger ORFs, such as GeneMark’s call which has a gap value of “243” and subsequently has a longer ORF (363 to Glimmer’s 219). 
As a result, because GeneMark`s call gives a longer ORF, a better (less negative) SD score, and decent coding potential throughout its putative ORF (abnormal coding potential activity), we must consider the start site at 41019 instead of Glimmer`s 40875. /note=Phamerator: As of October 25th, 2023 this gene is found within Pham: 89728. The pham is extremely small, only having four other members. As a result, the gene is only conserved with one other phage: KingBob’s gene 54. There was no function called for this gene. These results are not strong enough to carry much weight. /note=Starterator: Due to the small pham size, the start site for the gene is only conserved with one other phage. The conserved start site is site 6, which corresponds to coordinate 40875. 2/4 call site #6. Although the start site coincides with Glimmer`s call, this data can be dismissed due to the extremely small pham group. /note=Location call: I believe this gene has high coding potential based on the coding potential graphs from GeneMark, although its lack of conservation is still a little concerning. This gene`s start site can be called at 41019. /note=Function call: Based on all the evidence gathered, this gene doesn’t seem to be highly conserved. However, two other phages within its pham call the gene, and an NCBI BLASTp hit with 100% coverage and 98.6% identity labels this gene as a hypothetical protein. In addition, there are no CDD hits and no phagesDB hits. Overall, no function can be called, but we can state with confidence that this gene is real. /note=Transmembrane Domain: This protein is not a membrane protein because it has no transmembrane domains called by TMHMM. /note=Secondary Annotator Name: Aguirre, Austin /note=Secondary Annotator QC: Missing module 4, please update. Also, I am looking at the coding potential charts, and coding potential does not seem to be unreasonable for 41019, as there appears to be potential upstream of 40875. Please elaborate in notes.
Also, drop down menus need to be filled out.11/17/23 update: excellent notes and I agree that this is a protein with NKF. I think you should state clear evidence of NKF but stating that there were no CDD hits, and NKF in phagesDB. Otherwise great work! CDS 41263 - 41997 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="PinkFriday_56" /note=Original Glimmer call @bp 41263 has strength 8.9; Genemark calls start at 41263 /note=SSC: 41263-41997 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein FDH62_gp56 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 3.22829E-177 GAP: 243 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.208, -4.577061420233807, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp56 [Arthrobacter phage Pumancara] ],,YP_009602922,99.5902,3.22829E-177 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dweik, Qaiss /note=Auto-annotation: Both Glimmer and GeneMark call and agree on the start site at bp 41263 with a start codon of ATG (Met). /note=Coding Potential: Gene has great coding potential within the putative ORF, as indicated by Self and Host GeneMark maps with high levels of typical coding potential. The start site does cover all of the coding potential, as it is placed to the left of the beginning of the high peaks of the coding potential and the stop site is placed to the right on the maps. However, there is some slight concern in the fact that there is some coding potential in the reverse orientation for the described region. /note=SD (Final) Score: The original start site does not have the best SD score (least negative) at -4.577, as the start site at bp 41356 has a less negative RBS score of -2.953. This gene is not part of an operon due to the large gap preceding all proposed start sites. /note=Gap/overlap: The gap between this gene and the preceding one (387 bp) is not reasonable and may be indicative of a missing gene that has yet to be annotated. 
However, this proposed start site does create the gene with the longest ORF out of all the alternative start site candidates. The proposed length of the gene (735 bp) is acceptable as well. /note=Phamerator: As of 11/01/2023, this gene is found in pham 103441. All of the members present in this pham (103441) are part of the AK cluster. Some of these phages include Albanese, BigMack, GreenHearts, Huckleberry, PitaDog, and Wawa. There was no function called for this gene. /note=Starterator: There is a reasonable start site for which the genes in this pham (103441) are conserved at start site 8 (which is included but not called by PinkFriday_57). There are 66 non-draft members in this pham and an additional 4 draft members. Of the 66 non-draft members, 20 of them call start site 8. PinkFriday_57 calls start site 13, which corresponds to bp 41263. The Starterator program is mostly informative, as very few members in this pham are drafts and it provides the number of manual annotations for which the other non-draft genes have called start site 8. However, the program suggests that the auto-annotated start site at bp 41263 is not consistent with the manual annotations from other members of this pham (which most often call start site 8). /note=Location call: The gathered evidence suggests that this is a real gene with the original start site @ bp 41263 being correct due to its complete encompassing of the coding potential, its creation of the longest possible ORF of all of the start site candidates, and the fact that it provides the smallest gap between this gene and the preceding one out of all options. However, this start site is not consistent with the Starterator program, which recommends an earlier one. This may be indicative of problems with the auto-annotations of the preceding upstream genes (which are in reverse orientation) and could result in a more upstream start site for this gene. 
/note=Function call: No program has returned any informative results as to the function of this gene and no TMDs were found. /note=Transmembrane domains: No transmembrane domains were predicted, signifying that this gene does not code for a membrane protein. /note=Secondary Annotator Name: Aguirre, Austin /note=Secondary Annotator QC: Please fill out coding potential and Pham drop down sheets. I agree with the called start site, but I am concerned about the coding potential found in the reverse direction on the gene, and I think this should be discussed more in the annotation notes. 11/17/23 notes: I agree with the NKF, but I do think you could state the evidence supporting the claim that this is a real gene. I did see that this was an unknown function protein in PhagesDB and a hypothetical protein in NCBI blast. CDS 41994 - 42314 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="PinkFriday_57" /note=Original Glimmer call @bp 41994 has strength 7.9; Genemark calls start at 41994 /note=SSC: 41994-42314 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein FDH62_gp57 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.45752E-70 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.288, -2.606079664606164, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp57 [Arthrobacter phage Pumancara] ],,YP_009602923,99.0566,1.45752E-70 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kathiravan, Anoushka /note=Auto-annotation: The start site is predicted to be 41994 in both Glimmer and GeneMark /note=Coding Potential: There is good coding potential as determined by Glimmer and GeneMark evidenced by the gene only having one ORF in the forward direction. /note=SD (Final) Score: -2.606 which is not the best score but is still acceptable. /note=Gap/overlap: -4 which indicates that the gene part of an operon. /note=Phamerator: The pham number is 85892 on 10/28/23. It is conserved in BigMack and Vulture. 
/note=Starterator: The start number is 6 and is found in 42 of the 42 non-draft genes. In PinkFriday, start 6 is 41994, which agrees with Glimmer and GeneMark. /note=Location call: This is a real gene and the start site is 41994. /note=Function call: NKF. The top-ranked BLAST and NCBI BLAST hits all have no known function, and in addition there were no CDD hits and the HHpred hits had very high E-values (17). /note=Transmembrane domains: None. This makes sense because according to HHpred and CDD this is a tail terminator protein and based on that call there would be no transmembrane region. /note=Secondary Annotator Name: Soan, Jessica Hyunsil /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 42314 - 42433 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="PinkFriday_58" /note=Original Glimmer call @bp 42314 has strength 4.06 /note=SSC: 42314-42433 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein FDH62_gp58 [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 9.14931E-20 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.493, -5.231756043413167, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDH62_gp58 [Arthrobacter phage Pumancara] ],,YP_009602924,100.0,9.14931E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Aves, Alexandra /note=Auto-annotation: Glimmer lists the start as 42314, however GeneMark does not have a start site listed. The start codon called is ATG. /note=Coding Potential: Neither Host-Trained nor Self-Trained GeneMark demonstrated very good coding potential; both had peaks in the second forward frame that were less than 0.5, and peaks in the first reverse frame that were less than 0.25. The start site does seem to cover all of the available coding potential. /note=SD (Final) Score: The final score is the second least negative listed, at -5.232.
/note=Gap/overlap: The -1 overlap is a good indicator of an operon. This overlap does not create the LORF; however, it is the most reasonable overlap listed. By starting at the Glimmer start, the length of the gene is 120 bp, which may call into question whether the gene is real. /note=Phamerator: The pham number as of 10/29/23 is 85563. The gene is conserved in phages Aledel, Carpal and BigMack, all in the same cluster as PinkFriday (AK). The function call for this gene is unknown. /note=Starterator: There are 61 members of this pham, only 4 of which are drafts. The start number called the most often in the published annotations is 13; however, the auto-annotated start for this gene in PinkFriday is start 14, at position 42314, which is consistent with Glimmer's call. /note=Location call: Based on the evidence above, this gene is real and has a start site at 14, position 42314 bp. Although it is not the most annotated start site, this agrees with Glimmer and Starterator. 42314 is currently the most probable because it demonstrates a -1 overlap, the second best final score, and a z-score above 2. /note=Function call: The function of this gene is still unknown; however, NCBI BLASTp and phagesDB BLASTp both demonstrate good hits with low e-values (e^-18) and high identity percentages (100%), especially with that of Arthrobacter phage Pumancara. CDD (no data) and HHpred (showed eukaryotic proteins) did not supply supporting evidence for a specific function. 
/note=Transmembrane domains: N/A /note=Secondary Annotator Name: Bidzan, Hanna /note=Secondary Annotator QC: After reviewing Blastp, CDD, and HHPred hits, I agree with the primary annotator's function call of no known function CDS 42430 - 42705 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="PinkFriday_59" /note=Original Glimmer call @bp 42430 has strength 11.69; Genemark calls start at 42430 /note=SSC: 42430-42705 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein KDJ06_gp59 [Arthrobacter phage Sergei] ],,NCBI, q3:s7 94.5055% 3.72622E-53 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.761, -3.1417032362307356, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein KDJ06_gp59 [Arthrobacter phage Sergei] ],,YP_010050473,87.234,3.72622E-53 SIF-HHPRED: SIF-Syn: Gene 59 in CristinaYang and Sergei is NKF and lines up with gene 58 in PinkFriday. Gene 60 of PinkFriday lines up with gene 59 in CristinaYang and gene 58 in Sergei. Gene 60 in Sergei and gene 61 in CristinaYang are HNH endonucleases like gene 62 in PinkFriday. /note=Primary Annotator Name: Ebrahimi, Shaghaiegh Mariam /note=Auto-annotation: 42430 start site: GeneMark and Glimmer agree /note=Coding Potential: Coding potential is found in Self-GeneMark and Host-GeneMark and is only in the forward direction with no large gaps /note=SD (Final) Score: -3.142 is the largest among the options /note=Gap/overlap: The overlap of -4 is reasonable and is indicative of an operon /note=Phamerator: The pham number as of Nov 1, 2023 is 86103. The gene is conserved in 34 other phages, all in the AK cluster. Gene conserved in CristinaYang and Herb /note=Starterator: There are 31 non-draft members of this Pham (34 total phages). 7/31 non-draft members call start site 10, which correlates to a start site of 42430. 
Most called was start site 11; 14/31 non-draft genomes /note=Location call: Evidence points to this being a real gene with the most likely start site being 42430, which was called by Glimmer and GeneMark. Starterator had most annotated at start 11, but start 10 was chosen /note=Function call: No known function. No reasonable hits with known proteins on PhagesDB, HHpred, or NCBI BLAST. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs and so this is not a transmembrane protein. /note= /note=Secondary Annotator Name: Vazquez, Eunice /note=Secondary Annotator QC: /note=Auto-annotation: The gene is called by both Glimmer and GeneMark at the start site 42430. The start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential. The chosen start site covers all this coding potential. /note=SD (Final) Score: The final score is the best compared to all other start sites. The z score is also good. /note=Gap/overlap: Very small overlap. /note=Need to put NKF and fill synteny box and also check any boxes from phagesdb blast, hhpred, ncbi if applicable. CDS 42702 - 42794 /gene="60" /product="gp60" /function="membrane protein" /locus tag="PinkFriday_60" /note=Genemark calls start at 42702 /note=SSC: 42702-42794 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein KDJ07_gp61 [Arthrobacter phage Urla] ],,NCBI, q1:s1 100.0% 2.78295E-10 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.625, -3.4892599424135016, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein KDJ07_gp61 [Arthrobacter phage Urla] ],,YP_010050535,90.625,2.78295E-10 SIF-HHPRED: SIF-Syn: Before this protein is an NKF and after this protein is an HNH endonuclease, which is in line with the synteny for Dino. /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: There is no Glimmer called start site. GeneMark calls 42702 with a start codon of ATG, which is a common start codon. 
/note=Coding Potential: All reasonable coding potential is included within the start-site for both host-trained and self-trained GeneMark. /note=SD (Final) Score: The start site has a final score of -3.489 and a z-score of 2.625. These are both favorable because -3.489 is the largest final score and the z-score is above 2. /note=Gap/overlap: The gap of -4 bp is the smallest gap. A gap of -4 is also most preferred, even though it doesn’t offer the LORF. The gap is conserved in other AK phages including Dino and Scuttle. /note=Phamerator: pham: 3342. Date 11/12/2023. It is well-conserved in phages from cluster AK. Two examples of AK phages where this pham is conserved are Dino (AK) and Scuttle (AK). /note=Starterator: Start site 11 is the most annotated start site, called in 19 of 22 phages. It is called 100% of the time when it is present. Start site 11 corresponds with start site 42702 in PinkFriday, which agrees with the start site predicted by GeneMark /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 42702. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (Urla’s e-value is 4e-12 and Glenn 5e-6) for unknown function proteins. NCBI pBLAST confirms these unknown function protein hits for Urla (100% coverage, 90.6% alignment, and e-value of 2.78e-10). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts one TMD. So, it is a membrane protein. /note=Secondary Annotator Name: Jacobs, Sarisha /note=Secondary Annotator QC: /note=Auto-annotation: GeneMark calls start at 42702 (ATG). This is not the longest ORF. There is no Glimmer Start. /note=Coding Potential: The gene has significant coding potential on the forward strand on both self and host reports. 
There is noncoding potential in the reverse direction. It contains all significant coding potential, but there is a bit of coding potential (significantly less than 0.5) that lies outside of the start. Please see coding potential map. /note=SD (Final) Score: The best final score is a -3.489. The Z score is also above 2.0 (2.625). /note=Gap/overlap: None of the suggested ORFs have any gaps. All have varying overlaps. The smallest overlap is 4 bp. /note=Phamerator: 11/3/2023 Pham: 3342 has 26 members, all from cluster AK. It is conserved in Dino and Glenn. /note=Starterator: The most called start is start 11 and it was called in 19 of the 22 published phages. PinkFriday does have this start @42702. This is supported by GeneMark. /note=Location call: Based on the evidence above, this is a real gene, and the start site is most likely @42702 /note=Function call: /note=Transmembrane domains: /note= /note=Annotator's notes: While I agree with the assigned function on PECAAN's dropdown menu, this annotator needs to fill out the notes section. I do believe that this is a membrane protein because it does have a hit on DeepTMHMM for one TMR. Phagesdb, NCBI, and CDD either list it as a hypothetical protein or do not have any hits for this protein. Therefore, there is an insufficient amount of evidence to give it a function other than membrane protein. 
CDS 42959 - 43216 /gene="61" /product="gp61" /function="HNH endonuclease" /locus tag="PinkFriday_61" /note=Original Glimmer call @bp 42959 has strength 7.88; Genemark calls start at 42959 /note=SSC: 42959-43216 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Pumancara] ],,NCBI, q1:s1 100.0% 1.06047E-55 GAP: 164 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.347, -3.9967659997675007, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Pumancara] ],,YP_009602927,100.0,1.06047E-55 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,74.1176,98.0 SIF-Syn: HNH endonuclease, upstream is pham 3342 on 1/31/2024, just like in phage Dino. /note=Primary Annotator Name: Yao, Alice /note=Auto-annotation: Both Glimmer and GeneMark call the gene at 42959. Good Z score (2.347 > 2). Final Score is -3.997, which is the least negative among the Final Scores and therefore is the best Final Score. /note=Coding Potential: Coding potential in this ORF is in both the forward and reverse strand. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.997 (least negative among the gene candidates and therefore the best Final Score) /note=Gap: 164 (Huge gap. However, there is no coding potential in this gap so no need to add a gene) /note=Phamerator: Pham 121194. Date: 11/1/2023. It’s conserved. Found in Albanese and Aledel. /note=Starterator: Start Site 88 in Starterator was manually annotated in 97/457 non-draft genes in this pham. The stop site agrees with Glimmer and GeneMark. /note=Location call: Based on everything that was stated above, this is probably a real gene and it most likely has the start site at 42959 /note=Function call: HNH endonuclease. The top several phagesdb BLAST hits have the function of HNH endonuclease (E-value <10^-3), and all the NCBI BLAST hits also have the function of HNH endonuclease. 
(100% coverage, 65%+ identity, and E-value <10^-7). HHpred had a hit for HNH endonuclease with 98% probability, 71% coverage, and E-value of 0.000021. This gene also satisfies the requirement on the SEA-PHAGES list of approved functions. The protein sequence has the H-N-H motif across the 30 aa space. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Dom, Garza /note=Secondary Annotator QC: Overall, I agree with your functional call. However, I think you should add a piece of evidence to your functional call telling your annotator how this protein satisfies the following: H-N-H across at least a 30 aa space. This is an important requirement on the SEA-PHAGES list of approved functions. Otherwise, great job!