Borosins: The biosynthesis of ribosomal alpha-N-methylated peptides from fungi and bacteria

A Dissertation
SUBMITTED TO THE FACULTY OF UNIVERSITY OF MINNESOTA
BY
Fredarla Seraphina Miller

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Adviser, Dr. Michael F. Freeman

May 2020

© Fredarla Seraphina Miller 2020

Acknowledgements

The person at the top of the list of people I want to thank is my advisor, Dr. Mike Freeman. His first week on campus corresponded to my first week as his rotation student, and despite the lack of usable lab space or actual up-and-running projects, he took me on. And I am so grateful that he did! This was a unique opportunity to help build a lab from the ground up (ordering supplies, unpacking them, learning the lab would be remodeled, re-packing the supplies, moving into cubicles down the hall, moving back into the lab, unpacking again, etc.). Mike, you’re a great scientist who takes your job as mentor and teacher extremely seriously, and I am a better scientist and a better person for it. Your high expectations for integrity, work ethic, experimental design, and approach to the field in general all play crucial roles in this. I am so appreciative of the opportunity to have learned from you and for the perpetual support that you provide. I believe that the success of my Ph.D. (and the fact that it was, overall, a positive experience) is largely because of you and the lab culture you cultivate: a culture that fosters teamwork and collaboration, mutual encouragement, commiseration, teasing, incomprehensible 1980s references, memes, googly eye stickers…you know, all the normal stuff. Mike, thanks for being a wonderful advisor, mentor, and friend. And thank you for giving me the support that I needed to succeed.

Of course, I also need to thank the rest of the Freeman lab. Aman, I still think you made a mistake in choosing to share a lab bay with me.
You’ve put up with my endless stress-induced chatter and solved countless of my technical problems by simply scooting your chair closer to see something over my shoulder. Obviously, I am very glad that you sat next to me because in addition to all that, you’re a great sounding board for a whole variety of topics, so thanks for everything! Aileen, you give the lab that little bit of secret sass that every lab desperately needs. I’m also glad we eventually got over that “we should be friends with each other because we are both friends with Aman” stage and were able to become actual friends (at least I think we made it there). I mean, you’re one of the few people that understands the labor of love it takes to care for Spike. What more is there to say besides that? And Kathryn! You and I have disturbingly similar brains. So far it seems like you’ve managed to squash the super-spaz to some degree, considering the extreme success you’ve had in the lab so far. Come on, really. Your timing and skillset really could not have been more perfect for this project. I am so happy that you decided to join the lab and that you were happy to work with me. Thank you so much for playing the double role of friend and lab partner!

Kelly, the honorary Freeman lab member, you’re also a huge part of why I have enjoyed/survived grad school. You decided we would be friends way back at Itasca and luckily I was amenable to the plan. You’ve also increased my saltiness substantially, but I see that as a positive thing. I am so grateful for our friendship which is pretty much equal parts complaining, discussing nerd things/our animals/science, and (shocking!) actually supporting each other. Thanks for being my BFF!

Matilda, also known as Dr. Rev. Matilda S. Newton, also known as Her Majestic Squiggliness (did you think I wouldn’t put all that in here?): I am so happy that I invited myself into your life.
You were my supervisor during one of the most difficult times in my life and lucky for me, you are not only an amazing mentor and scientist; you are also a thoughtful and kind and extremely punny friend. Being able to troubleshoot experiments and life problems all in the same conversation is pretty cool. Thank you for always supporting my scientific career and, like, officiating my wedding or whatever. No big deal, nothing personal here.

Many other scientists have supported me on this long journey! I want to take a moment to thank Danielle (now Dr. Drabeck)! Thank you for being the first person to give me an opportunity to do science. I am so grateful to have a role model like you as the foundation for my scientific career. I also want to thank Komal and Bridget of the Bond/Gralnick labs. You two helped me figure out how to try to be a microbiologist and were always a reliable source of comfort and venting when basically nothing worked ever (see Chapter 6 of this thesis).

Of course my family has played a huge role in my success and they deserve all the credit I could possibly give them (and more). My mom and Terry (stepdad extraordinaire) let me live with them in the first year of grad school when my whole life fell apart and have done nothing but encourage and support me through absolutely everything. My dad and Heidi (the loveliest of stepmoms) have also provided pivotal support for me, not just during grad school but long before. I am so grateful to have four such amazing parents on my side: rooting for me, giving me advice, and telling me to stop being such a moron when I occasionally seem to lack basic life skills. I also want to thank my brother, Max: we started grad school at the same time but you always* hit the milestones just before me. Thank you for paving the way and talking me through things so many times! And to Art, the best baby brother I could ask for. Thanks for the extreme unconditional love and support!
I would be remiss if I didn’t also give proper thanks to my newly-acquired family. Derreen, Beth, and Shane: I couldn’t have asked for better in-laws. Thank you so much for welcoming me into your family and always being happy to listen to me babble (or complain) about school and research.

Lastly, thank you to my husband, Nathan. On our first date, I warned you that grad school was going to be occasionally tough to witness as a bystander. I believe I also padded the warning by describing my mom’s delicious pancakes, but for whatever reason (pancakes or otherwise), you stuck around through all of it. I cannot begin to describe the positive impact you’ve had on my life. As you like to say, you are “the rock that drags me down.” Just kidding! …well, sort of. You do keep me grounded; you keep me always looking on the bright side and you are my safe place. Thank you.

*It has been brought to my attention that, actually, this is not true. But it sure seems that way.

Dedication

This thesis is dedicated to Spike. Spike has seen me through countless life stages: post-undergrad listlessness, the turmoil of most of my twenties, applying to graduate school, prelims, and beyond. We are both high-strung and jumpy so we work well together. You can usually find him snoozing in one of the many heated cat beds around the house or basking outside in the sunshine for hours at a time, allowing the breeze to blow through his glorious cheek fluff and luxurious tail feathers. This tiny and somewhat (somewhat) ridiculous dog has been a great source of comfort and levity over the years and I hope he knows how much he is loved. Good job, Spike!

Abstract

Natural products are bioactive small molecules synthesized by living organisms. They are often used in medicine and industry as pharmaceuticals, flavors, pigments, and more.
α-N-Methylation of the peptide backbone, as seen in the immunosuppressant natural product cyclosporin A, confers desirable pharmacokinetic characteristics such as resistance against proteolytic degradation, enhanced rigidity/target specificity, membrane permeability, and oral bioavailability. Until recently, it was thought that this backbone modification was only naturally accessible through non-ribosomal peptide synthesis. Our discovery of the borosin family of peptide natural products challenged that assumption by identifying nematocidal metabolites as α-N-methylated, ribosomally synthesized and posttranslationally modified peptides (RiPPs, a quickly growing class of natural products known for their modular and potentially engineerable biosynthesis). The first borosins we discovered, found primarily in basidiomycete fungi, are distinguished by a unique autocatalytic mechanism for incorporating α-N-methylations, where the natural product sequence is tethered in a single protein to an iteratively acting autocatalytic methyltransferase. In-depth bioinformatics analysis has since revealed an even more diverse subfamily of putative α-N-methylated peptides that have natural product precursors encoded in trans to the methyltransferase. These so-called “split borosins” are found primarily in bacteria. This thesis describes how borosins fit into the larger field of natural product and RiPP biosynthesis. It includes the discovery of additional domain architectures for the fused and split systems as well as a structural and biochemical analysis of a putative split borosin system in the bacterium Shewanella oneidensis MR-1. We have also begun to investigate the native role the split borosin peptide natural product may play in S. oneidensis MR-1 through the creation of genetic knockouts and in vivo phenotyping experiments.
vi Table of Contents List of Tables.………………………………………………………………………………………x List of Figures..……………………………………………………………………………………xi 1 Introduction ............................................................................................................... 1 1.1 Natural products: definition and context .............................................................. 1 1.1.1 Peptide natural products: a brief introduction ............................................... 4 1.2 Non-ribosomal peptide natural products .............................................................. 5 1.2.1 NRP biosynthesis .......................................................................................... 5 1.3 Ribosomally encoded peptide natural products ................................................... 9 1.3.1 RiPP biosynthesis........................................................................................ 12 1.3.2 The RiPP recognition element (RRE) ......................................................... 15 1.4 Genomics approach to discover new RiPP BGCs and families ......................... 16 1.5 Repertoire of PTMs in RiPPs ............................................................................. 17 1.6 PTMs for structure and stability in RiPPs .......................................................... 18 1.7 α-N-Methylation in peptide natural products ..................................................... 20 1.8 α-N-Methylation in RiPPs .................................................................................. 21 1.8.1 RiPPs in basidiomycete fungi ..................................................................... 22 1.9 The omphalotins: founding members of the borosin RiPP family ..................... 24 1.9.1 Biochemical and structural characterization of OphMA ............................ 27 1.10 Contents of this thesis ..................................................................................... 
30 2 Distinct autocatalytic α-N-methylating precursors expand the borosin RiPP family of peptide natural products ................................................................................ 32 2.1 Introduction ........................................................................................................ 33 2.2 Results and discussion ........................................................................................ 36 2.2.1 Identification of putative borosin pathways ................................................ 36 2.2.2 Validation of borosin precursors ................................................................. 40 2.2.3 Distinct borosin precursor structural types ................................................. 42 2.2.4 Linking borosin gene clusters to metabolites.............................................. 43 2.3 Conclusion .......................................................................................................... 46 2.4 Materials and methods ....................................................................................... 46 2.4.1 Materials ..................................................................................................... 46 2.4.2 Borosin identification and phylogenetic analysis ....................................... 47 vii 2.4.3 Cloning and gene synthesis ......................................................................... 48 2.4.4 Protein expression and purification ............................................................ 49 2.4.5 Proteolytic digestion ................................................................................... 50 2.4.6 Peptide mass spectrometric analysis (LC-MS/MS) .................................... 51 2.4.7 RNA expression of ledMA in L. edodes mycelium and fruiting body ........ 52 2.4.8 Genomic DNA isolation of G. fusipes ........................................................ 
52 2.4.9 Degenerate PCR amplification of putative gymnopeptide borosins ........... 53 2.4.10 Inverse PCR ................................................................................................ 54 2.4.11 G. fusipes fosmid library, PCR screening, and cloning of gymMA1 .......... 55 3 Preliminary findings of split borosins found in the bacteria Rhodospirillum centenum SW and Streptomyces sp. NRRL S-118 ........................................................ 56 3.1 Introduction ........................................................................................................ 56 3.2 Split borosin BGC found in R. centenum SW .................................................... 59 3.2.1 Biochemical analysis of the putative borosin methyltransferase and precursor from R. centenum SW ............................................................................................... 63 3.3 Split borosin BGC found in Streptomyces sp. NRRL S-118 ............................. 70 3.3.1 Biochemical analysis of the putative borosin methyltransferase and precursor from Streptomyces sp. NRRL-S118 .......................................................................... 72 3.4 Conclusion .......................................................................................................... 83 3.5 Materials and methods ....................................................................................... 84 3.5.1 DNA and protein sequences........................................................................ 84 3.5.2 Molecular cloning and creation of plasmid constructs ............................... 86 3.5.3 Heterologous protein expression and purification ...................................... 88 3.5.4 SUMO cleavage by bdSENP1 protease ...................................................... 89 3.5.5 In vitro multiple turnover experiment for MS analysis .............................. 
89 3.5.6 Mass spectrometric analysis ....................................................................... 90 4 Preliminary findings of a split borosin found in the bacterium Shewanella oneidensis MR-1 .............................................................................................................. 91 4.1 Introduction ........................................................................................................ 91 4.2 Heterologous methylation of SonA by SonM in vivo ........................................ 93 4.2.1 Multiple substrate turnover in vitro ............................................................ 97 4.3 Conclusion ........................................................................................................ 100 4.4 Materials and methods ..................................................................................... 101 4.4.1 DNA and protein sequences...................................................................... 101 4.4.2 Molecular cloning and creation of select plasmid constructs ................... 103 viii 4.4.3 Heterologous protein expression and purification .................................... 104 4.4.4 SUMO cleavage by bdSENP1 protease .................................................... 105 4.4.5 In vitro multiple turnover experiment for MS analysis ............................ 106 4.4.6 Mass spectrometric analysis ..................................................................... 106 5 Structural and kinetic analysis of the split borosin methyltransferase and precursor from S. oneidensis MR-1 ............................................................................. 108 5.1 Introduction ...................................................................................................... 108 5.2 Crystal structure of SonMA WT ...................................................................... 
110 5.2.1 Borosin binding domain (BBD) ................................................................ 114 5.2.2 SonMA and OphMA active site residues are conserved .......................... 115 5.3 Kinetic and structural characterization of the SonM active site....................... 118 5.4 Dramatic conformational changes occur due to core peptide characteristics .. 127 5.5 Conclusion ........................................................................................................ 130 5.6 Materials and methods ..................................................................................... 131 5.6.1 Genomic DNA extraction ......................................................................... 135 5.6.2 Cloning ...................................................................................................... 136 5.6.3 Protein purification ................................................................................... 138 5.6.4 Mass spectrometry .................................................................................... 139 5.6.5 Kinetics assay............................................................................................ 140 5.6.6 Generating the kinetic model .................................................................... 142 5.6.7 Protein crystallization and data collection ................................................ 144 6 Progress towards identifying a phenotype associated with the split borosin BGC in S. oneidensis MR-1 .................................................................................................... 145 6.1 Introduction ...................................................................................................... 145 6.1.1 Proposed bottom-up approach strategy to identify the son RiPP and determine its native biological role in S. oneidensis MR-1 .................................... 145 6.2 Description of the son BGC in S. oneidensis MR-1 ......................................... 
148 6.3 ArcA regulation of the son BGC ...................................................................... 150 6.3.1 Known phenotypes in S. oneidensis MR-1 related to the Arc system ...... 153 6.4 Cyclic di-GMP regulation in bacteria and its implication for the son BGC .... 156 6.4.1 Related motility phenotypes in S. oneidensis MR-1 ................................. 158 6.5 Pellicle biogenesis in S. oneidensis MR-1 ....................................................... 161 6.5.1 Pellicle experiments in S. oneidensis MR-1 ............................................. 162 6.6 Hypotheses regarding the final natural product from the son BGC ................. 165 ix 6.7 Conclusion ........................................................................................................ 169 6.8 Materials and methods ..................................................................................... 169 6.8.1 Cloning ...................................................................................................... 170 6.8.2 WM3064 cells used for conjugation of S. oneidensis MR-1 .................... 179 6.8.3 Generating S. oneidensis MR-1 mutants with pSMV3 plasmids .............. 180 6.8.4 RNA extraction and reverse transcriptase PCR (RT-PCR) ...................... 180 6.8.5 Growth curve (aerobic in LB) ................................................................... 182 6.8.6 Motility and colony morphology experiments .......................................... 182 6.8.7 Pellicle experiments .................................................................................. 183 6.8.8 Mass spectrometric analysis ..................................................................... 184 6.8.9 Media recipes ............................................................................................ 185 6.8.10 DNA and protein sequences...................................................................... 
188 7 Concluding remarks ............................................................................................. 194 7.1 Natural product research with synthetic biology tools ..................................... 194 7.2 Modularity in RiPP biosynthesis to expand the accessible chemical diversity through protein and peptide engineering .................................................................... 195 7.2.1 Principle 1: Variable core peptide sequences ........................................... 196 7.2.2 Principle 2: Recognition motifs in the leader can be matched to recruited BGC enzymes ......................................................................................................... 198 7.2.3 Principle 3: Slower BGC enzymes act at later biosynthetic steps ............ 199 7.3 Future directions for engineering borosin RiPPs: α–N-methylation is now an accessible PTM via traditional RiPP biosynthesis ...................................................... 200 7.3.1 Biochemical characterization of a split borosin system ............................ 201 7.4 Future directions for investigating how RiPPs are involved in central metabolism/homeostasis in bacteria ........................................................................... 202 8 Bibliography .......................................................................................................... 204 9 Appendix 1: Supplemental information for Chapter 2 ..................................... 228 10 Appendix 2: Supplemental information for Chapter 5 ..................................... 297 x List of Tables Table 1.1 Summary of known RiPP families.................................................................... 11 Table 3.1 Gene and protein sequences of split borosins ................................................... 84 Table 3.2 Plasmids used in this study ............................................................................... 
86 Table 3.3 Primers used to create plasmids ........................................................................ 86 Table 4.1 Gene and protein sequences of split borosins ................................................. 101 Table 4.2 Plasmids used in this study ............................................................................. 102 Table 4.3 Primers used to create select plasmids ............................................................ 103 Table 5.1 Structures discussed in this study ................................................................... 114 Table 5.2 SonM kinetics data.......................................................................................... 122 Table 5.3 Structure statistics (abbreviated) ..................................................................... 132 Table 5.4 Primers used in this study ............................................................................... 132 Table 5.5 Plasmids used in this study ............................................................................. 133 Table 5.6 DNA sequences............................................................................................... 133 Table 5.7 Parameter values for kinetic model ................................................................ 143 Table 6.1 Bacterial strains used in this study .................................................................. 150 Table 6.2 Motility phenotypes identified in TnSeq experiment ..................................... 158 Table 6.3 Conditions and strains tested in motility/colony morphology experiments ... 159 Table 6.4 Conditions and strains tested with pellicle experiments ................................. 163 Table 6.5 Attempt to overexpress sonM and sonA in S. oneidensis MR-1 ..................... 167 Table 6.6 Plasmids used/created in this study ................................................................ 
170 Table 6.7 Primers used in this study ............................................................................... 171 Table 6.8 Luria Broth (LB) for 500 mL .......................................................................... 185 Table 6.9 LB plates with 15% sucrose and no salt for 500 mL ...................................... 185 Table 6.10 LB (anaerobic) for 500 mL ........................................................................... 185 Table 6.12 Shewanella Basal Medium (SBM) recipe for 1 L ......................................... 185 Table 6.13 DL (or NB) vitamins for 1L .......................................................................... 186 Table 6.14 Trace mineral mix for 1L .............................................................................. 186 Table 6.10 LM media and variations used in this study ................................................. 187 Table 6.15 DNA sequences from the split borosin BGC in S. oneidensis MR-1 ........... 188 Table 6.16 Protein sequences of the split borosin BGC in S. oneidensis MR-1 ............. 192 Table 9.1 Sequences of primers, genes, and proteins in this study. ................................ 228 Table 9.2 Splicing variability across phylum, organism, and putative precursor. .......... 256 xi List of Figures Figure 1.1 Select examples of natural products from several classes ................................. 3 Figure 1.2 NRP biosynthesis............................................................................................... 7 Figure 1.3 Simplified RiPP biosynthesis .......................................................................... 14 Figure 1.4 RRE domain in RiPP biosynthesis .................................................................. 16 Figure 1.5 Polytheonamide biosynthesis .......................................................................... 18 Figure 1.6 PTMs are important for structural stability in RiPPs ...................................... 
19 Figure 1.7 The omphalotins .............................................................................................. 22 Figure 1.8 Fungi are a rich source of natural products ..................................................... 24 Figure 1.9 Omphalotins: founding members of the borosin family of RiPPs .................. 26 Figure 1.10 OphMA structure and proposed catalytic mechanism................................... 29 Figure 2.1 RiPP NPs and their biosynthetic transformations ........................................... 34 Figure 2.2 Phylogenetic tree of putative borosin precursors ............................................ 38 Figure 2.3 Borosin precursors identified and functionally characterized in this study .... 41 Figure 2.4 Structures of the gymnopeptides and the corresponding borosin precursor analysis .............................................................................................................................. 45 Figure 3.1 RiPP biosynthesis and borosin biosynthesis.................................................... 58 Figure 3.2 Putative borosin gene cluster from R. centenum SW ...................................... 61 Figure 3.3 Alignment of RceM with the methyltransferase domain of OphMA .............. 62 Figure 3.4 Methylations found on RceA core peptide ...................................................... 64 Figure 3.5 MS2 spectra showing methylation states of RceA core peptide fragments .... 67 Figure 3.6 Relative methylation states of AspN-digested RceA core peptide fragments . 69 Figure 3.7 Putative borosin BGC in Streptomyces sp. NRRL S-118................................ 72 Figure 3.8 Co-expression of his6-SUMO-StrA with StrM and Ni-NTA purification....... 74 Figure 3.9 StrA purification .............................................................................................. 76 Figure 3.10 StrM purification ........................................................................................... 
77 Figure 3.11 MS2 spectra for StrA core peptide after 16 hr in vitro reaction .................... 80 Figure 3.12 EIC showing relative methylation states of StrA .......................................... 80 Figure 4.1 Putative borosin gene cluster from S. oneidensis MR-1.................................. 92 Figure 4.2 His6-SonA strongly co-purifies with SonM when co-expressed in E. coli ..... 94 Figure 4.3 MS2 spectra showing methylation states of SonA core peptide ..................... 95 Figure 4.4 HPLC-MS EIC for SonA after co-expression with SonM for 24 hrs.............. 96 Figure 4.5 SEC purification of SonM and his6-SonA ....................................................... 97 Figure 4.6 His6-SUMO tag cleavage of SonM and SonA using bdSENP1 protease ........ 98 Figure 4.7 HPLC-MS EIC to show relative abundances of in vitro methylation ........... 100 Figure 5.1 Buffer and temperature stability testing of SonMA complex ....................... 112 Figure 5.2 Domain architecture comparison between OphMA and SonMA.................. 113 Figure 5.3 BBD overlay and alignment .......................................................................... 115 Figure 5.4 Proposed SonM catalytic mechanism ............................................................ 117 Figure 5.5 Ni-NTA purification of his6-SonA and his6-SonM ....................................... 118 Figure 5.6 Schematic for continuous coupled-enzyme kinetic assay ............................. 120 Figure 5.7 Structural analysis of SonM active site mutants............................................ 124 xii Figure 5.8 Kinetic model for the methylation of SonA .................................................. 126 Figure 5.9 Crystal morphologies of WT and select mutants .......................................... 
127 Figure 5.10 Differentially occupied active sites of SonM-BBD structure and loop movement ........................................................................................................................ 128 Figure 5.11 Structural conformations of core peptide .................................................... 130 Figure 6.1 Pipelines for RiPP natural product discovery ................................................ 146 Figure 6.2 Putative split borosin BGC in MR-1 ............................................................. 149 Figure 6.3 DNA binding site of ArcA in E. coli and S. oneidensis MR-1 ...................... 151 Figure 6.4 RNA extraction to verify expression of son BGC ......................................... 153 Figure 6.5 Expression changes of TCA cycle and glyoxylate pathway in ΔarcA mutant ......................................................................................................................................... 155 Figure 6.6 Aerobic growth curve in LB .......................................................................... 156 Figure 6.7 GGDEF domain protein in the son BGC ....................................................... 157 Figure 6.8 Representative images from motility assay ................................................... 161 Figure 6.9 Representative pellicle experiment set up ..................................................... 165 Figure 6.10 PQQ, MFT, and putative core of SonA ....................................................... 166 Figure 6.11 Attempt to overexpress SonM and SonA in S. oneidensis MR-1................ 168 Figure 7.1 Diversity-generating biosynthesis ................................................................. 196 Figure 9.1 MAFFT sequence alignment of putative borosin precursors identified in this study ................................................................................................................................ 
Figure 9.2 Genetic loci of borosin precursors catalytically validated in this study
Figure 9.3 LC-MS(/MS) data for borosin precursor E. coli expressions
Figure 9.4 LC-MS/MS data for in vitro methyltransferase assays of borosin precursors CmaMA, LedMA, MroMA1, and SveMA
Figure 9.5 MAFFT sequence alignment of putative borosin precursors identified in the Agaricales order of Basidiomycete fungi
Figure 9.6 LC-MS(/MS) data of E. coli expressions for the gymnopeptide B borosin precursor GymMA1
Figure 10.1 SonM WT fitted kinetic curves
Figure 10.2 Fitted kinetic curves for SonM active site mutants
Figure 10.3 HPLC-MS/MS data for SonA after in vitro reaction with SonM

1 Introduction

1.1 Natural products: definition and context

Natural products are bioactive small molecules produced by living organisms. These small molecules are often secondary metabolites that impart a selective advantage to the organism in certain environments. The diverse chemical structures and bioactivities of natural products make them useful to humans as pharmaceuticals, food preservatives, pigments, and more.1 In fact, more than half of the drugs in use today are natural products or derivatives thereof.2 Natural products are divided into classes based on the defining chemical characteristic(s) of the molecule, such as the presence of specific chemical functional groups and/or the mechanism of biosynthesis. As natural product classes are not defined by their biological function/bioactivity, pharmaceutically relevant compounds are found across all classes.
Each molecule shown in Figure 1.1 represents a different class of natural product, with gray boxes indicating compounds that have demonstrated pharmacological uses as prescribed medications. While the natural products shown in Figure 1.1 A all possess antibiotic activity, their chemical structures and modes of action vary widely. For example, the well-known antibiotics erythromycin and penicillin G are known to act by inhibiting protein synthesis3 and cell wall synthesis,4 respectively. Figure 1.1 B presents additional classes of natural products, some with important medical applications (e.g., paclitaxel for anti-cancer treatments5 and morphine for pain relief6), while others possess ecological significance (e.g., brevetoxin A-1, a potent neurotoxin produced by the dinoflagellates responsible for red tide7). Additionally, the examples shown in Figure 1.1 highlight the wide diversity of natural product-producing organisms, with the compounds shown coming from sources including plants, fungi, algae, and bacteria (although organisms from every domain of life produce natural products). These natural products play diverse native roles, serving as defense molecules such as antibiotics and toxins or as signaling molecules for quorum sensing. Others are structural components, such as the ladderanes, which are utilized by anammox bacteria to form a dense membrane and prevent diffusion of toxic metabolic intermediates.8

The methods used to study natural products reflect the overall goals of most research in the field: drug discovery and/or enzyme engineering. The historical strategy for discovering a natural product begins with acquiring an environmental isolate, followed by bioassay-guided fractionation to isolate a molecule of interest with a known/desired bioactivity.
To illustrate how natural products are used in drug discovery and development, consider Sir Alexander Fleming’s serendipitous discovery of the β-lactam antibiotic penicillin G in 1928 from the fungus Penicillium notatum.9 Fleming observed that the fungus produced a molecule that caused bacterial isolates to lyse. The compound was subsequently isolated from the fungus, named penicillin G, and produced commercially as an antibiotic.9 The central scaffold of penicillin G, 6-aminopenicillanic acid, as well as four additional β-lactam core scaffolds,10 have since been modified by scientists with other functional groups to create dozens of congeners. The congeners exhibit enhanced or additional desirable pharmacological features, including a broader spectrum of bacterial targets, resistance to degradation by β-lactamases (e.g., methicillin11), and increased oral bioavailability (e.g., ampicillin12). Penicillin G and its congeners are still used in medicine today as antibiotics.

Figure 1.1 Select examples of natural products from several classes
Name of the natural product class is underlined above each molecule. Natural product name and producing organism are listed below each structure. Molecules with demonstrated applications as human pharmaceuticals are boxed in gray. A: A selection of antibiotics to demonstrate structural variety across classes of natural products. Erythromycin13 inhibits ribosome activity, penicillin G14 and moenomycin A15 inhibit cell wall synthesis, and plantazolicin A is a narrow spectrum antibiotic that depolarizes the cell membrane of Bacillus anthracis, the causative agent of anthrax.16,17 B: Natural products with other bioactivities. Paclitaxel is used for anti-cancer treatments5 and morphine is used for pain relief.6 Pederin18 and brevetoxin A-1 are environmental toxins, with the latter playing a role in toxic red tides.7
Ladderanes are specialized fatty acids found in anammox bacteria.19

1.1.1 Peptide natural products: a brief introduction

Peptide natural products (natural products with one or more amino acids incorporated into the final product) are particularly promising for bridging the gap between discovery and application/engineering, in part due to the large body of literature available on this topic. Two examples of antibiotic peptide natural products are shown in Figure 1.1 A (penicillin G and plantazolicin A), but potent antimicrobial activity20 is only a part of peptide natural products’ potential repertoire: as a class, they show potential for targeting so-called “undruggable” protein-protein interactions.21 Inhibiting protein-protein interactions is critical for modulating many biochemical pathways and thus treating disease states, but the biologics typically required to accomplish this are too large to cross cellular membranes, so intracellular targets are inaccessible. With development, peptide natural products may be uniquely suited for inhibiting intracellular protein-protein interactions, as they share chemical components with large biologics yet are small enough to traverse a cellular membrane.21,22

Considering the stakes for pharmaceutical applications and a need to access new compounds and enzymes rapidly for further research and development, modern natural product research has shifted from bioassay-guided fractionation methods of discovery to a genomics approach. Movement away from classic microbiological techniques largely results from the problems of rediscovery and dereplication (discovery of the same microbes and natural products). As DNA sequencing has become cheaper and easier, the number of sequenced genes, genomes, metagenomes, and transcriptomes has increased dramatically.
Public databases such as the National Center for Biotechnology Information (NCBI) now house millions of nucleotide and protein sequences, allowing for data mining and the discovery of putative, novel, and unknown genes. From the perspective of natural product discovery and characterization, it is convenient that, in bacteria and fungi (and increasingly shown in plants),23 genes from a single biosynthetic pathway are often co-localized on the genome into a biosynthetic gene cluster (BGC). Because many peptide natural products and their BGCs are well-characterized, it is often a straightforward task to identify additional homologous BGCs and/or related peptide natural products.24 However, the grand challenge for natural product research is not the characterization of similar enzymes and pathways from known natural product classes, but the discovery of enzymes performing novel chemistry to build new bioactive molecules.

1.2 Non-ribosomal peptide natural products

Non-ribosomal peptides (NRPs), as their name indicates, are not synthesized by the ribosome. Instead, these natural products are synthesized by dedicated non-ribosomal peptide synthetases (NRPSs). Despite the discovery of many NRPs in the mid-20th century (penicillin9 in 1928 and enniatin A25 in 1947, for example), the modern “NRP” nomenclature was not developed until after the so-called “pre-ribosomal era” of the 1950s.26 Similar to the discovery of penicillin,9 early NRPs were typically discovered when microbial isolates inhibited the growth of other microbes, with subsequent media extractions performed to isolate the single compound causing the growth inhibition.
The 1960s saw the piece-by-piece uncovering of the NRP biosynthetic mechanism.27 This early work focused upon the bacterium Bacillus brevis, which produces the NRP antibiotics tyrocidine and gramicidin S.28 Experiments involving the use of cell-free/crude extracts, partial protein purifications, and radioactive labeling led to the publication of a proposed non-ribosomal biosynthesis of gramicidin S in 1969,29–31 based on a “thiotemplated multienzyme mechanism” akin to fatty acid biosynthesis.32 Around the same time, a similar general mechanism was proposed for the biosynthesis of tyrocidine.33,34 In 1989 the gene clusters for these two peptide natural products were identified, supporting the proposed thiotemplated multienzyme mechanism and revealing the general architecture of NRPS enzymes.35,36

1.2.1 NRP biosynthesis

Many NRPS enzymes have been rigorously characterized. In general terms, an NRPS is a large, multi-domain protein that builds NRP natural products in an assembly line-like fashion out of monomeric amino acid building blocks. To illustrate this mechanism, as well as its logic and flexibility, consider the three representative examples shown in Figure 1.2. In typical NRP biosynthesis, each amino acid incorporated into the mature natural product has a corresponding module in the NRPS. Each module within an NRPS enzyme can be further divided into distinct enzymatic domains responsible for the activation, modification, and polymerization of a specified amino acid into a growing peptide chain. Briefly, at the expense of ATP, the adenylation (A) domain adenylates an amino acid specified by conserved residues within that domain’s active site. This activated amino acid is then transferred to the peptidyl carrier protein (PCP) domain within the module. This domain is sometimes called the thiolation domain because the activated peptide is bound as a thioester to the 4’-phosphopantetheine cofactor within the PCP domain.
The PCP domain transfers the bound amino acid to the next biosynthetic domain on the assembly line. This may be a modifying domain where the side chain or backbone of the bound amino acid will be altered (a methyltransferase (M) domain is used as a representative example in Figure 1.2 A and C); a condensation (C) domain, which forms the peptide bond between two amino acids bound to the NRPS enzyme; or a termination (Te) domain, which terminates peptide chain elongation, often concomitant with macrocyclization of the initially linear peptide.37 This termination event regenerates the NRPS enzyme for subsequent rounds of biosynthesis. In some systems, NRPs are further modified by post-synthesis tailoring enzymes after being released from the NRPS.

Figure 1.2 NRP biosynthesis
Three different methods of NRP biosynthesis are shown. Abbreviations for domains are: (A) Adenylation, (C) Condensation, (Te) Termination, (E) Elongation. Small light blue dots represent PCP domains. Methyltransferase (M) domains are shown in pink and the corresponding modification on the peptide is also highlighted in pink (cyclosporin A and enniatin B). A: Cyclosporin A biosynthesis is a representative example of typical, linear NRP biosynthesis where the entire peptide natural product is templated in a single NRPS enzyme.38 B: Gramicidin S biosynthesis is representative of tandem NRP biosynthesis wherein two NRPS enzymes are used sequentially.35 C: Enniatin B biosynthesis is representative of iterative NRP biosynthesis wherein the same enzyme is used in an iterative fashion.39,40

The biosynthesis of cyclosporin A (an FDA-approved immunosuppressant) proceeds by the most straightforward NRPS method (Figure 1.2 A). The cyclosporin A synthetase, SimA, has 11 modules: one module per amino acid in the 11-amino-acid peptide.
Each amino acid is sequentially activated by the “A” domain of its module; methylated if there is an “M” domain present in the designated module; and condensed into a growing peptide chain. Once all 11 amino acids have been polymerized, the “Te” domain macrocyclizes the peptide chain, releasing the final, bioactive natural product.38 SimA has a molecular weight of 1.6 MDa, making this NRPS a very large, complex, and dynamic protein. Because the incorporation of each amino acid requires an additional NRPS module (a module is typically between 100 and 150 kDa in size),37 there are limitations on the feasible length of an NRP natural product, as the size of the synthetase is directly correlated with the length of its cognate peptide natural product. Indeed, the typical size of an NRP is just 7-9 amino acids. There are, however, examples of longer NRPs produced by systems with the NRPS machinery split across multiple enzymes or where a single NRPS is used iteratively to create longer peptides. For example, the well-elucidated biosynthesis of gramicidin S uses two NRPS enzymes, GrsA and GrsB, to build a single NRP (Figure 1.2 B).41 Additionally, the NRPS involved in the biosynthesis of enniatin B iteratively uses only two modules to build a 6-amino-acid peptide (Figure 1.2 C).25

The extensive chemical diversity accessible to NRPs through the use of NRPS modules, hundreds of non-canonical amino acids,42 and post-synthesis tailoring has historically made NRPSs an attractive biosynthetic system to engineer for the production of custom peptide natural products.43 Due to the well-conserved nature of NRPS module/domain architecture, it is often possible not only to identify putative NRPSs through genome mining, but also to predict the chemical structure of the associated NRP natural product. Genome mining approaches can also be utilized to discover unique enzymology and novel NRPs.
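The size argument made earlier in this section (one module per incorporated residue, each module roughly 100 to 150 kDa) can be checked with a quick back-of-the-envelope calculation. The Python sketch below is illustrative only; the function names are hypothetical, and the only inputs are the figures quoted in the text. It assumes a strictly linear, one-module-per-residue synthetase for the mass estimate and an evenly dividing module count for the iterative case.

```python
# Typical size of a single NRPS module in kDa, as quoted in the text.
MODULE_KDA_RANGE = (100, 150)


def estimate_nrps_mass_kda(n_modules, per_module=MODULE_KDA_RANGE):
    """Return a (low, high) mass estimate in kDa for a linear NRPS."""
    low, high = per_module
    return n_modules * low, n_modules * high


def iterations_needed(peptide_length, n_modules):
    """Number of passes an iterative NRPS needs to reach a given peptide length."""
    if peptide_length % n_modules:
        raise ValueError("peptide length must be a multiple of the module count")
    return peptide_length // n_modules


# Cyclosporin A synthetase: 11 modules for an 11-residue peptide.
print(estimate_nrps_mass_kda(11))
# Enniatin-type iterative synthetase: 2 modules, 6-residue product.
print(iterations_needed(6, 2))
```

For 11 modules the estimate spans 1,100 to 1,650 kDa, which brackets the 1.6 MDa reported for SimA, and a two-module enzyme needs three passes to produce a six-residue peptide.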
This approach can be exploited to reveal novel domains within the larger NRPS architecture by searching for “A” domains with novel amino acid binding pockets,44 unique modifying domains, or uncharacterized post-synthesis tailoring enzymes present in a putative BGC. The potential applications of NRPs and their predictable, assembly-line biosynthesis have kept researchers interested in this biosynthetic system, although most research has shifted from experiments in native hosts to approaches based on genomic identification of putative NRPSs and heterologous expression.

The unwieldy size and multi-domain nature of NRPSs often make it difficult to rigorously characterize an entire system. These enzymes are extremely large and dynamic, using many domains simultaneously or in very quick succession. A recent study used X-ray crystallography and small-angle X-ray scattering experiments together to analyze the structure of a di-modular NRPS, revealing the flexibility of the enzyme in solution and the large conformational changes within the enzyme that are a necessary part of NRP biosynthesis.45 Due to NRPS flexibility and movement, detailed structural data for all possible intermediates and conformations are challenging to acquire, and our understanding of how these domains and modules interact with each other remains limited.46,47 Despite these challenges, efforts to understand these powerful systems have the benefit of investigation from many routes: computational, genomic, biochemical, and classical (i.e., small molecule isolation). Thus far, putative NRPs as long as 26 amino acids (the syringopeptins) have been isolated,42 but no associated gene cluster has yet been identified.
Linking orphan molecules (isolated compounds with no known BGC) such as these to their cognate NRPS, especially molecules with desirable (bio)chemical characteristics, will help push this field forward from basic science and discovery into medical or industrial applications, but the large size and dynamic nature of NRPS enzymes will remain a hurdle for engineering and development efforts.

1.3 Ribosomally encoded peptide natural products

Because of their early discovery and characterization, NRPs have been the major focus of academic and commercial research on peptide natural products. However, recent efforts have revealed that another peptide natural product class, the Ribosomally synthesized and Posttranslationally modified Peptides (RiPPs), offers a compelling alternative biosynthetic route for the controlled production of peptide natural products. The ribosome-templated biosynthetic mechanism that produces the RiPP precursors makes many RiPP systems more amenable to heterologous investigation than NRP systems. For this reason, among others, interest in studying RiPPs has increased within the last 10-20 years, but metabolites originating from RiPP biosynthetic machinery have been studied for almost a century.

Lanthipeptides are peptide natural products containing lanthionine bridges (two alanine residues connected by a thioether linkage).48 The lanthipeptide nisin is the RiPP with the longest history.
Almost a century ago, its bioactivity was first noted when Streptococcus lactis was seen to inhibit the growth of Lactobacillus bulgaricus when the two bacteria were grown in co-culture.49 Nisin was assumed to be an NRP until 1970, at which time evidence was presented that ribosome-inhibiting antibiotics like puromycin and chloramphenicol also inhibited the production of nisin in the native organism.50 This was the first tangible evidence for RiPPs as a new natural product class, which coalesced in the 1980s when the genes for the precursor peptides of nisin and other lanthipeptides were discovered.51–55 Since this time, more than twenty new RiPP families have been identified (Table 1.1), with each discovery offering deeper insight into the biosynthetic logic and plasticity of RiPP biosynthesis and expanding the accessible chemical diversity of this class of natural products.

Research into RiPP biosynthesis has rapidly increased over the last two decades, but the body of literature supporting this class of natural products remains dwarfed by that of NRP biosynthesis. With the aim of filling this research gap, groups investigating RiPP biosynthesis rely on the concurrent growth of DNA sequencing technologies (including metagenomics), bioinformatics tools, and synthetic biology methods to discover new molecules and enzyme activities through genome mining.56,57 Linking orphan natural products to their BGCs and characterizing cryptic/silent BGCs (putative BGCs with no known natural product, which are especially common in metagenomics studies) are the routes that show the most potential for uncovering new enzymes and natural products. RiPP biosynthesis is especially suited for this type of investigation because its modularity and smaller, individual catalytic units are more amenable to current synthetic biology tools such as heterologous expression. The nuances of RiPP biosynthesis will be discussed in the next section.

Table 1.1 Summary of known RiPP families.
Unless otherwise noted, all RiPP family definitions below are from Arnison et al.48 The table was adapted from Evans58 and updated to reflect current consensus.

| RiPP family | Defining feature | Example | Typical organism |
|---|---|---|---|
| Lanthipeptides | Lanthionine containing peptides | Nisin; Subtilin | Bacteria |
| Linaridins | Contain thioether crosslinks like lanthipeptides but are biosynthesized differently | Cypemycin | Bacteria |
| Proteusins | Most heavily modified/longest RiPP to date, nitrile hydratase in leader peptide | Polytheonamides | Bacteria |
| Linear azol(in)e containing peptides | Azole/azoline rings on non-macrocyclized products | Streptolysin S | Bacteria |
| Cyanobactins | N-to-C macrocyclic peptides with proteolytic cleavage and macrocyclization performed by serine proteases | Patellamides; Ulicyclamide; Ulithiacyclamide | Cyanobacteria |
| Thiopeptides | Macrocycle contains a single piperidine, dehydropiperidine, or pyridine, and several thiazole rings | Thiostrepton A; Micrococcin P1 | Actinobacteria |
| Bottromycins | Include a decarboxylated C-terminal thiazole and macrocyclic amidine; contain C-methylated amino acids in a series | Bottromycin A2 | Bacteria |
| Microcins | With exception of microcin C, which has no leader, microcins are tailored with leader and cores intact; maturation occurs when fully tailored core is cleaved from leader | Microcins B17, C, J25 | Enterobacteriaceae |
| Lasso peptides | Contain specific, knotted, “lasso fold” making them very resistant to denaturing agents and proteases | Siamycin I, II; Microcin J25 | Bacteria |
| Microviridins | Cyclic N-acetylated tri- and tetradecapeptides containing ω-amide and/or ω-ester bonds. Most contain lactams, all contain lactone linkages, resulting in their tricyclic structures | Microviridin B; Marinostatin 1-12 | Bacteria |
| Sactipeptides | Contain α-carbon to cysteine sulfur (on different amino acids) linkages | Subtilosin A; Thurincin H; Thuricin CD (α and β) | Bacteria |
| Bacterial head-to-tail cyclized peptides | N-to-C terminal cyclized peptides distinguished by their large size and biosynthetic machinery | Cyclic bacteriocins; Enterocin AS-48 is a 70mer | Gram-positive bacteria |
| Amatoxins/phallotoxins | N-to-C cyclized 7-mers containing tryptathionine crosslinks | α-Amanitin, phalloidin | Basidiomycete fungi |
| Cyclotides | N-to-C cyclized peptides with a cyclic cysteine knot formed from three conserved disulfide bonds | Kalata B1 | Plants |
| Orbitides | N-to-C cyclized peptides without disulfide bonds | Segetalin A, D | Plants |
| Conopeptides and other toxoglossans | Contain a significantly higher density of disulfide crosslinks and PTMs than other animal venom toxins | Conkunitzins; Conopressins | Cone snails |
| Glycocins | Antimicrobials that include glycosylation moieties | Sublancin 168; Glycocin F | Bacteria |
| Catch-all class for auto-inducing peptides (AIPs), ComX, methanobactin, and N-formylated peptides | For brevity, full descriptions not provided. See the Arnison et al. review for more information.48 | Methanobactin | Bacteria, fungi, plants, animals |
| Dikaritins59 | Cyclic peptides with an ether bridge | Ustiloxins, Phomopsins | Ascomycete fungi |
| Borosins60 | α-N-methylated peptides | Omphalotins; Gymnopeptides | Basidiomycete fungi |
| Epichloëcyclins61 | Cyclic peptides | Epichloëcyclin A | Ascomycete fungi (Epichloë spp.) |
| Epipeptides62 | D-amino acid containing peptides | YydF | Gram-positive bacteria |
| Catch-all for small RiPPs | Smaller molecules not classified above | PQQ; Pantocin | Bacteria, fungi, plants, animals |

1.3.1 RiPP biosynthesis

In contrast to NRP biosynthesis, which relies upon conserved NRPS enzymes, the RiPP natural product class does not possess an individual conserved gene for its biosynthesis: RiPP families may share conserved genes, but this conservation does not hold for the class as a whole. Instead, RiPP biosynthetic pathways follow a conserved biosynthetic logic, so a protein unique to RiPP biosynthesis and common to all known RiPP BGCs is unlikely to exist. Rather than being templated by an NRPS, a RiPP natural product scaffold is directly encoded in the genome, often as part of a BGC with other genes important for the biosynthesis of that RiPP (Figure 1.3 A). The RiPP scaffold is transcribed as mRNA and translated by the ribosome into a precursor peptide, in most RiPP families conventionally designated XxxA.48 Typically, the precursor peptide is divided into two regions: a leader peptide sequence almost exclusively at the N-terminus and a core peptide at the C-terminus. The leader peptide is thought to serve as a binding domain to recruit dedicated modifying enzymes, which install posttranslational modifications (PTMs) on the core peptide. Proteolytic cleavage of the leader releases the mature, bioactive natural product (Figure 1.3 B).48 By utilizing a proteolytically removable, conserved portion of the precursor to direct modification of the core peptide, RiPP biosynthesis is thus able to maintain exquisite specificity for its cognate precursor peptides while simultaneously allowing for substrate plasticity of the core peptide sequence.
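The leader/core logic described above can be sketched in a few lines of Python. This is a toy model only: the sequence, the position of the leader/core boundary, and the use of a lowercase letter to mark a modified residue are all hypothetical and do not correspond to any precursor peptide discussed in this work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PrecursorPeptide:
    sequence: str    # full ribosomal translation product (hypothetical)
    core_start: int  # 0-based index of the first core residue (assumed known)

    @property
    def leader(self) -> str:
        # N-terminal region that recruits the modifying enzymes.
        return self.sequence[: self.core_start]

    @property
    def core(self) -> str:
        # C-terminal region that becomes the natural product.
        return self.sequence[self.core_start:]


def mature(precursor: PrecursorPeptide, modify: Callable[[str], str]) -> str:
    """Apply PTMs to the core, then 'cleave' the leader by returning only the core."""
    return modify(precursor.core)


def modify_valines(core: str) -> str:
    # Toy PTM: lowercase marks a modified residue.
    return core.replace("V", "v")


toy = PrecursorPeptide(sequence="MKTLEELSAVIG", core_start=7)
print(toy.leader, toy.core, mature(toy, modify_valines))
```

The point of the sketch is the separation of concerns: the modification function never inspects the leader, so swapping in a different core sequence leaves the rest of the model unchanged, mirroring the substrate plasticity described in the text.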
Often, the leader peptide will exhibit several conserved regions, or recognition sequences (RSs), that are specific to individual modifying enzymes.63 Since RiPP precursors are synthesized by the ribosome, they are initially limited to the 20 proteinogenic amino acids prior to posttranslational modification. However, due to the processive nature of translation, RiPPs may be much longer than NRPs. The polytheonamides, for example, are 49 amino acid residues long (the longest RiPPs yet discovered).64,65 The current definition of RiPPs specifies that the mature molecule be less than 10 kDa as a somewhat arbitrary separation to distinguish them from modified small proteins.48

Figure 1.3 Simplified RiPP biosynthesis
A: Schematic of a simplified RiPP BGC (not to scale). Typical constituents include a gene for the precursor peptide (cyan and orange) and a modifying/tailoring enzyme (light pink). Proteins involved in regulation (blue) and protease cleavage/transport (green) may also be present but are not always found in the RiPP BGC. B: The RiPP precursor peptide is translated by the ribosome; expression may be regulated by a transcription factor in the BGC (blue). In most cases, an N-terminal leader peptide (cyan) recruits modifying enzymes (light pink) to install PTMs onto the core peptide (shown here as a color change from orange to dark pink). After modification, the leader is proteolytically cleaved from the modified core, often concomitant with transport out of the cell, releasing the mature, bioactive RiPP natural product.

RiPPs are found in all domains of life and exhibit a wide range of structures and bioactivities.
Several RiPP natural products are produced at an industrial scale in engineered/optimized biological systems, such as thiostrepton (a veterinary antibiotic)66 and nisin (a food preservative).67 While the majority of well-known RiPPs are secondary metabolite toxins, some RiPPs perform a more central role within their native organism’s metabolism. For example, pyrroloquinoline quinone (PQQ) is a bacterial redox cofactor (and is also sold commercially as a dietary supplement)68 and ComX168 is a pheromone that triggers natural bacterial competence from quorum sensing signals.69 See Table 1.1 for a summary and short description of known RiPP families.

1.3.2 The RiPP recognition element (RRE)

Understanding how the interaction between leader peptide and modifying enzyme dictates tailoring of the core peptide is critical for understanding RiPP biosynthesis. Recently, PqqD, a chaperone protein involved in PTM-related maturation in PQQ biosynthesis,70 attracted particular attention when “PqqD-like” domains were discovered to exist in more than half of bacterial RiPP clusters, including such diverse RiPP families as cyanobactins, lasso peptides, proteusins, and more.71 Due to its widespread presence in many RiPP BGCs, this structural motif was designated the RiPP recognition element (RRE).71 The RRE is found as a standalone protein or as a structural motif within a biosynthetic enzyme; in both cases, it acts as a chaperone and is responsible for presenting the precursor peptide to a modifying enzyme.71 While RREs carry the same winged-helix-turn-helix motif as seen in PqqD (Figure 1.4 A), amino acid sequence identities between different RREs are low. Interestingly, several crystal structures have shown that even the mechanism of interaction between the RRE of a particular RiPP BGC and its cognate precursor peptide is not conserved.
For example, the precursor from the antibiotic microcin C7 interacts with only the β-sheets of the RRE motif, while the precursors from cyanobactins and the lantibiotic nisin interact with the β-sheets and helices of the RRE, in different orientations (Figure 1.4 B). The ubiquity of the RRE in many RiPP families together with its varied mode of interaction with precursor peptides is an intriguing nuance to the study of RiPP biosynthesis. It represents an initial step in understanding how a biosynthetic system can use conserved elements, including ribosomes and structural motifs, to build incredibly diverse bioactive small molecules.

Figure 1.4 RRE domain in RiPP biosynthesis
Adapted from Burkhart et al.71 The RRE is depicted with purple sheets and cyan helices. Precursor peptides from several RiPPs are shown in yellow sticks. A shows the structure of PqqD (PDB 3G2B)72 and B shows the RRE motif as a domain within RiPP modifying enzymes MccB (PDB 3H9J),73 LynD (PDB 4V1T),74 and NisB (PDB 4WD9)75 to highlight differences in precursor-RRE interactions.

1.4 Genomics approach to discover new RiPP BGCs and families

As discussed above, RiPP biosynthesis follows a conserved logic rather than relying on a single gene conserved across the natural product class. Many very detailed bioinformatic tools exist for other, more well-studied natural product classes such as NRPs and polyketides, but fewer tools exist for RiPPs. Despite this disparity, there are several bioinformatic strategies for revealing putative related RiPP BGCs. These tools often exploit homologous recognition sequences in leader peptides, core peptide motifs, and specific characteristics of a prototypical precursor peptide (e.g., short open reading frame, presence of protease sites, predicted cross-linking residues, etc.). Crucial to this genomic approach is the identification and use of conserved biosynthetic enzymes within RiPP families.
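One of the precursor characteristics listed above, the short open reading frame, is simple enough to illustrate directly. The Python sketch below scans the three forward reading frames of a DNA string for ATG-to-stop ORFs whose length falls within a size window. It is a minimal illustration under stated assumptions, not the method used by any published RiPP mining tool: the function name, the 20 to 120 residue window, and the restriction to the forward strand and to ATG start codons are all choices made here for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_short_orfs(dna, min_aa=20, max_aa=120):
    """Return (start, end, length_aa) for forward-strand ORFs in the size window.

    start/end are 0-based, end-exclusive nucleotide coordinates that include
    the stop codon; length_aa counts residues and excludes the stop.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if min_aa <= length_aa <= max_aa:
                    hits.append((start, i + 3, length_aa))
                start = None  # close the ORF and keep scanning this frame
    return hits


# Hypothetical 30-codon ORF (Met + 29 Ala) flanked by two bases on each side.
toy_dna = "CC" + "ATG" + "GCT" * 29 + "TAA" + "GG"
print(find_short_orfs(toy_dna))
```

In practice, a short ORF alone is a weak signal. The candidates it yields would still need to be filtered by genomic context, such as proximity to a conserved modifying enzyme, which is why the conserved enzymes mentioned above are central to the approach.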
Examples of bioinformatic tools for RiPP searches include RODEO,76 AntiSMASH 4.0,77 BAGEL4,78 RiPP-PRISM,79 and RiPPMiner.80

Because of the lack of conserved sequences/enzymes, making the jump from a known RiPP family with characterized and conserved biosynthetic enzymes to a new family of RiPPs requires some creative leaps of logic beyond currently used algorithms. For example, in an attempt to discover more thiazole/oxazole-modified microcin BGCs, whose precursor peptides are short and thus can be challenging to bioinformatically identify, Haft et al. manually curated putative BGC sequences and discovered that some encoded a longer-than-expected putative precursor peptide with a nitrile hydratase domain in the leader.81 At the time, this discovery was simply noted as a mechanism by which proteins involved in primary metabolism may be co-opted or retailored for secondary metabolism.81 Several years after this find was published, the proteusin family of RiPPs was described by Freeman et al., with the polytheonamide precursor peptide exhibiting the described nitrile hydratase domain in its leader peptide, despite being unrelated to the microcin family of RiPPs.65 This example demonstrates how bioinformatics can inform the expansion of known RiPP families, but manual analysis, experimental evidence, and curiosity are still required for the discovery of new RiPP families with novel precursor architectures and modifications.82

1.5 Repertoire of PTMs in RiPPs

Since RiPPs are limited to proteinogenic amino acids for their scaffolds, they have historically been considered less chemically diverse than NRPs. This assumption has begun to be challenged as metagenomics studies reveal the prevalence and diversity of RiPP natural products, pathways, and modifying enzymes in uncultivated organisms.83 Polytheonamides, the founding members of the proteusin family of RiPPs, exemplify the power and chemical diversity accessible through RiPP biosynthesis.
At 49 amino acid residues long, polytheonamides were previously assumed to be the longest NRPs, due in part to the presence of non-proteinogenic D-amino acids in the final product. However, a recent metagenomic study, which sequenced an uncultivated bacterial symbiont of a marine sponge, identified the polytheonamide precursor peptide and the associated modifying enzymes responsible for the transformation of the core peptide, all contained in a single BGC.64 With the reclassification of polytheonamides as RiPPs, these molecules are now considered to be among the most heavily modified RiPPs to date, with nearly every amino acid in the core sequence exhibiting at least one PTM. Furthermore, all 50+ PTMs are installed by only seven enzymes.65 Figure 1.5 shows the structures of polytheonamides A and B, color coded to show PTMs and the corresponding BGC enzymes. This study characterized enzymes capable of installing modifications long thought to be accessible only through a non-ribosomal biosynthetic route, including a unidirectional L-to-D epimerase and a C-methyltransferase acting at unactivated carbons of the core peptide. For a more detailed list of the many PTMs found in RiPPs, please see the comprehensive review of RiPP biosynthesis by Arnison et al.48

Figure 1.5 Polytheonamide biosynthesis
Figure adapted from Freeman et al.65 A: polytheonamide A and B chemical structures. Colored PTMs match the respective genes in the BGC below. B: polytheonamide BGC with ORFs labeled and color coded according to the PTMs each enzyme catalyzes (note that non-biosynthetic genes and genes of unknown function have been omitted for clarity, but relative genomic distances are maintained within the BGC).

1.6 PTMs for structure and stability in RiPPs

RiPPs occupy a unique chemical space between other natural product classes and proteins, and therefore possess some of the strengths and weaknesses of both types of molecules.
One limitation of peptide natural products is their susceptibility to proteolytic degradation: their lack of stable secondary structure renders peptides vulnerable to proteases and decreases their half-life within cells. Many PTMs in RiPPs are known to help stabilize secondary structure, simultaneously offering protease protection and enhanced target specificity. Examples include sulfur crosslinking (lanthionine52 and disulfide bonds84), peptide backbone modification,60,65 macrocyclization,85 and knotted or lasso structures.65,86 For example, the conopeptides (RiPP neurotoxins produced by cone snails) include four disulfide bonds in a short span of 46 amino acid residues, which lock the peptide into a rigid structure and allow it to selectively bind ion channels, conferring its potent bioactivity (Figure 1.6 A).87 In the case of polytheonamide B, side-chain methylations were shown to be important for maintaining the β6.3-helical conformation required for its biological activity (Figure 1.6 B).88

Figure 1.6 PTMs are important for structural stability in RiPPs
A: Figure adapted from Buczek et al.87 The suite of NMR structures for conopeptide ι-RXIA demonstrates how disulfide bonds (yellow) lock the peptide into a single rigid conformation. Flexibility of the termini (marked N and C), where there are no disulfide bonds, is shown in black sticks.87 B: Structure of polytheonamide B with side-chain N-methylated residues shown in purple (PDB: 2RQO). H-bonds between N-methylated side chains are shown with black dashed lines. In a polar solution, these H-bonds help stabilize the wide helical structure of this molecule.88

Backbone modifications are of particular interest for increasing the breadth of chemical diversity of RiPPs for the discovery of novel bioactive compounds. Once the peptide bond is formed, the lone pair of electrons on the sp2-hybridized amide nitrogen is delocalized, eliminating any nucleophilic character.
The stability and low reactivity of this bond make it difficult to modify chemically. For this reason, NRP biosynthesis typically modifies the atoms involved in the peptide bond prior to that bond’s formation. RiPPs, however, are limited to posttranslational modification, making backbone modifications challenging. To install PTMs on the peptide backbone, most known backbone-modifying RiPP enzymes make use of radical chemistry.89 Examples of backbone-modifying enzymes include the iterative epimerase in polytheonamide biosynthesis65 and the maturases in lasso peptide BGCs responsible for forming the isopeptide bond required for the knotted/lariat structure.90 The discovery of backbone-modifying enzymes in RiPP clusters has demonstrated that the chemical diversity accessible through RiPP biosynthesis is approaching that of NRP biosynthesis.

1.7 α-N-Methylation in peptide natural products

α-N-Methylation, the methylation of a backbone amide nitrogen (rather than a side chain) in peptide natural products, carries a suite of benefits important for bioactivity and stability. These include stability against proteases (a common theme in peptide natural products),91 membrane permeability, target specificity, and oral bioavailability.92 When backbone methylation is combined with a macrocyclic structure (as found in cyclosporin A), these characteristics are further enhanced.85 Methylations on the backbone amide nitrogen are common in NRPs and have been known since the structure elucidation of enniatin B in 1948 (the structure of enniatin B is shown in Figure 1.2 C).93 However, it was not until nearly 30 years later that the biosynthesis of enniatin B in the fungus Fusarium oxysporum was proposed to follow the same process as that of the known peptide antibiotics gramicidin S and tyrocidine.40 When enniatin B synthetase (the NRPS) was purified from the mycelia of F. oxysporum, additional details of the biosynthetic process were revealed.
The precursors were determined to be D-2-hydroxyisovaleric acid and L-valine. Notably, based on 14C labeling experiments, N-methyl-valine was not incorporated into the peptide. This led the authors to conclude that L-valine must be methylated after it is bound by the enzyme.94 Further experiments showed that the enzyme used S-adenosylmethionine (SAM) as a methyl donor.94 Enniatin B synthetase was further characterized in the 1980s using protein purified from F. oxysporum,95 including a detailed investigation into the methyltransferase domain.96 In vitro kinetics experiments on the methyltransferase of enniatin B synthetase confirmed SAM as the methyl donor and sinefungin (a SAM analog lacking the donor methyl group) as an inhibitor. Further experiments demonstrated that S-adenosylhomocysteine (SAH), the product remaining after the methyl group is removed from SAM, is a potent inhibitor of the enzyme.96 The biosyntheses of cyclosporin A and enniatin B (Figure 1.2 A and C, respectively) both use dedicated methyltransferase domains to methylate designated residues as they move along the NRPS assembly line, before each peptide bond is formed. Until recently, α-N-methylation (as opposed to methylation on side chains or termini) was considered to be a hallmark of NRPs, as there were no known examples of α-N-methylated RiPPs.

1.8 α-N-Methylation in RiPPs

Historically, methylation in RiPPs was a common PTM known to occur only on amino acid side chains or peptide termini.
RiPPs exhibiting this PTM include the polytheonamides,65 bottromycins,97 microcins (plantazolicin),17 and linaridins (cypemycin).98 In addition to the aforementioned structural stability provided by side-chain N-methylations in the polytheonamides,88 the N-terminal dimethylation of cypemycin was shown to be required for inhibiting the growth of Micrococcus luteus in a zone-of-inhibition assay.98 Additionally, the depsipeptide teixobactin, when synthetically modified to include backbone N-methylations, exhibits enhanced stability and antibacterial activity.99

First isolated in 1996, the omphalotins were orphan α-N-methylated cyclic peptide natural products produced by the basidiomycete fungus Omphalotus olearius (Figure 1.7).100–103 The omphalotins have selective nematicidal activity against the plant pathogen Meloidogyne incognita (LD50 2 µg/mL). Because of the α-N-methylations found on these molecules, they were long assumed to be NRPs.100–103 The recent publication of the genome of O. olearius104 finally allowed investigators to search for the BGC responsible for the biosynthesis of the omphalotins. In the hope of discovering a posttranslational route to α-N-methylation, van der Velden et al. and Ramm et al. interrogated the putative omphalotin biosynthetic pathway with heterologous expression experiments, which are described in detail below.60,105

Figure 1.7 The omphalotins
Structures of omphalotins A-I with α-N-methylations highlighted in pink. All omphalotins share the same amino acid scaffold, which is labeled on the omphalotin A structure with one-letter codes next to each residue.100,102,103

1.8.1 RiPPs in basidiomycete fungi

Fungi are known to be prolific natural product producers, and many clinically relevant small molecules originate from these organisms.
In fact, as much as 47% of natural products from microbial sources are from fungi.106 Despite this, fungi remain understudied relative to bacteria, with only a fraction of the number of published genomes and extremely limited genetic tools and heterologous hosts. This disparity between bacteria and fungi is due to several factors, which are especially pronounced in basidiomycete fungi: many fungi have large genomes, unpredictable splicing patterns, and/or complex life cycles.107 For these reasons, fungi are challenging to work with in a laboratory setting and thus to rigorously study and characterize. This practical difficulty means that the field of fungal natural product biosynthesis, along with its promise of novel enzymes and small molecules, is largely untapped.106

The fungal kingdom is split into seven phyla, typically delineated by reproductive structures. Two of these phyla, Ascomycota (cup fungi) and Basidiomycota (mushrooms), are prolific natural product producers. However, since ascomycete fungi are easier to manipulate and optimize for production of valuable natural products, a higher proportion of key natural products have been discovered from these organisms, and fewer from basidiomycete fungi (Figure 1.8 A).108 This disparity between fungi and bacteria is even starker for RiPPs. Despite the number of natural products originating from fungi, RiPPs are vastly underrepresented in these organisms, especially in basidiomycetes.109 The first fungal RiPP families, the amatoxins and phallotoxins from the death cap mushroom, Amanita phalloides, were determined to be RiPPs as recently as 2007 (although the compounds themselves were known earlier).110 Since that time, only a handful of other fungal RiPP families have been discovered (Figure 1.8 B).
However, as more fungal genomes are sequenced and algorithms to correctly predict open reading frames (ORFs) improve, data mining is revealing just how rich these organisms are in silent and cryptic BGCs.111

Figure 1.8 Fungi are a rich source of natural products
A: Timeline showing discovery of key fungal natural products with clinical relevance. Very few examples exist for basidiomycete fungi. Information from Aly et al.,108 figure adapted from Aileen Lee. B: Timeline showing discovery of RiPPs in fungi. Only five fungal RiPP families are currently known, and only two are in basidiomycete fungi.59–61,110,112

1.9 The omphalotins: founding members of the borosin RiPP family

As mentioned briefly above, van der Velden et al. sought to determine the biosynthetic origins of the omphalotin molecules after the recent publication of the native organism’s genome.60,104 A search for linear permutations of the amino acid sequence corresponding to the omphalotin scaffold (WVIVVGVIGVIG) revealed a putative RiPP precursor encoded in the O. olearius genome. A typical RiPP leader peptide is between 2 and 50 amino acids in length. Interestingly, the putative leader sequence fused to the omphalotin core peptide was nearly 400 amino acids in length. Until this discovery, the longest known RiPP leader belonged to the proteusins, which is approximately 100 amino acids long and encodes an inactive nitrile hydratase domain.64 Furthermore, bioinformatic analysis of the long omphalotin leader peptide suggested the presence of a SAM-dependent methyltransferase domain, which was hypothesized to be responsible for the α-N-methylations on the core peptide sequence (Figure 1.9 B). To test this hypothesis, van der Velden et al. heterologously expressed the entire ORF with an N-terminal hexahistidine (his6) tag in E.
coli, purified the protein, and analyzed the tryptic core fragment by high performance liquid chromatography tandem mass spectrometry (HPLC-MS/MS).60 By comparing MS2 spectra from higher-energy collision dissociation (HCD) and electron transfer dissociation (ETD), which fragment a parent peptide ion in different patterns, the mass corresponding to the methyl group could be definitively localized onto the backbone nitrogens of specific amino acid residues. The results of this experiment revealed an α-N-methylated core peptide sequence that precisely matched the predicted methylation pattern of the known omphalotin molecules, and further suggested an iterative mechanism wherein methylations are installed in an N- to C-terminal direction on the core peptide (Figure 1.9 C).60

The omphalotin precursor was first named OphA to follow RiPP naming convention (where XxxA is used for a precursor), but a discrepancy arose when Ramm et al. published work calling this protein OphMA to emphasize its unique domain architecture (a methyltransferase (M) domain encoded within the precursor).105 OphMA is currently the agreed-upon name for this protein. Due to the unique domain architecture of OphMA, the omphalotins were called the borosins, a new family of RiPPs named for Ouroboros, the mythical serpent that bites its own tail.60

Figure 1.9 Omphalotins: founding members of the borosin family of RiPPs
A: Omphalotin gene cluster from O. olearius. B: Domain architecture of the RiPP precursor, OphMA. C: Methylation pattern on the core peptide of OphMA as determined by HPLC-MS/MS. Orange inset is the core of OphMA showing the methylation pattern (pink boxes around amino acids: filled boxes are confirmed methylations, outlined boxes are inferred from MS2 spectra) that matches omphalotin A. The methylation pattern indicates that methylations are installed in an N- to C-terminal direction.
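The genome search for linear permutations of the cyclic omphalotin scaffold, described at the start of Section 1.9, can be illustrated with a minimal Python sketch. It assumes that candidate protein sequences have already been translated from the genome, and it interprets a "linear permutation" as a rotation of the cyclic scaffold (the macrocycle opened at any one peptide bond). The function names and the toy protein are hypothetical; this is not the pipeline used by van der Velden et al.

```python
"""Sketch: find rotations of a cyclic peptide scaffold in translated sequences."""

OMPHALOTIN_SCAFFOLD = "WVIVVGVIGVIG"  # scaffold sequence quoted in the text


def cyclic_rotations(scaffold: str) -> set[str]:
    """Return every linear sequence obtained by opening the macrocycle at one bond."""
    return {scaffold[i:] + scaffold[:i] for i in range(len(scaffold))}


def find_core_candidates(protein: str, scaffold: str) -> list[tuple[int, str]]:
    """List (start index, matched rotation) for each rotation found in the protein."""
    hits = []
    for rotation in cyclic_rotations(scaffold):
        start = protein.find(rotation)
        while start != -1:
            hits.append((start, rotation))
            start = protein.find(rotation, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy protein for illustration only, NOT a real OphMA sequence.
    toy_protein = "MAAA" + "GVIGWVIVVGVI" + "KK"
    print(find_core_candidates(toy_protein, OMPHALOTIN_SCAFFOLD))
```

A real search would also need six-frame translation and intron handling, both of which this sketch leaves out.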
Data from van der Velden et al.60

Using the core peptide sequence and the characterized ophMA gene as an anchor, the rest of the BGC was identified and found to encode a putative NTF2-like protein (ophC), O-acyltransferase (ophD), prolyl oligopeptidase (ophP), F-box-like protein (ophE), and oxidoreductases (ophB1 and ophB2) (Figure 1.9 A).60 Notably, omphalotin A is one of nine omphalotin molecules, with omphalotins B-I (Figure 1.7) exhibiting further PTMs such as hydroxylation and acylation. These other PTMs could reasonably be attributed to OphD, OphB1, and OphB2. This is further supported by experiments performed in the native host: omphalotin A is the first congener detected in O. olearius culture, and its abundance subsequently diminishes as omphalotins B-I increase in abundance.100–103 Together, these observations support a biosynthetic process wherein the first modification to take place is the α-N-methylation of the core peptide by OphMA, followed by proteolytic cleavage and macrocyclization carried out by OphP, and subsequent transformation by the remaining enzymes. Soon after the publication by van der Velden et al., Ramm et al. confirmed the production of omphalotin A when ophMA and ophP were heterologously co-expressed in the yeast Pichia pastoris.105

1.9.1 Biochemical and structural characterization of OphMA

Further experiments conducted by van der Velden et al. sought to elucidate additional details regarding the function of OphMA and made note of several key findings.60 First, the native OphMA core peptide sequence (WVIVVGVIGVIG) could be swapped for similarly hydrophobic core peptide sequences, such as amino acid sequences similar to cyclosporin A (LVLAALLVIVG) and dictyonamide A (ATTVVVVVIVG). Using HPLC-MS/MS, up to 5 and 8 methylations were observed on these alternative sequences, respectively. Although the methylation patterns did not match those of the respective known NRPs, this result demonstrated that OphMA can methylate a variety of core peptide sequences.
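Counting methylations by mass spectrometry relies on the fact that each N-methylation replaces one hydrogen with a methyl group, a net gain of CH2 (about 14.016 Da, monoisotopic). The following Python sketch shows that arithmetic under simple assumptions: the function names are hypothetical, the example masses are made up, and the calculation only counts methyl groups. Localizing them to specific residues requires MS2 fragmentation data, as described above.

```python
"""Sketch: mass shifts expected for N-methylation states of a peptide."""

# Monoisotopic masses in Da. Each methylation swaps H for CH3, a net gain of CH2.
CARBON = 12.0
HYDROGEN = 1.00782503
CH2_SHIFT = CARBON + 2 * HYDROGEN  # about 14.01565 Da


def methylation_shifts(max_methylations: int, charge: int = 1) -> list[float]:
    """m/z shift relative to the unmethylated ion for 0..max_methylations methyl groups."""
    return [n * CH2_SHIFT / charge for n in range(max_methylations + 1)]


def infer_methylation_count(unmodified_mass: float, observed_mass: float) -> int:
    """Nearest whole number of methyl groups that explains a neutral mass difference."""
    return round((observed_mass - unmodified_mass) / CH2_SHIFT)


if __name__ == "__main__":
    # Hypothetical neutral masses chosen for illustration only.
    print(methylation_shifts(8, charge=2))
    print(infer_methylation_count(1000.0, 1070.078))
```

Dividing by the charge state matters in practice, because a doubly charged tryptic fragment shifts by only about 7.008 m/z units per methyl group.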
Second, van der Velden et al. proposed a catalytic mechanism requiring SAM as a methyl donor. This was confirmed by the heterologous expression of active site mutants in which the putative SAM-binding residues S129 and Y98 were mutated to alanine, generating inactive OphMA mutants, as shown by HPLC-MS/MS. Third, gel filtration experiments showed that purified OphMA associated into homodimers. This finding prompted the group to ask whether OphMA was conducting an intermolecular reaction, in which the core peptide of one monomer was methylated by the methyltransferase of the other monomer. This was probed by co-expressing inactive OphMA with a catalytically active core mutant of OphMA (an analogous core mutant was used to distinguish it from the core attached to the inactive OphMA). After co-expression and analysis, the core peptide of the inactive OphMA showed up to ten methylations, indicating that catalysis could indeed proceed as an intermolecular reaction.60

Following the publication of these biochemical experiments, a collaboration with the lab of Dr. Jim Naismith led to the elucidation of the 2.4 Å crystal structure of OphMA along with a suite of OphMA mutant structures.113 The wild type crystal structure definitively supported the gel filtration results, showing that OphMA forms a homodimer with intermolecular (in trans) activity. This was also confirmed for dbOphMA, a close homolog from the fungus Dendrothele bispora.114 The crystal structure also revealed that the homodimers form a novel concatenated ring structure. In this structure, the clasp region of OphMA wraps around the outside of the methyltransferase domain, allowing the core peptide to reach and insert into the opposite monomer’s active site (Figure 1.10 A and B).113 Despite the structural data acquired in this study, a true kinetic study remains challenging because the substrate (core peptide) remains attached to the enzyme in a pseudo-zero-order reaction.
By using defined expression times and HPLC-MS/MS analysis to determine relative methylation states, Song et al. were able to determine a kcat,app of 0.32 methylations h⁻¹. This slow reaction rate reaffirms the need for co-expressions to take place over several days (up to five) in order to detect the fully methylated core peptide.60,113 A similar kcat,app, up to 0.17 h⁻¹ in a high-pH solution, was seen for in vitro reactions in which a short induction time (two hours) was used and the purified protein was further incubated with additional SAM.

Figure 1.10 OphMA structure and proposed catalytic mechanism
Crystal structure of OphMA shown as a monomer (A) and a homodimer (B). Color scheme is the same as in Figure 1.9 (methyltransferase is pink, clasp is cyan). C: Proposed catalytic mechanism for α-N-methylation. Color scheme remains the same (core peptide is orange, SAM/SAH is green). All data taken from Song et al.113

The determination of the OphMA structure also allowed Song et al. to propose a more detailed catalytic mechanism (Figure 1.10 C).113 While OphMA residues Y98 and S129 were already proposed by van der Velden et al. to play a role in substrate binding by coordinating SAM in the active site, other residues that coordinate the substrate core peptide in the active site were revealed by the crystal structure.60,113 These residues/tested mutants include Y63F, Y66F, Q172A, W400A, R72A, R72K, and Y76F, with the first four showing reduced methylations and the last two showing no methylations after HPLC-MS/MS and crystallographic analysis.113 Together with structures for wild type OphMA, mutants, and various co-crystals with SAM or SAH bound, Song et al. were able to propose a novel catalytic mechanism for the α-N-methylation of the OphMA core peptide. Briefly, as shown in Figure 1.10 C, an imidate is generated when water removes a proton from the substrate peptide.
This imidate is stabilized by hydrogen bonding to Y66 and Y76 in an oxyanion hole formed by the enzyme. R72 may aid in stabilizing the transfer of the proton from Y76, and thus from the imidic acid (bracketed in the figure), to complete the methylation reaction. The biochemical and structural data together paint an interesting picture for this enzyme. Additional examples of borosin RiPPs (and their modifying enzymes) will be needed to rigorously characterize this unique system, in which the peptide substrate is fused to its enzyme.

1.10 Contents of this thesis

The work in this thesis builds upon the literature presented for OphMA and seeks to expand the borosin family of RiPP natural products, learn about the catalytic mechanism, and determine the molecular structure and native role of bioinformatically identified borosin BGCs. Chapter 2 is a published article that details the discovery of additional OphMA homologs in fungi through bioinformatics and biochemical analyses, including two additional unique domain architectures and core peptide types.115 OphMA homologs were identified, cloned, heterologously expressed in E. coli, purified, and biochemically analyzed to identify α-N-methylations on the core peptides for this unique set of borosin precursors. Chapters 3 and 4 describe more distantly related OphMA homologs found in bacteria. Interestingly, while nearly all putative borosin precursors found in fungi exhibit the unique fusion of the core peptide to the methyltransferase, this architecture is very rarely seen in bacteria. Chapter 3 deals with a subset of these so-called “split borosins” found in bacteria, specifically focusing on putative borosins found in Rhodospirillum centenum SW and Streptomyces sp. NRRL S-118. We encountered difficulties in expressing and purifying the putative borosin methyltransferase and precursor proteins from these organisms.
However, as detailed in Chapter 4, our investigation into the putative split borosin BGC found in the bacterium Shewanella oneidensis MR-1 was extremely fruitful. Among the aforementioned bacterial borosins, the putative borosin from this organism has the shortest core peptide (and fewest methylations) and resides in a genetically tractable organism. For these reasons, we pursued a more in-depth study of this putative borosin BGC. Chapter 5 details the careful biochemical analysis of this borosin methyltransferase and precursor, including structural and kinetic experiments. Chapter 6 discusses progress towards isolation of the final natural product and discovery of its native role in S. oneidensis MR-1. Finally, Chapter 7, the concluding chapter, seeks to demonstrate how this thesis contributes to the body of literature in the field of RiPP biosynthesis. Much of the work in this thesis details the basic science and discovery motivating many research groups, but RiPP biosynthesis remains an attractive system for the development of custom peptide therapeutics. Discovery of split borosins, which adhere to the canonical RiPP biosynthetic logic and permit multiple substrate turnover of the core peptide, allows α-N-methylation to be added to the repertoire of PTMs in the development of custom ribosomal peptide natural products.

2 Distinct autocatalytic α-N-methylating precursors expand the borosin RiPP family of peptide natural products

Marissa R. Quijano,1,* Christina Zach,2,* Fredarla S. Miller,1 Aileen R. Lee,1 Aman S. Imani,1 Markus Künzler,2,† and Michael F. Freeman1,†

1Department of Biochemistry, Molecular Biology, and Biophysics and BioTechnology Institute, University of Minnesota-Twin Cities, St.
Paul, Minnesota, USA
2Department of Biology, Institute of Microbiology, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
*Equal contribution of authors
†Corresponding authors

This chapter was reprinted with permission from the Journal of the American Chemical Society (DOI 10.1021/jacs.9b03690). Copyright © 2019, American Chemical Society. Please see Appendix 1 (Chapter 9) for extensive supplemental information, figures, and tables for this chapter.

FSM cloned, heterologously expressed, purified, and characterized CeuMA2 (one of several newly classified type 1 borosins), PgiMA1 (the only characterized type 2 borosin), AboMA (the only characterized type 3 borosin), and PgiMA1_mut (the only truncation mutant analyzed in this study). She also heterologously expressed, purified, and characterized the inactive borosin precursors PgiMA2, BadMA, and CmuMA. ARL worked on the Gymnopus fusipes aspects of this paper. MRQ and CZ worked on the remaining precursors in this study. MRQ performed the bioinformatics analyses. ASI made the supplemental figures. MFF led the writing of the manuscript.

ABSTRACT: Backbone N-methylations impart several favorable characteristics to peptides including increased proteolytic stability and membrane permeability. Nonetheless, amide bond N-methylations incorporated as posttranslational modifications are scarce in nature and were first demonstrated in 2017 for a single set of fungal metabolites. Here we expand on our previous discovery of iterative, autocatalytic α-N-methylating precursor proteins in the borosin family of ribosomally encoded peptide natural products. We identify over fifty putative pathways in a variety of ascomycete and basidiomycete fungi, and functionally validate nearly a dozen new self-α-N-methylating catalysts. Significant differences in precursor size, architecture, and core peptide properties subdivide this new peptide family into three discrete structural types.
Lastly, using targeted genomics, we link the biosynthetic origins of the potent antineoplastic gymnopeptides to the borosin natural product family. This work highlights the metabolic potential of fungi for ribosomally synthesized peptide natural products.

2.1 Introduction

Fungi have proven to be rich in medically relevant metabolites since the discovery of the first antibiotic, penicillin.9 An estimated 47% of known microbial bioactive molecules are of fungal origin, compared to the 41% discovered in Actinomycetes and 12.5% in other bacteria.106 These natural products (NPs) comprise a wide array of polyketides, nonribosomal peptides, terpenoids, and other small molecules that have served as statins, anticancer compounds, and immunosuppressants (e.g., lovastatin, leucinostatin, and mycophenolic acid, respectively).106,116 Although found widely in bacteria, natural products from the class of ribosomally synthesized and posttranslationally modified peptides (RiPPs) are underrepresented in fungi.109 Since fungal RiPPs were first discovered in 2005, only four families have been identified: the amatoxins/phallotoxins,110 dikaritins (including the ustiloxins, phomopsins, and asperipins),59,117,118 the epichloëcyclins,109 and the recently identified borosins (Figure 2.1 A).60,100 Central to every RiPP gene cluster is the precursor, a peptide composed of the core amino acid sequence corresponding to the final natural product and, with one known exception, an N-terminal leader sequence that recruits auxiliary tailoring enzymes.48 The leader peptides generally comprise ~20-110 amino acids of the precursor, with the longest recorded, at ~400 amino acids, occurring in the biosynthesis of the only characterized borosins, the omphalotins (Figure 2.1 B).
These α-N-methylated cyclic natural products are produced by the basidiomycete fungus Omphalotus olearius.101 Omphalotin A is a potent nematicide (LD50 of 2 µg/ml) toxic to the plant pathogen Meloidogyne incognita, making it significantly more potent than the clinically used drug ivermectin.100 Although the mechanism of action for the omphalotins remains unclear, amide bond α-N-methylations have long been key markers of bioactivity, as evidenced in the immunosuppressant cyclosporin A and the antineoplastic agent dactinomycin.

Figure 2.1 RiPP NPs and their biosynthetic transformations
(a) α-Amanitin of the amatoxins/phallotoxins, ustiloxin B and asperipin-2a of the dikaritins, and omphalotin A of the borosins represent different RiPP families of fungal natural products. Members of the epichloëcyclin RiPP family have not yet been structurally defined. (b) Comparison between typical RiPP biosynthesis and borosin pathways. Generally, RiPPs are translated as short (<100 amino acids), monomeric precursor peptides that are subject to posttranslational modifications to yield the final metabolites. Borosins are biosynthetically distinct due to their large (>400 amino acids), dimeric, and iteratively acting autocatalytic precursors, which incorporate α-N-methylations into their peptide backbone. The borosin protomer is marked by a bold outline.

The α-N-methylated structural feature, previously thought to originate exclusively from non-ribosomal peptide biosynthetic pathways, is prized for imparting membrane permeability and protease evasion.119 However, biochemical characterization of Oph(M)A, the omphalotin precursor, revealed for the first time that ribosomally synthesized peptides are substrates for these unprecedented posttranslational modifications (PTMs).
Not only is this RiPP family distinguished by its α-N-methylations, but the first 250 amino acids of its uncharacteristically long leader sequence encode its own modifying enzyme, an iteratively acting S-adenosylmethionine (SAM)-dependent N-methyltransferase.60 Crystallographic interrogation of truncated OphMA variants and active site mutants revealed an elegant catenane-like structure, where the dimeric precursor’s subunits interweave and iteratively methylate each other’s C-termini.113 The N-methyltransferase domain precedes a ~150 amino acid clasp domain that wraps around the adjacent subunit to position the core peptide into the other subunit’s active site for iterative intermolecular methylation. An amalgamation of structural evidence, quantum-mechanical calculations, and in vitro experimentation led to a mechanistic proposal for α-N-methylation.113 Water-mediated deprotonation of the amide bond is thought to create an imidate anion intermediate that nucleophilically attacks the methyl group of SAM. The intermediate is stabilized by an oxyanion hole and through what would otherwise be a van der Waals clash between the substrate amide nitrogen and the methyl group of SAM. More recently, the crystal structure of the related dbOphMA homolog from Dendrothele bispora was also elucidated.114

Here we aim to expand and further define the borosin RiPP family through their substrate-fused α-N-methyltransferases, which are predominantly hosted by the Basidiomycota. We have identified over fifty putative borosin pathways and functionally characterized eleven new α-N-methylating catalysts. Unexpected differences among the borosin precursors revealed three distinct structural types. Lastly, we uncover that the potent antineoplastic gymnopeptides produced by the basidiomycete Gymnopus fusipes are biosynthesized via a borosin RiPP pathway.
The significant sequence conservation and unorthodox catalysis of borosin RiPP precursors afford opportunities in future genetic engineering of α-N-methylated peptides and the discovery of new bioactive metabolites, enzymes, and pathways.

2.2 Results and discussion

2.2.1 Identification of putative borosin pathways

Expansion of RiPP families is often reliant on the presence of recognition elements or modifying enzymes within a natural product gene cluster, as prototypical precursor peptides do not maintain large stretches of sequence similarity. A number of specialized RiPP-specific algorithms including BAGEL3,120 RODEO,121 RiPP-PRISM,79 and mass spectrometry-based approaches such as RiPPQuest122 and PepSAVI-MS123 have dramatically expanded the RiPP biosynthetic landscape. Fortunately, because the borosin family of RiPPs is characterized by the presence of a modifying enzyme within its leader, basic local alignment search tools (BLAST) may readily gather homologous precursors. With the curation of fungal genomes by the National Center for Biotechnology Information (NCBI) (~943 ascomycete and 307 basidiomycete genomes at the time of analysis) and the Joint Genome Institute's (JGI) 1000 Fungal Genomes Project (~660 ascomycete and 381 basidiomycete genomes), the largely untapped resource of fungal natural products can now be mined in silico. Initial protein-based BLASTp searches of OphMA’s leading 300 amino acids indicated a number of possible homologs encoded within the fungal subkingdom Dikarya. After recovering a number of genes with homology to uroporphyrin-III C/tetrapyrrole methyltransferases, we searched publicly available fungal transcriptome data, which verified 31 of these homologs as partially or fully transcribed. Through multiple alignment of these transcribed genes with our model OphMA, we observed conserved amino acid translations surrounding both splicing junctions.
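As an illustration of the query design described above, where only the N-terminal region of the precursor is used as the BLASTp query, the following Python sketch truncates a protein to its first 300 residues and assembles a local BLAST+ command. The function names, file names, database name, and E-value cutoff are all hypothetical, and the searches reported here may well have been run through web interfaces rather than a local installation.

```python
"""Sketch: build a truncated BLASTp query from the N-terminal region of a precursor."""


def n_terminal_query(name: str, protein: str, length: int = 300) -> str:
    """Return a FASTA record holding only the first `length` residues."""
    region = protein[:length]
    wrapped = [region[i:i + 60] for i in range(0, len(region), 60)]
    return f">{name}_first{len(region)}\n" + "\n".join(wrapped) + "\n"


def blastp_command(query_fasta: str, database: str, output_tsv: str,
                   evalue: float = 1e-5) -> list[str]:
    """Assemble a BLAST+ blastp call with tabular output (assumed cutoff, not from the text)."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-out", output_tsv,
    ]


if __name__ == "__main__":
    # Placeholder sequence for illustration only, NOT the real OphMA sequence.
    placeholder = "M" + "A" * 416
    print(n_terminal_query("precursor", placeholder))
    print(" ".join(blastp_command("query.fasta", "fungal_proteins", "hits.tsv")))
```

Restricting the query to the methyltransferase-containing region avoids penalizing hits whose clasp and core regions have diverged.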
Manual curation and prediction of additional splicing junctions conservatively revealed 42 putative borosin precursors encoded in basidiomycetes and 12 in ascomycetes, despite the sparsity of available basidiomycete genomes. Moreover, the number of basidiomycete-derived borosins is likely a vast underestimate given the prevalence of sequence gaps and unannotated/misannotated open reading frames caused by notoriously unpredictable basidiomycete RNA splicing patterns.124 For example, tBLASTn analysis of two Rhizopogon genomes identified >30 and >60 partial borosin hits in R. vinicolor and R. vesiculosus, respectively. In addition, several precursors with near-identical methyltransferase sequences and identical core peptides encoded within the same genome were excluded from our analysis. As there is little to no sequence similarity among the clasp domains (252-378 of OphMA) of our curated homologs, we gauged the relatedness of these putative precursors by performing Bayesian phylogenetic analysis on the borosin methyltransferase domains (Figure 2.2). Protein sequences, identification numbers, and information concerning the surrounding encoded proteins can be found in Table 9.1. The N-methyltransferase domains have 57.1% amino acid identity and 73.2% sequence similarity among the identified borosin precursors. Lack of resolution within deeper branches of the tree obscures a clear evolutionary history. However, high conservation of the two exon junctions found in OphMA could suggest a common ancestor despite some exon number variability (Table 9.2).125,126

Figure 2.2 Phylogenetic tree of putative borosin precursors. Branching of domains corresponding to Gly10-Ala252 of OphMA is supported by Bayesian posterior probability values listed above, with the methyltransferase CobA from Bacillus megaterium used as the outgroup. The exterior ring denotes the number of unique, curated borosin precursors from each host genome of the borosin precursor listed.
Active (filled) and inactive (hollow) precursors tested in this manuscript are highlighted in yellow. Previously characterized borosin precursors are signified in white. More detailed information concerning protein sequence, originating host, and the sequence alignment used to construct this tree can be found in Table 9.1 and Figure 9.1.

RiPP families are signified by one or more characteristic structural features installed by conserved post-translational modifying enzymes.119 Fungi, much like bacteria, often cluster natural product genes responsible for biosynthesis, export, and resistance in genomic loci.106 Our previous work identified a gene cluster in D. bispora encoding cytochrome P450s and a protease homologous to those involved in the modification and cyclization of omphalotins.60 Three additional gene clusters in the genus Lentinula are homologous to the omphalotin gene cluster, encoding proteins highly similar to OphMA (LedMA, LlaMA, LraMA) as well as the prolyl oligopeptidase necessary for peptide cyclization and C-terminal recognition peptide release.105 As Lentinula edodes, also known as the shiitake mushroom, is consumed by humans worldwide, we tested whether the omphalotin-like metabolite was produced in the fruiting bodies. Transcriptional analysis at various stages of fruiting body growth did not detect the borosin precursor gene (data not shown). However, the precursor was transcriptionally identified in the mycelia as in O. olearius, which is in line with the presumed function of the omphalotins as nematode feeding deterrents. To track whether these or any other genes may be conserved in borosin biosynthesis, we screened 15 genes upstream and downstream of all identified precursors, and a subset of this analysis is shown in Figure 9.2. Almost no synteny or gene conservation is present among the remaining borosin gene clusters.
While proteases/peptidases are not always clustered with RiPP biosynthetic enzymes,48 those that are co-localized with borosin precursors are quite divergent from one another and suggest some of the putative metabolites could be linear as well as cyclic peptides. Intriguingly, the borosin gene cluster in Porodaedalea chrysoloma encodes several DUF3328 proteins homologous to UstYa/b and AprY in ustiloxin and asperipin-2a biosynthetic gene clusters, respectively.109,118 The DUF3328 oxidase AprY was shown to be involved in the cyclization of asperipin-2a, which might suggest homologs have a similar function for the borosin encoded in P. chrysoloma.127 Among the clusters, scaffolding proteins that facilitate protein-protein interactions including WD40 repeats, F-box domains, and leucine-rich repeats are generally abundant, along with enzymes homologous to ubiquitin E3 ligases and P450 oxidative enzymes.128

2.2.2 Validation of borosin precursors

To test whether the mined genes were in fact borosin precursors, we first selected several transcriptionally supported sequences for heterologous expression in E. coli, focusing primarily on basidiomycete-derived borosins. Eight putative precursors (CeuMA2, CmaMA, CmiMA, GjuMA, LedMA, MroMA1, PocMA, SveMA) were found to be active α-N-methyltransferases as evidenced by in vivo E. coli expression for 24 h and 72 h followed by high-resolution, high-pressure liquid chromatography-mass spectrometric (LC-MS/MS) analysis of the digested proteins (Figure 2.3 A). Six additional precursors (CmuMA, BadMA, RviMA1, RviMA2, GesMA, CpeMA) were either insoluble or inactive under the tested conditions. A subset of the precursors (CmaMA, CmiMA, LedMA, MroMA1, PocMA, SveMA) were verified by size exclusion chromatography to be homodimers as seen in the elegant catenane-like structures of OphMA113 and dbOphMA.114 The active α-N-methyltransferase domains, similar to OphMA, methylate in an N-to-C fashion on primarily hydrophobic core peptides.
Directionality is inferred from less methylated precursors observed during 24 h versus 72 h in vivo expressions; an example of these data is presented in Figure 2.3 B for CeuMA2. Integration of LC-MS peaks at different fermentation intervals suggests the lesser methylated species are intermediates in the production of the more abundant, highly methylated precursors. A thorough analysis of all detected peptide fragments for data summarized in Figure 2.3 can be found in Figure 9.3. Interestingly, SveMA appears to initiate α-N-methylation at several residues and does not appear to methylate in a stringent pattern. SveMA plasticity in PTM initiation and distribution is reminiscent of other N-methyltransferases in bacterially derived RiPP biosynthetic pathways.65 As a final verification for autocatalysis, CmaMA, LedMA, MroMA1, and SveMA were shown to be active in vitro. Time- and SAM-dependent population shifts to more highly methylated species further support our in vivo data and inferences for methylation directionality on the core peptides (Figure 9.4).

Figure 2.3 Borosin precursors identified and functionally characterized in this study. Cartoon representations for the corresponding borosin precursor protein architecture are shown at the top of relevant panels. Each peptide sequence depicts a proteolytic fragment comprising the C-terminal core region of the respective borosin precursor heterologously expressed in E. coli. Methylated amino acids are represented as open and filled orange circles based on LC-MS/MS data. (a) Methylation summary for newly verified type I borosins having the same overall architecture as OphMA. Asterisks (*) denote alternative methylation initiation sites. (b) An example of LC-MS/MS data and relative abundance calculations (percentages on the right) for all methylated fragments detected for the borosin precursor CeuMA2. LC-MS/MS fragmentation of the major CeuMA2 methylated peptide is also shown.
Further LC-MS/MS for all data summarized in this figure can be found in Figure 9.3. Genomic information for the gene clusters encoding the characterized borosin precursors can be found in Table 9.1 and Table 9.2. (c) Methylation summary for the type II borosin precursor PgiMA1 encoding approximately ten near-identical core repeats. Non-consensus amino acids are colored grey. Peptide fragments were proteolytically cleaved between the dashed lines and analyzed by LC-MS/MS. (d) Methylation summary for the type III borosin precursor AboMA. Only the first 100 amino acids of the clasp domain have homology to any characterized proteins. The full putative C-terminal core is shown at the top of the panel for perspective. LC-MS/MS fragments flanked by dashed arrows are positionally ambiguous in the core sequence. NMT = N-methyltransferase; AA = amino acids; lowercase ‘c’ in peptide fragments denotes iodoacetamide-derivatized cysteine.

Crystallographic and mutational interrogation of OphMA suggested three residues (Tyr66, Arg72, and Tyr76) are part of an oxyanion hole and aid in the deprotonation of the backbone amide hydrogen to enable nucleophilic attack of SAM. These residues are conserved in all of the active borosin precursors, with the exception of the equivalent Phe66 exchange in PgiMA1 (Figure 9.1). This residue replacement is in good agreement with the active Tyr66Phe OphMA mutant that revealed both active and inactive conformations of the core peptide in its structure.113

2.2.3 Distinct borosin precursor structural types

Upon closer inspection of the variable borosin clasp and core domains, several precursors had distinct structural differences in their overall architecture. The saprophytic basidiomycete Phlebiopsis gigantea encodes the borosin precursor PgiMA1 that harbors ten near-identical 13-amino-acid core peptides, a feature seen in several RiPP families including the fungus-derived ustiloxins117 and asperipin-2a.109 Heterologous expression in E. coli revealed methylation of aspartic acid residues spanning over 120 amino acids of the repeated cores (Figure 2.3 C). Interestingly, the precursor appears insensitive to the number of repeated core peptides as a mutant with seven deleted repeats (PgiMA1_mut) was still fully active (Figure 9.3). While similar repeats in other fungal RiPP precursors and proteins are cleaved by the Kex2 protease,109,112,129 the prototypical Kex2 recognition motif is not found in PgiMA1. A protease belonging to the peptidase_M64 family is found in the genes surrounding PgiMA1 (Figure 9.2); the IgA peptidase (Clostridium ramosum-type) in this family is known to cut C-terminal to proline residues according to the MEROPS database.130 Several borosin precursors, including AboMA, TisMA, TelMA, and ApeMA, deviated even further from the canonical architecture of OphMA. While the N-methyltransferase domain and first 100 amino acids of the clasp regions were homologous to all borosins, additional ~400-amino-acid domains followed by highly repetitive acidic core sequences ranging from ~60 to 80 amino acids in length were observed. The new domains in these ~90 kilodalton borosin precursors do not have any homology in sequence (HHpred)131 or structure (Phyre2 prediction)132 to characterized proteins. When heterologously expressed in E. coli, AboMA revealed an impressive level of methylation in its C-terminal sequence (Figure 2.3 D). Due to the technical challenges of working with these long repetitive sequences, non-specific proteolytic digestion with proteinase K yielded the clearest data. Approximately 20 amino acids on a single 38-mer peptide fragment were found to be α-N-methylated on sequential valines and threonines in this VDVTD repeat, where we expect up to 35 methylations on the fully mature peptide precursor.
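Methylation counts such as these are read from LC-MS/MS data as mass shifts: each α-N-methylation replaces an amide hydrogen with a methyl group, a net gain of CH2 (about 14.016 Da monoisotopic). The following is a minimal sketch of that arithmetic, not the analysis pipeline used in this work. The function name and the example peptide mass are hypothetical and chosen only for illustration.

```python
# Monoisotopic masses in Da (standard values).
CARBON = 12.0
HYDROGEN = 1.00782503
PROTON = 1.00727647

# Replacing an amide N-H with N-CH3 adds a net CH2 to the peptide.
METHYL_SHIFT = CARBON + 2 * HYDROGEN  # about 14.01565 Da


def methylated_mz(unmodified_mass: float, n_methyl: int, charge: int) -> float:
    """Return the m/z of the [M + zH]z+ ion of a peptide carrying n_methyl methylations.

    unmodified_mass is the neutral monoisotopic mass of the unmethylated peptide.
    """
    if charge < 1 or n_methyl < 0:
        raise ValueError("charge must be >= 1 and n_methyl must be >= 0")
    neutral = unmodified_mass + n_methyl * METHYL_SHIFT
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    base_mass = 3800.0  # hypothetical neutral mass of an unmethylated 38-mer
    for n in (0, 1, 20):
        print(n, round(methylated_mz(base_mass, n, charge=3), 4))
```

At charge state z, peptides that differ by one methylation are separated by METHYL_SHIFT / z on the m/z axis, which is how a ladder of methylation states shows up in a spectrum.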
Oligopeptide repeats have been observed in proteins from all domains of life.133 The VDVTD repeat in AboMA is reminiscent of pentapeptide repeat proteins that can form structures such as β-helices and β-solenoid structures.134 Ongoing studies are aimed at determining the structure and function of these peculiar borosin-derived peptides. To the best of our knowledge, these peptides are the most heavily α-N-methylated peptides or proteins observed to date. Due to the marked differences of borosin precursors, we propose further classification within this family based on their distinct protein architectures (Figure 2.3 A-D). We designate as type I borosins the canonical OphMA-type precursors of ~400 amino acids in length with a single core peptide. Type II borosins, with PgiMA1 as the only verified member, are signified by multiple core sequences C-terminal to the N-methyltransferase and clasp domains of ~400 amino acids in total length. Finally, the type III borosins are defined by the overall architecture of AboMA, and are distinguished by their additional 400-amino-acid C-terminal domain followed by long and highly repetitive core sequences.

2.2.4 Linking borosin gene clusters to metabolites

Despite having expression conditions for multiple fungal strains and gene clusters listed above, we have yet to link any cluster to its natural products. This may be in part due to similarly low levels of production seen with the omphalotins, where fermentations of 200 L were necessary to isolate the more highly oxidized derivatives.103 As an alternative, we performed structural searches of α-N-methylated peptides that may stem from borosin pathways in unsequenced microorganisms. In fungi, the vast majority (>95%) of the ~15,000 isolated fungal natural products have not been linked to their biosynthetic origins.
Thus, many peptides may be mistakenly assumed to originate from nonribosomal peptide synthesis, just as the omphalotins were at their discovery.106,135 The gymnopeptides, 18-mer N-to-C cyclic peptides recently isolated from the fruiting bodies of the oak pathogen Gymnopus fusipes (formerly Collybia fusipes), stood out as potential borosin candidates despite the lack of genomic information (Figure 2.4 A).136 These potent antiproliferative peptides are up to 1000 times more potent than cisplatin against several cancer cell lines, and are composed of entirely proteinogenic L-amino acids with α-N-methylations at 10 out of 18 amide bonds.137 To determine whether the gymnopeptides are biosynthesized via the ribosome as borosins, we proceeded with a nested degenerate PCR approach using conserved sequences in Agaricales-derived borosin precursors and the gymnopeptide sequences (Figure 9.5).64 A ~400 base pair band with homology to OphMA was amplified out of G. fusipes MUCL 28262. Next, we performed inverse PCR138 on self-ligated segments of the G. fusipes genome to confirm the gymnopeptide sequence was encoded in a borosin precursor. The final sequence of borosin GymMA1 was determined through creating and screening a phage-assisted E. coli fosmid library of the G. fusipes genome.139 After intron prediction and cloning into an E. coli expression vector, heterologous expression and LC-MS/MS analysis revealed a methylation pattern in exact agreement with the gymnopeptides (Figure 2.4 B and Figure 9.6). Thus, the gymnopeptides join the omphalotins as the second set of bioactive peptides from the borosin family of RiPP natural products. Gymnopeptide B possesses a β-hairpin-like structure containing cis amide bonds between residues Val7-Ala8 and Thr15-Val16.136 In proteins, β-hairpins are often surface-exposed motifs involved in protein-protein interactions, and are frequently found in antibodies and cytokine receptors.
Consequently, β-hairpins can also be found in a wide variety of peptide natural products that include gramicidin S, ω-conotoxin, defensins, cyclotides, and many antimicrobial peptides.140 Interestingly, in the type-IV-like β-turn at Val7-Ala8 of the gymnopeptides, the proline usually required at the i+3 position is replaced by an α-N-methylated amino acid, a property that has been observed in model synthetic peptides.141 Thus, borosin peptides, with their exclusive properties of genetically templated residues resulting in α-N-methylated amino acids, can survey a wide variety of β-hairpin motifs and other structures otherwise inaccessible by peptides and proteins produced by the ribosome.

Figure 2.4 Structures of the gymnopeptides and the corresponding borosin precursor analysis. (a) The structures of gymnopeptides A and B differ by serine or threonine at position 15, respectively. (b) LC-MS/MS data revealing that the methylation pattern and sequence of the borosin precursor GymMA1 perfectly match gymnopeptide B. Residue numbering is as suggested for OphMA60 and in line with RiPP nomenclature,48 where italicized residues are numbered sequentially starting from the core peptide and ‘+1’ begins with the C-terminal recognition sequence that is presumably cleaved off during cyclization. For the full sequence of GymMA1 and all the LC-MS/MS data for GymMA1-methylated fragments, please see Table 9.1 and Figure 9.6, respectively.

2.3 Conclusion

This work outlines the biosynthetic landscape of the α-N-methylated borosin RiPP family of natural products. Through genome mining and heterologous expression, over 50 putative gene clusters encoded in basidiomycete and ascomycete fungi were identified.
Through catalytic validation of over 10 autocatalytic borosin precursors, two additional borosin precursor structural types were discovered, with type II precursors defined by multiple core sequences and type III characterized by extraordinarily long catalytic leaders and highly repetitive acidic core sequences. Lastly, our evidence advocates that the antineoplastic gymnopeptides are biosynthesized via a borosin pathway. Basidiomycetes appear to be particularly robust hosts for borosin natural products, as 25 species out of several hundred sequenced genomes were found to encode one or more borosin pathways. With over 30,000 basidiomycete species, 60,000 ascomycetes, and five million total fungi currently estimated to exist on Earth, the projected biosynthetic capabilities and catalytic diversity of fungi are staggering.142 Thus, this publication adds to the small collection of research highlighting the untapped potential of fungi, especially mushrooms, for producing RiPP natural products.

2.4 Materials and methods

Please see Appendix 1 (Chapter 9) for extensive supplementary tables and figures.

2.4.1 Materials

HiFi DNA Assembly Master Mix, restriction enzymes, OneTaq and Q5 High Fidelity DNA polymerase were purchased from New England Biolabs (NEB). Gene synthesis and codon optimization were performed at Genscript and SGI-DNA (sequences found in Table 9.1). Commercial proteases were purchased from Promega (sequencing-grade trypsin, AspN, and chymotrypsin) or Gold Biotechnology (proteinase K). Primers were ordered from IDT and are listed in Table 9.1. Unless otherwise stated, chemicals and reagents were purchased from MilliporeSigma. Gymnopus fusipes MUCL 28262 was purchased from the Belgian Co-Ordinated Collections of Micro-organisms. Anomoporia bombycina ATCC 64506 was purchased from the American Type Culture Collection.
2.4.2 Borosin identification and phylogenetic analysis

The Joint Genome Institute (JGI, genome.jgi.doe.gov/programs/fungi/index.jsf) and National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov) were searched for OphMA homologues using the Basic Local Alignment Search Tool for proteins (BLASTp) with a BLOSUM62 scoring matrix and a standard Expect-value cutoff of 1.0E-5. For the initial query, an OphMA sequence fragment including both the N-methyltransferase domain and the first 100 amino acids of the clasp domain was used. Genomes encoding putative homologues were overlaid with publicly available transcript data from the above stated repositories using the program Geneious R10 (www.geneious.com). RNA-confirmed borosin sequences were used to manually curate predicted splicing junctions for untranscribed borosin homologues. For sequence alignments, putative borosin protein sequences were trimmed to their N-methyltransferase domains, spanning the region corresponding to Gly10 to Ala252 of OphMA (Table 9.1).
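The trimming step described above, cutting each homologue to the region that corresponds to Gly10-Ala252 of OphMA, can be sketched as a mapping from reference residue numbers to alignment columns. This is only an illustration under stated assumptions: it presumes a preliminary gapped alignment that includes the reference, and the function names and toy sequences are hypothetical rather than taken from the workflow used here.

```python
def reference_columns(aligned_ref: str, first_res: int, last_res: int, gap: str = "-"):
    """Map reference residues first_res..last_res (1-based, inclusive) to
    alignment columns, returned as (start, end) with end exclusive."""
    start = end = None
    residue = 0
    for col, char in enumerate(aligned_ref):
        if char == gap:
            continue  # gap columns do not advance the residue counter
        residue += 1
        if residue == first_res:
            start = col
        if residue == last_res:
            end = col + 1
            break
    if start is None or end is None:
        raise ValueError("reference is shorter than the requested residue range")
    return start, end


def trim_to_reference(alignment: dict, ref_id: str, first_res: int, last_res: int) -> dict:
    """Slice every aligned sequence to the reference-defined window and drop gaps."""
    start, end = reference_columns(alignment[ref_id], first_res, last_res)
    return {name: seq[start:end].replace("-", "") for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment; real input would be the gapped OphMA/homologue alignment.
    toy = {"ref": "MK-TAYG-LA", "hom": "MRSTA-GQLA"}
    print(trim_to_reference(toy, "ref", 3, 6))
```

For the domains discussed in this section, the call would use first_res=10 and last_res=252 with OphMA as the reference entry.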
Protein sequences were aligned using the MAFFT plugin (v7.388) for Geneious Prime (2019.0.4; www.geneious.com) with the following parameters: [Algorithm: Auto; Scoring Matrix: BLOSUM62; Gap open penalty: 1.53; Offset value: 0.123].143 Bayesian inference was used to estimate posterior probability and construct a phylogenetic cladogram of 54 putative borosins using the MrBayes (3.2.6) plugin for Geneious Prime with the following parameters: [Rate Matrix (fixed): mtrev; Rate Variation: invgamma; Outgroup: CobA from Bacillus megaterium; Gamma Categories: 4; Chain Length: 1,100,000; Heated Chains: 4; Heated Chain Temp: 0.2; Subsampling Freq: 1,000; Burn-in Length: 110,00; Random Seed: 17,051].144

2.4.3 Cloning and gene synthesis

Gene sequences for aboMA, badMA, ceuMA2, cmaMA, cmiMA, cmuMA, cpeMA, gesMA, gjuMA, ledMA, mroMA1, pocMA, rviMA1, rviMA2, and sveMA were fully or partially confirmed by RNA sequencing data available from the Joint Genome Institute (JGI, genome.jgi.doe.gov/programs/fungi/index.jsf). For all genes, splicing junctions were manually inspected and predicted based on sequence-confirmed borosin precursors. Gene synthesis, codon optimization for E. coli, and cloning into pET24a were performed by Genscript for genes cmiMA, cmaMA, cpeMA, gesMA, ledMA, mroMA1, rviMA1, rviMA2, and sveMA. Genes badMA, ceuMA2, cmuMA, gjuMA, and pocMA were synthesized and codon optimized for E. coli by SGI-DNA and cloned via Gibson assembly into pETDUET-1 following the manufacturer’s suggestions. Each gene was amplified with Q5 high fidelity DNA polymerase using primers Fwd_SGI_Order_GibsonPCR and Rev_SGI_Order_GibsonPCR (Table 9.1) to add homology arms for Gibson assembly into a linear backbone. The PCR reaction was prepared according to the manufacturer’s recommendations (1x Standard Q5 PCR buffer, 200 µM dNTPs, 0.5 µM final concentration of each primer, 0.02 U Q5 high fidelity DNA polymerase / 50-µL reaction) with 5% DMSO added.
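The final concentrations listed for this reaction convert to pipetting volumes through the dilution relation C1·V1 = C2·V2. The sketch below shows that conversion for a 50-µL reaction. The stock concentrations (5x buffer, 10 mM dNTPs, 10 µM primers, neat DMSO) are assumptions made for illustration and are not stated in the text; polymerase and template are left out because their stock concentrations are likewise not given.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """Volume of stock needed so that final_conc is reached in total_ul.

    final_conc and stock_conc must share the same unit (C1*V1 = C2*V2).
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * total_ul / stock_conc


def q5_mix_volumes(total_ul: float = 50.0) -> dict:
    """Per-reaction volumes in µL. All stock concentrations here are assumed, not sourced."""
    return {
        "5x buffer (to 1x)": stock_volume_ul(1, 5, total_ul),
        "10 mM dNTPs (to 200 µM)": stock_volume_ul(200, 10_000, total_ul),
        "10 µM forward primer (to 0.5 µM)": stock_volume_ul(0.5, 10, total_ul),
        "10 µM reverse primer (to 0.5 µM)": stock_volume_ul(0.5, 10, total_ul),
        "DMSO (to 5% v/v)": stock_volume_ul(5, 100, total_ul),
    }


if __name__ == "__main__":
    for component, volume in q5_mix_volumes().items():
        print(f"{component}: {volume:.2f} µL")
```

The remaining volume is made up with polymerase, template, and water.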
The thermal cycler conditions were programmed according to manufacturer’s instructions: the DNA was denatured at 98 °C for 30 s followed by 30 cycles consisting of 10 s at 98 °C, 62.1 °C for 30 s, and 30 s at 72 °C, with a final extension at 72 °C for 2 minutes. Gene aboMA was codon-optimized, synthesized, and cloned into pETDUET-1 by SGI-DNA. All genes were expressed encoding N-terminal histidine tags; sequences are listed in Table 9.1. Gene pgiMA1 was directly cloned from Phlebiopsis gigantea genomic DNA as three exons and assembled using overlap extension PCR.145 Genomic DNA was extracted in the same manner as described below for G. fusipes except P. gigantea was grown on YMD solid media (0.4% yeast extract, 1.0% malt extract, 0.4% dextrose, 1.5% agar). Aliquots (3 µL) of genomic DNA (320 ng/µL) were used in each 25-µL PCR. PCR reactions were prepared as stated above. The first exon of pgiMA1 was amplified using primers PhlgiNMT232.1F and PhlgiNMT232.1R for 30 cycles at 98 °C for 7 s, 59 °C for 10 s, and 72 °C for 15 s. The second exon was amplified using primers PhlgiNMT232.2Fnew and PhlgiNMT232.2R for 30 cycles of 98 °C for 7 s, 68 °C for 15 s, and 72 °C for 15 s. The third exon was amplified using primers PhlgiNMT232.3abF and PhlgiNMT232.3aR for 30 cycles of 98 °C for 10 s, 59 °C for 30 s, and 72 °C for 30 s. Verified PCR products were excised, purified (Monarch gel extraction kit, NEB) and combined in a 1:1:1 molar ratio and used as template in an overlap extension PCR, where primers GibPhlgiNMT232.1F and GibPhlgiNMT232.3aR were added after the first 5 cycles of 98 °C for 10 s, 68 °C for 30 s, and 72 °C for 40 s. The annealing temperature was then decreased to 58 °C for the remaining 25 cycles. The PCR fragment was then cloned into pET28b via Gibson assembly using the manufacturer’s instructions and sequence-verified.
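Combining the three exon fragments in a 1:1:1 molar ratio for overlap extension PCR means weighing out masses proportional to fragment length. The sketch below uses the common approximation of about 650 g/mol per base pair of double-stranded DNA; the exon lengths and the target amount are hypothetical placeholders, since the text does not report them.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA, a widely used approximation


def ng_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass in ng of a dsDNA fragment that corresponds to the given amount in pmol."""
    # pmol * (g/mol) gives picograms; divide by 1000 for nanograms.
    return pmol * length_bp * AVG_BP_MASS / 1000.0


def equimolar_masses(lengths_bp: dict, pmol_each: float) -> dict:
    """Mass of each fragment needed so that every fragment is present at pmol_each."""
    return {name: ng_for_pmol(pmol_each, bp) for name, bp in lengths_bp.items()}


if __name__ == "__main__":
    # Hypothetical fragment lengths, for illustration only.
    exons = {"exon 1": 300, "exon 2": 450, "exon 3": 600}
    for name, ng in equimolar_masses(exons, pmol_each=0.05).items():
        print(f"{name}: {ng:.2f} ng")
```

Dividing each mass by the measured concentration of the purified fragment (ng/µL) gives the volume to pipette.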
To create the truncated pgiMA1_mut gene, the above plasmid was used as DNA template for two PCRs using Q5 high fidelity DNA polymerase. To amplify the backbone, primers prFM1118 and prFM1119 were used in a PCR reaction for 30 cycles at 98 °C for 10 s, 72 °C for 30 s, and 72 °C for 4 minutes. The truncated gene was amplified using primers prFM1116 and prFM1117 for 30 cycles at 98 °C for 7 s, 57 °C for 20 s, and 72 °C for 5 s. The insert and backbone were verified and purified by agarose gel and combined in a Gibson assembly according to the manufacturer’s instructions. Gene pgiMA1_mut, with eight putative near-identical core peptide repeats removed, was then sequence-verified.

2.4.4 Protein expression and purification

Protein expression and purification were performed as described previously.60 Briefly, genes were expressed in BL21(DE3) cells at 16 °C for 24 h and 72 h, with the exception of pgiMA1, pgiMA1_mut, and gymMA1, which were expressed in Rosetta(DE3) cells. Cells were harvested and lysed using either French Press or sonication. Recombinant proteins were purified via nickel-chelate chromatography based on manufacturers’ recommendations (Ni-NTA, Gold Biotechnology). For CmiMA, CmaMA, LedMA, MroMA1, and SveMA, imidazole was removed and protein was concentrated using Amicon Ultra filters (30-kDa MWCO) and loaded onto a pre-equilibrated Superdex-200 Increase size exclusion column. Dimeric protein was collected and concentrated to 4.0 mg/mL, determined by Pierce BCA protein assay kit. For GjuMA and PocMA, a HiLoad 16/600 Superdex 200 pg size exclusion column was used.

2.4.5 Proteolytic digestion

Proteolytic digestion of CmiMA, CmaMA, LedMA, MroMA1, and SveMA was performed as previously described.60 Briefly, purified protein was digested in solution with trypsin in a molar ratio of 1:80 for 5 h at 37 °C. AboMA, CeuMA2, GjuMA, GymMA1, and PocMA were digested using an in-gel digestion method.
Appropriate bands from soluble fractions were excised from SDS-PAGE and cut into ~2 mm x 2 mm cubes. Gel pieces were placed in 1.5 mL LoBind tubes (Eppendorf) and then washed with a 1:1 ratio of 100 mM ammonium bicarbonate (ABC) : acetonitrile (ACN) three times until all the dye was removed. Gel pieces were then dehydrated in 100% ACN until semi-opaque (~30 s), after which the ACN was discarded. If the putative RiPP core sequences contained cysteines, reduction (treatment with 10 mM DTT in a 65 °C water bath for 1 h before discarding DTT solution) and alkylation (treatment with 55 mM iodoacetamide in 50 mM ABC at room temperature for 30 minutes) were performed. After reduction and alkylation, gel pieces were washed twice using a 1:1 ratio of ACN:ABC and then dehydrated in 100% ACN until semi-opaque as in the previous steps. Gel pieces were then rehydrated in digestion buffer (50 mM ABC, 5 mM CaCl2, and appropriate units of protease) on ice for 15 minutes before overnight incubation at 37 °C, or for chymotrypsin, 25 °C. CeuMA2, GjuMA, GymMA1, and PocMA proteolytic digests were performed with AspN using a 1:50 protease:protein molar ratio. PgiMA1 and PgiMA1_mut proteolytic digests were performed with chymotrypsin using a 1:200 molar ratio. AboMA digests were performed with proteinase K using a 1:4 molar ratio. Digestion supernatants were recovered the next day and placed in a fresh LoBind tube. Digested peptides were recovered by dehydrating the gel pieces in two successive steps. First, 60 µL of 50% ACN and 0.3% formic acid (FA) was added, incubated for 15 minutes at room temperature and recovered. Second, 60 µL of 80% ACN and 0.3% FA was added, incubated and recovered. The extracted peptides were pooled and frozen at -80 °C for 30 minutes to deactivate the protease. Peptide solutions were then thawed and dried using a SpeedVac (Eppendorf).
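The protease:protein molar ratios quoted above (for example 1:50 for AspN or 1:200 for chymotrypsin) convert to a protease mass once the molecular weights of both proteins are known. The sketch below shows the conversion. The molecular weights and substrate amount in the example are hypothetical round numbers chosen to make the arithmetic easy to follow, not values from this work.

```python
def protease_mass_ug(protein_ug: float, protein_kda: float,
                     protease_kda: float, protein_per_protease: float) -> float:
    """Mass of protease (µg) giving a 1:protein_per_protease protease:protein molar ratio.

    µg divided by kDa gives nmol, so the units cancel without further conversion.
    """
    if min(protein_ug, protein_kda, protease_kda, protein_per_protease) <= 0:
        raise ValueError("all arguments must be positive")
    protein_nmol = protein_ug / protein_kda
    protease_nmol = protein_nmol / protein_per_protease
    return protease_nmol * protease_kda


if __name__ == "__main__":
    # Hypothetical example: 80 µg of a 46.6-kDa substrate, a 23.3-kDa protease, 1:80 ratio.
    print(round(protease_mass_ug(80.0, 46.6, 23.3, 80), 3), "µg protease")
```

Because the ratio is molar, a protease half the size of its substrate needs only half the mass that a weight-based 1:80 ratio would suggest.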
Peptides were resuspended in 0.1% FA and further purified and desalted using C18 ZipTips according to the manufacturer’s specifications. After drying the samples again, peptides were resuspended in 15-30 µL of 20% ACN, 0.1% FA, and transferred to glass vials for MS analysis.

2.4.6 Peptide mass spectrometric analysis (LC-MS/MS)

LC-MS/MS measurements of digested peptides from CmiMA, CmaMA, LedMA, MroMA1, and SveMA were performed as described previously on a Thermo Scientific Q Exactive mass spectrometer equipped with a Dionex Ultimate 3000 UHPLC system using a Phenomenex Kinetex 2.6 μm C18 100 Å (150 × 4.6 mm) column.60 Samples AboMA, CeuMA2, GjuMA, GymMA1, and PocMA were run similarly to method 2 described previously.113 Briefly, data were recorded on a Thermo Scientific Fusion mass spectrometer equipped with a Dionex Ultimate 3000 UHPLC system using a nLC column (200 mm × 75 μm) packed using Vydac 5-μm particles with a 300 Å pore size (Hichrom Limited). Elution was performed with a linear gradient using water with 0.1% FA (solvent A) and ACN with 0.1% FA (solvent B) at a flow rate of 0.3 µL/min. The column was equilibrated with 20% solvent B for 5 min, followed by a linear increase of solvent B to 85% over 32 min and a final elution step with 85% solvent B for 2 min. Mass spectra were acquired in positive-ion mode. Full MS was done at a resolution of 60,000 [automatic gain control (AGC) target, 4 × 10⁵; maximum injection time (IT), 50 ms; range, 300 to 1800 m/z], and data-dependent as well as targeted MS/MS was performed at a resolution of 15,000 (AGC target, 5 × 10⁵; maximum IT, 500 ms; isolation window, 2.2) using higher-energy collisional dissociation (HCD). HCD collision energies from 14-20% with steps of ±4% were used during LC-MS/MS measurements. Data were processed using Thermo Fisher Xcalibur software and MaxQuant as previously described.113

2.4.7 RNA expression of ledMA in Lentinula edodes mycelium and fruiting body

RNA was isolated from L.
edodes mycelium and fruiting bodies. L. edodes mycelium was grown on cellophane disks (Celloclair) on top of agar plates containing yeast extract maltose agar for 10 days at 23 °C. Various fruiting body stages were harvested from an L. edodes mycelium block, growing fruiting bodies at 23 °C throughout 8 days. For RNA isolation, 400 mg flash-frozen mycelium or fruiting body were mixed with 200 µL Qiazol (Qiagen) and 200 mg of 0.5 mm glass beads. Cell lysis was done in three FastPrep (Thermo Savant) steps of 45 s at levels 4.5, 5.5 and 6.5. Between each step the samples were incubated on ice for 5 minutes. Another 800 µL Qiazol and 200 µL chloroform were added and the samples were centrifuged for 15 minutes at 4 °C and 12,000 x g. The aqueous phase was used for RNA isolation with the RNeasy Lipid Tissue Mini Kit (Qiagen) following the manufacturer’s instructions. From 2 μg isolated RNA, cDNA was synthesized using the Transcriptor-first strand cDNA synthesis kit (Roche Applied Science) according to the manufacturer’s protocol. Expression of ledMA mRNA in various fruiting body stages and mycelium was verified by PCR amplification of the predicted ledMA sequence from the generated cDNA using Phusion high-fidelity DNA polymerase and primers ledA_fwd and ledA_rev.

2.4.8 Genomic DNA isolation of G. fusipes

Gymnopus fusipes MUCL 28262 was grown for approximately one month on a porous cellophane membrane disk over Blakeslee agar media (2% dextrose, 2% malt extract, 1% peptone, 1.5% agar). DNA was extracted using the CTAB method published previously.146 Briefly, ~300 mg of fungal biomass was flash-frozen in liquid nitrogen and crushed with a sterile mortar and pestle to a fine powder. The biomass was transferred into a 2-mL microcentrifuge tube and mixed with 500 µL of 2% CTAB buffer (2% cetyltrimethylammonium bromide (CTAB), 100 mM Tris pH 8.0, 20 mM EDTA, 1.4 M NaCl, 1% polyvinylpyrrolidone 40, and 0.2% β-mercaptoethanol added immediately prior to use).
Samples were warmed at 65 °C for 30 minutes with intermittent gentle mixing. Phenol-chloroform-isoamyl alcohol 25:24:1 (500 µL) was added to the sample and placed on a gentle rocker for 20 minutes. Samples were spun down at ~14,000 x g for 5 minutes and the top layer was transferred to a fresh 2-mL microcentrifuge tube. This process was repeated three times, the last wash being performed only with chloroform. The DNA was precipitated at -20 °C with 15 µL of 5 M sodium acetate. The DNA was pelleted via centrifugation at ~14,000 x g, and washed first with 70% ethanol, then 95% ethanol. After removing the ethanol, the DNA pellet was reconstituted in 2.5 mM Tris pH 8.0, and frozen at -20 °C until needed.

2.4.9 Degenerate PCR amplification of putative gymnopeptide borosins

To identify possible borosin precursors encoding the gymnopeptides, we chose several conserved amino acid regions as degenerate PCR targets from the multiple alignment of putative Agaricales borosins displayed in Figure 9.5. Using OphMA sequence notation, the conserved region Tyr76-Glu81 was targeted with primers Boro78AF-YVQMAE, Boro78AF-YTQMAE, and Boro78AF-YYQMSE (combined degeneracy of 320 sequences), Phe97-Gly102 was targeted with the primer 96-FYGHPG-F (degeneracy of 128 sequences), and Val208-Ile212 was targeted with the primer Boro212R-VVHI*MG*A (degeneracy of 256 sequences) listed in Table 9.1. To target the putative core sequence of the gymnopeptides, primers Core1-VAVVGV-R1, Core1-VAVVGV-R2, Core2-VGAVAV-R1, and Core2-VGAVAV-R2 (each with a degeneracy of 512 sequences) were tested. All degenerate primer sets were combined in equal molar ratios of individual sequences prior to PCRs. The successfully nested PCRs were amplified with Q5 high fidelity DNA polymerase (0.025 U/µL PCR), 4 µM total primer, 200 µM dNTPs, 1x Standard Q5 PCR buffer, and 5% DMSO. The first PCR (25 µL) was amplified from 400 ng G.
fusipes genomic DNA with primers 96-FYGHPG-F and Core1-VAVVGV-R1 (6.25 nM each unique primer sequence). A touchdown PCR method was performed with an initial denaturation of 98 °C for 3 minutes, followed by 10 cycles of 98 °C for 15 s, 60-50 °C (-1 °C/cycle) for 20 s, 72 °C for 2 minutes, and 25 cycles of 98 °C for 15 s, 49 °C for 20 s, 72 °C for 2 minutes, with a final extension of 72 °C for 7 minutes. An aliquot (1 µL) from this PCR was used as a template for the following nested 50-µL PCR using primers 96-FYGHPG-F and Boro212R-VVHI*MG*A (10.4 nM each unique primer sequence) with an initial denaturation of 98 °C for 3 minutes, followed by 10 cycles of 98 °C for 15 s, 60-50 °C (-1 °C/cycle) for 20 s, 72 °C for 30 s, and 25 cycles of 98 °C for 15 s, 49 °C for 20 s, 72 °C for 30 s, with a final extension of 72 °C for 7 minutes. An ~400-bp band was excised, gel-purified, A-tailed with OneTaq DNA polymerase, and subcloned into the pGEM-T Easy Vector System I (Promega) according to the manufacturer’s instructions. Positive clones harboring homology to borosin RiPP precursors were sequence verified.

2.4.10 Inverse PCR

An inverse PCR method to more fully identify the putative borosin precursor encoded by G. fusipes was performed similarly to a published protocol.138 Briefly, G. fusipes genomic DNA (150 ng) was digested in 30-μL reactions with XbaI, HindIII-HF, NdeI, or BamHI-HF, separately. The samples were ethanol precipitated and resuspended in 30 μL water prior to ligations (100 μL) run at 15 °C for 1 h with 1 μL T4 DNA ligase (NEB), 10 μL of the DNA resuspension, and 1x T4 buffer. A PCR with primers GyfWlk2_F and GyfWlk2_R was performed with Q5 polymerase at the component concentrations reported above using the following method: 98 °C for 30 s, followed by 30 cycles of 98 °C for 10 s, 62 °C for 30 s, 72 °C for 2 minutes, with a final incubation at 72 °C for 2 minutes. 
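The per-sequence primer concentrations quoted for the degenerate PCRs in Section 2.4.9 follow from dividing the 4 µM total primer equally among every unique sequence in the two primer pools of each reaction. The short sketch below is an illustration added for clarity, not part of the original protocol. It reproduces that arithmetic from the degeneracies stated in the text. The function names are hypothetical, and the example IUPAC string is only an assumed back-translation of the FYGHPG motif (the actual primer sequences are those listed in Table 9.1 and are not reproduced here).

```python
from math import prod
from typing import Iterable

# Number of nucleotides each IUPAC code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Number of unique sequences represented by a degenerate primer."""
    return prod(IUPAC_DEGENERACY[base] for base in primer.upper())


def per_sequence_nM(total_uM: float, degeneracies: Iterable[int]) -> float:
    """Concentration (nM) of each unique sequence when the total primer
    concentration is split equally across all sequences in all pools."""
    return total_uM * 1000.0 / sum(degeneracies)


# Hypothetical back-translation of F-Y-G-H-P-G (TTY TAY GGN CAY CCN GG);
# an assumption for illustration, NOT the primer from Table 9.1.
example_primer = "TTYTAYGGNCAYCCNGG"
print("example degeneracy:", degeneracy(example_primer))

# First PCR: 96-FYGHPG-F (128 sequences) + Core1-VAVVGV-R1 (512 sequences).
print("first PCR:  %.2f nM per sequence" % per_sequence_nM(4.0, [128, 512]))

# Nested PCR: 96-FYGHPG-F (128 sequences) + Boro212R-VVHI*MG*A (256 sequences).
print("nested PCR: %.2f nM per sequence" % per_sequence_nM(4.0, [128, 256]))
```

Under these assumptions the two results agree with the 6.25 nM and 10.4 nM values given for the first and nested PCRs, respectively.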
A nested PCR was performed using 1 μL of this reaction with primers GyfWlk_F and GyfWlk_R and a similar method with a 68 °C annealing temperature and for only 25 cycles. A ~3-kb band from the BamHI-HF digested sample was subcloned using the pGEM-T Easy Vector System I according to the manufacturer’s instructions. Subsequent screening and sequencing revealed a nearly complete encoded borosin precursor.

2.4.11 G. fusipes fosmid library, PCR screening, and cloning of gymMA1

Genomic DNA was extracted from G. fusipes mycelium grown on 1.5% agar plates with 0.4% yeast extract, 1% malt extract, and 0.4% dextrose over porous cellophane membrane disks at room temperature for 20 days. A 600,000-member E. coli fosmid library was created from the extracted G. fusipes genomic DNA using the CopyControl HTP Fosmid Library Production Kit (Epicentre) as previously described.139 PCR-screening for putative borosins was performed using component concentrations as mentioned above and with primers GyfInt_F and GyfInt_R with OneTaq DNA Polymerase under the following conditions: 95 °C for 5 min, then 30 cycles of 95 °C for 45 s, 56.5 °C for 20 s, 68 °C for 30 s, followed by a final incubation at 68 °C for 7 minutes. The DNA sequence of gymMA1 was determined through Sanger sequencing of positively-screened fosmids using primers GyfInt_F, GyfInt_R, GymWalk_F, and GymWalk_R. Exon junctions were predicted based on sequence characteristics and alignment with closely related homologues. Exons were amplified using Q5 High-Fidelity DNA polymerase with primers GymA-Exon1_F2, GymA-Exon1_R2, GymA-Exon2_F, GymA-Exon2_R, GymA-Exon3_F2, and GymA-Exon3_R2. Exons were stitched together through overlap extension PCR using component concentrations as mentioned above and under the following conditions: 98 °C for 30 s, followed by 8 cycles of 98 °C for 10 s, 62 °C for 60 s, and 72 °C for 60 s. 
After the initial amplification, primers GymA-Exon1_F2 and GymA-Exon3_R2 (500 nM final concentration) were added to the reaction and run for 30 cycles at 98 °C for 10 s and 72 °C for 90 s, followed by a final 2 minute extension at 72 °C. The gene insert was assembled into pET28b digested with NdeI and BamHI using the NEBuilder HiFi DNA Assembly Kit (NEB) at a 1:2 vector-to-insert mole ratio. The resulting sequence-verified gymMA1 expression plasmid was transformed into Rosetta(DE3) electrocompetent cells.

3 Preliminary findings of split borosins found in the bacteria Rhodospirillum centenum SW and Streptomyces sp. NRRL S-118

This chapter was written by Fredarla Miller for the purpose of this thesis. Data and results will contribute to a larger survey of bacterial borosins for future publication. Dr. Matthew Jensen performed most of the cloning and foundational work supporting this chapter. Fredarla Miller performed additional protein expressions, purifications, stability tests, in vitro experiments, and analyzed all the mass spectrometry data. Dr. Michael Freeman performed the bioinformatics analysis to identify the putative split borosins.

3.1 Introduction

The borosin family of RiPP natural products thus far has two confirmed members, both from fungi: the omphalotins and the gymnopeptides.60,115 Both suites of molecules are cyclic α-N-methylated peptides that are biosynthesized via the canonical type 1 borosin precursors which encode an N-terminal α-N-methyltransferase.115 Motivated by the diverse domain architectures of borosin precursors found in fungi, we sought to further expand this family of RiPPs with an emphasis on the discovery of unique domain architectures, candidates for rigorous biochemical characterization, and novel RiPPs with unique bioactivities. Using the methyltransferase domain of OphMA as a query for BLAST searches, distantly related homologs in bacteria were identified. 
However, unlike borosin types 1-3, these putative borosin methyltransferase genes did not appear to be fused to a core peptide at the C-terminus. Manual inspection of the identified bacterial borosin methyltransferase genes showed that many were proximal to a gene encoding a hypothetical protein that shared qualitative similarities with known RiPP precursors yet bore little sequence identity to the clasp/core domain of OphMA. Examples of the similarities are hydrophobic amino acid residues near the C-terminus (similar to the stretch of hydrophobic residues in the core peptide of OphMA)60 and repeated motifs in the core peptide (a common feature in RiPP biosynthesis; examples include the ustiloxins, microviridins, and the borosin precursor PgiMA1).115,147,148 Also of note is that many of the identified putative bacterial borosin precursors contain a conserved region in the leader moiety. This conserved region is homologous to LigA, the small non-catalytic subunit of the LigAB extradiol dioxygenase complex.149 Leader peptides often exhibit conserved motifs which act as binding sites for the recruitment of modifying enzymes to the precursor peptide for catalysis.48 The presence of a conserved domain in the leader of these newly identified split borosin precursors generated confidence in their legitimacy as functional RiPP BGCs. Split borosins follow the typical RiPP biosynthetic logic. Canonical RiPP biosynthesis is shown in Figure 3.1 A-B while C-D compare borosin biosynthetic systems. Previous attempts to artificially split OphMA into its enzymatic and substrate domains for rigorous kinetic analysis were unsuccessful (data not shown). Thus, we hoped natively “split borosins” would be more amenable to biochemical characterization because the reaction would no longer be pseudo-zero order (as the substrate is no longer fused in a 1:1 molar ratio with the enzyme).

Figure 3.1 RiPP biosynthesis and borosin biosynthesis. Panels A-B are identical to Figure 1.3 but are repeated here for convenience. 
A: Representative RiPP BGC. B: Simplified RiPP biosynthesis. C: Generalized biosynthesis of borosin types 1-3 wherein the modifying enzyme is encoded within the leader portion of the precursor peptide. “Autocatalytic” is in quotation marks because this is an intermolecular reaction between separate subunits in a homodimeric complex. D: Generalized biosynthesis of split borosins, which follow the canonical RiPP biosynthetic logic because the modifying enzyme is a separate ORF from the precursor peptide.

While we discovered dozens of putative split borosin BGCs in bacteria, this thesis will focus upon the α-N-methyltransferases and precursors found in three organisms: Rhodospirillum centenum SW, Streptomyces sp. NRRL S-118, and Shewanella oneidensis MR-1. R. centenum SW and Streptomyces sp. NRRL S-118 will be discussed in detail in the following sections of this chapter, while S. oneidensis MR-1 will be discussed in the following chapters. Chapters 3 and 4 present preliminary data confirming that these methyltransferase-precursor sets are capable of catalyzing α-N-methylation of the respective core peptide. As each of these putative borosin BGCs was identified through genome mining efforts, they currently remain “orphan” as they have no identified metabolite associated with them. We can infer that posttranslationally modified residues within a precursor peptide are part of the core peptide, but without an associated metabolite, we cannot confirm the boundaries of the core peptide nor rule out the presence of other PTMs. From these three sets of split borosins, we sought to identify at least one set that was easily heterologously expressed and purified from E. coli, was amenable to in vitro kinetic analysis, was easily analyzed by mass spectrometry for α-N-methylation, and was in a tractable organism suitable for in vivo studies. The putative borosin genes found in Streptomyces sp. NRRL S-118 and S. 
oneidensis MR-1 were approached as “minimal” borosin systems and considered to offer the best chance of success. As shown in Chapter 4, the characterization of the putative borosin genes from S. oneidensis MR-1 was especially fruitful in these respects. Thus, the borosin methyltransferase and precursor from this organism were further biochemically characterized in preparation for an in-depth kinetic and structural analysis presented in Chapter 5 and in vivo analyses presented in Chapter 6.

3.2 Split borosin BGC found in R. centenum SW

Rhodospirillum centenum SW, a purple photosynthetic α-proteobacterium first isolated in 1989, is anoxygenic and capable of fixing nitrogen.150 It is somewhat thermophilic, preferring temperatures of 40-44 °C, and is capable of forming cysts during times of environmental stress.150 This organism’s genome was sequenced in 2010, further revealing its unique metabolic capabilities including nitrogen fixation, photosynthesis, chemotrophy, chemotaxis, and formation of cysts. Due to these capabilities revealed through microbiological assays and genome analyses, it was touted as a potentially amenable model organism to study these biochemical pathways and associated physiological responses.151 Considering this organism’s unique biology, possible usefulness in agriculture for nitrogen fixation, and its genetic tractability, we were pleased to discover a putative borosin methyltransferase in its genome. The putative borosin BGC found in R. centenum SW was identified based on RceM’s homology to OphMA and is shown in Figure 3.2 A. Unfortunately, the boundaries for this BGC are not clear, so several genes up- and downstream of the putative borosin α-N-methyltransferase (rceM) and precursor (rceA) are shown in the figure. The annotations of several of the proximal genes suggest that the RiPP associated with rceMA may be further posttranslationally modified and may possibly play a unique role in the native organism’s metabolism. 
For example, oxidoreductases are commonly part of RiPP BGCs as modifying enzymes where they install PTMs onto core peptides. The RiPP recognition element (RRE), discussed in the introduction of this thesis, exhibits a winged helix-turn-helix structure similar to that of the protein encoded by the gene just upstream of rceA. Interestingly, tetratricopeptide repeats, which are structural motifs, were often seen in putative fungal borosin BGCs, so the presence of this motif in this genetic locus is promising.115 The gene bearing the tetratricopeptide repeat also exhibits a GAF domain, which is often seen to be involved with metabolic regulation through cyclic diGMP (c-diGMP) signaling.152 With few exceptions, RiPPs are generally considered to be secondary metabolite toxins. However, together with the presence of a gene coding for phosphoenolpyruvate (PEP) synthase (of glycolysis), and the GAF domain-containing gene, there is evidence that this RiPP may play a signaling role in this organism.

Figure 3.2 Putative borosin gene cluster from R. centenum SW. A: Genetic locus of rceM (pink) and rceA (blue and orange) genes in the genome of R. centenum SW (NC_011420.2). B: Domain architecture comparison to OphMA, the canonical type 1 borosin, and PgiMA1, the only characterized type 2 borosin. Orange insets show the OphMA and PgiMA1 core peptide sequences with methylations highlighted in pink on their respective amino acid residues (only one repeated region is shown for brevity).60,115 RceM shares a high sequence identity with the methyltransferase and clasp domains of OphMA, but the core peptide of RceA is much different, resembling the core of PgiMA1 more closely. The first part of the RceA sequence, presumably a leader sequence, is in light blue. The core peptide is in orange.

The methyltransferase domain of RceM bears 42% sequence identity to OphMA (Figure 3.3). However, the leader and core of the RceA precursor peptide are strikingly different (Figure 3.2 B). 
Instead of a stretch of hydrophobic amino acids (which are methylated), the RceA core region consists of 11 near-identical repeated motifs of approximately 10 amino acids each. This architecture is similar to the recently discovered type 2 borosins, exemplified by PgiMA1, whose core peptide consists of approximately 12 repeated motifs and is methylated on a single aspartic acid residue in each repeated segment (Figure 3.2 B).115 Based on the domain similarity of RceA to the C-terminus of PgiMA1, we hypothesized that RceA would also be methylated on acidic amino acid residues within its core.

Figure 3.3 Alignment of RceM with the methyltransferase domain of OphMA. Alignment created using Clustal Omega (v. 1.2.4). Asterisk (*) indicates identical residue, colon (:) and period (.) indicate similar residues.

OphMA_noclasp_nocore ----------------METSTQTKAGSLTIVGTGIESIGQMTLQALSYIEAAAKVFYCVI 44
RceM                 MRAAPMAETETPPAAPSPSAPERPRGSLTVVGTGLRALSHMTLEAISHIRDADRVFFSVP 60
:: : ****:****:.::.:***:*:*:*. * :**:.*

OphMA_noclasp_nocore DPATEAFILTKNKNCVDLYQYYDNGKSRLNTYTQMSELMVREVRKGLDVVGVFYGHPGVF 104
RceM                 DGVTARQIRDINPEAVDLTQYYGEDKRRKQTYVQMSEVILREVRAGSAVTAVFYGHPGFF 120
* .* * * :.*** ***.:.* * :**.****:::**** * *..*******.*

OphMA_noclasp_nocore VNPSHRALAIAKSEGYRARMLPGVSAEDCLFADLCIDPSNPGCLTYEASDFLIRDRPVSI 164
RceM                 VFPARRILSIARKEGYRAVMLPGISSLDCLMADLRVDPSVNGCQILEATDLLLRNRPIIT 180
* *::* *:**:.***** ****:*: ***:*** :*** ** **:*:*:*:**:

OphMA_noclasp_nocore HSHLVLFQVGCVGIADFNFT-GFDNNKFGVLVDRLEQEYGAEHPVVHYIAAMMPHQDPVT 223
RceM                 SGHVIILQVGSVGDSAFSFTAGFRHAKRAVLFERLIEAYGEEHRSVLYLAATYPGLDGQA 240
.*::::***.** : *.** ** : * .**.:** : ** ** * *:** * * :

OphMA_noclasp_nocore DKYTVAQLREPEIAKRVGGVSTFYIPPKARKASNLDIIRRLELL----PAGQVP------ 273
RceM                 VVRPLGAYRDPKVLASVPPAGTLYIPAKDMLPTDMAMAEKLGMSALVGPDAPVPAGPDSY 300
:. *:*:: * ..*:*** * ::: : .:* : * . **

OphMA_noclasp_nocore ------------------------------------------------------------ 273
RceM                 GPFEAQAIAALDHYRPSPTWRPRTASKALQRVMTLLAGTPSVAAVYRKDPARLVDLHPDL 360

OphMA_noclasp_nocore -------------------------------------------------- 273
RceM                 TPAERKALLSRRAGPLNAVTAPPPEGAPPTVDEAGNGNGGDAPSEGETA* 409

Also of note in this BGC is the presence of a putative serine hydrolase with a conserved transpeptidase/β-lactamase domain. We hypothesized that this protein could be responsible for the sequential removal of the repeated core motifs. Many RiPP BGCs do not encode proteases/peptidases, which can make the identification of the mature natural product challenging, as the precursor peptide cannot be fully processed in a heterologous host. The ustiloxins are fungal RiPPs whose precursor peptide also encodes repeated core peptide motifs.147 In ustiloxin biosynthesis, each core repeat is flanked by two protease recognition sites: KR and ED residues, where the former is recognized by the housekeeping enzyme Kex2 and the latter by a protein encoded in the RiPP BGC.112 As no such obvious recognition motifs are detectable in RceA, the proximity of the annotated serine hydrolase presents a possible avenue for the identification of the mature RiPP natural product. We considered this putative borosin BGC to be a good candidate for our study due to its potentially interesting biological role in a tractable host and the quality of the BGC as a whole.

3.2.1 Biochemical analysis of the putative borosin methyltransferase and precursor from R. centenum SW

To determine the methylation pattern on the core peptide of RceA, we sought to heterologously co-express an N-terminally hexahistidine-tagged (his6) RceA with untagged RceM in E. coli. Unfortunately, his6-RceA proved to be unexpressed and/or insoluble. Thus, we added a cleavable solubility/expression tag to the construct (his6-SUMO-RceA).153 Even with the addition of the tag, this protein remained recalcitrant to purification. The impure sample was not amenable to further in vitro analysis (such as gel filtration or kinetics analyses). 
However, upon co-expression of his6-SUMO-RceA with RceM for 24 hours, we were able to obtain a small amount of his6-SUMO-RceA that was sufficient to excise the band of interest, perform an in-gel digest, and analyze the core peptide for PTMs via HPLC-MS/MS (Figure 3.4 A). Based on the amino acid sequence of RceA and the expected methylation of acidic amino acid residues due to its similarity to type 2 borosins, we attempted digestion with several MS-grade proteases including chymotrypsin (cleaves C-terminal to aromatic amino acids and, at a lower rate, to leucine and methionine), GluC (cleaves C-terminal to glutamic acid residues), and AspN (cleaves N-terminal to aspartic acid residues).154 Digestion of his6-SUMO-RceA with AspN, which generates four distinguishable parent ion masses corresponding to peptide fragments DV(A/I)EL(S/F)GGEL, produced relatively better data than the other two digests (Figure 3.4 B and C). Based on predicted mass shifts of the parent ion (for MS1) and the individual amino acids (MS2), we were able to localize one methylation to the second glutamic acid residue in three of the four core peptide repeats (the C-terminal repeat was un-methylated). We were also able to observe doubly-methylated peptides in very low abundance in which the first glutamic acid residue was also methylated. It is noteworthy that, unlike the repeated motifs in PgiMA1, nearly all of which vary by at least one amino acid residue, many of the repeated motifs in RceA are identical. As such, it remains unconfirmed whether we achieved full MS coverage over the length of the core peptide (Figure 3.4 B and C). Furthermore, due to the very low abundance of some of the parent ions and limited MS2 fragmentation, these observations will require corroboration with further experiments (raw MS2 spectra are shown in Figure 3.5 A-D).

Figure 3.4 Methylations found on RceA core peptide. A: His6-SUMO-RceA was co-expressed with RceM in E. 
coli for 24 h and the protein was purified by Ni-NTA affinity chromatography. The elutions from the purification were pooled, concentrated, and run on an SDS-PAGE gel for subsequent band excision and in-gel digest with AspN for HPLC-MS/MS analysis. (Gel credit: Dr. Matthew Jensen) B: Full amino acid sequence of RceA with the same color scheme as previous figure. Amino acid residues that allow us to distinguish one repeat from another are un-bolded. C: Of the 10 repeats, 8 are the same amino acid sequence following digestion with AspN, thus only four distinguishable parent ion masses can be identified. Methylations are shown in pink boxes (confirmed methylations are filled in). Each repeated segment may have up to two methylations. Letters A-C in the right margin of C refer to raw data shown in Figure 3.5 A-C.

Figure 3.5 MS2 spectra showing methylation states of RceA core peptide fragments. HPLC-MS/MS analysis of the AspN-digested RceA protein. A-C: Each sub-figure contains the full RceA precursor peptide sequence color coded as described previously with the bolded segments referring to the corresponding MS2 data.

While analyzing the MS2 spectra to determine the methylation pattern on RceA, we noticed that the methylated species produced MS2 spectra with peaks that are barely above background noise. We hypothesized that this may be due to the methylated peptides being sparsely abundant relative to the un-methylated peptides. We therefore sought to measure the relative abundance of each species. As expected, based upon the extracted ion chromatogram (EIC) of each parent ion mass, the un-methylated core peptides were overwhelmingly the most abundant (Figure 3.6). Among the three methylated core repeats, the repeat closest to the N-terminus of the precursor exhibited the highest abundance of singly-methylated peptides at 17%. The other methylated core peptides were 9% and 5% of the total abundance, respectively. 
The doubly-methylated species were not detectable by MS1. In considering the first (most N-terminal) repeat, which exhibits the highest relative abundance of singly-methylated but no detectable doubly-methylated peptide by MS2 or MS1, together with the very minimally present doubly-methylated peptides of the other repeated segments, it is possible that the second methylation is a result of the artificially/heterologously over-expressed RceM and is not a reflection of the native methylation pattern. Increasing the high-energy collision dissociation (HCD) level may yield better MS2 fragmentation and shed more light on all species identified.

Figure 3.6 Relative methylation states of AspN-digested RceA core peptide fragments. RceA amino acid sequence is shown on the left and EICs for each AspN-generated peptide are shown on the right. EICs show the relative abundances of 0Me, 1Me, and 2Me species of each core peptide parent ion. In all cases, the 0Me species was by far the most abundant, followed by the 1Me species, and the 2Me species were not detectable by MS1. Predicted methylations are shown in orange in the EIC figures on the right.

These preliminary data are sufficient to confirm that RceM is an active enzyme that is capable of posttranslationally methylating the RceA core peptide in vivo. The added SUMO tag increased the expression and solubility of RceA, but the bulky 11 kDa tag may impede the methylation reaction, resulting in low abundance of methylated peptides. Many aspects of this R. centenum SW split borosin system remain to be optimized, including heterologous expression and purification of RceM-RceA as a pair and as individual proteins. Due to the repeated motifs in the core peptide of RceA, whole protein MS will be required to determine how many methylations are present. If expression and purification of RceA can be sufficiently optimized, NMR or X-ray crystallography may also be useful tools for determining the methylation pattern. 
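The mass-shift reasoning used in this section to assign methylation states can be sketched numerically. The example below is an illustration rather than the analysis pipeline actually used for these data. It computes monoisotopic masses for the four AspN peptides DV(A/I)EL(S/F)GGEL and the m/z values expected for 0, 1, and 2 methylations, assuming that each methylation adds CH2 (+14.01565 Da) and that the peptides are observed as doubly protonated ions (the charge state is an assumption, as are the function names).

```python
from itertools import product

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of a proton
METHYL = 14.015650  # net mass added per methylation (CH2)


def peptide_mass(seq: str, n_methyl: int = 0) -> float:
    """Neutral monoisotopic mass of a linear peptide with n_methyl methylations."""
    return sum(MONO[aa] for aa in seq) + WATER + n_methyl * METHYL


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a given neutral mass."""
    return (mass + charge * PROTON) / charge


# The four distinguishable AspN peptides, DV(A/I)EL(S/F)GGEL.
peptides = ["DV" + x + "EL" + y + "GGEL" for x, y in product("AI", "SF")]

for pep in peptides:
    for n in range(3):  # 0Me, 1Me, 2Me
        m = peptide_mass(pep, n)
        # Charge state 2 is assumed here for illustration only.
        print(f"{pep} {n}Me  M = {m:.4f}  [M+2H]2+ = {mz(m, 2):.4f}")
```

Because leucine and isoleucine are isobaric and every repeat differs only at the A/I and S/F positions, the four peptides give four distinct parent masses, and each methylation state is offset from the previous one by the same increment.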
3.3 Split borosin BGC found in Streptomyces sp. NRRL S-118

Streptomyces spp. are Gram-positive, high GC-content, soil-dwelling bacteria that are well known as prolific natural product producers.155 Streptomyces sp. NRRL S-118 was one of 344 unique genomes sequenced for the purposes of developing a bioinformatic method for natural product discovery through genome mining.156 The study found that Streptomyces spp. genomes contain an average of 21.6 BGCs (3-43 BGCs per genome) including NRPs, RiPPs, polyketides, and more.156 Despite this high number of reported putative BGCs in the genomes analyzed, the putative split borosin BGC we identified in Streptomyces sp. NRRL S-118 was undetected in that study and the corresponding α-N-methyltransferase (strM) and precursor (strA) ORFs are both annotated as hypothetical proteins. StrM is 36% identical to the methyltransferase domain of OphMA (Figure 3.7 C). StrA, whose ORF is syntenic with and just upstream of strM, encodes a hydrophobic core peptide reminiscent of OphMA (Figure 3.7 B). StrA also contains the conserved LigA domain in its leader. In addition to the qualitative similarities of StrM/StrA to OphMA, these putative Streptomyces borosin methyltransferase and precursor proteins seem to be part of a genuine natural product BGC. Two proximal genes that we predict are a part of this borosin BGC are annotated as a GCN5-related N-acetyltransferase (GNAT) family protein and an isoprenylcysteine carboxymethyltransferase family protein, respectively (Figure 3.7 A). Microviridins are N-acetylated RiPPs that encode a GNAT family N-acetyltransferase as a conserved part of their BGCs.157 Goadsporin is another example of an N-acetylated RiPP.158 Produced by Streptomyces sp. 
TP-A0584, goadsporin is a potent antibiotic and secondary metabolism inducer in many other Streptomyces species although its molecular target has not yet been determined.159 In regards to the putative isoprenylcysteine carboxymethyltransferase, this enzyme is common in eukaryotes for prenylating a C-terminal CXXX motif of target proteins which aids in their proper cellular localization.160 While neither StrA nor StrM exhibit the C-terminal motif, this enzyme could still be involved in this putative borosin RiPP’s biosynthesis. For example, it may install an unknown PTM onto the core peptide or aid in proper cellular localization of required biosynthetic proteins.

Figure 3.7 Putative borosin BGC in Streptomyces sp. NRRL S-118. A: Genetic locus of strM (pink) and strA (orange and blue) in the organism’s genome (NZ_KL591006.1). B: Domain architecture and core peptide sequence comparison. The known core peptide of OphMA, and the AspN-GluC fragment of the core region of StrA is shown (the boundaries of the core are not currently known for the core of StrA). Confirmed methylations are shown in pink boxes. One methylation on StrA has not been definitively localized to a particular amino acid by MS2, thus its inferred location is shown as an empty pink box. C: Alignment of StrM with the methyltransferase domain of OphMA created with Clustal Omega (v. 1.2.4). Asterisk (*) indicates identical residue, colon (:) and period (.) indicate similar residues.

OphMA_noclasp_nocore METSTQTKAGSLTIVGTGIESIGQMTLQALSYIEAAAKVFYCVIDPATEAFILTKNKNCV 60
StrM                 --MQETTGNAQLVVVGTGFRAIGDLTVEARACLEQADKVLCLIGDPLVTRHIEKLNASVE 58
. * ..*.:****:.:**::*::* : :* * **: : ** . .* . * .

OphMA_noclasp_nocore DLYQYYDNGKSRLNTYTQMSELMVREVRKGLDVVGVFYGHPGVFVNPSHRALAIAKSEGY 120
StrM                 TLDVHYAVGKPRSASYEDMVEHIMSELHRDQFVCVALYGHPGVFAYTGHEAIRRAREEGI 118
* :* ** * :* :* * :: *:::. * .:*******. .*.*: *:.**

OphMA_noclasp_nocore RARMLPGVSAEDCLFADLCIDPSNPGCLTYEASDFLIRDRPVSIHSHLVLFQVGCVGIAD 180
StrM                 AARMLPACSAEDWLFADLGLDPGERGCQSFEATDFLIRHRVFDPTGLLILWQVGVIGMID 178
*****. **** ***** :**.: ** ::**:*****.* .. . *:*:*** :*: *

OphMA_noclasp_nocore FNFTGFDNNKFGVLVDRLEQEYGAEHPVVHYIAAMMPHQDPVTDKYTVAQLREPEIAKRV 240
StrM                 RDPGYDARPGVTTLTDALVASYGSGHPVTVYEASPYVTAEPRTTTVPLAELPDTPL---- 234
: . . .*.* * .**: ***. * *: :* * . :*:* : :

OphMA_noclasp_nocore GGVSTFYIPPKARKASNLDIIRRLELLPAGQVP 273
StrM                 SAASTLVVPPLPPRPVDRELLARLAARR----- 262
...**: :** : : ::: **

3.3.1 Biochemical analysis of the putative borosin methyltransferase and precursor from Streptomyces sp. NRRL S-118

Due to the difficulties in expressing and purifying RceA and RceM, we anticipated similar challenges for StrA and StrM. Thus, we initiated work with these proteins by preemptively including an N-terminal SUMO tag on the precursor to facilitate heterologous expression and purification.153 Additionally, due to the high GC content of Streptomyces spp., we codon-optimized strM and strA for heterologous expression in E. coli BL21(DE3) cells. Our typical experimental pipeline for determining if a putative borosin methyltransferase is an active enzyme begins with a co-expression experiment. We co-express the his6-tagged precursor protein with its cognate untagged methyltransferase for 24 h and subsequently purify the precursor by nickel affinity chromatography for HPLC-MS/MS analysis to detect methylations. Generally, this has been a reliable method for screening borosin proteins because only a small amount of protein is required for this sensitive method of analysis.60,115 In following this pipeline, we first sought to co-express his6-SUMO-StrA with StrM for 24 h and to purify the resulting precursor protein. The SDS-PAGE gel run after the purification showed the presence of a protein corresponding to the expected size of his6-SUMO-StrA (Figure 3.8). However, a band corresponding to StrM (28.5 kDa) was not visible in either the lysate supernatant or the pellet. 
Despite not visualizing StrM on the protein gel, we reasoned that if even a small amount of the enzyme was present, it could still methylate the core peptide of StrA, which would be detectable by HPLC-MS/MS. Thus, the band on the SDS-PAGE gel corresponding to his6-SUMO-StrA was excised for subsequent in-gel digestion and analysis by HPLC-MS/MS for PTMs. We believed a string of hydrophobic amino acids at the C-terminus of StrA corresponded to the core peptide and would therefore be the location of methylations (Figure 3.7 B). Analyzing this peptide sequence via HPLC-MS/MS proved to be challenging. None of the common MS-grade proteases were obvious candidates for generating a peptide fragment of an appropriate size with the expected core sequence sufficiently positioned for reliable MS2 fragmentation. After several attempts to generate reliable MS/MS data from this in vivo experiment, we were unable to sufficiently confirm that methylation was taking place. In light of this challenge together with the unconfirmed StrM expression in our initial co-expression experiment, we added a solubility tag to StrM and pursued an in vitro method.

Figure 3.8 24 h co-expression of his6-SUMO-StrA with StrM and Ni-NTA purification. SDS-PAGE gel for the expression and purification of StrA and StrM proteins. SUMO-tagged StrA is clearly visible in this gel but un-tagged StrM (28.5 kDa) is not. (Gel credit: Dr. Matthew Jensen)

We cloned two additional constructs for the separate expression of his6-SUMO-StrA and his6-SUMO-StrM such that the proteins could be individually purified and added in known concentrations to an in vitro reaction. The SDS-PAGE gels representative of these expressions and nickel affinity purifications are shown in Figure 3.9 A and Figure 3.10 A. While both proteins expressed well, neither was completely purified at this stage. 
Despite the results of this initial purification, we considered this to be a preliminary experiment to determine if these two proteins were active borosin BGC proteins. Thus, in the interest of obtaining a quick positive or negative result, we did not yet attempt to further purify the proteins nor cleave the SUMO tags. Instead, we used the partially purified protein to prepare an in vitro reaction with a 1:1 molar ratio of StrM:StrA with excess S-adenosyl methionine (SAM, the methyl donor) and allowed the reaction to proceed for 16 h at room temperature. We chose the 1:1 ratio because this mimics the OphMA ratio of enzyme:core peptide. The reaction was subsequently quenched with SDS sample buffer, run on a gel, and the band corresponding to his6-SUMO-StrA was excised. Gel pieces were treated with dithiothreitol (DTT) and iodoacetamide to prevent disulfide bond formation and subsequently digested with two proteases simultaneously (AspN and GluC). This produced the target peptide cHAVLVVIIF, where the underlined letters were the anticipated location of methylations and the lowercase “c” indicates the protected cysteine residue.

Figure 3.9 StrA purification. A: SDS-PAGE gel for StrA. Protein was expressed (without induction) and purified as discussed in the experimental section below, with one exception. Before elution, “wash 3” of 1 mL of lysis buffer with 250 mM imidazole was used in an unsuccessful attempt to remove impurities. B: bdSENP1 protease was used to cleave the his6-SUMO tag from StrA. Un-cleaved protein and cleavage products are indicated in the margins. C: Attempt to further purify StrA after treatment with bdSENP1 using its putative heat stability. Insoluble (I) and soluble (S) protein is shown after incubation at various temperatures for 30 min. D: Gel filtration chromatogram of the 75 °C heat-purified StrA sample. Fractions at the top of each peak are labeled. E: SDS-PAGE gel corresponding to the labeled peaks on the chromatogram. 
Figure 3.10 StrM purification
A: SDS-PAGE gel for StrM. Protein was expressed and purified as discussed in the experimental section below, with one exception. Before elution, “wash 3” of 1 mL of lysis buffer with 250 mM imidazole was used in an unsuccessful attempt to remove impurities. B: bdSENP1 protease was used to cleave the his6-SUMO tag from StrM. Un-cleaved protein and cleavage products are indicated in the margins. C: Gel filtration chromatogram of the pooled “flow through” fractions. Fractions at the top of each peak are labeled. D: SDS-PAGE gel corresponding to the labeled peaks on the chromatogram.

Analyzing the digested StrA peptide by HPLC-MS/MS confirmed our hypothesis that StrM is a methyltransferase capable of installing up to four methylations onto the putative core peptide of StrA (full methylation pattern shown in Figure 3.7 B). Figure 3.11 shows the raw MS2 data for all five methylation states (0-4 methylations). These preliminary data suggest that the initial methylation occurs on the leucine residue of the core (L68), with methylations two and three occurring on the adjacent valine residues in an N- to C-terminal manner (V69 and V70). The fourth methylation, however, seems to be localized N-terminal to the first methylation. Previously characterized borosin methyltransferases exhibit a strictly N- to C-terminal directionality, so the StrA methylation pattern, if confirmed, is unique.60,115 However, we were unable to acquire data that allowed us to definitively localize this methylation. Furthermore, analysis of the MS1 data indicates that the 0-2 methylated peptides are in very low abundance (<1%, <1%, and 1%, respectively), while the 3- and 4-methylated peptides each account for approximately 50% of the total ion count (Figure 3.12). The very low abundance of the 0-2 methylated species in turn generates sparse MS2 spectra, making reliable analysis of these methylation states challenging.
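As a rough guide to what the five methylation states look like at the MS1 level, the sketch below computes theoretical monoisotopic masses for the carbamidomethylated target peptide with 0-4 added methyl groups. It uses standard monoisotopic residue masses; the function names (`peptide_mass`, `mz`) and the choice of the doubly charged ion are illustrative assumptions, not the acquisition settings used in this work.

```python
# Standard monoisotopic residue masses (Da) for the residues in the peptide.
RESIDUE_MASS = {
    "C": 103.00919, "H": 137.05891, "A": 71.03711, "V": 99.06841,
    "L": 113.08406, "I": 113.08406, "F": 147.06841,
}
WATER = 18.01056            # added once per linear peptide
CARBAMIDOMETHYL = 57.02146  # iodoacetamide adduct on cysteine
METHYL = 14.01565           # net CH2 gained per N-methylation
PROTON = 1.00728


def peptide_mass(seq, n_methyl=0, alkylated_cys=True):
    """Neutral monoisotopic mass of a peptide carrying n_methyl methyl groups."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * seq.count("C")
    return mass + METHYL * n_methyl


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion for a neutral mass."""
    return (mass + charge * PROTON) / charge


for n in range(5):
    m = peptide_mass("CHAVLVVIIF", n)
    print(f"{n} Me: M = {m:.4f} Da, [M+2H]2+ = {mz(m, 2):.4f}")
```

Each additional methylation shifts the neutral mass by 14.01565 Da, so adjacent states of the doubly charged ion are separated by roughly 7.008 m/z.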
It is possible that, like the very low-abundance second methylation on RceA core peptides, the 4Me StrA core peptide is an artifact of a non-native reaction environment.

Figure 3.11 MS2 spectra for StrA core peptide after 16 h in vitro reaction
We were able to detect up to four methylations on the core peptide of StrA after treating with AspN, GluC and DTT/iodoacetamide. A: 0-2Me spectra B: 3-4Me spectra.

Figure 3.12 EIC showing relative methylation states of StrA
Methylated species (0-2) are nearly undetectable by MS1. 3- and 4-methylated peptides are by far the most abundant in this experiment.

Encouraged by the discovery that StrM and StrA are active borosin proteins, we next sought to prepare these proteins for further biochemical analysis by cleaving the solubility tag from partially purified protein and optimizing the downstream purification process. We treated nickel affinity-purified his6-SUMO-StrA and his6-SUMO-StrM with purified bdSENP1 protease, which scarlessly removes the N-terminal SUMO tag from the protein of interest.153 After treatment with bdSENP1 protease, the protein mixture was re-bound to nickel affinity resin. This strategy has several benefits. First, the cleaved his6-SUMO tag, un-cleaved protein that still displays the his6 tag, and the protease (which also displays a his6 tag) bind the resin and are thus easily removed. Second, contaminating proteins that nonspecifically bind the nickel resin are also removed from the mixture. In an ideal scenario, this leaves only the protein of interest in the flow through, while all undesired cleavage products and contaminants are bound by the resin. While his6-SUMO-StrA and his6-SUMO-StrM were amenable to cleavage by bdSENP1, the subsequent nickel purification was not as effective as we had hoped. Proteolytic cleavage of StrA resulted in pure protein in the flow through, but the yield was very low (Figure 3.9 B).
Most of the cleaved StrA protein remained in the elution fraction during nickel affinity purification. In an effort to obtain a higher yield of pure StrA, we hypothesized that StrA, like other RiPP precursors, may exhibit high thermostability relative to contaminating proteins.65 To take advantage of this, we incubated bdSENP1-treated protein at a variety of temperatures, separated the soluble and insoluble protein by centrifugation, and ran the respective fractions on an SDS-PAGE gel (Figure 3.9 C). Incubation at 75 °C left approximately 60% of the StrA in solution along with relatively few contaminating proteins (~40% of the StrA in the sample precipitated into the insoluble fraction, according to an ImageJ analysis). Following this successful partial purification, we confirmed by HPLC-MS/MS that heat-treated StrA is a suitable substrate for StrM by repeating the in vitro methylation experiment discussed above with the heat-purified StrA (data not shown). With this confirmation, additional bdSENP1-treated StrA was incubated at 75 °C for 30 minutes and the soluble protein was loaded onto a gel filtration column in an attempt to isolate pure StrA. SDS samples were prepared based on peaks from the chromatogram and run on a gel (Figure 3.9 D and E). Unfortunately, StrA remained in a soluble aggregate with the contaminating proteins in the sample and we were unable to purify StrA by this method. Even treatment with 6 M urea was not sufficient to purify StrA away from contaminants (data not shown).

Proteolytic cleavage of his6-SUMO from StrM by bdSENP1 was also successful. His6-SUMO and un-cleaved protein were removed from the solution, but a contaminating ~14 kDa protein remained in the flow through after re-purification by nickel affinity chromatography (Figure 3.10 B). In an attempt to remove this contaminating protein from our StrM sample, we loaded the heterogeneous mixture onto a gel filtration column.
The chromatogram and corresponding SDS-PAGE gel are shown in Figure 3.10 C and D. StrM and the contaminating protein eluted in distinct peaks, allowing us to cleanly purify tag-less StrM protein. Furthermore, StrM eluted from the column at a retention volume consistent with the protein forming a dimer in solution. This is consistent with previous results demonstrating that the related OphMA forms a dimer,60,114 although StrM is able to dimerize in the absence of its cognate precursor peptide, StrA.

To conclude, we optimized a method for expressing and purifying StrM at a suitable concentration and homogeneity to allow for further biochemical analysis. Unfortunately, while we were able to heterologously express his6-SUMO-StrA, this protein was more recalcitrant to purification and may need further optimization prior to use in additional experiments. Despite the challenge with purifying StrA, these borosin proteins from Streptomyces sp. NRRL S-118 have been confirmed to be active in vitro, and the pair remains a good candidate for future investigation to discover the structure of the mature RiPP natural product and its role in the native organism. Furthermore, the similarity of the core peptide to that of OphMA offers a means to probe how the “split” system is similar to (or different from) the fused system.

3.4 Conclusion
Of the three bacterial split borosin BGCs discussed in this thesis, two were investigated in this chapter: Rhodospirillum centenum SW and Streptomyces sp. NRRL S-118. Of the two sets, the split borosin proteins from Streptomyces (StrM and StrA) are the more similar to OphMA (the canonical type 1 borosin) and exhibit nearly identical domain architecture and a similarly hydrophobic core peptide. StrM and StrA were somewhat recalcitrant to purification and will require further optimization/investigation. The split borosin proteins from R.
centenum SW (RceM and RceA) exhibit a similar domain architecture to PgiMA1, the only characterized type 2 borosin. RceA has a repeated motif in its core in which alternating glutamic acid residues are methylated. Like PgiMA1, these two proteins were recalcitrant to heterologous expression and purification, and both proteins required the fusion of N-terminal SUMO tags for even minimal activity in our heterologous system. Much more work is required to address the difficulties in expression, solubility, and analysis for these proteins.

In previously characterized borosin systems, we commonly witnessed a highly abundant fully methylated species.60,115 However, this was not the case with these two bacterial split borosins. While both systems exhibited methylation on expected core peptide residues (hydrophobic residues for StrM/StrA and acidic residues for RceM/RceA), the most prevalent species only accounted for approximately 50% of the methylated species (StrA). This may be due to a wide variety of factors related to heterologous expression/solubility or the nature of the “split” system itself. In the fungal borosin systems, the fusion of core peptide to enzyme causes the substrate to remain in close proximity to the methyltransferase until proteolytic cleavage; this may not be the case in split systems. Although the BGCs from R. centenum SW and Streptomyces sp. NRRL S-118 remain intriguing candidates for further study and optimization, the putative split borosin found in S. oneidensis MR-1 proved more amenable to purification without the need for bulky solubility tags. Thus, the encouraging preliminary results regarding S. oneidensis MR-1 are presented in the following chapter.

3.5 Materials and methods
Unless otherwise noted, all chemicals and reagents were purchased from MilliporeSigma. Unless otherwise stated, all enzymes for molecular cloning were purchased from New England Biolabs (NEB).
3.5.1 DNA and protein sequences

Table 3.1 Gene and protein sequences of split borosins
This table contains the DNA and protein sequences of successfully expressed proteins used in this study. Gene/protein identifiers are provided when available for the native sequences. Protein sequences include purification/solubility tags we used.153 *Due to high GC content of the native organism, strM and strA genes were codon optimized for expression in E. coli (ordered from GenScript). The sequences shown are codon optimized, but the ID number is for the native DNA sequence.

Description | DNA or protein sequence
rceM ACJ00913.1 | ATGAGAGCCGCCCCGATGGCCGAGACAGAGACACCCCCCGCCGCCCCC TCCCCGTCGGCGCCCGAGCGGCCCCGCGGCAGCCTGACCGTTGTCGGCA CCGGCCTGCGCGCCCTCTCGCACATGACGCTGGAGGCGATCTCCCACAT CCGCGACGCCGACCGCGTCTTCTTCAGCGTGCCGGACGGCGTAACCGCC CGGCAGATCCGGGACATCAATCCGGAAGCCGTGGACCTGACGCAGTAT TACGGCGAGGACAAGCGGCGGAAGCAGACCTATGTCCAGATGTCGGAG GTGATCCTGCGCGAGGTGCGCGCGGGCAGCGCCGTCACCGCCGTCTTCT ACGGCCATCCGGGTTTCTTCGTCTTTCCCGCGCGTCGCATCCTCTCGATC GCCCGCAAGGAGGGCTACCGGGCGGTGATGCTGCCGGGCATCTCCTCC CTGGACTGCCTGATGGCCGACCTGCGGGTCGATCCCAGCGTCAACGGCT GCCAGATCCTGGAGGCGACGGACCTGCTGCTGCGCAACCGGCCCATCA TCACCTCCGGCCACGTCATCATCCTCCAGGTGGGGTCGGTGGGCGATTC GGCCTTCTCCTTCACGGCCGGCTTCCGCCATGCCAAGCGGGCCGTGCTG TTCGAGCGGCTGATCGAGGCCTATGGCGAGGAACACCGCAGCGTCCTCT ATCTGGCGGCGACATATCCGGGTCTCGACGGGCAGGCCGTGGTGCGGC CGCTGGGGGCCTACCGCGATCCAAAGGTGCTGGCCTCGGTGCCGCCGG CCGGCACGCTCTACATCCCGGCGAAGGACATGCTGCCGACCGACATGG CGATGGCGGAGAAGCTGGGCATGTCCGCCCTGGTCGGCCCCGACGCGC CGGTCCCCGCCGGCCCCGACAGTTACGGCCCGTTCGAGGCGCAGGCCAT CGCCGCGCTGGACCATTACCGTCCTTCCCCGACCTGGCGCCCCCGCACG GCATCGAAGGCGCTGCAACGGGTGATGACGCTGCTGGCCGGAACGCCG TCGGTCGCCGCCGTCTACCGCAAGGACCCGGCCCGGCTGGTGGATCTGC ACCCCGACCTGACCCCGGCCGAACGCAAGGCCCTGCTCTCGCGCCGGG CCGGACCGCTGAACGCGGTGACGGCGCCGCCGCCGGAAGGGGCGCCCC CCACGGTGGACGAAGCAGGCAACGGCAATGGCGGCGACGCCCCGTCAG AGGGGGAAACCGCCTGA
RceM RC1_3560 | MRAAPMAETETPPAAPSPSAPERPRGSLTVVGTGLRALSHMTLEAISHIRDA DRVFFSVPDGVTARQIRDINPEAVDLTQYYGEDKRRKQTYVQMSEVILREV
RAGSAVTAVFYGHPGFFVFPARRILSIARKEGYRAVMLPGISSLDCLMADL RVDPSVNGCQILEATDLLLRNRPIITSGHVIILQVGSVGDSAFSFTAGFRHAK RAVLFERLIEAYGEEHRSVLYLAATYPGLDGQAVVRPLGAYRDPKVLASV PPAGTLYIPAKDMLPTDMAMAEKLGMSALVGPDAPVPAGPDSYGPFEAQA IAALDHYRPSPTWRPRTASKALQRVMTLLAGTPSVAAVYRKDPARLVDLH PDLTPAERKALLSRRAGPLNAVTAPPPEGAPPTVDEAGNGNGGDAPSEGET A*
rceA ACJ00914.1 | ATGACGACCATCGTCCCGACCGAACTCGACCAGCCCGACGTCATCGAA CTCTCCGGCGGCGAGCTGGATGTTGCCGAGCTTTCCGGTGGCGAGCTGG ACGTGGCCGAACTCTTCGGCGGCGAGCTGGACGTGGCCGAACTCTCCG GTGGCGAGCTGGACGTGGCCGAGCTTTCCGGCGGCGAGCTGGACGTGG CCGAGCTTTCCGGCGGCGAGCTGGATGTTGCCGAGCTTTCCGGCGGTGA GCTGGACGTGGCCGAGCTTTCCGGCGGTGAGCTGGACGTGGCCGAACT CTCCGGCGGCGAGCTGGACGTGGCCGAACTCTCCGGCGGCGAGCTGGA CGTGGCCGAGATCGGCATCATCAACACCTTCGATCTCTGA
His6-SUMO-RceA RC1_3561 | MGSHHHHHHHSSGLVPRGSASHINLKVKGQDGNEVFFRIKRSTQLKKLMN AYCDRQSVDMTAIAFLFDGRRLRAEQTPDELEMEDGDEIDAMLHQTGGH MTTIVPTELDQPDVIELSGGELDVAELSGGELDVAELFGGELDVAELSGGE LDVAELSGGELDVAELSGGELDVAELSGGELDVAELSGGELDVAELSGGE LDVAELSGGELDVAEIGIINTFDL*
strM* IH00_RS0113665 | ATGCAGGAGACCACCGGTAACGCGCAACTGGTGGTTGTGGGTACCGGT TTCCGTGCGATTGGTGACCTGACCGTTGAAGCGCGTGCGTGCCTGGAAC AGGCGGACAAGGTTCTGTGCCTGATCGGTGATCCGCTGGTGACCCGTCA CATTGAGAAACTGAACGCGAGCGTTGAAACCCTGGATGTTCATTATGCG GTGGGCAAGCCGCGTAGCGCGAGCTATGAGGACATGGTGGAACACATT ATGAGCGAACTGCACCGTGATCAATTCGTTTGCGTGGCGCTGTACGGTC ACCCGGGCGTTTTTGCGTATACCGGTCATGAGGCGATCCGTCGTGCGCG TGAGGAAGGCATCGCGGCGCGTATGCTGCCGGCGTGCAGCGCGGAAGA CTGGCTGTTTGCGGATCTGGGTCTGGACCCGGGCGAGCGTGGCTGCCAG AGCTTCGAAGCGACCGACTTTCTGATCCGTCACCGTGTGTTTGATCCGA CCGGCCTGCTGATTCTGTGGCAAGTTGGTGTGATCGGCATGATTGATCG TGATCCGGGTTATGATGCGCGTCCGGGCGTTACCACCCTGACCGATGCG CTGGTTGCGAGCTACGGTAGCGGCCACCCGGTTACCGTGTACGAGGCG AGCCCGTATGTTACCGCGGAACCGCGTACCACCACCGTGCCGCTGGCGG AGCTGCCGGACACCCCGCTGAGCGCGGCGAGCACCCTGGTTGTGCCGC CGCTGCCGCCGCGTCCGGTGGATCGTGAACTGCTGGCGCGTCTGGCGGC GCGTCGTTAA
His6-SUMO-StrM WP_031073184.1 | MGSHHHHHHSSGLVPRGSASHINLKVKGQDGNEVFFRIKRSTQLKKLMNA YCDRQSVDMTAIAFLFDGRRLRAEQTPDELEMEDGDEIDAMLHQTGGHM QETTGNAQLVVVGTGFRAIGDLTVEARACLEQADKVLCLIGDPLVTRHIEK
LNASVETLDVHYAVGKPRSASYEDMVEHIMSELHRDQFVCVALYGHPGVF AYTGHEAIRRAREEGIAARMLPACSAEDWLFADLGLDPGERGCQSFEATDF LIRHRVFDPTGLLILWQVGVIGMIDRDPGYDARPGVTTLTDALVASYGSGH PVTVYEASPYVTAEPRTTTVPLAELPDTPLSAASTLVVPPLPPRPVDRELLA RLAARR*
strA* IH00_RS0113670 | ATGCCGGCGGCGGTGGTTGACTTCATGGAGGAACTGGTGACCCAGCCG CGTCGTCAACACGCGTACCGTCGTAGCGCGGAGGCGTATGTTGCGGATA GCGCGCTGACCGCTAGCGAGCGTGAAGCGGTGGTTAGCGGTGACGTGG ATCGTATGCGTGCGGTTCTGGCCGAGCACAGCGGCGTGAAAGAGGAGT GCCACGCGGTTCTGGTGGTTATCATTTTTGACCCGGATGAAGTTCCGAG CGGTGCGTAA
His6-SUMO-StrA WP_158827804.1 | MGSHHHHHHSSGLVPRGSASHINLKVKGQDGNEVFFRIKRSTQLKKLMNA YCDRQSVDMTAIAFLFDGRRLRAEQTPDELEMEDGDEIDAMLHQTGGHMP AAVVDFMEELVTQPRRQHAYRRSAEAYVADSALTASEREAVVSGDVDRM RAVLAEHSGVKEECHAVLVVIIFDPDEVPSGA*

Table 3.2 Plasmids used in this study
Includes plasmid ID number and name/description.

Plasmid ID | Creator | Description
pMF1006 | n/a | bdSENP1 protease (SUMO-cleaving) for expression in E. coli153
pMF1180 | MRJ | His6-SUMO-RceA_RBS_RceM_pET28b
pMF1190 | MRJ | His6-SUMO-StrA
pMF1191 | MRJ | His6-SUMO-StrM
pMF1185 | MRJ | His6-SUMO-StrA_StrM_pET28b

Table 3.3 Primers used to create plasmids
Primers ordered from IDT.

ID number | Description | Sequence
prmMRJ064 | Forward primer used to amplify rceA-rceM with NdeI restriction site | ATATAACATATGACGACCATCGTCCC
prmMRJ063 | Reverse primer used to amplify rceA-rceM with BamHI restriction site | TTATATGGATCCTCAGGCGGTTTCCCC
prmMRJ066 | Forward primer used to amplify strM with NdeI restriction site | ATATAACATATGCAGGAGACCACCG
prmMRJ067 | Reverse primer used to amplify strM with BamHI restriction site | TTATATGGATCCTTAACGACGCGCCG
prmMRJ068 | Forward primer used to amplify strA with NdeI restriction site | ATATAACATATGCCGGCGGC
prmMRJ069 | Reverse primer used to amplify strA with BamHI restriction site | TTATATGGATCCTTACGCACCGCTCGG
T7_mod_fw | Forward primer used for colony PCR and sequencing | CCCGCGAAATTAATACGACTCACTATAGG
T7_mod_rv | Reverse primer used for colony PCR and sequencing | CTAGTTATTGCTCAGCGGTGGC

3.5.2 Molecular cloning and creation of plasmid constructs
Dr.
Matthew Jensen performed the cloning work described here. For all cloning, standard conditions were used according to the manufacturer’s instructions. For amplifying DNA to be used in ligations or Gibson assemblies, Q5 high fidelity DNA polymerase was used. The final concentrations for these PCRs were: 1X Standard Q5 reaction buffer, 200 μM dNTPs, 5% DMSO, 0.5 μM forward primer, 0.5 μM reverse primer, 0.02 units/50 μL PCR Q5 polymerase. For colony PCR, OneTaq DNA polymerase was used. The final concentrations for colony PCRs were: 1X Standard OneTaq PCR buffer, 200 μM dNTPs, 5% DMSO, 0.2 μM forward primer, 0.2 μM reverse primer, and 1.25 units/50 μL reaction of polymerase.

For pMF1180, pET28b-his6-SUMO backbone was digested with NdeI and BamHI, treated with Antarctic phosphatase, and the band was extracted from an agarose gel using a kit (Thermo Scientific). The genes rceA and rceM were cloned directly out of the organism with the native genomic context between the two syntenic genes. Primers prmMRJ063 and prmMRJ064 were used to amplify the two genes together and add NdeI and BamHI restriction sites to the termini such that rceA would be inserted in-frame with the his6-SUMO tag gene. The PCR product was verified by agarose gel electrophoresis, digested with NdeI and BamHI, and the reaction was cleaned up using a kit (Thermo Scientific). T4 DNA ligase was used to ligate the sticky overhangs into the prepared plasmid backbone. Ligation reactions were transformed into electrocompetent TOP10 cells, spread onto LB agar plates with 50 µg/mL kanamycin, and allowed to grow overnight at 37 °C. Resultant colonies were screened by colony PCR using primers T7_mod_fw and T7_mod_rv with an annealing temperature of 56 °C and an extension time of 1 min 20 s. Positive hits were sequence verified by ACGT using Sanger sequencing and the same colony PCR primers.
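The final concentrations listed for the Q5 reactions translate into pipetting volumes through the dilution relation C1·V1 = C2·V2. The sketch below works this out for a 50 μL reaction. The stock concentrations (5X buffer, 10 mM dNTPs, 100% DMSO, 10 μM primers) are assumptions chosen for illustration and are not stated in the text; the names `COMPONENTS` and `stock_volume_ul` are hypothetical.

```python
REACTION_UL = 50.0

# name -> (final concentration from the text, ASSUMED stock concentration);
# both values in each pair share the same unit.
COMPONENTS = {
    "Q5 reaction buffer (X)": (1.0, 5.0),
    "dNTPs (uM)": (200.0, 10_000.0),
    "DMSO (%)": (5.0, 100.0),
    "forward primer (uM)": (0.5, 10.0),
    "reverse primer (uM)": (0.5, 10.0),
}


def stock_volume_ul(final, stock, total_ul=REACTION_UL):
    """Solve C1*V1 = C2*V2 for the stock volume V1 in microliters."""
    return final * total_ul / stock


volumes = {name: stock_volume_ul(f, s) for name, (f, s) in COMPONENTS.items()}
# Volume left over for template, polymerase, and water.
remaining_ul = REACTION_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
print(f"template + polymerase + water: {remaining_ul:.2f} uL")
```

With different stocks, only the second number in each pair needs to change.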
For pMF1190 and pMF1191, pET28b-his6-SUMO backbone was digested with NdeI and BamHI, treated with Antarctic phosphatase, and the band was extracted from an agarose gel using a kit (Thermo Scientific). Gene fragments for strA and strM were codon optimized for expression in E. coli and purchased as gBlocks. The strA gBlock was amplified with primers prmMRJ068 and prmMRJ069 to add NdeI and BamHI cut sites on the termini. The PCR product was verified by agarose gel electrophoresis, digested with NdeI and BamHI, and the reaction was cleaned up using a kit (Thermo Scientific). The strM gBlock was amplified with primers prmMRJ066 and prmMRJ067 to add NdeI and BamHI cut sites on the termini. The PCR product was verified by agarose gel electrophoresis, digested with NdeI and BamHI, and the reaction was cleaned up using a kit (Thermo Scientific). T4 DNA ligase was used to ligate the sticky overhangs into the prepared plasmid backbone. Ligation reactions were transformed into electrocompetent TOP10 cells, spread onto LB agar plates with 50 µg/mL kanamycin, and allowed to grow overnight at 37 °C. Resultant colonies were screened by colony PCR using primers T7_mod_fw and T7_mod_rv with an annealing temperature of 56 °C and an extension time of 1 min 20 s. Positive hits were sequence verified by ACGT using Sanger sequencing and the same colony PCR primers.

3.5.3 Heterologous protein expression and purification
Heterologous expressions were conducted in E. coli BL21(DE3) cells. A 10 mL saturated overnight culture in LB with 50 μg/mL kanamycin was used to inoculate 1 L of Terrific Broth (TB) with 50 μg/mL kanamycin in a 2.5 L baffled Ultra Yield flask (Thomson Scientific). The 1 L culture was incubated in a 37 °C shaker until the OD600 reached approximately 0.7, at which time the culture was cold shocked in an ice bath for 30-60 minutes.
After cold shocking, the culture was induced with 200 mM IPTG and placed in a 16 °C shaker for 24 h (note: over-expression of his6-SUMO-StrA did not require IPTG induction). After 24 h, the cells were harvested by centrifugation at 4000 x g for 30 minutes at 4 °C, snap frozen in liquid nitrogen, and stored at -80 °C until use.

For protein purification by nickel affinity chromatography, frozen cells were thawed on ice and then resuspended to homogeneity in ice-cold lysis buffer (300 mM NaCl, 50 mM sodium phosphate, 20 mM imidazole, 10% glycerol, pH 8.0) with 4 mL of buffer for every 1 g of wet cell mass. After resuspension, lysozyme was added to a final concentration of 1 mg/mL and the suspension was incubated on ice for 30 minutes. After lysozyme treatment, cells were further lysed by sonication. After sonication, lysate was clarified by centrifugation at 15,000 x g for 45 minutes at 4 °C. The soluble protein from the clarified supernatant was then batch-bound to nickel-NTA resin (GoldBio) for 60 minutes on a rotator at 4 °C. After binding, the resin was added to a 5 mL fritted column and washed with 10 column volumes of lysis buffer, and the protein was eluted in lysis buffer with 250 mM imidazole.

For subsequent gel filtration chromatography, protein was concentrated, sterile filtered, and loaded onto a HiLoad 16/600 Superdex 200 pg size exclusion column at a flow rate of 1 mL/min of lysis buffer without imidazole. Protein was analyzed by SDS-PAGE gel, and fractions were pooled and concentrated using Amicon Ultra centrifugal filter columns (MilliporeSigma). Concentrations were measured by Bradford assay and proteins were snap frozen in liquid nitrogen and stored at -80 °C until use. When using frozen protein, all samples were thawed on ice and centrifuged at top speed in a microcentrifuge at 4 °C for 10 minutes; aggregate was removed by transferring the supernatant to a fresh tube, and the concentration was re-measured.
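The lysis step above scales with the wet cell mass, so the quantities can be worked out per pellet. The following is a small sketch under stated assumptions: lysozyme is added as a solid and its volume contribution is ignored, and the helper names (`lysis_plan`, `grams_needed`) are hypothetical. Molecular weights are the standard values for NaCl and imidazole.

```python
NACL_MW = 58.44       # g/mol
IMIDAZOLE_MW = 68.08  # g/mol


def lysis_plan(wet_cell_mass_g):
    """Return (lysis buffer in mL, lysozyme in mg) for a cell pellet."""
    buffer_ml = 4.0 * wet_cell_mass_g  # 4 mL of buffer per 1 g wet cell mass
    lysozyme_mg = 1.0 * buffer_ml      # 1 mg/mL final concentration
    return buffer_ml, lysozyme_mg


def grams_needed(conc_mM, mw_g_per_mol, volume_ml):
    """Mass of solute required for a given millimolar concentration."""
    return (conc_mM / 1000.0) * mw_g_per_mol * (volume_ml / 1000.0)


buffer_ml, lysozyme_mg = lysis_plan(5.0)
print(f"5 g pellet: {buffer_ml:.0f} mL buffer, {lysozyme_mg:.0f} mg lysozyme")
print(f"NaCl for 1 L at 300 mM: {grams_needed(300, NACL_MW, 1000):.2f} g")
print(f"Imidazole for 1 L at 20 mM: {grams_needed(20, IMIDAZOLE_MW, 1000):.2f} g")
print(f"Imidazole for 1 L at 250 mM: {grams_needed(250, IMIDAZOLE_MW, 1000):.2f} g")
```

The sodium phosphate component is left out on purpose, since the text does not say which phosphate salts were used to reach pH 8.0.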
3.5.4 SUMO cleavage by bdSENP1 protease
bdSENP1 SUMO protease was expressed, purified, and thawed as discussed above. The following protocol was carried out as described previously.153 Briefly, bdSENP1 was used at a 1:1000 molar ratio of bdSENP1:SUMO and the reaction was conducted in LS-S buffer (250 mM NaCl, 40 mM tris HCl pH 7.5, 2 mM MgCl2, 2 mM DTT, and 250 mM sucrose). Proteins to be cleaved were dialyzed or buffer exchanged into cold LS-S buffer. The reaction was conducted at 4 °C overnight, and cleaved SUMO tags and un-cleaved protein were removed from the samples by Ni-NTA batch purification as described above (his6-SUMO will bind to resin and cleaved protein will reside in the flow through). Samples were analyzed by SDS-PAGE gel.

3.5.5 In vitro multiple turnover experiment for MS analysis
Split borosin methyltransferase and precursor proteins were expressed from separate plasmids (not the co-expression constructs) and purified as described above. Proteins were dialyzed into a buffer containing 50 mM HEPES, 300 mM NaCl, 10% glycerol, pH 8.0. Reactions were conducted in 100 μL final volumes with saturating amounts of SAM (dissolved in 0.5 mM HEPES pH 8.0) and SAHN. The proteins were used in a 1:1 molar ratio (50 μM of each). Reactions were incubated at room temperature for 16 h, quenched with SDS sample buffer, and boiled prior to in-gel digestion and HPLC-MS/MS analysis.

3.5.6 Mass spectrometric analysis
Purified protein was run on an SDS-PAGE gel, stained with Coomassie, and destained. After destaining, the gel was imaged and the appropriate band was excised using a scalpel and cut into 2 mm pieces, which were placed into a LoBind tube (Eppendorf). Gel pieces were destained with 50 mM ammonium bicarbonate (ABC) in a 50% acetonitrile (ACN) solution. Once gel pieces were clear, they were dehydrated with 100% ACN until opaque, at which point ACN was removed.
For StrA, the gel pieces were rehydrated with a solution containing 50 mM ABC and 55 mM DTT and incubated in a 56 °C water bath for 60 minutes. The DTT solution was subsequently removed and replaced with a solution containing 50 mM ABC and 55 mM iodoacetamide, at which point the tubes were placed in the dark at room temperature for 30 minutes. The iodoacetamide solution was removed and the gel pieces were dehydrated with 100% ACN, which was subsequently removed. For all samples, the gel pieces were re-hydrated with the appropriate digest buffer according to the manufacturer’s instructions for 15 minutes on ice (digest buffer includes the appropriate protease: RceA was treated with AspN (Promega) and StrA was treated with AspN and GluC (Thermo Scientific)). If the gel pieces were no longer submerged in digest buffer, extra buffer was added to cover them, and they were subsequently incubated for at least 16 h at 37 °C. After digestion, the supernatant was transferred to a fresh LoBind tube and peptides were extracted from the gel pieces with increasing amounts of ACN (50%, 80%, 95%) and 0.3% formic acid (FA). After extraction, the peptide solution was kept at -80 °C for at least 30 minutes to inactivate the enzymes and then vacuum concentrated until dry. Peptides were then resuspended in 0.1% FA solution and purified with a C18 ZipTip (MilliporeSigma) according to the manufacturer’s instructions. After purification, samples were vacuum concentrated until dry and resuspended in 20% ACN, 0.1% FA solution for analysis. Samples were loaded onto a Thermo Scientific Fusion mass spectrometer in accordance with our previously published method.115

4 Preliminary findings of a split borosin found in the bacterium S. oneidensis MR-1
This chapter was written by Fredarla Miller for the purpose of this thesis. Data and results will contribute to a larger survey of bacterial borosins for future publication and support the work presented in Chapter 5 of this thesis. Dr.
Matthew Jensen performed most of the cloning and foundational work supporting this chapter. Fredarla Miller performed additional protein expressions, purifications, stability tests, and in vitro experiments, and analyzed all the mass spectrometry data. Dr. Michael Freeman performed the bioinformatics analysis to identify the putative split borosin.

4.1 Introduction
Shewanella oneidensis MR-1 is a Gram-negative γ-proteobacterium known for its unique metabolism and ability to reduce a variety of substrates including insoluble metals and electrodes.161 Typically isolated from a wide variety of marine sediments, Shewanella spp. exhibit an equally diverse set of metabolic abilities.161 The putative split borosin BGC in this organism is shown in Figure 4.1 A and consists of at least three genes: the borosin methyltransferase (sonM), the precursor peptide (sonA), and a GGDEF-domain containing protein. There are currently 43 Shewanella spp. genomes published on NCBI. Of these, 37 contain this cluster, a level of conservation not typically seen in natural product biosynthesis. Together with the predicted functions of the BGC genes, this conservation leads us to expect that this RiPP plays a somewhat central role in this organism’s metabolism/homeostasis. Due to the intricacies of determining the biological role of an orphan RiPP BGC, this BGC from S. oneidensis MR-1 will be explored more fully in Chapter 6 of this thesis, while the present chapter will focus upon the heterologous characterization of the SonM and SonA proteins.

To begin to investigate the legitimacy of this putative split borosin BGC, we first sought to determine whether the SonM/SonA pair was active as a methyltransferase and precursor peptide, respectively. SonM and SonA together exhibit a very similar domain architecture to the Streptomyces sp.
NRRL S-118 borosin proteins (StrM and StrA) and OphMA.60 Like StrM, SonM is 36% identical to the methyltransferase domain of OphMA (Figure 4.1 C), and SonA encodes the LigA domain in its putative leader peptide, which is conserved in many of the bacterial split borosin BGCs. Furthermore, the putative core peptide of SonA, like that of StrA, contains several hydrophobic amino acids, which we predicted to be the site of posttranslational methylation by SonM (Figure 4.1 B). Whereas the 12 amino acid core of OphMA contains nine α-N-methylations, the core of SonA is much shorter and therefore can accommodate fewer methylated amino acid residues.

Figure 4.1 Putative borosin gene cluster from S. oneidensis MR-1
A: Genomic locus of sonM (pink) and sonA (blue) in the genome of S. oneidensis MR-1 (AE014299.2). Proximal genes are shown in gray and their predicted functions are annotated. B: Domain architecture and core peptide comparison with the type 1 borosin, OphMA. The predominant species of methylated core of SonA is shown (methylations are shown in pink boxes, ambiguous methylation is in dashed pink box). C: Alignment of SonM with the methyltransferase domain of OphMA created using Clustal Omega (v. 1.2.4). Asterisk (*) indicates identical residue, colon (:) and period (.) indicate similar residues.

OphMA_noclasp_nocore METSTQTKAGSLTIVGTGIESIGQMTLQALSYIEAAAKVFYCVIDPATEAFILTKNKNCV 60
SonM                 --------MGSLVCVGTGLQLAGQISVLSRSYIEHADIVFSLLPDGFSQRWLTKLNPNVI 52
***. ****:: **::: : **** * ** : * :: :: . * * :

OphMA_noclasp_nocore DLYQYYDN---GKSRLNTYTQMSELMVREVRKGLDVVGVFYGHPGVFVNPSHRALAIAKS 117
SonM                 NLQQFYAQNGEVKNRRDTYEQMVNAILDAVRAGKKTVCALYGHPGVFACVSHMAITRAKA 112
:* *:* : *.* :** ** : :: ** * ..* .:*******. ** *:: **:

OphMA_noclasp_nocore EGYRARMLPGVSAEDCLFADLCIDPSNPGCLTYEASDFLIRDRPVSIHSHLVLFQVGCVG 177
SonM                 EGFSAKMEPGISAEACLWADLGIDPGNSGHQSFEASQFMFFNHVPDPTTHLLLWQIAIAG 172
**: *:* **:*** **:*** ***.* * ::***:*:: :: . :**:*:*:. .*

OphMA_noclasp_nocore IADFNFTGFDNNKFGVLVDRLEQEYGAEHPVVHYIAAMMPHQDPVTDKYTVAQLREPEIA 237
SonM                 EHTLTQFHTSSDRLQILVEQLNQWYPLDHEVVIYEAANLPIQAPRIERLPLANLPQAHL- 231
:. ..::: :**::*:* * :* ** * ** :* * * :: :*:* : .:

OphMA_noclasp_nocore KRVGGVSTFYIPPKARKASNLDIIRRLELLPAGQVP 273
SonM                 ---MPISTLLIPPAKKLEYNYAILAKLGIGPEDLG- 263
:**: *** : * *: :* : * .

When approaching the borosin proteins in S. oneidensis MR-1, we worked on two objectives: 1) to verify activity of SonM on the SonA substrate as quickly as possible and 2) to prepare for a rigorous biochemical analysis of these proteins. In consideration of the difficulties we had in investigating the closely related StrM/StrA pair of proteins, we prepared a variety of protein constructs with and without solubility tags in a multi-pronged/brute force approach to this BGC. With this in mind, we implemented our typical screening pipeline discussed previously: 1) clone the methyltransferase and precursor of interest into a plasmid for co-expression in E. coli, 2) heterologously over-express and purify the resultant precursor protein, and 3) analyze the core peptide by HPLC-MS/MS. What follows is the preliminary data that serves as a foundation for the subsequent results presented in Chapter 5 (in preparation for publication) and experiments in the native host presented in Chapter 6.

4.2 Heterologous methylation of SonA by SonM in vivo
The genes sonM and sonA were cloned from extracted genomic DNA of S. oneidensis MR-1. An N-terminal his6-tag was added to sonA and the two genes were cloned into the multiple cloning site of the pET28b plasmid as a single operon. The proteins were heterologously expressed in E. coli for 24 h and were purified by nickel affinity chromatography in the same manner as discussed previously. Surprisingly, both SonM and SonA expressed very well without any additional solubility tags. Both proteins are clearly visible as bands in the SDS-PAGE gel in the soluble fraction of the cell lysate (Figure 4.2).
Interestingly, the two proteins co-eluted when the column was washed with high imidazole buffer. We reasoned that this could be due either to non-specific binding of SonM to the column or to SonM and his6-SonA forming a very stable complex under these conditions. Considering the otherwise clean purification and the approximately 1:1 molar ratio (as determined by ImageJ) between SonM:SonA, we anticipated the latter.

Figure 4.2 His6-SonA strongly co-purifies with SonM when co-expressed in E. coli
SDS-PAGE gel demonstrating how his6-SonA strongly co-purifies with SonM (SonM is not his6-tagged).

At this point, the band corresponding to his6-SonA was excised from the gel, digested with AspN protease, and analyzed by HPLC-MS/MS for PTMs. We confirmed that the core peptide of SonA was methylated by SonM in vivo, producing spectra consistent with two methylations (on L63 and I65) within the core region (raw MS2 spectra are shown in Figure 4.3). In conducting an analysis of the MS1 data to determine the relative abundance of the methylated species, we saw that the doubly-methylated species was, by far, the most abundant and accounted for approximately 98% of the total (Figure 4.4). We hypothesized an N- to C-directionality of methylation for SonM upon SonA, but we were unable to detect any un-methylated peptide. Additionally, the singly-methylated species seems to be present with a methylation upon either L63 or I65 (although the methylated L63 is predominant). This suggested that there may be a general N- to C-directionality as we have seen in other characterized borosin systems, but this remained to be fully elucidated. We also noticed a 3Me species present at a very low abundance wherein the adjacent S66 residue is methylated. Due to its extremely low abundance (and absence in subsequent in vitro experiments), we do not believe this to be part of the native methylation pattern (Figure 4.1 B).
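The relative abundances quoted above come from comparing MS1 signal across methylation states. A minimal sketch of that normalization is given below; it assumes each state is represented by one integrated EIC peak area and ignores differences in ionization efficiency. The function name `relative_abundance` is hypothetical, and the peak areas are placeholder numbers chosen only to exercise the function, not measurements from this work.

```python
def relative_abundance(areas):
    """Convert integrated EIC peak areas into percent of the summed signal."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("no signal to normalize")
    return {state: 100.0 * area / total for state, area in areas.items()}


# Placeholder peak areas (arbitrary units), NOT data from this thesis.
example_areas = {"0Me": 0.0, "1Me": 1.5e6, "2Me": 9.8e7, "3Me": 5.0e5}

percentages = relative_abundance(example_areas)
for state, pct in percentages.items():
    print(f"{state}: {pct:.1f}%")
```

In practice the areas would be summed over every charge state observed for a given species before normalizing.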
Figure 4.3 MS2 spectra showing methylation states of SonA core peptide
After his6-SonA was co-expressed with SonM for 24 h in E. coli, his6-SonA was purified and analyzed by HPLC-MS/MS. Three methylation states were found. Raw data are shown on the left, and masses are labeled and mapped onto the AspN fragment of the core region of SonA on the right. Localized methylations are shown with orange circles. Error is shown in parenthetical numbers on the right. Relative methylation states are shown in Figure 4.4.

Figure 4.4 HPLC-MS EIC for SonA after co-expression with SonM for 24 h
AspN fragment of SonA containing the methylated residues (shown in orange with asterisks) and relative amounts of methylated species in the purified protein sample. The most abundant species is the doubly-methylated core peptide.

We were encouraged by the relative homogeneity of the methylation pattern and by how the reaction seemed to go to completion, with nearly all (98%) of the SonA peptide doubly-methylated (Figure 4.4). Next, we sought to probe the SonM-SonA (SonMA) complex as it purified from the Ni-affinity column. We hypothesized that the SonMA pair formed a tetramer consisting of two SonA subunits and two SonM subunits, mirroring the composition of the OphMA homodimer [114]. The purified protein was concentrated and loaded onto a gel filtration column and the peaks were analyzed by SDS-PAGE. Gratifyingly, the protein eluted in two distinct peaks, the first corresponding to the predicted tetrameric SonMA complex and the second to monomeric his6-SonA (Figure 4.5 A and B). We further verified that the monomeric his6-SonA protein was as homogeneously methylated as the sample analyzed after Ni-affinity purification (Figure 4.5 C). This is the first borosin system we have characterized that showed promise of performing multiple substrate turnover.
However, as the entire reaction occurred in vivo in which both proteins were over-expressed, this avenue required a more nuanced investigation.

Figure 4.5 SEC purification of SonM and his6-SonA
A: His6-SonA and SonM were co-expressed in E. coli, purified by Ni-affinity chromatography, and run on a gel filtration column to achieve the presented chromatogram. Peaks corresponding to the his6-SonA-SonM tetramer complex and monomeric his6-SonA are noted. B: Samples from each peak were analyzed by SDS-PAGE. C: The band in the second peak was excised and analyzed by HPLC-MS/MS. Shown is the EIC to verify that the monomeric his6-SonA is predominantly doubly-methylated.

4.2.1 Multiple substrate turnover in vitro

In order to probe the potential ability of SonM to methylate and turn over multiple SonA substrates, we next sought to analyze an in vitro reaction. We first confirmed that his6-SUMO-SonM and his6-SUMO-SonA were amenable to bdSENP1 cleavage and subsequent re-purification. While the re-purification yield of the protein of interest was low, both proteins were easily cleaved and re-purified by this method (Figure 4.6). After SUMO cleavage, SonM and SonA were used in several in vitro reactions in order to definitively determine if SonM is capable of turning over multiple SonA substrates by analyzing the reaction after a set time point by HPLC-MS/MS. Each reaction utilized a saturating amount of SAM, 25 µM SonA, and decreasing amounts of SonM to achieve various molar ratios: 1:1 (the same molar ratio of methyltransferase to core peptide exhibited by OphMA), 1:10, and 1:50. SonA was maintained at the same concentration for easier mass spectrometric analysis. When SAM is used as a methyl donor, the product S-adenosyl homocysteine (SAH) is formed.
As many methyltransferases are known to be inhibited by excess product, we performed another set of reactions where we included SAH nucleosidase (SAHN), which degrades SAH and should eliminate the product inhibition [162]. The reactions were incubated at room temperature for 24 h and run on an SDS-PAGE gel; the band corresponding to SonA was then excised, digested, and analyzed by HPLC-MS/MS.

Figure 4.6 His6-SUMO tag cleavage of SonM and SonA using bdSENP1 protease
SDS-PAGE gels showing the SUMO cleavage reaction and subsequent purification for the proteins used in the preliminary multiple turnover experiment. The “neg ctrl” lane in each gel is the purified his6-SUMO-tagged protein prior to bdSENP1 treatment. Direct comparison of the “neg ctrl” and “pre-purification” lanes demonstrates near complete cleavage of the tag from the protein of interest. “Flow through” fractions contain only the re-purified, tag-less protein.

Happily, we were able to confirm multiple turnover in vitro for this split borosin system (Figure 4.7). We further demonstrated that SonM does exhibit strong product inhibition with SAH. This is illustrated most clearly in Figure 4.7 for the 1:50 reaction. In this reaction, the sample without SAHN is predominantly un-methylated (76% un-methylated, 19% singly-methylated, and 5% doubly-methylated) while the corresponding reaction with SAHN is predominantly doubly-methylated (88% doubly-methylated). Furthermore, MS2 spectra indicate no off-target methylations (data not shown). We did not detect any 3-methylated species, further suggesting that the third methylation seen in vivo may be an artifact of artificially increased concentrations of enzyme. Through this experiment, we were also able to rule out any cross-reactivity from native E. coli proteins. Although there are no putative borosin methyltransferases in E. coli, we were able to confirm that no methylations occur on SonA protein that is expressed in E.
coli without the simultaneous co-expression of SonM (Figure 4.7). Since the SonA and SonM proteins proved easy to express and purify, we predicted that, unlike the borosin proteins from R. centenum SW and Streptomyces sp. NRRL S-118, those from S. oneidensis MR-1 might remain soluble without N-terminal SUMO tags. After the success of this multiple turnover experiment, his6-SonM and his6-SonA were cloned, expressed, and purified successfully using identical conditions.

Figure 4.7 HPLC-MS EIC to show relative abundances of in vitro methylation
In vitro reactions (24 h) with SonM/SonA. Reactions without SAHN (left) and with SAHN (right) are shown.

4.3 Conclusion

Our preliminary investigation of the split borosin BGC found in S. oneidensis MR-1 was exceptionally fruitful. We were initially frustrated by the challenges associated with the Streptomyces sp. NRRL S-118 and R. centenum SW split borosins, but SonM and SonA happily behaved well in vivo (heterologously in E. coli) and in vitro. The stable tetrameric complex formed in vivo is encouraging as a lead for X-ray crystallographic structural determination, and the confirmation of multiple substrate turnover is promising for a kinetic investigation of the enzyme mechanism, a route that has not yet been possible for any other putative borosin system. The subsequent discovery that SonM and SonA are easily heterologously expressed and purified with only an N-terminal his6-tag makes this even more straightforward, as it removes the extra step of SUMO cleavage and re-purification required by the other sets of borosin proteins. In the SonMA pair, we have seemingly identified a set of natively split borosin proteins which are suitable for extensive biochemical characterization and reside in a genetically tractable organism known for its unique metabolic abilities. The preliminary data in this chapter are foundational to the data presented in subsequent chapters.
Chapter 5 presents an extensive structural and kinetic study of SonM and SonA, and Chapter 6 presents preliminary work regarding the discovery of the biological role of this cryptic BGC.

4.4 Materials and methods

Unless otherwise noted, all chemicals and reagents were purchased from MilliporeSigma.

4.4.1 DNA and protein sequences

Table 4.1 Gene and protein sequences of split borosins
This table contains the DNA and protein sequences of all the proteins used in this study. Gene/protein identifiers are provided when available for the native sequences. Protein sequences include the purification/solubility tags we used [153].

Description | DNA or protein sequence
sonM AAN54539.2 | ATGGGATCACTCGTCTGTGTGGGCACTGGGTTACAGCTCGCGGGGCAAATTAGCGTATTAAGCCGCAGCTATATTGAACATGCCGATATTGTATTTTCACTCTTACCTGACGGTTTCTCGCAGCGTTGGTTGACGAAGCTCAACCCCAATGTCATCAATTTGCAGCAGTTTTATGCGCAAAATGGTGAAGTTAAAAATCGCCGAGACACCTACGAGCAAATGGTCAATGCCATTCTAGATGCGGTGAGAGCGGGTAAAAAAACCGTGTGTGCACTCTACGGTCATCCGGGGGTATTTGCCTGTGTATCCCATATGGCGATAACTCGGGCGAAGGCCGAAGGGTTTTCGGCAAAGATGGAGCCGGGGATTTCGGCCGAAGCTTGCCTGTGGGCCGACTTAGGGATTGACCCCGGCAACTCGGGGCATCAAAGTTTTGAAGCTAGCCAGTTTATGTTTTTCAACCATGTGCCCGATCCCACTACCCACTTATTACTCTGGCAAATCGCCATTGCAGGCGAACATACCTTAACCCAATTTCATACCTCGAGTGATAGGTTGCAGATCCTCGTGGAGCAGTTGAATCAATGGTATCCCCTCGACCATGAGGTGGTCATATACGAAGCGGCCAATTTGCCAATCCAAGCCCCGCGTATCGAGCGTTTACCTTTAGCGAATTTACCCCAAGCACACTTAATGCCGATTAGTACGTTGTTAATTCCGCCAGCAAAAAAGCTGGAGTACAACTATGCTATTTTGGCTAAGTTAGGGATCGGTCCCGAAGATTTGGGATAA
SonM SO1478 | MGSLVCVGTGLQLAGQISVLSRSYIEHADIVFSLLPDGFSQRWLTKLNPNVINLQQFYAQNGEVKNRRDTYEQMVNAILDAVRAGKKTVCALYGHPGVFACVSHMAITRAKAEGFSAKMEPGISAEACLWADLGIDPGNSGHQSFEASQFMFFNHVPDPTTHLLLWQIAIAGEHTLTQFHTSSDRLQILVEQLNQWYPLDHEVVIYEAANLPIQAPRIERLPLANLPQAHLMPISTLLIPPAKKLEYNYAILAKLGIGPEDLG*
His6-SonM SO1478 | MHHHHHHSSMGSLVCVGTGLQLAGQISVLSRSYIEHADIVFSLLPDGFSQRWLTKLNPNVINLQQFYAQNGEVKNRRDTYEQMVNAILDAVRAGKKTVCALYGHPGVFACVSHMAITRAKAEGFSAKMEPGISAEACLWADLGIDPGNSGHQSFEASQFMFFNHVPDPTTHLLLWQIAIAGEHTLTQFHTSSDRLQILVEQLNQWYPLDHEVVIYEAANLPIQAPRIERLPLANLPQAHLMPISTLLIPPAKKLEYNYAILAKLGIGPEDLG*
His6-SUMO-SonM | MGSHHHHHHHSSGLVPRGSASHINLKVKGQDGNEVFFRIKRSTQLKKLMNAYCDRQSVDMTAIAFLFDGRRLRAEQTPDELEMEDGDEIDAMLHQTGGHMGSLVCVGTGLQLAGQISVLSRSYIEHADIVFSLLPDGFSQRWLTKLNPNVINLQQFYAQNGEVKNRRDTYEQMVNAILDAVRAGKKTVCALYGHPGVFACVSHMAITRAKAEGFSAKMEPGISAEACLWADLGIDPGNSGHQSFEASQFMFFNHVPDPTTHLLLWQIAIAGEHTLTQFHTSSDRLQILVEQLNQWYPLDHEVVIYEAANLPIQAPRIERLPLANLPQAHLMPISTLLIPPAKKLEYNYAILAKLGIGPEDLG*
sonA AAN54540.1 | ATGTCTGGATTATCGGATTTTTTTACCCAGTTAGGCCAAGATGCGCAGTTAATGGAAGACTATAAACAGAATCCTGAGGCGGTGATGCGTGCCCACGGATTAACTGATGAACAAATTAACGCTGTAATGACTGGGGATATGGAAAAGCTCAAAACGTTAAGTGGTGATAGTAGCTATCAATCTTACCTTGTTATTTCACATGGTAATGGTGATTAA
His6-SonA SO1479 | MHHHHHHMSGLSDFFTQLGQDAQLMEDYKQNPEAVMRAHGLTDEQINAVMTGDMEKLKTLSGDSSYQSYLVISHGNGD*
His6-SUMO-SonA | MGSHHHHHHHSSGLVPRGSASHINLKVKGQDGNEVFFRIKRSTQLKKLMNAYCDRQSVDMTAIAFLFDGRRLRAEQTPDELEMEDGDEIDAMLHQTGGHMSGLSDFFTQLGQDAQLMEDYKQNPEAVMRAHGLTDEQINAVMTGDMEKLKTLSGDSSYQSYLVISHGNGD*

Table 4.2 Plasmids used in this study
Includes plasmid ID number and description.

Plasmid ID | Creator | Description
pMF1235 | FSM | His6-SonA_pET28b
pMF1236 | FSM | His6-SonM_pET28b
pMF1181 | MRJ | SonM_gRBS_His6-SonA_pET28b (uses native RBS)
pMF1006 | n/a | bdSENP1 protease (SUMO-cleaving) for expression in E. coli [153]
pMF1231 | n/a | S-adenosyl homocysteine nucleosidase (SAHN)
pMF1188 | MRJ | His6-SUMO-SonA
pMF1189 | MRJ | His6-SUMO-SonM

Table 4.3 Primers used to create select plasmids
Primers ordered from IDT.

ID number | Description | Sequence
prFM1175 | Forward primer to amplify His6-SonA with Gibson homology arms for insertion into pET28b | TTTAAGAAGGAGATATACATGCATCATCATCATCAT
prFM1176 | Reverse primer to amplify His6-SonA with Gibson homology arms for insertion into pET28b | AGTGCGGCCGCAAGCTTGTTAATCACCATTACCATG
prFM1177 | Forward primer to amplify His6-SonM with Gibson homology arms for insertion into pET28b | TAAGAAGGAGATATACATGCATCATCATCATCATCACAGCAGCATGGGATCACTCGTC
prFM1178 | Reverse primer to amplify His6-SonM with Gibson homology arms for insertion into pET28b | AGTGCGGCCGCAAGCTTGTTATCCCAAATCTTCGGG
T7_fw | Forward primer used for colony PCR and sequencing | TAATACGACTCACTATAGGG
T7_rv | Reverse primer used for colony PCR and sequencing | GCTAGTTATTGCTCAGCGG

4.4.2 Molecular cloning and creation of select plasmid constructs

Unless otherwise stated, all cloning enzymes were purchased from New England Biolabs (NEB). For all cloning, standard conditions were used according to the manufacturer’s instructions. Briefly, for amplifying DNA to be used in ligations or Gibson assemblies, Q5 high fidelity DNA polymerase was used. The final concentrations for PCRs were: 1X Standard Q5 reaction buffer, 200 μM dNTPs, 5% DMSO, 0.5 μM forward primer, 0.5 μM reverse primer, and 0.02 units/50 μL reaction of Q5 polymerase. For colony PCR, OneTaq DNA polymerase was used. The final concentrations for PCRs were: 1X Standard OneTaq PCR buffer, 200 μM dNTPs, 5% DMSO, 0.2 μM forward primer, 0.2 μM reverse primer, and 1.25 units/50 μL reaction of polymerase.

Dr. Matthew Jensen created pMF1181, pMF1188, and pMF1189 using sonA and sonM genes amplified directly out of the native organism. For both pMF1235 and pMF1236, standard PCR conditions were used according to the manufacturer’s instructions. The gene coding for His6-SonA was amplified from pMF1181 using primers prFM1175 and prFM1176. A two-stage PCR was used.
Initial denaturation was at 98 °C for 30 s; the first five cycles were 98 °C for 5 s, 51.5 °C for 15 s, and 72 °C for 10 s; for the remaining 25 cycles, the annealing temperature was increased to 65.5 °C; final extension was at 72 °C for 2 minutes. The gene coding for his6-SonM was amplified from pMF1181 using primers prFM1177 and prFM1178. A two-stage PCR was used. Initial denaturation was at 98 °C for 30 s; the first five cycles were 98 °C for 7 s, 56.5 °C for 15 s, and 72 °C for 20 s; for the remaining 25 cycles, the annealing temperature was increased to 72 °C; final extension was at 72 °C for 2 minutes. After verification by agarose gel electrophoresis, the PCR products were cleaned up using a kit (Thermo Scientific). The backbone (pET28b) was prepared by digesting with NcoI-HF and SalI-HF (NEB), treating with Antarctic Phosphatase (NEB), and extracting the digested backbone from an agarose gel (NEB Monarch kit). Gibson assembly for both constructs was performed using HiFi DNA Assembly Master Mix (NEB) according to the manufacturer’s instructions. After incubating the assembly reaction at 50 °C for 60 minutes, 3 μL of the reaction was used to transform electrocompetent TOP10 E. coli cells. Resultant colonies were screened by colony PCR using primers T7_fw and T7_rv. For the colony PCR: initial denaturation at 94 °C for 30 s, followed by 30 cycles of 94 °C for 20 s, 46.3 °C for 40 s, and 68 °C for 60 s, and a final extension at 68 °C for 5 minutes. The PCR reaction was set up according to the manufacturer’s instructions. Positive hits were sequence verified by ACGT using Sanger sequencing and the same colony PCR primers.

4.4.3 Heterologous protein expression and purification

Heterologous expressions were conducted in E. coli BL21(DE3) cells. A 10 mL saturated overnight culture in LB with 50 μg/mL kanamycin was used to inoculate 1 L of TB with 50 μg/mL kanamycin in a 2.5 L baffled Ultra Yield flask (Thomson Scientific).
The 1 L culture was incubated in a 37 °C shaker until the OD600 reached approximately 0.7, at which time the culture was cold shocked in an ice bath for 30-60 minutes. After cold shocking, the culture was induced with 200 mM IPTG and placed in a 16 °C shaker for 24 h. After 24 h, the cells were harvested by centrifugation at 4000 x g for 30 minutes at 4 °C, snap frozen in liquid nitrogen, and stored at -80 °C until use.

For protein purification by nickel affinity chromatography, frozen cells were thawed on ice and then resuspended to homogeneity in ice-cold lysis buffer (300 mM NaCl, 50 mM sodium phosphate, 20 mM imidazole, 10% glycerol, pH 8.0) with 4 mL of buffer for every 1 g of wet cell mass. After resuspension, lysozyme was added to a final concentration of 1 mg/mL and the suspension was incubated on ice for 30 minutes. After lysozyme treatment, cells were further lysed by sonication. After sonication, the lysate was clarified by centrifugation at 15,000 x g for 45 minutes at 4 °C. The soluble protein from the clarified supernatant was then batch-bound to nickel-NTA resin (GoldBio) for 60 minutes on a rotator at 4 °C. After binding, the resin was added to a 5 mL fritted column and washed with 10 column volumes of lysis buffer, and the protein was eluted in lysis buffer with 250 mM imidazole. For subsequent gel filtration chromatography, protein was concentrated, sterile filtered, and loaded onto a HiLoad 16/600 Superdex 200 pg size exclusion column run at a flow rate of 1 mL/min in lysis buffer without imidazole. Protein was analyzed by SDS-PAGE, and fractions were pooled and concentrated using Amicon Ultra centrifugal filter columns (MilliporeSigma). Concentrations were measured by Bradford assay and proteins were snap frozen in liquid nitrogen and stored at -80 °C until use.
When using frozen protein, all samples were thawed on ice and centrifuged at top speed in a microcentrifuge at 4 °C for 10 minutes, aggregate was removed by transferring the supernatant to a fresh tube, and the concentration was re-measured.

4.4.4 SUMO cleavage by bdSENP1 protease

bdSENP1 SUMO protease was expressed, purified, and thawed as discussed above. The following protocol was carried out as described previously [153]. Briefly, bdSENP1 was used at a 1:1000 molar ratio of bdSENP1:SUMO and the reaction was conducted in LS-S buffer (250 mM NaCl, 40 mM tris HCl pH 7.5, 2 mM MgCl2, 2 mM DTT, and 250 mM sucrose). Proteins to be cleaved were dialyzed or buffer exchanged into cold LS-S buffer. The reaction was conducted at 4 °C overnight, and cleaved SUMO tags and un-cleaved protein were removed from the samples by Ni-NTA batch purification as described above (his6-SUMO will bind to the resin and cleaved protein will reside in the flow through). Samples were analyzed by SDS-PAGE.

4.4.5 In vitro multiple turnover experiment for MS analysis

SAHN, split borosin methyltransferase, and precursor proteins were expressed and purified as described above from separate plasmids (not the co-expression constructs). All three proteins were dialyzed into a buffer containing 50 mM HEPES, 300 mM NaCl, 10% glycerol, pH 8.0. Reactions were conducted in 100 μL final volumes with saturating amounts of SAM (dissolved in 0.5 mM HEPES pH 8.0) and SAHN. An equal amount (25 μM) of precursor was used in all samples to make MS analysis easier (the precursor concentration was held constant and the methyltransferase concentration was decreased to achieve the desired ratios). Reactions were incubated at room temperature for 24 h, quenched with SDS sample buffer, and boiled prior to in-gel digestion and HPLC-MS/MS analysis.

4.4.6 Mass spectrometric analysis

Purified protein was run on an SDS-PAGE gel, stained with Coomassie and destained.
After destaining, the gel was imaged and the appropriate band was excised using a scalpel and cut into 2 mm pieces, which were placed into a LoBind tube (Eppendorf). Gel pieces were destained with 50 mM ammonium bicarbonate (ABC) in a 50% acetonitrile (ACN) solution. Once the gel pieces were clear, they were dehydrated with 100% ACN until opaque, at which point the ACN was removed. The gel pieces were then re-hydrated with digest buffer according to the manufacturer’s instructions (digest buffer includes the AspN protease (Promega)) for 15 minutes on ice. If the gel pieces were no longer submerged in digest buffer, extra buffer was added to cover them and they were subsequently incubated for at least 16 h at 37 °C. After digestion, the supernatant was transferred to a fresh LoBind tube and peptides were extracted from the gel pieces with increasing amounts of ACN (50%, 80%, 95%) and 0.3% formic acid (FA). After extraction, the peptide solution was kept at -80 °C for at least 30 minutes to inactivate the enzymes and then speed vacuum concentrated to dryness. Peptides were then resuspended in 0.1% FA solution and purified with a C18 ZipTip (MilliporeSigma) according to the manufacturer’s instructions. After purification, samples were speed vacuum concentrated to dryness and resuspended in 20% ACN, 0.1% FA solution for analysis. Samples were loaded onto a Thermo Scientific Fusion mass spectrometer in accordance with our previously published method [115].

5 Structural and kinetic analysis of the split borosin methyltransferase and precursor from Shewanella oneidensis MR-1

Fredarla Miller,* Kathryn Crone,* Matthew Jensen, Sudipta Shaw, William Harcombe, Mikael Elias, Michael Freeman
*Co-first authors for in-process final manuscript

The data and results within this chapter will be submitted for publication upon completion of the full manuscript. This chapter was written by FM for the purpose of this thesis.
FM and KC shared the lab work for this chapter; FM cloned the active site mutants, analyzed all the mass spectrometry data, and generated protein crystals for SonMA complexes (WT and active site mutants); KC cloned the BBD SonA mutant, produced the crystal for SonM-BBD, and optimized the kinetics assay to obtain all of the kinetics data. MJ’s contribution is detailed in Chapter 4 of this thesis (initial cloning and verification of SonM enzyme activity). SS helped loop crystals and provided hands-on assistance for learning crystallography methods. WH made the kinetic model. ME solved the crystal structures, helped design crystallography experiments, and assisted in interpretation of the structural data. Select figures were adapted from MF and KC and may appear in the final manuscript. MF helped design experiments and led the writing of the manuscript. Please see Appendix 2 (Chapter 10) for supplementary mass spectrometry data figures and fitted kinetics curves.

5.1 Introduction

Of the three split borosin systems discussed in this thesis, the methyltransferase (SonM) and precursor (SonA) from S. oneidensis MR-1 proved to be the most amenable to biochemical characterization (see Chapter 4). When the two proteins were heterologously co-expressed in E. coli, SonM was shown to α-N-methylate SonA on two residues near its C-terminus (L63 and I65). Furthermore, mass spectrometry evidence showed that SonM could turn over multiple SonA peptides in vivo and in vitro, something that is not possible with the natively fused fungal borosin systems. The preliminary multiple turnover experiments for SonMA were encouraging but not quantitative. Furthermore, as OphMA exhibits 9 methylations on its hydrophobic core peptide, SonMA (with only two methylations) is a “minimal” split borosin system and thus a good candidate for in-depth analysis as a model for other borosin methyltransferases.
Thus, we next pursued a more rigorous kinetic and structural characterization of these proteins.

Within the last decade, more than a dozen RiPP systems have been structurally investigated. The lynchpin in nearly all the studies is the interaction between recognition motifs within the leader peptide and a corresponding structural motif on a modifying enzyme or scaffolding protein. The most comprehensive example is the RiPP recognition element (RRE) found in at least half of bacterial RiPP BGCs, as shown in Figure 1.4 within the introduction of this thesis [71]. The RRE is a structural domain that interacts with precursor peptides in order to “present” the core peptide to the active site of modifying enzymes for posttranslational modification. It is noteworthy that none of the current structural studies present a fully resolved and intact core peptide in an active site. This is a consequence of the dual nature of the precursor peptide: the binding affinity is imparted almost entirely by conserved recognition motifs in the leader moiety. In line with this, the active site of the enzyme may exhibit only minimal binding affinity for the core peptide, instead relying upon the leader to bring the core peptide within catalytic range.

Currently, the only RiPP structure that includes a core peptide within an active site is OphMA (and its close homolog dbOphMA) [113,114]. In these examples, likely due to solubility problems caused by the hydrophobicity of the core peptide, at least six amino acids were truncated from the C-terminus of the protein prior to crystallization [113]. This truncated variant was shown to be catalytically active through HPLC-MS/MS experiments. Other OphMA variants investigated via crystallography and HPLC-MS/MS included active site mutants at key residues and a completely truncated core peptide (18 residues removed from the C-terminus of the protein).
Surprisingly, beyond some movement of the core peptide, no large conformational changes were seen in the OphMA structures, offering little insight into the dynamic nature of iterative α-N-methylation [113].

The unique logic underlying the function of the precursor peptide makes these systems difficult to rigorously characterize: the precursor allows substrate binding to be discrete from catalysis. Furthermore, RiPP enzymes often perform iterative catalysis upon multiple residues within a single core peptide, adding another layer to the challenge of capturing a full catalytic cycle in RiPP biosynthesis [65]. Of the available RiPP-related structures, that of OphMA seemed to be the most promising with regard to elucidating the behavior of a core peptide within an active site. In light of the similarities between OphMA and the homologous SonM-SonA proteins, we set out to take advantage of the well-behaved split system to probe the catalytic mechanism of α-N-methylation further.

5.2 Crystal structure of SonMA WT

As discussed in Chapter 4, when SonM and his6-SonA are co-expressed in and subsequently purified from E. coli, they remain bound in a 1:1 molar ratio. Gel filtration chromatography indicated that the two proteins form a heterotetrameric complex consisting of two SonM monomers and two SonA monomers. Generally, for forming protein crystals, proteins must be stable in solution at high concentrations (at least 20 mg/mL is typical) with minimal solution components such as buffer, salt, or glycerol. The standard buffer used for our SonMA purifications is 50 mM HEPES pH 8, 300 mM NaCl, and 10% glycerol. Thus, we tested the SonMA complex stability by dialyzing freshly purified protein in buffers containing no glycerol and decreasing concentrations of HEPES and NaCl. We then re-bound the dialyzed protein to Ni-NTA resin and performed a small-scale purification.
As only SonA possesses a fused his6-tag, SonM protein will only be visible on an SDS-PAGE gel in the elution fraction if it is bound to his6-SonA. Figure 5.1 A shows that even in the minimally buffered solution (10 mM HEPES pH 8), the complex remained stable, with no SonM protein visible in the flow through or wash fractions. Although it is not clearly visible on the gel, his6-SonA is assumed to be present in low amounts (at a 1:1 molar ratio with untagged SonM). SonM is easily visible on the gel due to its higher molecular weight. The complex was also shown to be stable after one freeze-thaw event (Figure 5.1 B).

After confirming that the SonMA complex was stable in the minimal buffer solution, purified protein was dialyzed into cold 10 mM HEPES pH 8, concentrated to 20 mg/mL (as measured by Bradford assay), sterile filtered, and submitted to the Nanoliter Crystallization Facility (University of Minnesota). Promising-looking crystals formed in several conditions coalescing around pH 7 with 15-20% polyethylene glycol (PEG) 3350 (Figure 5.1 C). We attempted to replicate these conditions in the lab by testing the indicated concentrations of sodium malonate, malic acid, and succinic acid. We experimented with ranges of pH and PEG 3350 concentrations. Diffraction-quality crystals formed in 240 mM sodium malonate pH 5.5-7 with 0-20% PEG 3350 at 20 °C within 24 hours. The individually expressed his6-SonM and his6-SonA proteins, though soluble, could not be crystallized.

Figure 5.1 Buffer and temperature stability testing of SonMA complex
A: SDS-PAGE gel for buffer stability experiment. Flow through (FT), wash (W), and elution (E) fractions are shown. B: SDS-PAGE gel for freeze/thaw stability experiment. A higher concentration of protein was loaded onto this gel so that his6-SonA could be visualized. C: Photos of crystals from Nanoliter Crystallization Facility screening.
OphMA is a homodimer in which the core peptides are methylated by the opposite subunits’ methyltransferase active site [60,113]. The OphMA methyltransferase domain is well conserved in SonM (they are 36% identical, see Figure 4.1 in the previous chapter for an alignment). Based upon this homology and previously published structural data for OphMA, a molecular replacement strategy was sufficient to phase the diffraction data for the SonMA complex [113]. Whereas OphMA forms a homodimeric complex in which an extended clasp domain forms the concatenated ring structure, SonM and SonA are not connected by an analogous clasp. Although the SonMA proteins are not fused, they still adopt an analogous domain arrangement to OphMA, as shown in the 2.0 Å structure presented in Figure 5.2. We have generated a suite of SonMA structures which are presented in Table 5.1 with associated abbreviations and descriptors that will be used throughout this chapter.

Figure 5.2 Domain architecture comparison between OphMA and SonMA
Left: OphMA (PDB: 5N0Q) is a homodimer. Methyltransferase domain (purple), borosin binding domain (BBD) (blue), and truncated core peptide (orange) are colored. A schematic showing how the two monomers intercalate is shown above with colored arrows. The core peptide of one monomer is methylated by the active site of the opposite monomer. Right: SonMA is a homodimer of heterodimers that follows the same domain arrangement as OphMA. Methyltransferase domain (pink), BBD (teal), and core peptide (orange) are colored.

Table 5.1 Structures discussed in this study
Here, “apo” is defined as lacking a cofactor in the active site although the core peptide may be present. SonA core is either unmethylated (0Me) or doubly/fully methylated (2Me).
SonM variant | SonA variant | Cofactor bound | Structure Name | Notes
WT | WT | None | WT-apo | Two methylations present on SonA Core (2Me); No cofactor in either active site
WT | WT | SAH | WT-SAH | Two methylations present on SonA Core (2Me); SAH is in both active sites
Y58F | WT | none | Y58F-apo | Two methylations present on SonA Core (2Me); No cofactor in either active site
Y93F | WT | none | Y93F-apo | Two methylations present on SonA Core (2Me); No cofactor in either active site
R67A | WT | SAH | R67A-SAH | No methylations present on SonA Core (0Me); Cofactor is in both active sites
WT | BBD | SAH | SonM-BBD | Core truncation mutant (no methylation); SAH is in one active site

5.2.1 Borosin binding domain (BBD)

OphMA possesses a five-helix bundle in its clasp domain (the region between the methyltransferase domain and core peptide), which is conserved in the leader region of SonA. This structural feature is colored blue or teal in Figure 5.2. Interestingly, this motif is conserved in non-RiPP proteins, namely LigAB, a protocatechuate 4,5-dioxygenase capable of performing an aromatic ring-opening reaction (Figure 5.3). The conserved helical bundle is found in the LigA subunit, which forms a “cap” over the LigB subunit. The LigAB heterodimer is part of a heterotetrameric complex which consists of two LigAB subunits, such that the holoprotein contains two active sites (PDB: 1B4U). The conserved structural motif is also found in DesB (PDB: 3WRB), a homolog of LigAB in which the two subunits are fused. The DesB (fused) and LigAB (split) proteins provide an interesting parallel that seems to mimic the logic behind the domain architecture of OphMA (fused) and SonMA (split) borosin proteins. As the conserved helical bundle motif lies within the leader of the borosin precursor peptide and the RRE is associated with PTM enzymes, we propose that this may be the borosin replacement for an RRE. We have therefore named this structural feature the borosin binding domain (BBD).
There is only 26% sequence identity between the BBD of SonA and OphMA; as such we expect this motif, like the RRE, to primarily exhibit structural rather than sequence conservation. Further bioinformatics analyses will be required to fully elucidate its prevalence in other borosin BGCs or how the BBD can inform our understanding of borosin RiPPs and their evolutionary history. However, its role of bringing the SonA core peptide proximal to the SonM active site is foundational to this split borosin system. Thus, the BBD will be further discussed within this somewhat narrow context.

Figure 5.3 BBD overlay and alignment
Structural overlay and alignment of BBDs from SonA, OphMA (PDB: 5N0Q), LigA (PDB: 1B4U), and DesB (PDB: 3WRB). The root-mean-squared distances among these domains are: 2.3 Å for OphMA_BBD (306 atoms), 1.1 Å for LigA_BBD (251 atoms), and 1.9 Å for DesB_BBD (286 atoms). There is 26% sequence identity between SonA and OphMA BBDs.

5.2.2 SonMA and OphMA active site residues are conserved

We predicted the active site of SonMA to exhibit structural, sequence, and thus catalytic conservation with OphMA (Figure 5.4). The structural conservation of the active site was confirmed with our SonMA WT-SAH structure, which is shown three dimensionally in Figure 5.4 B. Despite the similarities in sequence and structure, we observed some crucial differences between the active sites of the two proteins. First, and perhaps most critically, the untruncated core peptide of SonA was well-resolved within the SonM active site. Indeed, both WT structures (WT-apo and WT-SAH) exhibited a fully resolved, intact, and doubly-methylated core peptide with the second methylation (I65) positioned productively within the active site of SonM.
This contrasts with the previously published OphMA structures, which show a C-terminally truncated core peptide in the active site.¹¹³ Our structure, with the fully resolved and intact SonA core peptide within the active site of its cognate modifying enzyme, is the first example of a complete heteromeric RiPP complex. In addition to the core peptide in the active site, we also observed that the SonMA complex exhibits a different affinity for the cofactor than was previously seen with OphMA. OphMA could be crystallized only with a cofactor (SAM or SAH) bound in its active sites (two active sites per homodimer complex); any attempt to remove the cofactor for crystallography or other analyses resulted in denatured protein.¹¹³ In contrast, the first structure we obtained (WT-apo) had no cofactor bound in its active sites. We were nevertheless able to co-crystallize the protein with SAH such that the molecule occupied both active sites in the SonMA WT complex (WT-SAH). Notably, other than the presence or absence of the cofactor, there is very little conformational difference between the WT-apo and WT-SAH structures, and both contain the fully methylated core peptide.

Figure 5.4 Proposed SonM catalytic mechanism
A: Based on our kinetic and structural data, the catalytic mechanism of SonM is expected to be similar to that of OphMA. The pink residues labeled in this figure correspond to SonM. We generated mutants based on these residues for kinetic analysis, except for Y93, which is not shown in this panel. The SonA core peptide is shown in orange. B: Structure of SonM in complex with SonA and SAH, showing the catalytic residues in three dimensions; residue numbers correspond to SonM. The analogous OphMA active site residues are overlaid in beige to visualize the conserved active site residues.

5.3 Kinetic and structural characterization of the SonM active site

Next, we sought to probe the catalytic mechanism of SonM.
For a rigorous kinetic investigation, we required enzyme (SonM) and substrate (SonA) at known concentrations. We thus expressed N-terminally his6-tagged sonA and sonM genes in separate cell strains so that the proteins could be purified separately (Figure 5.5). We selected active site mutants based upon the conserved residues in OphMA and the proposed catalytic mechanism (Figure 5.4). We created the following his6-SonM mutants: Y93F, Y58F, Y71F, Y58F-Y71F, R67K, and R67A. We also created a his6-SonA mutant in which the core peptide was truncated, leaving only the BBD, which is not a substrate for α-N-methylation by SonM. All of these mutants were also cloned into co-expression constructs (SonM with his6-SonA) for subsequent purification and crystallization attempts. All co-expressed mutants were expressed and purified in the same manner as the WT complex, as discussed in Chapter 4.

Figure 5.5 Ni-NTA purification of his6-SonA and his6-SonM
SDS-PAGE showing the heterologous production of his6-SonA (top) and his6-SonM (bottom) in E. coli and their subsequent Ni-NTA batch purification. All mutants expressed and purified easily; WT is shown here as a representative example.

For our kinetic analysis, we utilized a continuous coupled-enzyme assay in a microplate reader according to a previously published method.¹⁶² This assay indirectly measures each methylation event using three coupling enzymes (Figure 5.6). Briefly, every methylation by SonM requires one substrate (a SonA core peptide residue) and one cofactor (SAM). SAM donates one methyl group per methylation event; thus, to be fully (doubly) methylated, each SonA peptide requires two SAM molecules. The rate of reaction can be detected by following the conversion of SAM to SAH.
The coupled-enzyme assay measures the concentration of SAH through the activity of three enzymes, the last of which oxidizes one NAD(P)H molecule for every SAH molecule present in the original reaction. NAD(P)H absorbs at 340 nm (extinction coefficient 6220 M⁻¹ cm⁻¹), and the decrease in absorbance at this wavelength is thus directly proportional to SAH concentration. All enzymes except glutamate dehydrogenase (GDH), which was purchased, were expressed in E. coli, purified by Ni-NTA affinity chromatography and gel filtration chromatography, concentrated, and snap frozen in liquid nitrogen for storage at -80 °C prior to use. The kinetics data we acquired are summarized in Table 5.2 (the fitted curves can be found in Appendix 2 of this thesis, Figure 10.1 and Figure 10.2).

Figure 5.6 Schematic for continuous coupled-enzyme kinetic assay
The continuous coupled-enzyme assay indirectly measures product (SAH) concentration via NAD(P)H absorbance at 340 nm.

Two substrates (SAM and his6-SonA) are required for the methylation reaction catalyzed by SonM, so to determine the kinetic constants for both substrates, we performed two independent kinetics assays on each SonM variant. In each iteration of the assay, one substrate was in excess (in the WT assays, 1000 μM SAM or 100 μM his6-SonA was used) and the other was varied. Under these in vitro conditions, WT SonM was found to have a kcat of 0.52 min⁻¹ for his6-SonA and 0.47 min⁻¹ for SAM. The average enzyme has a kcat of approximately 10 s⁻¹, but methyltransferases, including SonM, are much slower, with turnovers measured in minutes rather than seconds.¹⁶²,¹⁶³ For example, the protein lysine methyltransferase SET7/9 exhibits a kcat of 32.1 min⁻¹.¹⁶² In stark contrast to these figures, OphMA, with a reported apparent kcat (kcat,app) of 0.17 h⁻¹, operates on the scale of hours and requires multiple days to produce the fully methylated core peptide.
The OphMA value was determined by end-point HPLC-MS/MS experiments.¹¹³ We suspect that the very slow reaction rate exhibited by OphMA is at least partly due to the hydrophobicity of the core peptide. The fusion of the core peptide to the enzyme may help keep the core soluble and proximal to the active site, making an otherwise unlikely reaction possible. In this case, a very slow reaction that produces a valuable metabolite is preferable to no metabolite production. The initial structural and kinetic data for the SonMA system provide further evidence that it is a better model system than OphMA for studying borosin RiPP biosynthesis: it is a faster enzyme, its split nature makes it amenable to continuous kinetic assays, and it can be crystallized without truncation. In addition, SonM remains soluble and well behaved even with active site point mutations, thus enabling a thorough investigation of the catalytic mechanism. Previous work on OphMA demonstrated that its structure was sensitive to active site mutations.¹¹³ Several OphMA active site mutants resulted in insoluble protein (analogous to the SonM Y71F and Y93F mutants) or completely inactive protein as determined by mass spectrometry (analogous to SonM R67K). As SonM appears to be more structurally tolerant of mutation, we were able to glean more information about the residues predicted to be important for catalysis. Using our continuous kinetics assay, only two mutants were determined to be inactive, in that no methylation rate above background was detected: R67A and the double mutant Y58F-Y71F. Notably, all active SonM mutants exhibited a decrease in catalytic efficiency compared to WT, ranging from a fold change of -1.6 (Y93F) to -98 (Y71F) for his6-SonA. No off-target methylations were seen in any mutant, as verified by HPLC-MS/MS (Figure 10.3).
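The arithmetic behind the kinetic values in this section can be sketched in a few lines of Python. The sketch below is illustrative only and is not the analysis code used for this work: it converts a decrease in A340 into a rate of SAH formation using the extinction coefficient quoted above, converts kcat and KM into kcat/KM in M⁻¹ s⁻¹, and expresses mutant values as signed fold changes. The function names, the optical path length parameter (which depends on the plate and fill volume and is not stated in the text), and the sign convention for fold change (inferred from Table 5.2) are assumptions.

```python
NADPH_EXTINCTION_M_CM = 6220.0  # M^-1 cm^-1 at 340 nm, as quoted in the text


def sah_rate_uM_per_min(a340_slope_per_min, path_length_cm):
    """Convert an A340 slope (absorbance units per minute) into uM SAH per minute.

    Assumes one NAD(P)H is oxidized per SAH formed, as described for the
    coupled assay. path_length_cm is an assumed, instrument-specific input.
    """
    molar_per_min = -a340_slope_per_min / (NADPH_EXTINCTION_M_CM * path_length_cm)
    return molar_per_min * 1e6


def catalytic_efficiency(kcat_per_min, km_uM):
    """Return kcat/KM in M^-1 s^-1 from kcat in min^-1 and KM in uM."""
    return (kcat_per_min / 60.0) / (km_uM * 1e-6)


def fold_change(wt_value, mutant_value):
    """Signed fold change relative to WT (negative means a decrease).

    This sign convention is inferred from Table 5.2, not stated in the text.
    """
    if mutant_value >= wt_value:
        return mutant_value / wt_value
    return -wt_value / mutant_value


# Fitted values for his6-SonA taken from Table 5.2 (WT and Y93F).
wt_eff = catalytic_efficiency(0.52, 8.2)
y93f_eff = catalytic_efficiency(0.25, 6.2)
print(f"WT kcat/KM   = {wt_eff:.0f} M^-1 s^-1")
print(f"Y93F kcat/KM = {y93f_eff:.0f} M^-1 s^-1")
print(f"Fold change  = {fold_change(wt_eff, y93f_eff):.1f}")
```

Small differences from the tabulated fold changes are expected for some mutants, because the sketch starts from the rounded values printed in the table.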
To complement our kinetic analysis, we were able to produce SonMA crystal structures for select SonMA mutants (Y58F-apo, Y93F-apo, and R67A-SAH), SonM WT-apo and WT-SAH, and the SonM-BBD complex. These six structures provide useful insight into the SonM catalytic mechanism for the α-N-methylation of SonA.

Table 5.2 SonM kinetics data
All SonM active site mutants have a decrease in catalytic efficiency compared to WT. The label n.d. indicates that activity above background was not detected.

his6-SonA
SonM        KM (μM)     Fold Δ   kcat (min⁻¹)        Fold Δ   kcat/KM (M⁻¹ s⁻¹)           Fold Δ
WT          8.2 ± 1.5   -        0.52 ± 0.023        -        (1.1 ± 0.20) × 10³          -
Y93F        6.2 ± 1.0   -1.3     0.25 ± 0.0087       -2.1     (0.67 ± 0.11) × 10³         -1.6
Y58F        7.6 ± 1.0   -1.1     0.034 ± 0.0016      -15      (0.074 ± 0.011) × 10³       -14
R67K        18 ± 4.1    2.3      0.012 ± 0.00077     -43      (0.011 ± 0.0025) × 10³      -97
Y71F        9.6 ± 1.2   1.2      0.0061 ± 0.00018    -84      (0.011 ± 0.0013) × 10³      -98
Y58F-Y71F   n.d.        -        n.d.                -        n.d.                        -
R67A        n.d.        -        n.d.                -        n.d.                        -

SAM
SonM        KM (μM)     Fold Δ   kcat (min⁻¹)        Fold Δ   kcat/KM (M⁻¹ s⁻¹)           Fold Δ
WT          56 ± 8.5    -        0.47 ± 0.014        -        (0.14 ± 0.021) × 10³        -
Y93F        220 ± 39    3.8      0.24 ± 0.011        -2.0     (0.018 ± 0.00338) × 10³     -7.6
Y58F        47 ± 8.7    -1.2     0.030 ± 0.0011      -16      (0.011 ± 0.0020) × 10³      -13
R67K        36 ± 10     -1.6     0.011 ± 0.00054     -43      (0.0051 ± 0.0014) × 10³     -27
Y71F        82 ± 18     1.5      0.0066 ± 0.00033    -71      (0.0014 ± 0.00030) × 10³    -100
Y58F-Y71F   n.d.        -        n.d.                -        n.d.                        -
R67A        n.d.        -        n.d.                -        n.d.                        -

Inhibitor   SonM   Ki (μM)
2Me-SonA    WT     160 ± 26
BBD         WT     3.9 ± 0.5

Our kinetic data indicate a lower KM for the peptide substrate than for the cofactor: 8.2 μM (SonA) versus 56 μM (SAM) in WT. This is consistent with our observation that the core peptide, when intact and fused to the leader (i.e., excluding the SonM-BBD structure), is always present in the active site, whereas the cofactor may be present or absent. The residue Y93 appears to be the most important for cofactor binding, as the Y93F mutant has a 3.8-fold higher KM for SAM than WT.
Despite the dramatic increase in the KM for SAM, this mutant exhibits the kcat/KM closest to that of WT for both SAM and SonA. The ability of the Y93F mutant to compensate for the loss of this tyrosine is supported by the corresponding Y93F-apo structure, which mimics the WT-SAH active site at the Y93(F) residue. Although slight movement of some proximal residues can be detected and the orientation of the mutated residue is altered, there are no otherwise remarkable conformational changes in this structure (Figure 5.7 B).

Two tyrosine residues in the active site, Y58 and Y71, appear to work in coordination both to position the core peptide productively and to play a role in catalysis. We created three SonM variants to investigate the role of these two residues: two individual mutants and a double mutant. Our kinetics analysis demonstrates that loss of either Y58 or Y71 can be compensated for by the remaining residue, but the loss of both residues renders the enzyme inactive (as shown in our kinetics assay and HPLC-MS/MS analysis). We hypothesize that this is related to the angle that residue Y71 forms with a carbonyl of the core peptide backbone. In this interaction, the Y71 side chain donates a hydrogen for H-bonding. In both of our WT structures, this angle is constrained from the canonical trigonal planar angle of 120° to 109-110°. We believe this constrained angle resembles the trigonal pyramidal geometry associated with sp³ hybridization (109.5°), which the oxygen would require to accommodate an additional lone pair of electrons. This may facilitate the delocalization of the electrons across the amide bond onto the oxygen. Our evidence shows that Y58 and Y71 both play a role in maintaining this angle because they both interact with the same carbonyl of the core peptide (Y58 maintains an angle of 119-120° with the same carbonyl in the WT-apo and WT-SAH structures).
The shared role of Y58 and Y71 is further supported by the Y58F-apo structure, which shows a more relaxed angle of 119° for Y71 (Figure 5.7 A). This relaxed angle likely contributes to the lower kcat of the Y58F mutant because the carbonyl can H-bond with only one Tyr side chain (Y71), making it less favorable to maintain the negative charge on the carbonyl oxygen. We were unable to obtain crystal structures for Y71F or the double mutant, which we expect is due in part to a partially or completely unmethylated core peptide. As will be discussed in more detail below, mutants with unmethylated core peptides were challenging to crystallize.

Figure 5.7 Structural analysis of SonM active site mutants
A: Angle comparison of the Y71 residue with the core peptide backbone in three structures. The angle to Y58 is ~120° in both WT structures. B: Overlay of the Y93F mutant (dark pink/orange) and SonMA (no SAH) (light pink/orange) active sites. Select residues near the active site exhibit slight concerted movement in the Y93F mutant, especially the Y93F residue, which mimics the SAH-bound conformation (middle box of panel A).

In addition to identifying the roles of specific amino acids during catalysis, we were also interested in describing the catalytic process at a larger scale. We have evidence of a strict N- to C-terminal directionality for methylation in the SonMA system (Figure 10.3 A), which is consistent with other characterized borosin systems.¹¹⁵ What sets the SonMA system apart from the previously characterized borosins is the separation of the precursor peptide and the methyltransferase enzyme into discrete proteins. Because this arrangement allows for multiple turnover of precursor peptides, we were curious whether SonA dissociates from the SonMA complex between methylations and/or whether there are kinetic differences between the sequential methylation reactions (Figure 5.8 B and C).
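The question of kinetic differences between two sequential methylations can be illustrated with a deliberately simplified scheme, 0Me → 1Me → 2Me, in which each step is treated as an irreversible pseudo-first-order reaction. This is a minimal sketch and not the kinetic model used in this work, which is described below and rests on Michaelis-Menten assumptions; the function name and the rate constants in the example are hypothetical and were chosen only to show how a slower first step changes the build-up of the singly methylated intermediate.

```python
import math


def methylation_fractions(t_min, k1_per_min, k2_per_min):
    """Fractions of 0Me, 1Me and 2Me peptide at time t for 0Me -> 1Me -> 2Me.

    Uses the closed-form solution for two consecutive irreversible
    first-order steps, starting from 100% unmethylated peptide.
    """
    f0 = math.exp(-k1_per_min * t_min)
    if math.isclose(k1_per_min, k2_per_min):
        # Limiting form of the general expression when both steps are equally fast.
        f1 = k1_per_min * t_min * f0
    else:
        f1 = (k1_per_min / (k2_per_min - k1_per_min)) * (
            math.exp(-k1_per_min * t_min) - math.exp(-k2_per_min * t_min)
        )
    f2 = 1.0 - f0 - f1
    return f0, f1, f2


# Hypothetical rate constants (min^-1): second step twice as fast as the first.
K1, K2 = 0.05, 0.10
for t in (0, 10, 30, 60, 120):
    f0, f1, f2 = methylation_fractions(t, K1, K2)
    print(f"t = {t:3d} min  0Me = {f0:.2f}  1Me = {f1:.2f}  2Me = {f2:.2f}")
```

In a scheme of this kind, a second step that is faster than the first keeps the 1Me intermediate at a low level throughout the reaction, which is why time-resolved measurements of all three methylation states are informative.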
To pursue this line of investigation, we required data describing the relative amounts of each methylation state (0Me, 1Me, or 2Me) of SonA over the course of an in vitro reaction with SonM WT. To this end, several time points were taken during a continuous kinetic assay. For this assay, SAM was used in excess such that it was not a kinetic variable for the model. Additionally, 100 μM SonA was used (much higher than the measured KM), allowing the reaction to run near its maximal rate (kcat). Time points were taken in duplicate; the reactions were quenched, run on a gel, and subjected to HPLC-MS analysis (Figure 5.8 D). The relative methylation states of SonA and the kinetic constraints from our data were then compared to a kinetic model in an attempt to describe the reaction process (Figure 5.8 A). The model we generated relies on foundational assumptions of Michaelis-Menten steady-state kinetics, including a static KM during the reaction and the presence of one substrate and one product. Based on this model, the first methylation reaction (occurring at L63 of SonA) is slightly less efficient than the second methylation (occurring at I65), and the model allows for the complete dissociation of SonA from the SonM complex between methylations. We were also interested in parsing the SonA precursor peptide into its two parts (BBD and core) to determine the role each plays in binding and/or reaction progression. To further validate the model, we therefore performed competitive inhibition assays with SonA and either the BBD (truncated SonA with no core peptide) or doubly methylated SonA (2Me-SonA) to calculate a Ki for each (Table 5.2). We found that SonM WT exhibits a low Ki for the BBD (3.9 μM) and a high Ki for 2Me-SonA (160 μM), which indicates that the BBD contributes most of the binding affinity of SonA to SonM and that the core peptide is less important for tight binding/complex formation.
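The contrast between the two Ki values can be made concrete with the standard competitive inhibition relationship, in which an inhibitor raises the apparent KM by a factor of (1 + [I]/Ki) without changing the maximal rate. The sketch below is illustrative only: the function names and the 10 μM inhibitor concentration are assumptions, the KM and Ki values are taken from Table 5.2, and the fitting procedure actually used to obtain the Ki values is not shown here.

```python
def apparent_km_uM(km_uM, inhibitor_uM, ki_uM):
    """Apparent KM in the presence of a competitive inhibitor."""
    return km_uM * (1.0 + inhibitor_uM / ki_uM)


def relative_rate(substrate_uM, km_uM, inhibitor_uM=0.0, ki_uM=float("inf")):
    """Michaelis-Menten rate as a fraction of Vmax, with optional competitive inhibitor."""
    km_app = apparent_km_uM(km_uM, inhibitor_uM, ki_uM)
    return substrate_uM / (km_app + substrate_uM)


KM_SONA_UM = 8.2      # WT KM for his6-SonA (Table 5.2)
KI_BBD_UM = 3.9       # WT Ki for the BBD (Table 5.2)
KI_2ME_SONA_UM = 160  # WT Ki for 2Me-SonA (Table 5.2)
INHIBITOR_UM = 10.0   # hypothetical inhibitor concentration, for illustration only

for name, ki in (("BBD", KI_BBD_UM), ("2Me-SonA", KI_2ME_SONA_UM)):
    km_app = apparent_km_uM(KM_SONA_UM, INHIBITOR_UM, ki)
    rate = relative_rate(KM_SONA_UM, KM_SONA_UM, INHIBITOR_UM, ki)
    print(f"{name:9s} apparent KM = {km_app:5.1f} uM, rate at [S] = KM: {rate:.2f} of Vmax")
```

Under this model, at equal concentration the BBD raises the apparent KM for SonA several-fold while 2Me-SonA barely changes it, which is the quantitative sense in which the BBD carries most of the binding affinity.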
The conclusion that the BBD dominates binding is supported by the SonM-BBD and R67A-SAH structures, which exhibit dramatic conformational changes based upon the characteristics of the core peptide.

Figure 5.8 Kinetic model for the methylation of SonA
A: Kinetic model for the methylation of SonA by SonM. The actual and estimated/modeled relative abundances of each methylation state (0Me, 1Me, or 2Me) over time are shown in solid or dashed lines, respectively. The best fit occurs if the first methylation is at least 2x slower than the second methylation reaction. The model is based on the kinetic data shown in Table 5.2 and the mass spectrometry data in panel D. B: Predicted kinetic values based upon the model. C: Proposed reaction order, values shown in Table 5.7. D: EIC chromatograms from HPLC-MS of SonA for comparison to the kinetic model. Relative methylation states are shown for SonA at the indicated time points for two replicates. L63 is methylated first; I65 is methylated second.

5.4 Dramatic conformational changes occur due to core peptide characteristics

Most of the structures discussed thus far (WT-apo, Y58F-apo, Y93F-apo) share certain characteristics, including no cofactor bound within the active site and a doubly methylated core peptide that exhibits an extended loop conformation. The only exception to the former rule was WT-SAH, which was intentionally co-crystallized with SAH. The crystals formed from these four complexes were also similar, forming asymmetrical, elongated rectangular prisms. The remaining two structures (SonM-BBD and R67A-SAH) required alternative conditions and/or further optimization of the crystallization process and resulted in drastically different crystal morphologies. SonMA R67A produced needle crystals and SonM-BBD produced large, flat crystals (Figure 5.9). We anticipated that these alternative crystal morphologies were indicative of dramatic conformational changes in the SonMA protein complexes.
Figure 5.9 Crystal morphologies of WT and select mutants
Note that images are not to scale. The most common crystal morphology we identified is shown for SonMA WT. The R67A mutant exhibited needle-like crystals and SonM-BBD exhibited needle or flat crystals.

The SonM-BBD structure, in which the core peptide has been truncated, exhibited a distinctive asymmetric unit in which the two active sites of SonM were differentially occupied: one active site had SAH bound and the other was empty, providing a unique opportunity to visualize two active site conformations in the same complex. The most striking feature of the SonM-BBD structure is the dramatic movement of the clamping loops, here termed Loops A and B (Figure 5.10 and Figure 5.11). When SAH is bound, the loops act as a clamp over the active site and exhibit the same conformation as seen in both WT structures, even though no core peptide is present. The other active site within the complex is apo, and the corresponding loops are unclamped to expose the active site. Loop A consists of approximately 16 residues and Loop B spans 10 residues from Y58 to R67. Y71 may interact with the ε-nitrogen of the R67 side chain, but beyond this, few direct interactions are obvious. It is intriguing that the presence of SAH seems to cause the active site loops to clamp over a non-existent core peptide substrate. We believe that these dynamic loops may allow entry of SAM and the core peptide of SonA into the active site of SonM, as well as permit the exchange of SAH for SAM to allow for catalytic turnover.

Figure 5.10 Differentially occupied active sites of SonM-BBD structure and loop movement
SonM-BBD structure. Darker subunits have SAH bound in the active site; lighter subunits are apo. Inset shows an overlay of both active sites in the same structure to visualize the large conformational changes in the loops. In the apo conformation, the loops are approximately 17.6 Å apart.
When closed, the loops are approximately 5.9 Å apart (measurements shown in Figure 5.11).

The SonM R67A mutant was determined to be inactive by HPLC-MS/MS, by our kinetics assay, and by the presence of an unmethylated core peptide in the active site of our R67A-SAH structure. Upon close inspection of the SonMA WT and OphMA structures, we noticed that the SonM R67 residue may make long-range contacts with several residues in Loop A. Notably, the analogous OphMA mutant (R72A) produced a structure with the core peptide in an alternative/inactive conformation.¹¹³ The SonMA R67A structure revealed dramatic changes to the active site, including a conformational change of the clamping loops and an unmethylated core peptide in a new α-helical conformation. Taken together with the SonM-BBD and WT-SAH structures, this structure allows us to begin to visualize the reaction process on a larger scale. First, via the BBD, SonA binds to the SonM homodimer, which brings the SonA core peptide proximal to the SonM active site (Figure 5.11, top). When SAM and the core peptide enter the active site through the open loops, Loops A and B clamp over the active site (Figure 5.11, middle). With the unmethylated core peptide helix and cofactor in place, α-N-methylation takes place on L63 and the helix of the core peptide is broken, causing the peptide to lose its secondary structure (Figure 5.11, bottom). It is likely that the dynamic secondary structure of the core peptide plays a direct role in determining the N- to C-terminal directionality of methylation and the methylation pattern, allowing the enzyme to discriminate between substrate and product. Compared with the doubly methylated core peptide, more of the unmethylated α-helical core peptide can be threaded into the active site, due in part to its more compact secondary structure.
In the case of R67A-SAH, the core peptide is not positioned in a catalytically active conformation (L63 and I65 of SonA sit too far into the active site), which is similar to the analogous OphMA R72A structure. Through possible long-range interactions, R67 may play a role in positioning the correct residues of the core peptide for catalysis. Remarkably, to accommodate the coiled core peptide within the active site, the BBD within the SonA leader region must also adopt a new conformation. When the core peptide winds into the compact α-helical conformation, the C-terminal helix of the BBD must “unwind” to provide slack and allow the core peptide to “reach” into the active site (shown in light cyan in Figure 5.11). In this way, the BBD utilizes a metamorphic helix to compensate for the critical conformational changes required for core positioning and catalysis.

Figure 5.11 Structural conformations of core peptide
Top: Loops A and B are open; no core is present. Middle: unmethylated core peptide in α-helical conformation in the active site; L63 and I65 are shown as sticks. Bottom: doubly methylated core peptide loses its secondary structure.

5.5 Conclusion

This chapter details the rigorous kinetic and structural characterization of the first split borosin system, from S. oneidensis MR-1. The natively discrete substrate (core peptide) and enzyme (methyltransferase) of this system have allowed us to utilize a continuous kinetic assay. We have used this assay to investigate how specific amino acid residues in the active site of SonM affect catalysis and to determine kinetic values for both substrates of SonM: SonA and SAM. We have confirmed that SonM and OphMA active site residues are conserved, but the SonMA system is much faster, completing methylations on a scale of minutes instead of hours, and is capable of multiple turnover of peptide substrates.
We have also produced six high-resolution crystal structures, revealing a suite of conformational changes in both the enzyme and the substrate that are critical for catalysis. Of note are the two conformations of the core peptide in its methylated (WT-apo, WT-SAH, Y93F-apo, Y58F-apo) and unmethylated (R67A-SAH) states within the active site of its cognate modifying enzyme. The movement of the dynamic loops of SonM through various catalytic stages has also been described. To date, no comparable RiPP heteromeric complex has been published that maintains an intact core peptide and highlights the system’s dynamic nature. Additionally, we have identified a novel leader peptide fold that we named the BBD. Like typical leaders in RiPP biosynthesis, it is responsible for most of the binding energy required for precursor binding to SonM. However, the BBD has been revealed to be a dynamic fold that can fully unwind one helix in order to facilitate catalysis and allow the coiled core peptide to reach the active site.

Future experiments will seek to further characterize this system. We are interested in understanding the conformational changes required to move the core peptide from the first methylation into a productive position for the second methylation. Obtaining structures of the SonM homodimer and the SonA monomer will also be useful in this endeavor. Broadly, this investigation into the SonMA RiPP system, the first characterized split borosin system, provides insight into the biosynthetic capability of RiPPs. The dynamic interplay between leader peptides, core peptides, and modifying enzymes is only beginning to be understood. Furthermore, this success widens the scope for the discovery of additional split borosins with unique domain architectures, which are sure to reveal unique metabolites.

5.6 Materials and methods

HiFi DNA Assembly Master Mix, restriction enzymes, phosphatase, OneTaq, and Q5 High Fidelity polymerases were purchased from New England Biolabs (NEB).
AspN sequencing grade protease was purchased from Promega. Primers were ordered from IDT. Unless otherwise stated, chemicals and reagents were purchased from MilliporeSigma. Shewanella oneidensis MR-1 bacteria were given by Dr. Jeffrey Gralnick.

Table 5.3 Structure statistics (abbreviated)
Construct                                            Resolution (Å)   Rfree/Rwork
SonM WT (no cofactor) + 2Me SonA                     2                24.57/19.62
SonM WT (SAH) + 2Me SonA                             2.1              23.74/17.99
SonM Y58F + 2Me SonA                                 2.2              27.44/23.93
SonM Y93F + 2Me SonA                                 2                24.7/20.04
SonM R67A (SAH) + 0Me SonA                           2.32             22.90/18.26
SonM WT + BBD of SonA (1/2 active sites with SAH)    1.75             21.84/18.64

Table 5.4 Primers used in this study
Name            Sequence (5’-3’)                                              Description
prmMRJ036_fw    ACTTTAAGAAGGAGATATACCATGGGATCACTCGTCTGTG                      fw primer to amplify SonM with Gibson overhang into pET28b vector with SonA
prmMRJ043_rev   GATGATGATGATGATGCATGTTTTCTCCTTATTGTTAATAATGATTCAATAAC         rev primer to amplify SonM with Gibson overhang to allow assembly with His-SonA into pET28b
prmMRJ044_fw    AGGAGAAAACATGCATCATCATCATCATCACATGTCTGGATTATCGGATTTTTTTAC     fw primer to amplify SonA with Gibson overhang and N-terminal his tag into pET28b vector with SonM
prmMRJ045_rev   CGAGTGCGGCCGCAAGCTTGTCGACTTAATCACCATTACCATGTG                 rev primer to amplify SonA with Gibson overhang to allow assembly with SonM into pET28b
T7_fw           TAATACGACTCACTATAGGG                                          Used for colony PCR and sequencing in pET28b plasmids
T7_rv           GCTAGTTATTGCTCAGCGG                                           Used for colony PCR and sequencing in pET28b plasmids
prFM1175        TTTAAGAAGGAGATATACATGCATCATCATCATCAT                          forward primer to amplify His-SonA for Gibson assembly into pET28b
prFM1176        AGTGCGGCCGCAAGCTTGTTAATCACCATTACCATG                          reverse primer to amplify His-SonA for Gibson assembly into pET28b
prFM1177        TAAGAAGGAGATATACATGCATCATCATCATCATCACAGCAGCATGGGATCACTCGTC    forward primer to add his6 tag to N-term of SonM and assemble into pET28b
prFM1178        AGTGCGGCCGCAAGCTTGTTATCCCAAATCTTCGGG                          reverse primer to amplify His-SonM for assembly into pET28b
prFM1191        GAAGTTAAAAATAAACGAGACACCTACGA                                 SonM-R67K_fw
prFM1192        GAAGTTAAAAATGCCCGAGACACCTAC                                   SonM-R67A_fw
prFM1193        ACCATTTTGCGCATAAAACTGCTG                                      SonM-R67_rev
prFM1194        CGAGACACCTTCGAGCAAATGGTC                                      SonM-Y71F_fw
prFM1195        GCGATTTTTAACTTCACCATTTTGCG                                    SonM-Y71_rev
prFM1212        GCAGCAGTTTTTTGCGCAAAA                                         SonM-Y58F_fw
prFM1213        AAATTGATGACATTGGGGTTGAGC                                      SonM-Y58_rev
prFM1214        TGTGCACTCTTCGGTCATCC                                          SonM-Y93F_fw
prFM1215        CACGGTTTTTTTACCCGCTCTC                                        SonM-Y93_rev
prKKC1010       GAGCTCGAATTCGGATCTTAACCACTTAACGT                              reverse primer to amplify SonA helical bundle for assembly into pET28b vector

Table 5.5 Plasmids used in this study
ID        Description
pMF1181   SonM-gRBS-His-SonA_pET28b
pMF1235   His-SonA_pET28b
pMF1236   His-SonM_pET28b
pMF1230   His-ADE (JW_3640 ASKA collection)
pMF1231   His-SAHN (JW_0155 ASKA collection)
pMF1256   SonM-R67A-gRBS_His-SonA_pET28b
pMF1257   His-SonM-R67A_pET28b
pMF1258   SonM-R67K-gRBS_His-SonA_pET28b
pMF1259   His-SonM-R67K_pET28b
pMF1260   SonM-Y71F-gRBS_His-SonA_pET28b
pMF1261   His-SonM-Y71F_pET28b
pMF1263   SonM-Y58F-gRBS_His-SonA_pET28b
pMF1264   His-SonM Y58F_pET28b
pMF1265   SonM-Y58F-Y71F-gRBS_His-SonA_pET28b
pMF1266   His-SonM Y58F + Y71F_pET28b
pMF1267   SonM-Y93F-gRBS_His-SonA_pET28b
pMF1268   His-SonM Y93F_pET28b
pMF1269   His-SonA_helicalbundle_pET28b
pMF1283   SonM-gRBS-His-SonA_helical bundle_pET28b

Table 5.6 DNA sequences
UniProt ID   Name   Description

P31441   ADE   Adenine deaminase
ATGAATAATTCTATTAACCATAAATTTCATCACATTAGCCGGGCTGAATACCAGGAATTG TTAGCCGTTTCCCGTGGCGACGCTGTTGCCGATTATATTATTGATAATGTCTCTATTCTCG ACCTGATCAATGGCGGAGAAATTTCCGGCCCAATTGTGATTAAAGGACGTTACATTGCC GGTGTTGGCGCAGAATACACTGATGCTCCGGCTTTGCAGCGGATTGATGCTCGCGGCGC AACGGCGGTGCCAGGGTTTATTGATGCTCACCTGCATATTGAATCCAGCATGATGACGC CGGTCACTTTTGAAACCGCTACCCTGCCGCGCGGCCTGACGACCGTTATTTGCGACCCTC ATGAAATCGTCAACGTGATGGGCGAAGCCGGATTCGCCTGGTTTGCCCGCTGTGCCGAA CAGGCAAGGCAAAACCAGTACTTACAGGTCAGCTCTTGCGTACCCGCCCTGGAAGGCTG CGATGTTAACGGTGCCAGTTTTACCCTTGAACAGATGCTCGCCTGGCGGGACCATCCGC AGGTTACCGGCCTTGCAGAAATGATGGACTACCCTGGCGTAATTAGCGGGCAGAATGCG CTGCTCGATAAACTGGATGCATTTCGCCACCTGACGCTGGACGGTCACTGCCCGGGTTTG
GGTGGTAAAGAACTTAACGCCTATATTACTGCGGGTATTGAAAACTGCCACGAAAGTTA TCAGCTGGAAGAAGGACGCCGGAAATTACAACTCGGCATGTCGTTGATGATCCGCGAAG GGTCCGCTGCCCGCAATCTCAACGCGCTGGCACCGTTGATCAACGAATTTAACAGCCCG 134 CAATGCATGCTCTGTACCGATGACCGTAACCCGTGGGAGATCGCCCATGAAGGACACAT CGATGCCTTAATTCGCCGCCTGATCGAACAACACAATGTGCCGCTGCATGTGGCATATC GCGTCGCCAGCTGGTCGACGGCGCGCCACTTTGGTCTGAATCACCTCGGCTTACTGGCAC CCGGCAAGCAGGCCGATATCGTCCTGTTGAGCGATGCGCGTAAGGTCACGGTGCAGCAG GTACTGGTGAAAGGCGAGCCGATTGATGCGCAAACCTTACAGGCGGAAGAGTCGGCGA GACTGGCACAATCCGCTCCGCCATATGGCAACACCATTGCCCGCCAGCCAGTTTCCGCC AGCGACTTTGCCCTGCAATTTACGCCCGGAAAACGCTATCGGGTCATTGACGTCATCCAT AACGAATTGATTACGCACTCCCACTCCAGCGTCTACAGCGAAAATGGTTTTGATCGCGA TGATGTGAGCTTTATTGCCGTACTTGAGCGTTACGGGCAACGGCTGGCTCCGGCTTGTGG TTTGCTTGGCGGCTTTGGACTGAATGAAGGTGCGCTGGCTGCGACGGTCAGCCATGACA GCCATAATATTGTGGTGATCGGTCGCAGTGCCGAAGAGATGGCGCTGGCGGTCAATCAG GTGATTCAGGATGGCGGCGGGCTGTGCGTGGTACGTAACGGCCAGGTACAAAGTCATCT GCCGTTACCCATTGCCGGGCTGATGAGCACCGACACGGCGCAGTCGCTGGCGGAACAAA TTGACGCCTTGAAAGCCGCCGCCCGTGAATGCGGTCCGTTACCCGATGAGCCGTTTATTC AGATGGCGTTTCTTTCTCTGCCAGTGATCCCCGCGCTAAAACTAACCAGTCAGGGGCTAT TTGATGGCGAGAAGTTTGCCTTCACTACGCTGGAAGTCACGGAATAA P0AF12 SAHN S-adenosylhomocysteine nucleosidase ATGAAAATCGGCATCATTGGTGCAATGGAAGAAGAAGTTACGCTGCTGCGTGACAAAAT CGAAAACCGTCAAACTATCAGTCTCGGCGGTTGCGAAATCTATACCGGCCAACTGAATG GAACCGAGGTTGCGCTTCTGAAATCGGGCATCGGTAAAGTCGCTGCGGCGCTGGGTGCC ACTTTGCTGTTGGAACACTGCAAGCCAGATGTGATTATTAACACCGGTTCTGCCGGTGGC CTGGCACCAACGTTGAAAGTGGGCGATATCGTTGTCTCGGACGAAGCACGTTATCACGA CGCGGATGTCACGGCATTTGGTTATGAATACGGTCAGTTACCAGGCTGTCCGGCAGGCT TTAAAGCTGACGATAAACTGATCGCTGCCGCTGAGGCCTGCATTGCCGAACTGAATCTT AACGCTGTACGTGGCCTGATTGTTAGCGGCGACGCTTTCATCAACGGTTCTGTTGGTCTG GCGAAAATCCGCCACAACTTCCCACAGGCCATTGCTGTAGAGATGGAAGCGACGGCAAT CGCCCATGTCTGCCACAATTTCAACGTCCCGTTTGTTGTCGTACGCGCCATCTCCGACGT GGCCGATCAACAGTCTCATCTTAGCTTCGATGAGTTCCTGGCTGTTGCCGCTAAACAGTC CAGCCTGATGGTTGAGTCACTGGTGCAGAAACTTGCACATGGCTAA Q8EGW3 SonM (SO1478) Borosin methyltransferase 
ATGGGATCACTCGTCTGTGTGGGCACTGGGTTACAGCTCGCGGGGCAAATTAGCGTATT AAGCCGCAGCTATATTGAACATGCCGATATTGTATTTTCACTCTTACCTGACGGTTTCTC GCAGCGTTGGTTGACGAAGCTCAACCCCAATGTCATCAATTTGCAGCAGTTTTATGCGCA AAATGGTGAAGTTAAAAATCGCCGAGACACCTACGAGCAAATGGTCAATGCCATTCTAG ATGCGGTGAGAGCGGGTAAAAAAACCGTGTGTGCACTCTACGGTCATCCGGGGGTATTT GCCTGTGTATCCCATATGGCGATAACTCGGGCGAAGGCCGAAGGGTTTTCGGCAAAGAT GGAGCCGGGGATTTCGGCCGAAGCTTGCCTGTGGGCCGACTTAGGGATTGACCCCGGCA ACTCGGGGCATCAAAGTTTTGAAGCTAGCCAGTTTATGTTTTTCAACCATGTGCCCGATC CCACTACCCACTTATTACTCTGGCAAATCGCCATTGCAGGCGAACATACCTTAACCCAAT TTCATACCTCGAGTGATAGGTTGCAGATCCTCGTGGAGCAGTTGAATCAATGGTATCCCC TCGACCATGAGGTGGTCATATACGAAGCGGCCAATTTGCCAATCCAAGCCCCGCGTATC GAGCGTTTACCTTTAGCGAATTTACCCCAAGCACACTTAATGCCGATTAGTACGTTGTTA ATTCCGCCAGCAAAAAAGCTGGAGTACAACTATGCTATTTTGGCTAAGTTAGGGATCGG TCCCGAAGATTTGGGATAA Q8EGW2 SonA (SO1479) Borosin RiPP precursor ATGTCTGGATTATCGGATTTTTTTACCCAGTTAGGCCAAGATGCGCAGTTAATGGAAGAC TATAAACAGAATCCTGAGGCGGTGATGCGTGCCCACGGATTAACTGATGAACAAATTAA CGCTGTAATGACTGGGGATATGGAAAAGCTCAAAACGTTAAGTGGTGATAGTAGCTATC AATCTTACCTTGTTATTTCACATGGTAATGGTGATTAA 135 n/a His6-SonM Hexahistidine tagged borosin precursor for heterologous expression ATGCATCATCATCATCATCACAGCAGCATGGGATCACTCGTCTGTGTGGGCACTGGGTTA CAGCTCGCGGGGCAAATTAGCGTATTAAGCCGCAGCTATATTGAACATGCCGATATTGT ATTTTCACTCTTACCTGACGGTTTCTCGCAGCGTTGGTTGACGAAGCTCAACCCCAATGT CATCAATTTGCAGCAGTTTTATGCGCAAAATGGTGAAGTTAAAAATCGCCGAGACACCT ACGAGCAAATGGTCAATGCCATTCTAGATGCGGTGAGAGCGGGTAAAAAAACCGTGTGT GCACTCTACGGTCATCCGGGGGTATTTGCCTGTGTATCCCATATGGCGATAACTCGGGCG AAGGCCGAAGGGTTTTCGGCAAAGATGGAGCCGGGGATTTCGGCCGAAGCTTGCCTGTG GGCCGACTTAGGGATTGACCCCGGCAACTCGGGGCATCAAAGTTTTGAAGCTAGCCAGT TTATGTTTTTCAACCATGTGCCCGATCCCACTACCCACTTATTACTCTGGCAAATCGCCAT TGCAGGCGAACATACCTTAACCCAATTTCATACCTCGAGTGATAGGTTGCAGATCCTCGT GGAGCAGTTGAATCAATGGTATCCCCTCGACCATGAGGTGGTCATATACGAAGCGGCCA ATTTGCCAATCCAAGCCCCGCGTATCGAGCGTTTACCTTTAGCGAATTTACCCCAAGCAC ACTTAATGCCGATTAGTACGTTGTTAATTCCGCCAGCAAAAAAGCTGGAGTACAACTAT GCTATTTTGGCTAAGTTAGGGATCGGTCCCGAAGATTTGGGATAA n/a 
His6-SonA Hexahistidine tagged borosin precursor for heterologous expression ATGCATCATCATCATCATCACATGTCTGGATTATCGGATTTTTTTACCCAGTTAGGCCAA GATGCGCAGTTAATGGAAGACTATAAACAGAATCCTGAGGCGGTGATGCGTGCCCACGG ATTAACTGATGAACAAATTAACGCTGTAATGACTGGGGATATGGAAAAGCTCAAAACGT TAAGTGGTGATAGTAGCTATCAATCTTACCTTGTTATTTCACATGGTAATGGTGATTAA
n/a His6-SonA_BBD Hexahistidine tagged SonA helical bundle/BBD ATGCATCATCATCATCATCACATGTCTGGATTATCGGATTTTTTTACCCAGTTAGGCCAA GATGCGCAGTTAATGGAAGACTATAAACAGAATCCTGAGGCGGTGATGCGTGCCCACGG ATTAACTGATGAACAAATTAACGCTGTAATGACTGGGGATATGGAAAAGCTCAAAACGT TAAGTGGTTAA

5.6.1 Genomic DNA extraction

Genomic DNA from Shewanella oneidensis MR-1 was extracted by resuspending cell mass in 600 μL lysis buffer (10 mM Tris pH 8, 1 mM EDTA pH 8, 0.6% SDS, 120 μg/mL proteinase K) and incubating for 1 h at 37 °C. An equal volume of phenol:chloroform:isoamyl alcohol (25:24:1 v:v:v) was added and mixed well by inversion. After centrifugation at top speed at room temperature for 5 minutes, the upper aqueous phase was transferred into a fresh tube. Addition of lysis buffer was repeated until the white protein phase disappeared. Phenol was removed by adding an equal volume of chloroform:isoamyl alcohol (24:1 v:v) to the aqueous layer, mixing by inversion, and then spinning at 14000 x g at room temperature for 5 minutes. The aqueous layer was removed to a fresh tube and DNA was precipitated using ethanol.

5.6.2 Cloning

All constructs for the heterologous expression of SonM (Uniprot Q8EGW3) and SonA (Uniprot Q8EGW2) proteins in E. coli were made using the genes cloned out of the native organism. Q5 polymerase was used to amplify sonM and sonA genes from the extracted genomic DNA according to the manufacturer’s instructions (Q5 standard buffer used at 1X, 200 μM dNTPs, 0.5 μM each primer, 0.02 U polymerase/50 μL PCR, with 5% DMSO). All constructs were made using Hi Fi DNA Assembly Master Mix.
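The tagged constructs listed in the sequence table above can be sanity-checked in silico before ordering primers or after sequencing. The following is a minimal Python sketch, not part of the original workflow: it translates the His6-SonA_BBD coding sequence with the standard genetic code and checks that the reading frame opens with Met plus six His codons and closes with a single stop codon. The helper name `translate` and the two check variables are illustrative names introduced here.

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids ('*' = stop)."""
    seq = "".join(dna.split()).upper()  # drop whitespace left over from line wrapping
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq), 3))


# His6-SonA_BBD coding sequence, copied from the sequence table.
HIS6_SONA_BBD = """
ATGCATCATCATCATCATCACATGTCTGGATTATCGGATTTTTTTACCCAGTTAGGCCAA
GATGCGCAGTTAATGGAAGACTATAAACAGAATCCTGAGGCGGTGATGCGTGCCCACGG
ATTAACTGATGAACAAATTAACGCTGTAATGACTGGGGATATGGAAAAGCTCAAAACGT
TAAGTGGTTAA
"""

protein = translate(HIS6_SONA_BBD)
has_his6_tag = protein.startswith("M" + "H" * 6)
single_terminal_stop = protein.endswith("*") and protein.count("*") == 1

print(protein)
print("N-terminal His6 tag:", has_his6_tag)
print("Single terminal stop codon:", single_terminal_stop)
```

The same function can be pointed at any of the other listed coding sequences; an internal `*` or a length error would flag a frameshift or a copy error.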
To make the SonM-SonA co-expression construct, pET28b backbone was digested with NcoI-HF and SalI-HF, treated with Antarctic phosphatase, and the band was extracted from an agarose gel using a kit (Thermo Scientific). The native RBS was used in the co-expression construct and an N-terminal hexa-histidine (his6) tag was added to sonA. Gene sonM was amplified using primers prmMRJ036_fw and prmMRJ043_rev. Q5 polymerase was used as described above with the following reaction conditions: initial denaturation 30 s at 98 °C; denature 98 °C 10 s, anneal 61.5 °C 30 s, extend 72 °C 25 s for 30 cycles; final extension 72 °C 2 minutes. Gene sonA was amplified with an N-terminal his6 tag using primers prmMRJ044_fw and prmMRJ045_rev in a PCR as follows: initial denaturation 30 s at 98 °C; denature 98 °C 10 s, anneal 57.5 °C 30 s, extend 72 °C 7 s for 30 cycles; final extension 72 °C 2 minutes.

Overlap extension PCR was used to join the sonM and his6-sonA amplicons as follows: using these two amplicons as DNA template, the first five cycles were allowed to proceed without primers under the following conditions: initial denaturation 30 s at 98 °C; denature 98 °C 10 s, anneal 68 °C 30 s, extend 72 °C 25 s for 5 cycles. After the fifth cycle, primers prmMRJ036_fw and prmMRJ045_rev were added and the annealing temperature was increased to 72 °C for the remaining 25 cycles, followed by a final extension at 72 °C for 2 minutes. The resulting band was excised from an agarose gel before assembly into the backbone.

The assembly was transformed into electrocompetent TOP10 E. coli cells and colonies were screened via colony PCR using primers T7_fw and T7_rv and OneTaq polymerase. The PCR was set up as follows: Standard PCR buffer at 1X, 200 μM dNTPs, 0.2 μM each primer, 1.25 U polymerase/50 μL reaction, with 5% DMSO; initial denaturation 30 s at 94 °C; 30 cycles of denature 94 °C 20 s, anneal 46.3 °C 40 s, extend 68 °C 60 s; final extension 68 °C 5 minutes.
Colonies showing a correctly sized band were sequence verified by ACGT.

For individual expressions, sonM and sonA genes were amplified from extracted genomic DNA. An N-terminal his6 tag was added to each gene before assembly into the same backbone as the co-expression construct. For his6-sonM, prFM1177 and prFM1178 primers were used in a standard Q5 polymerase reaction as follows: initial denaturation 30 s at 98 °C; first five cycles denature 98 °C 7 s, anneal 56.5 °C 15 s, extend 72 °C 20 s; remaining 25 cycles increase annealing temperature to 72 °C; final extension 72 °C 2 minutes. For his6-sonA, prFM1175 and prFM116 primers were used in a standard Q5 polymerase reaction as follows: initial denaturation 30 s at 98 °C; first five cycles denature 98 °C 5 s, anneal 51.5 °C 15 s, extend 72 °C 10 s; remaining 25 cycles increase annealing temperature to 65.5 °C; final extension 72 °C 2 minutes. PCR products were cleaned up using a kit (Thermo Scientific), assembled with the backbone via Hi Fi DNA Assembly Master Mix, transformed into TOP10 E. coli electrocompetent cells, screened via colony PCR using OneTaq and the aforementioned T7 primers, and sequence verified by ACGT as described above.

Active site mutants of sonM were constructed in the co-expression and individual expression backgrounds using site-directed mutagenesis. Primers prFM1191-prFM1215 were used in appropriate pairs to amplify the entire plasmid under standard Q5 reaction conditions: initial denaturation 30 s at 98 °C; denature 98 °C 10 s, anneal 63.5 °C 20 s, extend 72 °C 3 minutes for 30 cycles; final extension 72 °C 2 minutes. The PCR was cleaned up using a kit (Thermo Scientific) and treated with T4 polynucleotide kinase and ligase (NEB) according to the manufacturer’s instructions. Subsequent transformation and sequencing were performed as described above.
BBD (SonA with truncated core peptide sequence) constructs were assembled into a pET28b empty vector that was digested with NcoI-HF and BamHI, treated with Antarctic phosphatase, and the band was extracted from an agarose gel using a kit (Thermo Scientific). Both inserts described here used plasmid pMF1181 as PCR template DNA. To amplify his6-sonA_helicalbundle, a standard Q5 PCR was run with primers prFM1175 and pKKC1010: initial denaturation 30 s at 98 °C; denature 98 °C 10 s, anneal 67 °C 30 s, extend 72 °C 20 s for 30 cycles; final extension 72 °C 2 minutes. To create the co-expression construct of sonM and sonA_helicalbundle, prmMRJ_036 and pKKC1010 were used in a standard Q5 PCR: initial denaturation 30 s at 98 °C; denature 98 °C 10 s, anneal 72 °C 20 s, extend 72 °C 30 s for 30 cycles; final extension 72 °C 2 minutes. PCR products were digested with DpnI and cleaned up using a kit (Thermo Scientific), assembled with the backbone via Hi Fi DNA Assembly Master Mix, transformed into TOP10 E. coli electrocompetent cells, screened via colony PCR using OneTaq and the aforementioned T7 primers, and sequence verified by ACGT, all as described above.

5.6.3 Protein purification

E. coli BL21(DE3) cells were transformed with the pET28b expression plasmids and cultured overnight with 50 μg/mL kanamycin at 37 °C. A 10 mL overnight culture was added to 1 L Terrific Broth with 50 μg/mL kanamycin in 2.5 L baffled flasks (Thomson Scientific) with foam stoppers. The 1 L culture was grown to an optical density at 600 nm (OD600) of approximately 1. At this time, the cultures were cold-shocked on ice for 30 minutes followed by induction with 200 μM IPTG. After induction, cultures were incubated at 16 °C for 24 h in a shaking incubator. Cells were harvested by centrifugation at 5,000 x g for 30 minutes at 4 °C. Cell pellets were resuspended in ice-cold lysis buffer (50 mM HEPES pH 8, 300 mM NaCl, 10% (v/v) glycerol) with 20 mM imidazole, and lysed using lysozyme and sonication.
The resultant lysate was clarified by centrifugation at 15,000 x g for 30 minutes at 4 °C. Benchtop and FPLC affinity purifications were used for all proteins and yielded equivalent protein with equivalent activity and purity. For benchtop purifications, supernatant was incubated with Ni-NTA beads (Gold Bio) on a rotator at 4 °C for 1 h. Beads were washed with 10 column volumes of ice-cold lysis buffer and eluted in lysis buffer containing 250 mM imidazole. For FPLC affinity purification, a pre-packed HisTrap 5 mL column (GE) was used: supernatant was filtered with a 0.2 µm syringe filter before being loaded onto the pre-equilibrated column. After loading, the column was washed with 5 column volumes of lysis buffer with 20 mM imidazole and eluted using lysis buffer with 250 mM imidazole. For benchtop and FPLC purifications, fractions were collected and purified protein was concentrated using Amicon Ultra filters (10-kDa MWCO) and subsequently loaded onto a pre-equilibrated HiLoad 16/600 Superdex 200 pg size exclusion column (GE). Fractions were again collected and concentrated using Amicon Ultra filters. Concentrations were determined by BioRad Bradford assay. For his6-SonM and his6-SonA proteins to be used in kinetics assays, protein was flash frozen in liquid nitrogen and stored at -80 °C. For proteins to be used in crystallography, protein was concentrated to approximately 20 mg/mL and dialyzed into 10 mM HEPES pH 8 to de-salt and remove glycerol. Proteins were divided into 40 µL aliquots, flash frozen in liquid nitrogen, and stored at -80 °C until use.

5.6.4 Mass spectrometry

Heterologously expressed and purified protein was prepared for mass spectrometric analysis by an in-gel digest method as previously described.60 Briefly, the band corresponding to his6-SonA was extracted from an SDS-PAGE gel, cut into ~2 mm x 2 mm pieces and placed in 1.5 mL LoBind tubes (Eppendorf).
Gel cubes were then washed with a 1:1 ratio of 100 mM ammonium bicarbonate (ABC):acetonitrile (ACN) three times until gel pieces appeared clear. After dye removal, they were dehydrated in 100% ACN until semi-opaque (~30 s), and the ACN was subsequently discarded. After rehydration in digest buffer (50 mM ABC and 1:50 units AspN protease (Promega)), gel pieces were placed on ice for 15 minutes and then transferred to a 37 °C incubator overnight. The next day, excess liquid from the digest was collected and transferred to a new LoBind tube. Digested peptides were extracted from the gel pieces by first covering them with 60 µL of 50% ACN and 0.3% formic acid (FA) and incubating at room temperature for 15 minutes. After this incubation, the supernatant was recovered. This extraction was repeated with 60 µL of 80% ACN and 0.3% FA and the supernatant was recovered and placed into the same LoBind tube. The pooled peptide extractions were frozen at -80 °C for 30 minutes to deactivate the protease. After freezing, the extracted peptides were thawed and dried using a SpeedVac (Eppendorf). Dried peptides were reconstituted in 0.1% FA and purified/desalted using C18 ZipTips according to the manufacturer’s instructions. Purified and desalted peptides were again dried using the SpeedVac and then reconstituted in 15-30 µL of 20% ACN, 0.1% FA, and transferred to glass vials for MS analysis.

Peptide mass spectrometric analysis (LC-MS/MS HCD)

LC-MS/MS measurements of digested peptides were performed as previously described.60 Briefly, data were obtained on a Thermo Scientific Fusion mass spectrometer furnished with a Dionex Ultimate 3000 UHPLC system with a nLC column (200 mm × 75 μm) packed with Vydac 5-μm particles of 300 Å pore size (Hichrom Limited). Elutions used a linear gradient consisting of 0.1% FA in water (solvent A) and 0.1% FA in ACN (solvent B) at a flow rate of 0.3 μL/min.
The column was initially equilibrated with 20% solvent B for 5 minutes and then subjected to a linear increase of solvent B to 85% over 32 minutes, followed by a final elution step of 85% solvent B for 2 minutes. Mass spectra were acquired in positive-ion mode. Full MS was done at a resolution of 60,000 [automatic gain control (AGC) target, 4 × 10⁵; maximum injection time (IT), 50 ms; range, 300 to 1800 m/z], and data-dependent and targeted MS/MS were both performed at a resolution of 15,000 (AGC target, 5 × 10⁵; maximum IT, 500 ms; isolation window, 2.2) using higher-energy collisional dissociation (HCD). HCD collision energies from 14-20% with steps of ±4% were used during LC-MS/MS measurements. Data were processed and analyzed using Thermo Fisher Xcalibur software and MaxQuant as previously described.60

5.6.5 Kinetics assay

Plasmids for expressing S-adenosylhomocysteine nucleosidase (SAHN; Uniprot P0AF12) and adenine deaminase (ADE; Uniprot P31441) with N-terminal his6 tags were acquired from the ASKA collection.164 SAHN was expressed and purified as above with the addition of 1 mM DTT in all buffers. During the expression of ADE, to replace the Fe²⁺ metal with Mn²⁺ in the active site, 20 µM 2,2’-dipyridyl and 1.0 mM MnCl₂ were added at the time of induction.165 Other expression and purification steps for ADE were carried out in the same manner as for SAHN. Glutamate dehydrogenase (GDH) and ammonia assay reagent were used from the Ammonia Detection Kit (Millipore Sigma AA0100) according to previously established methods.162

For use in the kinetics assays, SAM was purified by HPLC using a BUCHI PrepChrom C-700 instrument and a BUCHI FlashPure EcoFlex C18 column (140000048). Solvent A was H₂O with 0.1% formic acid and solvent B was acetonitrile, run at a flow rate of 10 mL/min. The gradient, expressed as percent solvent A, was 95% for 0.5 minutes, a linear change from 95% to 5% over 15 minutes, and 5% for 2 minutes. SAM was purified to ~97-98.5% purity when measured by our assay.
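Gradient programs like the one used for SAM purification are easy to misread when written as a run of percentages and durations. As an illustration only, the Python sketch below encodes that gradient as time/percent breakpoints and interpolates the solvent composition at any time point. It assumes the three steps run back to back (0.5 minute hold, 15 minute ramp, 2 minute hold); the function name `percent_at` and the constant `SAM_GRADIENT_A` are hypothetical names introduced here, not instrument settings.

```python
from bisect import bisect_right


def percent_at(time_min, program):
    """Linearly interpolate a gradient given as sorted (time_min, percent) breakpoints."""
    times = [t for t, _ in program]
    if time_min <= times[0]:
        return program[0][1]
    if time_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, time_min) - 1  # index of the segment containing time_min
    (t0, p0), (t1, p1) = program[i], program[i + 1]
    return p0 + (p1 - p0) * (time_min - t0) / (t1 - t0)


# Percent solvent A over the SAM purification run (ASSUMPTION: consecutive steps).
SAM_GRADIENT_A = [(0.0, 95.0), (0.5, 95.0), (15.5, 5.0), (17.5, 5.0)]

FLOW_ML_PER_MIN = 10.0  # flow rate quoted in the text
run_time_min = SAM_GRADIENT_A[-1][0]
total_volume_mL = FLOW_ML_PER_MIN * run_time_min

for t in (0.0, 4.0, 8.0, 12.0, 16.0):
    a = percent_at(t, SAM_GRADIENT_A)
    print(f"t = {t:4.1f} min: {a:5.1f}% A, {100.0 - a:5.1f}% B")
print(f"Total solvent per run: {total_volume_mL:.0f} mL")
```

The same breakpoint list format would describe the nLC gradient above if it were written in percent solvent B instead.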
Kinetic experiments were conducted in a clear, flat-bottomed 96-well plate in a SpectraMax ID5 (Molecular Devices, Inc). Methyl transfer was measured by monitoring the decrease in absorbance at 340 nm (corresponding to the loss of NADPH in the coupled enzyme assay). Three replicates were used for each condition, and reads were taken every 30 or 40 s. After all assay components except the methyltransferase were assembled in the plate wells, absorbance values were collected for 10-15 minutes prior to the addition of the methyltransferase to start the reaction. The absorbance data were used to calculate the concentration of NADPH at each time point with Beer's law and the reported extinction coefficient of NADPH, 6220 M⁻¹ cm⁻¹. Each subsequent concentration was subtracted from the concentration at the final reading before addition of the methyltransferase, so that the curve reflects product formation over time. The slope was taken over the linear range of this curve, giving the velocity of product formation (µM/min). The velocities of the three negative control replicates (lacking the varied substrate) were averaged and subtracted from the velocity of each individual replicate to account for background SAM degradation. These velocity values were then divided by the enzyme concentration used, giving the rate of product formation (min⁻¹), and plotted against their respective substrate concentrations in GraphPad Prism to produce the substrate-velocity curve. A non-linear regression analysis was used to fit the data to the Michaelis-Menten equation and give values for the desired kinetic constants, Vmax, kcat, and KM or Ki, where appropriate.

For the collection of data for the kinetic modelling of 0, 1, and 2 methylated species of his6-SonA, the kinetic reactions were prepared as described in the above paragraph.
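The data reduction described for the coupled assay can be sketched in a few lines. The Python below is a hedged illustration, not the GraphPad Prism workflow used in this work: it applies Beer's law, builds the product curve relative to the last pre-enzyme read, takes a least-squares slope, subtracts the mean negative-control velocity, normalises by enzyme concentration, and fits the Michaelis-Menten equation. The 1 cm path length is a placeholder assumption (plate wells generally differ), all function names are hypothetical, and the fit uses a simple one-dimensional search over KM rather than Prism's algorithm.

```python
from statistics import mean

EPSILON_NADPH = 6220.0  # M^-1 cm^-1 at 340 nm, as quoted in the text
PATH_LENGTH_CM = 1.0    # ASSUMPTION: placeholder; use the real well path length


def nadph_uM(absorbance):
    """Beer's law, c = A / (epsilon * l), returned in micromolar."""
    return absorbance / (EPSILON_NADPH * PATH_LENGTH_CM) * 1e6


def product_curve_uM(a_last_before_enzyme, a_reads):
    """NADPH consumed (taken as product formed) relative to the last pre-enzyme read."""
    start = nadph_uM(a_last_before_enzyme)
    return [start - nadph_uM(a) for a in a_reads]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (velocity over the linear range)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def turnover_rates(velocities, control_velocities, enzyme_uM):
    """Subtract the mean background velocity, then divide by enzyme (uM/min -> min^-1)."""
    background = mean(control_velocities)
    return [(v - background) / enzyme_uM for v in velocities]


def _vmax_and_sse(km, s, v):
    """For a trial Km, the best Vmax is a linear least-squares solution."""
    x = [si / (km + si) for si in s]
    vmax = sum(xi * vi for xi, vi in zip(x, v)) / sum(xi * xi for xi in x)
    return vmax, sum((vi - vmax * xi) ** 2 for xi, vi in zip(x, v))


def fit_michaelis_menten(s, v):
    """Fit v = Vmax*S/(Km+S): log-grid search over Km, then golden-section refinement."""
    positive = [si for si in s if si > 0]
    lo, hi = min(positive) / 100.0, max(positive) * 100.0
    grid = [lo * (hi / lo) ** (i / 200) for i in range(201)]
    best = min(range(201), key=lambda i: _vmax_and_sse(grid[i], s, v)[1])
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, 200)]
    phi = (5 ** 0.5 - 1) / 2
    for _ in range(100):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if _vmax_and_sse(c, s, v)[1] < _vmax_and_sse(d, s, v)[1]:
            b = d
        else:
            a = c
    km = (a + b) / 2
    return km, _vmax_and_sse(km, s, v)[0]


# Synthetic demonstration data (not measurements from this work).
substrate_uM = [2, 5, 10, 20, 50, 100, 200]
rates_per_min = [5.0 * s / (20.0 + s) for s in substrate_uM]
km_fit, vmax_fit = fit_michaelis_menten(substrate_uM, rates_per_min)
print(f"Km ~ {km_fit:.2f} uM, Vmax ~ {vmax_fit:.2f} min^-1")
```

Because the rates passed to the fit are already normalised by enzyme concentration, the fitted maximum corresponds to kcat in min⁻¹ under these assumptions.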
Duplicates of each reaction time point to be analyzed by mass spectrometry were measured using the plate reader prior to quenching the reactions in SDS-dye and boiling for 5 minutes. his6-SonA was then prepared for mass spectrometry analysis following the procedure described in the mass spectrometry section. After reconstitution in 30 μL, the samples were further diluted 200-fold. The LC method was also modified to a 1 μL/min flow rate with the following solvent B program: 20% for 5 minutes, 20-85% over 15 minutes, and 85% for 2 minutes. Mass spectra were acquired and analyzed using the methods described in the mass spectrometry section.

5.6.6 Generating the kinetic model

We used a mathematical simulation to evaluate which parameters would be consistent with the dynamics that we observed. Specifically, we considered a reaction of the following form:

\[
E + S \overset{k_1}{\rightleftharpoons} ES \overset{k_2}{\rightleftharpoons} E + P_1 \overset{k_3}{\rightleftharpoons} EP \overset{k_4}{\rightleftharpoons} E + P_2
\]

In this case $S$ represents free substrate, $P_1$ is an intermediate product, and $P_2$ is a final product. $E$ represents free enzyme, while $ES$ and $EP$ represent enzyme bound to substrate or intermediate product. We simulated these dynamics using the following series of ordinary differential equations. The rate constant for each reaction in the forward direction is $k_n$, while the rate constant for the reverse direction is $k_{nr}$. The parameter values that were used for our base model can be found in Table 5.7. Modeling was done in R version 3.6.2. The model was solved using the deSolve package in R. Simulations were run for 3600 seconds with 0.1 second timesteps. Code to run simulations will be provided by W. Harcombe in the online version of this manuscript.
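For readers who want to reproduce the qualitative behaviour of the model without R, the following Python sketch integrates the six rate equations of this section using the rate constants and starting concentrations listed in Table 5.7. It is an independent illustration, not the authors' deSolve code: a hand-written fixed-step fourth-order Runge-Kutta integrator stands in for the deSolve solver, and all function and variable names are introduced here for the example. The 3600 second duration and 0.1 second step follow the text.

```python
# Rate constants and starting concentrations (M) copied from Table 5.7.
K = dict(k1=1.21e3, k1r=1.21e-2, k2=8.70e-3, k2r=0.0,
         k3=1.21e3, k3r=1.21e-3, k4=8.70e-2, k4r=0.0)
Y0 = dict(E=0.0, S=7.60e-5, ES=4.30e-6, EP=7.00e-7, P1=1.03e-5, P2=8.00e-6)
ORDER = ("E", "S", "ES", "EP", "P1", "P2")


def derivatives(y, k=K):
    """Right-hand sides of the six ODEs, grouped by elementary reaction."""
    E, S, ES, EP, P1, P2 = y
    bind_s = k["k1"] * E * S - k["k1r"] * ES    # net flux of E + S <-> ES
    cat_1 = k["k2"] * ES - k["k2r"] * E * P1    # net flux of ES <-> E + P1
    bind_p = k["k3"] * E * P1 - k["k3r"] * EP   # net flux of E + P1 <-> EP
    cat_2 = k["k4"] * EP - k["k4r"] * E * P2    # net flux of EP <-> E + P2
    return (
        -bind_s + cat_1 - bind_p + cat_2,  # dE/dt
        -bind_s,                           # dS/dt
        bind_s - cat_1,                    # dES/dt
        bind_p - cat_2,                    # dEP/dt
        cat_1 - bind_p,                    # dP1/dt
        cat_2,                             # dP2/dt
    )


def rk4_step(y, dt):
    """One classical Runge-Kutta step (stand-in for the deSolve integrator)."""
    def shifted(base, rate, h):
        return tuple(b + h * r for b, r in zip(base, rate))

    s1 = derivatives(y)
    s2 = derivatives(shifted(y, s1, dt / 2))
    s3 = derivatives(shifted(y, s2, dt / 2))
    s4 = derivatives(shifted(y, s3, dt))
    return tuple(
        yi + dt / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, s1, s2, s3, s4)
    )


def simulate(t_end=3600.0, dt=0.1):
    """Integrate from the Table 5.7 starting state and return the final state."""
    y = tuple(Y0[name] for name in ORDER)
    for _ in range(round(t_end / dt)):
        y = rk4_step(y, dt)
    return dict(zip(ORDER, y))


final = simulate()
for name in ORDER:
    print(f"{name:>2}: {final[name]:.3e} M")
```

Two conservation checks are useful when adapting the sketch: total enzyme (E + ES + EP) and total peptide (S + ES + EP + P1 + P2) should stay constant throughout the run, since every term in the rate equations only moves material between these pools.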
\begin{align}
\frac{dE}{dt} &= -k_1 \cdot E \cdot S + k_{1r} \cdot ES + k_2 \cdot ES - k_{2r} \cdot E \cdot P_1 - k_3 \cdot E \cdot P_1 + k_{3r} \cdot EP + k_4 \cdot EP - k_{4r} \cdot E \cdot P_2 \\
\frac{dES}{dt} &= k_1 \cdot E \cdot S - k_{1r} \cdot ES - k_2 \cdot ES + k_{2r} \cdot E \cdot P_1 \\
\frac{dEP}{dt} &= k_3 \cdot E \cdot P_1 - k_{3r} \cdot EP - k_4 \cdot EP + k_{4r} \cdot E \cdot P_2 \\
\frac{dS}{dt} &= -k_1 \cdot E \cdot S + k_{1r} \cdot ES \\
\frac{dP_1}{dt} &= k_2 \cdot ES - k_{2r} \cdot E \cdot P_1 - k_3 \cdot E \cdot P_1 + k_{3r} \cdot EP \\
\frac{dP_2}{dt} &= k_4 \cdot EP - k_{4r} \cdot E \cdot P_2
\end{align}

Table 5.7 Parameter values for kinetic model

Variables   Value (M)    Parameters   Value (sec⁻¹)
E           0            k1           1.21E+03
S           7.60E-05     k2           8.70E-03
ES          4.30E-06     k3           1.21E+03
EP          7.00E-07     k4           8.70E-02
P1          1.03E-05     k1r          1.21E-02
P2          8.00E-06     k2r          0
                         k3r          1.21E-03
                         k4r          0

5.6.7 Protein crystallization and data collection

After purification, concentration, and exchange into 10 mM HEPES pH 8 buffer as described above, proteins were screened for precipitation and crystal formation using the JCSG+ Suite (Qiagen) at 292 K. For each condition, three protein:precipitant ratios were tested (1:1, 1:2, and 1:3). The screen was conducted by the Nanoliter Crystallization Facility at the University of Minnesota (Minneapolis, MN). For the his6-SonA SonM complex, the best condition was identified as 20% polyethylene glycol (PEG) 3350 with 240 mM sodium malonate at pH 7. This condition was further refined for pH (in the range of 5.5-7) and PEG 3350 concentration (0-20%). For the his6-SonA_helicalbundle SonM complex, the best condition was identified as 100 mM Bis-Tris at pH 5.5 with 100 mM ammonium acetate and 17% PEG 10,000. This condition was further refined for pH (5-5.5) and PEG concentration (4-7%). Diffraction-quality crystals were visible at 292 K in 1 day for all crystals except the R67A mutant, which produced crystals in 3 days. SAH was dissolved in the mother liquor to a concentration of 5 mM for co-crystallization or 1 mM SAM was added to protein solutions before drops were set. Crystals were cryoprotected by transferring to a drop consisting of the mother liquor supplemented with 20% PEG and 20% glycerol.
The crystals were then mounted onto a CryoLoop (Hampton Research) and flash-cooled at 100 K in liquid nitrogen. X-ray diffraction data were collected on the 23-ID-B beamline at the Advanced Photon Source (APS), Argonne, Illinois, USA using a wavelength of 1.03323 Å and a MAR 300 CCD detector with 0.2 s exposures. Individual frames consisted of 0.5° steps over a range of 400°.

6 Progress towards identifying a phenotype associated with the split borosin BGC in S. oneidensis MR-1

Fredarla Miller performed the lab work associated with this chapter. Experiments were designed by Fredarla Miller and Dr. Michael Freeman. This chapter was written by Fredarla Miller for the purpose of this thesis.

6.1 Introduction

Considering our success in biochemically characterizing the SonM and SonA proteins heterologously in vivo and in vitro, we were eager to learn more about the biological role of the split borosin BGC in the native organism. The genome of S. oneidensis MR-1 was published in 2002 (ref. 166) and the bacterium has since been extensively studied, in part for its ability to respire a variety of substrates and its unique metabolism.161 Based upon the annotated genes within the split borosin BGC (sonM, sonA, SO1480, SO1481) and proximal regulatory elements, we generated many hypotheses regarding the native role of the son BGC and its associated natural product, all of which coalesce around physiological processes such as oxygen sensing, biofilm formation, and motility. This chapter seeks to compile the results of previous studies hinting at a biological role for this split borosin BGC and to lay the foundation for future experiments to discover a genuine phenotype as well as the structure and bioactivity of the borosin RiPP natural product.

6.1.1 Proposed bottom-up approach strategy to identify the son RiPP and determine its native biological role in S. oneidensis MR-1

The traditional pipeline for natural product discovery begins with the isolation of an “orphan” compound with a sought-after or novel bioactivity and a subsequent top-down investigational approach (Figure 6.1, left).167 However, with the increasing amount of genomic and transcriptomic data available on public databases and the concurrent development of bioinformatics tools, a bottom-up approach has become increasingly common (Figure 6.1, right). In this case, a putative natural product BGC is identified in the genome of an organism and then, typically through heterologous approaches, researchers can attempt to isolate an associated natural product. This has proven to be a powerful strategy in the field of RiPP biosynthesis, especially for the expansion of known RiPP families. For example, a recent study by Marahiel and coworkers used conserved elements of lasso peptide BGCs to identify 102 cryptic BGCs in 87 strains of proteobacteria.168 As the lasso peptides investigated in this study exhibit very few PTMs, which are predictable based upon the core peptide sequence and/or the genes in the BGC, researchers were able to precisely predict the structure of the final natural products. Thus, select BGCs were cloned into E. coli for heterologous expression and subjected to mass spectrometric detection of the predicted natural products, which resulted in the discovery of 12 new lasso peptides.168 A similar heterologous approach to produce novel RiPPs from putative BGCs identified through genome mining efforts has been fruitful for other RiPP families including microviridins,148 cyanobactins,169 and more.

Figure 6.1 Pipelines for RiPP natural product discovery
In gene clusters shown at the bottom, borosin methyltransferase/precursors are colored in pink/blue. Other BGC genes are in gray. Left: Top-down pipeline for natural product discovery is more common (exemplified by the omphalotins and the oph BGC from O.
olearius).60 Note that in this case the BGC is named for the natural product associated with it (omphalotins). Right: Bottom-up is a newer approach that relies upon sequencing data and bioinformatics to predict natural product BGCs (exemplified by the putative son BGC from S. oneidensis MR-1, a generic name given based on the name of the organism). In many cases, it is possible to reconstitute a BGC in vitro or heterologously to produce a novel natural product. The core peptide region of sonA is shown with the α-N-methylated residues in pink boxes. The core region is underlined in orange but the precise boundaries of the core peptide are unknown.

We sought to employ a bottom-up approach for our investigation of the cryptic son BGC in S. oneidensis MR-1, but this method becomes more challenging when there are fewer (or no) characterized members of a RiPP family of interest, as is the case with split borosins. Additionally, the absence of a proximal protease is a significant hurdle in the bottom-up approach for investigating a RiPP BGC. It is not uncommon in RiPP biosynthesis for the protease responsible for removing the leader peptide from the core peptide to be external to the BGC, and this is likely the case in the son BGC.170,171 Based on our biochemical characterization of the SonA precursor (Chapters 4 and 5 of this thesis), we can reasonably assume that the final natural product contains the two α-N-methylations which we have characterized. We do not yet know what other PTMs may be installed to generate the final RiPP natural product, nor where proteolytic cleavage takes place within the precursor (i.e., where the leader is separated from the core peptide) (Figure 6.1, right). Thus, as we could not hope to produce the final natural product heterologously at this stage, we first sought to investigate the biological role of this unknown RiPP in its native host, an investigational route only rarely pursued in RiPP biosynthesis.
The bottom-up/heterologous expression approach is by far the most common strategy for identifying novel RiPPs, although it often relies on metagenomic data from intractable organisms. This is a powerful strategy for many RiPP BGCs and affords the opportunity to investigate otherwise inaccessible enzymes and natural products. However, the heterologous approach renders knockout studies all but impossible due to the intractable native organism. While the lack of a protease in the son BGC of S. oneidensis MR-1 makes identification of the final natural product more challenging, this BGC resides in a genetically tractable organism that has been extensively studied. We hoped to take advantage of the body of literature supporting this organism to pursue a less traversed, and thus more impactful, route by focusing on the elucidation of the native role of the BGC/natural product (rather than first focusing on discovery of the structure of the natural product itself).

6.2 Description of the son BGC in S. oneidensis MR-1

The son BGC is well-conserved across the Shewanella genus. At the time of this analysis, NCBI housed the genomes of 43 distinct Shewanella species. Of these 43 species, 37 contain a son BGC (Figure 6.2 B), consisting of at least three genes: sonM (borosin methyltransferase), sonA (borosin precursor), and a gene with a diguanylate cyclase domain. In S. oneidensis MR-1 specifically, there is an additional gene downstream coding for a putative potassium efflux protein, which we believe to be a part of the BGC in this organism (Figure 6.2). The annotation of these genes, additional proximal genes, regulatory elements, and the results of previous studies (to be discussed in detail in the following sections) have led us to propose that the son BGC is involved in biofilm formation, a physiological process deeply intertwined with oxygen sensing, nitric oxide sensing, and motility in this organism.
Each aspect will be discussed in detail within this chapter together with the details of preliminary experiments we performed. S. oneidensis MR-1 strains created for this study are in Table 6.1 (a list of plasmids can also be found in the materials and methods section, Table 6.6).

Figure 6.2 Putative split borosin BGC in S. oneidensis MR-1
A: At least three genes are conserved as a putative cluster in 37 of the published 43 Shewanella spp. genomes on NCBI (AE0142992.2). Additional downstream genes (not necessarily a part of the cluster) are shown because they may be relevant to discovering a biological role of the borosin BGC in S. oneidensis MR-1. B: List of all the full Shewanella spp. genomes available on NCBI sorted into two groups based on the presence of a conserved son BGC.

Table 6.1 Bacterial strains used in this study

ID/name | Description; notes | Source
E. coli
TOP10 | Used for cloning purposes | Lab stock
WM3064 | Used to conjugate plasmids into S. oneidensis MR-1 | Dr. J Gralnick
UQ950 | Used to propagate pSMV3 plasmids | Dr. J Gralnick
S. oneidensis MR-1
hMF008 | WT | Dr. J Gralnick
hMF1024 | ΔflgA | Dr. J Gralnick
hMF007 | ΔarcA::kan | Dr. J Gralnick
hMF1008 | ΔSO1478 | This study
hMF1031 | ΔSO1479 | This study
hMF1014 | ΔSO1480 | This study
hMF1017 | ΔSO1481 | This study
hMF1020 | ΔSO1478-79-80-81 | This study
hMF1034 | ΔSO1478pBBR1MCS2 (empty vector control) | This study
hMF1026 | ΔSO1478pSO1478 (complemented knockout) | This study
hMF1035 | ΔSO1479pBBR1MCS2 (empty vector control) | This study
hMF1027 | ΔSO1479pSO1479 (complemented knockout) | This study
hMF1036 | ΔSO1480pBBR1MCS2 (empty vector control) | This study
hMF1028 | ΔSO1480pSO1480 (complemented knockout) | This study
hMF1037 | ΔSO1481pBBR1MCS2 (empty vector control) | This study
hMF1029 | ΔSO1481pSO1481 (complemented knockout) | This study
hMF1038 | ΔSO1478-79-80-81pBBR1MCS2 (empty vector control) | This study
hMF1030 | ΔSO1478-79-80-81pSO1478-79-80-81 (complemented knockout) | This study
hMF1025 | WT with empty pBBR1MCS2 (empty vector control) | This study
hMF1039 | WTp-ind-SO1478-79; WT with inducible sonM-sonA plasmid | This study
hMF1040 | WTp-ind-SO1478-79-80-81; WT with inducible full cluster plasmid | This study
hMF1042 | WTpBBAD18K (empty vector control) | This study
hMF1043 | ΔSO1478-79-80-81pBBAD18K (empty vector control) | This study
hMF1044 | ΔSO1478-79-80-81p-ind-SO1478-79; full cluster knockout with inducible sonM-sonA plasmid | This study
hMF1045 | ΔSO1478-79-80-81p-ind-SO1478-79-80-81; full cluster knockout with inducible full cluster plasmid (in process) | This study

6.3 ArcA regulation of the son BGC

ArcB/ArcA is a two-component signal transduction system that is directly or indirectly responsible for regulating the expression of at least 9% of all genes in E. coli.172 In E. coli, some of its regulation targets include genes involved in central metabolism and respiration. S. oneidensis MR-1 also encodes an Arc regulation system, but in S. oneidensis MR-1, the regulon is starkly different from that of E. coli, with very little overlap between the two organisms. In S.
oneidensis MR-1, the Arc system is responsible for regulating the switch from aerobic to anaerobic metabolism.173 In addition to the dissimilar role the Arc systems play in these two organisms, the proteins involved in the two-component system in S. oneidensis MR-1 are also unique. Notably, the histidine sensor kinase ArcB of E. coli is split into two proteins in S. oneidensis MR-1: ArcS and HptA.174,175 Despite these differences, ArcA, a transcription factor, is present in both organisms and is highly homologous between the two (87% similar, 81% identical).175 Due to this high sequence identity, Gralnick et al. hypothesized that the two ArcA proteins would have similarly conserved DNA binding targets (a 15 base pair motif).175 Indeed, the target-sequence similarity was confirmed in a later study (Figure 6.3).173 Gralnick et al. performed a bioinformatic analysis to predict potential gene targets of ArcA regulation, which were then verified by quantitative real-time PCR.175

Figure 6.3 DNA binding site of ArcA in E. coli and S. oneidensis MR-1
Figure adapted from Gao et al.173 Sequence logo showing ArcA binding sites of E. coli and S. oneidensis MR-1.

Intriguingly, one of the predicted ArcA binding sites lies 59 base pairs upstream of sonA, falling into the 3’ end of sonM (Figure 6.2 A).175 Indeed, Gralnick et al. discovered that when S. oneidensis MR-1 ΔarcA was grown anaerobically with DMSO and fumarate, sonA (SO1479) exhibited a 17.7-fold expression difference compared to WT, indicating that the Arc system downregulates sonA expression in these conditions; in fact, it is one of the most differentially expressed genes in this condition.175 Based on this result and the understanding that the Arc transcriptional regulation system is activated in anaerobic conditions, we believed that the son BGC would be expressed in aerobic conditions. To verify this hypothesis, we inoculated two fresh cultures of S.
oneidensis MR-1, one in rich medium (LB) and one in minimal medium (SBM), in preparation for RNA extraction and RT-PCR (non-quantitative). Since we do not yet definitively know the boundaries of the son BGC, the first experiment probed for a sonM transcript only. A PCR product corresponding to the sonM transcript (SO1478) was present in both the LB and SBM samples (Figure 6.4 A). In an attempt to detect a longer transcript of the putative son operon, cDNA synthesis was also conducted with random hexamer primers. We were able to detect the presence of a transcript corresponding to a single operon encoding sonM and sonA when grown in LB and SBM (Figure 6.4 B). Unfortunately, we have not yet been able to detect the presence of a longer transcript, possibly due to a false negative result (for example, long RNA transcripts may have been sheared during RNA isolation). However, primer optimization and a successful positive control for the longer transcript may yield better results (Figure 6.4 C). Despite this challenge, we were able to confirm the expression of sonM and sonA in these convenient aerobic conditions in both rich and minimal media. While additional RT-PCR should be performed on anaerobic samples to confirm repression of the son BGC in that condition, we were able to plan most of the subsequent experiments to take place aerobically with a variety of media supplementations.

Figure 6.4 RNA extraction to verify expression of son BGC
A: Agarose gel confirming that DNase treatment was successful (no PCR product is visible after the second DNase treatment in either the LB or SBM samples). After RNA extraction and RT-PCR, a transcript for SO1478 (sonM) is present in LB and SBM samples in the RT reactions that used a specific primer (prmMRJ025) or random hexamer primers. Positive control used a plasmid encoding the full BGC for template.
B: PCR product corresponding to the SO1478-79 (sonM-sonA) transcript is present in LB and SBM RT-PCR reactions that used random hexamer primers.
C: Attempt to detect transcript for SO1478-79-80 and SO1478-79-80-81 was unsuccessful. Positive control only worked for the former.

6.3.1 Known phenotypes in S. oneidensis MR-1 related to the Arc system

The Arc system in S. oneidensis MR-1 is thought to provide transcriptional regulation during the metabolic switch from aerobic to anaerobic growth.173 Mutants of the Arc system proteins (ArcA, HptA, and ArcS) typically exhibit slower growth in aerobic and anaerobic conditions as well as hindered biofilm formation, especially in anaerobic conditions.173,174 The comprehensive investigation into the ArcA regulon by Gao et al. made some additional crucial discoveries. First, and of direct interest to this thesis, three of the four genes in the son BGC (sonM, sonA, SO1480) were seen to be unaffected during aerobic growth but were induced in the ΔarcA strain during anaerobic growth,173 the same qualitative result seen by Gralnick et al. with respect to sonA.175 Second, three genes involved in the glyoxylate pathway were seen to be induced during aerobic growth but were unaffected during anaerobic growth in the ΔarcA strain (Figure 6.5).173 The glyoxylate pathway is a part of central metabolism in plants, fungi, and bacteria. This pathway bypasses much of the citric acid cycle by transforming isocitrate into glyoxylate (aceA; isocitrate lyase), and glyoxylate into malate (aceB; malate synthase A). In other organisms, the glyoxylate pathway has been shown to be associated with the synthesis of carbohydrates, which are important both for cell growth and biofilm formation (e.g., exopolysaccharides).176 Furthermore, aceA and aceB are encoded just downstream of the son BGC in S.
oneidensis MR-1, providing additional confidence that the son BGC may also be involved in these physiological processes (Figure 6.2 B).

Figure 6.5 Expression changes of TCA cycle and glyoxylate pathway in ΔarcA mutant
Figure adapted from Gao et al.173 Green pathway is the glyoxylate pathway and its enzymes are upregulated in ΔarcA. No change in expression is seen in any of the other genes in the TCA cycle.

Considering that the son BGC seems to be influenced by the Arc system, we sought to investigate whether mutants of the BGC exhibited a similar growth phenotype by performing a simple growth curve in LB under aerobic conditions (Figure 6.6). Subsequent experiments (such as soft agar motility assays) may not differentiate between a growth defect and the phenotype of interest, so it was important to first confirm that the son BGC mutants can be expected to grow at the same rate as WT. From this experiment, we saw no growth deficiencies of son BGC mutants, but we did see a slight defect in the ΔarcA mutant, as expected from previous studies. Additional growth curves should be conducted in SBM and in aerobic, anaerobic, and microaerobic conditions (in LB and SBM) to further confirm the behavior of the mutants. This initial growth curve in aerobic conditions in rich media serves as a foundation for subsequent experiments probing motility and biofilm phenotypes.

Figure 6.6 Aerobic growth curve in LB
Aerobic growth curve conducted in LB with S. oneidensis MR-1 WT and mutants. None of the son BGC mutants exhibit a growth defect in these conditions, but the ΔarcA mutant shows a slight growth defect, as anticipated from previous studies.

6.4 Cyclic di-GMP regulation in bacteria and its implication for the son BGC

The son BGC encodes several genetic elements indicative of Arc and cyclic-di-GMP regulation, both of which are known to regulate overlapping metabolic/physiological processes such as oxygen utilization and biofilm formation.
Bis-(3´-5´)-cyclic dimeric guanosine monophosphate (cyclic-di-GMP; c-di-GMP) is a molecule made from two guanosine triphosphate (GTP) monomers, a process catalyzed by diguanylate cyclase (DGC) enzymes, which contain a conserved GGDEF domain (named for its conserved residues) (Figure 6.7 B). c-di-GMP molecules are broken down by dedicated phosphodiesterase (PDE) enzymes. The balanced activity between DGCs and PDEs controls the signaling cascade associated with the c-di-GMP molecule.177

c-di-GMP signaling is a complex process that is known to play a role in regulating the metabolic and physiological switch between planktonic and biofilm states of bacteria (in either direction).177 However, biofilm formation is tightly intertwined with other processes in bacteria including oxygen sensing, nitric oxide sensing, regulation of the glyoxylate shunt of the TCA cycle, exopolysaccharide biosynthesis, and motility.176,178,179 It should also be noted that many of these processes overlap with the Arc regulon in S. oneidensis MR-1.173,175 As the three-gene son BGC conserved in most Shewanella spp. encodes a DGC protein (SO1480 in S. oneidensis MR-1), we were interested in exploring c-di-GMP signaling as having possible implications for either son BGC expression/regulation or a related signaling role for the borosin natural product itself (Figure 6.7 A). c-di-GMP signaling and its role in regulating the aforementioned processes will be discussed in detail below, together with relevant experiments we performed.

Figure 6.7 GGDEF domain protein in the son BGC
A: son BGC from S. oneidensis MR-1. The conserved GGDEF domain is the lighter box within the protein to show the domain architecture. The active site is bracketed below the protein and corresponds to the highlighted residues in the lower panel.
B: Alignment between the GGDEF protein found in the son BGC and the top BLASTp conserved protein domain hit (cd01949; E-value 6.34e-55 for the query interval 417-572).
Active site residues are highlighted in green, including the conserved GGDEF (or GGEEF) motif. No putative conserved domains are apparent in the N-terminal portion of the protein.

6.4.1 Related motility phenotypes in S. oneidensis MR-1

Motility is closely associated with the biological processes discussed in this chapter, such as c-di-GMP signaling.180 Deutschbauer and co-workers used TnSeq (transposon mutagenesis followed by sequencing) to generate single-gene mutants of 32 bacterial species in an effort to identify a phenotype for every gene in the organisms’ genomes.181 In this experiment, several Shewanella spp. were determined to have a motility phenotype when a homologous son BGC gene was disrupted (Table 6.2).181 In fact, in all but one case, at least one motility phenotype was seen in the top 20 phenotypes identified for each gene tested. These high-throughput experiments were conducted on soft LB agar at 30 °C in aerobic conditions. This finding was in line with our belief that we may be most likely to identify a phenotype in aerobic conditions, as that is when the son BGC is natively expressed.

Table 6.2 Motility phenotypes identified in TnSeq experiment with select Shewanella spp.
In cases where more than one motility assay was in the top 20 phenotypes, only the one with the most divergent fitness score is listed (unless there was a positive and negative fitness result).181 Gray cells indicate when a phenotype was not in the top 20 strongest phenotypes identified.

Shewanella spp. | son BGC homolog tested | Gene ID | Assay | Relative fitness value
S. oneidensis MR-1 | sonM | 1478 | M5 outer | +0.5
S. oneidensis MR-1 | sonA | 1479 | M2 center | +0.2
S. oneidensis MR-1 | SO1480 | 1480 | M1 center | +0.2
S. sp. ANA-3 | sonM | 2948 | outer cut, LB soft agar motility assay | -0.6
S. sp. ANA-3 | sonA | 2947 | outer cut, LB soft agar motility assay | -0.8
S. sp. ANA-3 | SO1480 | 2946 | outer cut, LB soft agar motility assay | -0.6
S. loihica PV-4 | sonM | 1272 | Motility M3 | +0.8
S. loihica PV-4 | sonA | 1273 | Motility M4 | -2.0
S. loihica PV-4 | SO1480 | 1274 | Motility M3 | +0.5
S. loihica PV-4 | sonM | 2998 | Motility M3 | +0.8
S. loihica PV-4 | sonM | 2998 | Motility M4 | -0.5
S. loihica PV-4 | sonA | 2999 | Motility M3 | +0.4
S. loihica PV-4 | SO1480 | 3000 | Motility M4 | +0.6
S. amazonensis SB2B | sonM | 2384 | outer cut, LB soft agar motility assay | -0.5
S. amazonensis SB2B | sonA | 2383 | Motility assay, center cut sample 1 | +0.5
S. amazonensis SB2B | sonA | 2383 | outer cut, LB soft agar motility assay | -1.0
S. amazonensis SB2B | SO1480 | 2382 | outer cut, LB soft agar motility assay | +0.3
S. amazonensis SB2B | sonM | 0785 | Motility assay, center cut sample 2 | +0.3
S. amazonensis SB2B | sonA | 0784 | outer cut, LB soft agar motility assay | +0.8
S. amazonensis SB2B | sonA | 0784 | Motility assay, center cut sample 1 | -1.5
S. amazonensis SB2B | SO1480 | 0783 | outer cut, LB soft agar motility assay | +0.2

To assess how the son BGC in S. oneidensis MR-1 is involved in motility, we used a similar soft-agar method to conduct motility assays in our lab. In addition to WT S. oneidensis MR-1 and the son mutants, we obtained a non-motile S. oneidensis MR-1ΔflgA strain from Dr. Jeffrey Gralnick (University of Minnesota, Twin Cities) for use as a control. While we understood the most likely condition to present a phenotype for the son mutants was during aerobic growth, we also tested microaerobic and anaerobic conditions. We defined “microaerobic” to be plates that were inoculated aerobically and then transferred to an anaerobic chamber for subsequent incubation. Since the Arc system helps to regulate the metabolic switch from aerobic to anaerobic growth, we hoped that this strategy might induce a clear motility phenotype. We also tested rich medium (LB) and minimal medium (SBM). See Table 6.3 for a summary of conditions and strains tested.

Table 6.3 Conditions and strains tested in motility/colony morphology experiments
Microaerobic, as used here, is defined as inoculating colonies aerobically and then transferring the plates to the anaerobic chamber after inoculation. For motility assays, plates were used with 0.3% agar. A variety of inoculation methods was also used as discussed in the main text. Colony morphology experiments used standard 1.5% agar.
Conditions tested
Media | Description | Oxygen | Note on media
LB | rich, undefined | Aerobic |
LB | rich, undefined | Anaerobic | With fumarate and lactate
LB | rich, undefined | Microaerobic | With fumarate and lactate
SBM | minimal, defined | Aerobic |
SBM | minimal, defined | Anaerobic | With fumarate and lactate
SBM | minimal, defined | Microaerobic | With fumarate and lactate

Strains tested
ID | Description
hMF008 | WT
hMF1008 | ΔSO1478
hMF1031 | ΔSO1479
hMF1014 | ΔSO1480
hMF1017 | ΔSO1481
hMF1020 | ΔSO1478-79-80-81
hMF1024 | ΔflgA (neg motility ctrl)
Note: All media conditions above were tested for the motility experiments. Colony morphology experiments were only conducted on aerobic LB and SBM.

Unfortunately, we were unable to reliably verify the motility phenotype in the son BGC mutants in any condition tested. As we were unsure of our technical ability to perform this assay, we focused on generating successful technical replicates. Thus far, only the ΔflgA mutant gave consistent results for a non-motile phenotype. Even WT technical replicates were not consistent in growth. In an attempt to get reproducible results, we tested several plating methods including direct colony transfer (from solid media to solid media) and liquid culture inoculation, normalized by OD600. While we were able to easily visualize the difference between the ΔflgA mutant and WT, our results for the son mutants were not consistent or definitive (Figure 6.8). As colony morphology can be an indicator for biofilm phenotypes, colony morphology experiments were conducted in a similar manner. Unfortunately, they produced similar results: son mutants were indistinguishable from WT. We expect the challenge in achieving reproducible results in this experiment is due to several factors. Examples include variability in the depth of the agar or depth of the inoculation, variability in the number of cells used, and uneven or inadvertent dehydration of the agar plates during incubation.
Despite these hurdles, the previous evidence for a motility phenotype relating to the son BGC as well as the Arc system and c-di-GMP signaling makes this a compelling phenotype to pursue experimentally.

Figure 6.8 Representative images from motility assay
A: These examples were grown in aerobic conditions at 30 °C for two days before being photographed. Top: two plates with WT S. oneidensis MR-1 and S. oneidensis MR-1ΔflgA cells labeled. ΔflgA colony, negative motility control, is much smaller than WT. Bottom: representative example of son BGC mutant. ΔsonA colony looks similar in size to WT but the technical replicates shown of ΔsonA are not consistent.
B: An example of a promising result that was not reproducible. This plate uses anaerobic LB media (with fumarate and lactate). The plate was degassed and the colonies were inoculated in an anaerobic chamber. Plate was left right-side-up for 7 days at 30 °C in the anaerobic chamber. Photo was adjusted for contrast to enable easier visualization of the colony sizes. Pink line is scaled to the radius of the WT colony (top) and copied to the technical replicates below of ΔsonM. In this instance, the mutant seems to have increased motility compared to WT, but this result could not be recreated despite multiple attempts.

6.5 Pellicle biogenesis in S. oneidensis MR-1

A pellicle is a type of biofilm that forms on the top of an undisturbed liquid culture at the liquid-air interface. Cells suspended at the upper surface of a pellicle have easier access to oxygen than those at the lower edge. Oxygen is required for pellicle formation by S.
oneidensis MR-1 because pellicle biogenesis is initially driven by aerotaxis (chemotaxis-deficient mutants are unable to form pellicles).182,183 It has also recently become clear that the activity of two DGCs (PdgA and PdgB), a c-di-GMP binding protein (MxdA), and CheY3 (involved in chemotaxis regulation/motility) are all important for pellicle formation in this organism.178 Specifically, high levels of c-di-GMP may trigger biofilm formation in S. oneidensis MR-1 as increased PDE activity has been seen to cause dissociation from biofilms.179,180

The glyoxylate pathway is also known to be involved in pellicle formation. A transcriptomic analysis of S. oneidensis MR-1 cells suspended in pellicles identified aceB and aceA as having significantly increased expression levels.184 These genes are notably encoded just downstream of the son BGC (Figure 6.2 A) and are also directly implicated in exopolysaccharide synthesis and biofilm formation. Recent studies have also shown the glyoxylate pathway and biofilm formation to be related to nitric oxide sensing and c-di-GMP signaling.176,185

S. oneidensis MR-1 is known to form pellicles during aerobic growth when certain cations are present.182,186 Notably, the same TnSeq experiment discussed above also identified a strong stress phenotype (fitness value of -3.5) in Shewanella sp. ANA-3 when the sonA homolog was disrupted and the mutant was grown on 500 mM chloride.181 We found this to be intriguing due to the putative transporters encoded within the son BGC: SO1481 and SO1482, a KefC-like potassium efflux protein and a TonB-like iron transporter, respectively (Figure 6.2 A). Motility and thus pellicle formation can be affected by environmental Na+ levels.187 As a small amount of iron (<0.3 mM) is required for pellicle formation in S.
oneidensis MR-1 and metal chelators such as EDTA abolish pellicle formation,188 the predicted functions of these two proteins provide further support for a pellicle phenotype for the son BGC, which may be affected by specific cations and/or the presence of iron.

6.5.1 Pellicle experiments in S. oneidensis MR-1

We conducted pellicle formation assays in 6- or 24-well plates according to a previously published method.182 As oxygen is required for pellicle formation and is likely also required for the son BGC expression in S. oneidensis MR-1, all assays were conducted in an aerobic environment. As this assay can be conducted in 24-well plates, we sought to test many conditions simultaneously (see Table 6.4 for a concise summary of strains and conditions tested). Briefly, we included WT, son mutants, plasmid-complemented son mutants, and a strain over-expressing sonM and sonA from an inducible pBBAD plasmid. We also tested a variety of media conditions ranging from rich (LB), to less rich (LM), to minimal (SBM). We investigated media with varying amounts of sodium, potassium, and iron. After inoculation, the 24-well plates were set on an unused bench and kept at room temperature such that they could be observed without disturbing them.

Table 6.4 Conditions and strains tested with pellicle experiments
All experiments were conducted at room temperature in aerobic conditions (pellicle growth requires oxygen).
Conditions tested
Base media | Description | Alterations to base recipe
LB | rich, undefined | none
SBM | minimal, defined | none
LM | minimal, undefined | none
LM | minimal, undefined | supplemented with 5 µM FeCl2
LM | minimal, undefined | NaCl only
LM | minimal, undefined | KCl only

Strains tested
ID | Description | Notes
hMF008 | WT | Used as positive control and as negative control (+EDTA)
hMF1031 | ΔSO1479 | Tested in all media types listed above
hMF1027 | ΔSO1479 complemented | Tested in all media types listed above
hMF1020 | ΔSO1478-79-80-81 | Tested in all media types listed above
hMF1030 | ΔSO1478-79-80-81 complemented | Tested in all media types listed above
hMF1039 | WT SO1478-79pBBAD18K | Tested with and without arabinose induction in LB
hMF1044 | ΔSO1478-79-80-81 SO1478-79pBBAD18K | Tested with and without arabinose induction in LB
hMF1042 | WT pBBAD18K | Tested with and without arabinose induction in LB
hMF1043 | ΔSO1478-79-80-81 pBBAD18K | Tested with and without arabinose induction in LB

In the literature, S. oneidensis MR-1 is reported to form a thick pellicle in as little as 16 h of static growth in rich liquid media (Figure 6.9 C). However, even after increasing the inoculum concentration (testing starting OD600 values of 0.05 to 0.2), the pellicles in our experiments were not as robust as those previously published. Indeed, the pellicles that formed were very delicate and not amenable to quantification due to their tendency to break apart. As with the motility assays discussed earlier, we prioritized technical replicates as a means to optimize this assay. However, as we were utilizing a 24-well plate method, we also included biological replicates when possible. We hoped to quantify ratios of planktonic cells to those suspended in the pellicle to discover a phenotype in the son BGC mutants or strains overexpressing sonM and sonA through an inducible plasmid.
Furthermore, though a small amount of EDTA was used in a previous study as a negative pellicle control, we were unable to get cell growth in the presence of EDTA.188 As we were unable to produce reliable positive or negative controls, or to visualize a difference between mutants and WT in this experiment despite many attempts, we were unable to characterize a pellicle-related phenotype. Pellicle formation is a multistep process and the formation of a durable, mature pellicle is very sensitive, among other things, to temperature fluctuations. Thus, we suspect that our experimental setup was not sufficiently temperature-controlled for reliable outcomes of this assay. However, the simplicity of this experiment as well as the potential to acquire qualitative and quantitative data make this an enticing route to pursue with further optimization.

Figure 6.9 Representative pellicle experiment setup
All experiments were conducted aerobically on the benchtop.
A: 6-well plate format using LB. Very delicate pellicle forms after two days but no difference was seen between mutants and WT.
B: 24-well plate format in SBM (photo used primarily to demonstrate setup). EDTA was used in an attempt to produce a negative pellicle phenotype but it resulted in no cell growth. No difference between WT and mutants (or complemented mutants) was seen.
C: Example of expected WT pellicle from Gambari et al. from an experiment conducted at 28 °C.178

6.6 Hypotheses regarding the final natural product from the son BGC

While this chapter focuses upon progress towards characterizing a phenotype related to the son BGC, it is important to remember the putative product of this BGC: a split borosin RiPP natural product.
As mentioned above, most known RiPPs are characterized as secondary metabolite toxins, with very few exceptions.189 Notable exceptions include the bacterial redox cofactors pyrroloquinoline quinone (PQQ) and mycofactocin (MFT).68,190 These small molecules are both built from only two amino acids, whereas most RiPPs are much larger (as many as 49 amino acids). The putative core peptide of SonA is similarly small, possibly resulting in a three-amino acid RiPP with two methylations (Figure 6.10).

The putative final structure of the son RiPP metabolite is unlikely to act as a redox cofactor, but we still expect it to play a role in signal transduction and/or cellular homeostasis as opposed to an antimicrobial activity; there are several reasons for this. First, the son BGC and its genomic locus are well conserved throughout the Shewanella genus, a characteristic not commonly seen in natural product BGCs. Second, the proposed role of the son BGC and/or the associated natural product itself is more aligned with a small molecule second messenger. This is supported by the predicted functions of the other genes within the BGC as well as downstream genes, which are likely involved in intricately entangled biological processes (c-di-GMP signaling, biofilm formation, oxygen sensing, etc.). Third, the core peptide is very small for a RiPP. And lastly, genes in the son BGC regularly appear in unrelated transcriptomic or bioinformatic studies which probe the unique metabolism of S. oneidensis MR-1.191 In natural product biosynthesis, it is more typical for a BGC to be silent until triggered by a specific signal such as the presence of a competing organism. It is unusual that the son BGC seems to be constitutively expressed in many conditions and is more closely associated with metabolic processes rather than competition.
Figure 6.10 PQQ, MFT, and putative core of SonA
PQQ and MFT are bacterial redox cofactors.68,190 Pre-MFT (PMFT) is shown because the final structure has not yet been elucidated. C-terminus of SonA is shown in the orange box and α-N-methylations are shown in pink. We do not yet know the boundaries of the SonA core peptide nor if other PTMs may be present in the final natural product molecule.

The mutants generated in this chapter may aid in isolating the final natural product, such as through comparative metabolomic studies between WT and ΔSO1478-79-80-81 (or ΔSO1479). In pursuit of this, a preliminary experiment was conducted with the pBBAD18K inducible plasmids constructed for this chapter (Table 6.5). Fresh cultures were streaked from a glycerol stock and individual colonies were used to inoculate small overnight cultures in LB. Overnight cultures were subsequently used to inoculate another LB culture, which was grown aerobically until log phase was reached (the same conditions used during the RNA extraction discussed above). After log phase was achieved, samples were induced with arabinose and were incubated for an additional 4 or 24 h. Whole cell pellets were resuspended in SDS sample buffer and run on a 15% SDS-PAGE gel (Figure 6.11). We hoped to visualize an induction band for SonM and/or SonA in the gels or see a qualitative difference in the uninduced and induced samples. While no difference was apparent, we reasoned that the protein might be too dilute to visualize in this manner. Thus, as a preliminary step, we sought to ensure that, minimally, SonM protein was present in the sample as it is required for the son RiPP maturation. To pursue this, the 4 h samples were run on a fresh gel and a wide band was extracted that roughly corresponded to the size of SonM. The protein was subjected to an in-gel digest with trypsin and analyzed by HPLC-MS/MS. Unfortunately, we were unable to confirm the presence of SonM protein in any of the samples.
This negative result could be due to several factors: improper use of the pBBAD18K plasmid (e.g., a need to optimize expression conditions); a very low abundance of the protein of interest in the samples; and/or improper mass spectrometric sample preparation. This experiment bears repeating and optimizing, possibly with affinity-tagged proteins or alternative plasmids.

Table 6.5 Attempt to overexpress sonM and sonA in S. oneidensis MR-1
ID | Background strain | Plasmid | Expect to see sonM expression?
hMF1039 | WT | SO1478-79 pBBAD18K | Yes, from genome and plasmid
hMF1044 | ΔSO1478-79-80-81 | SO1478-79 pBBAD18K | Yes, from plasmid
hMF1042 | WT | pBBAD18K (empty) | Yes, from genome
hMF1043 | ΔSO1478-79-80-81 | pBBAD18K (empty) | No

Figure 6.11 Attempt to overexpress SonM and SonA in S. oneidensis MR-1
Purified SonM/his6-SonA was used as a size control for the SDS-PAGE analysis. Uninduced (U) and induced (I) samples are shown. Gel on the left analyzes samples harvested after a 4 h expression, gel on the right after a 24 h expression. Unfortunately, no clear bands corresponding to SonM or SonA were easily visible.

Currently, we lack all the required proteins to reconstitute the biosynthesis of this split borosin RiPP in vitro or heterologously. The main component missing for this approach is the appropriate protease required to remove the N-terminal leader peptide from the SonA precursor. Without the activity of the required protease, we cannot confirm the boundaries of the SonA core peptide. However, other Shewanella spp. split borosin clusters encode zinc-dependent proteases, which may be cross-reactive with the BGC found in S. oneidensis MR-1 or offer clues regarding candidate proteases in the S. oneidensis MR-1 genome. Possible candidates include PepN (an amino peptidase, whose expression is also controlled by ArcA in S.
oneidensis MR-1) and shewasin A or D (pepsin homologs).171,192,193

Potential approaches to isolating the final natural product may require labeling techniques prior to comparative metabolomics. For example, labeled SAM (or methionine) may be doped into a cell culture such that SonM incorporates labeled methyl groups onto the core peptide of SonA. Whatever the method, once the final natural product is identified from this BGC, we will be able to begin to rigorously characterize its structure and bioactivity. Antibiotic assays or other similar toxicity screenings may readily identify such a bioactivity, but if the son borosin RiPP does indeed play a regulatory or signaling role as predicted, this bioactivity may be more difficult to characterize. Despite the difficulty in identifying a phenotype based upon a cryptic BGC, the potential payoff of discovering a unique biological role for the son borosin RiPP is enticing.

6.7 Conclusion

Most RiPPs are secondary metabolite toxins, with only a handful playing a role in homeostasis or signaling. We hypothesize that the RiPP resulting from the son BGC falls into the latter category. Of particular note is the down-regulation of sonA expression in anaerobic conditions by the Arc transcriptional regulation system. The metabolic switch from aerobic to anaerobic growth is complex and has global effects on the organism. The Arc regulon is deeply intertwined with other biological processes such as motility, biofilm/pellicle formation, nitric oxide sensing, and c-di-GMP signaling, most of which have been investigated in this chapter, directly or indirectly. The microbiological assays presented here require further optimization and testing of additional conditions, but this remains a promising lead to surmount a difficult challenge. Furthermore, the strains and plasmids generated in this chapter will be critical tools in later experiments. Despite the setbacks presented here, the borosin RiPP from S.
oneidensis MR-1 is poised to become the first split borosin RiPP from bacteria.

6.8 Materials and methods

Unless otherwise stated, all reagents were purchased from MilliporeSigma. Mutant S. oneidensis MR-1 strains were generated by following a previously published protocol, as detailed below.194

6.8.1 Cloning

See the tables below for lists of plasmids (Table 6.6) and primers (Table 6.7) created and/or used in this study. Specific cloning procedures are detailed below. The plasmid pSMV3 was used to generate clean in-frame deletions in S. oneidensis MR-1. Regions of approximately 1 kb up- and downstream of the gene to be deleted were cloned into the pSMV3 backbone. These regions were PCR amplified from genomic DNA extracted from S. oneidensis MR-1 cell mass (the same DNA sample described in Chapter 4).

Molecular cloning supplies: Q5 high fidelity DNA polymerase (NEB) was used to amplify DNA for the construction of plasmids; Antarctic Phosphatase (NEB) was used to treat digested plasmid backbones; OneTaq DNA polymerase (NEB) was used for colony PCRs; T4 DNA Ligase (NEB) was used in ligation reactions; HiFi DNA Assembly MasterMix (NEB) was used for Gibson assemblies; and all restriction enzymes were also purchased from NEB. All enzymes were used according to the manufacturer’s instructions with the supplied buffers. PCRs also included 5% DMSO.
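As a quick sanity check on the Gibson assembly primer design (not a step from the protocol itself), the region shared between a backbone primer and an insert primer can be confirmed computationally. The sketch below uses the sequences of prFM1167 and prFM1168 (pSMV3 backbone) and prFM1152 and prFM1155 (ΔSO1479 flanking regions) as listed in Table 6.7; the helper names `reverse_complement`, `longest_shared_region`, and `backbone_insert_overlap` are illustrative only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def longest_shared_region(a: str, b: str) -> str:
    """Return the longest substring common to both sequences (simple DP)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Primer sequences (5'-3') copied from Table 6.7.
PRIMERS = {
    "prFM1167": "ACTAGTGGATCCCCCGG",      # pSMV3_bb_rev
    "prFM1168": "GAGCTCCAGCTTTTGTTCCC",   # pSMV3_bb_fw
    "prFM1152": "CCCGGGGGATCCACTAGTGCTACAATAGGGGTAAAG",     # 1479US_fw
    "prFM1155": "GAACAAAAGCTGGAGCTCAAATCAGTTGATTATAATGCT",  # 1479DS_rev
}


def backbone_insert_overlap(backbone_primer: str, insert_primer: str) -> str:
    """Region of the insert primer that matches the amplified backbone end.

    The backbone primer anneals to the opposite strand, so it is
    reverse-complemented before comparison with the insert primer.
    """
    return longest_shared_region(
        reverse_complement(PRIMERS[backbone_primer]), PRIMERS[insert_primer]
    )


if __name__ == "__main__":
    for bb, ins in (("prFM1167", "prFM1152"), ("prFM1168", "prFM1155")):
        overlap = backbone_insert_overlap(bb, ins)
        print(f"{bb} / {ins}: {len(overlap)} nt shared ({overlap})")
```

For these two primer pairs the shared regions are 17 and 18 nt long, respectively. The check only compares primer sequences with each other; it does not confirm that either primer matches the plasmid or the genome.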
Table 6.6 Plasmids used/created in this study
ID        Name                         Primers (prFM) used to generate
pMF015    pBBAD18K                     n/a (1216+1217 to amplify for Gibson assembly)
pMF016    pBBR1MCS2                    n/a (1207+1208 to amplify for Gibson assembly)
pMF024    pSMV3                        n/a (1168+1167 to amplify for Gibson assembly)
pMF1223   ΔSO1478_pSMV3                1218+1219 (upstream); 1220+1221 (downstream)
pMF1250   ΔSO1479_pSMV3_new            1152+1196 (upstream); 1197+1155 (downstream)
pMF1225   ΔSO1480_pSMV3                1226+1227 (upstream); 1228+1229 (downstream)
pMF1226   ΔSO1481_pSMV3                1156+1157 (upstream); 1158+1159 (downstream)
pMF1227   ΔSO1478-79-80-81_pSMV3       1160+1161 (upstream); 1162+1159 (downstream)
pMF1251   SO1478_pBBR1MCS2             1198+1200
pMF1252   SO1479_pBBR1MCS2             1201+1202
pMF1253   SO1480_pBBR1MCS2             1203+1204
pMF1254   SO1481_pBBR1MCS2             1205+1199
pMF1255   SO1478-79-80-81_pBBR1MCS2    1198+1199
pMF1274   SO1478-79_pBBAD18K           1209+1211
pMF1275   SO1478-79-80-81_pBBAD18K     1209+1210*
*Note: this plasmid is in progress and was not used in any experiments.

Table 6.7 Primers used in this study
ID          Description              Sequence (5’-3’)
prFM1140    SonB_gDNA_fw             TTGAAGTTTTTTAGTGTTTTCATTTTGGCAA
prFM1141    SonC_gDNA_rev            TTAACTCACATTCTCCCTGTCGC
prFM1142    Son_pBAD_fw              CTAACAGGAGGAATTAACATGGGATCACTCGTCTGT
prFM1143    Son_pbad_rv              TACCAGCTGCAGATCTTAACTCACATTCTCCCTGTC
prFM1144    pBAD_seq_fw              CCTACCTGACGCTTTTTATCGCAA
prFM1145    pBAD_seq_rev             GCGTTCTGATTTAATCTGTATCAGGCT
prFM1150    SonCluster_screen_fw     GTGCGCCAAAGCAATATGGTGAGTT
prFM1151    SonCluster_screen_rev    GCGCTATGACTTCCAAATCGGCAAT
prFM1152    1479US_fw                CCCGGGGGATCCACTAGTGCTACAATAGGGGTAAAG
prFM1155    1479DS_rev               GAACAAAAGCTGGAGCTCAAATCAGTTGATTATAATGCT
prFM1156    1481US_fw                CCCGGGGGATCCACTAGTTTGATGAGTCGAGGATGA
prFM1157    1481US_rev               TTATTTATTTTAGAATATCTCGAGCAGGCTCGACTCTTCCAT
prFM1158    1481DS_fw                ATGGAAGAGTCGAGCCTGCTCGAGATATTCTAAAATAAATAAGAGAGC
prFM1159    1481DS_rev               GAACAAAAGCTGGAGCTCGTCAGCGCTTGGGGCTTA
prFM1160    1478US_fw                CCCGGGGGATCCACTAGTACAAAAAGCGCCATTGGC
prFM1161    1478US_cluster_rev       TTATTTATTTTAGAATATCTCGAGCACACAGACGAGTGATCC
prFM1162    1481DS_cluster_fw        GGATCACTCGTCTGTGTGCTCGAGATATTCTAAAATAAATAAGAGAGCC
prFM1163    ΔSO1478_mid_seq_fw       GCGGCCATCATACCCAAGCA
prFM1164    ΔSO1479_mid_seq_fw       GGCGAAGGCCGAAGGGTTTT
prFM1165    ΔSO1480_mid_seq_fw       CCGCGTATCGAGCGTTTA
prFM1166    ΔSO1481_mid_seq_fw       GGACGTTGCCGAACAATGCCG
prFM1167    pSMV3_bb_rev             ACTAGTGGATCCCCCGG
prFM1168    pSMV3_bb_fw              GAGCTCCAGCTTTTGTTCCC
prFM1179    ΔSO1479_seq_rv           CACGCGCCTGCTCATCGG
prFM1180    ΔSO1478_US_seq_rev       GCGAGCATTACCCATAAAGAAC
prFM1181    ΔSO1478_DS_seq_fw        CGTGCGTCTTACTTACGTTTA
prFM1182    ΔSO1479_US_seq_rev       CGGCATGTTCAATATAGCTGC
prFM1183    ΔSO1479_DS_seq_fw        GAGCTGGAGATTAATAGCGTACA
prFM1184    ΔSO1480_US_seq_rev       CGAAAACCCTTCGGCCTTCGCC
prFM1185    ΔSO1480_DS_seq_fw        GACCTGTGTGGCTTTAATCG
prFM1186    ΔSO1481_US_seq_rev       CCGTCAGCGCATCCATTTTG
prFM1187    ΔSO1481_DS_seq_fw        GGTACGTCCTTTAGAGTGGTTA
prFM1196    SO1479USnewREV           CTTCAATTAATCACCATTACCATGTGAAATTCCAGACATGTTTTCTCCTTATTG
prFM1197    SO1479DSnewFW            CAATAAGGAGAAAACATGTCTGGAATTTCACATGGTAATGGTGATTAATTG
prFM1198    SonMT_pBB_gib_fw         TCACTAAAGGGAACAAAAGCTGGGTACTACCACTTAAGGAGAGGCATATG
prFM1199    SonC_pBB_gib_rev         GGCCGCTCTAGAACTAGTGGATCCTTAACTCACATTCTCCCTGTC
prFM1200    SonMT_pBB_gib_rev        CCGCTCTAGAACTAGTGGATCCTTATCCCAAATCTTCGGGACC
prFM1201    SonA_pBB_gib_fw          CTCACTAAAGGGAACAAAAGCTGGGTACTTAACAATAAGGAGAAAACATGTCTG
prFM1202    SonA_pBB_gib_rev         GGCCGCTCTAGAACTAGTGGATCCTTAATCACCATTACCATGTGAAATAA
prFM1203    SonB_pBB_gib_fw          TCACTAAAGGGAACAAAAGCTGGGTACTACCTTGTTATTTCACATGGTAATG
prFM1204    SonB_pBB_gib_rev         GCCGCTCTAGAACTAGTGGATCCTTAAGGCTGCCTTGCTAAC
prFM1205    SonC_pBB_gib_fw          CACTAAAGGGAACAAAAGCTGGGTACTTTTCGCCTTTCGCTAACG
prFM1206    pBBRB_seq_fw             GGCACGACAGGTTTCCCGA
prFM1207    pBBRB1_Gib_rev           GTACCCAGCTTTTGTTCCCTTTAGTGA
prFM1208    pBBRB1_Gib_FW            GGATCCACTAGTTCTAGAGCGGC
prFM1209    SO1478_pBBAD_gib_fw      CCATACCCGTTTTTTTGGGCTAGCGAAGGAGAGGCATATGGGATCA
prFM1210    SO1481_pBBAD_gib_rev     CCAAGCTTGCATGCCTGCAGGTTAACTCACATTCTCCCTGTCG
prFM1211    SO1479_pBBAD_gib_rev     AGCCAAGCTTGCATGCCTGCAGGTTAATCACCATTACCATGTGAAATAACA
prFM1216    pBBAD_gib_rev            CGCTAGCCCAAAAAAACGGG
prFM1217    pBBAD_gib_fw             CCTGCAGGCATGCAAGC
prFM1218    SO1478UF-SpeI            TAGAACTAGTACAAAAAGCGCCATTGGC
prFM1219    SO1478UR-XhoI            TAGACTCGAGCACACAGACGAGTGATCCC
prFM1220    SO1478DF-XhoI            TAGACTCGAGATTGAAGTGTGTTATTGAATCATTATTAACAATAAGG
prFM1221    SO1478DR-SacI            TAGAGAGCTCAGAGAATCTAATAAGTAAGAGATAGCAAGCTG
prFM1226    SO1480UF-SpeI            TAGAACTAGTGCGTATTAAGCCGCAGCTATATT
prFM1227    SO1480UR-SphI            TAGAGCATGCAACACTAAAAAACTTCAATTAATCACCATTACC
prFM1228    SO1480DF-SphI            TAGAGCATGCGTGGATTTGTACTCTATCTGTAACTGATTG
prFM1229    SO1480DR-SacI            TAGAGAGCTCATATCGGTTTCGAGCTGATGC
prFM1234    SO1480_gDNA_rev          TTAAGGCTGCCTTGCTAACTCGCTGGTG
prFM1235    SO1481_gDNA_rev          TTAACTCACATTCTCCCTGTCGCCACGCAT
M13F        Universal primer         GTAAAACGACGGCCAGT
M13R        Universal primer         GGAAACAGCTATGACCATG
pBAD-F      Universal primer         ATGCCATAGCATTTTTATCC
pBAD-R      Universal primer         GATTTAATCTGTATCAGG
prMRJ024    sonM forward primer      TTGGGATCACTCGTCTGTGTGGGCACT
prMRJ025    sonM reverse primer      TTATCCCAAATCTTCGGGACCGATCCCTAACTTAGC
prMRJ027    sonA reverse primer      TTAATCACCATTACCATGTGAAATAACAAGGTAAGATTGATAGCTACTATCACCA

pMF1223 (ΔSO1478_pSMV3)
Plasmid to generate in-frame deletion of sonM
Empty pSMV3 plasmid was propagated using UQ950 E. coli cells, purified, digested with SpeI-HF and SacI-HF, treated with phosphatase, and gel-extracted with a kit (NEB Monarch). Upstream DNA was amplified with primers prFM1218 and prFM1219. Downstream DNA was amplified with prFM1220 and prFM1221. The following PCR conditions were used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 54 °C for 30 s, and extension at 72 °C for 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 59.3 °C, and concluded with a final extension of 72 °C for 2 minutes. The PCR product for the upstream DNA was digested with SpeI-HF and XhoI-HF. The PCR product for the downstream DNA was digested with XhoI-HF and SacI-HF. Both PCR products were cleaned up with a kit.
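As a quick sanity check on the restriction cloning described above, the following sketch looks for each enzyme's recognition site in the four pMF1223 primers from Table 6.7. The recognition sequences (SpeI ACTAGT, XhoI CTCGAG, SacI GAGCTC) are the standard sites for these enzymes. Pairing each primer with an enzyme based on its name, and the helper name site_position, are assumptions made for illustration and are not part of the original methods.

```python
# Standard recognition sequences for the enzymes named in the text.
SITES = {"SpeI": "ACTAGT", "XhoI": "CTCGAG", "SacI": "GAGCTC"}

# Primer sequences copied from Table 6.7 (line-wrap spaces removed).
# The enzyme assigned to each primer is inferred from the primer name (assumption).
PRIMERS = {
    "prFM1218": ("SpeI", "TAGAACTAGTACAAAAAGCGCCATTGGC"),
    "prFM1219": ("XhoI", "TAGACTCGAGCACACAGACGAGTGATCCC"),
    "prFM1220": ("XhoI", "TAGACTCGAGATTGAAGTGTGTTATTGAATCATTATTAACAATAAGG"),
    "prFM1221": ("SacI", "TAGAGAGCTCAGAGAATCTAATAAGTAAGAGATAGCAAGCTG"),
}


def site_position(primer_seq, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer_seq.find(SITES[enzyme])


# Map each primer name to the position of its expected restriction site.
positions = {name: site_position(seq, enzyme) for name, (enzyme, seq) in PRIMERS.items()}

for name, pos in positions.items():
    print(name, PRIMERS[name][0], pos)
```

Under these assumptions, every primer carries its site directly after a four-nucleotide 5' TAGA overhang (index 4), which is consistent with the SpeI/XhoI and XhoI/SacI digests of the upstream and downstream products.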
All three DNA fragments were subsequently used in a ligation reaction, transformed into electrocompetent UQ950 cells, grown overnight at 37 °C on LB agar with 50 µg/mL kanamycin, and screened by colony PCR for successful ligations. Colony PCR conditions with M13F and M13R primers: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 46 °C for 30 s, and extension at 68 °C for 2 minutes and 25 s for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification with primers M13F, M13R, and prFM1163.

pMF1224 (ΔSO1479_pSMV3)
Plasmid to generate in-frame deletion of sonA
To generate the backbone, empty pSMV3 plasmid was used as template for a PCR using primers prFM1167 and prFM1168: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 66 °C for 30 s, and extension at 72 °C for 4 minutes for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR product was treated with DpnI and cleaned up using a kit (Thermo Scientific). Upstream DNA was amplified using primers prFM1152 and prFM1196. Downstream DNA was amplified using primers prFM1197 and prFM1155. The following PCR conditions were used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 52.4 °C for 30 s, and extension at 72 °C for 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 64.1 °C, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR products were cleaned up using a kit (Thermo Scientific) and used as template in an overlap extension PCR.
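Overlap extension relies on the inner primers of the two fragments being complementary to each other. The sketch below checks this for prFM1196 (upstream reverse) and prFM1197 (downstream forward) using the sequences in Table 6.7. The function names are illustrative assumptions and not part of the original workflow.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def common_suffix_length(a, b):
    """Return the number of matching characters counted from the 3' ends of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# Sequences copied from Table 6.7 (line-wrap spaces removed).
PRFM1196 = "CTTCAATTAATCACCATTACCATGTGAAATTCCAGACATGTTTTCTCCTTATTG"  # SO1479USnewREV
PRFM1197 = "CAATAAGGAGAAAACATGTCTGGAATTTCACATGGTAATGGTGATTAATTG"     # SO1479DSnewFW

# Length of the region where the two inner primers are exact reverse complements.
overlap = common_suffix_length(PRFM1196, reverse_complement(PRFM1197))
print(overlap)
```

The reverse complement of prFM1197 matches prFM1196 across all 51 nucleotides of prFM1197, so the upstream and downstream products share a 51 bp junction through which they can anneal before the outer primers are added.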
Primers prFM1152 and prFM1155 were added after the first five cycles: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 71 °C for 30 s, and extension at 72 °C for 1 minute, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR product was cleaned up using a kit (Thermo Scientific), assembled into the prepared backbone, and transformed into electrocompetent UQ950 cells, grown overnight at 37 °C on LB agar with 50 µg/mL kanamycin, and screened by colony PCR for successful assemblies. Colony PCR conditions with M13F and M13R primers: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 46 °C for 30 s, and extension at 68 °C for 2 minutes and 25 s for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification with primers M13F, M13R, and prFM1164.

pMF1225 (ΔSO1480_pSMV3)
Plasmid to generate in-frame deletion of SO1480
Empty pSMV3 plasmid was propagated using UQ950 E. coli cells, purified, digested with SpeI-HF and SacI-HF, treated with phosphatase, and gel-extracted with a kit (NEB Monarch). Upstream DNA was amplified with primers prFM1226 and prFM1227. Downstream DNA was amplified with prFM1228 and prFM1229. The following PCR conditions were used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 54.5 °C for 30 s, and extension at 72 °C for 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 59.5 °C, and concluded with a final extension of 72 °C for 2 minutes. The PCR product for the upstream DNA was digested with SpeI-HF and SphI-HF. The PCR product for the downstream DNA was digested with SphI-HF and SacI-HF. Both PCR products were cleaned up with a kit.
All three DNA fragments were subsequently used in a ligation reaction, transformed into electrocompetent UQ950 cells, grown overnight at 37 °C on LB agar with 50 µg/mL kanamycin, and screened by colony PCR for successful ligations. Colony PCR conditions with M13F and M13R primers: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 46 °C for 30 s, and extension at 68 °C for 2 minutes and 25 s for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification with primers M13F, M13R, and prFM1165.

pMF1226 (ΔSO1481_pSMV3)
Plasmid to generate in-frame deletion of SO1481
To generate the backbone, empty pSMV3 plasmid was used as template for a PCR reaction using primers prFM1167 and prFM1168: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 66 °C for 30 s, and extension at 72 °C for 4 minutes for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR product was treated with DpnI and cleaned up using a kit (Thermo Scientific). Upstream DNA was amplified using primers prFM1156 and prFM1157. Downstream DNA was amplified using primers prFM1158 and prFM1159. The following PCR condition was used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 52 °C for 30 s, and extension at 72 °C for 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 70.5 °C, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR products were cleaned up using a kit (Thermo Scientific), assembled into the prepared backbone, and transformed into electrocompetent UQ950 cells, grown overnight at 37 °C on LB agar with 50 µg/mL kanamycin, and screened by colony PCR for successful assemblies.
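Assembly into the PCR-amplified backbone depends on the outer insert primers sharing sequence with the backbone ends. The sketch below, using only sequences from Table 6.7, finds the longest stretch shared between each outer pMF1226 primer and the reverse complement of the matching pSMV3 backbone primer. Treating the reverse complement of a backbone primer as a stand-in for the backbone end, and the brute-force helper longest_common_substring, are assumptions for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def longest_common_substring(a, b):
    """Return the longest contiguous string present in both a and b (brute force)."""
    best = ""
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > len(best):
                best = a[i:i + k]
    return best


# Sequences copied from Table 6.7.
PRFM1167 = "ACTAGTGGATCCCCCGG"                      # pSMV3_bb_rev
PRFM1168 = "GAGCTCCAGCTTTTGTTCCC"                   # pSMV3_bb_fw
PRFM1156 = "CCCGGGGGATCCACTAGTTTGATGAGTCGAGGATGA"   # 1481US_fw
PRFM1159 = "GAACAAAAGCTGGAGCTCGTCAGCGCTTGGGGCTTA"   # 1481DS_rev

# Homology between each outer insert primer and the corresponding backbone end.
upstream_homology = longest_common_substring(PRFM1156, reverse_complement(PRFM1167))
downstream_homology = longest_common_substring(PRFM1159, reverse_complement(PRFM1168))
print(len(upstream_homology), len(downstream_homology))
```

With these sequences the shared stretches are 17 and 18 nucleotides long and sit at the 5' ends of the insert primers, which is the arrangement expected for homology-based assembly into the amplified backbone.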
Colony PCR conditions with M13F and M13R primers: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 46 °C for 30 s, and extension at 68 °C for 2 minutes and 25 s for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification with primers M13F, M13R, and prFM1166.

pMF1227 (ΔSO1478-79-80-81_pSMV3)
Plasmid to generate in-frame deletion of the full cluster
To generate the backbone, empty pSMV3 plasmid was used as template for a PCR reaction using primers prFM1167 and prFM1168: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 66 °C for 30 s, and extension at 72 °C for 4 minutes for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR product was treated with DpnI and cleaned up using a kit (Thermo Scientific). Upstream DNA was amplified using primers prFM1160 and prFM1161. Downstream DNA was amplified using primers prFM1162 and prFM1159. The following PCR condition was used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 54.5 °C for 30 s, and extension at 72 °C for 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 69.5 °C, and concluded with a final extension of 72 °C for 2 minutes. The resulting PCR products were cleaned up using a kit (Thermo Scientific), assembled into the prepared backbone, and transformed into electrocompetent UQ950 cells, grown overnight at 37 °C on LB agar with 50 µg/mL kanamycin, and screened by colony PCR for successful assemblies.
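The two-stage annealing programs used throughout this section can be written down as data, which makes it easy to compare programs or estimate run length. The sketch below encodes the pMF1227 upstream/downstream PCR as stated above and sums the programmed hold times. The Stage class and hold_time_s function are hypothetical names, and ramping time between temperatures is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One block of identical cycles in a thermocycler program."""
    cycles: int
    anneal_c: float   # annealing temperature in °C
    denature_s: int   # seconds at 98 °C per cycle
    anneal_s: int     # seconds at the annealing temperature per cycle
    extend_s: int     # seconds at 72 °C per cycle


def hold_time_s(initial_s, stages, final_s):
    """Sum programmed hold times in seconds (ramping between steps is not included)."""
    cycling = sum(s.cycles * (s.denature_s + s.anneal_s + s.extend_s) for s in stages)
    return initial_s + cycling + final_s


# pMF1227 insert PCR: 5 cycles annealing at 54.5 °C, then 25 cycles at 69.5 °C.
pmf1227_stages = [
    Stage(cycles=5, anneal_c=54.5, denature_s=10, anneal_s=30, extend_s=30),
    Stage(cycles=25, anneal_c=69.5, denature_s=10, anneal_s=30, extend_s=30),
]

total_s = hold_time_s(initial_s=30, stages=pmf1227_stages, final_s=120)
print(total_s / 60, "minutes of programmed hold time")
```

For this program the hold times add up to 2250 s (37.5 minutes); an actual run takes longer because the instrument also needs time to change temperature.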
Colony PCR conditions with M13F and M13R primers: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 46 °C for 30 s, and extension at 68 °C for 2 minutes and 25 s for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification with primers M13F, M13R, and prFM1163.

pBBR1MCS2 plasmids: used to complement son BGC genes in the mutant strains
The pBBR1MCS2 plasmids used the same prepared backbone. Empty pBBR1MCS2 was PCR amplified using primers prFM1208 and prFM1207 in the following PCR condition: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 7 s, annealing at 68.5 °C for 20 s, and extension at 72 °C for 2 minutes and 45 s for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. The PCR product was digested with DpnI and purified with a kit (Thermo Scientific). Primers prFM1198 and prFM1200 were used to generate pMF1251 (SO1478_pBBR1MCS2) with the following PCR conditions: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 56 °C for 30 s, and extension at 72 °C for 20 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 71 °C, and concluded with a final extension of 72 °C for 2 minutes. Primers prFM1201 and prFM1202 were used to generate pMF1252 (SO1479_pBBR1MCS2) with the following PCR conditions: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 56 °C for 30 s, and extension at 72 °C for 7 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 71 °C, and concluded with a final extension of 72 °C for 2 minutes.
Primers prFM1203 and prFM1204 were used to generate pMF1253 (SO1480_pBBR1MCS2) with the following PCR conditions: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 56 °C for 30 s, and extension at 72 °C for 60 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 71 °C, and concluded with a final extension of 72 °C for 2 minutes. Primers prFM1205 and prFM1199 were used to generate pMF1254 (SO1481_pBBR1MCS2) with the same PCR conditions as pMF1253. Primers prFM1198 and prFM1199 were used to generate pMF1255 (SO1478-79-80-81_pBBR1MCS2) with the following PCR conditions: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 56 °C for 30 s, and extension at 72 °C for 2 minutes and 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 71 °C, and concluded with a final extension of 72 °C for 2 minutes. PCR products were cleaned up using a kit (Thermo Scientific), used in individual Gibson assemblies with the described backbone, and transformed into electrocompetent TOP10 cells. Transformations were spread on LB agar plates with 50 µg/mL kanamycin and allowed to grow overnight at 37 °C until colonies formed. Individual colonies were screened for successful assembly by colony PCR using primers prFM1206 and M13F with the following condition: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 52.5 °C for 30 s, and extension at 68 °C for 5 minutes for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification with primers M13F and M13R.

pBBAD18K plasmids: to homologously express the son BGC operon
The pBBAD18K plasmids used the same prepared backbone.
Empty pBBAD18K was PCR amplified using primers prFM1217 and prFM1216 in the following PCR condition: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 5 s, annealing at 56.4 °C for 10 s, and extension at 72 °C for 3 minutes and 15 s for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. The PCR product was digested with DpnI and purified with a kit (Thermo Scientific). Primers prFM1209 and prFM1211 were used to generate the insert for pMF1274 (SO1478-79_pBBAD18K). The following PCR condition was used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 5 s, annealing at 53.6 °C for 10 s, and extension at 72 °C for 40 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 70.1 °C, and concluded with a final extension of 72 °C for 2 minutes. Primers prFM1209 and prFM1210 were used to generate the insert for pMF1275 (SO1478-79-80-81_pBBAD18K). The following PCR condition was used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 53.5 °C for 20 s, and extension at 72 °C for 2 minutes and 30 s for 5 cycles, and for the remaining 25 cycles, the annealing temperature was increased to 70 °C, and concluded with a final extension of 72 °C for 2 minutes. Inserts were assembled into the prepared backbone by Gibson assembly, transformed into electrocompetent TOP10 cells, spread onto LB agar plates containing 50 µg/mL kanamycin, and incubated at 37 °C until colonies formed. Individual colonies were screened by colony PCR using primers prFM1144 and prFM1145 under the following conditions: initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 20 s, annealing at 57.3 °C for 30 s, and extension at 68 °C for 1 minute and 30 s (for pMF1274) or 5 minutes (for pMF1275) for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes.
Positive hits were used to inoculate small liquid cultures for subsequent plasmid purification and sequence verification using pBAD-F and pBAD-R primers.

6.8.2 WM3064 cells used for conjugation of S. oneidensis MR-1
Sequence-verified pSMV3, pBBR1MCS2, or pBBAD18K plasmids were transformed into electrocompetent WM3064 E. coli cells for subsequent conjugation into S. oneidensis MR-1. Transformants were grown on LB agar with 50 µg/mL kanamycin and 3 µM diaminopimelic acid (DAP) overnight at 37 °C until colonies formed. The desired strain of S. oneidensis MR-1 was streaked from a frozen glycerol stock onto LB agar and grown at 30 °C until colonies formed. A fresh colony of the plasmid-harboring WM3064 cells was then patched onto a fresh colony of S. oneidensis MR-1 on LB agar supplemented with 3 µM DAP and grown overnight at 30 °C until a lawn formed. A sterile pipette tip was swiped across the lawn and streaked onto an LB agar plate containing 50 µg/mL kanamycin. The plate was placed into the 30 °C incubator until colonies formed. These colonies may be used to inoculate small liquid cultures for the preparation of freezer glycerol stocks.

6.8.3 Generating S. oneidensis MR-1 mutants with pSMV3 plasmids
pSMV3 plasmids were conjugated into S. oneidensis MR-1 and merodiploids were saved as glycerol stocks. Sucrose selection was used to identify clones with a double-crossover event. Several merodiploid clones (either as fresh colonies or glycerol stocks) were streaked onto LB agar plates (with 15% sucrose and no salt) and allowed to grow at 30 °C until colonies formed (up to two days). Individual colonies were screened by colony PCR to verify the correct genomic location of the deletion and the PCR products were subsequently sequence-verified.
Colonies were screened by colony PCR using prFM1150 and prFM1151 (which anneal in the genome just up- and downstream of SO1478 and SO1481, respectively): initial denaturation 94 °C for 30 s, followed by denaturation at 94 °C for 30 s, annealing at 59.8 °C for 1 minute, and extension at 68 °C for 7 minutes for 30 cycles, and concluded with a final extension of 68 °C for 5 minutes. PCR products corresponding to positive hits were cleaned up and sequence-verified. ΔSO1478 was verified with primers prFM1180, prFM1181, and prFM1163. ΔSO1479 was verified with primers prFM1182, prFM1183, and prFM1164. ΔSO1480 was verified with primers prFM1184, prFM1185, and prFM1165. ΔSO1481 was verified with primers prFM1186, prFM1187, and prFM1166. ΔSO1478-79-80-81 was verified with primers prFM1180, prFM1187, and prFM1163.

6.8.4 RNA extraction and reverse transcriptase PCR (RT-PCR)
A glycerol stock of S. oneidensis MR-1 WT was streaked onto LB agar and allowed to grow at 30 °C until individual colonies formed (overnight). The next day, two colonies were picked and inoculated into 3 mL cultures of LB or SBM in 15 mL conical tubes. The small cultures were placed in a shaking incubator overnight at 30 °C (approximately 13 h). The turbid cultures were used to inoculate fresh 10 mL LB or SBM cultures (in 50 mL conical tubes) to a final OD600 of 0.005 and placed back into the shaking incubator at 30 °C until the LB culture reached an OD600 of 0.34 and the SBM culture reached an OD600 of 0.05 (4 h). Ideally, both cultures should have been allowed to reach an OD600 of ~0.6, but time constraints prevented this. At this point, 1 × 10^8 cells were harvested from each culture and RNA was extracted with the Qiagen RNeasy kit according to the manufacturer’s instructions, with the following exception: an additional off-column DNase treatment and purification step was performed.
For the RT-PCR and cDNA synthesis, SuperScript IV Reverse Transcriptase (Invitrogen) was used according to the manufacturer’s instructions. Two reactions were performed on the LB and SBM samples: one with a primer that specifically anneals to the 3’ end of SO_1478 (prMRJ025) and one using random hexamer primers. Briefly, 1 µL of 2 µM gene-specific primer (or 1 µL of 50 µM random hexamers), 1 µL 10 mM dNTP mix, 10 µL template RNA, and 1 µL RNase-free water were combined in one tube. In another tube, 4 µL 5X SSIV buffer, 1 µL 100 mM DTT, 1 µL RNaseOUT, and 1 µL SSIV reverse transcriptase were combined. The RNA-primer mix was heated at 65 °C for 2 minutes and then incubated on ice for 1 minute. The contents of the second tube were added to the RNA-primer mix, incubated at 52.5 °C for 10 minutes, and inactivated at 80 °C for 10 minutes. After the reaction, cDNA was stored at -20 °C. In subsequent PCRs, cDNA template was used at 10% of the final PCR volume. Negative controls used 25% of the final volume to ensure a true negative result.

For the cDNA generated with the specific primer, only one PCR was performed, using prMRJ024 and prMRJ025. The following conditions were used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 64.5 °C for 20 s, and extension at 72 °C for 20 s for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. The same PCR was performed on the random hexamer cDNA sample, along with an additional PCR using primers prMRJ024 and prMRJ027 to probe for the SO1478-79 transcript. The following conditions were used: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 64 °C for 20 s, and extension at 72 °C for 30 s for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes.
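The reverse transcription setup described above can be checked with simple dilution arithmetic. The sketch below adds up the listed component volumes and computes final concentrations from the stated stocks. The dictionary and function names are hypothetical, and the calculation assumes that the two tubes are combined in full with no other additions.

```python
# Component volumes in µL, as listed in the text.
PRIMER_MIX_UL = {"primer": 1, "dNTP mix": 1, "template RNA": 10, "RNase-free water": 1}
ENZYME_MIX_UL = {"5X SSIV buffer": 4, "DTT": 1, "RNaseOUT": 1, "SSIV reverse transcriptase": 1}

# Total reaction volume once the second tube is added to the RNA-primer mix.
total_ul = sum(PRIMER_MIX_UL.values()) + sum(ENZYME_MIX_UL.values())


def final_concentration(stock, volume_ul, total=total_ul):
    """Dilute a stock concentration by volume_ul / total (units follow the stock)."""
    return stock * volume_ul / total


buffer_x = final_concentration(5, 4)          # buffer strength, in X
dtt_mm = final_concentration(100, 1)          # mM
dntp_mm = final_concentration(10, 1)          # mM
gene_specific_um = final_concentration(2, 1)  # µM
hexamer_um = final_concentration(50, 1)       # µM

print(total_ul, buffer_x, dtt_mm, dntp_mm, gene_specific_um, hexamer_um)
```

Under these assumptions the reaction volume is 20 µL, the buffer ends at 1X, DTT at 5 mM, dNTPs at 0.5 mM, the gene-specific primer at 0.1 µM, and the random hexamers at 2.5 µM.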
To probe for the SO1478-79-80 transcript, the following conditions were used with primers prMRJ024 and prFM1234: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 64 °C for 30 s, and extension at 72 °C for 1 minute and 30 s for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes. To probe for the SO1478-79-80-81 transcript, the following conditions were used with primers prMRJ024 and prFM1235: initial denaturation 98 °C for 30 s, followed by denaturation at 98 °C for 10 s, annealing at 64.5 °C for 30 s, and extension at 72 °C for 2 minutes for 30 cycles, and concluded with a final extension of 72 °C for 2 minutes.

6.8.5 Growth curve (aerobic in LB)
A glycerol stock of S. oneidensis MR-1 was streaked onto LB agar and allowed to grow at 30 °C until individual colonies formed (overnight). Three colonies were used to inoculate three 2 mL LB cultures in 15 mL conical tubes and placed into the 30 °C shaker overnight. The next day, turbid cultures were used to inoculate fresh 10 mL LB cultures in loosely capped clear glass test tubes to achieve an approximate initial OD600 of 0.01. Tubes were placed back into the 30 °C shaker and OD600 readings were taken at indicated time points.

6.8.6 Motility and colony morphology experiments
Low-agar motility plates were prepared with LB or SBM for aerobic, micro-aerobic, or anaerobic conditions. For aerobic conditions, the plates were prepared normally. For micro-aerobic conditions, the media was prepared with fumarate and lactate and not degassed. For anaerobic conditions, the media was prepared with fumarate and lactate and subsequently degassed in an anaerobic chamber for at least 24 h prior to use. Glycerol stocks streaked onto LB were used in LB experiments; those streaked onto SBM were used in SBM experiments to avoid contamination with rich media.

A glycerol stock of S.
oneidensis MR-1 was streaked onto standard agar-content LB or SBM plates and incubated at 30 °C aerobically until colonies formed (overnight). We attempted two methods of inoculating the low-agar motility plates. First, the overnight colonies were directly picked using a sterile pipette tip and stabbed into the low-agar motility plate such that the tip pierced approximately halfway through the depth of the media on the plate. Alternatively, a single colony from the initial LB or SBM plate was used to inoculate a 2 mL liquid culture in the same media, which was grown in a 30 °C shaker overnight. The turbid cultures were then normalized by OD600 for all strains being tested, and 1 µL of each normalized culture was used to inoculate the motility plate. Each plate was inoculated with up to three strains: WT (positive control), ΔflgA (negative control), and the Δson BGC gene(s) strain of interest. Aerobic and micro-aerobic experiments were inoculated on the benchtop and then placed at 30 °C in an incubator. Anaerobic experiments were inoculated in the 30 °C anaerobic chamber and left in the anaerobic chamber. In all cases, plates were left upright to grow for up to 7 days.

Colony morphology experiments were conducted in the same manner as the motility assays except that instead of stabbing the cells into the agar, they were carefully pipetted onto the surface of the agar. Both soft agar and standard agar were tested. Only aerobic conditions were tested.

6.8.7 Pellicle experiments
A glycerol stock of S. oneidensis MR-1 was streaked onto standard agar-content LB plates and incubated at 30 °C aerobically until colonies formed (overnight). Three well-isolated colonies were used to inoculate 2 mL LB cultures in 15 mL conical tubes and allowed to grow at 30 °C in a shaker overnight (three biological replicates for each strain tested).
After overnight growth, turbid cultures were used to inoculate fresh LB media at a 100X dilution and placed back into the 30 °C shaking incubator until the OD600 measured ~0.7 (about 3 h). Mid-log phase cultures were then used to inoculate prepared 6- or 24-well plates with a final volume of 4 mL or 2 mL in each well, respectively. For LM or other minimal media, cells were washed twice with the appropriate minimal media prior to inoculation. Dilutions/inoculations into the plates of 500X to 5X were tested. Select wells included 0.3 mM EDTA as negative controls (EDTA prevents pellicle formation). After inoculation, plates were left undisturbed on the bench at room temperature for several days.

6.8.8 Mass spectrometric analysis
Purified protein was run on an SDS-PAGE gel, stained with Coomassie, and destained. After destaining, the gel was imaged and the appropriate band was excised using a scalpel and cut into 2 mm pieces, which were placed into a LoBind tube (Eppendorf). Gel pieces were destained with 50 mM ammonium bicarbonate (ABC) in a 50% acetonitrile (ACN) solution. Once the gel pieces were clear, they were dehydrated with 100% ACN until opaque, at which point the ACN was removed. The gel pieces were then re-hydrated with digest buffer according to the manufacturer’s instructions (digest buffer includes the Trypsin (Promega) protease) for 15 minutes on ice. If the gel pieces were no longer submerged in digest buffer, extra buffer was added to cover them, and they were subsequently incubated for at least 16 h at 37 °C. After digestion, the supernatant was transferred to a fresh LoBind tube and peptides were extracted from the gel pieces with increasing amounts of ACN (50%, 80%, 95%) and 0.3% formic acid (FA). After extraction, the peptide solution was kept at -80 °C for at least 30 minutes to inactivate the enzymes and then speed vacuum concentrated to dryness.
Peptides were then resuspended in 0.1% FA solution and purified with a C18 ZipTip (MilliporeSigma) according to the manufacturer’s instructions. After purification, samples were speed vacuum concentrated to dryness and resuspended in 20% ACN, 0.1% FA solution for analysis. Samples were loaded onto a Thermo Scientific Fusion mass spectrometer in accordance with our previously published method.115

6.8.9 Media recipes
For standard petri plates, 1.5% bacteriological agar was used (7.5 g in 500 mL media). For motility plates, 0.3% bacteriological agar was used (1.5 g in 500 mL media).

Table 6.8 Luria Broth (LB) for 500 mL
Reagent      g
LB powder    20

Table 6.9 LB plates with 15% sucrose and no salt for 500 mL
Reagent                 g
Bacto tryptone          5
Yeast extract           2.5
Sucrose                 75
Bacteriological agar    7.5

Table 6.10 LB (anaerobic) for 500 mL
Reagent                                 Amount
LB powder                               20 g
Fumaric acid ([final] 40 mM)            2.32 g
Lactic acid solution ([final] 20 mM)    1.43 mL of 7 M stock

Table 6.11 Shewanella Basal Medium (SBM) recipe for 1 L
g        mL      mM       Formula        MW        Reagent Name                              [Stock] M
0.224    -       1.29     K2HPO4         174.18    Potassium phosphate dibasic, anhydrous    -
0.224    -       1.65     KH2PO4         136.09    Potassium phosphate monobasic             -
0.460    -       7.87     NaCl           58.44     Sodium chloride                           -
0.225    -       1.70     (NH4)2SO4      132.14    Ammonium sulfate                          -
0.117    -       0.475    MgSO4*7H2O     246.47    Magnesium sulfate heptahydrate            -
2.603    -       10       (HEPES)        260.29    HEPES, sodium salt                        -
1.816    2.86    20       C3H6O3         90.8      Lactic acid                               7
4.644    80      40       C4H4O4         116.1     Fumaric Acid                              0.5
-        5       -        -              -         NB vitamin mix                            -
-        5       -        -              -         NB mineral mix                            -
0.5      -       -        -              -         0.05% casamino acids                      -

Table 6.12 DL (or NB) vitamins for 1 L
mass (g)    Component
0.002       biotin
0.002       folic acid
0.01        pyridoxine HCl
0.005       *riboflavin
0.005       thiamine
0.005       nicotinic acid
0.005       pantothenic acid
0.0001      vitamin B-12
0.005       p-aminobenzoic acid
0.005       thioctic acid

Table 6.13 Trace mineral mix for 1 L
mass (g)    Component
1.5         NTA
0.1         MnCl2*4H2O
0.5         FeSO4*7H2O
0.17        CoCl2*6H2O
0.10        ZnCl2
0.03        CuSO4*5H2O
0.005       AlK(SO4)2*12H2O
0.005       H3BO3
0.09        Na2MoO4*6H2O
0.05        NiCl2
0.02        Na2WO4*2H2O
0.10        Na2SeO4

Table 6.14 LM media and variations used in this study

Standard LM
Component name         [final]    amount for 1 L
HEPES, pH 7.3          10 mM      20 mL of 500 mM stock
NaCl                   100 mM     5.84 g
KCl                    100 mM     7.45 g
Yeast extract          0.02%      0.2 g
Peptone E (Gelatin)    0.01%      0.1 g
Lactate                15 mM      2.15 mL of 7 M stock
Water                  n/a        to 1 L

LM + Fe
Component name    [final]    amount for 1 L
HEPES, pH 7.3     10 mM      25 mL of 400 mM stock
NaCl              100 mM     5.84 g
KCl               100 mM     7.45 g
Yeast extract     0.02%      0.2 g
Peptone           0.01%      0.1 g
Lactate           15 mM      2.15 mL of 7 M stock
FeCl2             5 µM       50 µL of 100 mM stock (in 1N HCl)
Water             n/a        to 1 L

LM (NaCl only)
Component name    [final]    amount for 1 L
HEPES, pH 7.3     10 mM      25 mL of 400 mM stock
NaCl              200 mM     11.68 g
KCl               0 mM       0 g
Yeast extract     0.02%      0.2 g
Peptone           0.01%      0.1 g
Lactate           15 mM      2.15 mL of 7 M stock
Water             n/a        to 1 L

LM (KCl only)
Component name    [final]    amount for 1 L
HEPES, pH 7.3     10 mM      25 mL of 400 mM stock
NaCl              0 mM       0 g
KCl               200 mM     14.9 g
Yeast extract     0.02%      0.2 g
Peptone           0.01%      0.1 g
Lactate           15 mM      2.15 mL of 7 M stock
Water             n/a        to 1 L

6.8.10 DNA and protein sequences
Table 6.15 DNA sequences from the split borosin BGC in S.
oneidensis MR-1 Gene ID + annotation DNA sequence SO1478 sonM TTGGGATCACTCGTCTGTGTGGGCACTGGGTTACAGCTCGCGGGGCAAATTAG CGTATTAAGCCGCAGCTATATTGAACATGCCGATATTGTATTTTCACTCTTACC TGACGGTTTCTCGCAGCGTTGGTTGACGAAGCTCAACCCCAATGTCATCAATT TGCAGCAGTTTTATGCGCAAAATGGTGAAGTTAAAAATCGCCGAGACACCTA CGAGCAAATGGTCAATGCCATTCTAGATGCGGTGAGAGCGGGTAAAAAAACC GTGTGTGCACTCTACGGTCATCCGGGGGTATTTGCCTGTGTATCCCATATGGC GATAACTCGGGCGAAGGCCGAAGGGTTTTCGGCAAAGATGGAGCCGGGGATT TCGGCCGAAGCTTGCCTGTGGGCCGACTTAGGGATTGACCCCGGCAACTCGG GGCATCAAAGTTTTGAAGCTAGCCAGTTTATGTTTTTCAACCATGTGCCCGAT CCCACTACCCACTTATTACTCTGGCAAATCGCCATTGCAGGCGAACATACCTT AACCCAATTTCATACCTCGAGTGATAGGTTGCAGATCCTCGTGGAGCAGTTGA ATCAATGGTATCCCCTCGACCATGAGGTGGTCATATACGAAGCGGCCAATTTG CCAATCCAAGCCCCGCGTATCGAGCGTTTACCTTTAGCGAATTTACCCCAAGC ACACTTAATGCCGATTAGTACGTTGTTAATTCCGCCAGCAAAAAAGCTGGAGT ACAACTATGCTATTTTGGCTAAGTTAGGGATCGGTCCCGAAGATTTGGGATAA SO1479 sonA ATGTCTGGATTATCGGATTTTTTTACCCAGTTAGGCCAAGATGCGCAGTTAAT GGAAGACTATAAACAGAATCCTGAGGCGGTGATGCGTGCCCACGGATTAACT GATGAACAAATTAACGCTGTAATGACTGGGGATATGGAAAAGCTCAAAACGT TAAGTGGTGATAGTAGCTATCAATCTTACCTTGTTATTTCACATGGTAATGGT GATTAA SO1480 GGDEF domain TTGAAGTTTTTTAGTGTTTTCATTTTGGCAATCTTGAGCATACTTTGTATGCCC TTGATTGCTTCAACTACAAATTATGATGAAACGCTAACTAAGATTGAAACATT ACAACATTCTGATTTACCTGCTGCTATAAACCTAATTAAAACAATTGAATCTG AATTTGGTGCTATGTCTCGCTTACAGCAGGGACGAGTTTTGCTCTTTAAAGGT GCAGCTTCTATTTATTCAGGACAATATCAAACTGCTATTGAGTTATTGGGCCA AGCAGAAGCCTTACTCAAAGACTCGGAAATGTTATTTTCTGCATATAGTTATG AGGCAACAGCCTATATTGCTTTACGTCATTTCAATGATGCATTTATTGCAATG GGAAAGAGTCTTGGTTTAATCGAACGAATAGAGGACACAAATCTTAAACGTG CGTCTTACTTACGTTTAGCTAGCTTGTATTCGGCTATGGGGATCTCAGAGGAA GTTGCTACATATGCAACTAAAGCGCTTGCTCTTGCCTCCGAAAGTGATGTTAA AGATATTTGTGGGGCTAGGCTTTATCTATCAGTACATCAATTAGAGATAGCGT CGTATGCACAAGCATTTGATGAATTTAAGTCTACTCGTAGTTATTGTGAATCC AGTGGTTACCCACTAATTGCGAATATTGCATTAAAAGGTATGGGAGAATCGA GTCTTAGATTGAATGAGCCACAGCTTGCTATCTCTTACTTATTAGATTCTCTAA AGGGGTATGAATCATTTAATTTTGAGCTGGAGATTAATAGCGTACATGAGTTG CTCAGTGAAGCTTATTTGTTGTTGCAAGATTCTACGAAAGCTGAGTTACATGC 
TCAATATGTTATGAATCTTGTGGATGATTCTAGCAATACAGAGCTTAAGCACG GTGCATCAGGTGTACTCGCTAAGATTTATGCTCAAAAACAGCAATTTGAACAG GCCTATGAATACTCCAGAAAAGAGCAGCATTATAATCAACTGATTTTTGATGA GTCGAGGATGAAAACCCTTGCTTATCAGGCGGCTAAGTTTAATGCCGATGAGC AGGCGCGTGAAATCAACTTGCTGAATAAAGAGCGTGAACTTTACATCGCCCA GCAAATGGTAAAAGAGCGTGAGTACACTAATATGCTCATGTTTATCACCATAT TAGTGGGTGGGCTGTTTTTTCTTGCGATTTTGTTAGTGGTGGGCAATTTGCAAA AACGGCGTTTTATGCGCATGGCCAAAATGGATGCGCTGACGGGCGTGCTTAA CCGTGGTGCAGGGCAAGATCTTGGCGAAAATATGTTTGTGCAAGCCGCTGCCC 189 GAGGGGGGGATTATTGCGTGATTTTATTCGACTTAGATCATTTTAAACGGATT AATGATTCCTACGGCCATGGCACGGGTGACTGGGCGCTTAAAAAAGTCGTTG AGGTGTTAAAACCTCATATTCGTAATGGCGATGTATTTGCTCGGATTGGCGGC GAAGAGTTTGCCTTATTTTTACCCTATGCCAATGAGGCTAAGGGGCTGGACGT TGCCGAACAATGCCGCAGCCGAATAGAAGCGATTGATACTCATCTGTCGGGG CATAAATTTACCATTACCGCGAGTTTTGGGGTCAGCGGTATGACAAAAGATGA TTTAAGCTTAGATCCACTTTTGCACCGTGCTGATATGGCGCTCTATGCCGCAA AATCGAATGGTCGTAACTGCGTATCTTGTTACCAGGATGCCTTAATGTGCGAT AAGCGAGTCACCAATCTCACCAGCGAGTTAGCAAGGCAGCCTTAA SO1481 kefC-like ATGGAAGAGTCGAGCCTGCTCACTTCTGTACTGTTGTTTTTATTGGCCGCGGT AATTTTTGTGCCCTTGGGCAAACGTTTTGGCGCAGGACCGATTCTGTCTTATCT CGCTGCTGGGGTGATTTTAGGCCCAGGAGGTATGGCATTGGTATCCGATCCTG CGGCCGTGCTGCATTTTGCCGAGCTTGGCGTGGTACTCATGTTGTTTGTACTTG GGCTTGAGCTTAATCCGAGCAAACTGTGGGAACTGCGCAGCGCTATCTTTGGT CTCGGTAGTGGTCAGTTACTCCTTTCTTGGGCGGCAATTGGTGGCTTAGCTTG GGGTTTTGGCTTATCTATTGAGGCGGCACTTGTTGTGGGGGCGGCATTATCGC TGTCATCAACCGCTTTTGCAGTGCAGTTGATGAGTGAACATCGACTGCTAACA ACGCCTCTTGGCCGCGATGCCTTTGGGGTATTGTTGATGCAAGATCTCGCGGT GATCCCCATGTTGCTGTTAACCGCTTACCTTGCGCCCAATTCGGCCCAGATTG AACACCATGCAGTGCCTTGGTATTGGACCTGTGTGGCTTTAATCGGTTTTGTGT TGGTGGGCAAATATCTGTTACCGCGGGTACTTAAACTTGTCGCCAGCAGTGGC GTTCGTGAGGTGTTAACTGCCTTTGCGCTATTGTTGGTGATGGGCAGTGCCCA ATTGATGGAGTGGCTGGGGTTGTCTGCGGGGATGGGCGCATTTTTAGCAGGG ATTATGCTGGCGAACTCCAGCTACCGGCATCAGCTCGAAACCGATATAGAGC CTTTTAAGGGCTTACTGCTTGGGCTGTTTTTTATGGCCGTGGGCATGAGTATGG ATCTCAAACTGTTTTTGACCGATCCCTTACTCATCCTTGCGATTTTGCTGGGTA TGTTACTGATTAAGACCCTAGTGCTGATGCTGCTTGGCCGAGTGCGCCACCAT 
ACATGGCGCCCGAGTATTGCACTTGGGCTTATCTTGGCCGAGGGCGGTGAGTT TGCCTTTGTGCTGTTATCGCAGGCGCAATTATCCAGCATAGTGGATGATAAGA TTGCCCAAATCTTAGTGCTCGCCATTGGTTTATCCATGGCGGTGACGCCGATG ATTTTCACACTATTTAGAGCCACTAAGCCTAAGGTGGTGGATACGCGCTTGCC CGACACCATCAATGTCACTGAGTCTGAGGTAGTGATTGCCGGATTTGGTCGGG TAGGGCAGATCACGGGACGGATTTTAGCCTCCTCTGGTATTCCATTTGTGGCG TTAGATAAGGATGCCAGCCATGTGGATGTGATCCGTCAATATGGTGGTGAGGT CTATTTTGGCGATGCTAGACGTTTAGATATGTTGATGTCGGCGGGGATTGCGC GCTCGCGGTTATTATTACTGGCTGTTGATAGTGTTGAAGATTCGATTGAAATT GCCCAGCAGGTAAAAACCCATTTTCCCCATATTAATATCATTGCGCGGGCGCG GGATCGTAACCATGCTTACCGATTAATGAGTCTTGGGGTGACTGATGTGTTCC GCGAAACCTTTGGTTCGGCCTTATCAGCCAGTGAGAAGATATTACAGGGCTTA GGTTTATCCCAAGTGCAGGCCAATGAACGGGTGAAGATTTTTGCTGAGCACG ATAAAAAGCTGGTGATAGCCAGTGCCGCTCATCAAAATGATTTGGCAAAACT TATTGATTTATCAAATAAAGGTAAAGCTGAGTTGGAGTCTTTGATGCGTGGCG ACAGGGAGAATGTGAGTTAA SO1482 tonB-like ATGCCAGTATCACAGCCTATATTTCGCTTATCATTAATCACACTCGCCTGTTTC AGCGCTTTAGCGCAAAACGTCTACGCCGAAAATACATCCACTGCACCCGACA CCAACGTCGAGCGTATAACGGTTTATGGCAAACAAAACTCAGTGGTGAAGAA CTCAGGACTCGCGACTAAGTCAGATATGTCTTTGATGGAAACACCCGCCGCCG TGGTGGTAGTAGACCAAGAACTGATCAGTGCTCAAGGGGTCGATAATCTCCA AGATTTAATTCGCAATATCAGCGGAGTGACTCAAGCGGGTAATAATTACGGC ATAGGTGATAACTTAGTTATCCGTGGACTTGGTGCAAACTACACCTTTGATGG TATGTATGGCGGTGCAGGCCTAGGCAACACCTTTAACCCAACACGTTCTTTAA 190 CCAATGTAGAATCCGTTGAAGTGTTAAAAGGCCCCGCAACAGGCCTATACGG TATGGGCAGCGCAGGCGGCGTTATCAACCTAATTGAAAAGAAACCACAATTT GAGTCCAAGCACAAAATCACCACTGAAGTTGGTCAATGGGACACCTACTCAC TCGCCATCGACAGTACGGGTGGAATAACCGATGATGTAGCTTATCGTTTAGTG GCCAAAACCGCCCGCAGTGAGGGCTATCGTGATTTAGGTGCTGACCGTGATG AGGTTTTCGGCTCACTAAAATGGGTATTAAGTGATAGCCAAGATGTGATGCTG TCGGGCGCGTATATCAAAGACGCCATTGCCGTTGACTCTATTGGCCATCCTAT CCGTATATATAACGCCGATTCTGTCGGCGGTAAAACCGCAGGCGAAGTGACTT GGCAAGATTTGATTAACGATCCAAATGGTCAAGGTATACAACTGACCGACGA GCAACGTCAGCAATTAGCAGCATCACTGGCCAGTGGTGATGGCTTAACCCCCT ATGCTTTTGGTGATGCAGGATTAATTTCCCCCATGGCCAAAGATAATGAAGGC GAAGAATTACGCTTCAAGCTGACCCACAATATCTACTTTACCGATAATCTGTT CCTCAACCAGCAATTGCAATATCGTGACTACACCACAGGTTTTGCCCGTCAAA 
CTGGCGCTTACAACTATGTGTACTGGAATAATAAAGGCAAGATAAACGCAGA TCCCCGCGCCCCACTCGTTGAAAATGGCGTGCTCTATCCCTTTGCTGCAAGAC GTCAGGAATACCGTAAACTCGATGCAGAAGAAACCTCATGGCAGTATTTTGC CGACCTGCGCTATGACTTCCAAATCGGCAATATCGATAATGAGCTTTTGGTAA ATGCTAACTACGAAGATCGCACGATTCGACTAGAACAATTCTCGATTTACGAT GCGGATCAAGTCATTAAAGACAAGAAAAACAACGTCATCTACCGTGGTTCGC TGCCCTATATTTATGATATTCGCAATCCTAATTGGGGAACGGGTAAATTTGAG GACTACGATCCACTCAAAACCGCCAACTACAATAAAAAAGTCAGCGCTTGGG GCTTAGGCGTGCAGCATGTGGGTTATTTAGGTTATGGCTTTACCACTCGTGTT GGCGTGGCCTTTAACGAAATCAAACAAAGCTATGAGCATTTAGGTCTCGATGC GCGTTATAGCGCAGGTCAAGCGAGCCCAACTCCTGAAGCCGATAGCAAAGAT AACGGTATTACCTATAACTTGGGCTTAACCTATATGCCCATCGACGATTTATC GTTTTTTGTTAACCACTCTAAAGGACGTACCGCCTATAGCATTTTAGGCTCAAT CACAGGGAAAAATACCGACCGCGAAGACTCAGAATCTGTCAGTAATGATCTC GGCATGCGCTTTAAAGCCTTTGATGATCAGATGCTGGCTTCGCTCGTCTTCTTT AAAAGCTCACGGACTAATGTCCCTTACAACAACCCAGACTATAACGCCGGCG TGTCAACTGCCGATGTACCAGTGTATTTTTACGATGGCAGTGAAGATACCCAA GGGGTTGAGCTGGATCTCAATGCCCACCTAAATGAGCAATGGCGCGTTAACCT TAATGGCATGTATCAAGACGCAAGGGATAAACAAAACCCGAATGATAAAGCC AACTATGACAGCCGTCAAAAAGGGGTACCTTATGTCACCGCCAGTGCTTGGGT AACCTATGGCGCAGACTGGTTTGCCTTATCAAGCCCAATTGAGCTGAGTTTAG GCGCCAAATATGTGGATGAAAGAAGCACTCACTCAAAAGATTTTGGCATCCC TGATGGCTATGTCCCTAGTTATACTCTGGTAGATTCAGCAGTCAGTTACGCTA CCGACTCCTGGAAGTTACAATTGAATATCAACAACCTATTCAATAAAGACTAT TACAGCAAAGCCATGTTCCTCGGCGGTATGCCAGGCGAAGAGCGCAATGTGA AACTGCAATATAGCTACAGTTTCTAA SO1483 aceB ATGACGGAACACACTTTAAGTGAGCAACAAGTAAATTTGACGCTGAATAAAG CCACTGCGAATGGCACTCTGGCTCTGGTGGGAAATACCATTCCTGGGCAAGA GGTGATTTTTACCGAAGGTGCAATGGCGTTGCTTGAATCACTTTGTCGTGAAT TTGGGGCTGAAGTGCCAACCTTACTCGCCAAGCGTAAAGATAGACAAGCGCG TATCGATAAAGGTGCTTTACCTGACTTTTTACCTGAAACTCGTGCGATTCGTG ATGGCGCTTGGAAGATCCGCGGTATCCCGAATGACCTGCTTGATCGCCGCGTT GAAATTACCGGCCCCGTTGAACGTAAGATGGTAATCAATGCGCTCAATGCCA ATGCCAAAGTGTTTATGGCTGATTTTGAAGACTCTTTGGCACCAAGCTGGCAA AAAGTGGTTGAAGGCCAAATTAACCTGCGTGATGCAGTACGCGGAGAGATTG AATACACGGCCCCAGAAACCGGTAAGCACTATAAGTTAGGCCCTAATCCTGC GGTATTGATCTGCCGTGTACGTGGCCTGCATTTAAAAGAGAAGCACGTTGAAT 
TTAACCAGCAGTCTATCCCTGGAGGTTTGTTCGATTTTGCGATGTACTTTTACC 191 ATAACTATCGTCAATTGCTGGCGAAGGGCAGTGGTCCTTACTTCTATATTCCA AAACTTGAGAGTCATATCGAGGCGCGCTGGTGGGCAAAAGTGTTTGCTTTTGT TGAGGAAAGATTCTGTCTTCAAGCGGGTACTATCAAATGTACTTGCTTGATTG AGACGCTGCCTGCGGTGTTTGAAATGGATGAAATTCTTTATGAGTTGCGCTCC AACATTGTCGCACTCAACTGTGGCCGTTGGGATTATATCTTCAGCTATATCAA AACGTTAAAACGTCATGGCGATCGTGTGTTACCGGACCGCCAAGCGGTGACT ATGGATACGCCTTTTTTAAGTGCTTACTCCAGACTGTTGATCAAAACCTGCCA TAAACGTGGCGCGTTAGCGATGGGCGGCATGGCTGCCTTTATTCCAGCAAAA GATCCTGCGCAGAACGAAGCTGTGTTGCAGCGGGTTCGAAAGGATAAAGAGC TCGAAGCCCGTAATGGCCACGATGGTACTTGGGTTGCGCATCCCGGTCTGGCG GATACGGCCATGGGGATTTTTAACGAGTATATCGGCCAAGACCATCAAAACC AATTACACATTACCCGCGATGTAGACGCTCCGATTTTAGCCGCTGAGTTATTA AAAACCTGCGATGGCGAGCGCACCGAGCAAGGCATGCGCCTAAATATTCGCA TCGCGCTGCAATATCTGGAGGCATGGATCAGTGGCAACGGTTGTGTGCCGATT TACGGATTAATGGAAGATGCGGCAACCGCTGAAATCTCCCGCGCCTCGATTTG GCAATGGATCCAACATGGTAAGTCACTCTCCAATGGCAAACTTGTTACTAAAC AATTGTTTAAAGACATGCTGGTAGAAGAGTTGGCTAATGTGAAAAAAGAAGT GGGCAGCGACAGATTTACCCACGGCAAATTTACCCAAGCAGCGGTATTGCTT GAGGATATTACCACTTCTGATGAGTTGGTCGACTTCTTAACCTTACCCGGTTA CGAGATGCTAACTGCTTAA SO1484 aceA ATGACTAAGGCAACACAGACTTCACGTCAGGCGCAGATTGACGCGATCAAAA AAGATTGGGCAGAGAATCCACGTTGGAAAAACGTCCGTCGTCCATACACTGC AGAAGAAGTTGTGGCACTTCGTGGTTCAATCGTACCCGAAAACACCATTGCCA AGCGTGGTGCAGCTAAATTGTGGGATCTCGTTAACGGTGGCTCGAAGAAAGG TTATGTGAACTCGTTAGGCGCGCTGACTGGCGGCCAAGCGGTACAGCAAGCT AAAGCGGGTATTGAAGCCATTTATCTGTCTGGTTGGCAAGTGGCGGCCGACGC TAACTTAGCTGGCACCATGTACCCAGATCAATCCTTATACCCAGCAAACTCAG TACCTGCTGTCGTATCACGTATTAATAACTCCTTCCGCCGCGCTGACCAAATC CAGTGGAGCAACGGTGTCAATCCTGAAGATGAAAACTTTGTCGATTATTTCCT GCCGATTATTGCCGATGCGGAAGCAGGTTTTGGTGGCGTACTGAATGCGTTCG AGTTAATGAAGTCGATGATCGACGCAGGCGCCGCTGGTGTGCACTTTGAGGA CCAATTAGCCTCAGTGAAAAAGTGCGGTCATATGGGCGGTAAAGTATTAGTA CCAACCCAAGAAGCGGTACAAAAATTGGTTGCAGCGCGCCTTGCTGCTGACG TGAGCGGTGTTGAAACCTTAGTGATTGCCCGTACCGATGCGAACGCGGCGGA TCTGCTGACCTCTGATTGCGACCCATATGACCGTGATTTTGTCACTGGCGAGC GTACCAACGAAGGTTTCTATCGCGTTAACGCAGGTCTCGACCAAGCAATTTCT 
CGCGGTCTCGCTTACGCCCCTTATGCAGATTTAATTTGGTGTGAAACTGCTAA GCCAGATTTAGAAGAAGCGCGCCGTTTTGCGGAAGCTATCCATGCTCAGTACC CAGATCAATTACTGGCCTATAACTGCTCACCTTCGTTCAACTGGAAGAAAAAC CTGGACGACGCCACGATTGCACGCTTCCAACAAGCGCTGTCAGACATGGGCT ACAAGTACCAGTTCATCACTTTAGCGGGCATCCATAACATGTGGTACAACATG TTTGACCTGGCTTACGATTATGCTCGTGGTGAAGGTATGAAGCATTATGTTGA GAAAGTTCAAGAAGTTGAGTTTGCGGCAGCGAAGAAAGGTTACACCTTCGTG GCGCATCAACAGGAAGTGGGCACAGGTTATTTCGACCAAGTGACTACGGTTA TCCAAGGCGGCCATTCATCAGTGACTGCACTGACGGGCTCTACCGAAGAAGA GCAGTTTTAA 192 Table 6.16 Protein sequences of the split borosin BGC in S. oneidensis MR-1 Accession + annotation Protein sequence WP_011071665 SonM LGSLVCVGTGLQLAGQISVLSRSYIEHADIVFSLLPDGFSQRWLTKLNPNVINL QQFYAQNGEVKNRRDTYEQMVNAILDAVRAGKKTVCALYGHPGVFACVSH MAITRAKAEGFSAKMEPGISAEACLWADLGIDPGNSGHQSFEASQFMFFNHV PDPTTHLLLWQIAIAGEHTLTQFHTSSDRLQILVEQLNQWYPLDHEVVIYEAA NLPIQAPRIERLPLANLPQAHLMPISTLLIPPAKKLEYNYAILAKLGIGPEDLG* WP_011071666 SonA MSGLSDFFTQLGQDAQLMEDYKQNPEAVMRAHGLTDEQINAVMTGDMEKL KTLSGDSSYQSYLVISHGNGD* WP_011071667 GGDEF protein LKFFSVFILAILSILCMPLIASTTNYDETLTKIETLQHSDLPAAINLIKTIESEFGA MSRLQQGRVLLFKGAASIYSGQYQTAIELLGQAEALLKDSEMLFSAYSYEAT AYIALRHFNDAFIAMGKSLGLIERIEDTNLKRASYLRLASLYSAMGISEEVAT YATKALALASESDVKDICGARLYLSVHQLEIASYAQAFDEFKSTRSYCESSGY PLIANIALKGMGESSLRLNEPQLAISYLLDSLKGYESFNFELEINSVHELLSEAY LLLQDSTKAELHAQYVMNLVDDSSNTELKHGASGVLAKIYAQKQQFEQAYE YSRKEQHYNQLIFDESRMKTLAYQAAKFNADEQAREINLLNKERELYIAQQ MVKEREYTNMLMFITILVGGLFFLAILLVVGNLQKRRFMRMAKMDALTGVL NRGAGQDLGENMFVQAAARGGDYCVILFDLDHFKRINDSYGHGTGDWALK KVVEVLKPHIRNGDVFARIGGEEFALFLPYANEAKGLDVAEQCRSRIEAIDTH LSGHKFTITASFGVSGMTKDDLSLDPLLHRADMALYAAKSNGRNCVSCYQD ALMCDKRVTNLTSELARQP* WP_011071668 KefC-like MEESSLLTSVLLFLLAAVIFVPLGKRFGAGPILSYLAAGVILGPGGMALVSDP AAVLHFAELGVVLMLFVLGLELNPSKLWELRSAIFGLGSGQLLLSWAAIGGL AWGFGLSIEAALVVGAALSLSSTAFAVQLMSEHRLLTTPLGRDAFGVLLMQD LAVIPMLLLTAYLAPNSAQIEHHAVPWYWTCVALIGFVLVGKYLLPRVLKLV ASSGVREVLTAFALLLVMGSAQLMEWLGLSAGMGAFLAGIMLANSSYRHQL ETDIEPFKGLLLGLFFMAVGMSMDLKLFLTDPLLILAILLGMLLIKTLVLMLL 
GRVRHHTWRPSIALGLILAEGGEFAFVLLSQAQLSSIVDDKIAQILVLAIGLSM AVTPMIFTLFRATKPKVVDTRLPDTINVTESEVVIAGFGRVGQITGRILASSGIP FVALDKDASHVDVIRQYGGEVYFGDARRLDMLMSAGIARSRLLLLAVDSVE DSIEIAQQVKTHFPHINIIARARDRNHAYRLMSLGVTDVFRETFGSALSASEKI LQGLGLSQVQANERVKIFAEHDKKLVIASAAHQNDLAKLIDLSNKGKAELES LMRGDRENVS* WP_011071669 TonB-like MPVSQPIFRLSLITLACFSALAQNVYAENTSTAPDTNVERITVYGKQNSVVKN SGLATKSDMSLMETPAAVVVVDQELISAQGVDNLQDLIRNISGVTQAGNNY GIGDNLVIRGLGANYTFDGMYGGAGLGNTFNPTRSLTNVESVEVLKGPATGL YGMGSAGGVINLIEKKPQFESKHKITTEVGQWDTYSLAIDSTGGITDDVAYRL VAKTARSEGYRDLGADRDEVFGSLKWVLSDSQDVMLSGAYIKDAIAVDSIG HPIRIYNADSVGGKTAGEVTWQDLINDPNGQGIQLTDEQRQQLAASLASGDG LTPYAFGDAGLISPMAKDNEGEELRFKLTHNIYFTDNLFLNQQLQYRDYTTG FARQTGAYNYVYWNNKGKINADPRAPLVENGVLYPFAARRQEYRKLDAEE TSWQYFADLRYDFQIGNIDNELLVNANYEDRTIRLEQFSIYDADQVIKDKKN NVIYRGSLPYIYDIRNPNWGTGKFEDYDPLKTANYNKKVSAWGLGVQHVGY LGYGFTTRVGVAFNEIKQSYEHLGLDARYSAGQASPTPEADSKDNGITYNLG LTYMPIDDLSFFVNHSKGRTAYSILGSITGKNTDREDSESVSNDLGMRFKAFD DQMLASLVFFKSSRTNVPYNNPDYNAGVSTADVPVYFYDGSEDTQGVELDL NAHLNEQWRVNLNGMYQDARDKQNPNDKANYDSRQKGVPYVTASAWVT YGADWFALSSPIELSLGAKYVDERSTHSKDFGIPDGYVPSYTLVDSAVSYAT DSWKLQLNINNLFNKDYYSKAMFLGGMPGEERNVKLQYSYSF* 193 WP_011071670 AceB MTEHTLSEQQVNLTLNKATANGTLALVGNTIPGQEVIFTEGAMALLESLCRE FGAEVPTLLAKRKDRQARIDKGALPDFLPETRAIRDGAWKIRGIPNDLLDRRV EITGPVERKMVINALNANAKVFMADFEDSLAPSWQKVVEGQINLRDAVRGEI EYTAPETGKHYKLGPNPAVLICRVRGLHLKEKHVEFNQQSIPGGLFDFAMYF YHNYRQLLAKGSGPYFYIPKLESHIEARWWAKVFAFVEERFCLQAGTIKCTC LIETLPAVFEMDEILYELRSNIVALNCGRWDYIFSYIKTLKRHGDRVLPDRQA VTMDTPFLSAYSRLLIKTCHKRGALAMGGMAAFIPAKDPAQNEAVLQRVRK DKELEARNGHDGTWVAHPGLADTAMGIFNEYIGQDHQNQLHITRDVDAPIL AAELLKTCDGERTEQGMRLNIRIALQYLEAWISGNGCVPIYGLMEDAATAEIS RASIWQWIQHGKSLSNGKLVTKQLFKDMLVEELANVKKEVGSDRFTHGKFT QAAVLLEDITTSDELVDFLTLPGYEMLTA* WP_011071671 AceA MTKATQTSRQAQIDAIKKDWAENPRWKNVRRPYTAEEVVALRGSIVPENTIA KRGAAKLWDLVNGGSKKGYVNSLGALTGGQAVQQAKAGIEAIYLSGWQVA ADANLAGTMYPDQSLYPANSVPAVVSRINNSFRRADQIQWSNGVNPEDENF VDYFLPIIADAEAGFGGVLNAFELMKSMIDAGAAGVHFEDQLASVKKCGHM GGKVLVPTQEAVQKLVAARLAADVSGVETLVIARTDANAADLLTSDCDPYD 
RDFVTGERTNEGFYRVNAGLDQAISRGLAYAPYADLIWCETAKPDLEEARRF AEAIHAQYPDQLLAYNCSPSFNWKKNLDDATIARFQQALSDMGYKYQFITLA GIHNMWYNMFDLAYDYARGEGMKHYVEKVQEVEFAAAKKGYTFVAHQQE VGTGYFDQVTTVIQGGHSSVTALTGSTEEEQF*

7 Concluding remarks

7.1 Natural product research with synthetic biology tools

Natural product research has benefited from improved DNA sequencing technologies. In the late 1980s, after the discovery of PCR,195 penicillin was classified as a non-ribosomal peptide (NRP) when the gene coding for the NRP synthetase (NRPS) responsible for part of its biosynthesis was uncovered.196,197 The connection between the final, bioactive natural product molecule and the genes responsible for its biosynthesis gets at the heart of modern natural product research. Once identified, putative genes or gene clusters can be cloned out of the native organism or synthesized for heterologous expression in a variety of lab-suitable and tractable hosts, allowing for the indirect study of natural product biosynthesis from clusters originating in uncultivated, intractable organisms or from cryptic/silent BGCs.1 To address the challenges of studying biological processes in non-native hosts, natural product research has grown together with the field of synthetic biology.1,198 Just as understanding the regulation of penicillin production in filamentous fungi aids in manipulating the organism for increased production titers,199 understanding how individual genes in a known or putative BGC interact with each other is critical for using heterologous expression hosts effectively.198 Finding optimal relative expression levels of genes through high-throughput refactoring of BGCs,200,201 providing sufficient metabolic precursor flux through the pathway,202 and ensuring the presence of appropriate cofactors for active enzymes65 are just a few of the challenges that arise in the heterologous study of natural product biosynthesis.
Despite these challenges, the need for new antibiotic scaffolds and other therapeutics, together with simple curiosity, continues to motivate the growth of these fields of study. Our work has made extensive use of a gene-centric (bottom-up) investigational approach, one that relies heavily upon public sequence databases and synthetic biology tools. Happily, many of the proteins discussed in this thesis were amenable to heterologous expression, purification, and in vitro studies and required minimal optimization for production. In the Shewanella oneidensis MR-1 split borosin system, we have not yet identified the protease responsible for removing the leader moiety from the precursor peptide, SonA. Characterization of this unknown protein may require more extensive use of synthetic biology tools so that the biosynthesis of the final RiPP natural product can be reconstituted in vitro or in vivo (heterologously).

7.2 Modularity in RiPP biosynthesis to expand the accessible chemical diversity through protein and peptide engineering

The modularity of RiPP biosynthesis is primarily centered on the leader moiety of the precursor peptide. This portion of the precursor acts as a handle, recruiting BGC enzymes that may otherwise be slow and promiscuous to the core peptide sequence. This two-part peptide thus allows for engineering from two directions: 1) direct mutagenesis of the core peptide sequence to alter the amino acid sequence of the final natural product and 2) engineering known recognition sequences within the leader peptide to recruit alternative, promiscuous modifying enzymes to the core peptide (i.e., combinatorial biosynthesis). These two approaches can be used individually, together, or alongside other engineering methods.
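The two engineering directions described above can be illustrated with a minimal Python sketch. It is not taken from this work: the `Precursor` class, the function names, and the placeholder sequences are all hypothetical, and the sketch treats a precursor simply as a leader string followed by a core string. Direction 1 is shown as site-saturation mutagenesis at one core position, and direction 2 as rebuilding the leader from a list of recognition sequences.

```python
from dataclasses import dataclass, replace

# The 20 canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class Precursor:
    """Hypothetical two-part model of a RiPP precursor peptide."""
    leader: str  # handle that recruits the modifying enzymes
    core: str    # segment that templates the final natural product

    @property
    def sequence(self) -> str:
        return self.leader + self.core


def saturate_core_position(precursor, position):
    """Direction 1: all single-residue substitutions at one core position."""
    original = precursor.core[position]
    variants = []
    for residue in AMINO_ACIDS:
        if residue == original:
            continue  # skip the wild-type residue
        new_core = (precursor.core[:position] + residue
                    + precursor.core[position + 1:])
        variants.append(replace(precursor, core=new_core))
    return variants


def with_chimeric_leader(precursor, recognition_sequences):
    """Direction 2: rebuild the leader from a list of recognition sequences."""
    return replace(precursor, leader="".join(recognition_sequences))


if __name__ == "__main__":
    # Placeholder sequences for illustration only, not real peptides.
    wild_type = Precursor(leader="MKTLEELS", core="ACGTV")
    library = saturate_core_position(wild_type, 2)
    print(len(library), "single-site core variants")
    chimera = with_chimeric_leader(wild_type, ["AAAA", "CCCC"])
    print(chimera.sequence)
```

Keeping the leader and core as separate fields mirrors the modularity argument: either part can be varied while the other is held constant. Whether a given variant is actually accepted by the modifying enzymes remains an experimental question.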
Gu and Schmidt succinctly defined three diversity-generating principles of RiPP biosynthesis and demonstrated how this notion can be applied to engineering practices using cyanobactin biosynthesis as a model system.63 These three principles can be summarized as follows: 1) the core peptide may have a variable/unconserved sequence, 2) there are recognition motifs in the leader that can be matched to recruited enzymes, and 3) slower BGC enzymes act at later biosynthetic steps.203 Together, these three principles offer a framework for applying knowledge of the RiPP biosynthetic process to understanding its purpose in vivo and its potential engineerability. Figure 7.1 uses the patellamide pathway as an example to summarize these three principles. Specific examples of how these three principles can be applied to the rational engineering of RiPP pathways follow.

Figure 7.1 Diversity-generating biosynthesis. Adapted from Gu and Schmidt.63 A: Native patellamide pathway for patellins 2 and 3. The precursor peptide has three recognition sequences (RS I, RS II, and RS III) and two core cassettes. The core peptides have variable amino acid sequences and lengths (Principle 1). The recognition sequences are color coded to match their cognate BGC enzymes (Principle 2). The last enzyme in the pathway, TruF1, is the slowest BGC enzyme in the biosynthesis of patellins 2 and 3 (Principle 3). B: Components of the pathway related to each of the three principles.

7.2.1 Principle 1: Variable core peptide sequences

The first principle relies on the flexibility of the core peptide with respect to its amino acid sequence and length (Figure 7.1 B). The biosynthesis of the patellamide compounds found in cyanobacteria provides an example of how mutations in core peptides can natively generate chemical diversity within a single RiPP pathway.
These cytotoxic, cyclic peptides are members of the cyanobactin family of RiPPs and are templated by multiple diverse core peptides separated by individual recognition and protease cleavage sites on a single precursor peptide.204 This concept is illustrated in Figure 7.1 by two core “cassettes” in the precursor peptide. The native chemical diversity that these compounds achieve through the variability of the core peptide alone is clearly demonstrated by an impressive survey of 46 cyanobacterial isolates. This study demonstrated that, despite the full patellamide gene cluster exhibiting greater than 99% DNA identity, the predicted core regions within the precursor peptides of these highly conserved clusters are hypervariable, exhibiting identity as low as 46%.205 The variability of the core peptide sequences suggests that these RiPP clusters are undergoing substrate evolution while the modifying enzymes remain much more evolutionarily static. While each cyanobacterial isolate possessed only one putative cyanobactin pathway, these organisms are obligate symbionts of marine ascidians (e.g., sea squirts) and many cyanobacterial species may co-colonize a single ascidian organism. This results in a microbial community capable of producing dozens of different molecules. Similar pathway gene conservation and hypervariable core sequences can be seen in linaridin206 and microviridin148 RiPP biosynthesis. By utilizing conserved pathway elements and allowing the core peptide to succumb to mutation, a theme common in RiPP biosynthesis at large,207 chemical diversity of RiPP natural products can be achieved through peptide scaffold modification and the use of promiscuous modifying enzymes.208

The first principle can also be applied directly to engineering strategies to produce custom peptide natural products.
Since the scaffold of the RiPP natural product is directly templated by the DNA sequence encoding the core peptide, new molecules can be created via a simple point mutation to the core. First, the desired precursor peptide (with a non-native core sequence) must be produced. This is often accomplished by site-directed mutagenesis and subsequent expression of the modified precursor gene, a method amenable to rational and high-throughput engineering alike. For example, the targeted, saturating mutagenesis of a single residue within a key “hinge” region of nisin produced an analog with more potent antibiotic activity.209 Site-directed mutagenesis of the plantazolicin core peptide was likewise used to reveal the plasticity of the pathway heterocyclases, which accept varying core sequences while installing heterocycles only at predictable/conserved locations within the core.210 The approach was also applied in a high-throughput study in which a library of 10^6 lanthipeptide cores with identical leader peptides was used to create and identify a RiPP capable of preventing HIV from budding from infected cells.211 Another high-throughput strategy randomized seven residues within a thiopeptide core peptide to generate 133 variants. Of these 133 variants, 29 were fully matured by the cognate BGC enzymes, 12 retained antibiotic activity, and one had increased antibiotic activity.212 Other synthetic biology technologies can be combined with this approach for the ribosomal incorporation of non-canonical/non-natural amino acids directly into a translated peptide.
This has shown particular promise for further expanding the chemical diversity of the lasso peptide family of RiPPs.213 This single principle of using the plasticity of the core peptide in RiPP biosynthesis to generate chemical diversity is prevalent in many additional native and synthetic/engineered examples.204,207,214–216

7.2.2 Principle 2: Recognition motifs in the leader can be matched to recruited BGC enzymes

The second principle of diversity-generating biosynthesis dictates that the leader peptide is typically responsible for directing the modification of the core peptide to its final state, relying on specific, conserved recognition sequences (RSs) to recruit dedicated modifying enzymes (Figure 7.1 B). As such, the leader peptide is the other typical target for engineering efforts. As an illustration of this idea in a native context, again consider cyanobactin biosynthesis, which has such well-conserved and well-characterized RSs that it has been proposed as a model for RiPP biosynthetic systems. RS I and RS II within the leader peptide recruit a heterocyclase (acting on Cys residues within the core) and the protease responsible for releasing the core from the leader, respectively (Figure 7.1 A).63 This biosynthetic logic can be reduced to a simple engineering strategy: the transplantation of known RSs into a single leader peptide to achieve the desired post-translational modification of a custom core peptide. Burkhart et al.
succinctly demonstrated this strategy for combining RiPP pathways by engineering a chimeric leader peptide with RSs from two unrelated RiPP families (thiazoline-containing peptides and lanthipeptides) to recruit BGC enzymes from both pathways and successfully modify a similarly chimeric core.217 This second principle can also be used to generate libraries of new RiPP natural products, but this strategy is more amenable to exploiting the cross-reactivity of related RiPP pathways (as opposed to unrelated BGCs).218

Interestingly, the leader peptide is dispensable for some RiPP enzymes/biosynthetic steps. There are generally three possible situations for this so-called leaderless RiPP biosynthesis: 1) the enzyme recognizes the partially modified core peptide, which has already been cleaved from its leader, 2) the leader peptide acts in trans with the core peptide, or 3) the leader peptide is dispensable for a particular step because it is required neither for tethering the core to the enzyme nor for in trans/allosteric activation of the modifying enzyme. Commonly, the leader peptide needs to be attached to the core in order to tether the core and keep it proximal to the active site of the bound modifying enzyme. However, more examples of leaderless RiPP biosynthesis are becoming apparent.219 In an effort to avoid the downstream removal of the leader peptide from the core, Oman et al. cleverly fused the leader portion of the lacticin 481 precursor peptide to the LctM synthetase, resulting in a constitutively active enzyme that can modify a synthetic core peptide.220 Microviridin B biosynthesis similarly requires the leader peptide to be present in cis or in trans with the core peptide.221 Some examples of RiPP BGC enzymes that natively do not require a leader are PirF, which can geranylate tripeptides,222 PagF, which requires only a minimal leader and has been demonstrated as a biotechnological tool,223 and LynF, which prefers the cyclized product without a leader.
7.2.3 Principle 3: Slower BGC enzymes act at later biosynthetic steps

Leaderless RiPP biosynthesis is related to the third and final principle of diversity-generating biosynthesis. This principle focuses on the later steps of RiPP biosynthesis, which may occur after the leader has been cleaved from the core and are typically slower than earlier steps (Figure 7.1 B).65,203 In the patellamide biosynthesis pathway used as an example in Figure 7.1 A, the prenyltransferase acts upon the core peptide after the leader has been removed and catalyzes the slowest step in patellin maturation.63,223 This logic runs counter to a typical primary metabolism pathway, where the slow, rate-limiting step is early in the pathway and functions as a commitment step in building essential metabolites such as purines and pyrimidines. Natural products are chemically diverse secondary metabolites which have equally diverse bioactivities. While much of RiPP biosynthesis focuses upon the leader peptide, final tailoring of a core peptide helps drive the diversity-generating strategy, where multiple natural product compounds can be biosynthesized from the same BGC.203 This principle has not yet been as heavily exploited or studied as principles 1 and 2, but investigations into the nuanced regulation of RiPP BGCs and the diverse bioactivities of RiPPs are underway.224

7.3 Future directions for engineering borosin RiPPs: α-N-methylation is now an accessible PTM via traditional RiPP biosynthesis

Through the work presented in Chapter 2 of this thesis, wherein we demonstrate the prevalence of borosins in basidiomycete fungi, we also discovered two additional “types” of fused borosin methyltransferase-precursors. Type 1 includes the founding member of this RiPP family: OphMA, which contains a hydrophobic core peptide that is posttranslationally α-N-methylated.
Type 2 has one characterized example (PgiMA1) and exhibits a domain architecture similar to that of OphMA, except that its core peptide contains a repeated motif that is methylated at a recurring Asp residue. Type 3 also has a single characterized example (AboMA). AboMA has an exceptionally long clasp domain between the catalytic domain and core peptide. Furthermore, the core peptide of AboMA is approximately 70 amino acids in length and may exhibit as many as 35 methylations, making it the most heavily methylated borosin precursor we have thus far identified. This survey of borosin methyltransferases and core peptide substrates offers an opportunity to learn the so-called “rules” of α-N-methylation, namely, how the methylation pattern is determined for each characterized protein. We ask questions such as: What role does the enzyme play in the methylation pattern? What role does the core peptide play? Is it sequence-specific and/or reliant on secondary structure of the core? The answers to these questions will allow us to push forward in engineering these unique proteins to make custom α-N-methylated peptides more efficiently.

One important limitation in using the fungal borosin systems to engineer custom peptide natural products is the fusion of enzyme to substrate, which results in only one natural product molecule produced for every enzyme (or up to 10–12 in the case of PgiMA1 due to the repeated core motif). We have begun to address this with our discovery of “split borosins” in bacteria. The split borosins offer an opportunity to methylate many core peptide molecules, something that was not possible in the fused fungal borosin systems we discovered. Chapters 3 and 4 of this thesis discuss preliminary work to characterize putative split borosins from Streptomyces sp. NRRL S-118, Rhodospirillum centenum SW, and Shewanella oneidensis MR-1.
The borosin methyltransferases and precursors from the former two organisms were recalcitrant to purification, but we were able to determine that both systems produce active α-N-methyltransferase enzymes. The split borosin found in S. oneidensis MR-1 has proven pivotal for this work. The methyltransferase (SonM) and precursor (SonA) from this organism express well heterologously in E. coli and are easily purified individually or as a complex.

7.3.1 Biochemical characterization of a split borosin system

We were able to obtain sufficiently pure SonM/SonA protein for a rigorous biochemical analysis of these two proteins, discussed in Chapter 5. Mass spectrometric analysis showed that SonM turns over multiple molecules of SonA, which was confirmed with a kinetic analysis. Based on the kinetic analysis of WT and active site mutants of SonM, we were also able to update the originally proposed catalytic mechanism.113 The crystal structures obtained from this investigation were especially useful. Perhaps most excitingly, we were able to crystallize the core peptide in the SonM active site in two conformations. The un-methylated core peptide forms an α-helix, but when methylated, the helix is broken. This rigorous biochemical characterization helps us understand the nuances of how SonM posttranslationally α-N-methylates the core peptide of SonA, knowledge which may be applicable to homologous split borosin systems in other bacteria. In this way, the “minimal” split borosin BGC of S. oneidensis MR-1 serves as a blank slate for future engineering applications. Possible avenues include probing the maximum length of a core peptide, testing a leaderless core peptide as substrate, using S-adenosyl ethionine as a cofactor (instead of SAM) to ethylate the core, expanding the active site cavity of SonM, and more.
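As a rough aid for interpreting mass spectra of alkylated core peptides, the expected monoisotopic mass increase can be estimated with a short Python sketch. This is an illustration rather than the analysis used in this work. It assumes that each alkylation replaces one hydrogen with an alkyl group, so a methyl group adds CH2 and an ethyl group (for example, from S-adenosyl ethionine) adds C2H4. The function name and its parameters are hypothetical.

```python
# Monoisotopic masses in Da: carbon-12 is 12 by definition; 1H is about 1.00782503.
CARBON_12 = 12.0
HYDROGEN_1 = 1.00782503207

# Net gain per carbon of the alkyl group: replacing H with CH3 adds CH2,
# replacing H with C2H5 adds C2H4 (two CH2 units), and so on.
CH2 = CARBON_12 + 2 * HYDROGEN_1


def alkylation_mass_shift(n_sites, carbons_per_group=1):
    """Expected monoisotopic mass increase (Da) for n_sites alkylations.

    carbons_per_group=1 models methylation; 2 models ethylation.
    """
    if n_sites < 0 or carbons_per_group < 1:
        raise ValueError("n_sites must be >= 0 and carbons_per_group >= 1")
    return n_sites * carbons_per_group * CH2


if __name__ == "__main__":
    for sites in (1, 2, 3):
        methyl = alkylation_mass_shift(sites)
        ethyl = alkylation_mass_shift(sites, carbons_per_group=2)
        print(f"{sites} site(s): methyl +{methyl:.4f} Da, ethyl +{ethyl:.4f} Da")
```

A single methylation corresponds to roughly +14.016 Da, so successive methylation states of a peptide appear as a ladder of peaks spaced by that amount, while ethylation would give a ladder with about twice that spacing.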
7.4 Future directions for investigating how RiPPs are involved in central metabolism/homeostasis in bacteria

Most RiPPs are considered to be secondary metabolite toxins, often acting as antibiotics or cytotoxins. Relatively few RiPPs are known to be involved in signaling or other primary metabolic pathways. PQQ and MFT are bacterial redox cofactors and are among the very few examples of non-toxin RiPPs.68,190 PQQ and MFT are also very small molecules, both originating from only two amino acid residues. The putative SonA core peptide is similarly small, consisting of as few as three amino acid residues. The split borosin BGC and proximal genes in S. oneidensis MR-1 have potential roles within such critical biological processes as the regulation of aerobic/anaerobic metabolism, motility, and/or pellicle formation, processes all connected by Arc transcriptional regulation and c-di-GMP signaling. Furthermore, this BGC is conserved in the majority of Shewanella spp. whose genomes are currently published on NCBI, a level of conservation not commonly found in natural product biosynthesis. For these reasons, we hypothesize that the final RiPP natural product associated with this BGC plays a signaling role in one or more of the aforementioned processes. The close association and conservation of the diguanylate cyclase signal transduction protein make this especially compelling.

The work presented in Chapter 6 of this thesis details progress made towards identifying a phenotype related to this BGC in S. oneidensis MR-1, a genetically tractable and well-studied organism. Mutants with in-frame deletions within this BGC were generated, along with complementation plasmids/strains and inducible over-expression plasmids/strains. With these bacterial strains in hand, future experiments to identify a phenotype and the structure of the final natural product can be conducted.
The body of literature and molecular tools supporting this unique bacterium create a strong foundation for future research which may reveal the borosin RiPP natural product from the son BGC to play an important role in the homeostasis or metabolism of its host.

8 Bibliography

1. Miller FS, Freeman MF. Impact of synthetic biology on secondary metabolite biosynthesis. In: Modern Biocatalysis: Advances Towards Synthetic Biological Systems. 2018:287-320. doi:10.1039/9781788010450-00287
2. Walsh CT, Wencewicz TA. Prospects for new antibiotics: A molecule-centered perspective. J Antibiot (Tokyo). 2014;67(1):7-22. doi:10.1038/ja.2013.49
3. Weisblum B. Erythromycin resistance by ribosome modification. Antimicrob Agents Chemother. 1995;39(3):577-585. doi:10.1128/AAC.39.3.577
4. Spratt BG. Biochemical and genetical approaches to the mechanism of action of penicillin. Philos Trans R Soc Lond B Biol Sci. 1980;289(1036):273-283. doi:10.1098/rstb.1980.0045
5. Mastropaolo D, Camerman A, Luo Y, Brayer GD, Camerman N. Crystal and molecular structure of paclitaxel (taxol). Proc Natl Acad Sci USA. 1995;92(15):6920-6924. doi:10.1073/pnas.92.15.6920
6. Mackay M, Hodgkin DC. A crystallographic examination of the structure of morphine. J Chem Soc. 1955:3261-3267. doi:10.1039/JR9550003261
7. Shimizu Y, Chou HN, Bando H, Duyne G Van, Clardy JC. Structure of brevetoxin A (GB-1 toxin), the most potent toxin in the florida red tide organism Gymnodinium breve (Ptychodiscus brevis). J Am Chem Soc. 1986;108(3):514-515. doi:10.1021/ja00263a031
8. Niftrik LA, Fuerst JA, Damsté JSS, Kuenen JG, Jetten MSM, Strous M. The anammoxosome: an intracytoplasmic compartment in anammox bacteria. FEMS Microbiol Lett. 2004;233(1):7-13. doi:10.1016/j.femsle.2004.01.044
9. Fleming A. On the antibacterial action of cultures of A. penicillium, with special reference to their use in the isolation of B. influenzae. Br J Exp Pathol. 1929;10(3):226-236. doi:10.1093/clinids/2.1.129
10.
Hamed RB, Gomez-Castellanos JR, Henry L, Ducho C, McDonough MA, Schofield CJ. The enzymes of β-lactam biosynthesis. Nat Prod Rep. 2013;30(1):21-107. doi:10.1039/c2np20065a
11. Carretto E, Visiello R, Nardini P. Methicillin resistance in Staphylococcus aureus. Pet-to-Man Travel Staphylococci A World Prog. 2018;85(Pt 1):225-235. doi:10.1016/B978-0-12-813547-1.00017-0
12. Acred P, Brown DM, Turner DH, Wilson MJ. Pharmacology and chemotherapy of ampicillin—a new broad‐spectrum penicillin. Br J Pharmacol Chemother. 1962;18(2):356-369. doi:10.1111/j.1476-5381.1962.tb01416.x
13. Wiley PF, Gerzon K, Flynn EH, et al. Structure of erythromycin. J Am Chem Soc. 1957;79(22):6062-6070. doi:10.1021/ja01579a059
14. Pitt GJ. A refinement of the crystal structure of potassium benzylpenicillin. Acta Crystallogr. 1952;5:770-775. doi:10.1107/S0365110X65003365
15. Fehlhaber H, Girg M, Seibert G, et al. Moenomycin A: A structural revision and new structure-activity relations. Tetrahedron. 1990;46(5):1557-1568. doi:10.1016/S0040-4020(01)81965-7
16. Molohon KJ, Blair PM, Park S, et al. Plantazolicin is an ultra-narrow spectrum antibiotic that targets the Bacillus anthracis membrane. ACS Infect Dis. 2016;2(3):207-220. doi:10.1021/acsinfecdis.5b00115
17. Scholz R, Molohon KJ, Nachtigall J, et al. Plantazolicin, a novel microcin B17/streptolysin S-like natural product from Bacillus amyloliquefaciens FZB42. J Bacteriol. 2011;193(1):215-224. doi:10.1128/JB.00784-10
18. Matsumoto T, Yanagiya M, Maeno S, Yasuda S. A revised structure of pederin. Tetrahedron. 1968;60:6297-6300. doi:10.1016/S0040-4039(00)75458-X
19. Damste JSS, Strous M, Rijpstra WIC, et al. Linearly concatenated cyclobutane lipids form a dense bacterial membrane. Nature. 2002;419(6908):708-712. doi:10.1038/nature01067
20. Brogden KA. Antimicrobial peptides: Pore formers or metabolic inhibitors in bacteria? Nat Rev Microbiol. 2005;3(3):238-250. doi:10.1038/nrmicro1098
21. Tsomaia N.
Peptide therapeutics: Targeting the undruggable space. Eur J Med Chem. 2015;94:459-470. doi:10.1016/j.ejmech.2015.01.014 22. Kam A, Loo S, Fan J-S, Sze SK, Yang D, Tam JP. Roseltide rT7 is a disulfide-rich, anionic, and cell-penetrating peptide that inhibits proteasomal degradation. J Biol Chem. 2019;294:19604-19615. doi:10.1074/jbc.RA119.010796 23. Field B, Osbourn AE. Metabolic diversification--Independent assembly of operon- like gene clusters in different plants. Science. 2008;320(5875):543-547. doi:10.1126/science.1154990 24. Walsh CT, Fischbach MA. Natural products version 2.0: Connecting genes to molecules. J Am Chem Soc. 2010;8(132):2469-2493. doi:10.1021/ja909118a 25. Sy-Cordero AA, Pearce CJ, Oberlies NH. Revisiting the enniatins: A review of their isolation, biosynthesis, structure determination and biological activities. J Antibiot (Tokyo). 2012;65(11):541-549. doi:10.1038/ja.2012.71 26. Kleinkauf H, von Döhren H. Nonribosomal biosynthesis of peptide antibiotics. Eur J Biochem. 1990;192(1):151-165. doi:10.1007/978-3-642-76168-3_11 27. von Döhren H, Kleinkauf H. Research on nonribosomal systems. In: The Roots of Modern Biochemistry. ; 1988:355-367. doi:10.1002/jmr.300010418 28. Gause GF, Brazhnikova MG. Gramicidin S and its use in the treatment of infected wounds. Nature. 1944;154(3918):703-703. doi:10.1038/154703a0 29. Kleinkauf H, Gevers W. Nonribosomal polypeptide biosynthesis: The biosynthesis of a cyclic peptide antibiotic, gramicidin S. Cold Spring Harb Symp Quant Biol. 1969;34:805-813. doi:10.1101/sqb.1969.034.01.092 30. Tomino S, Yamada M, Itoh H, Kurahashi K. Cell-free synthesis of gramicidin S. Biochemistry. 1967;6(8):2552-2560. doi:10.1021/bi00860a037 31. Hajime B, Yamada M, Tomino S, Kurahashi K. The role of two complementary fractions of gramicidin S synthesizing enzyme system. J Biochem. 1968;64(2):259- 261. doi:10.1093/oxfordjournals.jbchem.a128888 32. Kubota K. 
Biosynthesis of linear gramicidin, pentadeca peptide, is tight linked to 207 serine metabolism and to membranous phosphoglyceride. In: The Roots of Modern Biochemistry. ; 1988:331-337. doi:10.1002/jmr.300010418 33. Lee SG, Lipmann F. Tyrocidine synthetase system. Methods Enzymol. 1975;43:585- 602. doi:10.1016/0076-6879(75)43121-4 34. Fujikawa K, Sakamoto Y, Kurahashi K. Biosynthesis of tyrocidine by a cell-free enzyme system of Bacillus brevis ATCC 8185: III. further purification of components I and II and their functions in tyrocidine synthesis. J Biochem. 1971;69(5):869-879. doi:10.1093/oxfordjournals.jbchem.a129538 35. Kratzschmar J, Krause M, Marahiel MA. Gramicidin S biosynthesis operon containing the structural genes grsA and grsB has an open reading frame encoding a protein homologous to fatty acid thioesterases. Microbiology. 1989;171(10):5422- 5429. doi:10.1128/jb.171.10.5422-5429.1989 36. Mittenhuber G, Weckermann R, Marahiel MA. Gene cluster containing the genes for tyrocidine synthetases 1 and 2 from Bacillus brevis: Evidence for an operon. J Bacteriol. 1989;171(9):4881-4887. doi:10.1128/jb.171.9.4881-4887.1989 37. Schwarzer D, Finking R, Marahiel M a. Nonribosomal peptides: from genes to products. Nat Prod Rep. 2003;20(3):275-287. doi:10.1039/b111145k 38. Hoppert M, Gentzsch C, Schörgendorfer K. Structure and localization of cyclosporin synthetase, the key enzyme of cyclosporin biosynthesis in Tolypocladium inflatum. Arch Microbiol. 2001;176(4):285-293. doi:10.1007/s002030100324 39. Glinski M, Urbanke C, Hornbogen T, Zocher R. Enniatin synthetase is a monomer with extended structure: Evidence for an intramolecular reaction mechanism. Arch Microbiol. 2002;178(4):267-273. doi:10.1007/s00203-002-0451-1 40. Zocher R, Salnikow J, Kleinkauf H. Biosynthesis of enniatin B. FEBS Lett. 1976;71(1):13-17. doi:10.1016/0014-5793(76)80887-3 41. Hoyer KM, Mahlert C, Marahiel MA. The iterative gramicidin S thioesterase catalyzes peptide ligation and cyclization. 
Chem Biol. 2007;14(1):13-22. 208 doi:10.1016/j.chembiol.2006.10.011 42. Caboche S, Leclère V, Pupin M, Kucherov G, Jacques P. Diversity of monomers in nonribosomal peptides: Towards the prediction of origin and biological activity. J Bacteriol. 2010;192(19):5143-5150. doi:10.1128/JB.00315-10 43. Nguyen KT, Ritz D, Gu J-Q, et al. Combinatorial biosynthesis of novel antibiotics related to daptomycin. Proc Natl Acad Sci USA. 2006;103(46):17462-17467. doi:10.1073/pnas.0608589103 44. Stachelhaus T, Mootz HD, Marahiel MA. The specificity-conferring code of adenylation domains in nonribosomal peptide synthetases. Chem Biol. 1999;6(8):493-505. doi:10.1016/S1074-5521(99)80082-9 45. Reimer JM, Eivaskhani M, Harb I, Guarné A, Weigt M, Schmeing TM. Structures of a dimodular nonribosomal peptide synthetase reveal conformational flexibility. Science. 2019;366. doi:10.1126/science.aaw4388 46. Marahiel MA. A structural model for multimodular NRPS assembly lines. Nat Prod Rep. 2016;33(2):136-140. doi:10.1039/c5np00082c 47. Hahn M, Stachelhaus T. Harnessing the potential of communication-mediating domains for the biocombinatorial synthesis of nonribosomal peptides. Proc Natl Acad Sci USA. 2006;103(2):275-280. doi:10.1073/pnas.0508409103 48. Arnison PG, Bibb MJ, Bierbaum G, et al. Ribosomally synthesized and post- translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat Prod Rep. 2013;30(1):108-160. doi:10.1039/C2NP20085F 49. Rogers LA. The inhibiting effect of Streptococcus lactis on Lactobacillus bulgaricus. J Bacteriol. 1928;16(5):321-325. doi:16559344 50. Ingram L. A ribosomal mechanism for synthesis of peptides related to nisin. BBA Sect Nucleic Acids Protein Synth. 1970;224(1):263-265. doi:10.1016/0005- 2787(70)90642-8 51. Schnell N, Entian KD, Schneider U, et al. Prepeptide sequence of epidermin, a 209 ribosomally synthesized antibiotic with four sulphide-rings. Nature. 1988;333(6170):276-278. doi:10.1038/333276a0 52. 
Buchman GW, Banerjee S, Hansen JN. Structure, expression, and evolution of a gene encoding the precursor of nisin, a small protein antibiotic. J Biol Chem. 1988;263(31):16260-16266. 53. Kaletta C, Entian K-D. Nisin, a peptide antibiotic: Cloning and sequencing of the nisA gene and posttranslational processing of its peptide product. J Bacteriol. 1989;171(3):1597-1601. doi:10.1128/jb.171.3.1597-1601.1989 54. Banerjee S, Hansen JN. Structure and expression of a gene encoding the precursor of subtilin, a small protein antibiotic. J Biol Chem. 1988;263(19):9508-9514. 55. Kaletta C, Entian K, Kellner R, Jung G, Reis M, Sahl H-G. Pep5, a new lantibiotic: Structural gene isolation and prepeptide sequence. Arch Microbiol. 1989;152:16-19. doi:doi.org/10.1007/BF00447005 56. Hetrick KJ, van der Donk WA. Ribosomally synthesized and post-translationally modified peptide natural product discovery in the genomic era. Curr Opin Chem Biol. 2017;38:36-44. doi:10.1016/j.cbpa.2017.02.005 57. Wilson MC, Piel J. Metagenomic approaches for exploiting uncultivated bacteria as a resource for novel biosynthetic enzymology. Chem Biol. 2013;20(5):636-647. doi:10.1016/j.chembiol.2013.04.011 58. Leslie Evans III R. Protein structures elucidating the post-ribosomal biosynthesis of pyrroloquinoline quinone. 2017. doi:978-0-355-32869-1 59. Ding W, Liu W-Q, Jia Y, Li Y, van der Donk WA, Zhang Q. Biosynthetic investigation of phomopsins reveals a widespread pathway for ribosomal natural products in Ascomycetes. Proc Natl Acad Sci USA. 2016;113(13):3521-3526. doi:10.1073/pnas.1522907113 60. van der Velden NS, Kaelin N, Helf MJ, Piel J, Freeman MF, Kuenzler M. Autocatalytic backbone N-methylation in a family of ribosomal peptide natural products. Nat Chem Biol. 2017;13:833-835. doi:10.1038/nchembio.2393 210 61. Johnson RD, Lane GA, Koulman A, et al. A novel family of cyclic oligopeptides derived from ribosomal peptide synthesis of an in planta-induced gene, gigA, in Epichloë endophytes of grasses. 
Fungal Genet Biol. 2015;85:14-24. doi:10.1016/j.fgb.2015.10.005 62. Benjdia A, Guillot A, Ruffié P, Leprince J, Berteau O. Post-translational modification of ribosomally synthesized peptides by a radical SAM epimerase in Bacillus subtilis. Nat Chem. 2017;9:698-707. doi:10.1038/nchem.2714 63. Gu W, Schmidt EW. Three principles of diversity-generating biosynthesis. Acc Chem Res. 2017;50(10):2569-2576. doi:10.1021/acs.accounts.7b00330 64. Freeman MF, Gurgui C, Helf MJ, et al. Metagenome mining reveals polytheonamides as posttranslationally modified ribosomal peptides. Science. 2012;338(6105):387-390. doi:10.1126/science.1226121 65. Freeman MF, Helf MJ, Bhushan A, Morinaka BI, Piel J. Seven enzymes create extraordinary molecular complexity in an uncultivated bacterium. Nat Chem. 2016;9:387-395. doi:10.1038/nchem.2666 66. Kelly WL, Pan L, Li C. Thiostrepton biosynthesis: Prototype for a new family of bacteriocins. J Am Chem Soc. 2009;131(12):4327-4334. doi:10.1021/ja807890a 67. Lubelski J, Rink R, Khusainov R, Moll GN, Kuipers OP. Biosynthesis, immunity, regulation, mode of action and engineering of the model lantibiotic nisin. Cell Mol Life Sci. 2008;65(3):455-476. doi:10.1007/s00018-007-7171-2 68. Meulenberg JJM, Sellink E, Riegman NH, Postma PW. Nucleotide sequence and structure of the Klebsiella pneumoniae pqq operon. MGG Mol Gen Genet. 1992;232(2):284-294. doi:10.1007/BF00280008 69. Okada M, Yamaguchi H, Sato I, Tsuji F, Dubnau D, Sakagami Y. Chemical structure of posttranslational modification with a farnesyl group on tryptophan. Biosci Biotechnol Biochem. 2008;72(3):914-918. doi:10.1271/bbb.80006 70. Latham JA, Iavarone AT, Barr I, Juthani P V., Klinman JP. PqqD is a novel peptide chaperone that forms a ternary complex with the radical S-adenosylmethionine 211 protein PqqE in the pyrroloquinoline quinone biosynthetic pathway. J Biol Chem. 2015;290(20):12908-12918. doi:10.1074/jbc.M115.646521 71. Burkhart BJ, Hudson GA, Dunbar KL, Mitchell DA. 
A prevalent peptide-binding domain guides ribosomal natural product biosynthesis. Nat Chem Biol. 2015;11(8):564-570. doi:10.1038/nchembio.1856 72. Tsai TY, Yang CY, Shih HL, Wang AHJ, Chou SH. Xanthomonas campestris PqqD in the pyrroloquinoline quinone biosynthesis operon adopts a novel saddle-like fold that possibly serves as a PQQ carrier. Proteins Struct Funct Bioinforma. 2009;76(4):1042-1048. doi:10.1002/prot.22461 73. Regni CA, Roush RF, Miller DJ, Nourse A, Walsh CT, Schulman BA. How the MccB bacterial ancestor of ubiquitin E1 initiates biosynthesis of the microcin C7 antibiotic. EMBO J. 2009;28(13):1953-1964. doi:10.1038/emboj.2009.146 74. Koehnke J, Mann G, Bent AF, et al. Structural analysis of leader peptide binding enables leader-free cyanobactin processing. Nat Chem Biol. 2015;11(8):558-563. doi:10.1038/nchembio.1841 75. Ortega MA, Hao Y, Zhang Q, Walker MC, Van Der Donk WA, Nair SK. Structure and mechanism of the tRNA-dependent lantibiotic dehydratase NisB. Nature. 2015;517(7535):509-512. doi:10.1038/nature13888 76. Schwalen CJ, Hudson GA, Kille B, Mitchell DA. Bioinformatic expansion and discovery of thiopeptide antibiotics. J Am Chem Soc. 2018;140(30):9494-9501. doi:10.1021/jacs.8b03896 77. Blin K, Wolf T, Chevrette MG, et al. AntiSMASH 4.0 - improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res. 2017;45(W1):W36-W41. doi:10.1093/nar/gkx319 78. Van Heel AJ, De Jong A, Song C, Viel JH, Kok J, Kuipers OP. BAGEL4: A user- friendly web server to thoroughly mine RiPPs and bacteriocins. Nucleic Acids Res. 2018;46(W1):W278-W281. doi:10.1093/nar/gky383 79. Skinnider MA, Johnston CW, Edgar RE, et al. Genomic charting of ribosomally 212 synthesized natural product chemical space facilitates targeted mining. Proc Natl Acad Sci USA. 2016;(18):E6343-E6351. doi:10.1073/pnas.1609014113 80. Agrawal P, Khater S, Gupta M, Sain N, Mohanty D. 
RiPPMiner: A bioinformatics resource for deciphering chemical structures of RiPPs based on prediction of cleavage and cross-links. Nucleic Acids Res. 2017;45(W1):W80-W88. doi:10.1093/nar/gkx408 81. Haft DH, Basu MK, Mitchell DA. Expansion of ribosomally produced natural products: A nitrile hydratase- and Nif11-related precursor family. BMC Biol. 2010;8(70):1-15. doi:10.1186/1741-7007-8-70 82. Cox CL, Doroghazi JR, Mitchell DA. The genomic landscape of ribosomal peptides containing thiazole and oxazole heterocycles. BMC Genomics. 2015;16(1):1-16. doi:10.1186/s12864-015-2008-0 83. Lewis K. Platforms for antibiotic discovery. Nat Rev Drug Discov. 2013;12(5):371- 387. doi:10.1038/nrd3975 84. Yu J, Zhu X, Yang Y, Luo S, Zhangsun D. Expression in Escherichia coli of fusion protein comprising α-conotoxin TxIB and preservation of selectivity to nicotinic acetylcholine receptors in the purified product. Chem Biol Drug Des. 2017;91(2):349-358. doi:10.1111/cbdd.13104 85. Luther A, Bisang C, Obrecht D. Advances in macrocyclic peptide-based antibiotics. Bioorg Med Chem. 2017. doi:10.1016/j.bmc.2017.08.006 86. Chekan JR, Estrada P, Covello PS, Nair SK. Characterization of the macrocyclase involved in the biosynthesis of RiPP cyclic peptides in plants. 2017:1-6. doi:10.1073/pnas.1620499114 87. Buczek O, Wei D, Babon JJ, et al. Structure and sodium channel activity of an excitatory I 1-superfamily conotoxin. Biochemistry. 2007;46(35):9929-9940. doi:10.1021/bi700797f 88. Renevey A, Riniker S. The importance of N-methylations for the stability of the β6.3-helical conformation of polytheonamide B. Eur Biophys J. 2017;46(4):363- 213 374. doi:10.1007/s00249-016-1179-1 89. Mahanta N, Hudson GA, Mitchell DA. Radical S-adenosylmethionine enzymes involved in RiPP biosynthesis. Biochemistry. 2017;56(40):5229-5244. doi:10.1021/acs.biochem.7b00771 90. Hegemann JD, Zimmermann M, Xie X, Marahiel MA. Lasso peptides: An intriguing class of bacterial natural products. Acc Chem Res. 
2015;48(7):1909-1919. doi:10.1021/acs.accounts.5b00156 91. McBrayer DN, Gantman BK, Tal-Gan Y. N-Methylation of amino acids in gelatinase biosynthesis-activating pheromone identifies key site for stability enhancement with retention of the Enterococcus faecalis fsr quorum sensing circuit response. ACS Infect Dis. 2019;5:1035-1041. doi:10.1021/acsinfecdis.9b00097 92. White TR, Renzelman CM, Rand AC, et al. On-resin N-methylation of cyclic peptides for discovery of orally bioavailable scaffolds. Nat Chem Biol. 2011;7(11):810-817. doi:10.1038/nchembio.664 93. Plattner PA, Nager U. Über die Konstitution von Enniatin B. Helv Chim Acta. 1948;31(2):665-671. doi:10.1002/hlca.19480310248 94. Zocher R, Kleinkauf H. Biosynthesis of Enniatin B: Partial purification and characterization of the synthesizing enzyme and studies of the biosynthesis. Biochem Biophys Res Commun. 1978;81(4):1162-1167. doi:10.1016/0006- 291X(78)91258-5 95. Zocher R, Keller U, Kleinkauf H. Enniatin synthetase, a novel type of multifunctional enzyme catalyzing depsipeptide synthesis in Fusarium oxysporum. Biochemistry. 1982;21(1):43-48. doi:10.1021/bi00530a008 96. Billich A, Zocher R. N-Methyltransferase function of the multifunctional enzyme enniatin synthetase. Biochemistry. 1987;26(25):8417-8423. doi:10.1021/bi00399a058 97. Hou Y, Tianero MDB, Kwan JC, et al. Structure and biosynthesis of the antibiotic bottromycin D. Org Lett. 2012;14(19):5050-5053. doi:10.1021/ol3022758 214 98. Claesen J, Bibb M. Genome mining and genetic analysis of cypemycin biosynthesis reveal an unusual class of posttranslationally modified peptides. Proc Natl Acad Sci USA. 2010;107(37):16297-16302. doi:10.1073/pnas.1008608107 99. Velkov T, Swarbrick JD, Hussein MH, et al. The impact of backbone N-methylation on the structure‐activity relationship of Leu10‐teixobactin. J Pept Sci. 2019;25(9):1- 9. doi:10.1002/psc.3206 100. Mayer A, Anke H, Sterner O. 
Omphalotin, a new cyclic peptide with potent nematicidal activity from Omphalotus olearius I. Fermentation and biological activity. Nat Prod Lett. 1997;10(1):25-32. doi:10.1080/10575639708043691 101. Sterner O, Etzel W, Mayer A, Anke H. Omphalotin, a new cyclic peptide with potent nematicidal activity from Omphalotus Olearius II. Isolation and structure determination. Nat Prod Lett. 1997;10(1):33-38. doi:10.1080/10575639708043692 102. Büchel E, Martini U, Mayer A, Anke H, Sterner O. Omphalotins B, C and D, nematicidal cyclopeptides from Omphalotus olearius. Absolute configuration of omphalotin A. Tetrahedron. 1998;54(20):5345-5352. doi:10.1016/S0040- 4020(98)00209-9 103. Liermann JC, Opatz T, Kolshorn H, Antelo L, Hof C, Anke H. Omphalotins E-I, five oxidatively modified nematicidal cyclopeptides from Omphalotus olearius. European J Org Chem. 2009;(8):1256-1262. doi:10.1002/ejoc.200801068 104. Wawrzyn GT, Quin MB, Choudhary S, López-Gallego F, Schmidt-Dannert C. Draft genome of Omphalotus olearius provides a predictive framework for sesquiterpenoid natural product biosynthesis in basidiomycota. Chem Biol. 2012;19(6):772-783. doi:10.1016/j.chembiol.2012.05.012 105. Ramm S, Krawczyk B, Mühlenweg A, Poch A, Mçsker E, Süssmuth RD. A self- sacrificing N-methyltransferase is the precursor of the fungal natural product omphalotin. Angew Chemie - Int Ed. 2017;56(33):9994-9997. doi:10.1002/anie.201703488 106. Bills GF, Gloer JB. Biologically active secondary metabolites from the fungi. In: 215 The Fungal Kingdom. ; 2016:1087-1119. doi:10.1128/microbiolspec.funk-0009- 2016 107. Kupfer DM, Drabenstot SD, Buchanan KL, et al. Introns and splicing elements of five diverse fungi. Eukaryot Cell. 2004;3(5):1088-1100. doi:10.1128/EC.3.5.1088- 1100.2004 108. Aly AH, Debbab A, Proksch P. Fifty years of drug discovery from fungi. Fungal Divers. 2011;50:3-19. doi:10.1007/s13225-011-0116-y 109. Nagano N, Umemura M, Izumikawa M, et al. 
Class of cyclic ribosomal peptide synthetic genes in filamentous fungi. Fungal Genet Biol. 2016;86:58-70. doi:10.1016/j.fgb.2015.12.010 110. Hallen HE, Luo H, Scott-Craig JS, Walton JD. Gene family encoding the major toxins of lethal Amanita mushrooms. Proc Natl Acad Sci USA. 2007;104(48):19097- 19101. doi:10.1073/pnas.0707340104 111. Stadler M, Hoffmeister D. Fungal natural products-the mushroom perspective. Front Microbiol. 2015;6:1-4. doi:10.3389/fmicb.2015.00127 112. Umemura M, Nagano N, Koike H, et al. Characterization of the biosynthetic gene cluster for the ribosomally synthesized cyclic peptide ustiloxin B in Aspergillus flavus. Fungal Genet Biol. 2014;68:23-30. doi:10.1016/j.fgb.2014.04.011 113. Song H, Velden NS Van Der, Shiran SL, et al. A molecular mechanism for the enzymatic methylation of nitrogen atoms within peptide bonds. Sci Adv. 2018;4(8):eaat2720-eaat2720. doi:10.1126/sciadv.aat2720 114. Ongpipattanakul C, Nair SK. Molecular basis for autocatalytic backbone N- methylation in RiPP natural product biosynthesis. ACS Chem Biol. 2018;13(10):2989-2999. doi:10.1021/acschembio.8b00668 115. Quijano MR, Zach C, Miller FS, et al. Distinct autocatalytic α-N-methylating precursors expand the borosin RiPP family of peptide natural products. J Am Chem Soc. 2019;141(24):9637-9644. doi:10.1021/jacs.9b03690 116. Kharwar RN, Mishra A, Gond SK, Stierle A, Stierle D. Anticancer compounds 216 derived from fungal endophytes: Their importance and future challenges. Nat Prod Rep. 2011;28(7):1208-1228. doi:10.1039/c1np00008j 117. Umemura M, Koike H, Nagano N, et al. MIDDAS-M: Motif-independent de novo detection of secondary metabolite gene clusters through the integration of genome sequencing and transcriptome data. PLoS One. 2013;8(12):e84028-e84028. doi:10.1371/journal.pone.0084028 118. Ye Y, Minami A, Igarashi Y, et al. Unveiling the biosynthetic pathway of the ribosomally synthesized and post-translationally modified peptide ustiloxin B in filamentous fungi. 
Angew Chemie - Int Ed. 2016;55(28):8072-8075. doi:10.1002/anie.201602611 119. Ortega MA, van der Donk WA. New insights into the biosynthetic logic of ribosomally synthesized and post-translationally modified peptide natural products. Cell Chem Biol. 2016;23(1):31-44. doi:10.1016/j.chembiol.2015.11.012 120. van Heel AJ, de Jong A, Montalbán-López M, Kok J, Kuipers OP. BAGEL3: Automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides. Nucleic Acids Res. 2013;41:448-453. doi:10.1093/nar/gkt391 121. Tietz JI, Schwalen CJ, Patel PS, et al. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat Chem Biol. 2017;13(5):470-478. doi:10.1038/nchembio.2319 122. Mohimani H, Kersten RD, Liu WT, et al. Automated genome mining of ribosomal peptide natural products. ACS Chem Biol. 2014;9(7):1545-1551. doi:10.1021/cb500199h 123. Kirkpatrick CL, Broberg CA, McCool EN, et al. The “PepSAVI-MS” pipeline for natural product bioactive peptide discovery. Anal Chem. 2017;89(2):1194-1201. doi:10.1021/acs.analchem.6b03625 124. Galagan JE, Henn MR, Ma LJ, Cuomo CA, Birren B. Genomics of the fungal kingdom: Insights into eukaryotic biology. Genome Res. 2005;15(12):1620-1631. 217 doi:10.1101/gr.3767105 125. Rogozin IB, Carmel L, Csuros M, Koonin E V. Origin and evolution of spliceosomal introns. Biol Direct. 2012;7:1-28. doi:10.1186/1745-6150-7-11 126. Rogozin IB, Sverdlov A V., Babenko VN, Koonin E V. Analysis of evolution of exon-intron structure of eukaryotic genes. Brief Bioinform. 2005;6(2):118-134. doi:10.1093/bib/6.2.118 127. Ye Y, Ozaki T, Umemura M, Liu C, Minami A, Oikawa H. Heterologous production of asperipin-2a: Proposal for sequential oxidative macrocyclization by a fungi- specific DUF3328 oxidase. Org Biomol Chem. 2019;17(1):39-43. doi:10.1039/c8ob02824a 128. Xu C, Min J. Structure and function of WD40 domain proteins. Protein Cell. 2011;2(3):202-214. doi:10.1007/s13238-011-1018-1 129. 
Le Marquer M, San Clemente H, Roux C, Savelli B, Frei Dit Frey N. Identification of new signalling peptides through a genome-wide survey of 250 fungal secretomes. BMC Genomics. 2019;20(1):1-15. doi:10.1186/s12864-018-5414-2 130. Rawlings ND, Barrett AJ, Bateman A. MEROPS: The peptidase database. Nucleic Acids Res. 2009;38:325-331. doi:10.1093/nar/gkp971 131. Zimmermann L, Stephens A, Nam SZ, et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J Mol Biol. 2018;430(15):2237-2243. doi:10.1016/j.jmb.2017.12.007 132. Kelley LA, Mezulis S, Yates, Christopher M, Wass MN, Sternberg MJ. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc. 2016;10(6):845-858. doi:10.1038/nprot.2015-053 133. Katti M V., Sami-Subbu R, Ranjekar PK, Gupta VS. Amino acid repeat patterns in protein sequences: Their diversity and structural-functional implications. Protein Sci. 2000;9(6):1203-1209. doi:10.1110/ps.9.6.1203 134. Vetting MW, Hegde SS, Fajardo JE, et al. Pentapeptide repeat proteins. Biochemistry. 2006;45(1):1-10. doi:10.1021/bi052130w 218 135. Li YF, Tsai KJS, Harvey CJB, et al. Comprehensive curation and analysis of fungal biosynthetic gene clusters of published natural products. Fungal Genet Biol. 2016;89:18-28. doi:10.1016/j.fgb.2016.01.012 136. Ványolós A, Dékány M, Kovács B, et al. Gymnopeptides A and B, cyclic octadecapeptides from the mushroom Gymnopus fusipes. Org Lett. 2016;18(11):2688-2691. doi:10.1021/acs.orglett.6b01158 137. Pan Z, Wu C, Wang W, et al. Total synthesis and stereochemical assignment of gymnopeptides A and B. Org Lett. 2017;19(17):4420-4423. doi:10.1021/acs.orglett.7b01742 138. Boulin T, Bessereau JL. Mos1-mediated insertional mutagenesis in Caenorhabditis elegans. Nat Protoc. 2007;2(5):1276-1287. doi:10.1038/nprot.2007.192 139. Gurgui C, Piel J. Metagenomic approaches to identify and isolate bioactive natural products from microbiota of marine sponges. In: Streit WR, Daniel R, eds. 
Methods in Molecular Biology. Vol 668. Totowa, NJ: Humana Press; 2010:247-264. doi:10.1007/978-1-60761-823-2_22 140. Obrecht D, Chevalier E, Moehle K, Robinson JA. β-Hairpin protein epitope mimetic technology in drug discovery. Drug Discov Today Technol. 2012;9(1):e63-e69. doi:10.1016/j.ddtec.2011.07.006 141. Laufer B, Chatterjee J, Frank AO, Kessler H. Can N-methylated amino acids serve as substitutes for prolines in conformational design of cyclic pentapeptides? J Pept Sci. 2009;15(3):141-146. doi:10.1002/psc.1076 142. Blackwell M. The fungi: 1, 2, 3 ... 5.1 million species? Am J Bot. 2011;98(3):426- 438. doi:10.3732/ajb.1000298 143. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30(4):772-780. doi:10.1093/molbev/mst010 144. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17(8):754-755. doi:10.1093/bioinformatics/17.8.754 219 145. Ho SN, Hunt HD, Horton RM, Pullen JK, Pease LR. Site-directed mutagenesis by overlap extension using the polymerase chain reaction. Gene. 1989;77(1):51-59. doi:10.1016/0378-1119(89)90358-2 146. Cubero OF, Crespo A, Fatehi J, Bridge PD. DNA extraction and PCR amplification method suitable for fresh , herbarium-stored , lichenized , and other fungi. Plant Syst Evol. 1999;216:243-249. doi:10.1007/BF01084401 147. Tsukui T, Nagano N, Umemura M, et al. Ustiloxins, fungal cyclic peptides, are ribosomally synthesized in Ustilaginoidea virens. Bioinformatics. 2015;31(7):981- 985. doi:10.1093/bioinformatics/btu753 148. Zhang Y, Li K, Yang G, Mcbride JL, Bruner SD, Ding Y. A distributive peptide cyclase processes multiple microviridin core peptides within a single polypeptide substrate. Nat Commun. 2018;9(1780):1-10. doi:10.1038/s41467-018-04154-3 149. Sugimoto K, Senda T, Aoshima H, Masai E, Fukuda M, Mitsui Y. 
Crystal structure of an aromatic ring opening dioxygenase LigAB, a protocatechuate 4,5- dioxygenase, under aerobic conditions. Structure. 1999;7(8):953-965. doi:10.1016/S0969-2126(99)80122-1 150. Stadtwald R. Rhodospirillum centenum, sp. nov., a thermotolerant cyst-forming anoxygenic photosynthetic bacterium. Antonie Van Leeuwenhoek. 1989;55:291- 296. doi:10.1007/BF00393857 151. Lu YK, Marden J, Han M, et al. Metabolic flexibility revealed in the genome of the cyst-forming α-1 proteobacterium Rhodospirillum centenum. BMC Genomics. 2010;11(1):1-12. doi:10.1186/1471-2164-11-325 152. Ho Y-SJ. Structure of the GAF domain, a ubiquitous signaling motif and a new class of cyclic GMP receptor. EMBO J. 2000;19(20):5288-5299. doi:10.1093/emboj/19.20.5288 153. Frey S, Görlich D. A new set of highly efficient, tag-cleaving proteases for purifying recombinant proteins. J Chromatogr A. 2014;1337:95-105. doi:10.1016/j.chroma.2014.02.029 220 154. Giansanti P, Tsiatsiani L, Low TY, Heck AJR. Six alternative proteases for mass spectrometry-based proteomics beyond trypsin. Nat Protoc. 2016;11(5):993-1006. doi:10.1038/nprot.2016.057 155. Baltz RH. Renaissance in antibacterial discovery from actinomycetes. Curr Opin Pharmacol. 2008;8(5):557-563. doi:10.1016/j.coph.2008.04.008 156. Doroghazi JR, Albright JC, Goering AW, et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat Chem Biol. 2014;10(11):963-968. doi:10.1038/nCHeMBIO.1659 157. Philmus B, Christiansen G, Yoshida WY, Hemscheidt TK. Post-translational modification in microviridin biosynthesis. Chembiochem. 2008;9(18):3066-3073. doi:10.1002/cbic.200800560 158. Onaka H, Nakaho M, Hayashi K, Igarashi Y, Furumai T. Cloning and characterization of the goadsporin biosynthetic gene cluster from Streptomyces sp. TP-A0584. Microbiology. 2005;151(12):3923-3933. doi:10.1099/mic.0.28420-0 159. Onaka H, Tabata H, Igarashi Y, Sato Y, Furumai T. 
Goadsporin, a chemical substance which promotes secondary metabolism and morphogenesis in streptomycetes. I. Purification and characterization. J Antibiot (Tokyo). 2001;54(12):1036-1044. doi:10.7164/antibiotics.54.1036 160. Yang J, Kulkarni K, Manolaridis I, et al. Mechanism of isoprenylcysteine carboxyl methylation from the crystal structure of the integral membrane methyltransferase ICMT. Mol Cell. 2011;44(6):997-1004. doi:10.1016/j.molcel.2011.10.020 161. Hau HH, Gralnick JA. Ecology and biotechnology of the genus Shewanella. Annu Rev Microbiol. 2007;61(1):237-258. doi:10.1146/annurev.micro.61.080706.093257 162. Duchin S, Vershinin Z, Levy D, Aharoni A. A continuous kinetic assay for protein and DNA methyltransferase enzymatic activities. Epigenetics and Chromatin. 2015;8(1):1-9. doi:10.1186/s13072-015-0048-y 163. Bar-Even A, Noor E, Savir Y, et al. The moderately efficient enzyme: Evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry. 221 2011;50(21):4402-4410. doi:10.1021/bi2002289 164. Kitagawa M, Ara T, Arifuzzaman M, et al. Complete set of ORF clones of Escherichia coli ASKA library (A complete set of E. coli K-12 ORF archive): unique resources for biological research. DNA Res. 2005;12(5):291-299. doi:10.1093/dnares/dsi012 165. Kamat SS, Bagaria A, Kumaran D, et al. Catalytic mechanism and three- dimensional structure of adenine deaminase. Biochemistry. 2011;50(11):1917-1927. doi:10.1021/bi101788n 166. Heidelberg JF, Paulsen IT, Nelson KE, et al. Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nat Biotechnol. 2002;20(11):1118-1123. doi:10.1038/nbt749 167. Luo Y, Cobb RE, Zhao H. Recent advances in natural product discovery. Curr Opin Biotechnol. 2014;30:230-237. doi:10.1016/j.copbio.2014.09.002 168. Hegemann JD, Zimmermann M, Zhu S, Klug D, Marahiel MA. Lasso peptides from proteobacteria: Genome mining employing heterologous expression and mass spectrometry. Biopolymers. 2013;100(5):527-542. 
9 Appendix 1: Supplemental information for Chapter 2

Table 9.1 Sequences of primers, genes, and proteins in this study.
Precursors, with few exceptions, are named by the first letter of the encoding organism’s genus, followed by two letters of the species, ‘M’ signifying the methyltransferase domain, and ‘A’ as recommended for RiPP precursors.48
A: ‘Primers’ lists all primer names and sequences used in this study.
B: ‘Genes’ lists the cloning vectors, flanking restriction sites, and full coding regions of all genes expressed in this study.
C: ‘Protein sequences’ lists the complete protein names, protein IDs, coding sequences, and producer organisms for all putative borosins described in this study.
D: ‘Protein sequences for alignment’ gives the protein sequence boundaries used in the sequence alignments.
The remaining tabs are available in supplemental table S1 of the online version of this paper. They provide genomic information available on JGI, including InterProScan matches, for the identified open reading frames of all gene clusters depicted in Figure 9.2; borosin precursor information is highlighted in light yellow, and gene annotations for the five genes upstream and downstream are highlighted in light orange.
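The naming convention described in the caption can be sketched in a few lines of Python. This is only an illustration: the function name `precursor_name`, its `suffix` parameter, and the choice of the first two letters of the species name are assumptions made here, and the organism names in the usage lines are assumed producers that are not stated in this table. Names that are exceptions to the rule (the caption notes there are a few) would not be reproduced by this sketch.

```python
def precursor_name(genus: str, species: str, suffix: str = "") -> str:
    """Build a precursor name from genus and species (hypothetical helper).

    Rule taken from the caption: first letter of the genus, two letters of
    the species, 'M' for the methyltransferase domain, 'A' for the RiPP
    precursor. Assumption: the two species letters are the first two.
    The optional suffix numbers multiple precursors from one organism.
    """
    prefix = (genus[0] + species[:2]).lower()
    return f"{prefix}MA{suffix}"


# Assumed producer organisms, for illustration only.
print(precursor_name("Lentinula", "edodes"))
print(precursor_name("Phlebiopsis", "gigantea", "1"))
```

Under these assumptions the two calls give `ledMA` and `pgiMA1`, which match gene names listed in part B of the table.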
A Primers used in this study

Primer name              Primer sequence (5' to 3')
Fwd_SGI_Order_GibsonPCR  AATTTTGTTTAACTTTAAGAAGGAGATATACCATGGGTAGCAGTC
Rev_SGI_Order_GibsonPCR  CTGTTCGACTTAAGCATTATGCGGCCG
T7_fw                    TAATACGACTCACTATAGGG
T7_rv                    GCTAGTTATTGCTCAGCGG
Rev_DuetDown1            GATTATGCGGCCGTGTACAA
Phlgi232Exon2seqfw       CAGAGGACTACATGTTT
GibPhlgiNMT232.1F        CCGCGCGGCAGCCATATGTCTTCCGCTTCAAGTGACTCG
PhlgiNMT232.1R           CACAGCGTTCAACATTGTCTCGGCCATCTGGACGTACG
PhlgiNMT232.2Fnew        CGTACGTCCAGATGGCCGAGACAATGTTGAACGCTGTG
PhlgiNMT232.2R           GGAGCTTGACCTTGGAGTTCTCAAAGTCGATCTTAGAGACACCG
PhlgiNMT232.3abF         CGGTGTCTCTAAGATCGACTTTGAGAACTCCAAGGTCAAGCTCCTAG
PhlgiNMT232.3aR          CATGCAAGGAATGCCTGCGAC
PhlgiNMT232.3bR          GCAAATACAAAAATGAACACGACATGGTCGGGGACGGG
GibPhlgiNMT232.3aR       CGGAGCTCGAATTCGGATCCTTACATGCAAGGAATGCCTG
PhlgiNMT232B.4fnew       CCCGTCCCCGACCATGTCGTGTTCATTTTTGTATTTGC
GibPhlgiNMT232.4bR       CGGAGCTCGAATTCGGATCCTTATTTTACAAGGTCGAATTTCTTCTGTACAATTTTGC
PhlgiNMT232.1F           ATGTCTTCCGCTTCAAGTGACTCG
PhlgiNMT232.4bR          CTATTTTACAAGGTCGAATTTCTTCTGTACAATTTTGC
prFM1118                 GACCATGTAGCGTTCGCTGTGCCCGTCCCCGACCATGTCGCAGGCATTCCTTGCATGTAA
prFM1119                 AGATGCAGAGGTAGGAGTGCCGAAAGCAACATGGTCGAGGGACGCGGGA
prFM1116                 GGCACTCCTACCTCTGCATCT
prFM1117                 GAACGCTACATGGTCGAGGGATGCAGGGACCGGAGCGGCGAACGCTACGTGGTCGAGGGA
ledA_fwd                 ATGGAGACTCCTACCTTAAAC
ledA_rev                 TCAGGCGCTACTAACAACAG
Boro78AF-YVQMAE          TAYRTNCARATGGCNGA
Boro78AF-YTQMAE          TAYACNCARATGGCNGA
Boro78AF-YVQMSE          TAYRTNCARATGTCNGA
Boro78BF-YVQMC*SE_711    TAYRTNCARATGWGYGA
Boro78BF-YTQMSE_715      TAYACNCARATGTCNGA
Boro78BF-YTQMC*SE_714    TAYACNCARATGWGYGA
96-FYGHPG-F              TTYTAYGGNCAYCCNGG
Boro121R-VVHI*MG*A       CNATRTARTGNACNAC
Boro121R-VVHYVG*A        CNACRTARTGNACNAC
Core1-VAVVGV-R1          ACNCCNACNACNGCRAC
Core1-VAVVGV-R2          ACNCCNACNACNGCYAC
Core2-VGAVAV-R1          ACNGCRACNGCNCCNAC
Core2-VGAVAV-R2          ACNGCYACNGCNCCNAC
GyfWlk_R                 CGATAGCTCGCTGTGAAGGAT
GyfWlk_F                 CGAGCACGAATATGGCGCTGAT
GyfWlk2_R                GCTTGGTAACCCTCACTTT
GyfWlk2_F                AATTCAAAATTTGGGGTACTTCTC
GyfInt_F                 ATCCTTCACAGCGAGCTATCG
GyfInt_R                 ATCAGCGCCATATTCGTGCTCG
GymWalk_R                ATCCTTCACAGCGAGCTATCG
GymWalk_F                ATCAGCGCCATATTCGTGCTCG
GymA-Exon1_F2            CCGCGCGGCAGCCATATGCAAAGCTCTACCCAA
GymA-Exon1_R2            ACCTCGGCCATCTGTATATA
GymA-Exon2_F             TATATACAGATGGCCGAGGTCATGCTGAGGGA
GymA-Exon2_R             GAGAAGTACCCCAAATTTTGAATTATTGAATTTTACAAACGTGAAGTCTGC
GymA-Exon3_F2            AATTCAAAATTTGGGGTACTTCTC
GymA-Exon3_R2            CGGAGCTCGAATTCGGATCCAAAACCTACTGGTACACAAGT

B Gene sequences

Gene name    Expression vector used    Flanking restriction sites in vector    Full coding sequences of genes expressed in this manuscript
aboMA    pET28b    NcoI, HindIII    See cell below
ATGGGTAGCAGTCATCACCACCACCATCATTCAAGCGGCTTAGTTCCTCGTGGTAGCATGTC ATCACCGGCGGTTGAAACCAAAGTTCCGGCATCACCTGATGTTACCGCAGAAGTGATTCCT GCACCTCCTAGCAGCCATCGTCCGTTACCTTTTGGTTTACGTCCGGGTAAACTGGTGATTGT TGGTAGCGGCATTGGCAGCATTGGCCAGTTCACCTTATCAGCAGTTGCGCACATCGAACAG GCAGATCGTGTGTTCTTTGTGGTGGCAGATCCGGCAACCGAAGCGTTCATTTACAGCAAGA ACAAGAACAGCGTGGACCTGTACAAGTTCTACGACGACAAGAAGCCGCGCATGGACACCT ACATCCAGATGGCAGAAGTGATGCTGCGTGAACTGAGAAAAGGCTATAGCGTGGTGGGCG TGATCTACGGTCATCCTGGCGTGTTTGTTACTCCGTCACATCGTGCAATCAGTATTGCGCGC GATGAGGGCTATAGCGCGAAAATGCTGCCTGGTGTTAGCGCAGAAGATAACCTGTTTGCGG ATATTGGCATCGACCCGTCACGTCCTGGCTGTCTGACCTATGAAGCGACTGATTTACTGCTG CGTAATCGTACCTTAGTTCCGAGCAGCCACCTGGTGCTGTTCCAGGTTGGCTGTATTGGTCT GAGCGATTTTCGCTTCAAAGGCTTCGACAACATCAACTTCGACGTGCTGCTGGACCGCCTG GAACAGGTGTATGGTCCGGATCATGCGGTTATTCACTATATGGCAGCGGTTTTACCGCAGA GCACCACCACCATTGATCGCTACACCATCAAGGAGCTGCGTGATCCTGTGATCAAAAAACG CATCACCGCGATCAGCACCTTCTACTTACCGCCGAAAGCACTGTCACCGCTGCACGAAGAA TCAGCAGCGAAATTAGGCCTGATGAAAGCGGGCTACAAGATCCTGGATGGTGCACAAGCG CCTTATCCGCCTTTTCCTTGGGCTGGTCCTAATGTTCCGATTGGTATTGCGTATGGTCGTCGT GAACTGGCAGCGGTGGCGAAACTGGATAGCCATGTTCCTCCGGCAAACTATAAACCTTTAC GTGCGAGCAATGCGATGAAGAGCACCATGATCAAGCTGGCGACCGACCCGAAGGCATTTG CACAATATAGCCGCAATCCGGCATTACTGGCGAATAGTACTCCGGGCTTAACTACCCCGGA GCGTAAAGCGTTACAAACCGGATCACAGGGCTTAGTGCGTTCAGTGATGAAGACTTCACCG GAGGATGTGGCGAAGCAGTTTGTTCAGGCAGAACTGCGTGATCCGACCCTGGCAAAACAGT ATAGCCAGGAATGCTACGACCAAACCGGCAATACCGATGGTATTGCGGTGATCAGCGCGTG GCTGAAAAGCAAAGGCTACGATACTACTCCGACCGCGATCAATGATGCATGGGCGGATATG
CAGGCGAACTCACTGGATGTGTATCAGAGCACCTACAACACGATGGTGGATGGCAAAAGC GGTCCGGCAATCACCATCAAAAGCGGCGTGGTGTATATCGGCAATACCGTCGTGAAGAAGT TTGCGTTCAGCAAGAGCGTGCTGACTTGGAGCAGCACCGATGGTAATCCGTCATCAGCAAC CCTGTCATTTGTTGTGCTGACCGACGATGATGGTCAACCTCTGCCTGCGAACAGCTACATTG GTCCGCAGTTTACcGGtTTTTACTGGACCTCAGGTGCAAAACCAGCAGCGGCGAATACCTTA GGCCGTAATGGCGCATTTCCGTCAGGTGGTGGTGGTGGTTCAGGTGGTGGTGGTGGTTCAA GTTCACAAGGTGCAGATATTTCAACCTGGGTGGACAGCTACCAGACCTACGTaGTGACCACT GCGGGTTCATGGAAAGACGAAGATATTCTGAAGATCGACGACGATACCGCGCACACCATC ACCTATGGCCCGCTGAAGATCGTGAAGTATTCACTGAGCAATGATACCGTTAGCTGGAGCG CGACCGATGGTAACCCGTTCAACGCGGTGATCTTCTTCAAGGTGAATAAACCGACCAAAGC GAATCCGACCGCAGGCAACCAGTTTGTGGGCAAGAAGTGGTTACCGTCAGATCCTGCACCA GCAGCAGTTAATTGGACCGGCCTGATTGGTTCAACCGCAGATCCGAAAGGTACAGCAGCAG CAAATGCAACCGCATCAATGTGGAAGAGCATCGGCATCAATCTGGGTGTGGCAGTTAGCGC GATGGTTTTAGGTACTGCGGTGATCAAGGCGATTGGTGCAGCATGGGATAAAGGTAGCGCA GCGTGGAAAGCGGCAAAAGCAGCAGCGGATAAAGCGAAAAAAGACGCAGAAGCAGCGGA AAAAGATAGCGCGGTGGACGACGAGAAATTCGCGGACGAAGAACCTCCTGATCTGGAgGAg CTGCCGATTCCTGATGCAGATCCGCTGGTGGATGTTACCGATGTGGATGTGACCGATGTGG ATGTTACCGACGTcGAcGTTACCGATGTGGACGTGACCGATGTGGATGTGACCGATGTGGAT GTGACCGATGTCGACGTGACCGATGTTGATGTGACCGATGTGGATGTGGTGGATGTGCTGG ATGTTGTGGTGATCTAA
badMA    pETDUET-1    NcoI, HindIII    See cell below
ATGGGTAGCAGTCATCACCACCACCACCACCATCATGCATCACACATGAGCACCACCACCA GCAATAATGCGGGCTCACTGACTATTGCGGGTAGCGGTATTGCATCAGTTGCGCACATTAC CCTGGAAACCTTATCGCACATCCGCGAAGCGGACAAGGTGTTCTACATCGTGTGTGATCCG GCAACTGAAGCGTTCATTCACGATAACGCGAAAGCGGAGGCGGTGGATTTAACCGTGTACT ACGACACCAACAAAGCGCGCTATGACAGCTATGTGCAAATGGCGGAGGTGATGCTGCAAG ATGTTCGTGGCGGCAAAGATGTTCTGGGCATCTTCTATGGTCATCCTGGCGTGTTCGTTTCA CCGTCACATCGTGCGCTGGCTATTGCACGTAGCGAAGGCTATAAAGCGAAAATGCTGCCGG GTGTTAGCGCAGAAGATTACCTGTTCCTGGAGTTCGACCCGTCGGTTCATGGTTGTGCAACC TTTGAAGCGACCGAATTGTTACTGCGCGAAAAACCGCTGAACACCACCATGCACAACATCA TCTGGCAAGTTGGCGCGGTTGGCGTTGATGACATGGTTTTCACCAACAGCAAGCTGCACGT TCTGGTTGATCGTCTGGAGAAGGACTTCGGCCCGGAACATCAGGTTGTGCACTATATTGGT GCGGTTTTACCGGGTTCACGTACCGTGATGGATACCTTCACGGTGGCGGATCTGTGCAAAG
ATGATGTGGTGAAACAGTTCAACCCGTCGAGCACCCTGTACATTCCTCCGCGTAGCTTAGC GGCAAATTCAAGCGACATTGCAGCGTCATTAGGCGCAAAACCGGATCATCCGCTGGTTGAT CCGACCCTGTTTCCTCCTTTAAGATGGACCAAGTCAACCAGCCCTGAAGCACCTGCGTATGG TCCGCTGGAACAAGCAGCAGTTGCAGAATTAGCGAACCATAAAGTTCCGAGCCAGCACAA GGTTTTAGCGGCGTCACCGGCAATGCGCACCTTAGTTGCAGAACTGAATGTTGCGCTGCGC AAGAAATTAGCAGCAGACCCGAAGGCGTTTGCGGGTGGTAGAGAAGGTCTGACTGAAGTG GAGAAACTGGCAGTTGGTACTGGTAATGTGGGCACTATGGGCGCGGTTATGCGTGCATTAC CTGGTGGTGAACAAAGCACCGATATGGTTACTTCACCGGCGAGCATCGAGCAGCAATCACG TAGAGAAGCGTTCTTCCTGATCGTGCTGATTGTTTCAACCCGCATCCTG
ceuMA2    pETDUET-1    NcoI, HindIII    See cell below
ATGGGTAGCAGTCATCACCACCACCACCACCATCATGCGTCTCACATGGCAACCACCAAAA CTGGTTCACTGACCATTGCGGGTAGCGGTATTGCGAGCGTTGCGCATATTACCCTGGAAGT GCTGTCATATCTCCAGGAAGCGGACAAAATCTACTACGCCATCGTGGACCCGGTTACCGAA GCGTTCATCCAGGATAAGAGCAAAGGCCGCTGCTTTGATTTACGCGTGTACTACGACAAGG ACAAGATGCGCAGCGAAACCTACGTGCAGATGAGCGAGGTGATGTTACGCGATGTTCGTAG CGGCTATAATGTTCTGGCGATCTTCTACGGCCATCCGGGCGTGTTTGTTTGTCCGACTCATC GTGCGATCAGCATTGCAAGAAGCGAAGGTTATACCGCGAAGATGCTGCCGGGTGTTAGCGC AGAAGACTATATGTTCAGCGACATCGGCTTTGATCCAGCAGTTCCGGGTTGTATGACCCAG GAAGCGACCTCACTGCTGATTTACAACAAGCAGCTTGATCCGAGCGTGCACAACATCATCT GGCAAGTTGGCAGCGTTGGCGTGGACAATATGGTGTTCGACAACAAGCAGTTCCACCTGTT GGTGGATCACTTAGAGCGCGATTTCGGCAGCATCCACAAGGTGATCCACTATGTTGGCGCG ATTATGCCGCAATCAGCAACCGTGATGGACGAGTACACCATCAGCGATCTGCGCAAAGAAG ATGTGGTGAAGAAGTTCACCACCACCAGCACCCTGTACATTCCGCCTCGCGAAATTGCGCC TGTTGATCAGCGCATTATGCAAGCGCTGGAATTTAGCGGCAATGGTGATCGCTACATGGCG CTGTCACAATTACGTGGCGTTCATGCGCGCAATAGCGGTTTATGCGCTTATGGTCCGGCAGA ACAAGCAGCGGTGGATAAACTGGATCATCATACCCCTCCGGACGATTACGAAGTGTTACGT GCATCACCGGCGATTCGCCGCTTTACCGAAGATTTAGCGCTGAAACCGGATCTGCGTAGCC GTTACAAAGAAGATCCGCTGAGCGTGCTGGATGCAATTCCTGGCTTAACCAGCCAGGAGAA ATTCGCGCTGAGCTTCGATAAACCTGGTCCGGTGTACAAGGTTATGCGTGCAACTCCAGCG GCGATTGCAGCTGGTCAGGAACATTCACTTGACGAAATTGCGGGTTCAGCAGATAGCGAAT CACCGGGTGCGTTAGCAACCACCATCGTGGTGATTGTGCATATTTAA
cmaMA    pET24a    NdeI, NotI    See cell below
ATGGACCATCATCATCATCACCATCATCATGCGACCGCGAACCCGAAGGCGGGTCAGCTGA
CCATCGTTGGTAGCGGCATCGCGAGCATTAACCACATGACCCTGCAAGCGGTGGCGTGCAT TGAAACCGCGGACGTGGTTTGCTACGTGGTTGCGGATGGCGCGACCGAGGCGTTTATCCGT AAGAAAAACGAAAACTGCATTGACCTGTACCCGCTGTATAGCGAAACCAAGGAACGTACC GATACCTACATCCAGATGGCGGAATTCATGCTGAACCACGTGCGTGCGGGTAAAAACGTGG TGGGTGTGTTCTACGGTCATCCGGGCGTGTTTGTTTGCCCGACCCACCGTGCGATCTACATT GCGCGTAACGAGGGTTATCGTGCGGTTATGCTGCCGGGCCTGAGCGCGGAAGACTGCCTGT ATGCGGACCTGGGTATCGATCCGAGCACCGTGGGCTGCATTACCTACGAGGCGACCGATAT GCTGGTTTATAACCGTCCGCTGAACAGCAGCAGCCACCTGGTGCTGTACCAAGTGGGTATC GTTGGCAAGGCGGACTTTAAATTCGCGTATGATCCGAAGGAAAACCACCACTTTGGTAAAC TGATTGACCGTCTGGAGCTGGAATACGGCCCGGATCACACCGTGGTTCACTATATCGCGCC GATTTTTCCGACCGAGGAACCGGTTATGGAGCGTTTCACCATCGGTCAACTGAAGCTGAAA GAAAACAGCGATAAGATCGCGACCATTAGCACCTTTTACCTGCCGCCGAAGGCGCCGAGCG CGAAAGTGAGCCTGAACCGTGAGTTTCTGCGTAGCCTGAACATCGCGGACAGCCGTGATCC GATGACCCCGTTCCCGTGGAACCCGACCGCGGCGCCGTACGGTGAACGTGAAAAGAAAGT GATTCTGGAGCTGGAAAGCCATGTGCCGCCGCCGGGTTATCGTCCGCTGAAGAAAAACAGC GGCCTGGCGCAAGCGCTGGAAAAACTGAGCCTGGACACCCGTGCGCTGGCGGCGTGGAAG ACCGACCGTAAAGCGTACGCGGATAGCGTTAGCGGCCTGACCGACGATGAGCGTGATGCG CTGGCGAGCGGCAAGCATGCGCAGCTGAGCGGTGCGCTGAAAGAAGGTGGCGTGCCGATG AACCACGCGCAACTGACCTTCTTTTTCATCATTAGCAACCTGTAA
cmiMA    pET24a    NdeI, NotI    See cell below
ATGATCCACCACCATCATCACCACCACCATGGTGCGAGCCTGGCGAAGAAAGGCCAGCTGA CCATTGTGGGTAGCGGCATCGCGAGCATTAGCCACCTGACCCTGCAAGCTGTGAGCGCGAT CGAAAACGCGGACATTGTTTGCTACGTGGTTGCGGATGGTGCGACCGAGGCGTTCATCCGT AAGAAAAACCCGAACAGCCTGGACCTGTACCACCTGTATGGCGAAGACAAACAGCGTACC GATACCTACATCCAAATGGCGGAGTTTATGCTGATTCGTGTGCGTCAGGGTCAAAACGTGG TTGGCGTTTTCTATGGTCACCCGGGCGTGTTTGTTTGCCCGACCCACCGTGCGCTGTACATT GCGCGTAGCGAAGGTTATAAAGCGCGTATGCTGCCGGGTCTGAGCGCGGAGGACTGCCTGT TTGCGGACCTGGGTATTGATCCGAGCAGCGTGGGCTGCGTTACCTACGAAGCGACCGATCT GCTGGTGTTTAAACGTCCGATCAACCCGGCGAGCCACCTGGTTCTGTACCAAGTGGGTATT GTTGGCAAGAGCAACTTCAAATTTGACTATACCAGCGATGAGAACATCCACTTCACCAAGC TGCTGGACCGTCTGGAGGAAGCGTACGGTCCGGAACACAGCGTTACCCACTATATTGCGCC GCTGTTTCCGACCGAGGACCCGATCGCGGAGGAATATACCATTGCGCAACTGCGTCTGCCG
GAAATCCGTGATAAGATCCACACCATTAGCACCTTCTACGTGCCGCCGAAAACCAGCGAAA GCCTGATTTATGATGAGGTTCTGCTGGCGAGCCTGGGTGTGACCCACAAACCGAGCGTTCC GTATCCGTGGAACCCGGAGGCGACCCCGTATGGCCCGCGTGAAAAGAAAGCGATCGAACT GCTGGCGGAGCATGAACCGCCGAAGGGTTACCGTCCGCTGAAAGAACGTAGCGGCCTGCT GGCGGTGCTGGAGAAGCTGTGCCTGGAACCGCTGGAGATGAAGAAATACAACGAAGACCG TCAGGCGTATGCGGATGGTCTGAAAGGCCTGACCGAGAACGAAAAAGAGGCGCTGGTTAA AGGTGACCATCGTACCCTGGCGGGTGCGCTGAAAGTGGGTGATACCCCGACCAACCCGGCG GCGCTGGTTTTCACCTTTATCATTACCCGTCTGGATTAA
cmuMA    pETDUET-1    NcoI, HindIII    See cell below
ATGGGTAGCAGTCATCACCACCACCACCACCATCATGCATCACACATGCCTGCACCTCGTA AAGGCACTCTGACCATTGCTGGTAGCGGTATTGCGAGCATTGGCCATATTACCCTGGAAAC CCTGAGCCACATTCAGGGTGCGGACAAAATCCACTATGCGGTGACTGATCCGGCAACTGAA GCGTTCATCCTGGAGAAAAGCAAGGACAGCAGCAGCTGCTTTGATCTGGGCATCTACTACG ACAAGAACAAGATGCGCTACGAAACCTACGTGCAGATGTGCGAGGTGATGCTGCGCGATG TTAGAGGCGGCCATAATGTTCTGGGCATTTTTTACGGTCATCCGGGCGTGTTTGTTTCACCG ACCCATCGTGCGATTGCATTAGCGCGCGATGAAGGTTATACCGCAAAGATGCTGCCGGGTA TTAGCGCGGAAGACTATATGTTCAGCGATCTGGGCTTCGATCCGGCATTTCCTGGTTGTATG ACCCAGGAAGCGACCATCCTGTTAGTTCGTGGTCGTAAGTTAGATCCGAGCGTGCACAACA TCATCTGGCAAGTTGGCGGCGTTGGCGTTGATACTATGGTGTTCGATAATGCGAACTTCTAC ATCCTGGTGGACCGCCTGGAAGAGGATCTGGGCCCGGATCATAAAGTGGTGCATTACATTG GTGCGGTTTTACCGCAAAGCACCGCCGTGATCGATGAGTTCACGGTTGCGGGTCTGCGTAA AGAAGAAGTGGTGAAACAGATTACCACCGTGAGCACCTTCTACTTACCGCCGCGTACCCTG CTGCATGCAGATCAGGATATGGTGCAGAAACTGGGTCTGTCAGATAGCTTAGGCAAACGTG CGGTGCACGTGTATCCGCGCACTAAATGGATCAATGCGGAATCACCTTCACCTCCTGCGTAT GGTCCGTTTGAACGTGCAGCGGTGGATCGTTTAGCGGATCACACTATTCCGAGCAATCACC TGTTTCTGCGTGGTTCACAGGCGCTGCGTCAACTGATGACCGATCTGGCATTACAACCGACT TTACGTGCACGTTATGTGGCAGATCCGACCAGCGTGCTGGATGATGTGACTGGCATGTCAG CGGAAGAAACTTTTGCATTAACCCTGCGTCATCCTGCGCCGGTGTTCAAGGTGATGCGTGC AACTGGTGAAGCGATTGCAAATGGTGTTCCGACTTTAGGCGAAATCGCGGAAAGCGCGAAT AGCAGCATTGCGGGTAGCTCATGTGCACTGATTGGCTTCTTTGTGGTGGTTCTGGAA
cpeMA    pET24a    NdeI, NotI    See cell below
ATGCCGCACCATCATCACCACCACCACCACAGCACCACCCGTGGTAGCCTGACCCTGGCGG GTGCGGGCGTGACCAGCATTGGTCACCTGACCCTGCAGACCGTTGCGGCGATTGAAAACGC
GGACATCGTGTGCTATATTCTGAACGATCCGGTTACCGAGGCGTTCATCATTAAGAAAAAC CCGAACGTGTACGACCTGTATCAACTGTACGACGATGGCAAGCCGCGTATCGAAACCTACC ACCAGATGGTTGAGGTGCTGATGAGCAAAGTTCGTAGCGGTCAAGACGTGGTTGGCCTGTT CACCGGTCATCCGGGCGTGGTTGATACCCCGGCGGCGCAGGCGTTTAAGATTGCGCGTCAA GAAGGTTATACCGCGCGTATGCTGCCGGGCATTACCACCAACGATGCGCTGCTGGCGGACC TGGTGGCGGATCCGGCGCTGGGTGGCGCGATGGCGTACGAGGCGACCGATTTCCTGAACAA CAACCGTGTTCTGCACCCGCAGATGAACGTGTTTATCCAGCAAGTTGGTGTGGTTGGCAAC AAACACTTCAACTTTATGGAAATGCGTAGCAGCCTGCTGGACAAGCTGATTGATCGTCTGG AGGAAACCTATGGTGGCGAAAAAGAGATCATTCACTACATCGCGCCGATGCTGCCGATTGA CAAGCCGGTGATGCAGAAAATGACCGTTAGCGATCTGAAGAAACCGGAATATAAGGCGAA AATCGTGCCGAGCAGCACCTTTTACATTACCCCGAACGAGCAACTGAGCAGCGTTCTGGAT AGCACCGAAGGTAAGAAACTGCATCGTGAGGCGATGAGCGCGCTGGCGAACCACACCCAC GGCAAGAACTATGCGCCGATGAAAGAGAACCTGGCGCTGACCGAAGCGCTGGAGCGTCTG GCGCTGGAACCGAAGAGCCTGGAGGCGTATCGTAGCGACCCGCAAAGCTACGTGAACGAA AACGGTCGTGGCCTGACCGAGGAAGAGCGTAAAGCGCTGGTTACCGGTCGTGGCATCCGTG AGCTGCTGAGCGATGGTCCGGTGGCGGCGCACCGTATTGCGCCGCTGGCGCTGGTTTAA
gesMA    pET24a    NdeI, NotI    See cell below
ATGAGCCATCATCACCACCACCACCACCACGTGCAGCCGCAAAGCAGCGCGAAGAAAGGT GGCCTGGTGGTTGTGGGTAGCGGCATTCGTAGCGTGAGCCAGCTGACCCTGGAAGCGGTTA TGCACATCGAGAAGGCGGACACCGTTCTGTACTGCGTGTGCGATCCGAGCACCGAAGGTTT CATTAAGCGTAAAAACAAGAACGCGATCGACATTTATGGCTACTATAGCGACCTGAAGGAG CGTCCGGATGCGTTTGTTCAAATGGCGGAAGTGATCCTGCGTGAGGTTCGTAAAGGTATTA ACGTTGTGGCGGTGTTCTACGGTCACCCGGGCATTTTTGTTCATCCGAGCCGTCGTGCGCTG GCGATTGCGAAGAAAGAAGGTTATGCGGCGCGTATGCTGCCGGGTATCAGCGCGGAGGAC TGCCTGTTCGCGGATCTGCTGGTGAACCCGAGCTTTCCGGGTGCGCAGCTGGTTGAGGCGA GCGATATTGTTTATCGTGCGCGTCCGCTGGCGACCAGCTGCCACGTTGTGATCTTCCAAGCG GCGTGCTTTGGTCACTGGAAATATAACTTCACCGCGTTTGAGAACGGCAAGTTCGACCACC TGGTTAACCGTCTGCAGAAAGACTACGGTCCGGATCACCCGATCGTGAGCTATATGGCGGC GGTTAGCCCGCTGGAAGATCCGGTGATCAACCGTCACACCATTAGCGACCTGTACAAGGCG GATGTTAAGAAAGAGATCACCCCGAACTGCACCCTGTATATTCCGCCGAAGGACCTGCTGC CGATCAGCCCGGCGGGTGAACTGATCATTCTGGGTCATCAAGCGGGTCCGGATGAGACCCC GAAGTTCCCGCCGCTGCCGATTCACCACTACCTGGCGCCGGAGGAAGAGACCTATGGTCCG
CAAGAAACCAGCGCGGTTGCGGCGCTGGAGAAAGGTGCGATCAGCGCGGACTACCGTCCG TATTGCGCGAGCCCGGCGATGCAGAAGGTGACCGAAAGCCTGAGCCTGGATCCGGAAGTG CTGAAAACCTACCGTGAAAGCCCGCAAGCGTTTGCGGAGAGCATTCCGGGTCTGGAAGCGC GTGAGGTGAAAGCGCTGGCGAGCGGTAGCCCGGTTAAGATCCACGACAGCATGTGGGTGG AAGGTAAAAGCGAGGTTCGTTGGTAA
gjuMA    pETDUET-1    NcoI, HindIII    See cell below
ATGGGTAGCAGTCATCACCACCACCACCACCATCATGCGTCTCACATGGCAACTCCGATTG CAACTACCACCAATACTCCGACCAAAGCGGGTAGCCTGACCATTGCTGGTAGCGGTATTGC GTCAGTTGGCCATATTACTCTGGAAACCCTGGCGTACATCAAGGAGAGCCACAAGGTGTTC TACCTGGTGTGTGACCCGGTTACTGAAGCGTTCATCCAGGAAAACGGTAAAGGCCCGTGCA TCAATCTGAGCATCTACTACGACAGCCAGAAAAGCCGCTACGACAGCTATCTCCAGATGTG CGAGGTGATGCTGCGTGATGTGAGAAATGGCCTGGATGTGCTGGGTGTGTTTTATGGTCAT CCGGGCGTGTTTGTTTCACCGAGCCATCGTGCTATTGCGTTAGCGCGCGAAGAAGGCTTTAA TGCGAAGATGCTGGCTGGTGTTAGCGCGGAAGATTGCTTATTCGCTGACCTGGAGTTCGAT CCGGCAAGTTTTGGCTGTATGACCTGTGAAGCGAGCGAACTGCTGATTCGCAATCGCCCGT TAAACCCGTACATCCATAACGTGATCTGGCAAGTTGGCAGCGTTGGCGTGACTGACATGAC CTTCAACAACAACAAGTTCCCGATCCTGATTGACCGCCTGGAGAAGGATTTCGGCCCGAAC CATACCGTGATCCATTATGTTGGTCGCGTTATTCCGCAGAGCGTGAGCAAGATCGAAACCTT CACCATTGCGGACCTGAGGAAAGAAGAGGTGATGAACCACTTCGACGCGATCAGCACCCT GTATGTTCCTCCTCGCGACATTAGCCCTGTTGATCCGACTATGGCGGAAAAATTAGGTCCGA GCGGCACTAGAGTTGAACCCATCGAAGCGTTTCGTCCGAGCCTGAAATGGTCAGCACAAAA CGACAAACGCAGCTACGCGTATAACCCGTACGAGAGCGATGTTGTGGCGCAACTGGACAA CTATGTTACCCCTGAAGGCCATCGCATTTTACAGGGTTCACCGGCGATGAAGAAGTTCCTG ATCACCTTAGCAACCTCACCGCAGCTGCTGCAAGCATATCGCGAAAATCCGAGCGCGATTG TTGATACGGTTGAAGGCCTGAATGAGCAGGAGAAGTACGGCCTGAAACTGGGTAGCGAAG GTGCGGTTTATGCACTGATGTCACGTCCTACTGGCGATATTGCACGCGAGAAAGAACTGAC CAACGACGAGATCGCGAACAATCATGGTGCGCCGTATGCGTTTGTTAGCGCGGTGATTATT GCGGCGATTATTTGCGCGCTTTAA
gymMA1    pET28b    NcoI, BamHI    See cell below
ATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCCTGGTGCCGCGCGGCAGCCATA TGCAAAGCTCTACCCAAAAGCAAGCAGGCTCCCTCACCATTGTTGGTTCCGGAATTGAGAG CATCAGCCAGATCACTCTCCAGTCTCTCTCTCACATTGAGGCTGCCTCCAAGGTCTTTTACT GTGTTGTAGACCCTGCGACTGAGGCATATCTCCTTGCCAAGAACAAGAACTGTGTTGACCT CTATCAGTACTATGATAATGGCAAGCCAAGGATGGACACCTATATACAGATGGCCGAGGTC
ATGCTGAGGGAAGTTCGCAATGGTCTCGACATTGTTGGTGTATTCTATGGCCACCCAGGTGT ATTTGTGAATCCTTCACAGCGAGCTATCGCAATTGCCAAAAGTGAGGGTTACCAAGCAAGG ATGCTCCCAGGCATATCTGCTGAGGACTGTCTCTTTGCTGACTTGGGAATTGATCCTTGCAA CCCTGGCTGTGTTAGCTATGAGGCATCAGACTTCCTTATCAGAGAGAGGCCAGTCAATGTCT CCAGCCACTTTATTCTTTGGCAAGTTGGATGCATTGGTGTTGCAGACTTCACGTTTGTAAAA TTCAATAATTCAAAATTTGGGGTACTTCTCGACCGGCTCGAGCACGAATATGGCGCTGATC ATACAGTTGTGCACTATATCGCAGCCGTGTTGCCTTACGAGAATCCAGTGATTGACAAACTC ACCATCAGCCAGCTCCGTGACACCGAGGTCGCGAAGCGTGTGAGTGGTATATCGACCTTCT ATATCCCTCCAAAGGAGCTAAAGGACCCGAGCATGGATATCATGCGCCGCCTAGAACTTCT GGCTGCTGACCAAGTTCCAGACAAGCAATGGCACTTCTACCCAACAAACCAGTGGGCACCG TCTGCACCCAACGTAGTTCCTTATGGACCAATAGAACAAGCCGCCATTGTCCAGTTGGGCA GTCACACCATTCCAGAGCAATTTCAGCCTATTGCTACTTCCAAAGCTATGACTGACATCTTG ACAAAGCTGGCTTTGGACCCCAAGATGCTCACTGAGTACAAGGCTGACCGTCGTGCCTTTG CTCAATCTGCACTGGAGTTGACAGTCAATGAGAGAGATGCTTTGGAGATGGGGACTTTCTG GGCACTCCGCTGTGCTATGAAGAAGATGCCTTCATCTTTCATGGATGAAGTTGATGCCAATA ATTTACCAGTTGTTGCTGTTGTAGGAGTTGCTGTCGGCGCCGTAGCTGTCACCGTGGTCGTG TCACTCAATGACCTGACTGACAGTGTCAATTGA
ledMA    pET24a    NdeI, NotI    See cell below
ATGGAGCACCATCATCACCACCACCACCACACCCCGACCCTGAACAAAAGCGGCAGCCTGA CCATCGTGGGTACCGGCATCGAGAGCATTGGTCAGATGACCCTGCAAACCCTGAGCTACAT TGAAGCGGCGGACAAGGTGTTCTATTGCGTTATCGATCCGGCGACCGAAGCGTTTATTCTG ACCAAGAACAAAGACTGCGTTGATCTGTACCAGTACTATGACAACGGCAAAAGCCGTATGG ATACCTATACCCAAATGAGCGAGGTGATGCTGCGTGAAGTTCGTAAGGGCCTGGACGTGGT TGGTGTGTTCTACGGTCACCCGGGCGTGTTTGTTAACCCGAGCCTGCGTGCGCTGGCGATTG CGAAGAGCGAGGGCTTCAAAGCGCGTATGCTGCCGGGTGTTAGCGCGGAAGACTGCCTGTA CGCGGACCTGTGCATCGATCCGAGCAACCCGGGTTGCCTGACCTATGAGGCGAGCGATTTT CTGATTCGTGAACGTCCGACCAACATCTACAGCCACTTCATTCTGTTTCAAGTGGGTTGCGT TGGCATCGCGGACTTCAACTTTACCGGCTTCGAGAACAGCAAATTTGGTATTCTGGTGGATC GTCTGGAGAAGGAATACGGCGCGGAGCACCCGGTGGTTCACTATATTGCGGCGATGCTGCC GCATGAAGACCCGGTTACCGATCAGTGGACCATTGGTCAACTGCGTGAGCCGGAATTCTAC AAACGTGTGGGTGGCGTTAGCACCTTTTATATCCCGCCGAAGGAGCGTAAAGAAATTAACG TGGACATCATTCGTGAGCTGAAGTTCCTGCCGGAAGGCAAAGTTCCGGATACCCGTACCCA
GATCTATCCGCCGAACCAATGGGAACCGGAAGTGCCGACCGTTCCGGCGTACGGTAGCAAC GAGCATGCGGCGATTGCGCAGCTGGATACCCACACCCCGCCGGAACAGTATCAACCGCTGG CGACCAGCAAGGCGATGACCGACGTGATGACCAAGCTGGCGCTGGATCCGAAAGCGCTGG CGGAATACAAGGCGGACCACCGTGCGTTCGCGCAAAGCGTTCCGGATCTGACCGCGAACG AGCGTACCGCGCTGGAAATCGGCGACAGCTGGGCGTTTCGTTGCGCGATGAAAGAGATGCC GATTAGCCTGCTGGATAACGCGAAGCAGAGCATGGAGGAAGCGAGCGAACAAGGTTTTCC GTGGATCATTGTGGTTGGTGTGGTTGGCGTGGTTGGTAGCGTGGTTAGCAGCGCGTAA
mroMA1    pET24a    NdeI, NotI    See cell below
ATGGCGCACCATCATCACCACCACCACCACCTGAAGAAACCGGGTAGCCTGACCATTGCGG GTAGCGGCATTGCGAGCATCGGCCACATTACCCTGGAGACCCTGGCGCTGATCAAGGAAGC GGACAAAATTTTCTACGCGGTTACCGATCCGGCGACCGAGTGCTATATCCAGGAAAACAGC CGTGGTGACCACTTCGATCTGACCACCTTTTACGACACCAACAAGAAACGTTACGAGAGCT ATGTGCAAATGAGCGAAGTTATGCTGCGTGATGTGCGTGCGGGTCGTAACGTGCTGGGCAT TTTCTATGGTCATCCGGGCGTGTTTGTTGCGCCGAGCCACCGTGCGATTGCGATTGCGCGTG AGGAAGGTTTCCAGGCGAAGATGCTGCCGGGCATCAGCGCGGAGGACTACATGTTCGCGG ACCTGGGTTTTGATCCGAGCACCTATGGCTGCATGACCCAGGAAGCGACCGAACTGCTGGT TCGTAACAAGAAACTGGATCCGAGCATTCACAACATCATTTGGCAAGTTGGTAGCGTGGGC GTTGACACCATGGTGTTCGATAACGGCAAGTTTCACCTGCTGGTTGAGCGTCTGGAAAAGG ACTTCGGTCTGGATCACAAAATTCAGCACTACATCGGCGCGATTCTGCCGCAAAGCGTGAC CGTTAAAGACACCTTTGCGATCCGTGATCTGCGTAAGGAAGAGGTGCTGAAACAGTTCACC ACCACCAGCACCTTTTATGTGCCGCCGCGTACCCCGGCGCCGATTGACCCGAAAGCGGTTC AGGCGCTGGGTCTGCCGGCGACCGTGACCAAAGGTGCGCAGGATTGGACCGGCTTCCAAA GCGTTAGCCCGGCGTACGGCCCGGATGAGATGCGTGCGGTTGCGGCGCTGGATAGCTTTGT GCCGAGCCAGGAAAAAGCGGTGGTTCACGCGAGCCGTGCGATGCAAAGCCTGATGGTTGA TCTGGCGCTGCGTCCGGCGCTGCTGGAGCAGTATAAAGCGGATCCGGTGGCGTTTGCGAAC ACCCGTAACGGTCTGACCGCGCAAGAAAAATTCGCGCTGGGTCTGAAGAAACCGGGCCCG ATCTTTGTGGTTATGCGTCAGCTGCCGAGCGCGATTGCGAGCGGTCAGGAACCGAGCCAAG AGGAAATCGCGCGTGCGGACGATGCGACCGCGTTTATCATTATCTACATCGTGCAAGGCTA A
pgiMA1    pET28b    NcoI, BamHI    See cell below
ATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCCTGGTGCCGCGCGGCAGCCATA TGTCTTCCGCTTCAAGTGACTCGAATACCGGCAGCTTGACCATCGCTGGTTCAGGCATTGCT AGTGTCCGTCACATGACGCTCGAGACTCTCGCTCACGTTCAGGAGGCCGACATCGTGTTCTA CGTCGTTGCAGACCCTGTTACGGAGGCGTACATCAAAAAGAACGCTAGAGGTCCTTGCAAG
GATCTCGAGGTCTTATTCGACAAGGACAAGGTACGGTACGATACGTACGTCCAGATGGCCG AGACAATGTTGAACGCTGTGAGGGAGGGTCAAAAAGTGCTTGGCATATTCTACGGCCACCC CGGTGTCTTTGTTTCtCCCTCTCGGCGCGCATTGTCTATCGCTCGAAAGGAAGGCTACCAGG CTAAAATGTTGCCGGGTATCTCTTCAGAGGACTACATGTTTGCTGACCTCGAATTTGACCCG GCTGTACACGGCTGCTGCGCCTACGAGGCTACCCAACTCCTCTTGCGAGAAGTTTCTCTTGA TACAGCGATGAGCAACATCATCTGGCAGGTCGGCGGcGTCGGTGTCTCTAAGATCGACTTTG AGAACTCCAAGGTCAAGCTCCTAGTCGACCGACTGGAGAAGGACTTTGGTCCTGACCACCA CGTCGTGCATTACATAGGCGCAGTACTTCCCCAGTCCGCAACTGTTCAGGACGTGCTGAAG ATTTCCGATCTTCGCAAAGAGGAAATCGTTGCTCAATTCAACTCGTGCTCTACTCTCTATGT CCCACCGCTcACACATGCTAACAAGTTCTCCGGTAACATGGTCAAGCAGCTCTTTGGTCAGG ACGTGACCcAGGTCTCCTCAGCTCTGTGTCCCACGCCCAAGTGGGCTGCCGGGTCTCATCTC GGCGATGTTGTTGAGTACGGCCCTCGCGAGAAGGCTGCCGTCGATGCCCTGGTGGAGCACA CaGTTCCGGCcGATTACCGTGTCCTCGGCGGCTCGCTCGCTTTCCAGCAGTTCATGATCGACC TCGCCCTCCGTCCCGCAATCCAAGCGAACTACAAAGAGAACCCTCGCGCGCTCGTGaACGC GACCAAAGGCCTCACAACTGTCGAaCAGGCCGCGCTGTTGCTTCGCCAGCCcGGCGCCGTCT TCGGGGTCATGAAACTTCGCGCGAGCGAAGTGGCAAAtGAACAGGGtCACCCCGTCGtTCCC GCGTCCCTCGACCATGTTGCTTTCACCGCACCTTCCCCCGCGTCCCTTGACCATGTAGCTTTC TCTtCCCCAAACCtTGCGTCCCTCGATCAtGTCGCGTTCATTGCCCCTACCCCTGCATCCCTCG ATCATGTCGCATTCTCAGCCCCCACTCCCGCGTCCCTCGACCACGTATCGTTTGGAACTCCc ACCTCTGCATCTCTCGATCACGTCGCATTCGAGGCCCCCGTCCCTGCGTCCCTCGACCACGT AGCGTTCGCCGCTCCCGTCCCTGCATCCCTCGACCATGTAGCGTTCGCCGCCCCCACCCCGG CATCTCTCGACCATGTAGCGTTCGCTGCCCCTACCCCTGCATCCCTCGACCATGTGGCATTC GCCGTGCCTGTTCCTGCATCCCTAGATCACATAGCATTCTCCGTCCCCACCCCTGCATCTCTC GACCACGTGGCTTTCGCTGTGtCCGTCCCCGACCATGTCGCAGGCATTCCTTGCATGTAAG
pgiMA1_mut    pET28b    NcoI, BamHI    See cell below
ATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCCTGGTGCCGCGCGGCAGCCATA TGTCTTCCGCTTCAAGTGACTCGAATACCGGCAGCTTGACCATCGCTGGTTCAGGCATTGCT AGTGTCCGTCACATGACGCTCGAGACTCTCGCTCACGTTCAGGAGGCCGACATCGTGTTCTA CGTCGTTGCAGACCCTGTTACGGAGGCGTACATCAAAAAGAACGCTAGAGGTCCTTGCAAG GATCTCGAGGTCTTATTCGACAAGGACAAGGTACGGTACGATACGTACGTCCAGATGGCCG AGACAATGTTGAACGCTGTGAGGGAGGGTCAAAAAGTGCTTGGCATATTCTACGGCCACCC CGGTGTCTTTGTTTCCCCCTCTCGGCGCGCATTGTCTATCGCTCGAAAGGAAGGCTACCAGG
CTAAAATGTTGCCGGGTATCTCTTCAGAGGACTACATGTTTGCTGACCTCGAATTTGACCCG GCTGTACACGGCTGCTGCGCCTACGAGGCTACCCAACTCCTCTTGCGAGAAGTTTCTCTTGA TACAGCGATGAGCAACATCATCTGGCAGGTCGGCGGTGTCGGTGTCTCTAAGATCGACTTT GAGAACTCCAAGGTCAAGCTCCTAGTCGACCGACTGGAGAAGGACTTTGGTCCTGACCACC ACGTCGTGCATTACATAGGCGCAGTACTTCCCCAGTCCGCAACTGTTCAGGACGTGCTGAA GATTTCCGATCTTCGCAAAGAGGAAATCGTTGCTCAATTCAACTCGTGCTCTACTCTCTATG TCCCACCGCTcACACATGCTAACAAGTTCTCCGGTAACATGGTCAAGCAGCTCTTTGGTCAG GACGTGACCcAGGTCTCCTCAGCTCTGTGTCCCACGCCCAAGTGGGCTGCCGGGTCTCATCT CGGCGATGTTGTTGAGTACGGCCCTCGCGAGAAGGCTGCCGTCGATGCCCTGGTGGAGCAC ACaGTTCCGGCcGATTACCGTGTCCTCGGCGGCTCGCTCGCTTTCCAGCAGTTCATGATCGAC CTCGCCCTCCGTCCCGCAATCCAAGCGAACTACAAAGAGAACCCTCGCGCGCTCGTGaACG CGACCAAAGGCCTCACAACTGTCGAaCAGGCCGCGCTGTTGCTTCGCCAGCCcGGCGCCGTC TTCGGGGTCATGAAACTTCGCGCGAGCGAAGTGGCAAAtGAACAGGGtCACCCCGTCGtTCC CGCGTCCCTCGACCATGTTGCTTTCGGcACTCCTACCTCTGCATCTCTCGATCACGTCGCATT CGAGGCCCCCGTCCCTGCGTCCCTCGACCACGTAGCGTTCGCCGCTCCgGTCCCTGCATCCC TCGACCATGTAGCGTTCGCTGTGCCCGTCCCCGACCATGTCGCAGGCATTCCTTGCATGTAA
pocMA    pETDUET-1    NcoI, HindIII    See cell below
ATGGGTAGCAGTCATCACCACCACCACCACCATCATGCGTCTCACATGCCGGTGAGTACTA CGACGACGAAAAATGGCACCCTGGTTATTGCGGGTAGCGGTATTGCGAGCATTGCGCACAT TACCCTGGAAACCCTGTCACACATCAAGGAATCAGATCGCGTGTACTACATCGTGGGTGAT CCGGCAACCGAAGCGTTCATCCAGGATAATGCATCAGGCACCTGTTTCGATCTGACCATCTT CTACGACACCAACAAGGTGCGCTACGACAGCTATGTGCAGATGTGCGAGGTGATGCTGCGT GATGTTAGAGCTGGTCATACCGTGCTGGGCGTGTTTTATGGTCATCCTGGCGTGTTTGTTTC ACCGAGCCATCGTGCTATTGCGATTGCGCGCGATGAGGGCTATAAAGCACGCATGTTACCT GGTGTTAGCGCGGAAGATTACCTGTTCGCGGATCTGGGCTTTGATCCGGCAACTCATGGTTG TACCAGCTATGAAGCGACCGATCTGTTAGTGCGCAACAAACCGCTGAATGCGAGCACCCAC AACATCATCTGGCAAGTTGGTGGCGTTGGTGTTGGTACGATGGTGTTCGATAACGCGAAGT TCCACCTGTTGGTGGATCGTCTGGAGAAGGATTTTGGTCCGAGCCATACGGTGGTGCACTA CATTGGTGCAGTGTTACCGCAGAGCATTACCACGATGGATAAACTGACCATTGCGGACCTG AGGAAGGATGCGGTGGTGAAACAGTTCAACCCGACCAGCACCTTCTATATTCCTCCTCGCG ACATTTCACTGCCTCTGGACACGATGGCGAAGAAACTGGGCATGGATGATGCATCAGCAAG ACCTGTGAGCTTATATCCGCCGTCACGTTGGACTGGCACCAAATTCACCACTGCACCGGCTT
ATGGTCCTCGCGAAAAAGATGTTATCGCGAAGATCGACACCTACGCAGCACCGAAAGACC ACAAGATCCTGCATGCGAGCCGTAGCATGAAAAAGCTGATGACCGATCTGGCGTTAAACCC GAAGCTGCTGGAGAAATATCGCGCGAACACCAAGGCGGTTGTTGAAGCAACTGAAGGCTT ATCAGCGCAGGAGAAAGCGGCATTAAACATGGACCTGGCTGGCCCGGTTCATGCAGTGATG AAAGCAACTCCGAGCGACATTACCGATGGTAGAGAAATGAGCGTTGACGCGGTTGCAAGC GCGACTGAACCTTCAGCGGCACTGATTCTGTTGCTTGTTTAA rviMA1 pET24a NdeI, NotI See cell below ATGAGCCATCATCACCACCACCACCACCACACCAAGCGTGGTACCCTGACCATCGCGGGTA GCGGCATTGCGAGCGTTGGTCACATCACCCTGGGCACCCTGAGCTACATCAAGGAAAGCGA CAAAATTTTCTATCTGGTGTGCGATCCGGTTACCGAGGCGTTTATCTACGACAACAGCACCG CGGACTGCTTCGATCTGAGCGTGTTTTATGACAAGACCAAAGGTCGTTACGATAGCTATATT CAAATGTGCGAAGTTATGCTGAAAGCGGTGCGTGCGGGTCATGATGTGCTGGGCGTTTTCT ACGGTCACCCGGGCGTGTTTGTTAGCCCGAGCCATCGTGCGATTGCGGTTGCGCGTCAGGA AGGTTACAAGGCGAAAATGCTGCCGGGCATTAGCGCGGAAGACTATATGTTCGCGGACCTG GAGTTTGATCCGAGCGTGAGCGGTTGCAAGACCTGCGAAGCGACCGAGATCCTGCTGCGTG ACAAACCGCTGGATCCGACCATTCAGAACATCATTTGGCAAGTGGGTAGCGTTGGCGTGGT TGACATGGAATTCAGCAAGAGCAAATTTCAACTGCTGGTTGATCGTCTGGAGAAGGACTTC GGTCCGGATCACAAAGTGGTTCACTACATTGGTGCGGTGCTGCCGCAAAGCACCACCACCA TGGACACCTTCACCATTGCGGACCTGCGTAAGGAAGATGTTGCGAAACAGTTTGGTACCAT CAGCACCCTGTATATTCCGCCGCGTGACGAGGGCCACGTTAACCTGAGCATGGCGAAGGTG TTTGGTGGCCCGGGTGCGAGCGTTAAACTGAACGATAGCATCAAGTGGGCGGGCCCGAAAC TGAACATTGTGAGCGCGAACGACCCGCACGAACGTGATGTGATCGCGCAGGTTGATACCCA CGTGGCGCCGGAGGGTCACAAGAAACTGCGTGTTAGCGCGGCGATGAAGAAATTCATGAC CGACCTGGCGCTGAAGCCGAAATTTCTGGAGGAATATAAGCTGGATCCGGTGGCGGTGGTT GAAAGCGCGGAGGGCCTGAGCAACCTGGAACGTTTCGGTCTGAAGTTTGCGCGTAGCGGTC CGGCGGATGCGCTGATGAAAGCGACCGAGAGCGATATCGCGAGCGGTCGTCAGCTGACCG AGGAAGAGATTGCGCAGGGTACCGGTCCGGTTGGCCTGCAGACCGCGCTGGCGCTGCTGGT GCTGCTGGGTCTGGGCGTGGCGATTGTTACCCGTCCGGACGATTAA rviMA2 pET24a NdeI, NotI See cell below ATGACCCATCATCACCACCACCACCACCACACCGGTACCGAACGTGGTACCCTGACCATTG CGGGTAGCGGCATTGCGTGCGTTGCGCACATCACCCTGCAGATGCTGAGCTACATTAAGGA GAGCGACAAACTGTTCTATCTGGTGTGCGATCCGGTTACCGAAGCGTTTATCCAAGACAAC GCGACCGGTGACTGCTTCGATCTGAGCGTGTTCTACGACAAGAACAAAAGCCGTCACGATA 
GCTATATCCAGATGTGCGAAATTATGCTGCGTGCGGTGCGTGCGGATCACCATGTGCTGGG CGTTTTCTACGGTCACCCGGGCATCTTTGTGAGCCCGAGCTATCGTGCGATGGCGGTTGCGC GTGAGGAAGGTTACAAGGCGAAAATGCTGCCGGGCATTAGCACCGAGGACTACCTGTTCGC GGACCTGGAATTTGATCCGTGCCTGCCGGGTTGCAACACCTACGAGGCGACCGAACTGCTG CTGCGTGACCGTAGCCTGGATCCGAGCATTCACAACATCATTTGGCAGGTTGGTAGCGTGG 238 GCGTTATCGACATTCAATTCGAGAAGAGCAAATTTCACCTGCTGGTTGACCGTCTGGAAAA GGACTTCGGTCCGGATCACAAAGTGGTTCACTACATTGGTGCGGTTCTGCCGCAGAGCACC ACCACCATGGACACCTTCACCATTAGCGACCTGCGTAAAGAGGACGTGGCGAAACAATTTG GCACCATCAGCACCCTGTATATTCCGCCGCGTGATAAACCGCTGGCGCACCCGGGTATGGC GGAAGCGATTGGCAGCCTGACCGCGCCGGCGAAACTGTACAGCCCGGTGAAGTGGGCGGG TCCGAAACTGAACATTGTTAGCCCGTACAGCCCGTATGAGCGTGACGTGATCGCGCGTATT GATACCCACGTTGCGCCGGAAGGCCACAAGAAACTGTATACCAGCGCGGCGATGAAGAAA TTCATGACCGACCTGGCGCTGAAGCCGAAACTGCTGGAGGAATACATGCTGGATCCGGTGG CGGTGGTTGAGAGCGCGGACGGTCTGAGCGATGTTGAAAAGTTTGGTCTGAAGCTGGCGAA AGACGGCGTGGCGAACATCCTGATGATGGCGACCGAGAGCGACATTGCGAGCGGTCGTCA CCTGGCGGAGGATGAAATCGCGAAGGCGAAAGGTCCGCTGGGCCTGCTGACCGTGGTTCTG GTGATTGTTGGCAGCAGCCTGGTGGTTCACCGTCTGACCTAA sveMA pET24a NdeI, NotI See cell below ATGGCGCACCATCATCACCACCACCACCACAGCAGCACCCACCCGAAGCGTGGTAGCCTGA CCATTGCGGGTACCGGTATTGCGACCCTGGCGCACATGACCCTGGAGACCGTTAGCCACAT CAAGGAAGCGGACAAAGTTTACTATATTGTGACCGATCCGGTTACCCAGGCGTTCATCGAG GAAAACGCGAAGGGCCCGACCTTTGACCTGAGCGTGTACTATGACGCGGATAAATACCGTT ATACCAGCTACGTGCAAATGGCGGAAGTGATGCTGAACGCGGTGCGTGAAGGTTGCAACGT TCTGGGCCTGTTCTATGGTCACCCGGGCATCTTTGTGAGCCCGAGCCATCGTGCGCTGGCGA TTGCGCGTGAGGAAGGTTATGAGGCGCGTATGCTGCCGGGCGTTAGCGCGGAAGACTATAT GTTTGCGGACCTGGGTCTGGATCCGGCGCTGCCGGGCTGCGTGTGCTACGAGGCGACCAAC TTTCTGATCCGTAACAAACCGCTGAACCCGGCGACCCACAACATCCTGTGGCAGGTTGGTG CGGTGGGCATTACCGCGATGGATTTCGAGAACAGCAAGTTTAGCCTGCTGGTTGACCGTCT GGAACGTGATCTGGGTCCGAACCACAAAGTGGTTCACTATGTTGGTGCGGTGCTGCCGCAA AGCGCGACCATCATGGAAACCTATACCATTGCGGAGCTGCGTAAGCCGGAAGTTATCAAAC GTATTAGCACCACCAGCAGCACCTTCTACATCCCGCCGCGTGATAGCGAGGCGATTGACTA TGATATGGTGGCGCGTCTGGGTATCCCGCCGGAAAAGTACCGTAAAATTCCGAGCTATCCG 
CCGAACCAGTGGGCGGGTCCGAACTATACCAGCACCCCGGCGTATGGCCCGGAGGAAAAG GCGGCGGTTAGCCAACTGGCGAACCACGTGGTTCCGAACAGCTACAAAACCCTGCACGCGA GCCCGGCGATGAAGAAAGTGATGATCGACCTGGCGACCGATCGTAGCCTGTACAAGAAAT ATGAGGCGAACCGTGACGCGTTTGTTGATGCGGTGAAGGGTCTGACCGAGCTGGAAAAGGT GGCGCTGAAAATGGGTACCGACGGCAGCGTTTACAAGGTGATGAGCGCGACCCAGGCGGA TATCGAGCTGGGCAAAGAACCGAGCATTGAGGAACTGGAGGAAGGTCGTGGCCGTCTGCT GCTGGTTGTGATTACCGCGGCGGTGGTTGTGTAA C Protein sequences (for RNA coverage: sequencing from public data bases https://jgi.doe.gov/data-and-tools/mycocosm/) Protein name Originating organism Protein ID Partial or Full RNA coverage AboMA Anomoporia bombycina ATCC 64506 v1.0 1346513 yes MSSPAVETKVPASPDVTAEVIPAPPSSHRPLPFGLRPGKLVIVGSGIGSIGQFTLSAVAHIEQADRVF FVVADPATEAFIYSKNKNSVDLYKFYDDKKPRMDTYIQMAEVMLRELRKGYSVVGVIYGHPGVF VTPSHRAISIARDEGYSAKMLPGVSAEDNLFADIGIDPSRPGCLTYEATDLLLRNRTLVPSSHLVLF QVGCIGLSDFRFKGFDNINFDVLLDRLEQVYGPDHAVIHYMAAVLPQSTTTIDRYTIKELRDPVIK KRITAISTFYLPPKALSPLHEESAAKLGLMKAGYKILDGAQAPYPPFPWAGPNVPIGIAYGRRELAA VAKLDSHVPPANYKPLRASNAMKSTMIKLATDPKAFAQYSRNPALLANSTPGLTTPERKALQTGS QGLVRSVMKTSPEDVAKQFVQAELRDPTLAKQYSQECYDQTGNTDGIAVISAWLKSKGYDTTPT AINDAWADMQANSLDVYQSTYNTMVDGKSGPAITIKSGVVYIGNTVVKKFAFSKSVLTWSSTDG NPSSATLSFVVLTDDDGQPLPANSYIGPQFTGFYWTSGAKPAAANTLGRNGAFPSGGGGGSGGGG 239 GSSSQGADISTWVDSYQTYVVTTAGSWKDEDILKIDDDTAHTITYGPLKIVKYSLSNDTVSWSAT DGNPFNAVIFFKVNKPTKANPTAGNQFVGKKWLPSDPAPAAVNWTGLIGSTADPKGTAAANATA SMWKSIGINLGVAVSAMVLGTAVIKAIGAAWDKGSAAWKAAKAAADKAKKDAEAAEKDSAVD DEKFADEEPPDLEELPIPDADPLVDVTDVDVTDVDVTDVDVTDVDVTDVDVTDVDVTDVDVTD VDVTDVDVVDVLDVVVI* AgaMA1 Armillaria gallica 21-2 v1.0 1000654 no MPANKGTLTIAGSGIASIGHITLETLSYIQGADKVYYVITDPATEAFIQDKSEGDCFDLTVYYDKNK IRYETYVQMCEVMLRDVRADYNVVGVFYGHPGVFVSPSHRAIAIARDEGYRARMLPGVSAEDY MFSDLGFDPAVPGCMTQEATAMLNHNKKLDPSIHNIIWQVGAVGIDTMVFDNRKFHLLVDRLEE DFGPDHRVVNYIGAVLPQSTTVMDEFTIGDLRKEDVVKQFTTVSTFYVPPRTRAPVDQEAMQKFG PSDAPLAHTVRHLYPPSKWAGTQTSVVPAYGPCERAAVDRIADYTPPPDHMILRASPAIRQFMTD LALNPGLRDRYKADPVAVLDATPDLSTQEKFALSFDKPGPVYTVMRATPAAIASGQEPTFDDIAG ATESASPPLFVIT* AgaMA2 
Armillaria gallica 21-2 v1.0 622643 no MPANKKGTLTIAGSGIASIGHITLETLSYIQEADKVYYAITDPATEAFIQDKSEGDCFDLTVYYDKN KIRYETYVQMCEVMLRDVRADYNVVGVFYGHPGVFVSPSHRAIAIARDEGYRARMLPGVSAEDY MFSDLGFDPAVPGCMTQEATAMLNHNKKLDPSIHNIIWQVGAVGIDTMVFDNRKFHLLVDRLEE DFGPDHRVVNYIGAVLPQSTTVMDEFTIGDLRKEDVVKQFTTVSTFYVPPRTRAPVDQEAMQKFG PSDAPLVYPPSKWAGTQTFVVPAYGPCERAAVDRIADYTPPPDHMILRASPAIRQFMTDLALNPGL RDRYKADPVAVLDATPDLSTQEKFALSFDKPGPVYIVMRATPAAIASGQEPTFDDIAGATESASPP LFIIVQVPA* AolMA Arthrobotrys oligospora ATCC 24927 4309 no MSEGGKLILVGTGVRSLCQLTLEAIDEIERADVIYYAVRDATTEGFIKKRNKEAIDLYQYFINDEEI PEADIYIQIAEVMLAATRKGRRVVGAFFGHPGLFMSPNRRALAIAQAEGYTAKILPGVSVDDCLLA DLGVDPSFIGCLTCEARDFMIHDHLGLTSRHVIMYEVGYLGFYGDDSKTDYFEYFVNRLEEIYGNE HSLVNYTAAISPLMQPVINTLTIGDLRKPEVRKQITSASTLYFPPKEILKLNKFGCDLLDQGITNKEQ FQHAIFPGQPLYQLIGKALPHEAYSEHAQQVIAGLHRRKISPRYPLYRASAAMQSTMEDIYLKNEV RKEYLISPTSFTLRVVPGLKEMEKIALASGNYSQIDGAMKSGDLDQLTTGAIEIGNYKVILYSGYAI GYERATFAIADFTNFSFFNIY* AosMA Armillaria ostoyae C18/9 252778 no MPANKKGTLTIAGSGIASIGHITLETLSYIQEADKVYYAITDPATEAFIHDKSKGDCFDLSVYYDKN KNRYETYVQMCEVMLRDVRADYNVLGVFYGHPGVFVSPSHRAIAIARDEGYRARMLPGVSAED YMFSDLGFDPAVPGCMTQEATAMLIHNKKLDPLIHNIIWQVGSVGVDTMVFDNRKFHLLVDRLE EDFGLDHKVVHYIGAVLPQSTTVMDEFTIGDLRKEDVVKQFTTMSTFYVPPRTPAPVDQEAMQKF RSLDAPLARTVHLYPPSKWAGTQTSVVPAYGPYERAAVDRIADYTPPPDHMILRASPAIRQFMMD LALNPGLRDRYKADPVAVLDATPDLSTQEKFALSFDKPGPVYTVMRATPAAIASGQEPTFDGIAG AAKPASFPGVAPLIIISV* ApeMA Apodospora peruviana CBS118394 v1.0 642771 yes MAAEHATPSPVETHFGRTVPAMGRRPGKLVMVGSGIKSISHMTLETVSHIEQADKVFYCVADPGT ELFVKSKAKWSFDLYTLYDNDKNRYITYVQMAELCLQAARDGFFSVGVFYGHPGVFVSPSHRAI GIAKREGIEAYMLPGISAEDCLFADLGVDPSFTGCQTYEATDLLLRDRPISPYSHLIVWQVGVVGD TGFNFGGFTQTKFQVLVDRLEEVYGSDHRLIHYFASTLSHGPAHIEPLRISDLRKPEVEKRMNGIST FYVPQIGKSAHNPKTAERLGLRVDSKTPDRSFGHLIGPAISYNTLETRAVQALKTHKPSPSYRKNR LPTSTLPVLTALATSPKAVAHFKRNTTQFLDAFPDMATHVKKVLQTGSPGLLRLLSLNSSADVAA 240 KFVQAEFRDSTLASKYAAVLKENNGDPDGETNIIKFLQDQGYDTTPEDVSTAYLSAISVDLNTYA GYYASTFTNGGVGPNILIQNGAVTVDDTVIKNPVYAQSLLQWSIKDGNAFNAKLTFRILTDDDGK 
PLAPGAYIGPQFYGTYWKSEEPSTPNIQGKTGTAPIKPVNPVTPVTPTPLDTFTGNFVAYKADATT GKWSEDGTFVVSDPAGSTVPTAVYKGKTLNNYQYSGNETLTWSSTDGNDSNGSISFFINKTATST NPTLGAQATGRVWAPAEAMPAKVNFFMSLGQSANPSTQSVPSQSASEWKSVGINVGVGLATMLL GTAIIEAIKWRIKLKANPTDPEINQGVKDSSEKVSQSSEQQEAVQKSSVESDASGSADVQPSDIPVP DAPVTTTTDTTTTDTTTTDTTTTDTTTTDTTTTDTTTTDTTTTTDTTTTTDVTTDVTTDVDVVVDV DVIVIL* BadMA Bjerkandera adusta v1.0 128644 yes MSTTTSNNAGSLTIAGSGIASVAHITLETLSHIREADKVFYIVCDPATEAFIHDNAKAEAVDLTVYY DTNKARYDSYVQMAEVMLQDVRGGKDVLGIFYGHPGVFVSPSHRALAIARSEGYKAKMLPGVS AEDYLFADLEFDPSVHGCATFEATELLLREKPLNTTMHNIIWQVGAVGVDDMVFTNSKLHVLVD RLEKDFGPEHQVVHYIGAVLPGSRTVMDTFTVADLCKDDVVKQFNPSSTLYIPPRSLAANSSDIAA SLGAKPDHPLVDPTLFPPLRWTKSTSPEAPAYGPLEQAAVAELANHKVPSQHKVLAASPAMRTLV AELNVALRKKLAADPKAFAGGREGLTEVEKLAVGTGNVGTMGAVMRALPGGEQSTDMVTSPAS IEQQSRREAFFLIVLIVSTRILH* CbeMA Cercospora beticola XP_023455951.1 no MPSQTSIWNHIDELTRHDVFPSTEAGKGELVVVGTGIASIRQMTVEALDYIQRADKVFYATLDAV TETFIKHHAPSAEDLYQYYDTEKNRVTTYVQMAEVILSSVRKGKLTVAVFYGHPGVFVTPSHRAI YIARHEGYKAQMLPGVSAEDCLYADLGIDPASSGCSMYEASFLLNEPNRLDSRHHLIIWQVGCVG KEAMIFDNKEIYKLADYLEAEYGPDHPVIAYLAAIQPFHDSKMDKMTVQDLRDQDKVQNIPITAG TTLYVPPKKLPANPPAYKDMAIGYQLALTSAFRISHPDLDVVETYTQEEKSWCEELASWSPPKSY NANAAPPVLRRIAVKLALLHHRLHGNVALSDVANAITTAEPSLTDEEANLLRQFVGHLDFMFKKE RPPQSVTTSIINNTIVPPIVTQLNIIRKDGSIMKGVKKPSLYVY* CeaMA1 Ceratobasidium sp. (anastomosis group I, AG-I) v1.0 486605 no MASITTGRDTTKSGSLIIAGSGISSVAHLTLETVSHLKNADNVFYLVGDPVTEAFIQENNKSTTNLV AHYATSKHRYQTYVEMAEVMLREVRAGHSVFGIFYGHPGVLTTPAHRALTLARQEGYEARMLP GVSSVDYMFADLELEPGQHGCMIHEATDLLARDRRLDPSVHNIILQPSRVGSATLEKEASKFQLLV DRLVRDFGPDHKIVHYSGAVLPQSSSAMVVFVIENLRNEQLANQIRSTSILYIPPRDIVPVHPDAAA ALKLPDMLGLLSTSVQWVGPRFIETADYGPVERKFVDQLERQVIPEGQQSLRASTAMRKFMINLA LDPNGLKEYKESPSAVAAGVPGLTDRERSALAIASEGPIFVVMSRTDDEEPTEEQLMEADRNGARI VDSCTMCTLGGGRNS* CeaMA2 Ceratobasidium sp. 
(anastomosis group I, AG-I) v1.0 594340 no MTTPSDTNKKGTLTIAGSGIASIRHITLETLSYIKESDKIYYLVADPATEAFIIENANGSCVSLYGLY GIDKIRYDTYVQMSEVLLRDVRAGFDVLGIFYGHPGVFVSPTQRAMSIALEEGFQARMLPGVSAE DYLFADLRVDPCMFGCAAYEATELLYRKRRLNPTMQNIIWQVGKRFTIIKLTSPDTQNSKFGLLV DHLEEDYGPDHKVVHYIGAVLPQATTVIQPYTISELRKPEVASQIRACSTFYIPPRDEILPDASMSER LGLDAPISHLLGGRYPRPAWSVSGFKTAPAYGPREKHLVAELNVRGIPEPDMVLFASQPMRKFMA DLALKPRLRDSYRSNPQVIVDAVKGLTSLENMALKLNRVTAITRVMSVNPTALILGIEPTETDLAI DPYMDNGDPKIVVSG* CeuMA1 Cerrena unicolor v1.1 312586 no MATQKSGSLTIAGSGIASIGHITLETLSYIEQADKVYYAVADPATEAFIQDKSKVECFDLTVYYDK DKIRFETYIQMSEVMLRDVRAGHSVLGIFYGHPGVFVCPSHRAIAIALSEGYKARMLPGISAEDYM 241 FSDIGFDPALPGCTTQEATHLLLHNKKLDPSMHNIIWQVGGVGADTMNFDNRQFHQLVDCLERDF GSSHKVVHYIGAVMPQSTTIMDEFSIADLRKEEVVKQFTTWSTFYIPPRDAAPVDEGIMQSLGLSS NDMQYTMYPPSSTMRLGIRSPNLDVYGRAGRAAIEKLDHHTPAARHQVLRASPAIRKFMEDLAL KSDLRDRYKADPHTVLDAIPGLTSQEKIALGFGKPGPVYKVMRATGRETADGQEHVPHDLTTTDE PGAPVLLLLLLQTT* CeuMA2 Cerrena unicolor v1.1 361677 yes MATTKTGSLTIAGSGIASVAHITLEVLSYLQEADKIYYAIVDPVTEAFIQDKSKGRCFDLRVYYDK DKMRSETYVQMSEVMLRDVRSGYNVLAIFYGHPGVFVCPTHRAISIARSEGYTAKMLPGVSAED YMFSDIGFDPAVPGCMTQEATSLLIYNKQLDPSVHNIIWQVGSVGVDNMVFDNKQFHLLVDHLE RDFGSIHKVIHYVGAIMPQSATVMDEYTISDLRKEDVVKKFTTTSTLYIPPREIAPVDQRIMQALEF SGNGDRYMALSQLRGVHARNSGLCAYGPAEQAAVDKLDHHTPPDDYEVLRASPAIRRFTEDLAL KPDLRSRYKEDPLSVLDAIPGLTSQEKFALSFDKPGPVYKVMRATPAAIAAGQEHSLDEIAGSADS ESPGALATTIVVIVHI* CfuMA Cladosporium fulvum v1.0 186945 no MPSQSIWSHIAELTRGGPVPKDVPHKGELVVVGTGIASLRQLTVEALDYIQRADVVFYATLDAVT EAFIKQHAKAAENLYQYYDTEKNRNATYTQMAETILASVRKGNMTVAVFYGHPGVFVTPSHRAI YIARQEGYKAKMLPGVSAEDCLYADLDIDPASSGCSMYEASFLLLEPDRLDSRHHLIIWQVGCVG KEAMVFDNKELYKLADYLEAEYGPKHPAIAYLAAIQPFNDSKMDHMTVEDLRDPEKVRSIPINAG TTLYVPPKKLPANPQAYKDIEIGYKLGLTSAFRISHPELDVAETYSEIEKGWCEELVSWTPPKSYIP NAATPALRRIAIKLALLHHRLHGSMSLEDIANAATAAEPSLTTDESDLLKQSVGFLDSMFNKERPP QSVTTSIVRSVVPPIVTQLNIIRKDGTVMMGDGKPSIYVF* CloMA Chalara longipes BDJ v1.0 462219 no MATSSSFQQLPRGSLTIVGSGFRSIIQFTTEALMHIEAAEKLYYCVLDAATRGFIKAKNSNSVDLYE 
CYSNTKPRYETYIQMTEAMLRSVRDGLKATVVLYGHPGVFIHPSHRAIAIARSEGYDAWMLLGIS VEDYLFADLLIDPSNPGTQTVEATEILLKERPLLTSSHVIIYQVGCIGNFTFNFSGIKNDKFDALVDR LIQEYGPDHPLVNYQAAISPLSEASIGRHIVSDLRKAEVQESVTGASTFYIPPKTVLQVTPQGAKLV SESDELPTYLSKDVPVFPPFPFNQSLAPIAPAYSSAERKAIEELDNHITPLEYRKYNASSAMQKTVES ISFSLDTIKKFRESPSAFASSIEELEPHEIDALSTGSGERIDAAMQGNAAVNPNAAWLITFAIIFGK* CmaMA Coprinopsis marcescibilis CBS121175 v1.0 670214 yes MDATANPKAGQLTIVGSGIASINHMTLQAVACIETADVVCYVVADGATEAFIRKKNENCIDLYPL YSETKERTDTYIQMAEFMLNHVRAGKNVVGVFYGHPGVFVCPTHRAIYIARNEGYRAVMLPGLS AEDCLYADLGIDPSTVGCITYEATDMLVYNRPLNSSSHLVLYQVGIVGKADFKFAYDPKENHHFG KLIDRLELEYGPDHTVVHYIAPIFPTEEPVMERFTIGQLKLKENSDKIATISTFYLPPKAPSAKVSLN REFLRSLNIADSRDPMTPFPWNPTAAPYGEREKKVILELESHVPPPGYRPLKKNSGLAQALEKLSL DTRALAAWKTDRKAYADSVSGLTDDERDALASGKHAQLSGALKEGGVPMNHAQLTFFFIISNL* CmiMA Coprinellus micaceus FP101781 v2.0 1707844 yes MIGASLAKKGQLTIVGSGIASISHLTLQAVSAIENADIVCYVVADGATEAFIRKKNPNSLDLYHLY GEDKQRTDTYIQMAEFMLIRVRQGQNVVGVFYGHPGVFVCPTHRALYIARSEGYKARMLPGLSA EDCLFADLGIDPSSVGCVTYEATDLLVFKRPINPASHLVLYQVGIVGKSNFKFDYTSDENIHFTKLL DRLEEAYGPEHSVTHYIAPLFPTEDPIAEEYTIAQLRLPEIRDKIHTISTFYVPPKTSESLIYDEVLLAS LGVTHKPSVPYPWNPEATPYGPREKKAIELLAEHEPPKGYRPLKERSGLLAVLEKLCLEPLEMKK YNEDRQAYADGLKGLTENEKEALVKGDHRTLAGALKVGDTPTNPAALVFTFIITRLD* CmuMA Cystostereum murrayi CysMur001 v1.0 1185527 yes 242 MPAPRKGTLTIAGSGIASIGHITLETLSHIQGADKIHYAVTDPATEAFILEKSKDSSSCFDLGIYYDK NKMRYETYVQMCEVMLRDVRGGHNVLGIFYGHPGVFVSPTHRAIALARDEGYTAKMLPGISAED YMFSDLGFDPAFPGCMTQEATILLVRGRKLDPSVHNIIWQVGGVGVDTMVFDNANFYILVDRLEE DLGPDHKVVHYIGAVLPQSTAVIDEFTVAGLRKEEVVKQITTVSTFYLPPRTLLHADQDMVQKLG LSDSLGKRAVHVYPRTKWINAESPSPPAYGPFERAAVDRLADHTIPSNHLFLRGSQALRQLMTDL ALQPTLRARYVADPTSVLDDVTGMSAEETFALTLRHPAPVFKVMRATGEAIANGVPTLGEIAESA NSSIAGSSCALIGFFVVVLEI* CpeMA Coprinellus pellucidus v1.0 554111 yes MPSTTRGSLTLAGAGVTSIGHLTLQTVSAIENADIVCYILNDPVTEAFIIKKNPNVYDLYQLYDDGK PRIETYHQMVEVLMSKVRSGQDVVGLFTGHPGVVNTPAAQAFKIARQEGYTARMLPGITTNDAL LADVVADPALGGAMAYEATDFLNNNRVLHPEMNVFIQQVGVVGNKHFNFMEMRSSLLDKLIDR 
LEETYGGEKEIIHYIAPMLPIDKPVMQKMTVSDLKKPEYKAKIVPSSTFYITPNEQLSSVLDSTEGK KLHREAMSALANHTHGKNYAPMKENLALTEALERLALEPKSLEAYRSDPQSYVNENGRGLTEEE RKALVTGRGIRELLSDGPVAAHRIAPLALV* DbiMA2 Dendrothele bispora CBS 962.96 v1.0 758933 yes MPVRIPSPQKEAGSLTIVGTGIESIGQITLQAISHIETASKVFYCVVDPATEAFIRTKNKNCFDLYPY YDNGKHRMDTYIQMAEVMLKEVRNGLDVVGVFYGHPGVFVSPSHRALAIAESEGYKARMLPGV SAEDCLFADLRIDPSHPGCMTYEASDFLIRERPVNIHSHLVLWQVGCVGVADFNSGGFKNTKFDV LVDRLEQEYGADHPVVHYMASILPYEDPVTDKFTVSQFRDPQIAKRICGISTFYIPPKETKDSNVEA MHRLQLLPSGKGVLKETGRYPSNKWAPSGSFHDVDPYGPRELAAVTKLKSHTIPEHYQPLATSKA MTDVMTKLALDPRVLSEYKASRQDFVHSVPGLTPNEKNALVKGEIAAIRCGMKNIPISEKQWELR DGLVTKFIVVPIWVSIDDTTGNLE* DbOphMA Dendrothele bispora CBS 962.96 v1.0 765759 yes MESSTQTKPGSLIVVGTGIESIGQMTLQALSYIEAASKVFYCVIDPATEAFILTKNKNCVDLYQYYD NGKSRMDTYTQMAELMLKEVRNGLDVVGVFYGHPGVFVNPSHRALAIARSEGYQARMLPGVSA EDCLFADLCIDPSNPGCLTYEASDFLIRERPVNVHSHLILFQVGCVGIADFNFSGFDNSKFTILVDRL EQEYGPDHTVVHYIAAMMPHQDPVTDKFTIGQLREPEIAKRVGGVSTFYIPPKARKDINTDIIRLLE FLPAGKVPDKHTQIYPPNQWEPDVPTLPPYGQNEQAAITRLEAHAPPEEYQPLATSKAMTDVMTK LALDPKALAEYKADHRAFAQSVPDLTPQERAALELGDSWAIRCAMKNMPSSLLEAASQSVEEAS MNGFPWVIVTGIVGVIGSVVSSA* FmeMA1 Fomitiporia mediterranea v1.0 25792 no MATSTETTEKKGSLTIAGTGIASIKHITLETLSYIKEAEKVYYLVADPATEAFIQDNASGTCFNLHV FYDTNKHRYDSYVQMAEVMLLDVRAGHSVLGIFYGHPGVFVSPSHRAIAIAREEGFKAHMLPGIS AEDYMFADIGFDPATHGCVSYEATELLVRDKPLLPSSHNIIWQVGAIGANAMVFDNGKFNILVDR LEQVFGPDHKVVHYIGAVLPQSTSTIEAYTISDLRKGDVVEKFSTTSTLYVPPSVEARLSGIMVREL GLEDSGFHTKSSQSRTLWAGPVTSSAPAYGPQERIVIAQIDKDVIPDSHQILQASDAMKKTMANLA LNPKLSEEYYASPSTVVEKVTGLSEQEKKALILCSAGAIHMVMAATQTNIAQGHQWSAEELEAAG TPHPALALLVVIICLI* FmeMA2 Fomitiporia mediterranea v1.0 30904 no MAATTETMKKGSLTIAGSGIASIKHMTLETVSHIKEAEKVYYIVTDPATEAYIKDNAVGACFDLRV FYDTNKPRYESYVQMSEVMLRDVRVGHSVLGIFYGHPGVFVSPSHRAIAIAKEEGFQARMLPGIS AEDYLFADIGFDPAAHGCMSYEATELLVRNKPLNTSTHNIIWQVGALGAEAMVFDNAKFSLLVD RLEQDYGSDHKVVHYIGAILPQADPTVEAYIVADLRKEDVVKQFNAISTLYIPPRVAGKFLDDMA KKLGIADSAAYLKNHYPQAPYTGPEFATDPAYGPREKAVIDQIDNHAAPEGHTVLHASDALKKLN 243 
TDLALSPKFLEEYKENPMPILEAMDGLTNEEKAALMQNPLGATHELMWATPDEIANGRALPVVN FMAYGGYGGYYGGGCRPCPCCVVTDRWSSGGSNKCNMVNNLNV* FmeMA3 Fomitiporia mediterranea v1.0 162487 no MAATTETTKKGSLTIAGSGIASIKHMTLETVSHIKEVEKVYYIVSDPATEAYIKDNAVGTCFDLRV FYDTNKPRYESDVQMSEVMLRDVRAGHSVLGIFYGHPGVFVSPSHRAIAIAKEEGFQARMLPGIS AEDYLFADIGFDPAVHGCMSYEATELLVRNKPLNTSTYNIIWQVGALGAEAMVFDNAKFSLLVD RLERDYGSDHKVVHYIGAILPQADSTIEAHTVSDLRKEDIVKQFNAISTLYIPPRVAGKFLDDMVE KLGIADPATFLKNHYTQPPYSGPEFATDPAYGPREKAVIDQIDNHAAPEGHTVLHATDALKKLNT DLALSPKFLKEYKENPMPILEAMDGLTDEEQAALMQNPLGATHELMWATPDEIANGRVLPVVNF CFLGGNRRGYRRGYQAVNYGGSYNTYIINNF* FmeMA4 Fomitiporia mediterranea v1.0 117392 yes MATSTETAQKKGSLTIAGTGIASIKHITLETLSYIKEAEKVYYLVADPATEAFIHDNASGTCFNLHV FYDTNKLRYDSYVQMAEVMLRDVRAGNSVLGLFYGHPGVFVSPSHRAIAVAREEGFKAQTLPGI SAEDYMFADIGFDPASHGCVSYEATDLLARDKPLLPSSHNIIWQVGAIGANAMVFDNGKFNVLVD RLERDFGPNHKVVHYIGAVLPQSTSKVEQYTVADLRKDYVVKTFTTTSTLYVPPCVDAGISNIMA RELGLEDSTGLRTRGNQPLPLKTGPAISLASVYGSHERTTIAQIDKGVTPDTLQILQASDAMKKLM ADLALKPKLLEKYRGNPSVVIDEVTGLAPQEKAALTLCSAGAIYMVMAASQIDIAKGRQWSTEEL KTAADVSAPVILVLSQYNTVH GesMA Gyromitra esculenta CBS101906 v1.0 514041 yes MSVQPQSSAKKGGLVVVGSGIRSVSQLTLEAVMHIEKADTVLYCVCDPSTEGFIKRKNKNAIDIY GYYSDLKERPDAFVQMAEVILREVRKGINVVAVFYGHPGIFVHPSRRALAIAKKEGYAARMLPGI SAEDCLFADLLVNPSFPGAQLVEASDIVYRARPLATSCHVVIFQAACFGHWKYNFTAFENGKFDH LVNRLQKDYGPDHPIVSYMAAVSPLEDPVINRHTISDLYKADVKKEITPNCTLYIPPKDLLPISPAG ELIILGHQAGPDETPKFPPLPIHHYLAPEEETYGPQETSAVAALEKGAISADYRPYCASPAMQKVTE SLSLDPEVLKTYRESPQAFAESIPGLEAREVKALASGSPVKIHDSMWVEGKSEVRW* GjuMA Gymnopilus junonius AH 44721 v1.0 1778734 yes MATPIATTTNTPTKAGSLTIAGSGIASVGHITLETLAYIKESHKVFYLVCDPVTEAFIQENGKGPCIN LSIYYDSQKSRYDSYLQMCEVMLRDVRNGLDVLGVFYGHPGVFVSPSHRAIALAREEGFNAKML AGVSAEDCLFADLEFDPASFGCMTCEASELLIRNRPLNPYIHNVIWQVGSVGVTDMTFNNNKFPIL IDRLEKDFGPNHTVIHYVGRVIPQSVSKIETFTIADLRKEEVMNHFDAISTLYVPPRDISPVDPTMAE KLGPSGTRVEPIEAFRPSLKWSAQNDKRSYAYNPYESDVVAQLDNYVTPEGHRILQGSPAMKKFL ITLATSPQLLQAYRENPSAIVDTVEGLNEQEKYGLKLGSEGAVYALMSRPTGDIAREKELTNDEIA NNHGAPYAFVSAVIIAAIICAL* GymMA1 Gymnopus 
fusipes MUCL028262 - - MQSSTQKQAGSLTIVGSGIESISQITLQSLSHIEAASKVFYCVVDPATEAYLLAKNKNCVDLYQYY DNGKPRMDTYIQMAEVMLREVRNGLDIVGVFYGHPGVFVNPSQRAIAIAKSEGYQARMLPGISAE DCLFADLGIDPCNPGCVSYEASDFLIRERPVNVSSHFILWQVGCIGVADFTFVKFNNSKFGVLLDR LEHEYGADHTVVHYIAAVLPYENPVIDKLTISQLRDTEVAKRVSGISTFYIPPKELKDPSMDIMRRL ELLAADQVPDKQWHFYPTNQWAPSAPNVVPYGPIEQAAIVQLGSHTIPEQFQPIATSKAMTDILTK LALDPKMLTEYKADRRAFAQSALELTVNERDALEMGTFWALRCAMKKMPSSFMDEVDANNLPV VAVVGVAVGAVAVTVVVSLNDLTDSVN* HpiMA Hydnomerulius pinastri v2.0 28991 yes 244 MPVPTTTNKNGSLTIAGSGIASIRHMTLETLSAIKSADKVYYTVCDPATEAFIQDNATGSCSDLTVY YDKEKSRYDTYVQMCEVMLREVRAGHNVLGVFYGHPGVFVSPSHRAIAIARAEGYKAEMLAGV SAEDYMFADLGFDPAAHGCVTYEATEMLLRKKQLNPATHNIIWQVGGVGVSNMIFDNARFHLLV DRLEDTFGPDHQVVHYIGAVLPLSVKTMETYTIADLRKEDVVAQFNPTSTLYIPPRDVSPNDPEVA QQLSSFEAVVRSKYPPPGWTTSEPSSALAYGPRERDAIAQLDSHVAPDSHKVLRASSAIRRLMADL ALSPELLATYRKDPQAVVAATEGLTVQEKAALSLNKAGAIYGVMKATPYDIANNRSLSVADMGA INEPAALTTMINIHVTHV* LedMA Lentinula edodes Le(Bin) 0899 ss11 v1.0 1040599 yes METPTLNKSGSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILTKNKDCVDLYQYYD NGKSRMDTYTQMSEVMLREVRKGLDVVGVFYGHPGVFVNPSLRALAIAKSEGFKARMLPGVSA EDCLYADLCIDPSNPGCLTYEASDFLIRERPTNIYSHFILFQVGCVGIADFNFTGFENSKFGILVDRL EKEYGAEHPVVHYIAAMLPHEDPVTDQWTIGQLREPEFYKRVGGVSTFYIPPKERKEINVDIIREL KFLPEGKVPDTRTQIYPPNQWEPEVPTVPAYGSNEHAAIAQLDTHTPPEQYQPLATSKAMTDVMT KLALDPKALAEYKADHRAFAQSVPDLTANERTALEIGDSWAFRCAMKEMPISLLDNAKQSMEEA SEQGFPWIIVVGVVGVVGSVVSSA* LlaMA Lentinula lateritia RHP3577 ss4 v1.0 755966 yes METPTLNKSGSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILTKNKDCVDLYQYYD NGKSRMDTYTQMSEVMLREVRKGLEVVGVFYGHPGVFVNPSLRALAIAKSEGYKARMLPGVSA EDCLYADLCIDPSNPGCLTYEASDFLIRERPTNIYSHFILFQVGCVGIADFNFTGFENSKFGILVDRL EKEYGADHPVVHYIAAMLPHEDPVTDQWTIGQLREPEFYKRVGGVSTFYIPPKERKEINVDIIREL KFLPEGKVPDTRTQIYPPNQWEPEVPTVPAYGSNEHAAIAQLDAHSAPEQYQPLATSKAMTDVMT KLALDPKALAEYKADHRAFAQSVPDLTANERTALEIGDSWAFRCAMKEMPVSLLDNAKQSMEE ASEQGFPWIIVVGVVGVVGSVVSSA* LraMA Lentinula raphanica INPA1701G ss19 v1.0 642948 yes 
MESSTQTKTGSLIIVGTGIESIGQMTLQTLSYIEAADRVFYCVIDPATEAFILTKNKNCVDLYQYYD NGKTRMDTYTQMSEVMLREVRKGLKVVGVFYGHPGVFVNPSLRALAIAKSEGFKARMLPGVSA EDCLYADLCIDPSNPGCLTYEASDFLIRERPANIYSHFILFQVGCVGIADFSFTGFDNSKFGVLVDRL EKEYGGDHPVVHYIAAMLPHEEPVTDKFTIAQLREPEVYKRVGGVSTFYIPPKERKEINADIIHQLK FLPEGKVPDKRTQIFPPNQWEPEVPTLPAYGPNDYATIALIDSHTPPEQYQPLATSKAMTDVMIKL ALDPQALEEYKADHRAFAQSIPDLTTHERIALEMGDSWAFRCAMKDMPQSLLERAQQNMEESAQ HGFPWIIVVGVVGVVGSVVSSA* MeuMA Mycosphaerella eumusae CBS 114824 KXT02930.1 no MASSSVWSYIDHLTQEDDISSSCGDAGDKKGELVVVGTGIASLRQMTVEALDYIQRADMVFYVV LDAMTECFIQTHAKKHHDLYQYYDKNKPRNASYVQMAELMVQSVRDGNLTVAVYYGHPGVFV FPTHRAIHIAREEGYKAKMLPGVSAEDCLYADLGIDPGTTGCSMFEATYLLNEPDRLDPRNHVIIW QPGCVGKSTMVFDNSEIHELADYLEKTYGPEYPIIAYLAAVRPFNDPQIDKLMVKDLRDLEKLKAI PFNAATTLYIPPKTLPVVPQDMEDPIELQLARNSAFRMSHPEMNLVDNYTKQDKQWVEDLKHFV PPNDYKRMTASTAMRRAAIKLALLHHRLHGVLPRELIADRALSKSGLTPNEAESLRVMIDNLDLF LREGVERPPAVNGVSVIVFALLIIRNEDQRVNLHGGKMGWKRSVVVN* MfiMA Marasmius fiardii PR-910 v1.0 958901 yes MTFNDKKGSLTIAGSGIASIRHITLETLSHIERADKVYYLVADPATEAFIQDKSKGDYVDLAIYYDK DKNRYESYVQMSEVILNDVRAGYNVLGVFYGHPGVFVSPSHRTVAIARDEGYRVNMLPGVSAQ DYMFSDIGFDPAIPGCTIQEASTILFLDKRLDPTVHNIIGQVGCVGVGTMAFDNRQFHLLVDHLEK DFGPEHKVVHYIGAVLPQSATVKDEFKIADLRKDDVVKQISTISTFYIPPRQVTPVPKEVAEKLGFH 245 PLPTLPISTRIYPFLGSKASSSSTSFYEPFERNAVDRLQNHLPPLDYNTLRASPAVRQFMTDLALRPD VLNLYQADPMVLVDEIPGLTPSEKSALRSGDPGPVYELMRSNFTREKSTQMGAIVFVSI* MroMA1 Mycena rosella CBHHK067 v1.0 934645 yes MALKKPGSLTIAGSGIASIGHITLETLALIKEADKIFYAVTDPATECYIQENSRGDHFDLTTFYDTNK KRYESYVQMSEVMLRDVRAGRNVLGIFYGHPGVFVAPSHRAIAIAREEGFQAKMLPGISAEDYMF ADLGFDPSTYGCMTQEATELLVRNKKLDPSIHNIIWQVGSVGVDTMVFDNGKFHLLVERLEKDFG LDHKIQHYIGAILPQSVTVKDTFAIRDLRKEEVLKQFTTTSTFYVPPRTPAPIDPKAVQALGLPATV TKGAQDWTGFQSVSPAYGPDEMRAVAALDSFVPSQEKAVVHASRAMQSLMVDLALRPALLEQY KADPVAFANTRNGLTAQEKFALGLKKPGPIFVVMRQLPSAIASGQEPSQEEIARADDATAFIIIYIV QG* MroMA2 Mycena rosella CBHHK067 v1.0 1200894 no MALNKPGSLTIAGSGIASIGHITLETLALIKEADKIFYAVTDPATECYIQENSRGDHFDLTTFYDTNK 
KRYESYVQMSEVMLREVRAGRNVLGIFYGHPGVFVAPSHRAIAIAREEGFQAKMLPGISAEDYMF ADLGFDPSTQGCMTQEATELLVRNKKLDPSVHNIIWQVGSVGVDTMVFDNGKFHLLVERLEKDF GLDHKIQHYIGAILPQSVTVKDAFAIRDLRKEEVLKQFTTTSTFYIPPRAPAPIDAKVLQALGLPPPA QATKDRTGYGPLEKQAVAALDSFIPSQEKQVVHASPAMQSLMADLALRPALFEQYKADPVGFAN TRNLNGLTAQEKFALGFNKSGPIFAVMRHLPSAIASGQERSQEEIAHAADDKELLALVVVIVQ* OphMA Omphalotus olearius 2087 yes METSTQTKAGSLTIVGTGIESIGQMTLQALSYIEAAAKVFYCVIDPATEAFILTKNKNCVDLYQYY DNGKSRLNTYTQMSELMVREVRKGLDVVGVFYGHPGVFVNPSHRALAIAKSEGYRARMLPGVS AEDCLFADLCIDPSNPGCLTYEASDFLIRDRPVSIHSHLVLFQVGCVGIADFNFTGFDNNKFGVLVD RLEQEYGAEHPVVHYIAAMMPHQDPVTDKYTVAQLREPEIAKRVGGVSTFYIPPKARKASNLDII RRLELLPAGQVPDKKARIYPANQWEPDVPEVEPYRPSDQAAIAQLADHAPPEQYQPLATSKAMSD VMTKLALDPKALADYKADHRAFAQSVPDLTPQERAALELGDSWAIRCAMKNMPSSLLDAARES GEEASQNGFPWVIVVGVIGVIGSVMSTE* PgiMA1 Phlebiopsis gigantea v1.0 54959 no MSSASSDSNTGSLTIAGSGIASVRHMTLETLAHVQEADIVFYVVADPVTEAYIKKNARGPCKDLEV LFDKDKVRYDTYVQMAETMLNAVREGQKVLGIFYGHPGVFVSPSRRALSIARKEGYQAKMLPGI SSEDYMFADLEFDPAVHGCCAYEATQLLLREVSLDTAMSNIIWQVGGVGVSKIDFENSKVKLLVD RLEKDFGPDHHVVHYIGAVLPQSATVQDVLKISDLRKEEIVAQFNSCSTLYVPPLTHANKFSGNM VKQLFGQDVTEVSSALCPTPKWAAGSHLGDVVEYGPREKAAVDALVEHTVPADYRVLGGSLAF QQFMIDLALRPAIQANYKENPRALVDATKGLTTVEQAALLLRQPGAVFGVMKLRASEVANEQGH PVAPASLDHVAFTAPSPASLDHVAFSAPNPASLDHVAFIAPTPASLDHVAFSAPTPASLDHVSFGTP TSASLDHVAFEAPVPASLDHVAFAAPVPASLDHVAFAAPTPASLDHVAFAAPTPASLDHVAFAVP VPASLDHIAFSVPTPASLDHVAFAVPVPDHVAGIPCM* PgiMA2 Phlebiopsis gigantea v1.0 80884 no MSHDATTTKRGSLTIAGSGIASVAHITLETVAYLAEADSVFYIVADPVTEAFIHKNAKVPCQDLHV FYDKDKSRYDTYVQMAETMLNSVRAGEKVLGIFYGHPGVFVSPSRRALAIAREEGYEAKMLPGV SAEDYMFADLEFDPATHGCCAYEATHILLKNIPLDTSINNIIWQVGGVGVTKIDFENSKFKFLVDR LEKDFGLDHKVVHYIGAVLPQSATVKEVYTISDLRKPEVATQFNACSTLYVPPRKGAADPFPAHV VEQLLGTTTSKVVDALYPVAQWDLGNNLPAVPAYGPYEQKVVAAMGDHTTPDDYRALAGSPA MQQFMAELALRPTLQAKYRASPQAVVDATPGLTDLERAALLLNAAGPVLAVMKPRAGEVMTVD KLKESVTPSAAYLFIFIVIAAAAHILV* 246 PmuMA Pseudocercospora musae KXS93410.1 no MASTVWSYFDQLTRDDDFGSCEDACSKQGELVVVGTGIASLRQMTVEALDYIQRADMVFYVVL 
DAMTEAFIQTHAKKHHDLYQYYDKNKPRSASYIQMAELMVQSVRDGNLTVAVYYGHPGVFVFP THRAIHIAREEGFKAKMLPGVSAEDCLYADLGIDPGSTGCSMFEATYLLNEPDRLDPRNHVIIWQP GCVGKSAMVFDNSEIHELADYLEKTYGAEYPVIAYLAAVRPFNDPQIDKLMVKDLRDLEKLRAIP FNAATTLYIPPKTLPAVPQDIANPIEVQLARNSAFRLSHPEMNLVDMYTKQDKQWCDDLKHFVPP NDYKPMTATPAMRRLAIKLALLHHRLHGALPTELIASKALSKSELSSSEAESLRLMIKNLDLFLRE GVERPPAVNGVSVIVFALLIIRSEDQRVGFDGKMEWKRSVVVN* PocMA Porodaedalea chrysoloma FP-135951 v1.0 797528 yes MPVSTTTTKNGTLVIAGSGIASIAHITLETLSHIKESDRVYYIVGDPATEAFIQDNASGTCFDLTIFY DTNKVRYDSYVQMCEVMLRDVRAGHTVLGVFYGHPGVFVSPSHRAIAIARDEGYKARMLPGVS AEDYLFADLGFDPATHGCTSYEATDLLVRNKPLNASTHNIIWQVGGVGVGTMVFDNAKFHLLVD RLEKDFGPSHTVVHYIGAVLPQSITTMDKLTIADLRKDAVVKQFNPTSTFYIPPRDISLPLDTMAKK LGMDDASARPVSLYPPSRWTGTKFTTAPAYGPREKDVIAKIDTYAAPKDHKILHASRSMKKLMT DLALNPKLLEKYRANTKAVVEATEGLSAQEKAALNMDLAGPVHAVMKATPSDITDGREMSVDA VASATEPSAALILLLV* RviMA1 Rhizopogon vinicolor AM-OR11-026 v1.0 805340 yes MITSNSSNGSNSTKCGTLTIAGSGIASVAHITLETLSYIKESEKIFYLVCDPVTEAYIQDNTTADCFD LSVFYGKNKGRHDSYIQMCEVMLKAVRAGHDVLGVFYGHPGVFVSPSHRAIAVARQEGYKAKM LPGISAEDYMFADLEFDPSLSGCKTCEATEILLRDKPLDPSIQNIIWQVGSVGVVDMEFEKSKFQLL VDRLEKDFGPGHKVVHYIGAVLPQSTTTMDTFTIADLRKEDVAKQFGTISTLYVPPRDEGHVNPS MAEAFGTPAGPARLNDSVKWVGPKLSIVSANGPHQRDVIAQIDTHIAPEGHKKLHASAAMKKFM TDLALRPKFLDEYKLNPVAVVESAQGLSNLEQFGLKFARGGPVDALMKATESDIASGRQLTEEEI AKGNGPPGAAATVLLLGALIITLSLNFS* RviMA2 Rhizopogon vinicolor AM-OR11-026 v1.0 749423 yes MSTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIYDNSTADCFDLSVFYDKTK GRYDSYIQMCEVMLKAVRAGHDVLGVFYGHPGVFVSPSHRAIAVARQEGYKAKMLPGISAEDY MFADLEFDPSVSGCKTCEATEILLRDKPLDPTIQNIIWQVGSVGVVDMEFSKSKFQLLVDRLEKDF GPDHKVVHYIGAVLPQSTTTMDTFTIADLRKEDVAKQFGTISTLYIPPRDEGHVNLSMAKVFGGP GASVKLNDSIKWAGPKLNIVSANDPHERDVIAQVDTHVAPEGHKKLRVSAAMKKFMTDLALKPK FLEEYKLDPVAVVESAEGLSNLERFGLKFARSGPADALMKATESDIASGRQLTEEEIAQGTGPVGL QTALALLVLLGLGVAIVTRPDD* RviMA3 Rhizopogon vinicolor AM-OR11-026 v1.0 700323 yes MTTSNSSNGTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIHDNSTADCFDLSV FYDKNKGRYDSYIQMCEVMLKDVRAGHHVLGVFYGHPGVFVSPSHRAIAVARQEGYNAKMLPG 
ISAEDYMFADLEFDPSLYGCKTCEATEILLRDKPLDPSIHNIIWQVGSVGVVDMEFSKSKFHLLVD RLEKDFGLEHKVVHYIGAVLPQSATTMDTFTIADLRKEDVAKQFGTISTLYIPPRDERPFNPRMAE AFGSPAAPAMPISSVKWAGPKLNIPPVYGPHERDVIAQIDTHVAPEGHKKLHTSAAMKKFMTDLA MKPKLLEEYKRDPVAVVEAAEALSDLEKFGLKFARVGPADVLMKATESDIASGRQLTEEEIAKAN GPQGLGTIILVWHTVHGIA* RviMA4 Rhizopogon vinicolor AM-OR11-026 v1.0 769711 yes MTTDIKRGTLTIAGSGIACIAHITLETLSYIKESDKLFYLVCDPVTEAFIQDNATGGCFDLSVFYDKN KSRYDSYIQMCEVMLKAVRVGYDVLGVFYGHPGVFVSPSHRAIAVAREEGYKARMLPGISAEDY LFADLEFDPSLHGCNTYEATELLLRGKPLDPLIHNIIWQVGSVGVIDMEFEKSKFHLLVDRLENDF 247 GPDHKVVHYIGAVLPQSTTTMDTFTISDLRKEDVAKQFGTISTLYVPLRDEALVNPIMAEAFGRTA APVTMNSSVKWAGPKLNIVSAYGPHERSVIAQIDTHVAPEGHKKLHTSTAMNKFMTDLALKPKF LEEYKLDPAAVVESAEGLSNMEKFGLKVAKAGAAHILMKATESDIASGRQLTEDEIARADGPEGL AVVVIVLVATVALLALLV* RviMA5 Rhizopogon vinicolor AM-OR11-026 v1.0 854502 yes MTTGTERGTLTIAGSGIACVAHITLETLSYIKESDKLFYLVCDPVTEAFIQDNATGDCFDLSVFYDK NKSRYDSYIQMCEVMLKAVRAGHHVLGVFYGHPGVLVSPSYRAIAVAREEGYKARMLPGISAED YLFADLEFDPCFPSGCNTYEATELLLRDRSLDPSIHNIIWQVGSVGVTDMEFEKSKLNLLVDRLEN DFGPDHKVVHYIGAVLPQSTTTMDTFAVSDLHKEDVAKQFGTISTLYIPPRDEAPVSSNMMEVLN RPPVPNMPPPSVMWVAPKLNISSAYTPHERDVIAQIDTHVAPEGYKKLHTSAAMKKFMTDLALKP KFVEEYMLDPVAVIESAEGLSDVEKFALKVAKGGAANILMKATESEIASGRHLTEDEISNAVGPLG LSATVVLVVAEAVVIMAMAVLV* RviMA6 Rhizopogon vinicolor AM-OR11-026 v1.0 710394 yes MTTGTERGTLTIAGSGIACVAHITLQMLSYIKESDKLFYLVCDPVTEAFIQDNATGDCFDLSVFYD KNKSRHDSYIQMCEIMLRAVRADHHVLGVFYGHPGIFVSPSYRAMAVAREEGYKAKMLPGISTE DYLFADLEFDPCLPGCNTYEATELLLRDRSLDPSIHNIIWQVGSVGVIDIQFEKSKFHLLVDRLEKD FGPDHKVVHYIGAVLPQSTTTMDTFTISDLRKEDVAKQFGTISTLYIPPRDKPLAHPGMAEAIGSLT APAKLYSPVKWAGPKLNIVSPYSPYERDVIARIDTHVAPEGHKKLYTSAAMKKFMTDLALKPKLL EEYMLDPVAVVESADGLSDVEKFGLKLAKDGVANILMMATESDIASGRHLAEDEIAKAKGPLGL LTVVLVIVGSSLVVHRLT* RviMA7 Rhizopogon vinicolor AM-OR11-026 v1.0 777202 no MTTSNSSDGTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIHDNSTADCFDLSV FYDKNKGRYDSYIQMCEVMLKAVRAGHDVLGVFYGHPGVFVSPSHRAIAVARQEGYKAKMLPG ISAEDYMFADLEFDPSLYGCKTCEATEILLRDKPLDPTIQNIIWQVGSVGVVDMEFSKSKFHLLVD 
RLEKDFGPDHKVVHYIGAVLPQSATIMDTFTIADLRKEDVAKQFGTISTLYIPPRDERPVHSGMAE AFGSPGAAVKPNTSIKWAGPKLNIVSACGPHEPDVIAQIDTHVTPEGYKKLHASVSMKKFMTDLA LKPKFLEEYKLDPVAVVEAAEGLSDLEKFGLKFARDGPADTLMKATESDIASGRQLTEEEVANGN GPLGLQTVVVVWLTTKIVSPEL* RviMA8 Rhizopogon vinicolor AM-OR11-026 v1.0 777713 yes MTTDTKRGTLTIAGSGIASIAHITLETLSYIKESDKLFYLVCDPVTEAFIQDNATGDFFDLSVFYDKN KSRYDSYIQMCEIMLRAVRAGHSVLGIFYGHPGVFVSPSHRAIAVAREEGYKARMLPGVSAEDY MFADLEFDPSQSTCNTYEATELLLRDRPLDPAIQNIIWQVGSVGVVDMEFEKSKFHLLVDRLEQDF GPDHKVVHYIGAVLPQSTTTMDIFTISDLRKENVAKQFGTISTLYIPPRDEGPVSSSMTQAFDFKAG AMVYSPVKWAGPKLNIVSALSPYERDVISQIDTHVAPEGYKILHTSAAMNKFMTDLSLKPKFLEE YKLYPEAVVESAEGLSNLEKFGLKFGSDGAVYILMKATESDIASGRQLTEDEIAKAHKSVGFPTVL VILPTVIVVLIGRE* SbaMA Sanghuangporus baumii OCB86575.1 no MAGSQKGTLTIAGSGIASIGHITLETLSYIQEADKIHYAVADPATEAFILDKSKDSSHCFDLTVYYD TNKMRYETYVQMCEVMLRDVRGGYNVLGIFYGHPGVFVSPSHRAIAIARDEGYIAKMLPGVSAE DYMFSDIGFDPAVPGCMSQEATGLLVCKKKLDPSIHNIIWQVGSVGVDTMNREFHILVDRLEEDF GLDHKVVHYIGAVLPQSTTVMDEFTIADLRKEEVVKQITTTSTFYLPPRSMAHIDQDMLQKLRLSL SPVEHVMHVYPRSKWASAESPNPPAYGPIEREAVSHLTNHTIPNDHQFLRGSRPLRQLMVDLALQ PGLRNRYKADPASVLDAIPGMSAEEKFALTLNHAAPIFKVMRASRADGEAPTLDEIAGTVNPSLA CPAIVVCFVGIMVIVIAL* 248 SveMA Serendipita vermifera ssp. 
bescii NFPB0129 v1.0 781716 yes MASSTHPKRGSLTIAGTGIATLAHMTLETVSHIKEADKVYYIVTDPVTQAFIEENAKGPTFDLSVY YDADKYRYTSYVQMAEVMLNAVREGCNVLGLFYGHPGIFVSPSHRALAIAREEGYEARMLPGVS AEDYMFADLGLDPALPGCVCYEATNFLIRNKPLNPATHNILWQVGAVGITAMDFENSKFSLLVDR LERDLGPNHKVVHYVGAVLPQSATIMETYTIAELRKPEVIKRISTTSSTFYIPPRDSEAIDYDMVAR LGIPPEKYRKIPSYPPNQWAGPNYTSTPAYGPEEKAAVSQLANHVVPNSYKTLHASPAMKKVMID LATDRSLYKKYEANRDAFVDAVKGLTELEKVALKMGTDGSVYKVMSATQADIELGKEPSIEELE EGRGRLLLVVITAAVVV* TcuMA Thanatephorus cucumeris MPI-SDFR-AT-0096 v1.0 718597 no MATFTEDNHPKRGSLIIAGSGIASVAHFTLETVSHLKNADKVFYLVNDPVTEAFIQENNPDTFDLV TFYSETKPRYHSYVEMAEIMLKEVRAGHKVLGIFYGHPGVFVHPSRRALFIARQENYEARMLPGIS SEDYMFADLELDPAEFGCMTCEATELIARNRPLNTSVHNIIWQAGIVGVSTLEYQESKFQLLVDRL ERDFGPEHKVVHYVGAIRMTPQAQSAMVVYSIQELRNPAVANFINSGSTLYVPPRLRDVPRVDPD SATALGLPPVTTGFLSASPTWVGSRFVTPSSYGDLENNIVAQMNENRSRSRITEPSPAMKGLMIKL AQELKLQEEYKKDPAKVAADTPDLKEIERRALSYGLDNTIRAVMSHRGSSSGPTEEQLKEISWEGS TIKHVTASSIAQ* TelMA Trypethelium eluteriae v1.0 416528 yes MAPSTSDRSKLPVAGYRPGRLVMVGSGIKSIAHLTLEAIGHIEQADKVFFVVADMTTAAFIHSRNA NAVDMYNLYDIGKPRYHTYVQMAERMLREVRNGFYVVGVFYGHPGIFVNPSHRAIAIARQEGHQ AFMLPGISAEACLFADVGIDPSTSGCQTIEATDLLLRNRPINTGSHLIIFQVGIVGDSGFHPQGFKNT KLHVLLEKLTEVYGSGHRLVHYIAPSMATVEPTIDFLTLGALKKSRNARRVTGISTFYIPPKHDVQ PSPSAAKKLGLKVQQGAKSRNFGRLTMPEDPYGPRERVAIDELDKHKDPAWYKRVRASQPMFDL LYRLGSDPRAAAKFKANPDKFLIPYDSDLTQTERAALLTRRSFPVRQALQPSADDVANQVVQRLF RDPSFATQWASTLKKNKSDPNGEQNIIAWLKQQGYDTTPEAVDSAYLQALNVDLDIYDSAYATSF SGGSTGPLIVILNGKVTVAGVEIKNPIYSQSILSWGTTDGNEYNAQLFLRVLTNDDGKPLPQNAYV GPQLYGYYWSPNSVKPTKPNINGKVGQPSPSNGSDPVQPTPLSKFAATYNTYIAGATGKYAADSQ LVVANPEPNTTVTYKGIVIKKWTYANESLSWLATDGNAQNVAIRFFINTSSTSSDPTLGPQFLGTT WAQGQNPPSKSNFFGQIGQSADPDTTANILTKANTWIQFGLNLVNGIAAMLICHAIMSLFKARNA EAANPSPENQQAEQQAEQDANDAINEQEAIQDNAADQGGNEEVDPNDLDPDEAGEPNANADAD ADADADADADADADADADADAEADADAEADADAEADADAEADADAEADADAEADADADIDI DIDADVVDIIL* ThyMA Trichophaea hybrida UTF0779 v1.0 914024 yes MTQGSLFIVGSGIRSIAQLTLEAIMHIENADKVFYVVCDPVTEGFIKEKNPNAVDLYEYYSNTKLR 
NETYIQMAEIMLREVRSGLRVVGVFYGHPGNFVSPTRRALAIARDEGYVAKMLPGISADDCLFAD LLIDPCYPGLQTVEATDVLVRNRPLQTTSHVVIYQVGVICKSGFDFYSIENDKFDHFVTRLQEDYG PNHPVVNYVAAVSPLAEPTIQRHTISELFKDSVKASISGVSTFYIPPKELLPLTAAGEKLILDLNTDK AAVQVKTYPPLPYCPLSTGQQAYGAYEKSVIEKIKNHTTPAGYKPYQTSRAMHKALERLYLDPET VKKYRRDPEGFAAEFEGLKENEAEALRSGNPDSCASLGAAVLHAVAVWIAC* TisMA Talaromyces islandicus CRG85870.1 no MSTSEHHRPASHGFRPGKLVIVGSGIRSISQFTLEAVAHIEHADKVFYCVADPGTDAFIERHNKNA VDLYNLYGDGKPRHQTYTQMAEVILQEVRKGFSVVGVFYGHPGVFVNPAHRAVSIAASEGYEAT MLPGVSAEDCLYADLLIDPSRPGCQTLEATDVLLRKRPIAKDCHVIIFQVGAVGDLGFNFKGFKNT KFEILVQHLLEVYGPDHSVVHYIASQLTFAAPIRDRYAIQDLVKPEVAKRITGISTFYLPPKDLLQP DEVAAKSLGLVSRPTTTASFGPYAPDQPYGPRELAAIKALKAHKDPANYNKTRASPALYQALESL 249 ALNPKDVLKFRSSREKFIARIDGLTKPEQKALRFASTGLIRQVLKSSAKDIATKFVQDEFRNPTLAT QYAQILKENRNKTDGIDKITEWLKAQGYDTTPEAIGEAYKQELSRNLDSYDGKYTTNVDGKPGPQ LLLQKGTVLVDGVKIPNWSYSSSQLSWTVEDGNPSSAMLHFQLLTNDTGKPLPPGSYIGPQFYGL YWRKGSSKPTGNNTVGKVGEVPPPDPITPVKPTPISAWLDTYQTYLKSSSGTWDKAGELAITGDE TNPTVTYKGKQIQKYSYQNETISWSSADGNPNNALSFYFNKNPTQKNPAPGNQFSGKYWESGQA PPTAANLFGQIGSSSSPGTAANDAMTAAQWKTIGINLGVGILTFVLGDFTLKAINALIKWVRNPTK ENRDALDQANDDAGEAEAQQEAVEAEGADLNPGGDIVDAGDVPAQAAEAAEAAEAAEVAEVA EVAEAAEAAEAAEAAEAAEVAEVAEVAEVAEVAEVAEVVDVVEVII* WmiMA Wilcoxina mikolae CBS 423.85 v1.0 650847 no MPQGSLTIVGSGIRSIAQLTLEAIMHIENADKVFYVVCDPATEGFIKQKNPNAVDLYEYYSNTKLR NETYIQMAEIMLREVRSGLRVVGVFYGHPGNFVSPTRRALAIAQDEGYVAKMLPGISADDCLFAD LLIDPCYPGLQTVEATDVLVRDRPLQITSHVVIYQVGVICKSGFDFTSIENDKFDHFVNRLQQDYGP SHPVINYVAAVSPLAEPTIQRYTISDLFKDSVKACISGVSTFYLPPKELLPITDVGEKLILDLGTDKA ALQVKTYPPLPYCPLSTGQQPYGPYEKAVIERIKDHTTPADYRPYNTSQAMYKALERLYLDPEAV KKYRRDPEGFAAAFEGLKENEAQALKSGNPDSSASLGHVRHPV* D Protein sequences for alignment Protein name Originating organism Protein sequence boundaries used in the alignments AboMA Anomoporia bombycina ATCC 64506 v1.0 GKLVIVGSGIGSIGQFTLSAVAHIEQADRVFFVVADPATEAFIY SKNKNSVDLYKFYDDKKPRMDTYIQMAEVMLRELRKGYSVV GVIYGHPGVFVTPSHRAISIARDEGYSAKMLPGVSAEDNLFAD IGIDPSRPGCLTYEATDLLLRNRTLVPSSHLVLFQVGCIGLSDFR 
FKGFDNINFDVLLDRLEQVYGPDHAVIHYMAAVLPQSTTTIDR YTIKELRDPVIKKRITAISTFYLPPKA AgaMA1 Armillaria gallica 21-2 v1.0 GTLTIAGSGIASIGHITLETLSYIQEADKVYYAITDPATEAFIQD KSEGDCFDLTVYYDKNKIRYETYVQMCEVMLRDVRADYNVV GVFYGHPGVFVSPSHRAIAIARDEGYRARMLPGVSAEDYMFS DLGFDPAVPGCMTQEATAMLNHNKKLDPSIHNIIWQVGAVGI DTMVFDNRKFHLLVDRLEEDFGPDHRVVNYIGAVLPQSTTVM DEFTIGDLRKEDVVKQFTTVSTFYVPPRT AgaMA2 Armillaria gallica 21-2 v1.0 GTLTIAGSGIASIGHITLETLSYIQGADKVYYVITDPATEAFIQD KSEGDCFDLTVYYDKNKIRYETYVQMCEVMLRDVRADYNVV GVFYGHPGVFVSPSHRAIAIARDEGYRARMLPGVSAEDYMFS DLGFDPAVPGCMTQEATAMLNHNKKLDPSIHNIIWQVGAVGI DTMVFDNRKFHLLVDRLEEDFGPDHRVVNYIGAVLPQSTTVM DEFTIGDLRKEDVVKQFTTVSTFYVPPRT AolMA Arthrobotrys oligospora ATCC 24927 GKLILVGTGVRSLCQLTLEAIDEIERADVIYYAVRDATTEGFIK KRNKEAIDLYQYFINDEEIPEADIYIQIAEVMLAATRKGRRVVG AFFGHPGLFMSPNRRALAIAQAEGYTAKILPGVSVDDCLLADL GVDPSFIGCLTCEARDFMIHDHLGLTSRHVIMYEVGYLGFYGD DSKTDYFEYFVNRLEEIYGNEHSLVNYTAAISPLMQPVINTLTI GDLRKPEVRKQITSASTLYFPPKE AosMA Armillaria ostoyae C18/9 GTLTIAGSGIASIGHITLETLSYIQEADKVYYAITDPATEAFIHD KSKGDCFDLSVYYDKNKNRYETYVQMCEVMLRDVRADYNV LGVFYGHPGVFVSPSHRAIAIARDEGYRARMLPGVSAEDYMF SDLGFDPAVPGCMTQEATAMLIHNKKLDPLIHNIIWQVGSVG 250 VDTMVFDNRKFHLLVDRLEEDFGLDHKVVHYIGAVLPQSTTV MDEFTIGDLRKEDVVKQFTTMSTFYVPPRT ApeMA Apodospora peruviana CBS118394 GKLVMVGSGIKSISHMTLETVSHIEQADKVFYCVADPGTELFV KSKAKWSFDLYTLYDNDKNRYITYVQMAELCLQAARDGFFS VGVFYGHPGVFVSPSHRAIGIAKREGIEAYMLPGISAEDCLFAD LGVDPSFTGCQTYEATDLLLRDRPISPYSHLIVWQVGVVGDTG FNFGGFTQTKFQVLVDRLEEVYGSDHRLIHYFASTLSHGPAHI EPLRISDLRKPEVEKRMNGISTFYVPQIG BadMA Bjerkandera adusta v1.0 GSLTIAGSGIASVAHITLETLSHIREADKVFYIVCDPATEAFIHD NAKAEAVDLTVYYDTNKARYDSYVQMAEVMLQDVRGGKD VLGIFYGHPGVFVSPSHRALAIARSEGYKAKMLPGVSAEDYLF ADLEFDPSVHGCATFEATELLLREKPLNTTMHNIIWQVGAVG VDDMVFTNSKLHVLVDRLEKDFGPEHQVVHYIGAVLPGSRTV MDTFTVADLCKDDVVKQFNPSSTLYIPPRS CbeMA Cercospora beticola GELVVVGTGIASIRQMTVEALDYIQRADKVFYATLDAVTETFI KHHAPSAEDLYQYYDTEKNRVTTYVQMAEVILSSVRKGKLT VAVFYGHPGVFVTPSHRAIYIARHEGYKAQMLPGVSAEDCLY ADLGIDPASSGCSMYEASFLLNEPNRLDSRHHLIIWQVGCVGK 
EAMIFDNKEIYKLADYLEAEYGPDHPVIAYLAAIQPFHDSKMD KMTVQDLRDQDKVQNIPITAGTTLYVPPKK CeaMA1 Ceratobasidium sp. (anastomosis group I, AG-I) v1.0 GSLIIAGSGISSVAHLTLETVSHLKNADNVFYLVGDPVTEAFIQ ENNKSTTNLVAHYATSKHRYQTYVEMAEVMLREVRAGHSVF GIFYGHPGVLTTPAHRALTLARQEGYEARMLPGVSSVDYMFA DLELEPGQHGCMIHEATDLLARDRRLDPSVHNIILQPSRVGSA TLEKEASKFQLLVDRLVRDFGPDHKIVHYSGAVLPQSSSAMV VFVIENLRNEQLANQIRSTSILYIPPRD CeaMA2 Ceratobasidium sp. (anastomosis group I, AG-I) v1.0 GTLTIAGSGIASIRHITLETLSYIKESDKIYYLVADPATEAFIIEN ANGSCVSLYGLYGIDKIRYDTYVQMSEVLLRDVRAGFDVLGI FYGHPGVFVSPTQRAMSIALEEGFQARMLPGVSAEDYLFADL RVDPCMFGCAAYEATELLYRKRRLNPTMQNIIWQVGKRFTIIK LTSPDTQNSKFGLLVDHLEEDYGPDHKVVHYIGAVLPQATTVI QPYTISELRKPEVASQIRACSTFYIPPRD CeuMA1 Cerrena unicolor v1.1 GSLTIAGSGIASIGHITLETLSYIEQADKVYYAVADPATEAFIQD KSKVECFDLTVYYDKDKIRFETYIQMSEVMLRDVRAGHSVLG IFYGHPGVFVCPSHRAIAIALSEGYKARMLPGISAEDYMFSDIG FDPALPGCTTQEATHLLLHNKKLDPSMHNIIWQVGGVGADTM NFDNRQFHQLVDCLERDFGSSHKVVHYIGAVMPQSTTIMDEF SIADLRKEEVVKQFTTWSTFYIPPRD CeuMA2 Cerrena unicolor v1.1 GSLTIAGSGIASVAHITLEVLSYLQEADKIYYAIVDPVTEAFIQD KSKGRCFDLRVYYDKDKMRSETYVQMSEVMLRDVRSGYNV LAIFYGHPGVFVCPTHRAISIARSEGYTAKMLPGVSAEDYMFS DIGFDPAVPGCMTQEATSLLIYNKQLDPSVHNIIWQVGSVGVD NMVFDNKQFHLLVDHLERDFGSIHKVIHYVGAIMPQSATVMD EYTISDLRKEDVVKKFTTTSTLYIPPRE CfuMA Cladosporium fulvum v1.0 GELVVVGTGIASLRQLTVEALDYIQRADVVFYATLDAVTEAFI KQHAKAAENLYQYYDTEKNRNATYTQMAETILASVRKGNMT VAVFYGHPGVFVTPSHRAIYIARQEGYKAKMLPGVSAEDCLY ADLDIDPASSGCSMYEASFLLLEPDRLDSRHHLIIWQVGCVGK EAMVFDNKELYKLADYLEAEYGPKHPAIAYLAAIQPFNDSKM DHMTVEDLRDPEKVRSIPINAGTTLYVPPKK 251 CloMA Chalara longipes BDJ v1.0 GSLTIVGSGFRSIIQFTTEALMHIEAAEKLYYCVLDAATRGFIK AKNSNSVDLYECYSNTKPRYETYIQMTEAMLRSVRDGLKATV VLYGHPGVFIHPSHRAIAIARSEGYDAWMLLGISVEDYLFADL LIDPSNPGTQTVEATEILLKERPLLTSSHVIIYQVGCIGNFTFNFS GIKNDKFDALVDRLIQEYGPDHPLVNYQAAISPLSEASIGRHIV SDLRKAEVQESVTGASTFYIPPKT CmaMA Coprinopsis marcescibilis CBS121175 v1.0 GQLTIVGSGIASINHMTLQAVACIETADVVCYVVADGATEAFI RKKNENCIDLYPLYSETKERTDTYIQMAEFMLNHVRAGKNVV GVFYGHPGVFVCPTHRAIYIARNEGYRAVMLPGLSAEDCLYA 
DLGIDPSTVGCITYEATDMLVYNRPLNSSSHLVLYQVGIVGKA DFKFAYDPKENHHFGKLIDRLELEYGPDHTVVHYIAPIFPTEEP VMERFTIGQLKLKENSDKIATISTFYLPPKA CmiMA Coprinellus micaceus FP101781 v2.0 GQLTIVGSGIASISHLTLQAVSAIENADIVCYVVADGATEAFIR KKNPNSLDLYHLYGEDKQRTDTYIQMAEFMLIRVRQGQNVV GVFYGHPGVFVCPTHRALYIARSEGYKARMLPGLSAEDCLFA DLGIDPSSVGCVTYEATDLLVFKRPINPASHLVLYQVGIVGKS NFKFDYTSDENIHFTKLLDRLEEAYGPEHSVTHYIAPLFPTEDPI AEEYTIAQLRLPEIRDKIHTISTFYVPPKT CmuMA Cystostereum murrayi CysMur001 v1.0 GTLTIAGSGIASIGHITLETLSHIQGADKIHYAVTDPATEAFILE KSKDSSSCFDLGIYYDKNKMRYETYVQMCEVMLRDVRGGHN VLGIFYGHPGVFVSPTHRAIALARDEGYTAKMLPGISAEDYMF SDLGFDPAFPGCMTQEATILLVRGRKLDPSVHNIIWQVGGVGV DTMVFDNANFYILVDRLEEDLGPDHKVVHYIGAVLPQSTAVI DEFTVAGLRKEEVVKQITTVSTFYLPPRT CpeMA Coprinellus pellucidus v1.0 GSLTLAGAGVTSIGHLTLQTVSAIENADIVCYILNDPVTEAFIIK KNPNVYDLYQLYDDGKPRIETYHQMVEVLMSKVRSGQDVVG LFTGHPGVVNTPAAQAFKIARQEGYTARMLPGITTNDALLAD VVADPALGGAMAYEATDFLNNNRVLHPEMNVFIQQVGVVGN KHFNFMEMRSSLLDKLIDRLEETYGGEKEIIHYIAPMLPIDKPV MQKMTVSDLKKPEYKAKIVPSSTFYITPNE DbiMA2 Dendrothele bispora CBS 962.96 v1.0 GSLTIVGTGIESIGQITLQAISHIETASKVFYCVVDPATEAFIRTK NKNCFDLYPYYDNGKHRMDTYIQMAEVMLKEVRNGLDVVG VFYGHPGVFVSPSHRALAIAESEGYKARMLPGVSAEDCLFAD LRIDPSHPGCMTYEASDFLIRERPVNIHSHLVLWQVGCVGVAD FNSGGFKNTKFDVLVDRLEQEYGADHPVVHYMASILPYEDPV TDKFTVSQFRDPQIAKRICGISTFYIPPKE DbOphM A Dendrothele bispora CBS 962.96 v1.0 GSLIVVGTGIESIGQMTLQALSYIEAASKVFYCVIDPATEAFILT KNKNCVDLYQYYDNGKSRMDTYTQMAELMLKEVRNGLDVV GVFYGHPGVFVNPSHRALAIARSEGYQARMLPGVSAEDCLFA DLCIDPSNPGCLTYEASDFLIRERPVNVHSHLILFQVGCVGIAD FNFSGFDNSKFTILVDRLEQEYGPDHTVVHYIAAMMPHQDPV TDKFTIGQLREPEIAKRVGGVSTFYIPPKA FmeMA1 Fomitiporia mediterranea v1.0 GSLTIAGTGIASIKHITLETLSYIKEAEKVYYLVADPATEAFIQD NASGTCFNLHVFYDTNKHRYDSYVQMAEVMLLDVRAGHSVL GIFYGHPGVFVSPSHRAIAIAREEGFKAHMLPGISAEDYMFADI GFDPATHGCVSYEATELLVRDKPLLPSSHNIIWQVGAIGANAM VFDNGKFNILVDRLEQVFGPDHKVVHYIGAVLPQSTSTIEAYTI SDLRKGDVVEKFSTTSTLYVPPSV FmeMA2 Fomitiporia mediterranea v1.0 GSLTIAGSGIASIKHMTLETVSHIKEAEKVYYIVTDPATEAYIK DNAVGACFDLRVFYDTNKPRYESYVQMSEVMLRDVRVGHSV 252 
LGIFYGHPGVFVSPSHRAIAIAKEEGFQARMLPGISAEDYLFAD IGFDPAAHGCMSYEATELLVRNKPLNTSTHNIIWQVGALGAE AMVFDNAKFSLLVDRLEQDYGSDHKVVHYIGAILPQADPTVE AYIVADLRKEDVVKQFNAISTLYIPPRV FmeMA3 Fomitiporia mediterranea v1.0 GSLTIAGSGIASIKHMTLETVSHIKEVEKVYYIVSDPATEAYIK DNAVGTCFDLRVFYDTNKPRYESDVQMSEVMLRDVRAGHSV LGIFYGHPGVFVSPSHRAIAIAKEEGFQARMLPGISAEDYLFAD IGFDPAVHGCMSYEATELLVRNKPLNTSTYNIIWQVGALGAE AMVFDNAKFSLLVDRLERDYGSDHKVVHYIGAILPQADSTIEA HTVSDLRKEDIVKQFNAISTLYIPPRV FmeMA4 Fomitiporia mediterranea v1.0 GSLTIAGTGIASIKHITLETLSYIKEAEKVYYLVADPATEAFIHD NASGTCFNLHVFYDTNKLRYDSYVQMAEVMLRDVRAGNSVL GLFYGHPGVFVSPSHRAIAVAREEGFKAQTLPGISAEDYMFAD IGFDPASHGCVSYEATDLLARDKPLLPSSHNIIWQVGAIGANA MVFDNGKFNVLVDRLERDFGPNHKVVHYIGAVLPQSTSKVEQ YTVADLRKDYVVKTFTTTSTLYVPPCV GesMA Gyromitra esculenta CBS101906 v1.0 GGLVVVGSGIRSVSQLTLEAVMHIEKADTVLYCVCDPSTEGFI KRKNKNAIDIYGYYSDLKERPDAFVQMAEVILREVRKGINVV AVFYGHPGIFVHPSRRALAIAKKEGYAARMLPGISAEDCLFAD LLVNPSFPGAQLVEASDIVYRARPLATSCHVVIFQAACFGHWK YNFTAFENGKFDHLVNRLQKDYGPDHPIVSYMAAVSPLEDPV INRHTISDLYKADVKKEITPNCTLYIPPKD GjuMA Gymnopilus junonius AH 44721 v1.0 GSLTIAGSGIASVGHITLETLAYIKESHKVFYLVCDPVTEAFIQE NGKGPCINLSIYYDSQKSRYDSYLQMCEVMLRDVRNGLDVLG VFYGHPGVFVSPSHRAIALAREEGFNAKMLAGVSAEDCLFAD LEFDPASFGCMTCEASELLIRNRPLNPYIHNVIWQVGSVGVTD MTFNNNKFPILIDRLEKDFGPNHTVIHYVGRVIPQSVSKIETFTI ADLRKEEVMNHFDAISTLYVPPRD HpiMA Hydnomerulius pinastri v2.0 GSLTIAGSGIASIRHMTLETLSAIKSADKVYYTVCDPATEAFIQ DNATGSCSDLTVYYDKEKSRYDTYVQMCEVMLREVRAGHN VLGVFYGHPGVFVSPSHRAIAIARAEGYKAEMLAGVSAEDYM FADLGFDPAAHGCVTYEATEMLLRKKQLNPATHNIIWQVGGV GVSNMIFDNARFHLLVDRLEDTFGPDHQVVHYIGAVLPLSVK TMETYTIADLRKEDVVAQFNPTSTLYIPPRD LedMA Lentinula edodes Le(Bin) 0899 ss11 v1.0 GSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILT KNKDCVDLYQYYDNGKSRMDTYTQMSEVMLREVRKGLDVV GVFYGHPGVFVNPSLRALAIAKSEGFKARMLPGVSAEDCLYA DLCIDPSNPGCLTYEASDFLIRERPTNIYSHFILFQVGCVGIADF NFTGFENSKFGILVDRLEKEYGAEHPVVHYIAAMLPHEDPVTD QWTIGQLREPEFYKRVGGVSTFYIPPKE LlaMA Lentinula lateritia RHP3577 ss4 v1.0 GSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILT 
KNKDCVDLYQYYDNGKSRMDTYTQMSEVMLREVRKGLEVV GVFYGHPGVFVNPSLRALAIAKSEGYKARMLPGVSAEDCLYA DLCIDPSNPGCLTYEASDFLIRERPTNIYSHFILFQVGCVGIADF NFTGFENSKFGILVDRLEKEYGADHPVVHYIAAMLPHEDPVT DQWTIGQLREPEFYKRVGGVSTFYIPPKE LraMA Lentinula raphanica INPA1701G ss19 v1.0 GSLIIVGTGIESIGQMTLQTLSYIEAADRVFYCVIDPATEAFILT KNKNCVDLYQYYDNGKTRMDTYTQMSEVMLREVRKGLKVV GVFYGHPGVFVNPSLRALAIAKSEGFKARMLPGVSAEDCLYA DLCIDPSNPGCLTYEASDFLIRERPANIYSHFILFQVGCVGIADF 253 SFTGFDNSKFGVLVDRLEKEYGGDHPVVHYIAAMLPHEEPVT DKFTIAQLREPEVYKRVGGVSTFYIPPKE MeuMA Mycosphaerella eumusae CBS 114824 GELVVVGTGIASLRQMTVEALDYIQRADMVFYVVLDAMTEC FIQTHAKKHHDLYQYYDKNKPRNASYVQMAELMVQSVRDG NLTVAVYYGHPGVFVFPTHRAIHIAREEGYKAKMLPGVSAED CLYADLGIDPGTTGCSMFEATYLLNEPDRLDPRNHVIIWQPGC VGKSTMVFDNSEIHELADYLEKTYGPEYPIIAYLAAVRPFNDP QIDKLMVKDLRDLEKLKAIPFNAATTLYIPPKT MfiMA Marasmius fiardii PR-910 v1.0 GSLTIAGSGIASIRHITLETLSHIERADKVYYLVADPATEAFIQD KSKGDYVDLAIYYDKDKNRYESYVQMSEVILNDVRAGYNVL GVFYGHPGVFVSPSHRTVAIARDEGYRVNMLPGVSAQDYMFS DIGFDPAIPGCTIQEASTILFLDKRLDPTVHNIIGQVGCVGVGT MAFDNRQFHLLVDHLEKDFGPEHKVVHYIGAVLPQSATVKDE FKIADLRKDDVVKQISTISTFYIPPRQ MroMA1 Mycena rosella CBHHK067 v1.0 GSLTIAGSGIASIGHITLETLALIKEADKIFYAVTDPATECYIQEN SRGDHFDLTTFYDTNKKRYESYVQMSEVMLRDVRAGRNVLG IFYGHPGVFVAPSHRAIAIAREEGFQAKMLPGISAEDYMFADL GFDPSTYGCMTQEATELLVRNKKLDPSIHNIIWQVGSVGVDT MVFDNGKFHLLVERLEKDFGLDHKIQHYIGAILPQSVTVKDTF AIRDLRKEEVLKQFTTTSTFYVPPRT MroMA2 Mycena rosella CBHHK067 v1.0 GSLTIAGSGIASIGHITLETLALIKEADKIFYAVTDPATECYIQEN SRGDHFDLTTFYDTNKKRYESYVQMSEVMLREVRAGRNVLGI FYGHPGVFVAPSHRAIAIAREEGFQAKMLPGISAEDYMFADLG FDPSTQGCMTQEATELLVRNKKLDPSVHNIIWQVGSVGVDTM VFDNGKFHLLVERLEKDFGLDHKIQHYIGAILPQSVTVKDAFA IRDLRKEEVLKQFTTTSTFYIPPRA OphMA Omphalotus olearius GSLTIVGTGIESIGQMTLQALSYIEAAAKVFYCVIDPATEAFILT KNKNCVDLYQYYDNGKSRLNTYTQMSELMVREVRKGLDVV GVFYGHPGVFVNPSHRALAIAKSEGYRARMLPGVSAEDCLFA DLCIDPSNPGCLTYEASDFLIRDRPVSIHSHLVLFQVGCVGIAD FNFTGFDNNKFGVLVDRLEQEYGAEHPVVHYIAAMMPHQDP VTDKYTVAQLREPEIAKRVGGVSTFYIPPKA PgiMA1 Phlebiopsis gigantea v1.0 GSLTIAGSGIASVRHMTLETLAHVQEADIVFYVVADPVTEAYI 
KKNARGPCKDLEVLFDKDKVRYDTYVQMAETMLNAVREGQ KVLGIFYGHPGVFVSPSRRALSIARKEGYQAKMLPGISSEDYM FADLEFDPAVHGCCAYEATQLLLREVSLDTAMSNIIWQVGGV GVSKIDFENSKVKLLVDRLEKDFGPDHHVVHYIGAVLPQSAT VQDVLKISDLRKEEIVAQFNSCSTLYVPPLT PgiMA2 Phlebiopsis gigantea v1.0 GSLTIAGSGIASVAHITLETVAYLAEADSVFYIVADPVTEAFIH KNAKVPCQDLHVFYDKDKSRYDTYVQMAETMLNSVRAGEK VLGIFYGHPGVFVSPSRRALAIAREEGYEAKMLPGVSAEDYMF ADLEFDPATHGCCAYEATHILLKNIPLDTSINNIIWQVGGVGVT KIDFENSKFKFLVDRLEKDFGLDHKVVHYIGAVLPQSATVKEV YTISDLRKPEVATQFNACSTLYVPPRK PmuMA Pseudocercospora musae GELVVVGTGIASLRQMTVEALDYIQRADMVFYVVLDAMTEA FIQTHAKKHHDLYQYYDKNKPRSASYIQMAELMVQSVRDGN LTVAVYYGHPGVFVFPTHRAIHIAREEGFKAKMLPGVSAEDC LYADLGIDPGSTGCSMFEATYLLNEPDRLDPRNHVIIWQPGCV GKSAMVFDNSEIHELADYLEKTYGAEYPVIAYLAAVRPFNDP QIDKLMVKDLRDLEKLRAIPFNAATTLYIPPKT 254 PocMA Porodaedalea chrysoloma FP- 135951 v1.0 GTLVIAGSGIASIAHITLETLSHIKESDRVYYIVGDPATEAFIQD NASGTCFDLTIFYDTNKVRYDSYVQMCEVMLRDVRAGHTVL GVFYGHPGVFVSPSHRAIAIARDEGYKARMLPGVSAEDYLFA DLGFDPATHGCTSYEATDLLVRNKPLNASTHNIIWQVGGVGV GTMVFDNAKFHLLVDRLEKDFGPSHTVVHYIGAVLPQSITTM DKLTIADLRKDAVVKQFNPTSTFYIPPRD RviMA1 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIASVAHITLETLSYIKESEKIFYLVCDPVTEAYIQD NTTADCFDLSVFYGKNKGRHDSYIQMCEVMLKAVRAGHDVL GVFYGHPGVFVSPSHRAIAVARQEGYKAKMLPGISAEDYMFA DLEFDPSLSGCKTCEATEILLRDKPLDPSIQNIIWQVGSVGVVD MEFEKSKFQLLVDRLEKDFGPGHKVVHYIGAVLPQSTTTMDT FTIADLRKEDVAKQFGTISTLYVPPRD RviMA2 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIYD NSTADCFDLSVFYDKTKGRYDSYIQMCEVMLKAVRAGHDVL GVFYGHPGVFVSPSHRAIAVARQEGYKAKMLPGISAEDYMFA DLEFDPSVSGCKTCEATEILLRDKPLDPTIQNIIWQVGSVGVVD MEFSKSKFQLLVDRLEKDFGPDHKVVHYIGAVLPQSTTTMDT FTIADLRKEDVAKQFGTISTLYIPPRD RviMA3 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIHD NSTADCFDLSVFYDKNKGRYDSYIQMCEVMLKDVRAGHHVL GVFYGHPGVFVSPSHRAIAVARQEGYNAKMLPGISAEDYMFA DLEFDPSLYGCKTCEATEILLRDKPLDPSIHNIIWQVGSVGVVD MEFSKSKFHLLVDRLEKDFGLEHKVVHYIGAVLPQSATTMDT FTIADLRKEDVAKQFGTISTLYIPPRD RviMA4 Rhizopogon vinicolor AM- OR11-026 v1.0 
GTLTIAGSGIACIAHITLETLSYIKESDKLFYLVCDPVTEAFIQD NATGGCFDLSVFYDKNKSRYDSYIQMCEVMLKAVRVGYDVL GVFYGHPGVFVSPSHRAIAVAREEGYKARMLPGISAEDYLFA DLEFDPSLHGCNTYEATELLLRGKPLDPLIHNIIWQVGSVGVID MEFEKSKFHLLVDRLENDFGPDHKVVHYIGAVLPQSTTTMDT FTISDLRKEDVAKQFGTISTLYVPLRD RviMA5 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIACVAHITLETLSYIKESDKLFYLVCDPVTEAFIQD NATGDCFDLSVFYDKNKSRYDSYIQMCEVMLKAVRAGHHVL GVFYGHPGVLVSPSYRAIAVAREEGYKARMLPGISAEDYLFA DLEFDPCFPSGCNTYEATELLLRDRSLDPSIHNIIWQVGSVGVT DMEFEKSKLNLLVDRLENDFGPDHKVVHYIGAVLPQSTTTMD TFAVSDLHKEDVAKQFGTISTLYIPPRD RviMA6 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIACVAHITLQMLSYIKESDKLFYLVCDPVTEAFIQ DNATGDCFDLSVFYDKNKSRHDSYIQMCEIMLRAVRADHHV LGVFYGHPGIFVSPSYRAMAVAREEGYKAKMLPGISTEDYLF ADLEFDPCLPGCNTYEATELLLRDRSLDPSIHNIIWQVGSVGVI DIQFEKSKFHLLVDRLEKDFGPDHKVVHYIGAVLPQSTTTMDT FTISDLRKEDVAKQFGTISTLYIPPRD RviMA7 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIHD NSTADCFDLSVFYDKNKGRYDSYIQMCEVMLKAVRAGHDVL GVFYGHPGVFVSPSHRAIAVARQEGYKAKMLPGISAEDYMFA DLEFDPSLYGCKTCEATEILLRDKPLDPTIQNIIWQVGSVGVVD MEFSKSKFHLLVDRLEKDFGPDHKVVHYIGAVLPQSATIMDTF TIADLRKEDVAKQFGTISTLYIPPRD 255 RviMA8 Rhizopogon vinicolor AM- OR11-026 v1.0 GTLTIAGSGIASIAHITLETLSYIKESDKLFYLVCDPVTEAFIQD NATGDFFDLSVFYDKNKSRYDSYIQMCEIMLRAVRAGHSVLG IFYGHPGVFVSPSHRAIAVAREEGYKARMLPGVSAEDYMFAD LEFDPSQSTCNTYEATELLLRDRPLDPAIQNIIWQVGSVGVVD MEFEKSKFHLLVDRLEQDFGPDHKVVHYIGAVLPQSTTTMDIF TISDLRKENVAKQFGTISTLYIPPRD SbaMA Sanghuangporus baumii GTLTIAGSGIASIGHITLETLSYIQEADKIHYAVADPATEAFILD KSKDSSHCFDLTVYYDTNKMRYETYVQMCEVMLRDVRGGY NVLGIFYGHPGVFVSPSHRAIAIARDEGYIAKMLPGVSAEDYM FSDIGFDPAVPGCMSQEATGLLVCKKKLDPSIHNIIWQVGSVG VDTMNREFHILVDRLEEDFGLDHKVVHYIGAVLPQSTTVMDE FTIADLRKEEVVKQITTTSTFYLPPRS SveMA Serendipita vermifera ssp. 
bescii NFPB0129 v1.0 GSLTIAGTGIATLAHMTLETVSHIKEADKVYYIVTDPVTQAFIE ENAKGPTFDLSVYYDADKYRYTSYVQMAEVMLNAVREGCN VLGLFYGHPGIFVSPSHRALAIAREEGYEARMLPGVSAEDYMF ADLGLDPALPGCVCYEATNFLIRNKPLNPATHNILWQVGAVGI TAMDFENSKFSLLVDRLERDLGPNHKVVHYVGAVLPQSATIM ETYTIAELRKPEVIKRISTTSSTFYIPPRD TcuMA Thanatephorus cucumeris MPI- SDFR-AT-0096 v1.0 GSLIIAGSGIASVAHFTLETVSHLKNADKVFYLVNDPVTEAFIQ ENNPDTFDLVTFYSETKPRYHSYVEMAEIMLKEVRAGHKVLG IFYGHPGVFVHPSRRALFIARQENYEARMLPGISSEDYMFADL ELDPAEFGCMTCEATELIARNRPLNTSVHNIIWQAGIVGVSTLE YQESKFQLLVDRLERDFGPEHKVVHYVGAIRMTPQAQSAMV VYSIQELRNPAVANFINSGSTLYVPPRL TelMA Trypethelium eluteriae v1.0 GRLVMVGSGIKSIAHLTLEAIGHIEQADKVFFVVADMTTAAFI HSRNANAVDMYNLYDIGKPRYHTYVQMAERMLREVRNGFY VVGVFYGHPGIFVNPSHRAIAIARQEGHQAFMLPGISAEACLF ADVGIDPSTSGCQTIEATDLLLRNRPINTGSHLIIFQVGIVGDSG FHPQGFKNTKLHVLLEKLTEVYGSGHRLVHYIAPSMATVEPTI DFLTLGALKKSRNARRVTGISTFYIPPKH ThyMA Trichophaea hybrida UTF0779 v1.0 GSLFIVGSGIRSIAQLTLEAIMHIENADKVFYVVCDPVTEGFIKE KNPNAVDLYEYYSNTKLRNETYIQMAEIMLREVRSGLRVVGV FYGHPGNFVSPTRRALAIARDEGYVAKMLPGISADDCLFADLL IDPCYPGLQTVEATDVLVRNRPLQTTSHVVIYQVGVICKSGFD FYSIENDKFDHFVTRLQEDYGPNHPVVNYVAAVSPLAEPTIQR HTISELFKDSVKASISGVSTFYIPPKE TisMA Talaromyces islandicus GKLVIVGSGIRSISQFTLEAVAHIEHADKVFYCVADPGTDAFIE RHNKNAVDLYNLYGDGKPRHQTYTQMAEVILQEVRKGFSVV GVFYGHPGVFVNPAHRAVSIAASEGYEATMLPGVSAEDCLYA DLLIDPSRPGCQTLEATDVLLRKRPIAKDCHVIIFQVGAVGDLG FNFKGFKNTKFEILVQHLLEVYGPDHSVVHYIASQLTFAAPIRD RYAIQDLVKPEVAKRITGISTFYLPPKD WmiMA Wilcoxina mikolae CBS 423.85 v1.0 GSLTIVGSGIRSIAQLTLEAIMHIENADKVFYVVCDPATEGFIK QKNPNAVDLYEYYSNTKLRNETYIQMAEIMLREVRSGLRVVG VFYGHPGNFVSPTRRALAIAQDEGYVAKMLPGISADDCLFAD LLIDPCYPGLQTVEATDVLVRDRPLQITSHVVIYQVGVICKSGF DFTSIENDKFDHFVNRLQQDYGPSHPVINYVAAVSPLAEPTIQR YTISDLFKDSVKACISGVSTFYLPPKE 256 Table 9.2 Splicing variability across phylum, organism, and putative borosin precursor. 
Phylum          Exons   Organism                            Gene name
Ascomycota      1       Apodospora peruviana                apeMA
Ascomycota      1       Trypethelium eluteriae              telMA
Ascomycota      1       Talaromyces islandicus              tisMA
Ascomycota      3       Cercospora beticola                 cbeMA
Ascomycota      3       Chalara longipes                    cloMA
Ascomycota      3       Wilcoxina mikolae                   wmiMA
Ascomycota      4       Arthrobotrys oligospora             aolMA
Ascomycota      4       Gyromitra esculenta                 gesMA
Ascomycota      4       Trichophaea hybrida                 thyMA
Ascomycota      5       Mycosphaerella eumusae              meuMA
Ascomycota      5       Pseudocercospora musae              pmuMA
Ascomycota      6       Cladosporium fulvum                 cfuMA
Basidiomycota   3       Anomoporia bombycina                aboMA
Basidiomycota   3       Armillaria gallica                  agaMA1, agaMA2
Basidiomycota   3       Armillaria ostoyae                  aosMA
Basidiomycota   3       Bjerkandera adusta                  badMA
Basidiomycota   3       Ceratobasidium sp. AG-1             ceaMA1, ceaMA2
Basidiomycota   3       Cerrena unicolor                    ceuMA1, ceuMA2
Basidiomycota   3       Cystostereum murrayi                cmuMA
Basidiomycota   3       Dendrothele bispora                 dbOphMA, dbiMA2
Basidiomycota   3       Fomitiporia mediterranea            fmeMA1-4
Basidiomycota   3       Gymnopilus junonius                 gjuMA
Basidiomycota   3       Hydnomerulius pinastri              hpiMA
Basidiomycota   3       Lentinula edodes                    ledMA
Basidiomycota   3       Lentinula lateritia                 llaMA
Basidiomycota   3       Lentinula raphanica                 lraMA
Basidiomycota   3       Marasmius fiardii                   mfiMA
Basidiomycota   3       Mycena rosella                      mroMA1, mroMA2
Basidiomycota   3       Omphalotus olearius                 ophMA
Basidiomycota   3       Phlebiopsis gigantea                pgiMA1, pgiMA2
Basidiomycota   3       Porodaedalea chrysoloma             pocMA
Basidiomycota   3       Rhizopogon vinicolor                rviMA1-8
Basidiomycota   3       Serendipita vermifera ssp. bescii   sveMA
Basidiomycota   3       Thanatephorus cucumeris             tcuMA
Basidiomycota   4       Coprinopsis marcescibilis           cmaMA
Basidiomycota   4       Coprinellus micaceus                cmiMA
Basidiomycota   4       Coprinellus pellucidus              cpeMA

Figure 9.1 MAFFT sequence alignment of putative borosin precursors identified in this study
Borosin precursor sequences correspond to Gly10-Ala252 of OphMA, where CobA from Bacillus megaterium was used as the outgroup in the phylogenetic tree depicted in Fig. 2.
Catalytically relevant residues (Tyr66, Arg72, Tyr76 in OphMA) are marked with an asterisk (*), and the symbol (#) denotes residues involved in core peptide binding as seen in the structure of OphMA. Information concerning full protein sequences and originating hosts can be found in Table 9.1.

Figure 9.2 Genetic loci of borosin precursors catalytically validated in this study
Where available, 15 genes upstream and downstream are graphically represented as proportionally sized block arrows. Genes are color coded based on the predicted functions and homologies of their encoded proteins. Partial or complete RNA transcription of genes from publicly available data (genome.jgi.doe.gov/programs/fungi/index.jsf) is represented as colored arrow outlines. More information concerning the host organisms and encoded open reading frames can be found in Table 9.1.

Figure 9.3 LC-MS(/MS) data for borosin precursor E. coli expressions
(Below) For space considerations, this document includes data only for CeuMA2, PgiMA1, PgiMA1_mut, and AboMA (data contributed by FM); all remaining data are in the supplementary material of the online version of the published paper. The panel lettering is the same in this document and in the supplement. LC-MS and LC-MS/MS spectra for borosin precursor expressions reveal methylated residues in proteolytically released core peptide fragments. Extracted ion chromatograms (EICs) of all the fragmented peptides (± 0.01 amu unless otherwise noted) cleaved by sequence-specific proteases precede all LC-MS/MS data, where orange highlighted numbers represent the number of methylations detected in the listed peptide fragment. Peak integration, shown as a percent, is normalized to the most abundant peak depicted in the entire panel. Percentages of EIC areas provide visual approximations of relative PTM levels when considered in the context of the expression conditions, purification, digestion strategy, and analytical methods used.
Slight differences in retention times of identical peptides from different expressions arise from small variations in the self-packed nLC columns described in the Materials and Methods section. For LC-MS/MS spectra showing overlapping, differentially methylated species, the most abundant MS/MS masses are annotated in closest proximity to the peptide sequence. The borosin precursor, time of in vivo expression, parent ion details, and LC retention times (RT) are denoted in the upper right-hand corner of the LC-MS/MS spectra. Observed MS/MS fragment masses are listed above (b-ions) and below (y-ions) the listed sequence, with grey lines marking sites of fragmentation. Mass differences from the theoretical expected masses are labelled in parentheses. A mass cutoff of 10.0 ppm was used for the annotated LC-MS/MS peaks. Ion masses are denoted with varying numbers of methylations in brackets, where ‘Me’ marks a mass shift corresponding to methylation. Protease abbreviations used in this figure are trypsin (Tryp) and chymotrypsin (Chymotryp). Please see the Materials and Methods section for more details concerning the acquisition of the LC-MS/MS data. a, Organization of extracted ion chromatogram (EIC) and LC-MS(/MS) data for E. coli expressions of borosins. The table indicates figure panel numbers, a summary of the expression data presented in the panels, the corresponding borosin precursor protein, and LC-MS(/MS) peptide sequences. Residues in the LC-MS fragments (‘Sequence’ column) are shaded orange based on verified or inferred N-methylation position.
b, EIC data of LedMA expressions for 24 h and 72 h; c-k, LC-MS/MS fragmentation data for LedMA; l, EIC data of CmaMA expressions for 24 h and 72 h; m-r, LC-MS/MS fragmentation data for CmaMA; s, EIC data of CmiMA expressions for 24 h and 72 h; t-y, LC-MS/MS fragmentation data for CmiMA; z, EIC data of MroMA1 expressions for 24 h and 72 h; aa-ff, LC-MS/MS fragmentation data for MroMA1; gg, EIC data of SveMA expressions for 24 h and 72 h; hh-nn, LC-MS/MS fragmentation data for SveMA; oo, EIC data of CeuMA2 expressions for 24 h and 72 h; pp-xx, LC-MS/MS fragmentation data for CeuMA2; yy, EIC data of PocMA expressions for 24 h and 72 h; zz-bbb, LC-MS/MS fragmentation data for PocMA; ccc, EIC data of GjuMA expressions for 24 h and 72 h (6-Me species: ±0.02 amu); ddd-iii, LC-MS/MS fragmentation data for GjuMA; jjj, EIC data of PgiMA1 expression for 24 h (72 h expressions did not yield soluble protein). Note that each panel represents a unique core peptide fragment, with the relative percentages of unmethylated and methylated fragments noted. Zoomed-in panel levels are depicted when necessary; kkk-dddd, LC-MS/MS fragmentation data for PgiMA1; eeee, EIC data of PgiMA1_mut expression for 24 h. Note that each panel represents a unique core peptide fragment, with the relative percentages of unmethylated and methylated fragments noted. Zoomed-in panel levels are depicted when necessary; ffff-kkkk, LC-MS/MS fragmentation data for PgiMA1_mut; llll-qqqq, LC-MS/MS fragmentation data for AboMA.
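The 10.0 ppm cutoff and the 'Me' mass shift mentioned in the Figure 9.3 caption can be illustrated with a minimal sketch. The function names and all example m/z values are hypothetical and are not taken from the data. The sketch assumes the usual definition of ppm error and the monoisotopic mass of a net CH2 addition (about 14.01565 Da) for each methylation.

```python
# Monoisotopic mass added by one methylation (net gain of CH2), in Da.
METHYL_SHIFT_DA = 14.01565


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_cutoff(observed_mz: float, theoretical_mz: float,
                  cutoff_ppm: float = 10.0) -> bool:
    """True if the absolute ppm error does not exceed the cutoff."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= cutoff_ppm


def methylated_mz(unmethylated_mz: float, n_methyl: int, charge: int = 1) -> float:
    """Expected m/z after adding n_methyl methyl groups at a given charge state."""
    return unmethylated_mz + n_methyl * METHYL_SHIFT_DA / charge


if __name__ == "__main__":
    # Hypothetical fragment ion: theoretical 1000.0000, observed 1000.0050.
    print(round(ppm_error(1000.005, 1000.0), 3))
    print(within_cutoff(1000.005, 1000.0))
    # Hypothetical doubly charged peptide carrying two methylations.
    print(methylated_mz(500.0, 2, charge=2))
```

At higher charge states the per-methylation spacing in m/z shrinks by the charge, which is why the helper divides the shift by `charge`.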
[Figure 9.3 panels shown in this document: oo-xx (CeuMA2), jjj-dddd (PgiMA1), eeee-kkkk (PgiMA1_mut), and llll-qqqq (AboMA). Panel jjj title: EIC from HPLC-MS, PgiMA1 (24 h expression), chymotrypsin.]

Figure 9.4 LC-MS/MS data for in vitro methyltransferase assays of borosin precursors CmaMA, LedMA, MroMA1, and SveMA
(Below) For all in vitro reactions, borosin precursors were expressed in E. coli for 2 h prior to purification; details concerning these experiments are described in the Materials and Methods section. Ion chromatograms of proteolytically cleaved borosin core peptides are depicted. Orange highlighted numbers and dashed lines represent the number of methylations detected in the listed peptide fragment. The bottom spectrum shows the base-level methylation state for the minimally expressed borosin precursor. The next two spectra reveal increased methylation states upon in vitro incubation with SAM overnight or for 72 h, respectively. The top spectrum reveals the in vivo methylation state for the precursor when expressed for 72 h. a, LedMA; b, CmaMA; c, MroMA1; d, SveMA.

Figure 9.5 MAFFT sequence alignment of putative borosin precursors identified in the Agaricales order of Basidiomycete fungi.
(Below) Borosin precursor sequences correspond to Gly10-Ala252 of OphMA. Conserved regions in the methyltransferase domains targeted for degenerate primer construction are underscored by green lines. Information concerning full protein and primer sequences can be found in Table 9.1.

Figure 9.6 LC-MS(/MS) data of E. coli expressions for the gymnopeptide B borosin precursor GymMA1
(Below) LC-MS and LC-MS/MS spectra for borosin precursor expressions reveal methylated residues in the core peptide fragment of GymMA1.
EICs of all the fragmented peptides (± 0.01 amu) cleaved by AspN precede all LC-MS/MS data, where orange highlighted numbers represent the number of methylations detected in the listed peptide fragment. Peak integration, shown as a percent, is normalized to the most abundant peak depicted in the entire panel. Percentages of EIC areas provide visual approximations of relative PTM levels when considered in the context of the expression conditions, purification, digestion strategy, and analytical methods used. Slight differences in retention times of identical peptides from different expressions arise from small variations in the self-packed nLC columns described in the Materials and Methods section. For LC-MS/MS spectra showing overlapping, differentially methylated species, the most abundant MS/MS masses are annotated in closest proximity to the peptide sequence. For extended expressions, alternative methylation states are detected in the GymMA1 precursor, as observed for OphMA. The borosin precursor, time of in vivo expression, parent ion details, and LC retention times (RT) are denoted in the upper right-hand corner of the LC-MS/MS spectra. Observed MS/MS fragment masses are listed above (b-ions) and below (y-ions) the listed sequence, with grey lines marking sites of fragmentation. Mass differences from the theoretical expected masses are labelled in parentheses. A mass cutoff of 10.0 ppm was used for the annotated LC-MS/MS peaks. Ion masses are denoted with varying numbers of methylations in brackets, where ‘Me’ marks a mass shift corresponding to methylation. Please see the Materials and Methods section for more details concerning the acquisition of the LC-MS/MS data. a, EIC data of GymMA1 expressions for 24 h and 72 h; b-h, LC-MS/MS fragmentation data for GymMA1.
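The peak normalization described in the caption (each EIC peak area expressed as a percent of the most abundant peak in the panel) can be sketched as follows. This is an illustrative sketch only: the function name, the labels, and the peak areas are hypothetical and do not come from the data.

```python
def normalize_to_max(areas: dict[str, float]) -> dict[str, float]:
    """Express each integrated peak area as a percent of the largest area."""
    if not areas:
        raise ValueError("no peak areas supplied")
    top = max(areas.values())
    if top <= 0:
        raise ValueError("largest peak area must be positive")
    return {label: 100.0 * area / top for label, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical integrated EIC areas for one peptide fragment,
    # keyed by methylation state.
    panel = {"0Me": 2.0, "1Me": 8.0, "2Me": 4.0}
    for label, pct in normalize_to_max(panel).items():
        print(f"{label}: {pct:.0f}%")
```

Because each value is relative to the tallest peak in its own panel, the percentages compare methylation states within a panel and are not absolute abundances.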
[Figure 9.6 panels a-h appear here in the original document.]

10 Appendix 2: Supplemental information for Chapter 5

Figure 10.1 SonM WT fitted kinetic curves
Fitted kinetic curves for determining the rate of SonM WT for SAM and 0Me-SonA. The bottom two fitted curves were used to determine the Ki of BBD and of 2Me-SonA with SonM. BBD is a competitive inhibitor of 0Me-SonA, but 2Me-SonA is not. See Table 5.2 for values. Activity was verified by subsequent MS analysis as shown in Figure 10.3.

Figure 10.2 Fitted kinetic curves for SonM active site mutants
Fitted kinetic curves for SonM mutants with detectable activity for SAM and 0Me-SonA. All SonM active site mutants exhibited a lower catalytic efficiency than WT. See Table 5.2 for values. Activity was verified by subsequent MS analysis as shown in Figure 10.3.

Figure 10.3 HPLC-MS/MS data for AspN-digested SonA after in vitro reaction with SonM
(Below) Early and late time points for in vitro reactions with WT SonM and active site mutants. When possible, each reaction has an early and a late time point, the specifics of which are indicated on each panel. Methylated residues are shown with orange circles; empty circles indicate that the methylation location is inferred. A: WT, B: Y93F, C: Y58F, D: R67K, E: R67A, F: Y71F, G: Y58F-Y71F
A: SonM WT (0-2Me at 6 min and 24.5 min reaction times)
B: SonM Y93F (0-2Me at 10 min and 47.73 min reaction times)
C: SonM Y58F (0-2Me at 10 min and 108.5 min reaction times)
D: SonM R67K (0-2Me at 19.5 min and 135.5 min reaction times)
E: SonM R67A (0Me at 128.9 min reaction time)
F: SonM Y71F (0-2Me at 19.5 min and 135.5 min reaction times)
G: SonM Y58F-Y71F (0Me at 128.9 min reaction time)
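For readers of the fitted kinetic curves in Figures 10.1 and 10.2, the sketch below shows how a competitive inhibitor changes a rate curve. It assumes the standard Michaelis-Menten model with competitive inhibition; the captions do not state which model was fitted, so this is an assumption. All parameter values are hypothetical; the measured values are in Table 5.2.

```python
def mm_rate(s: float, vmax: float, km: float,
            i: float = 0.0, ki: float = float("inf")) -> float:
    """Michaelis-Menten rate with optional competitive inhibition.

    A competitive inhibitor raises the apparent Km by the factor (1 + [I]/Ki)
    and leaves Vmax unchanged.
    """
    km_apparent = km * (1.0 + i / ki)
    return vmax * s / (km_apparent + s)


def catalytic_efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat/Km."""
    return kcat / km


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units, not values from Table 5.2.
    vmax, km = 100.0, 10.0
    print(mm_rate(10.0, vmax, km))                  # no inhibitor
    print(mm_rate(10.0, vmax, km, i=5.0, ki=5.0))   # inhibitor at [I] = Ki
    print(catalytic_efficiency(kcat=2.0, km=km))
```

In this model, an inhibitor at [I] = Ki doubles the apparent Km, so a higher substrate concentration is needed to reach the same rate while the maximal rate is unaffected.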