Isolation and Identification of De Novo Long Noncoding RNAs from Mouse Myoblasts and Embryonic Stem Cells

A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY

Catherine Ann Alsager Lee

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

Nobuaki Kikyo

November, 2012

© Catherine Ann Alsager Lee 2012 ALL RIGHTS RESERVED

Acknowledgements

I extend my deepest gratitude to Nobuaki Kikyo, my advisor and principal investigator, whose guidance, patience, and expertise aided the writing of this thesis in innumerable ways. I am also grateful to Sue Keirstead, Atsushi Asakura, Ying Zhang, and all the members of the Kikyo Lab for their help and support, without which this work would not have been possible.

Dedication

This work is dedicated to my parents and to Tristan, whose love and encouragement during the long hours of research and writing were integral to the completion of this task. It is also dedicated to my grandfather, Dr. Kyu Lee, who inspired my enthusiasm for science and taught me the value of hard work and education.

Abstract

Long noncoding RNAs (lncRNAs) are a pervasive class of transcripts whose importance and biological relevance are only beginning to be elucidated. LncRNAs have been detected in nearly every cell type and found to be fundamentally involved in many biological processes; however, studies that characterize lncRNA expression during certain periods of development are largely missing. Here, we demonstrate how a pool of potentially relevant lncRNAs can be identified using an RNA-chromatin immunoprecipitation (RNA-ChIP) technique that pulls down a sufficient amount of RNA for sequencing. In our initial experiment, we attempted to identify lncRNAs bound to the MyoD protein in myoblast cells; however, the lack of highly specific, immunoprecipitation-compatible antibodies against MyoD prevented us from pursuing this project.
As an alternative, we successfully identified lncRNAs bound to the histone-modifying complex COMPASS, as well as those bound to the master pluripotency factors Oct4 and Sox2 in mouse embryonic stem cells. This study provides a proof-of-principle to identify lncRNAs potentially involved in chromatin regulation of pluripotency.

Contents

Acknowledgements
Dedication
Abstract
List of Tables
List of Figures
1 Introduction
2 Methods
  2.1 Cell Culture
  2.2 Myoblast Differentiation
  2.3 Immunostaining
  2.4 Antibody Testing
  2.5 DNA Length Optimization for ChIP
  2.6 ChIP
  2.7 RNA-ChIP
  2.8 RNA-sequencing and Data Analysis
3 Results
  3.1 ITS Differentiates Myoblasts into Myotubes
  3.2 Western Blots Identify Immunoprecipitation-compatible Antibodies
  3.3 Chromatin Size is Optimized for Immunoprecipitation
  3.4 ChIP with MyoD Shows Non-specific Binding
  3.5 RNA-ChIP with MyoD Leads to Re-testing of Antibodies
  3.6 RNA-seq Data Analysis
4 Discussion
5 Conclusion
6 Glossary of Bioinformatics Terms
References

List of Tables

2.1 Primary antibodies used for western blotting.
2.2 Secondary antibodies used for western blotting.
2.3 Antibodies used for ChIP experiments.
2.4 Sequences of primers used for qPCR.
3.1 Mapping percentages of RNA-seq 1 samples.

List of Figures

1.1 Overview of lncRNA populations based on their location in the genome.
3.1 MHC expression in myotubes from differentiated C2C12 cells.
3.2 Myogenin expression in myotubes from differentiated C2C12 cells.
3.3 Western blots of IP-compatible antibodies against muscle-specific proteins.
3.4 Western blots of IP-compatible antibodies against components of the COMPASS complex.
3.5 Western blots of IP-compatible antibodies against Oct4.
3.6 Western blots of IP-compatible antibodies against Sox2.
3.7 Size distribution of DNA after sonication of chromatin ranges from 0.1kb-12kb.
3.8 Size distribution of DNA after sonication of chromatin ranges from 0.1kb-3kb.
3.9 Size distribution of DNA after incubation with micrococcal nuclease and 45 pulses of sonication of chromatin ranges from 0.3kb-1kb.
3.10 Regulatory regions of the MyoD gene.
3.11 Relative expression fold change of DNA from ChIP determined by qPCR
3.12 Relative expression fold change of DNA from ChIP determined by qPCR
3.13 Representative examples of peaks viewed with IGV.

Chapter 1

Introduction

Long noncoding RNA (lncRNA) is operationally defined as RNA longer than 200 bases that does not encode mRNA, rRNA or tRNA [1, 2].
Although several lncRNAs have been sporadically identified and characterized in the past 20 years, genome-wide identification of lncRNAs has only recently become possible with the advent of high-throughput sequencing of cDNA (RNA-seq). Evidence that this field is gaining momentum can be seen in the most recent report of the ENCODE (Encyclopedia of DNA Elements) project, published in September 2012, which described 9,640 lncRNA loci in comparison to 20,687 protein-coding genes in 15 human cell lines [3, 4, 5]. This ratio of lncRNAs to protein-coding genes underscores the potential magnitude and diversity of the biological effects mediated by lncRNAs. Indeed, although only about 100 lncRNAs have been functionally characterized to date [4], it has become clear that lncRNAs are involved in almost every aspect of cellular and molecular biology. LncRNAs control cell differentiation, development, cancer progression, and cell metabolism, among other cell functions. At the gene expression level, lncRNAs regulate processes ranging from chromatin modification and transcription to splicing, RNA transport, and translation. LncRNAs themselves are transcribed from intergenic regions, exons, introns, and their overlapping regions (Figure 1.1A and 1.1B). At the mechanistic level, lncRNAs serve as "scaffolds" that provide platforms to assemble RNA-protein complexes, "guides" that recruit RNA-protein complexes to target genes, and "decoys" that bind to and sequester regulatory proteins away from their target DNA sequences [1, 2].

The first challenge in studying lncRNAs is how to collect RNA pools that potentially contain lncRNAs of interest. One can prepare RNA pools by simply isolating total RNA from cells or tissues in an unbiased manner; however, immunoprecipitation-based approaches are also commonly used to enrich lncRNAs associated with specific proteins. Cross-linking with UV or formaldehyde followed by fragmentation of chromatin is used to immunoprecipitate RNA-chromatin complexes (RNA-chromatin immunoprecipitation or RNA-ChIP) [6, 7, 8].

For any immunoprecipitation-based approach, the specificity and affinity of the antibodies are decisive factors for the success or failure of the project. While the specificity of an antibody is commonly verified by detecting only one band in western blotting, the antibody may still react with other proteins when detergents are used at a low concentration during immunoprecipitation. One solution to the specificity issue is to use multiple antibodies against the same protein and select reproducibly co-precipitated lncRNAs for further study. Similarly, immunoprecipitation of several different subunits within a single protein complex is an option for identifying lncRNAs that are likely to be genuinely interacting with the complex.

After collection by immunoprecipitation, the sequences of the RNA pool of interest can be obtained by RNA-sequencing (RNA-seq). RNA-seq is a powerful tool based on the principles of next-generation sequencing that can be applied to the detection and quantification of lncRNAs. It works on a genome-wide scale at single nucleotide resolution and is not limited to detecting already known sequences. Thus, it can be used to discover previously unknown lncRNAs in an unbiased manner [9].

For RNA-seq, one must decide whether to use total RNA or polyadenylated RNA. The presence of rRNA (around 80-85% of total RNA) and tRNA (15%) [10, 11] can drastically reduce the diversity of a cDNA library during amplification of cDNAs. Polyadenylated RNA is frequently used for RNA-seq to avoid this problem. However, given the prevalence of non-polyadenylated lncRNA in the genome (around 40% of total lncRNAs), the disadvantage of losing this fraction is not negligible [12].
One solution to this problem is to use commercially available kits to remove rRNA from total RNA without losing non-polyadenylated RNA.

Figure 1.1: Overview of lncRNA populations based on their location in the genome. LncRNAs can be categorized into subgroups of intergenic, exonic, intronic, and overlapping according to where they are found relative to nearby protein-coding genes. (A) Proportion of lncRNA subgroups [3]. (B) Location of each type of lncRNA.

After sequencing, a typical pipeline for RNA-seq analysis is to align the reads generated by sequencing to the UCSC mouse mm9 or human hg19 reference genomes using software programs such as the short-read mappers Bowtie 2 [13] and Burrows-Wheeler Aligner [14], and the splice-junction identifier TopHat [15]. Next, the reads are used to assemble a transcriptome and discover previously unannotated transcripts with programs such as Cufflinks [16], which relies on a reference annotation database, or Scripture, which builds the transcriptome ab initio [17]. From here, novel lncRNAs can be identified by excluding protein-coding transcripts and annotated lncRNAs based on the databases of RefSeq, ENCODE, and FANTOM (Functional Annotation of the Mammalian Genome) [18], as well as the two databases of experimentally verified lncRNAs generated by the Mattick lab: lncRNAdb [19] and NRED (Noncoding RNA Expression Database) [20].

Novel lncRNAs often undergo further scrutiny to ensure that they are not transcriptional noise and that they indeed do not encode proteins. Finally, examining the behavior of the system of interest following knockdown or overexpression of a novel lncRNA is typically the last step in characterizing it functionally.

The transcription factor MyoD is the master regulator of muscle differentiation.
The Kikyo lab recently demonstrated that when the transactivation domain of MyoD is fused to the Oct4 protein, the efficiency of reprogramming of fibroblasts to induced pluripotent stem cells (iPSCs) is increased more than 10-fold [21]. The power of MyoD has been partly attributed to its interaction with the Pbx and Meis homeodomain proteins through its H/C region and helix III region, as this allows MyoD to bind to its target promoters [22]. However, this interaction is essential for only a subset of MyoD target genes [23]. It remains to be elucidated how other target genes are activated by MyoD [24], and it is possible that lncRNAs interacting with MyoD play a role. We set out to identify lncRNAs potentially bound to MyoD in undifferentiated muscle cells (myoblasts).

At the same time, we looked at other groups of proteins potentially interacting with lncRNAs. We thought it likely that lncRNAs could play a role in transcriptional activation in addition to their previously known role in transcriptional repression, and for this reason we investigated the potential of the proteins WDR5, MLL1, and Rbbp5 to immunoprecipitate lncRNAs. These proteins, in addition to Ash2L, form a complex known as COMPASS [25] in yeast and Trithorax in mouse and human (reviewed in [26]). All four of these proteins are required to catalyze the tri-methylation of histone 3 lysine 4 (H3K4me3), a marker of active transcription [27]. Their counterparts are the polycomb repressive complexes (PRC1 and PRC2), which tri-methylate histone 3 lysine 27 (H3K27me3), a marker of transcriptional repression [28, 29, 30]. PRC2 is known to interact with the lncRNAs HOTAIR (HOX antisense intergenic RNA) [31] and ANRIL (antisense noncoding RNA in the INK4 locus) [32].

LncRNAs have also been shown to dramatically influence pluripotency and the pluripotent state [33].
Early microarray studies found lncRNAs that were differentially expressed during mouse embryonic stem cell (MESC) differentiation [34], and microarrays have also been used to discover lncRNAs that function in regulating the reprogramming of somatic cells to pluripotency [35]. Other studies have shown that lncRNAs are necessary to maintain a state of pluripotency in stem or progenitor cells [36, 37]. In addition, data from chromatin immunoprecipitation (ChIP) experiments have shed light on a population of lncRNAs whose occupancy intersects with the known pluripotency factors Oct4 and Nanog [38]. To date, however, it has not been shown that Oct4 or Sox2 directly interact with lncRNAs.

Chapter 2

Methods

2.1 Cell Culture

C2C12 myoblast cells [39] were grown in Dulbecco's Modified Eagle Medium (DMEM/High Glucose, HyClone SH30243.01) containing 10% fetal bovine serum (FBS). Cells were passaged and expanded when they reached 80-90% confluency. This involved detaching the cells with trypsin and replating them in fresh media at a 1:20 dilution.

For harvesting, trypsin was again used to detach the cells. Cells being used for western blotting were left unfixed. Cells being used for ChIP or RNA-immunoprecipitation (RIP) were collected into 50ml conical tubes and incubated with 1% formaldehyde (paraformaldehyde, Sigma 30525-89-4) for 10 minutes while rotating at room temperature. Fixation was stopped by adding 1.25M glycine to 10% of the total volume. The cells were collected by centrifugation at 1000rpm for 5 minutes at 4°C, washed with PBS (137mM NaCl, 2.7mM KCl, 4.3mM Na2HPO4, 1.47mM KH2PO4, and 10mM phosphate (pH 7.4)) and divided into 2x10^7 or 1x10^7 cell aliquots for ChIP or RIP, respectively. The cells were immediately frozen at -80°C.

2.2 Myoblast Differentiation

C2C12 myoblast cells were cultured as previously described. Cells were seeded at the density required to achieve 20% confluency the following day (Day 1).
On Day 1, the cells were washed once with PBS and the media was replaced with DMEM containing 1% insulin transferrin selenium [40, 41] (ITS, Gibco 41400-045). The ITS media was changed on Day 3 and the cells were harvested and fixed on Day 5.

2.3 Immunostaining

For immunostaining, C2C12 myoblasts or myoblasts treated with ITS were fixed for 10 minutes in 4% formaldehyde and permeabilized with a solution containing 0.05% Triton X-100 (Fisher Scientific BP151-500). All antibodies were diluted in blocking solution (9% FBS and 0.2% Tween 20 (Fisher Scientific BP337-500)). Permeabilized cells were washed with blocking solution and incubated with primary antibodies diluted 1:200 for 1 hour. The following primary antibodies were used: myosin heavy chain (MHC) (MF20, Developmental Studies Hybridoma Bank (DSHB)) and myogenin (F5D, DSHB). After incubation with primary antibodies, the cells were washed with blocking solution and incubated with secondary antibodies diluted 1:200 and Hoechst 33342 dye diluted 1:200 for 1 hour. The following secondary antibodies were used: Alexa 594 (Life Technologies A21207) and Alexa 488 (Life Technologies A11001). After incubation with secondary antibodies, the cells were washed twice with blocking solution and once with PBS. A Zeiss Axiovert 200M fluorescent microscope was used to visualize and photograph the cells.

2.4 Antibody Testing

We used sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by western blotting to test our antibodies. 12% SDS gels were poured according to Laemmli [42]. When solidified, the gels were assembled into a gel tank (BioRad Mini Trans-Blot Cell) and immersed in running buffer containing 1% SDS, 1.92M glycine, and 0.25M Tris (Trizma base, Sigma T-6066). C2C12 cells were combined with sample dye containing Bromophenol blue (BioRad 161-0404) and β-mercaptoethanol, vortexed thoroughly and heated at 94°C.
The samples were loaded into the wells and a prestained molecular protein marker (Benchmark, Invitrogen 10748-010) was loaded alongside the samples. The gels were run at 10mA/gel through the stacking phase and 20mA/gel through the separating phase.

After electrophoresis, the proteins in the gels were electrotransferred to Immobilon-P membranes (Millipore) in a transfer buffer solution containing 15% methanol, 192mM glycine, 25mM Tris, and 0.1% SDS in a Mini-PROTEAN 3 system (BioRad) running at 150mA/tank overnight.

After overnight transfer, the cassettes were disassembled and the gel with the membrane was placed on Saran wrap. A ballpoint pen was used to trace the edges of the gel and the molecular marker onto the membrane. The membrane was cut to the traced size and then into 10-12 strips for testing of different antibodies.

The membranes were incubated for 1 hour in a blocking solution containing 2% milk powder (blotting grade blocker non-fat dry milk, BioRad 170-6404) in PBT (0.1% Tween 20 (Fisher Scientific BP151-500) in PBS) on a lab shaker. Excess blocking solution was removed and the membranes were incubated with primary antibodies (Table 2.1) for 1 hour while shaking, washed 3 times for 4 minutes with PBT, and incubated with secondary antibodies (Table 2.2) for 1 hour while shaking. The membranes were washed 6 times for 4 minutes with PBT and incubated in SuperSignal West Dura Extended Duration Substrate (Thermo Scientific 34075) before being placed in a film cassette and exposed to x-ray films in the darkroom.
Antibody | Company | Catalog Number | Lot Number
MyoD | Millipore | MAB3878 | JC1628178
MyoD | Santa Cruz (N-19) | sc-31940 | I0706
MyoD | BD Pharmingen | 554130 | 04882
MyoD | Santa Cruz (M-318) | sc-760 | C0212
MyoD | Santa Cruz (C-20) | sc-302 | C0812
MyoD | Santa Cruz (C-20) | sc-304 | I0909
MyoD | Santa Cruz (C-20) | sc-304 | J2111
MyoD | Santa Cruz (C-20) | sc-304 | D0312
Myogenin | DSHB | F5D-MG | unknown
Myogenin | Millipore | MAB3876 | 1967332
Myf5 | Santa Cruz (C-20) | sc-302 | C0812
WDR5 | R&D Systems | AF5810 | CCZK0111111
WDR5 | Bethyl Laboratories | A302-429A | A302-429A-1
WDR5 | Bethyl Laboratories | A302-430A | A302-430A-1
Rbbp5 | Bethyl Laboratories | A300-109A | A300-109A-2
MLL1 | Active Motif | 61295 | 17210001
MLL1 | Bethyl Laboratories | A300-374A | A300-374A-1
MLL1 | Bethyl Laboratories | A300-086A | A300-086A-1
MLL1 | Millipore | ABE240 | NRG1922437
Oct4 | Santa Cruz (N-19) | sc-8628 | A2412
Oct4 | Santa Cruz (H-34) | sc-9081 | L0210
Oct4 | Santa Cruz (H-34) | sc-9081 | E1011
Oct4 | Abcam | ab19857 | GR60398-1
Sox2 | Millipore | CS204373 | DAM1948375
Sox2 | Santa Cruz (Y-17) | sc-17320 | L0211
Sox2 | Santa Cruz (Y-17) | sc-17320 | A0312
Sox2 | Millipore | CS207294 | NRG1928895

Table 2.1: Primary antibodies used for western blotting. Antibodies were diluted at various concentrations in blocking solution.

Antibody | Company | Catalog Number
Bovine-anti Goat IgG | Jackson Immunoresearch Laboratories | 085-035-180
Goat-anti Rabbit IgG | Jackson Immunoresearch Laboratories | 211-032-171
Goat-anti Mouse IgG | Jackson Immunoresearch Laboratories | 115-035-174

Table 2.2: Secondary antibodies used for western blotting. Antibodies were diluted at 1:1000 in blocking solution.

2.5 DNA Length Optimization for ChIP

We used agarose gel electrophoresis to test DNA length after sonication and micrococcal nuclease (MNase) incubation. 2x10^7 C2C12 cells fixed with 1% formaldehyde were resuspended in 498.5µl of a cell lysis buffer solution containing 50mM Tris-HCl, 1mM CaCl2, and 2.5mM MgCl2.
The protease inhibitors leupeptin (Sigma L2884, 1µg/ml), pepstatin A (Sigma P5318, 1µg/ml in methanol), and PMSF (Sigma P7626, 17µg/ml in isopropanol) were added. The cells were lysed by vortexing for 15 seconds, placing them on ice for 15 minutes, and vortexing for an additional 15 seconds every five minutes. 1000 gel units of MNase (NEB M02475) diluted in 9.5µl of TE (1mM EDTA and 10mM Tris-HCl) were added and the samples were incubated at room temperature for 20 minutes. The enzymatic reaction was stopped by adding 10µl of 0.5M EDTA and placing the samples on ice.

The samples were adjusted to contain 150mM NaCl, 1% NP-40 (MP Biomedicals 198596), 0.5% Na deoxycholate (MP Biomedicals 102906), and 0.1% SDS and sonicated with a Branson 450 sonicator to test the effect of varying the power, duty cycle, and number of pulses on DNA length. The cell debris was pelleted by centrifugation at 13,000rpm for 15 minutes at 4°C. The supernatant containing the sheared chromatin was transferred to a new tube. 100µl of an elution buffer containing 1% SDS and 0.1M NaHCO3 with 0.1µg/µl of proteinase K (Invitrogen 25530-015) was added and the samples were incubated for 2 hours at 65°C while rotating in a hybridization oven. Proteinase K was inactivated by incubation at 95°C.

The DNA was purified using a Zymo plasmid miniprep kit (Zymo Research D4020). 100µl of a membrane binding solution containing 4.5M guanidine isothiocyanate and 0.5M potassium acetate was added to each sample and mixed until a cloudy, white precipitate formed. The mixture was then transferred to the column (Zymo Spin II C1008-250) and centrifuged at 12,000rpm for 1 minute at room temperature. 400µl of Zymo wash buffer (Zymo Research D4036-4-48) was added, followed by centrifugation at 12,000rpm for 1 minute. The flow-through was discarded and the samples were centrifuged for an additional 5 minutes to dry the filter.
The column was transferred to a low retention recovery tube (Fisher Scientific 02-681-320) and 52µl of Zyppy Elution Buffer (10mM Tris-HCl and 0.1mM EDTA, prepared from 1M and 0.5M stocks, respectively) was added to the column. After incubation at room temperature for 1 minute, the samples were centrifuged at 12,000rpm for 2 minutes. This elution step was repeated once.

150-500ng of DNA from each sample was loaded into a 2% agarose gel alongside 100bp (Invitrogen 15628-019) and 1kb (Invitrogen 10787-018) ladders. The gel was run at 100V for 20-25 minutes in TAE (40mM Tris and 1mM EDTA adjusted to pH 8.3 with glacial acetic acid) on an EMBI Tec Electrophoresis Cell system (Model RunOne). The gels were shaken in ethidium bromide (BioRad 161-0433) for 15 minutes, rinsed once with de-ionized water and visualized with a BioRad gel documentation system.

2.6 ChIP

The same procedure as detailed above for cell lysis and MNase incubation was followed, and sonication was performed using the optimized conditions: power 4, duty cycle 50%, for 45 pulses with a rest period of 30 seconds after every 9 pulses.

For the immunoprecipitation, 20µl of Dynabeads Protein G bead suspension (Invitrogen 1004D) per sample was washed twice with 100µl of RIPA buffer containing 150mM NaCl, 1% NP-40, 0.5% Na deoxycholate (MP Biomedicals 102906), 0.1% SDS, and 50mM Tris-HCl. A magnetic rack (MagnoRack, Invitrogen CS15000) was used to collect the beads. The 20µl of bead suspension was combined with 5µg of antibody (Table 2.3), 50µl of supernatant from the cell extract, and 450µl RIPA buffer. The preparation was thoroughly combined by pipetting and incubated overnight at 4°C with rotation.

The beads were washed for five minutes with rotation in 500µl of each of the following solutions: RIPA, LiCl wash buffer (1% Triton X-100, 0.1% SDS, 250mM LiCl, 0.2mM EDTA, and 20mM Tris-HCl), and TE.

The samples were decrosslinked and the DNA was purified using the methods described previously.
Quantitative PCR (qPCR) was used to test for the presence or absence of DNA regions pulled down by our antibodies. Each reaction used 10µl of PCR mix (Promega GoTaq qPCR Master Mix 289548) combined with 3µl of DEPC treated water, 2µl of 2.5µM mixed forward and reverse primer (Table 2.4), and 5µl of purified DNA. The following PCR program was used: 95°C for 2 minutes; 40 cycles of (95°C for 30 seconds, 58°C for 30 seconds, 72°C for 30 seconds); 95°C for 15 seconds; 60°C for 15 seconds; a 20 minute ramp up to 95°C; 95°C for 15 seconds; hold at 4°C (Eppendorf Mastercycler realplex2). Reactions were run in triplicate and averages were taken: Input_avg, IgG_avg, and X_avg. Fold change was calculated as:

\[
\text{Fold change} = 2^{-(X_{avg} - IgG_{avg}) - Input_{avg}} = 2^{-\Delta C_T - Input_{avg}} = 2^{-\Delta\Delta C_T}
\]

Antibody | Company | Catalog Number | Lot Number
MyoD | Santa Cruz (C-20) | sc-304 | I0909
MyoD | Santa Cruz (C-20) | sc-304 | J2111
Rabbit IgG | Santa Cruz | sc-2027 | C2712
H3K27me3 | Upstate | 17-622 | 24440
MyoD | Santa Cruz (C-20) | sc-304 | D0312
MyoD | Millipore | MAB3878 | JC1628178

Table 2.3: Antibodies used for ChIP experiments.

2.7 RNA-ChIP

The same procedure as previously described was followed for MNase incubation, sonication, and immunoprecipitation. The only differences were that a larger number of C2C12 cells, 2x10^8, was used as the starting material and that 10µl of RNaseOUT Recombinant Ribonuclease Inhibitor (Invitrogen 10000840) and 10µg/ml heparin were added along with the protease inhibitors.

The sample was decrosslinked using reagents from a PureLink RNA Mini Kit (Life Technologies 12183-016). 100µl of lysis buffer (Life Technologies 46-6001) was added and the sample was incubated at 80°C for 1 minute and vortexed briefly, after which 0.1µg/µl of proteinase K (Invitrogen 25530-015) was added. The sample was incubated at 56°C for 15 minutes and then at 80°C for 15 minutes. Finally, the sample was placed on ice for 1 minute and allowed to come to room temperature.
Name | Sequence
MyoD CER F1 | GGG CAT TTA TGG GTC TTC CT
MyoD CER R1 | CTC ATG CCT GGT GTT TAG GG
MyoD DRR F1 | TCA GGA CCA GGA CCA TGT CT
MyoD DRR R1 | CTG GAC CTG TGG CCT CTT AC
Myogenin E2/E1 F1 | GAA TCA CAT GTA ATC CAC CTG GA
Myogenin E2/E1 R1 | ACA CCA ACT GCT GGG TGC CA
B-actin F3 | TTC GCG GGC GAC GAT GCG
B-actin R1 | TTC TGA CCC ATT CCC ACC ATC ACA
Oct4 F3 | AGG TCA AGG GGC TAG AGG GTG GGA TT
Oct4 R3 | TGA GAA GGC GAA GTC TGA AGC CA
Sox2 F11 | GCC GGA AAC CCA TTT ATT CCC TGA
Sox2 R11 | TCG GGC TCC AAA CTT CTC TCC TTT
MyoD CER ChIP F2 | AGC CAG TTA ATC TCC CAG AGT GCT
MyoD CER ChIP R2 | TAG AGA AAC CGG AGA AGA CCC AGG AA
MyoD DRR ChIP F2 | AAA GTA AGA GGC CAC AGG TCC AGA
MyoD DRR ChIP R2 | TCT GGA AAC CGG ATC CAA CTA GCA

Table 2.4: Sequences of primers used for qPCR.

A DNase step was added using the PureLink DNase set (Invitrogen 46-6026), with a DNase I mixture of 0.09M MnCl2, 7µl of 2X DNase Buffer (Invitrogen 46-6025), and 10µl of PureLink DNase I (Invitrogen 10002884). After this mixture was added, the sample was incubated at room temperature for 15 minutes. The beads were removed, and 325µl of bead lysis buffer and 200µl of isopropanol were added to the sample and mixed by vortexing briefly. The sample was transferred to a spin column and centrifuged at 10,000rpm for 30 seconds at room temperature. 500µl of wash solution (Invitrogen 46-6003 with ethanol added) was added to the column, followed by centrifugation at 10,000rpm for 30 seconds at room temperature. This wash was repeated once. 30µl of RNase-free water (Invitrogen 46-8000) was added to the column, which was incubated at room temperature for 1 minute and then centrifuged at 13,000rpm for 2 minutes. The RNA concentration was measured using a Qubit fluorometer (Life Technologies).

2.8 RNA-sequencing and Data Analysis

My colleague prepared cDNA libraries and the BioMedical Genomics Center (BMGC) obtained raw sequence data from the libraries.
I then analyzed the sequences in collaboration with the Minnesota Supercomputing Institute (MSI). In preparation for sequencing, the Ovation RNA-seq System V2 (NuGEN 7102) was used to create cDNA from the co-precipitated RNA fragments. The cDNA was sent to BMGC for fragmentation by sonication and size estimation using a DNA High Sensitivity Lab Chip (Agilent 5067-4626). The cDNA was sent back to our lab and the Ovation Ultralow Library System (NuGEN 0303) was used to generate blunt ends by end repair so that adaptor and barcode sequences could be ligated to the cDNA fragments. The cDNA fragments that contained adaptor sequences at both ends were amplified by PCR to create the final cDNA library. Agencourt RNAClean XP Beads were used to further purify the cDNA from ribosomal RNA and small RNA fragments.

Two rounds of samples were sent. The first contained cDNA from RNA-ChIP experiments using CGR (mouse embryonic stem) cells with antibodies against WDR5, MLL1, and Rbbp5, and an IgG control. The second contained cDNA from RNA-ChIP experiments with antibodies against Sox2 (Millipore), Sox2 (Santa Cruz), Oct4 (Santa Cruz), Oct4 (Abcam), a repeat of WDR5, and an IgG control.

Paired-end sequencing using 50-base-pair reads and 200-base-pair fragments was performed at the BMGC on an Illumina HiSeq 2000. Our cDNA library was washed across a flow cell, which binds the adaptor sequences. The cDNA that hybridized to the flow cell underwent bridge amplification to form clusters of cDNA clones. Sequencing primer was added and fluorescently labeled DNA bases were incorporated one at a time. The flow cell was imaged after the addition of each base, so that each cycle produced one base read for each cluster. Because each base carried a different fluorophore, reading this fluorescence produced the sequence information. The sequence information was output as several FastQ files, which were uploaded by BMGC into our project space at MSI.
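As a rough illustration of what such FastQ files contain, the sketch below reads four-line FastQ records and converts the quality string of each read into per-base Phred scores Q and error probabilities P, using the standard relation Q = -10 log10(P). It is not the code used in this work. The function names are hypothetical, and the Phred+33 character offset is an assumption, since the quality encoding of the files is not stated here.

```python
import math

# Assumption: qualities are Phred+33 encoded (the encoding is not stated in the text).
PHRED_OFFSET = 33


def phred_scores(quality, offset=PHRED_OFFSET):
    """Convert a FastQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Probability that a base with Phred score q was called incorrectly."""
    return math.pow(10, -q / 10)


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FastQ lines.

    Each record is assumed to occupy exactly four lines:
    '@name', sequence, '+', quality.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header[1:], seq, qual


# Hypothetical single-read example, for illustration only.
example = ["@read1", "ACGTN", "+", "IIII#"]
for name, seq, qual in read_fastq(example):
    scores = phred_scores(qual)  # [40, 40, 40, 40, 2] under Phred+33
    probs = [error_probability(q) for q in scores]
    print(name, seq, scores, probs)
```

Under this relation a score of 30 corresponds to an error probability of 0.001, which is why low-scoring bases, typically at the ends of reads, are trimmed before mapping.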
A FastQ file contains the raw read data. Each base call is assigned a Phred score, which expresses the probability that the base was called incorrectly. A Phred score of 30, corresponding to a 1 in 1,000 chance of an incorrect call, is considered acceptable, though even when scores are generally high it is usually necessary to use Quality Trimmer or Column Trimmer to trim reads that fall below the accepted level. The reads can then be mapped to a reference genome, in our case the UCSC mouse mm9 genome build, using the short-read aligner Bowtie. Another program called TopHat works together with Bowtie to identify exon-exon splice junctions. The mapping process produces two output files. The first is a SAM file, a tab-delimited text file that contains sequence alignment data, and the other is a BAM file, the binary version of the SAM file. The BAM file can be opened in the Integrative Genomics Viewer (IGV, Broad Institute) to visualize the mapping of the reads, which form peaks when many reads map to one region of the genome. A program called MACS (Model-based Analysis of ChIP-seq) is then used to 'call' the peaks, or in more general terms, to assign statistical significance to peaks based on their width and height above background (IgG) levels. Additional significance can be attached to peaks that are shared between samples known to form a complex (WDR5, MLL1, Rbbp5), samples that are known to bind the same region of the DNA (Oct4, Sox2), or samples from antibodies against the same protein but from different companies (Sox2 Millipore, Sox2 Santa Cruz).

Chapter 3

Results

3.1 ITS Differentiates Myoblasts into Myotubes

A protocol was developed for the large-scale production of myotubes via the differentiation of C2C12 myoblast cells. We determined early on that insulin transferrin selenium (ITS) [40, 41] was more effective than horse serum (HS) [43, 44] at producing myotubes from myoblasts. Therefore, we chose ITS to develop the large-scale protocol.
Myotube formation was greatest when ITS was added to myoblasts at a confluency of 20%. The number of myoblasts seeded per dish on Day 0 to obtain 20% confluency on Day 1, when ITS was added, was determined to be 3 × 10^4 cells/3.5cm dish, 3 × 10^5 cells/10cm dish, and 2 × 10^6 cells/15cm dish. On Day 2, the cells were 40% confluent with no perceptible changes. On Day 3, the cells were 60-70% confluent with some elongation and circular patterning of the cells. By Day 4, this elongation and patterning had increased and the cells reached 80-90% confluency, with many dead cells observed. On Day 5, confluency decreased to 50-60% with a further increase in dead cells. At this point, obvious elongation and bundling of the cells into multi-nucleated myotubes was observed. This method typically produced about 3.6 × 10^6 cells/10cm dish and 7.5 × 10^6 cells/15cm dish.

Successful myotube formation was assessed by the activation of the myotube-specific genes myosin heavy chain (MHC) and myogenin. Immunofluorescence microscopy confirmed the expression of MHC in 47% of cells treated with ITS (Figure 3.1) and myogenin in 33% of cells treated with ITS (Figure 3.2), compared to 0% expression of MHC or myogenin in untreated myoblast cells.

[Figure 3.1 panels: columns ITS (+), ITS (-); rows DAPI, MHC, Merge; scale bar 50µm]
Figure 3.1: MHC expression in myotubes from differentiated C2C12 cells. C2C12 cells were treated with or without ITS and immunostained with antibodies against MHC after fixation with formaldehyde on Day 5. DNA was counterstained with DAPI. Cells were visualized at 20X with a Zeiss Axiovert 200M fluorescence microscope.

[Figure 3.2 panels: columns ITS (+), ITS (-); rows DAPI, Myogenin, Merge; scale bar 50µm]
Figure 3.2: Myogenin expression in myotubes from differentiated C2C12 cells. C2C12 cells were treated with or without ITS and immunostained with antibodies against myogenin after fixation with formaldehyde on Day 5. DNA was counterstained with DAPI. Cells were visualized at 20X with a Zeiss Axiovert 200M fluorescence microscope.
3.2 Western Blots Identify Immunoprecipitation-compatible Antibodies

Immunoprecipitation (IP) requires that the antibodies used have high affinity and specificity for the protein being pulled down. A western blot that detects a single band of the expected size indicates that the antibody is appropriate for use in immunoprecipitation. Using this technique, we found several IP-compatible antibodies against muscle-specific proteins, components of the COMPASS complex, and pluripotency proteins for use in our ChIP and RNA-ChIP experiments.

We evaluated eight MyoD antibodies and found six that detected multiple bands or had high background (Millipore MAB3878: lot JC1628178, Santa Cruz sc-31940: lot I0706, BD Pharmingen 554130: lot 04882, Santa Cruz sc-760: lot C0212, Santa Cruz sc-302: lot C0812, Santa Cruz sc-304: lot D0312) and two that detected a single band at 45kDa (Santa Cruz sc-304: lot I0909 and J2111) (Figure 3.3). Testing of two myogenin antibodies found that both detected multiple bands (F5D-MG, Developmental Studies Hybridoma Bank (DSHB); Millipore MAB3876: lot 1967332). The one Myf5 antibody tested (Santa Cruz sc-302: lot C0812) detected multiple bands as well.

For the COMPASS complex, we tested three WDR5 antibodies and found that two of them (R&D Systems CCZK0111111: lot AF5810, Bethyl Laboratories A302-429A: lot A302-429A-1) detected multiple bands and one of them (Bethyl Laboratories A302-430A: lot A302-430A-1) showed a strong dominant band at 40kDa. Testing of four MLL1 antibodies revealed three that detected multiple bands (Active Motif 61295: lot 17210001, Bethyl Laboratories A300-374A: lot A300-374A-1 and A300-086A-1) and one that detected a single band at 180kDa (Millipore ABE240: lot NRG1922437). The one Rbbp5 antibody tested (Bethyl Laboratories A300-109A: lot A300-109A-2) detected a single band at 60kDa (Figure 3.4).
Of the four Oct4 antibodies tested, one (Santa Cruz sc-8628: lot A2412) detected multiple bands and three (Santa Cruz sc-9081: lot L0210 and E1011, Abcam ab19857: lot GR60398-1) detected a single band at 34kDa. Testing of four Sox2 antibodies found one that detected a band of the incorrect size (Millipore CS204373: lot DAM1948375) and three that detected a band at 34kDa (Santa Cruz sc-17320: lot L0211 and A0312, Millipore CS207294: lot NRG1928895) (Figures 3.5 and 3.6).

[Figure 3.3 lanes: MyoD sc-304 J2111, MyoD sc-304 I0909; molecular weight markers (kDa): 181.8, 115.5, 82.2, 63.2, 48.8, 37.1, 25.9, 19.4, 14.8]
Figure 3.3: Western blots of IP-compatible antibodies against muscle-specific proteins. All antibodies were diluted in blocking solution. Primary antibodies: MyoD sc-304: lot I0909 diluted 1/2000, MyoD sc-304: lot J2111 diluted 1/2000. Secondary antibody: Rabbit IgG diluted 1/1000. Exposure time = 1 minute. Expected size = 45kDa. Because our western blotting method cuts the membranes into strips instead of leaving them intact, shifting of the strips can cause variation in the alignment of the molecular marker relative to the band of interest.

[Figure 3.4 lanes: Rbbp5 Bethyl A300-109A-2, WDR5 Bethyl A302-430A-1, MLL1 Millipore ABE240; molecular weight markers (kDa): 181.8, 115.5, 82.2, 63.2, 48.8, 37.1, 25.9, 19.4, 14.8 (plus 6.0 on one blot)]
Figure 3.4: Western blots of IP-compatible antibodies against components of the COMPASS complex. All antibodies were diluted in blocking solution. Primary antibodies: WDR5 Bethyl Laboratories A302-430A lot: A302-430A-1 diluted 1/20,000, MLL1 Millipore ABE240 lot: NRG1922437 diluted 1/600, Rbbp5 Bethyl Laboratories A300-109A lot: A300-109A-2 diluted 1/6400. Secondary antibody: Rabbit IgG diluted 1/1000. Exposure time = 1 second (WDR5, Rbbp5), 1 minute (MLL1). Expected size = 40kDa (WDR5), 180kDa (MLL1), 60kDa (Rbbp5).
[Figure 3.5 lanes: Oct4 sc-9081 L0210, Oct4 sc-9081 E1011, Oct4 Abcam ab19857; molecular weight markers (kDa): 181.8, 115.5, 82.2, 63.2, 48.8, 37.1, 25.9, 19.4, 14.8]
Figure 3.5: Western blots of IP-compatible antibodies against Oct4. All antibodies were diluted in blocking solution. Primary antibodies: Santa Cruz sc-9081 lot: L0210 and E1011 diluted 1/4000, Abcam ab19857 lot: GR60398-1 diluted 1/6400. Secondary antibody: Rabbit IgG diluted 1/1000. Exposure time = 1 second. Expected size = 34kDa.

[Figure 3.6 lanes: Sox2 sc-17320 L0211, Sox2 sc-17320 A0312, Sox2 Millipore CS207294; molecular weight markers (kDa): 181.8, 115.5, 82.2, 63.2, 48.8, 37.1, 25.9, 19.4, 14.8 (plus 6.0 on two blots)]
Figure 3.6: Western blots of IP-compatible antibodies against Sox2. All antibodies were diluted in blocking solution. Primary antibodies: Santa Cruz sc-17320 lot: L0211 and A0312 diluted 1/400, Millipore CS207294: lot NRG1928895 diluted 1/6400. Secondary antibodies: Goat IgG diluted 1/1000 for Santa Cruz sc-17320 lot: L0211 and A0312, Mouse IgG diluted 1/1000 for Millipore CS207294: lot NRG1928895. Exposure time = 1 second. Expected size = 34kDa.

3.3 Chromatin Size is Optimized for Immunoprecipitation

For the purpose of ChIP or RNA-ChIP, the DNA needs to be sheared into fragments between 200 and 700 base pairs (bp) in length, centered at 500bp. We established and tested a method that achieved this using a combination of sonication and incubation with micrococcal nuclease. Starting from duty cycle 50% and power 4, increasing the number of pulses to 108 or the power level to 6 with our sonicator (Branson 450) had a negligible effect on DNA length and size distribution, producing DNA in lengths ranging from 0.1kb to 12kb (Figure 3.7). Increasing the total number of pulses to 144 at duty cycle 50%, power 4, had a noticeable effect on DNA length, producing DNA between 0.1kb and 3kb in length (Figure 3.8).
However, sonication for this number of pulses was labor-intensive and resulted in the loss of more than 20% of the sample volume during the sonication process. Attempts to raise the duty cycle from 50% to 100% resulted in emulsification and degradation of the sample. Only with the addition of a 20 minute micrococcal nuclease incubation step was a dramatic decrease in DNA length observed, producing DNA between 0.3kb and 1kb in length, centered at 0.5kb, at both 25°C and 37°C (Figure 3.9). As a result, we added a 20 minute incubation at 25°C with micrococcal nuclease to our preparation of DNA for ChIP, followed by sonication at power 4, duty cycle 50%, for 45 pulses with a rest period of 30 seconds after every 9 pulses.

[Figure 3.7 gel labels: Pulses: 90, 108, 45, 63, 81, 45, 63, 81; Power: 4, 5, 6; markers: 1kb M, 100bp M]
Figure 3.7: Size distribution of DNA after sonication of chromatin ranges from 0.1kb-12kb. Measured range is indicated by bar. Power settings and number of pulses delivered are indicated. Size markers, 1kb and 100bp, are applied to the left and right sides of the gel, respectively.

[Figure 3.8 gel labels: Pulses: 90, 90, 108, 108, 126, 126, 144, 144; Power: 4; markers: 1kb M, 100bp M]
Figure 3.8: Size distribution of DNA after sonication of chromatin ranges from 0.1kb-3kb. Measured range is indicated by bar. Power settings and number of pulses delivered are indicated. Size markers, 1kb and 100bp, are applied to the left and right sides of the gel, respectively.

[Figure 3.9 gel labels: Min: 5, 10, 20, 20, 5+10, 20, 20; temperatures: 25°C, 37°C; Pulses: 45; markers: 1kb M, 100bp M]
Figure 3.9: Size distribution of DNA after incubation with micrococcal nuclease and 45 pulses of sonication of chromatin ranges from 0.3kb-1kb. Measured range is indicated by bar. Temperatures, power settings, and number of pulses delivered are indicated.
Size markers, 1kb and 100bp, are applied to the left and right sides of the gel, respectively.

3.4 ChIP with MyoD Shows Non-specific Binding

Even with optimally sized chromatin and highly specific antibodies, IP experiments can still fail to pull down RNA or DNA previously known to be associated with the protein of interest. To confirm that our MyoD antibodies were IP-compatible, we designed qPCR primers targeting regions where MyoD is known to bind. These included regions upstream of the MyoD gene itself: the proximal regulatory region (PRR), the distal regulatory region (DRR), and the core enhancer region (CER) [45] (Figure 3.10), in addition to the E2/E1 region upstream of the transcription start site (TSS) of the muscle-specific gene myogenin. Testing of these primers by qPCR and MOPS (3-(N-morpholino)propanesulfonic acid) denaturing gradient gel electrophoresis according to Lerman [46] revealed that the MyoD CER, MyoD DRR, and myogenin E2/E1 primers were compatible with qPCR.

The positive control for our ChIP protocol used antibodies against H3K27me3 (Upstate 17-622 lot: 24440). H3K27me3 is a histone mark of transcriptional repression. It is enriched near the TSS of repressed genes, which in myoblasts include the Oct4 and Sox2 genes; these genomic regions were targeted by our Oct4 F3/R3 and Sox2 F11/R11 primers. qPCR indicated that a high amount of DNA was precipitated by the H3K27me3 antibody at the Oct4 F3/R3 and Sox2 F11/R11 regions, with fold changes relative to IgG ranging from 66.87 to 110.92 (Figure 3.11).

ChIP with two MyoD antibodies (sc-304 lot I0909 and J2111) did not precipitate greater amounts of DNA at the regulatory regions of MyoD and myogenin than the control IgG (sc-2027 lot: C2712). The recovered DNA amount ranged from 0.61 to 1.5 for the MyoD and myogenin genes and from 0.34 to 3.28 for the negative controls (Figure 3.11). Overall, these results indicated that we could not detect MyoD binding to the regulatory regions of the MyoD and myogenin genes as reported in the literature [45].
It is highly likely that this was due to the low quality of the antibodies, as our positive and negative controls precipitated DNA at the expected amounts.

Primers for the PRR, DRR, and CER regions of MyoD were redesigned and ordered, in addition to new primers overlapping the TSS of MyoD and myogenin. Testing revealed that the new MyoD CER primer (MyoD CER F2/R2), the new MyoD DRR primer (MyoD DRR F2/R2), and a primer spanning the TSS of MyoD (CD1) were compatible with qPCR. In addition, MyoD sc-304 was reordered (lot: D0312), and ChIP was performed with this antibody and with MyoD (Millipore MAB3878 lot: JC1628178). However, qPCR revealed that these MyoD antibodies also did not bind the regulatory regions of the MyoD and myogenin genes at a higher level than the control IgG. Fold change for MyoD sc-304 (lot: D0312) ranged from 2.14 to 3.4 compared to 1.97 for the negative control. Similarly, fold change for MyoD MAB3878 ranged from 0.19 to 0.51 compared to 0.31 for the negative control, indicating non-specific binding for both MyoD antibodies. Indeed, both MyoD antibodies (MyoD sc-304 lot: D0312 and Millipore MAB3878) showed multiple bands by western blot. As a result of the lack of IP-compatible antibodies against MyoD, we were obligated to suspend the MyoD project.

[Figure 3.10 diagram labels: MyoD, CER, DRR, PRR, -20k, -5k, -200bp, Exon 1]
Figure 3.10: Regulatory regions of the MyoD gene. CER = core enhancer region, DRR = distal regulatory region, PRR = proximal regulatory region.

[Figure 3.11 panels: ChIP 1, ChIP 2; y-axis: Fold Change]
Figure 3.11: Relative expression fold change of DNA from ChIP using antibodies against H3K27me3 (top) and MyoD sc-304 (I0909 and J2111) (bottom), as determined by qPCR. Three technical replicates were used for each antibody. Amount of DNA co-precipitated with IgG was defined as 1.0.

[Figure 3.12 panel: ChIP 3; y-axis: Fold Change]
Figure 3.12: Relative expression fold change of DNA from ChIP using antibodies against MyoD sc-304 (lot: D0312) and MyoD MAB3878, as determined by qPCR.
Three technical replicates were used for each antibody. Amount of DNA co-precipitated with IgG was defined as 1.0.

3.5 RNA-ChIP with MyoD Leads to Re-testing of Antibodies

The amount of RNA purified from the RNA-ChIP experiments using antibodies against MyoD (sc-304 lot: I0909 and J2111) was too low (<20ng/ml) to be detected by a Qubit fluorometer (Life Technologies). RNA-ChIP with antibodies against Ezh2 and Suz12, members of the polycomb repressive complex 2 (PRC2) previously known to bind lncRNAs, was used as a positive control; however, these antibodies also did not recover sufficient RNA to be measured. Re-testing by western blot of the two MyoD antibodies (sc-304 lot: I0909 and J2111) indicated that they had become inactive. Attempts to test other MyoD antibodies by western blot showed multiple dominant bands or high background (Millipore MAB3878 and Santa Cruz sc-760). A re-ordered MyoD antibody (sc-304 lot: D0312) showed an additional dominant band by western blot.

Because of the low specificity of the available antibodies, we were not able to use MyoD antibodies for RNA-ChIP. However, we found highly specific antibodies against WDR5, MLL1, Rbbp5, Oct4, and Sox2. My colleague obtained a sufficient amount of RNA (>100ng) with each of these antibodies using the RNA-ChIP protocol we optimized together.

3.6 RNA-seq Data Analysis

Phred scores for the WDR5, MLL1, Rbbp5, and IgG samples were greater than 30, with only one or two positions requiring trimming with Column Trimmer. FastQC was used to check the quality of the samples, and adaptor contamination was listed as an overrepresented sequence (ORS) for the Rbbp5 sample. CutAdapt was used to remove the contaminating adaptor sequences from this sample. Ribosomal RNA was listed as an ORS for all four samples. As a consequence, mapping the reads to the genome using Bowtie in conjunction with TopHat yielded low mapping percentages (Table 3.1).
Allowing a 1bp mismatch in the barcode sequences rescued 5-10% of the unmappable MLL1, Rbbp5, and IgG reads. Unexpectedly, rescuing the reads reduced the mapping percentage of WDR5 to 4%, and this sample could not be used for further analysis.

Sample    Mapping %
MLL1      30%
WDR5      54%
Rbbp5     46%
IgG       34%

Table 3.1: Mapping percentages of RNA-seq 1 samples.

After mapping, the BAM files were uploaded to IGV for manual inspection. MACS was used to call the peaks from the MLL1, Rbbp5, and IgG samples. After IgG peaks were subtracted, 6,794 MLL1 peaks and 11,764 Rbbp5 peaks remained, and 803 of these peaks overlapped. Representative examples of peaks viewed with IGV are shown in Figure 3.13.

The low mapping efficiency of the first set of samples was attributed to degradation of components of the Ovation RNA-seq System V2. The second set of samples showed higher mapping percentages (75-90%) and a greater number of peaks called per sample (36,871-50,653). Importantly, 92 loci (peaks) were co-precipitated by all four pluripotency antibodies (the Oct4 and Sox2 duplicates), generating a robust list of candidate lncRNAs for further analysis.

[Figure 3.13 tracks: MLL1, Rbbp5, IgG (top panel); MLL1, Rbbp5, IgG (bottom panel)]
Figure 3.13: Representative examples of peaks viewed with IGV. Top: A 1,305 base pair region (chr12:80,679,221-80,680,526) showing a peak shared by MLL1 and Rbbp5 but not IgG. Bottom: A 507 base pair region (chr18:54,082,038-54,082,648) showing an MLL1-specific peak not shared by Rbbp5 or IgG. The height and shape of a peak are determined by the number of reads (square arrows below the peaks) mapping at that location. DNA base pairs are depicted in unique colors, and the location of the peaks relative to known genes is shown (exon = solid blue bar, intron = hashed blue line).

Chapter 4

Discussion

While establishing a protocol for RNA-ChIP, we immunoprecipitated three sets of proteins: MyoD; the COMPASS complex components WDR5, MLL1, and Rbbp5; and the two key pluripotency proteins Oct4 and Sox2.
Whether lncRNAs interact with MyoD, the master regulator of muscle differentiation, remains an open question. Muscle differentiation is a well-defined system that could be used to characterize lncRNAs expressed during progressive stages of muscle development. RNA-ChIP with antibodies against MyoD followed by RNA-seq remains a valid pursuit should IP-compatible antibodies against MyoD become available.

As for the COMPASS complex, MLL1 contains an RNA binding domain [8]. This makes it and the other proteins of the COMPASS complex likely candidates for interacting with lncRNAs. Generating new WDR5, MLL1, and Rbbp5 samples for sequencing using the replaced Ovation RNA-seq System V2 is highly likely to improve mapping results, and in this way overlapping peaks from all three samples could be used to generate a list of candidate lncRNAs.

The high mapping percentage and large number of overlapping peaks co-precipitated by our Oct4 and Sox2 duplicates in the second round of RNA-ChIP indicate that our protocol is indeed effective. These data could lead to an exciting new chapter in the regulation of pluripotency, as it has not been previously shown that Oct4 and Sox2 bind RNA. Oct4 and Sox2 are known to form a heterodimer and bind a cis-regulatory element essential for the activation of a third master pluripotency factor, Nanog [47]. These three proteins in turn bind to closely localized genomic sites [48], upregulating genes important for pluripotency and downregulating lineage-specific genes. It is conceivable that lncRNAs play a role by acting as scaffolds for the assembly of the three pluripotency factors on the chromatin or as guides to recruit these factors to regulatory regions. Data from RNA-ChIP with antibodies against Nanog could be used to augment the existing Oct4 and Sox2 peak data and assist in the selection of candidate lncRNAs.
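The idea of using peaks shared across samples (for example WDR5, MLL1, and Rbbp5, or the Oct4 and Sox2 duplicates) to build a candidate list can be sketched as a simple interval-overlap filter. This is only an illustration under stated assumptions: peaks are represented as hypothetical (chromosome, start, end) tuples with half-open coordinates, any overlap of at least one base counts as shared, and the function names and example coordinates are invented here. It does not reproduce how the overlaps reported in this thesis were computed.

```python
from collections import defaultdict

Peak = tuple[str, int, int]  # (chromosome, start, end), half-open coordinates assumed


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_peaks(reference: list[Peak], *others: list[Peak]) -> list[Peak]:
    """Return peaks from `reference` that overlap a peak in every other sample."""
    indexed = []
    for peaks in others:
        by_chrom = defaultdict(list)
        for peak in peaks:
            by_chrom[peak[0]].append(peak)  # group by chromosome to limit comparisons
        indexed.append(by_chrom)
    return [
        p for p in reference
        if all(any(overlaps(p, q) for q in idx.get(p[0], [])) for idx in indexed)
    ]


if __name__ == "__main__":
    # Hypothetical peak lists, not data from this study.
    mll1 = [("chr12", 100, 200), ("chr18", 500, 600)]
    rbbp5 = [("chr12", 150, 250), ("chr18", 700, 800)]
    wdr5 = [("chr12", 190, 300)]
    print(shared_peaks(mll1, rbbp5, wdr5))
```

Requiring overlap in every sample is the strictest choice; a looser rule (for example, overlap in at least two of three samples) would enlarge the candidate list at the cost of specificity.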
Future work on this project involves filtering the list of candidate lncRNAs produced by the RNA-seq data analysis. Subtraction of the regions occupied by previously annotated lncRNAs and protein-coding genes can generate a list of candidate novel lncRNAs. These novel lncRNAs can be further scrutinized to verify that they are not transcriptional noise and that they indeed do not encode proteins. For instance, if the candidate is located within a K4-K36 domain and is enriched with RNA polymerase II binding sites and DNase I hypersensitivity sites (a sign of open chromatin) as detected with the ENCODE data, the candidate is likely to be a product of active transcription [37, 49, 50]. The protein-coding potential of a candidate lncRNA can be evaluated with the Coding Potential Calculator (CPC) algorithm and other programs [51, 52].

Back at the bench, characterization of lncRNA function typically involves rapid amplification of cDNA ends (RACE) to identify the full-length transcript [53]. Knockdown and overexpression of the novel lncRNA can further validate its biological function in a system of interest.

In summary, we established an RNA-ChIP protocol to identify lncRNAs bound to chromatin proteins in embryonic stem cells. Although RNA-ChIP has been previously used to identify lncRNAs that bind to RNA-binding proteins, the novelty of our approach is that it has been applied to proteins that were not previously known to bind RNA. Furthermore, while the ENCODE and FANTOM projects have identified several thousand lncRNAs from human and mouse cells, these groups used a defined number of cells in narrowly defined conditions. The power of this technique is that it can be applied to very specific contexts to detect lncRNAs potentially binding proteins of interest, and that it allows for the discovery of de novo lncRNAs.
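The filtering step proposed in this chapter, subtracting regions occupied by previously annotated lncRNAs and protein-coding genes, can be sketched as follows. This is a rough illustration and not an implemented part of this project: the (chromosome, start, end) tuples, half-open coordinates, function names, and example intervals are all hypothetical, and a peak is discarded if it overlaps an annotation by even one base.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def novel_candidates(peaks, annotations):
    """Keep peaks that do not overlap any annotated interval on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in annotations:
        by_chrom[chrom].append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    novel = []
    for chrom, start, end in peaks:
        # Last merged interval that begins before the peak ends (binary search).
        i = bisect_right(starts.get(chrom, []), end - 1) - 1
        if i >= 0 and merged[chrom][i][1] > start:
            continue  # overlaps an annotated region, so not a novel candidate
        novel.append((chrom, start, end))
    return novel


if __name__ == "__main__":
    # Hypothetical annotation and peak coordinates, not data from this study.
    annotated = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    called = [("chr1", 250, 350), ("chr1", 300, 400), ("chr3", 10, 20), ("chr2", 10, 50)]
    print(novel_candidates(called, annotated))
```

Merging the annotations first keeps each lookup to a single binary search, which matters when tens of thousands of peaks are compared against a genome-wide annotation.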
Chapter 5

Conclusion

Currently, interactions between proteins, microRNAs, and specific regions of the DNA are the main concepts applied to studies of gene regulation. Our approach opens the door to a novel layer of gene regulation that incorporates RNA-protein interactions through the isolation and identification of lncRNAs bound to specific proteins of interest. The proof-of-principle technique established by this project is a useful tool to characterize lncRNA expression during any developmental stage and is widely applicable to the study of chromatin binding proteins in other biological contexts. Furthermore, we expect that additional technological innovations geared toward studying lncRNAs will continuously emerge to support the rapid development of this fascinating research field.

Chapter 6

Glossary of Bioinformatics Terms

• Fragment – A cDNA piece 200 base pairs in length generated by sonication and reverse transcription.
• Flow cell – A planar optically transparent surface similar to a microscope slide which contains a lawn of oligonucleotide anchors bound to its surface.
• Adaptor sequences – Sequences ligated to the ends of fragments that attach to the oligonucleotide anchors bound to the flow cell and simultaneously provide primers during bridge amplification.
• Multiplexing – The sequencing of multiple samples on one lane of a flow cell.
• Barcode – A unique sequence used to distinguish samples during multiplexing.
• Read – A 50 base pair sequence read from the end of a fragment bound to a flow cell.
• Paired-end sequencing – Reading of 50 base pairs from both ends of a fragment. This method generates a ‘forward’ and a ‘reverse’ read.
• Unmappable read – A read that cannot be unambiguously assigned a location in the genome. In this case, the threshold was set to 5. Thus, any read mapping to greater than 5 locations was labeled as unmappable.

References

[1] K. C. Wang and H. Y. Chang. Molecular mechanisms of long noncoding rnas.
Mol Cell, 43(6):904–914, 2011.
[2] J. L. Rinn and H. Y. Chang. Genome regulation by long noncoding rnas. Annu Rev Biochem, 81:145–166, 2012.
[3] T. Derrien, R. Johnson, G. Bussotti, A. Tanzer, S. Djebali, H. Tilgner, G. Guernec, D. Martin, A. Merkel, D. G. Knowles, J. Lagarde, L. Veeravalli, X. Ruan, Y. Ruan, T. Lassmann, P. Carninci, J. B. Brown, L. Lipovich, J. M. Gonzalez, M. Thomas, C. A. Davis, R. Shiekhattar, T. R. Gingeras, T. J. Hubbard, C. Notredame, J. Harrow, and R. Guigo. The gencode v7 catalog of human long noncoding rnas: analysis of their gene structure, evolution, and expression. Genome Res, 22(9):1775–1789, 2012.
[4] B. Banfai, H. Jia, J. Khatun, E. Wood, B. Risk, W. E. Gundling, Jr., A. Kundaje, H. P. Gunawardena, Y. Yu, L. Xie, K. Krajewski, B. D. Strahl, X. Chen, P. Bickel, M. C. Giddings, J. B. Brown, and L. Lipovich. Long noncoding rnas are rarely translated in two human cell lines. Genome Res, 22(9):1646–1657, 2012.
[5] I. Dunham, A. Kundaje, S. F. Aldred, P. J. Collins, C. A. Davis, F. Doyle, C. B. Epstein, S. Frietze, J. Harrow, R. Kaul, J. Khatun, B. R. Lajoie, S. G. Landt, B. K. Lee, F. Pauli, K. R. Rosenbloom, P. Sabo, A. Safi, A. Sanyal, N. Shoresh, J. M. Simon, L. Song, N. D. Trinklein, R. C. Altshuler, E. Birney, J. B. Brown, C. Cheng, S. Djebali, X. Dong, J. Ernst, T. S. Furey, M. Gerstein, B. Giardine, M. Greven, R. C. Hardison, R. S. Harris, J. Herrero, M. M. Hoffman, S. Iyer, M. Kellis, P. Kheradpour, T. Lassmann, Q. Li, X. Lin, G. K. Marinov, A. Merkel, A. Mortazavi, S. C. Parker, T. E. Reddy, J. Rozowsky, F. Schlesinger, R. E. Thurman, J. Wang, L. D. Ward, T. W. Whitfield, S. P. Wilder, W. Wu, H. S. Xi, K. Y. Yip, J. Zhuang, B. E. Bernstein, E. D. Green, C. Gunter, M. Snyder, M. J. Pazin, R. F. Lowdon, L. A. Dillon, L. B. Adams, C. J. Kelly, J. Zhang, J. R. Wexler, P. J. Good, E. A. Feingold, G. E. Crawford, J. Dekker, L. Elnitski, P. J. Farnham, M. C. Giddings, T. R. Gingeras, R. Guigo, T. J. Hubbard, M. Kellis, W. J.
Kent, J. D. Lieb, E. H. Margulies, R. M. Myers, J. A. Stamatoyannopoulos, S. A. Tenenbaum, Z. Weng, K. P. White, B. Wold, Y. Yu, J. Wrobel, B. A. Risk, H. P. Gunawardena, H. C. Kuiper, C. W. Maier, L. Xie, X. Chen, T. S. Mikkelsen, et al. An integrated encyclopedia of dna elements in the human genome. Nature, 489(7414):57–74, 2012.
[6] K. L. Yap, S. Li, A. M. Munoz-Cabello, S. Raguz, L. Zeng, S. Mujtaba, J. Gil, M. J. Walsh, and M. M. Zhou. Molecular interplay of the noncoding rna anril and methylated histone h3 lysine 27 by polycomb cbx7 in transcriptional silencing of ink4a. Mol Cell, 38(5):662–674, 2010.
[7] T. Sanchez-Elsner, D. Gou, E. Kremmer, and F. Sauer. Noncoding rnas of trithorax response elements recruit drosophila ash1 to ultrabithorax. Science, 311(5764):1118–1123, 2006.
[8] S. Bertani, S. Sauer, E. Bolotin, and F. Sauer. The noncoding rna mistral activates hoxa6 and hoxa7 expression and stem cell differentiation by recruiting mll1 to chromatin. Mol Cell, 43(6):1040–1046, 2011.
[9] S. R. Atkinson, S. Marguerat, and J. Bahler. Exploring long non-coding rnas through sequencing. Semin Cell Dev Biol, 23(2):200–205, 2012.
[10] R. E. Farrell, Jr. RNA Methodologies: A Laboratory Guide for Isolation and Characterization. Elsevier Science, 2005.
[11] H. Lodish, A. Berk, C.A. Kaiser, M. Krieger, M.P. Scott, A. Bretscher, H. Ploegh, and P. Matsudaira. Molecular Cell Biology. W. H. Freeman, 2007.
[12] S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G. K. Marinov, J. Khatun, B. A. Williams, C. Zaleski, J. Rozowsky, M. Roder, F. Kokocinski, R. F. Abdelhamid, T. Alioto, I. Antoshechkin, M. T. Baer, N. S. Bar, P. Batut, K. Bell, I. Bell, S. Chakrabortty, X. Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, E. Falconnet, M. Fastuca, K. Fejes-Toth, P. Ferreira, S. Foissac, M. J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H. Gunawardena, C.
Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, O. J. Luo, E. Park, K. Persaud, J. B. Preall, P. Ribeca, B. Risk, D. Robyr, M. Sammeth, L. Schaffer, L. H. See, A. Shahab, J. Skancke, A. M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, J. Wrobel, Y. Yu, X. Ruan, Y. Hayashizaki, J. Harrow, M. Gerstein, T. Hubbard, A. Reymond, S. E. Antonarakis, G. Hannon, M. C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guigo, and T. R. Gingeras. Landscape of transcription in human cells. Nature, 489(7414):101–108, 2012.
[13] B. Langmead and S. L. Salzberg. Fast gapped-read alignment with bowtie 2. Nat Methods, 9(4):357–359, 2012.
[14] H. Li and R. Durbin. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics, 25(14):1754–1760, 2009.
[15] C. Trapnell, L. Pachter, and S. L. Salzberg. Tophat: discovering splice junctions with rna-seq. Bioinformatics, 25(9):1105–1111, 2009.
[16] C. Trapnell, B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, and L. Pachter. Transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol, 28(5):511–515, 2010.
[17] M. Guttman, M. Garber, J. Z. Levin, J. Donaghey, J. Robinson, X. Adiconis, L. Fan, M. J. Koziol, A. Gnirke, C. Nusbaum, J. L. Rinn, E. S. Lander, and A. Regev. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincrnas. Nat Biotechnol, 28(5):503–510, 2010.
[18] H. Kawaji, J. Severin, M. Lizio, A. R. Forrest, E. van Nimwegen, M. Rehli, K. Schroder, K. Irvine, H. Suzuki, P. Carninci, Y. Hayashizaki, and C. O. Daub. Update of the fantom web resource: from mammalian transcriptional landscape to its dynamic regulation. Nucleic Acids Res, 39(Database issue):D856–D860, 2011.
[19] P. P. Amaral, M. B. Clark, D. K. Gascoigne, M. E. Dinger, and J. S. Mattick.
lncrnadb: a reference database for long noncoding rnas. Nucleic Acids Res, 39(Database issue):D146–D151, 2011.
[20] M. E. Dinger, K. C. Pang, T. R. Mercer, M. L. Crowe, S. M. Grimmond, and J. S. Mattick. Nred: a database of long noncoding rna expression. Nucleic Acids Res, 37(Database issue):D122–D126, 2009.
[21] H. Hirai, T. Tani, N. Katoku-Kikyo, S. Kellner, P. Karian, M. Firpo, and N. Kikyo. Radical acceleration of nuclear reprogramming by chromatin remodeling with the transactivation domain of myod. Stem Cells, 29(9):1349–1361, 2011.
[22] C. A. Berkes, D. A. Bergstrom, B. H. Penn, K. J. Seaver, P. S. Knoepfler, and S. J. Tapscott. Pbx marks genes for activation by myod indicating a role for a homeodomain protein in establishing myogenic potential. Mol Cell, 14(4):465–477, 2004.
[23] I. L. de la Serna, K. A. Carlson, and A. N. Imbalzano. Mammalian swi/snf complexes promote myod-mediated muscle differentiation. Nat Genet, 27(2):187–190, 2001.
[24] H. Hirai, T. Tani, and N. Kikyo. Structure and functions of powerful transactivators: Vp16, myod and foxa. Int J Dev Biol, 54(11-12):1589–1596, 2010.
[25] T. Miller, N. J. Krogan, J. Dover, H. Erdjument-Bromage, P. Tempst, M. Johnston, J. F. Greenblatt, and A. Shilatifard. Compass: a complex of proteins associated with a trithorax-related set domain protein. Proc Natl Acad Sci U S A, 98(23):12902–12907, 2001.
[26] V. Pirrotta. Polycombing the genome: Pcg, trxg, and chromatin silencing. Cell, 93(3):333–336, 1998.
[27] R. Cao, L. Wang, H. Wang, L. Xia, H. Erdjument-Bromage, P. Tempst, R. S. Jones, and Y. Zhang. Role of histone h3 lysine 27 methylation in polycomb-group silencing. Science, 298(5595):1039–1043, 2002.
[28] D. Pasini, K. H. Hansen, J. Christensen, K. Agger, P. A. Cloos, and K. Helin. Coordinated regulation of transcriptional repression by the rbp2 h3k4 demethylase and polycomb-repressive complex 2. Genes Dev, 22(10):1345–1355, 2008.
[29] T. Mahmoudi and C. P. Verrijzer.
Chromatin silencing and activation by polycomb and trithorax group proteins. Oncogene, 20(24):3055–3066, 2001.
[30] J. A. Kennison. The polycomb and trithorax group proteins of drosophila: trans-regulators of homeotic gene function. Annu Rev Genet, 29:289–303, 1995.
[31] J. L. Rinn, M. Kertesz, J. K. Wang, S. L. Squazzo, X. Xu, S. A. Brugmann, L. H. Goodnough, J. A. Helms, P. J. Farnham, E. Segal, and H. Y. Chang. Functional demarcation of active and silent chromatin domains in human hox loci by noncoding rnas. Cell, 129(7):1311–1323, 2007.
[32] E. Pasmant, I. Laurendeau, D. Heron, M. Vidaud, D. Vidaud, and I. Bieche. Characterization of a germ-line deletion, including the entire ink4/arf locus, in a melanoma-neural system tumor family: identification of anril, an antisense noncoding rna whose expression coclusters with arf. Cancer Res, 67(8):3963–3969, 2007.
[33] M. Guttman, J. Donaghey, B. W. Carey, M. Garber, J. K. Grenier, G. Munson, G. Young, A. B. Lucas, R. Ach, L. Bruhn, X. Yang, I. Amit, A. Meissner, A. Regev, J. L. Rinn, D. E. Root, and E. S. Lander. lincrnas act in the circuitry controlling pluripotency and differentiation. Nature, 477(7364):295–300, 2011.
[34] M. E. Dinger, P. P. Amaral, T. R. Mercer, K. C. Pang, S. J. Bruce, B. B. Gardiner, M. E. Askarian-Amiri, K. Ru, G. Solda, C. Simons, S. M. Sunkin, M. L. Crowe, S. M. Grimmond, A. C. Perkins, and J. S. Mattick. Long noncoding rnas in mouse embryonic stem cell pluripotency and differentiation. Genome Res, 18(9):1433–1445, 2008.
[35] S. Loewer, M. N. Cabili, M. Guttman, Y. H. Loh, K. Thomas, I. H. Park, M. Garber, M. Curran, T. Onder, S. Agarwal, P. D. Manos, S. Datta, E. S. Lander, T. M. Schlaeger, G. Q. Daley, and J. L. Rinn. Large intergenic non-coding rna-ror modulates reprogramming of human induced pluripotent stem cells. Nat Genet, 42(12):1113–1117, 2010.
[36] S. Y. Ng, R. Johnson, and L. W. Stanton.
Human long non-coding rnas promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. EMBO J, 31(3):522–533, 2012. [37] M. Kretz, D. E. Webster, R. J. Flockhart, C. S. Lee, A. Zehnder, V. Lopez-Pajares, K. Qu, G. X. Zheng, J. Chow, G. E. Kim, J. L. Rinn, H. Y. Chang, Z. Siprashvili, and P. A. Khavari. Suppression of progenitor differentiation requires the long noncoding rna ancr. Genes Dev, 26(4):338–343, 2012. [38] J. Sheik Mohamed, P. M. Gaughwin, B. Lim, P. Robson, and L. Lipovich. Con- served long noncoding rnas transcriptionally regulated by oct4 and nanog modulate pluripotency in mouse embryonic stem cells. RNA, 16(2):324–337, 2010. [39] H. M. Blau, C. P. Chiu, and C. Webster. Cytoplasmic activation of human nuclear genes in stable heterocaryons. Cell, 32(4):1171–1180, 1983. [40] K. E. Yutzey, R. L. Kline, and S. F. Konieczny. An internal regulatory element controls troponin i gene expression. Mol Cell Biol, 9(4):1397–1405, 1989. [41] N. Yoshida, S. Yoshida, K. Koishi, K. Masuda, and Y. Nabeshima. Cell hetero- geneity upon myogenic differentiation: down-regulation of myod and. J Cell Sci, 111 ( Pt 6):769–779, 1998. [42] U. K. Laemmli. Cleavage of structural proteins during the assembly of the head of bacteriophage t4. Nature, 227(5259):680–685, 1970. [43] D. Yaffe and O. Saxel. A myogenic cell line with altered serum requirements for differentiation. Differentiation, 7(3):159–166, 1977. 46 [44] C. H. Clegg, T. A. Linkhart, B. B. Olwin, and S. D. Hauschka. Growth factor control of skeletal muscle differentiation: commitment to terminal. J Cell Biol, 105(2):949–956, 1987. [45] J. H. Yang, Y. Song, J. H. Seol, J. Y. Park, Y. J. Yang, J. W. Han, H. D. Youn, and E. J. Cho. Myogenic transcriptional activation of myod mediated by replication- independent histone deposition. Proc Natl Acad Sci U S A, 108(1):85–90, 2011. [46] S. G. Fischer and L. S. Lerman. 
Length-independent separation of dna restriction fragments in two-dimensional gel electrophoresis. Cell, 16(1):191–200, 1979. [47] D. J. Rodda, J. L. Chew, L. H. Lim, Y. H. Loh, B. Wang, H. H. Ng, and P. Robson. Transcriptional regulation of nanog by oct4 and sox2. J Biol Chem, 280(26):24731– 24737, 2005. [48] I. Chambers and S. R. Tomlinson. The transcriptional foundation of pluripotency. Development, 136(14):2311–2322, 2009. [49] R. J. Flockhart, D. E. Webster, K. Qu, N. Mascarenhas, J. Kovalski, M. Kretz, and P. A. Khavari. Brafv600e remodels the melanocyte transcriptome and induces bancr to regulate melanoma cell migration. Genome Res, 22(6):1006–1014, 2012. [50] S. Guil, M. Soler, A. Portela, J. Carrere, E. Fonalleras, A. Gomez, A. Villanueva, and M. Esteller. Intronic rnas mediate ezh2 regulation of epigenetic targets. Nat Struct Mol Biol, 19(7):664–670, 2012. [51] L. Kong, Y. Zhang, Z. Q. Ye, X. Q. Liu, S. Q. Zhao, L. Wei, and G. Gao. Cpc: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res, 35(Web Server issue):W345–349, 2007. [52] M. Clamp, B. Fry, M. Kamal, X. Xie, J. Cuff, M. F. Lin, M. Kellis, K. Lindblad- Toh, and E. S. Lander. Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A, 104(49):19428–19433, 2007. [53] J. Sambrook and D.W. Russell. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, 2001.