Vishwanathan, Nandita; Bandyopadhyay, Arpan; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A. T.; Jacob, Nitya M.; Le, Huong; Karypis, George; Hu, Wei-Shou

2016-04-20

http://hdl.handle.net/11299/179004

To assess the quality of the genome assembly, the publicly available AMDS00000000.1 draft assembly of the Chinese hamster genome was compared with the in-house UMN1.0 assembly to mark consensus regions on the former. Under a stringent criterion (contiguous segments larger than 1 Kbp, with sequence identity greater than 97 %, matching unique reference and query sequence), 82 % of the total scaffold length consists of regions in consensus with UMN1.0. Since these regions are conserved across two independent assemblies, they are highly likely to have been assembled correctly. A similar analysis compared the CHO-K1 (AFTD00000000.1) and Chinese hamster (AMDS00000000.1) genomes. Using the same criteria, 1.72 Gbp (71.9 %) of the CHO-K1 genome can be identified as 'high confidence regions'. These regions of consensus were compiled in a BED file, available in this data set, that contains the scaffold ID and the start and end positions of each 'high confidence region'.

Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequences of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled from shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. This dataset includes two genome annotation files that identify the 'high confidence regions' shared by the genome assemblies under comparison. These files can be used to find locations in the publicly available genomes that are likely to be assembled correctly. These regions can be used confidently for genome engineering.

CHO cells; Genomics; Genome quality assessment

Regions of High Confidence in Chinese Hamster and CHO-K1 Genome Assemblies

Dataset

http://dx.doi.org/10.13020/D6Z304
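
As an illustration of how the BED files described above might be used, the following is a minimal Python sketch that checks whether a candidate locus lies entirely inside a listed 'high confidence region'. It rests on assumptions not stated in the record: that each line holds at least three whitespace-separated columns (scaffold ID, start, end), and that coordinates follow the usual BED convention of 0-based, half-open intervals. The function names and the scaffold names in the example are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def load_regions(lines):
    """Parse BED-like lines into {scaffold: sorted, merged [(start, end), ...]}.

    Assumes columns: scaffold ID, start, end (0-based, half-open).
    """
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        scaffold, start, end = fields[0], int(fields[1]), int(fields[2])
        raw[scaffold].append((start, end))

    regions = {}
    for scaffold, intervals in raw.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:  # overlapping or adjacent: extend
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        regions[scaffold] = merged
    return regions


def in_high_confidence(regions, scaffold, start, end):
    """Return True if [start, end) lies entirely inside one listed region."""
    intervals = regions.get(scaffold, [])
    # Index of the last region whose start is <= the query start.
    i = bisect_right(intervals, (start, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= start and end <= intervals[i][1]


# Hypothetical example data; real scaffold IDs come from the BED file.
example = [
    "scaffold_1\t100\t2500",
    "scaffold_1\t2400\t4000",
    "scaffold_2\t0\t1500",
]
regions = load_regions(example)
inside = in_high_confidence(regions, "scaffold_1", 200, 3000)
outside = in_high_confidence(regions, "scaffold_2", 1000, 2000)
```

To use it on a downloaded file, pass an open file handle instead of the example list, for instance `with open(path) as fh: regions = load_regions(fh)`, where `path` points at one of the two annotation files. Merging overlapping intervals first keeps each lookup to a single binary search per query.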