==============
Title: Term Coverage of Dietary Supplements in Product Labels
Author: Yefeng Wang
Email: wang4688 (at) umn.edu
Date: 7 July 2016
Web Location: Data Repository for the University of Minnesota (DRUM)
Web Address: http://hdl.handle.net/11299/181275
==============

Input
==============
lstProducts.csv - CSV file that InfoDownload.py reads to retrieve product information. It is included in the repository.

Python Scripts
==============
Note: Requires Python 3. See https://www.python.org/downloads/ to download.

The Python files are listed in the order in which they should be executed.

InfoDownload.py - Code for retrieving information directly from the DSLD website. Reads from lstProducts.csv. Note that this script takes a long time to run, potentially several hours, because it performs four HTTP requests for each of the more than 45,000 rows in lstProducts.csv.

type_print.py - Code that uses the output from InfoDownload.py to create SupType.csv.

ingrTypeStat.py - Code for reading from DSF CSVs, retrieving the ingredient category, and counting the ingredients under each category. Outputs ingrTypeStat.csv.

new_suptypelist.py - Code that defines regular expression filters and performs normalization. By default, outputs the {ingredient_category}_new.txt files, which compare the original data with the normalized data. Set the normalize_switch value in this script (line 14) to True to generate the {ingredient_category}.txt files, which contain the normalized data (with duplicates removed).

type_stat.py - Code for counting the number of ingredients under each LanguaL(TM) category. Outputs TypeStat.csv.

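The normalization step described for new_suptypelist.py can be pictured with the minimal sketch below. It is not the script's actual code: the filter patterns, the function names normalize_name and normalize_list, and the sample ingredient names are all hypothetical, and the pair-based comparison format is an assumption. Only the normalize_switch flag and its two modes come from the description above.

```python
import re

# Hypothetical filters for illustration only; the real regular
# expressions are defined in new_suptypelist.py and may differ.
FILTERS = [
    (re.compile(r"\(.*?\)"), " "),        # drop parenthetical notes
    (re.compile(r"[^a-z0-9\- ]"), " "),   # drop punctuation (and non-ASCII) except hyphens
    (re.compile(r"\s+"), " "),            # collapse runs of whitespace
]


def normalize_name(name):
    """Lowercase an ingredient name and apply each filter in order."""
    text = name.lower()
    for pattern, replacement in FILTERS:
        text = pattern.sub(replacement, text)
    return text.strip()


def normalize_list(names, normalize_switch=False):
    """Mimic the two output modes described for new_suptypelist.py.

    normalize_switch=False: return (original, normalized) pairs,
    standing in for the {ingredient_category}_new.txt comparison.
    normalize_switch=True: return normalized names with duplicates
    removed, standing in for {ingredient_category}.txt.
    """
    if normalize_switch:
        # dict.fromkeys removes duplicates while keeping first-seen order
        return list(dict.fromkeys(normalize_name(n) for n in names))
    return [(n, normalize_name(n)) for n in names]


if __name__ == "__main__":
    sample = ["Vitamin C (as Ascorbic Acid)", "vitamin c", "Zinc"]
    for original, normalized in normalize_list(sample):
        print(original, "->", normalized)
    print(normalize_list(sample, normalize_switch=True))
```

Keeping the filters in an ordered list makes the order of application explicit, which matters here because whitespace is collapsed only after the earlier filters have introduced extra spaces.
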
Normalized ingredient category lists (included in repository)
=============================================================
amino acid.txt
animal part or source.txt
bacteria.txt
blend.txt
botanical.txt
carbohydrate.txt
chemical.txt
default.txt
element.txt
enzyme.txt
fat.txt
fatty acid.txt
fiber.txt
hormone.txt
mineral.txt
other.txt
polysaccharide.txt
protein.txt
vitamin.txt

Comparison between original and normalized ingredients (included in repository)
=============================================================
amino acid_new.txt
animal part or source_new.txt
bacteria_new.txt
blend_new.txt
botanical_new.txt
carbohydrate_new.txt
chemical_new.txt
default_new.txt
element_new.txt
enzyme_new.txt
fat_new.txt
fatty acid_new.txt
fiber_new.txt
hormone_new.txt
mineral_new.txt
other_new.txt
polysaccharide_new.txt
protein_new.txt
vitamin_new.txt
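
As a convenience, the two file lists above can be generated from the 19 category names and checked against a local copy of the repository. This is only a sketch: the helper names expected_files and missing_files are hypothetical and are not part of the repository's scripts, and the check assumes that all of the text files sit together in one directory.

```python
from pathlib import Path

# The 19 ingredient categories, taken from the file lists above.
CATEGORIES = [
    "amino acid", "animal part or source", "bacteria", "blend",
    "botanical", "carbohydrate", "chemical", "default", "element",
    "enzyme", "fat", "fatty acid", "fiber", "hormone", "mineral",
    "other", "polysaccharide", "protein", "vitamin",
]


def expected_files():
    """Return the normalized list names followed by the comparison names."""
    normalized = [f"{category}.txt" for category in CATEGORIES]
    comparisons = [f"{category}_new.txt" for category in CATEGORIES]
    return normalized + comparisons


def missing_files(directory="."):
    """Return the expected file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in expected_files() if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected files are present.")
```

Run from the directory that holds the text files, the script prints any expected file that it cannot find.
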