This readme.txt file was generated on 2019-08-28

Title of Dataset: Integrated Dietary Supplement Knowledge Base (iDISK)

Author Information: Rizvi, Rubina F; Vasilakes, Jake A; Adam, Terrence J; Melton, Genevieve B; Bishop, Jeffrey R; Tao, Cui; Zhang, Rui

University of Minnesota Institute for Health Informatics, Natural Language Processing / Information Extraction (NLP/IE) Program

Funding information: This research was supported by National Center for Complementary & Integrative Health Award (#R01AT009457) (Zhang) and the Agency for Healthcare Research & Quality grant (#1R01HS022085) (Melton).

SHARING/ACCESS INFORMATION

License:

CC-By-SA Attribution-ShareAlike 3.0 United States

Recommended citation for the data:

    Rizvi, Rubina F; Vasilakes, Jake A; Adam, Terrence J; Melton, Genevieve B; Bishop, Jeffrey R; Tao, Cui; Zhang, Rui. (2019). Integrated Dietary Supplement Knowledge Base (iDISK). Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/d6bm3v. 

integrated Dietary Supplements Knowledge Base (iDISK)

The integrated Dietary Supplements Knowledge Base (iDISK) is a knowledge base for dietary supplements created by standardizing and integrating multiple resources: the Dietary Supplement Label Database (DSLD), the “About Herbs” database from Memorial Sloan Kettering Cancer Center (MSKCC), the Canadian Natural Health Products and Ingredients database (NHP), and the Natural Medicines Comprehensive Database (NMCD) developed by the Therapeutic Research Center (TRC). iDISK also contains attributes and relationships describing each dietary supplement, such as the products it is an ingredient of and the drugs it might interact with. iDISK is available as a Neo4j graph database and as UMLS-style RRF files. See the Neo4j Release section below for details on loading the Neo4j version.

Note that the version released here does not include data from NMCD. Please contact the authors for further information.

General Schema

iDISK contains seven concept types and six relation types:

Concept Types

  • SDSI: Semantic Dietary Supplement Ingredient
  • DSP: Dietary Supplement Product
  • DIS: Disease or Syndrome
  • SPD: Pharmaceutical Drug
  • SOC: System Organ Class
  • SS: Sign or Symptom
  • TC: Therapeutic Class

Following the UMLS, concepts are collections of synonymous atoms. An atom is a term (e.g. a possible name for a supplement, such as “Ginkgo” or “Ginkgo Biloba”) from a given data source.
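To make the concept/atom model concrete, here is a minimal Python sketch. The `Atom` fields mirror a subset of the MRCONSO.RRF columns described under RRF Release (AUI, STR, SAB, ISPREF); the class names, identifiers, and source labels are hypothetical and are not real iDISK values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    """One term for a concept, as it appears in a single source."""
    aui: str                 # unique atom identifier (AUI)
    string: str              # the term itself (STR), e.g. "Ginkgo"
    source: str              # source database (SAB)
    preferred: bool = False  # ISPREF: preferred term in its source?


@dataclass
class Concept:
    """A concept groups synonymous atoms under one CUI."""
    cui: str
    concept_type: str        # e.g. "SDSI"
    atoms: list[Atom] = field(default_factory=list)

    def names(self) -> set[str]:
        """All distinct strings that name this concept."""
        return {atom.string for atom in self.atoms}


# Hypothetical identifiers and source labels, for illustration only.
ginkgo = Concept(
    cui="EXAMPLE-CUI-1",
    concept_type="SDSI",
    atoms=[
        Atom("EXAMPLE-AUI-1", "Ginkgo", "SRC_A", preferred=True),
        Atom("EXAMPLE-AUI-2", "Ginkgo Biloba", "SRC_B"),
    ],
)
print(sorted(ginkgo.names()))  # ['Ginkgo', 'Ginkgo Biloba']
```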

Relation Types

  • has_adverse_effect_on(SDSI, SOC)
  • has_adverse_reaction(SDSI, SS)
  • has_ingredient(DSP, SDSI)
  • has_therapeutic_class(SDSI, TC)
  • interacts_with(SDSI, SPD)
  • is_effective_for(SDSI, DIS)

N.B. Depending on the data release you are using (public or private), some of these relation types may be absent, as they come from restricted sources.
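The relation signatures above can be encoded as a lookup table, which is handy for sanity-checking triples read from MRREL.RRF against concept types from MRSTY.RRF. This is a minimal sketch: the table is transcribed from this README, it assumes the REL column uses exactly these relation names, and `check_relation` is a hypothetical helper rather than part of iDISK.

```python
# (subject type, object type) for each relation type listed in this README.
RELATION_SIGNATURES = {
    "has_adverse_effect_on": ("SDSI", "SOC"),
    "has_adverse_reaction": ("SDSI", "SS"),
    "has_ingredient": ("DSP", "SDSI"),
    "has_therapeutic_class": ("SDSI", "TC"),
    "interacts_with": ("SDSI", "SPD"),
    "is_effective_for": ("SDSI", "DIS"),
}


def check_relation(rel: str, subject_type: str, object_type: str) -> bool:
    """Return True if (subject_type, rel, object_type) fits the schema."""
    expected = RELATION_SIGNATURES.get(rel)
    if expected is None:
        # Not one of the relation types listed in this README.
        return False
    return expected == (subject_type, object_type)


print(check_relation("has_ingredient", "DSP", "SDSI"))  # True
print(check_relation("has_ingredient", "SDSI", "DSP"))  # False
```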

Neo4j Release

The Neo4j release idisk_neo4j.dump can be loaded into a Neo4j graph using the command:

    bin/neo4j-admin load --from=/path/to/idisk_neo4j.dump --database=<database> [--force]

This is most easily done from the “Terminal” tab in the Neo4j Desktop.

N.B. Use the --force option to overwrite the target database if it already exists.
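If you prefer to script the load (for example, from a setup script rather than the Neo4j Desktop terminal), a small Python wrapper can assemble the same command. This is a sketch: `neo4j_home`, `load_idisk`, and the `dry_run` switch are hypothetical conveniences, and `graph.db` is used only because it is the default database name in Neo4j 3.x. The flags are exactly those shown above; newer Neo4j releases reorganized the `neo4j-admin` subcommands, so check the documentation for your version.

```python
import shlex
import subprocess


def neo4j_load_command(dump_path, database, force=False, neo4j_home="."):
    """Build the `neo4j-admin load` argument list shown above."""
    cmd = [
        f"{neo4j_home}/bin/neo4j-admin",
        "load",
        f"--from={dump_path}",
        f"--database={database}",
    ]
    if force:
        # Overwrite a database that already exists.
        cmd.append("--force")
    return cmd


def load_idisk(dump_path, database, force=False, neo4j_home=".", dry_run=True):
    """Print the command (dry run) or execute it, raising on a non-zero exit."""
    cmd = neo4j_load_command(dump_path, database, force, neo4j_home)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


# Prints: ./bin/neo4j-admin load --from=/path/to/idisk_neo4j.dump --database=graph.db --force
load_idisk("/path/to/idisk_neo4j.dump", "graph.db", force=True)
```

Keeping `dry_run=True` as the default means nothing is overwritten until you have checked the printed command.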

RRF Release

The RRF release idisk_rrf.zip borrows heavily from the format of the UMLS Metathesaurus. There are four files:

  • MRSTY.RRF: The types of each concept.
  • MRCONSO.RRF: The atoms of each concept.
  • MRSAT.RRF: The attributes of each concept or relationship (if applicable).
  • MRREL.RRF: The relationships between concepts.

Unzip idisk_rrf.zip to obtain the files.
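The same step in Python, as a minimal sketch using only the standard library: it extracts the archive and checks that the four files listed above are present. This README does not say whether the files sit at the top level of the zip or inside a folder, so the sketch searches recursively; `extract_rrf` and the destination folder name are illustrative choices.

```python
import zipfile
from pathlib import Path

EXPECTED_FILES = {"MRSTY.RRF", "MRCONSO.RRF", "MRSAT.RRF", "MRREL.RRF"}


def extract_rrf(zip_path, dest):
    """Unzip the RRF release into `dest` and return {file name: path}."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Search recursively in case the archive stores the files in a subfolder.
    found = {p.name: p for p in Path(dest).rglob("*.RRF")}
    missing = EXPECTED_FILES - set(found)
    if missing:
        raise FileNotFoundError(f"not found in archive: {sorted(missing)}")
    return found


# Example (paths are placeholders):
# rrf = extract_rrf("idisk_rrf.zip", "idisk_rrf")
# rrf["MRCONSO.RRF"] is then the path to the atoms file.
```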

RRF is a flat-file format similar to CSV, but with the pipe character (|) as the field separator. The fields of each file are as follows:

MRSTY.RRF

  • CUI: The unique concept identifier.
  • STY: The concept type.
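As a minimal sketch of reading MRSTY.RRF, the Python below groups CUIs by concept type. It assumes UTF-8 text with the columns in the order listed above, and it ignores any fields beyond the first two, which also covers a UMLS-style trailing `|` if the files use one. The sample identifiers are hypothetical.

```python
from collections import defaultdict


def concepts_by_type(lines):
    """Group CUIs by concept type, given MRSTY.RRF lines (CUI|STY)."""
    groups = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Keep only the first two columns (CUI, STY).
        cui, sty = line.rstrip("\r\n").split("|")[:2]
        groups[sty].add(cui)
    return dict(groups)


# Usage with hypothetical rows; for the real file, pass an open handle:
#   with open("MRSTY.RRF", encoding="utf-8") as f: concepts_by_type(f)
sample = ["EXAMPLE-1|SDSI|", "EXAMPLE-2|DSP|", "EXAMPLE-3|SDSI|"]
print({t: len(c) for t, c in concepts_by_type(sample).items()})  # {'SDSI': 2, 'DSP': 1}
```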

MRCONSO.RRF

  • CUI: The unique concept identifier.
  • AUI: The unique atom identifier.
  • STR: The atom’s string representation.
  • TTY: The term type of this atom: CN (common name), SN (scientific name), or SY (synonym, unspecified).
  • SAB: The source database from which this atom was obtained.
  • SCODE: The ID of this atom in the source, if available.
  • ISPREF: Y if this atom is the preferred term in its source, N otherwise.
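A common first use of MRCONSO.RRF is looking up display names. The sketch below maps each CUI to its preferred term in each source (rows with ISPREF = Y). It assumes the seven columns appear in the order listed above and that there is at most one preferred atom per CUI and source (if not, the last one read wins); the helper names, identifiers, and source labels are hypothetical.

```python
from collections import defaultdict

MRCONSO_FIELDS = ["CUI", "AUI", "STR", "TTY", "SAB", "SCODE", "ISPREF"]


def read_atoms(lines):
    """Yield one dict per MRCONSO.RRF row, keyed by the field names above."""
    for line in lines:
        if line.strip():
            values = line.rstrip("\r\n").split("|")[: len(MRCONSO_FIELDS)]
            yield dict(zip(MRCONSO_FIELDS, values))


def preferred_names(lines):
    """Map CUI -> {SAB: preferred STR} using rows where ISPREF is "Y"."""
    names = defaultdict(dict)
    for atom in read_atoms(lines):
        if atom.get("ISPREF") == "Y":
            names[atom["CUI"]][atom["SAB"]] = atom["STR"]
    return dict(names)


sample = [
    "EX-1|A1|Ginkgo|CN|SRC_A|123|Y|",
    "EX-1|A2|Maidenhair tree|SY|SRC_A||N|",
    "EX-1|A3|Ginkgo biloba|SN|SRC_B|9|Y|",
]
print(preferred_names(sample))
# {'EX-1': {'SRC_A': 'Ginkgo', 'SRC_B': 'Ginkgo biloba'}}
```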

MRSAT.RRF

  • ATUI: The unique attribute identifier.
  • UI: The unique identifier of the concept or relationship this attribute describes.
  • STYPE: DSCUI if UI is a concept, DSRUI if UI is a relationship.
  • ATN: The attribute name.
  • ATV: The attribute value.
  • SAB: The source database from which this attribute was obtained.
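Because MRSAT.RRF holds attributes for both concepts and relationships, it helps to filter on STYPE. This minimal sketch groups attribute values by UI and attribute name; it assumes the six columns appear in the order listed above. The attribute name `example_attr` and the identifiers are placeholders, since this README does not list the actual attribute names.

```python
from collections import defaultdict

MRSAT_FIELDS = ["ATUI", "UI", "STYPE", "ATN", "ATV", "SAB"]


def attributes_by_ui(lines, stype=None):
    """Group MRSAT.RRF rows as {UI: {ATN: [ATV, ...]}}.

    Pass stype="DSCUI" for concept attributes or stype="DSRUI" for
    relationship attributes; None keeps both.
    """
    attrs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip():
            continue
        values = line.rstrip("\r\n").split("|")[: len(MRSAT_FIELDS)]
        row = dict(zip(MRSAT_FIELDS, values))
        if stype is not None and row.get("STYPE") != stype:
            continue
        attrs[row["UI"]][row["ATN"]].append(row["ATV"])
    # Convert the nested defaultdicts to plain dicts.
    return {ui: dict(by_name) for ui, by_name in attrs.items()}


sample = [
    "AT1|EX-1|DSCUI|example_attr|first|SRC_A|",
    "AT2|EX-1|DSCUI|example_attr|second|SRC_A|",
    "AT3|EX-R1|DSRUI|example_attr|other|SRC_A|",
]
print(attributes_by_ui(sample, stype="DSCUI"))
# {'EX-1': {'example_attr': ['first', 'second']}}
```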

MRREL.RRF

  • RUI: The unique relationship identifier.
  • CUI1: The CUI of the subject concept of this relationship.
  • REL: The relation type.
  • CUI2: The CUI of the object concept of this relationship.
  • SAB: The source database from which this relationship was obtained.
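MRREL.RRF stores only CUIs, so readable output usually means joining it with names taken from MRCONSO.RRF. The sketch below yields (subject, relation, object) triples given any CUI-to-name mapping; it assumes the five columns appear in the order listed above, and the rows and names shown are hypothetical.

```python
MRREL_FIELDS = ["RUI", "CUI1", "REL", "CUI2", "SAB"]


def relation_triples(rel_lines, names):
    """Yield (subject, relation, object) triples from MRREL.RRF lines.

    `names` maps CUI -> display name (for example, built from MRCONSO.RRF);
    CUIs without a name fall back to the raw identifier.
    """
    for line in rel_lines:
        if not line.strip():
            continue
        values = line.rstrip("\r\n").split("|")[: len(MRREL_FIELDS)]
        row = dict(zip(MRREL_FIELDS, values))
        yield (
            names.get(row["CUI1"], row["CUI1"]),
            row["REL"],
            names.get(row["CUI2"], row["CUI2"]),
        )


# Hypothetical rows and names, for illustration only.
names = {"EX-DSP-1": "Example Product", "EX-SDSI-1": "Ginkgo"}
rows = ["EX-R1|EX-DSP-1|has_ingredient|EX-SDSI-1|SRC_A|"]
for triple in relation_triples(rows, names):
    print(triple)  # ('Example Product', 'has_ingredient', 'Ginkgo')
```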