Pakhomov, Serguei
2018-05-03
2018-05-03
2018-05-03
https://hdl.handle.net/11299/196265

1. MayoSRS.csv: A set of 101 medical concept pairs manually rated by medical coders for semantic relatedness.
2. MiniMayoSRS.csv: A subset of 29 medical concept pairs manually rated by medical coders for semantic relatedness with high inter-rater agreement.
3. UMNSRS_similarity.csv: A set of 566 UMLS concept pairs manually rated for semantic similarity using a continuous response scale.
4. UMNSRS_relatedenss.csv: A set of 588 UMLS concept pairs manually rated for semantic relatedness using a continuous response scale.
5. UMNSRS_similarity_mod449_word2vec.csv: Modification of the UMNSRS-Similarity dataset that excludes control samples and pairs that did not match text in clinical, biomedical, and general English corpora. Exact modifications are detailed in the referenced paper. The resulting dataset contains 449 pairs.
6. UMNSRS_relatedness_mod458_word2vec.csv: Modification of the UMNSRS-Relatedness dataset that excludes control samples and pairs that did not match text in clinical, biomedical, and general English corpora. Exact modifications are detailed in the referenced paper. The resulting dataset contains 458 pairs.

This is a collection of reference standards created to test and validate computerized approaches to quantifying the degree of semantic relatedness and similarity between medical terms. Each dataset consists of a list of term pairs that healthcare professionals (e.g., medical coders, residents, clinicians) have rated for their degree of semantic relatedness or similarity.
The details pertaining to each dataset are provided in the referenced publications.

CC0 1.0 Universal
http://creativecommons.org/publicdomain/zero/1.0/

semantic relatedness, semantic similarity, medical terminology, word embeddings, text mining, natural language processing, lexical semantics, health informatics

Semantic Relatedness and Similarity Reference Standards for Medical Terms
Dataset
https://doi.org/10.13020/D6CX04