This codebook.txt file was generated on 20180709 by

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Link Lists for Websites Reporting Information on Hurricane Sandy from 2003 to 2012

2. Author Information
   Principal Investigator Contact Information
      Name: Matthew Weber
      Institution: University of Minnesota
      Address: 111 Murphy Hall, 206 Church St SE, Minneapolis, MN 55455
      Email: msw@umn.edu

3. Date of data collection (single date, range, approximate date): 20121101 - 20170930

4. Geographic location of data collection (where was data collected?): NA

5. Information about funding sources that supported the collection of the data: National Science Foundation

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial-ShareAlike 3.0 United States

2. Links to publications that cite or use the data: Weber, M. S. (2018). Methods and Approaches to Using Web Archives in Computational Communication Research. Communication Methods and Measures, 1-16. https://doi.org/10.1080/19312458.2018.1447657

3. Links to other publicly accessible locations of the data: http://archivehub.rutgers.edu/downloads/

4. Links/relationships to ancillary data sets: Data were derived from the master Web Wide Crawls collected and maintained by the Internet Archive.

5. Was data derived from another source? If yes, list source(s): Internet Archive (archive.org)

6. Recommended citation for the data: Weber, Matthew S. (2018). Link Lists for Websites Reporting Information on Hurricane Sandy from 2003 to 2012. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/D6JM43.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List
   A. Filename: NSFIA_SANDY_2003_2012-all.tar
      Short description: Compressed folder of text files containing the hyperlinks that existed between websites reporting information on Superstorm Sandy from 2003 to 2012.

   B. Filename: Readme
      Short description: Documentation file

2. Relationship between files: NA

3. Additional related data collected that was not included in the current data package: See the additional link list files in the main repository. These data are part of a broader set of social science data tracing hyperlink activity among websites.

4. Are there multiple versions of the dataset? No

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: Raw data were collected by extracting archived web pages in the WARC format from the Internet Archive's repository. Pages were selected using seed lists, created by the research team, of the primary websites relating to the focal event. Seed lists were compiled from news coverage of the event or topic and from preexisting lists of hyperlinks on the topic.

2. Methods for processing the data: Each WARC file contains all captured data for a web page, such as the page text, records of image and video files, hyperlinks, and other text directly relevant to the page. To reduce the raw WARC files to a more manageable format, hyperlinks and their associated descriptive (anchor) text were extracted into text files. This reduced file size by a factor of more than 10.

3. Instrument- or software-specific information needed to interpret the data: The data are plain text files and can be opened in any text editor or word processor. For analysis, R or Python is recommended. The data are designed for social network analysis, and the network analysis packages available in R and Python can be used.

4. Standards and calibration information, if appropriate: NA

5. Environmental/experimental conditions: The data capture archived web content from the specified time range.

6. Describe any quality-assurance procedures performed on the data: See publications for information on reliability and accuracy.

7. People involved with sample collection, processing, analysis and/or submission: Data were collected by the PI and the associated research team.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: NSFIA_SANDY_2003_2012-all.tar
-----------------------------------------

Contains the compressed folder "sandy", which holds 86 text files:
   SANDY_2003_2012_0001.txt
   SANDY_2003_2012_0002.txt
   SANDY_2003_2012_0003.txt
   SANDY_2003_2012_0004.txt
   SANDY_2003_2012_0005.txt
   SANDY_2003_2012_0006.txt
   SANDY_2003_2012_0007.txt
   SANDY_2003_2012_0008.txt
   SANDY_2003_2012_0009.txt
   SANDY_2003_2012_0010.txt
   SANDY_2003_2012_0011.txt
   SANDY_2003_2012_0012.txt
   SANDY_2003_2012_0013.txt
   SANDY_2003_2012_0014.txt
   SANDY_2003_2012_0015.txt
   SANDY_2003_2012_0016.txt
   SANDY_2003_2012_0017.txt
   SANDY_2003_2012_0018.txt
   SANDY_2003_2012_0019.txt
   SANDY_2003_2012_0020.txt
   SANDY_2003_2012_0021.txt
   SANDY_2003_2012_0022.txt
   SANDY_2003_2012_0023.txt
   SANDY_2003_2012_0024.txt
   SANDY_2003_2012_0025.txt
   SANDY_2003_2012_0026.txt
   SANDY_2003_2012_0027.txt
   SANDY_2003_2012_0028.txt
   SANDY_2003_2012_0029.txt
   SANDY_2003_2012_0030.txt
   SANDY_2003_2012_0031.txt
   SANDY_2003_2012_0032.txt
   SANDY_2003_2012_0033.txt
   SANDY_2003_2012_0034.txt
   SANDY_2003_2012_0035.txt
   SANDY_2003_2012_0036.txt
   SANDY_2003_2012_0037.txt
   SANDY_2003_2012_0038.txt
   SANDY_2003_2012_0039.txt
   SANDY_2003_2012_0040.txt
   SANDY_2003_2012_0041.txt
   SANDY_2003_2012_0042.txt
   SANDY_2003_2012_0043.txt
   SANDY_2003_2012_0044.txt
   SANDY_2003_2012_0045.txt
   SANDY_2003_2012_0046.txt
   SANDY_2003_2012_0047.txt
   SANDY_2003_2012_0048.txt
   SANDY_2003_2012_0049.txt
   SANDY_2003_2012_0050.txt
   SANDY_2003_2012_0051.txt
   SANDY_2003_2012_0052.txt
   SANDY_2003_2012_0053.txt
   SANDY_2003_2012_0054.txt
   SANDY_2003_2012_0055.txt
   SANDY_2003_2012_0056.txt
   SANDY_2003_2012_0057.txt
   SANDY_2003_2012_0058.txt
   SANDY_2003_2012_0059.txt
   SANDY_2003_2012_0060.txt
   SANDY_2003_2012_0061.txt
   SANDY_2003_2012_0062.txt
   SANDY_2003_2012_0063.txt
   SANDY_2003_2012_0064.txt
   SANDY_2003_2012_0065.txt
   SANDY_2003_2012_0066.txt
   SANDY_2003_2012_0067.txt
   SANDY_2003_2012_0068.txt
   SANDY_2003_2012_0069.txt
   SANDY_2003_2012_0070.txt
   SANDY_2003_2012_0071.txt
   SANDY_2003_2012_0072.txt
   SANDY_2003_2012_0073.txt
   SANDY_2003_2012_0074.txt
   SANDY_2003_2012_0075.txt
   SANDY_2003_2012_0076.txt
   SANDY_2003_2012_0077.txt
   SANDY_2003_2012_0078.txt
   SANDY_2003_2012_0079.txt
   SANDY_2003_2012_0080.txt
   SANDY_2003_2012_0081.txt
   SANDY_2003_2012_0082.txt
   SANDY_2003_2012_0083.txt
   SANDY_2003_2012_0084.txt
   SANDY_2003_2012_0085.txt
   SANDY_2003_2012_0086.txt

Each text file contains the same columns:
   A. Source
   B. Destination
   C. Date
   D. Hyperlink text
   E. Content type (e.g., HTTP page, video, image)
   F. Valid page (i.e., the response code received during the crawl)
   G. Page size in bytes
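The column layout above can be read programmatically. The sketch below is not the research team's own pipeline, and it rests on several assumptions that this codebook does not confirm:
   - each line is tab-separated;
   - there is no header row;
   - the columns appear in the order A to G;
   - the Source and Destination columns hold URLs or bare host names.

If the files differ, adjust the hypothetical `delimiter` parameter. The sketch uses Python's standard `tarfile` and `csv` modules to stream every `SANDY_*.txt` member out of the archive. It then aggregates page-level links into weighted site-to-site edges, which suits the social network analysis recommended above. The names `LinkRow`, `iter_link_rows`, `host`, and `site_edges` are illustrative only.

```python
import csv
import io
import tarfile
from collections import Counter
from typing import IO, Iterable, Iterator, NamedTuple, Union
from urllib.parse import urlparse


class LinkRow(NamedTuple):
    # Field order follows columns A-G of the codebook (ASSUMED to be the on-disk order).
    source: str
    destination: str
    date: str
    link_text: str
    content_type: str
    valid_page: str
    page_size: str


N_COLUMNS = len(LinkRow._fields)


def iter_link_rows(archive: Union[str, IO[bytes]], delimiter: str = "\t") -> Iterator[LinkRow]:
    """Yield one LinkRow per well-formed line of every SANDY_*.txt file in the tar.

    ASSUMPTION: delimiter-separated lines with no header row; lines whose
    field count differs from the seven documented columns are skipped.
    """
    if isinstance(archive, str):
        tar = tarfile.open(archive, mode="r:*")  # "r:*" also handles compressed tars
    else:
        tar = tarfile.open(fileobj=archive, mode="r:*")
    with tar:
        for member in tar:
            name = member.name.rsplit("/", 1)[-1]
            if not (member.isfile() and name.startswith("SANDY_") and name.endswith(".txt")):
                continue
            raw = tar.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
            # QUOTE_NONE: anchor text may contain stray quote characters.
            for fields in csv.reader(text, delimiter=delimiter, quoting=csv.QUOTE_NONE):
                if len(fields) == N_COLUMNS:
                    yield LinkRow(*fields)


def host(url: str) -> str:
    """Reduce a URL (or bare host) to a lowercase host name without a leading 'www.'."""
    hostname = urlparse(url if "://" in url else "http://" + url).hostname or ""
    return hostname[4:] if hostname.startswith("www.") else hostname


def site_edges(rows: Iterable[LinkRow]) -> Counter:
    """Count hyperlinks between distinct sites, keyed by (source_site, destination_site)."""
    edges: Counter = Counter()
    for row in rows:
        src, dst = host(row.source), host(row.destination)
        if src and dst and src != dst:  # drop same-site (internal) links
            edges[(src, dst)] += 1
    return edges
```

For example, `site_edges(iter_link_rows("NSFIA_SANDY_2003_2012-all.tar"))` would return a `Counter` of weighted site pairs. Those pairs and counts could then be passed to a network package such as NetworkX through `DiGraph.add_weighted_edges_from`. Collapsing pages to host names and dropping internal links is one possible modelling choice. Analysts who need page-level or time-sliced networks could instead group rows by the Date column.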