This readme.txt file was generated on 20210311 by Cody Hennesy.


-------------------
GENERAL INFORMATION
-------------------


1. Title of Dataset 
ACRL Conference Twitter IDs and Session Descriptions, 2011-2019

2. Author Information

  Principal Investigator Contact Information
        Name: Cody Hennesy
        Institution: University of Minnesota, Twin Cities
        Address: Wilson Library
        Email: chennesy@umn.edu
        ORCID: 0000-0002-9410-9810

  Associate or Co-investigator Contact Information
        Name: Margot Hanson
        Institution: California State University, Maritime Academy
        Address: California Maritime Library
        Email: mhanson@csum.edu
        ORCID:

  Associate or Co-investigator Contact Information
        Name: Annis Lee Adams
        Institution: California State University, East Bay
        Address: University Libraries
        Email: lee.adams@csueastbay.edu
        ORCID: 0000-0003-0428-7793

3. Date of data collection (single date, range, approximate date) 

20180721 - 20200221 (YYYYMMDD)

4. Geographic location of data collection (where was data collected?): 
Online

5. Information about funding sources that supported the collection of the data:

A "Research, Scholarly and Creative Activities" faculty development grant from California State University, Maritime Academy (Cal Maritime) paid for access to historical Twitter data via Gnip.


--------------------------
SHARING/ACCESS INFORMATION
-------------------------- 


1. Licenses/restrictions placed on the data:
CC BY-NC, Attribution-NonCommercial 3.0 United States

2. Links to publications that cite or use the data:
Hanson, Margot, Cody Hennesy, and Annis Lee Adams. "A Little Birdie Told Me: Text Analysis of ACRL Conference Tweets & Programs." Association of College & Research Libraries 2021 Conference. April 15, 2021.

3. Links to other publicly accessible locations of the data:


4. Links/relationships to ancillary data sets:


5. Was data derived from another source? Yes.
           If yes, list source(s):
Tweet IDs were compiled from Gnip and the Twitter Archiving Google Sheet (TAGS) tool.
The ACRL Conference Program session descriptions and titles were compiled from the program PDFs available on the ACRL website (http://www.ala.org/acrl/conferences/past). 

Tweet IDs are shared here in accordance with the Twitter Developer Agreement and Policy, dated March 10, 2020 (https://developer.twitter.com/en/developer-terms/agreement-and-policy): "If you provide Twitter Content to third parties, including downloadable datasets or via an API, you may only distribute Tweet IDs, Direct Message IDs, and/or User IDs (except as described below). We also grant special permissions to academic researchers sharing Tweet IDs and User IDs for non-commercial research purposes."

ACRL Conference Program session data is shared in accordance with the ALA Copyright Statement (http://www.ala.org/copyright): "Permission to use, copy and distribute documents delivered from this website and related graphics is hereby granted for private, non-commercial and education purposes only, provided that the above copyright notice appears with the following notice: This document may be reprinted and distributed for non-commercial and educational purposes only, and not for resale. No resale use may be made of material on this website at any time. All other rights reserved."

6. Recommended citation for the data:

Hennesy, Cody, Margot Hanson, and Annis Lee Adams (2020). ACRL Conference Twitter IDs and Session Descriptions, 2011-2019. Data Repository for the University of Minnesota.


---------------------
DATA & FILE OVERVIEW
---------------------


1. File List
	A. Filenames: tweet_ids_2011.csv, tweet_ids_2013.csv, tweet_ids_2015.csv, tweet_ids_2017.csv, and tweet_ids_2019.csv.

	Short description: Each comma-separated values (CSV) file contains a single column listing the unique tweet IDs for the relevant year. Tweets for each year were retrieved by matching on the official ACRL Conference hashtag for that conference year (#ACRL2011, #ACRL2013, #ACRL2015, #ACRL2017, and #ACRL2019).

	B. Filenames: prog_sessions_2011.csv, prog_sessions_2013.csv, prog_sessions_2015.csv, prog_sessions_2017.csv, and prog_sessions_2019.csv.

	Short description: Each CSV file contains a single column with the title and description of every program session listed in that year's ACRL Conference Program. Session text was parsed from the official ACRL Program PDFs available online at http://www.ala.org/acrl/conferences/past. Each row represents one session.

2. Relationship between files:  No relationship.      

3. Additional related data collected that was not included in the current data package:

The originally collected tweet data was cleaned to remove retweets, quote tweets, and duplicate tweets. For each conference, tweets were collected from the official conference days plus the two days before and after. The original data also included the full text of each tweet, which is omitted here in accordance with the Twitter Developer Terms of Service. To regenerate full tweets in JSON or CSV format from the tweet IDs, see DocNow's Hydrator tool (https://github.com/docnow/hydrator).


4. Are there multiple versions of the dataset? 
No
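The tweet ID files can be hydrated (the full tweets re-fetched from Twitter) with a tool such as Hydrator. The following is a minimal sketch, not part of the authors' workflow, that copies one ID file to a plain text file with one ID per line. It assumes each CSV has a header row naming the `id` column (see the variable list below) and that the hydration tool accepts one tweet ID per line; check the tool's own documentation. The helper name `csv_ids_to_txt` is hypothetical. IDs are kept as strings because tweet IDs are large 64-bit integers that spreadsheet software and floating-point parsers can silently round.

```python
import csv


def csv_ids_to_txt(csv_path, txt_path, column="id"):
    """Copy one column of tweet IDs from a CSV file to a plain text file.

    Hypothetical helper. Assumes the CSV has a header row containing
    `column` (default "id"). IDs are written one per line and never
    converted to numbers, so no digits are lost. Returns the ID count.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            tweet_id = row[column].strip()
            if tweet_id:  # skip blank rows, if any
                dst.write(tweet_id + "\n")
                count += 1
    return count
```

For example, `csv_ids_to_txt("tweet_ids_2011.csv", "tweet_ids_2011.txt")` would produce a text file that can be loaded into a hydration tool, assuming that tool expects one ID per line.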


--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: 
Tweet data was acquired from two sources: Gnip, as JSON files, and TAGS (Twitter Archiving Google Sheet), as a Google Sheet that was then exported to a CSV file.

PDF files of the official ACRL conference programs from 2011 to 2019 were downloaded from the ACRL website.

For more details, see: Hanson, Margot, Cody Hennesy, and Annis Lee Adams. "A Little Birdie Told Me: Text Analysis of ACRL Conference Tweets & Programs." Association of College & Research Libraries 2021 Conference. April 15, 2021.

2. Methods for processing the data: 
The tweet JSON and CSV files were imported into a Python environment, and their metadata columns were normalized to a common structure before being concatenated into a single pandas DataFrame. After removing the tweets excluded from analysis (retweets, quote tweets, and duplicates), the tweet ID column for each year was exported to the CSV files preserved here.

The program PDFs were parsed using Python. After the initial parsing code was run, the resulting sessions for each conference were placed in Google Sheets, and two of the study authors manually compared them against the PDF programs to check for errors. The final program session data was exported from Google Sheets to CSV files and imported into a single pandas DataFrame using Python. The title and description of each session were concatenated into a single field, which is shared here.
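The tweet processing described above can be sketched in pandas. This is a hypothetical illustration rather than the authors' code: the readme does not document the intermediate schema, so the normalized columns `id`, `year`, `is_retweet`, and `is_quote` are assumptions, as is treating rows that share a tweet ID (for example, a tweet captured by both Gnip and TAGS) as duplicates. The function name `clean_and_export` is also hypothetical.

```python
import pandas as pd


def clean_and_export(frames, out_pattern="tweet_ids_{year}.csv"):
    """Combine normalized tweet tables and export one ID file per year.

    Hypothetical sketch. `frames` is a list of DataFrames that already share
    the assumed columns id, year, is_retweet (bool), and is_quote (bool).
    Pass out_pattern=None to skip writing files. Returns the cleaned data.
    """
    tweets = pd.concat(frames, ignore_index=True)

    # Keep tweet IDs as strings so 64-bit IDs are never rounded.
    tweets["id"] = tweets["id"].astype(str)

    # Drop retweets and quote tweets, then rows repeating an already-seen ID.
    keep = ~(tweets["is_retweet"] | tweets["is_quote"])
    tweets = tweets[keep].drop_duplicates(subset="id")

    if out_pattern is not None:
        # One single-column CSV (header "id") per conference year.
        for year, group in tweets.groupby("year"):
            group[["id"]].to_csv(out_pattern.format(year=year), index=False)
    return tweets
```

Passing `out_pattern=None` returns the cleaned DataFrame without writing files, which makes it easy to inspect the result before exporting.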


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: tweet_ids_2011.csv, tweet_ids_2013.csv, tweet_ids_2015.csv, tweet_ids_2017.csv, and tweet_ids_2019.csv
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: 
2011 : 6016
2013 : 7326
2015 : 11855
2017 : 11269
2019 : 8455

3. Missing data codes:
None

4. Variable List
             
    A. Name: id
       Description: Unique ID for tweets
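A quick way to confirm that a downloaded tweet ID file matches this readme is to check its row count against the numbers above and confirm that every ID is unique and numeric. The sketch below does this with the standard library; it assumes each file has a header row naming the `id` column and that the documented counts exclude that header. The names `EXPECTED_TWEET_ROWS` and `check_tweet_ids` are hypothetical.

```python
import csv

# Row counts documented in this readme for the tweet ID files
# (assumed to exclude the header row).
EXPECTED_TWEET_ROWS = {2011: 6016, 2013: 7326, 2015: 11855, 2017: 11269, 2019: 8455}


def check_tweet_ids(path, expected_rows, column="id"):
    """Return a list of problems found in one tweet ID file (empty if none)."""
    with open(path, newline="", encoding="utf-8") as f:
        ids = [row[column].strip() for row in csv.DictReader(f)]
    problems = []
    if len(ids) != expected_rows:
        problems.append(f"{path}: {len(ids)} rows, expected {expected_rows}")
    if len(set(ids)) != len(ids):
        problems.append(f"{path}: duplicate IDs")
    # Tweet IDs are digit strings; an empty string also fails isdigit().
    if not all(i.isdigit() for i in ids):
        problems.append(f"{path}: non-numeric or empty IDs")
    return problems
```

For example, `check_tweet_ids("tweet_ids_2011.csv", EXPECTED_TWEET_ROWS[2011])` should return an empty list if the file matches the counts documented here.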

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: prog_sessions_2011.csv, prog_sessions_2013.csv, prog_sessions_2015.csv, prog_sessions_2017.csv, prog_sessions_2019.csv
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 
2011 : 367
2013 : 462
2015 : 555
2017 : 495
2019 : 571

3. Missing data codes:
None

4. Variable List
             
    A. Name: sessions
       Description: The title and description of one session from that year's ACRL Conference Program, concatenated into a single text field.
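The `sessions` field combines each session's title and description, as described under Methods for processing the data. A minimal pandas sketch of that step follows. The input column names `title` and `description`, the single-space separator, and the function name `combine_sessions` are all assumptions, since the readme does not say how the two parts were joined.

```python
import pandas as pd


def combine_sessions(sessions, sep=" "):
    """Join each session's title and description into one `sessions` field.

    Hypothetical sketch. `sessions` is a DataFrame with assumed `title` and
    `description` columns; a missing value is treated as an empty string.
    Returns a one-column DataFrame in the layout of the shared files.
    """
    title = sessions["title"].fillna("").str.strip()
    description = sessions["description"].fillna("").str.strip()
    # Strip again so a missing description leaves no trailing separator.
    combined = (title + sep + description).str.strip()
    return pd.DataFrame({"sessions": combined})
```

Writing the result with `combine_sessions(df).to_csv("prog_sessions_2019.csv", index=False)` would produce a single-column file with a `sessions` header, assuming `df` holds that year's checked session data.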