This readme.txt file was generated on 20240604 by Kwangho Baek and revised on 20241011

Recommended citation for the data:
Khani, A. and Baek, K. 2024. Southern Minnesota Rural Transit Origin, Destination, and Reservation Data (The ODR Data). Retrieved from the Data Repository for the University of Minnesota (DRUM), https://doi.org/10.13020/yhg3-qh31.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset
Southern Minnesota rural transit origin, destination, and reservation data (The ODR data)

2. Author Information

Principal Investigator Contact Information
Name: Dr. Alireza Khani
Institution: University of Minnesota, Twin Cities
Address: 500 Pillsbury Drive SE, Room 136, Minneapolis, MN 55455
Email: akhani@umn.edu
ORCID: 0000-0002-3091-4415

Associate or Co-investigator Contact Information
Name: Kwangho Baek
Institution: University of Minnesota, Twin Cities
Address: 500 Pillsbury Drive SE, Room 175, Minneapolis, MN 55455
Email: baek0040@umn.edu
ORCID: 0000-0002-2991-950X

3. Date published or finalized for release: 20241011

4. Date of data collection (single date, range, approximate date): 20230123-20230212 for the pre-deployment phase, 20231016-20231021 for the post-deployment phase

5. Geographic location of data collection (where was data collected?): Minnesota’s Brown County, Nicollet County, Le Sueur County, Blue Earth County, Waseca County, Steele County, Freeborn County, Mower County, Dodge County, Olmsted County, Winona County, Fillmore County, Houston County

6. Information about funding sources that supported the collection of the data: Federal Transit Administration’s Accelerating Innovative Mobility (AIM) Challenge Grants

7. Overview of the data (abstract):
The ODR data provides detailed observations of six Southern Minnesota Transit Agencies’ trip reservations and actual trips over two one-week periods, spanning both pre- and post-MaaS deployment phases.
The collected features for the reservation-based services (demand-responsive transits, some ADA paratransits, and route deviations) included the following: date and clock hour at which the phone call (ride request) was received, i.e., the reservation reception time (RRT or call-in time); the request’s intended trip date; clock hour of preferred departure time (PDT); clock hour of scheduled departure time (SDT); clock hour of actual pick-up time (APT); recorded trip duration; origin and destination (OD); fare type (cash, token, etc.); service type (paratransit, student, etc.); and some information on trip cancellations. For the fixed route buses, the collected features include clock hours inferred from bus schedules, expanded by the boarding and alighting activities at each bus stop. Sensitive information was masked as described under METHODOLOGICAL INFORMATION.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)

2. Links to publications that cite or use the data:
Baek, K., DeBruin, H., and Khani, A. 2024. "MnDOT’s Mobility-as-a-Service Platform: Assessing User Behavior and Measuring System’s Benefits." Center for Transportation Studies research report
Note: after publication, the report will be available at https://conservancy.umn.edu/handle/11299/241

3. Was data derived from another source? No

4. Terms of Use: Data Repository for the U of Minnesota (DRUM)
By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/policies/#drum-terms-of-use

--------------------
DATA & FILE OVERVIEW
--------------------

Primary Data File List

A. Filename: ODR.csv
   Short description: Southern Minnesota rural transit origin, destination, and reservation data, with the pre- and post-deployment phases combined (distinguished by the phase identifier column “Phase”)

Additional documentation:

B. AnonymizationMethods.pdf
C. DataCollectionManual_DRT.pdf
D. DataCollectionManual_FixedRoute.pdf
E. Post_DataCollectionWorksheet_DRT.csv
F. Pre_DataCollectionWorksheet_DRT.csv
G. Pre_DataCollectionWorksheet_FixedRoute_AlbertLeaExample.csv

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
We asked the transit agencies to collect data manually to retrieve each ride’s origin, destination, and reservation-related information. The transit agencies were given two standardized worksheets, one for each group of transit service types (1: demand-responsive transits, route deviations, and some ADA paratransit; 2: fixed route buses). We asked the agencies to collect as many features as possible within their operational constraints, without compromising the safety and quality of their services. The data collection activity for each feature varied depending on each agency’s existing systems. In some cases, reservation call receptionists or vehicle drivers manually collected certain features, while in others, data administrators pulled data from their agency’s automated storage systems. For the fixed route bus data collection (conducted only for the pre-deployment phase), bus drivers surveyed the alighting locations of every boarding passenger at each stop. The timestamps of boarding and alighting activities on the fixed routes were attached afterward using the vehicle trip identifier and predefined timetables.

2. Methods for processing the data:

1) Sensitive information was masked as follows: Passenger names were converted into irreversible 128-bit hash values using MD5 hashing. Address-based origin and destination locations were transformed into latitude and longitude coordinates, rounded to the second decimal place (with a maximum margin of error of approximately 1800 feet). Timestamps were masked to display only the clock hour.
Additionally, location data for isolated points (where the nearest neighbor is more than 5 miles away) were removed.

2) Driving distances and travel times between the origin and destination for each record were calculated using OpenStreetMap's Directions API before applying the location masking procedures. Similarly, recorded travel times were computed by subtracting the actual departure time from the actual arrival time before masking the timestamps.

3. Describe any quality-assurance procedures performed on the data:
To fix illogical timestamps or data collection errors before the masking, we cleaned the raw timestamp values of each row in the following sequence:
1) If any timestamp fell within out-of-service hours, we swapped its AM/PM designator.
2) If the PDT preceded the RRT, we discarded the PDT.
3) If the SDT preceded the RRT, we discarded the RRT.
4) If the time difference between the PDT and SDT exceeded two hours, we deleted the PDT.
5) If the ADT preceded the APT by less than an hour, we swapped the ADT and APT.
6) If the in-vehicle travel time (ADT-APT) exceeded two hours, we swapped either timestamp’s AM/PM designator. If the resulting time difference still exceeded two hours, we deleted the ADT.
7) If the time difference between the APT and SDT was more than 30 minutes, we deleted the SDT.

4. People involved with sample collection, processing, analysis and/or submission:

1) People in charge of the sample collection (by each transit agency)
   a. Brown County: Patrick LaCourse
   b. Mankato Transit: Shawn Schloesser, Joey Penkert
   c. Minnesota River Valley Transit (MRVT): Sherri Terhurne
   d. Rochester Public Transit: Bradley Bobbitt, Mike Collins, Erickson Schafer
   e. Rolling Hills Transit: Melinda Fields, Bill Spitzer
   f. Southern Minnesota Area Rural Transit (SMART): Kirk Kuchera, Chris Thompson
2) Processing: Kwangho Baek (University of Minnesota)
3) Analysis: Alireza Khani, Kwangho Baek
4) Submission: Kwangho Baek

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: ODR.csv
-----------------------------------------

1. Number of variables: 25

2. Number of cases/rows: 7,596

3. Missing data codes:
   “NC”      Values could not be collected by the transit agency reservation receptionists and/or bus drivers
   “NA”      Systematically not applicable values (e.g., the preferred departure time is not collected for fixed route buses)
   “Masked”  Coordinates for location points with no neighbors within 5 miles

4. Variable List

A. Name: ID
   Description: Integer data ID. Sorted by pre- or post-deployment phase, service type, and transit agency.

B. Name: Phase
   Description: The data collection phase
   Value labels: Pre-Deployment, Post-Deployment

C. Name: Agency
   Description: Transit agency that collected the data
   Value labels: Brown County, Mankato, MRVT, Rochester, Rolling Hills, SMART

D. Name: ServiceType
   Description: Transit service type
   Value labels: ADA Paratransit, Demand Response, Fixed Route, Route Deviation

E. Name: ReservationDate
   Description: The date when the reservation was received

F. Name: ReservationCallInHour
   Description: The time when the reservation call was received on ReservationDate. Masked to display only the clock hour

G. Name: PassengerID
   Description: The MD5-hash anonymized passenger ID

H. Name: IntendedTripDate
   Description: The trip date requested by the user or passenger. Not all reservations were realized, hence “Intended” in the name

I. Name: PreferredDepartureHour
   Description: The desired or preferred departure time on IntendedTripDate that the user or passenger requested from the transit agency. Masked to display only the clock hour

J. Name: ScheduledDepartureHour
   Description: The departure time on IntendedTripDate that the agency assigned and communicated to the user or passenger, depending on its vehicle/driver schedules. Masked to display only the clock hour

K. Name: Payment Type
   Description: The fare payment type
   Value labels: Assorted, Billed, Cash, Check, Electronic, Pass

L. Name: PassengerType
   Description: The passenger type. Some agencies differentiate the fare depending on this value
   Value labels: Adult, Student

M. Name: TripUnrealizedReason
   Description: The reason the reserved trip was not realized
   Value labels: AgencyIncapable, NoShow, PassengerCanceled, TripHappened

N. Name: PickUpHour
   Description: Actual pick-up time of the passenger. Masked to display only the clock hour

O. Name: TravelTimeInMinutes
   Description: Computed by subtracting the actual departure time from the actual arrival time before masking the timestamps.
   Units: minute

P. Name: CanceledDate
   Description: Date of trip cancellation (call-in date for cancellation)

Q. Name: OriginLat
   Description: The blurred latitude of the requested trip origin; for the fixed route bus trips, the exact value

R. Name: OriginLong
   Description: The blurred longitude of the requested trip origin; for the fixed route bus trips, the exact value

S. Name: DestinationLat
   Description: The blurred latitude of the requested trip destination; for the fixed route bus trips, the exact value

T. Name: DestinationLong
   Description: The blurred longitude of the requested trip destination; for the fixed route bus trips, the exact value

U. Name: OpenStreetMapTimeMinute
   Description: Expected travel time (in minutes) between the given origin and destination, assuming a personal car ride. The values are from the OpenStreetMap API

V. Name: OpenStreetMapDistMile
   Description: Expected travel distance (in miles) between the given origin and destination, assuming a personal car ride. The values are from the OpenStreetMap API

W. Name: OriginToBusStopDistanceMile
   Description: The distance from the origin to the nearest bus stop, in miles

X. Name: DestinationToBusStopDistanceMile
   Description: The distance from the destination to the nearest bus stop, in miles

Y. Name: NumOfPassengers
   Description: The number of passengers associated with the given reservation or ride
   Value labels: NC,1,2,3+
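
Example: reading ODR.csv with the missing data codes handled

The following is a minimal sketch, not part of the dataset or the authors' processing code. It assumes that ODR.csv has a header row with the variable names listed above and that the missing data codes (“NC”, “NA”, “Masked”) appear as literal cell values. The function name read_odr and the "_missing" key suffix are illustrative choices, and the two sample rows are made up for demonstration.

```python
import csv
import io

# Missing-data codes documented in this readme.
MISSING_CODES = {"NC", "NA", "Masked"}


def read_odr(handle):
    """Yield one dict per CSV row, replacing missing-data codes with None.

    The original code is kept under "<column>_missing" so that "NC"
    (not collected) can still be told apart from "NA" (not applicable)
    and "Masked" (coordinates removed for isolated points).
    """
    for row in csv.DictReader(handle):
        cleaned = {}
        for column, value in row.items():
            if isinstance(value, str) and value in MISSING_CODES:
                cleaned[column] = None
                cleaned[column + "_missing"] = value
            else:
                cleaned[column] = value
        yield cleaned


# Made-up two-row sample using a few of the documented column names.
# The values are illustrative only and are not taken from ODR.csv.
sample = io.StringIO(
    "ID,Phase,ServiceType,PreferredDepartureHour,OriginLat\n"
    "1,Pre-Deployment,Fixed Route,NA,44.16\n"
    "2,Post-Deployment,Demand Response,NC,Masked\n"
)
rows = list(read_odr(sample))
```

To apply the same function to the real file, pass an open file handle instead of the sample, for example open("ODR.csv", newline=""). The file encoding is not stated in this readme, so an explicit encoding argument may be needed. All values are returned as strings; numeric columns such as TravelTimeInMinutes would still need to be converted after the missing codes are removed.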