Data Practices: What Exactly Do We Mean By Data?

  • Date: February 18, 2022

Data takes many forms, and the word itself has many meanings. For the purposes of this report, data is digital information that is collected, structured, and available for further analysis. The first section of this report addresses critical issues in collecting and structuring data for transit agencies, while the second section addresses how agencies are analyzing data.

Data collection is more than just a matter of clipboards, paper, and pens. Rural, tribal, and small urban agencies are increasingly making use of sensors to passively collect large amounts of data that can be mined for transportation insights. In some cases, agencies may be able to gain access to novel datasets based on aggregate cellphone location data from third-party providers.

[Figures: Manual Data Source; Passive Data Sources (bus)]

Data structure and management are the next challenge agencies must face. Most data maintained by transit agencies are structured in a tabular format: tables in which each row represents an individual record and each column holds a characteristic of that record, often a numeric value, text, or a date. The kinds of data that exist are, in fact, much broader, but in practice many of the forms of data described in this report fit this description. For effective analysis to take place, agencies must be mindful of data cleaning, data structures, common data standards, storage considerations, and consistent key fields (such as route names and dates) so that disparate data sources can be related to one another as needed.
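To make the point about consistent key fields concrete, the following is a minimal sketch in Python. It assumes two small, hypothetical tables (ridership counts and service hours) that spell route names and dates differently. The field names, values, and helper functions are illustrative assumptions, not drawn from any particular agency system.

```python
from datetime import date, datetime

# Hypothetical ridership records, e.g. exported from a fare system.
ridership = [
    {"route": "Route 5 ", "date": "02/14/2022", "boardings": 212},
    {"route": "route 5", "date": "02/15/2022", "boardings": 198},
    {"route": "Route 12", "date": "02/14/2022", "boardings": 87},
]

# Hypothetical service records from a separate scheduling source.
service = [
    {"route": "ROUTE 5", "date": "2022-02-14", "revenue_hours": 14.5},
    {"route": "ROUTE 5", "date": "2022-02-15", "revenue_hours": 14.5},
    {"route": "ROUTE 12", "date": "2022-02-14", "revenue_hours": 9.0},
]


def clean_route(name: str) -> str:
    """Normalize a route name: trim, collapse repeated spaces, uppercase."""
    return " ".join(name.split()).upper()


def clean_date(text: str) -> date:
    """Parse either MM/DD/YYYY or YYYY-MM-DD text into a date object."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


def record_key(row: dict) -> tuple:
    """Build the cleaned (route, date) key shared by both tables."""
    return (clean_route(row["route"]), clean_date(row["date"]))


def join_on_route_and_date(left: list, right: list) -> list:
    """Inner-join two lists of records on the cleaned (route, date) key."""
    lookup = {record_key(r): r for r in right}
    joined = []
    for row in left:
        k = record_key(row)
        match = lookup.get(k)
        if match is None:
            continue  # no matching service record; skip this row
        joined.append({
            "route": k[0],
            "date": k[1].isoformat(),
            "boardings": row["boardings"],
            "revenue_hours": match["revenue_hours"],
        })
    return joined


combined = join_on_route_and_date(ridership, service)
for rec in combined:
    # Boardings per revenue hour, one simple measure the join makes possible.
    print(rec["route"], rec["date"], round(rec["boardings"] / rec["revenue_hours"], 1))
```

Without the cleaning step, "Route 5 " and "ROUTE 5" would be treated as different routes and the two tables would not relate at all. In practice, an agency would more likely settle on one naming and date convention at the point of collection than repair it afterward.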

[Figures: External Data; Data Standards]

A growing expectation of public agencies is that they will publish “open data” for inspection and analysis by the broader public. For some transit agencies, this will simply be a matter of submitting National Transit Database (NTD) reports, while others may create open data websites that provide route geometry files, transit schedule data, and ridership data. Open data, in particular, is more likely to have longitudinal characteristics (tracking changes to records over time, such as an agency’s ridership) or spatial characteristics (recording the location of features, such as where stops are located). Even if your agency does not directly publish open data, this report will give you guidance on how to make use of open data sources that are available to you.
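As a rough illustration of the two characteristics named above, the sketch below treats a longitudinal dataset as one measure recorded over successive years and a spatial dataset as records that carry coordinates. All ridership figures, stop names, and coordinates are hypothetical. The distance calculation uses the standard haversine formula with an assumed mean Earth radius of 6,371 km.

```python
import math

# Longitudinal: the same measure recorded repeatedly over time (hypothetical values).
annual_ridership = {2018: 41200, 2019: 43050, 2020: 21900, 2021: 28400}


def year_over_year_change(series: dict) -> dict:
    """Return {year: percent change from the previous recorded year}."""
    years = sorted(series)
    return {
        later: 100.0 * (series[later] - series[earlier]) / series[earlier]
        for earlier, later in zip(years, years[1:])
    }


# Spatial: each record carries a location (hypothetical stops, latitude/longitude in degrees).
stops = [
    {"stop_id": "A", "name": "Main St & 1st Ave", "lat": 44.9500, "lon": -93.1000},
    {"stop_id": "B", "name": "Main St & 9th Ave", "lat": 44.9500, "lon": -93.0900},
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


changes = year_over_year_change(annual_ridership)
spacing_km = haversine_km(stops[0]["lat"], stops[0]["lon"], stops[1]["lat"], stops[1]["lon"])

for year, pct in changes.items():
    print(year, f"{pct:+.1f}%")
print(f"Distance between stops A and B: {spacing_km:.2f} km")
```

The same two questions, how a value changed over time and how far apart two features are, can be asked of published open data once it is loaded into a structure like this.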

[Figures: Longitudinal Data; Spatial Data]