Retrosheet
Data Resources
Several baseball researchers maintain repositories of already-parsed Retrosheet data that others may find helpful when working with Retrosheet data. These resources include the following. If you are aware of additional resources that may be of interest to Retrosheet users, please let us know at the e-mail address shown below.
- Baseball Kaggles
Jordan Raddick, the Director of Education and Outreach at the Institute for Data-Intensive Engineering and Science (IDIES) at Johns Hopkins University in Baltimore, has created a series of Kaggles that produce CSV files of all plays, organized by season.
- Baseball Public Dataset
Tom Tango of MLB.com has created a public BigQuery database of Retrosheet play-by-play data.
- Boxball
Boxball is a project designed and maintained by David Roher that makes it easier for users to set up their own Retrosheet databases on their PCs. A single command automatically creates and populates the SQL database of your choice (PostgreSQL, MySQL, etc.) with all of the tables produced by the Chadwick software as well as all of the data in the Baseball Databank. It also maintains all outputs in CSV and Parquet formats for users who prefer to work directly with flat files.
- Chadwick Baseball Bureau
Ted Turocy has created a set of parsing tools which are more comprehensive than bevent. He also maintains a repository of Retrosheet data.
- Extraction and Transformation of Retrosheet Data
Craig Toner has organized Retrosheet data so that it can be loaded into relational databases and has made his results available via a GitHub repository.
- Retrosheet Schedules
Mat Kovach has created a shell script to put Retrosheet's schedule files into an SQLite database.
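
For readers who want a feel for what loading schedule data into SQLite involves, here is a minimal Python sketch. It is not Mat Kovach's script. It assumes the schedule file is comma-delimited with twelve fields per game; the column names, the `schedule` table name, and the `load_schedule` helper are hypothetical names chosen for illustration, and the sample rows are made up.

```python
import csv
import io
import sqlite3

# Assumed column layout for a schedule file; these names are illustrative,
# not taken from any official schema.
COLUMNS = [
    "game_date", "game_number", "day_of_week",
    "visitor", "visitor_league", "visitor_game_number",
    "home", "home_league", "home_game_number",
    "time_of_day", "postponement", "makeup_date",
]


def load_schedule(conn, lines):
    """Create a `schedule` table if needed and insert every 12-field CSV row.

    `lines` is any iterable of text lines, such as an open file.
    Returns the number of rows inserted.
    """
    cols = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS schedule ({cols})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    # Skip rows that do not match the assumed field count.
    rows = [r for r in csv.reader(lines) if len(r) == len(COLUMNS)]
    conn.executemany(f"INSERT INTO schedule VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


# Made-up sample rows standing in for a real schedule file.
SAMPLE = '''"20230401","0","Sat","AAA","NL","1","BBB","NL","1","D","",""
"20230402","0","Sun","AAA","NL","2","BBB","NL","2","N","",""
'''

conn = sqlite3.connect(":memory:")
inserted = load_schedule(conn, io.StringIO(SAMPLE))
home_games = conn.execute(
    "SELECT COUNT(*) FROM schedule WHERE home = ?", ("BBB",)
).fetchone()[0]
print(inserted, home_games)
```

To try this on a downloaded schedule file, open it with `open(path, newline="")` and pass the file object to `load_schedule` in place of the in-memory sample. Storing every field as text keeps the sketch simple; a real database would likely use typed columns and indexes.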