TravisTorrent, a GHTorrent partner project, provides free and easy-to-use Travis CI build analyses to the masses through its open database.
We access the Travis CI API and, for each build, combine vanilla API data (such as build number and build result) with an analysis of the build log (such as how many tests were run, which tests failed, …) and with repository and commit data from GitHub (such as the latency between pushing and building), acquired through GHTorrent.
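As a rough illustration of this combination, the following Python sketch merges the three kinds of data for a single build and derives the push-to-build latency. All field names (`build_number`, `started_at`, `tests_run`, `pushed_at`, and so on) and the function `combine_build_record` are hypothetical and chosen only for this example; they are not the actual TravisTorrent schema or implementation.

```python
from datetime import datetime


def combine_build_record(api_data, log_analysis, github_data):
    """Merge the three data sources for one build into a single flat record.

    All keys used here are illustrative assumptions, not the real schema.
    """
    # Timestamps are assumed to be ISO 8601 strings without a time zone.
    pushed_at = datetime.fromisoformat(github_data["pushed_at"])
    started_at = datetime.fromisoformat(api_data["started_at"])

    record = {}
    record.update(api_data)      # vanilla API data (build number, result, ...)
    record.update(log_analysis)  # data extracted from the build log
    # Derived GitHub-side value: seconds between the push and the build start.
    record["push_to_build_latency_s"] = (started_at - pushed_at).total_seconds()
    return record


example = combine_build_record(
    api_data={"build_number": 42, "result": "failed",
              "started_at": "2017-01-10T12:05:30"},
    log_analysis={"tests_run": 120, "tests_failed": ["test_login"]},
    github_data={"pushed_at": "2017-01-10T12:00:00"},
)
print(example["push_to_build_latency_s"])
```

The point of the sketch is only that each build ends up as one record carrying API, log, and GitHub information side by side, which is what makes cross-source analyses straightforward.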
If you used (and liked) the TravisTorrent data set, please cite our openly available preprint.
Beller M, Gousios G, Zaidman A. (2017) TravisTorrent: Synthesizing Travis CI and GitHub for Full-Stack Research on Continuous Integration
@inproceedings{msr17challenge,
title={TravisTorrent: Synthesizing Travis CI and GitHub for Full-Stack Research on Continuous Integration},
author={Beller, Moritz and Gousios, Georgios and Zaidman, Andy},
booktitle={Proceedings of the 14th working conference on mining software repositories},
preprint={http://www.st.ewi.tudelft.nl/~mbeller/publications/2017_beller_gousios_zaidman_travistorrent_synthesizing_travis_ci_and_github_for_full-stack_research_on_continuous_integration.pdf},
year={2017}
}
TravisTorrent has reached the state of a minimally working prototype. Currently, we use a static process to update our databases. We plan to automatically synchronize TravisTorrent with Travis CI, at least for a select number of projects (analyzing log files is a CPU-intensive process, and linking GitHub data requires many HTTP requests). We also plan to let our users add projects they are interested in. At this point, our log analyses focus on general Travis data and provide testing specialisations for Ruby and Java, but we plan to extend them.
We are doing research on software repositories, testing, and continuous integration. Travis CI is an exciting new data source for us, one that solves several of the problems we face as data miners. The uniformity of its data will allow research to scale to hundreds or thousands of repositories spanning multiple languages and application domains. However, we encountered several non-trivial technical issues that might make it anything but trivial for end users or researchers to use Travis data before processing, especially when they want to combine it with GitHub data.
The name TravisTorrent was chosen to reflect the project’s close proximity to the GHTorrent project. According to GHTorrent’s website, “the name signifies a torrent of data coming from GitHub.” Similarly, TravisTorrent can be seen as the torrent of logs arriving from Travis CI.
tr_
(for example, for Rails/Rails, we recognized a few Java builds, even though the project couldn’t be more Ruby). Build log analysis is somewhat of a black art. Help us with your ideas by contributing!
Absolutely. Both our TravisPoker and TravisHarvester are publicly available, and we look forward to receiving your PRs!