A scientific paper consists of a constellation of artifacts that extend beyond the document itself: software, hardware, evaluation data and documentation, raw survey results, mechanized proofs, models, test suites, benchmarks, and so on. In some cases, the quality of these artifacts is as important as that of the document itself, yet many of our conferences offer no formal means to submit and evaluate anything but the paper. To address this shortcoming, WOOT will, for the first time, run an optional artifact evaluation process, inspired by similar efforts in software engineering and other areas.
The goal of the artifact evaluation process is two-fold. Our primary goal is to reward authors who take the trouble to create useful artifacts beyond the paper. Sometimes the software tools that accompany a paper take years to build; authors who go to this trouble should be rewarded for setting high standards and creating systems that others in the community can build on. Our secondary goal is to encourage more accurate reporting. Authors sometimes take liberties in describing the status of their artifacts, making claims they would temper if they knew the artifacts were going to be scrutinized.
Our hope is that eventually, the assessment of a paper's accompanying artifacts will guide the decision-making about papers: that is, the Artifact Evaluation Committee (AEC) would inform and advise the Program Committee (PC). This would, however, represent a radical shift in our conference evaluation processes; we would rather proceed gradually. Thus, for this first AEC at WOOT, the artifact evaluation process is optional, and authors choose to undergo evaluation only after their paper has been (conditionally) accepted for publication at the workshop. Nonetheless, feedback from the AEC can help improve the final version of the paper, the talk at the workshop, and any publicly released artifacts.
The evaluation criteria are simple. A paper sets up certain expectations of its artifacts based on its content. The AEC will read the paper and then judge how well the artifact matches these expectations. Thus the AEC's decision will be that the artifact does or does not "conform to the expectations set by the paper." In short, we expect artifacts to be:
- consistent with the paper
- as complete as possible
- documented well
- easy to reuse, facilitating further research
We believe the dissemination of artifacts benefits our science and engineering as a whole. Their availability improves replicability and reproducibility and enables authors to build on top of each other's work. It can also help resolve, with less ambiguity, questions about cases not considered by the original authors.
Beyond helping the community as a whole, the dissemination of artifacts confers several direct and indirect benefits on the authors themselves. The most direct benefit is, of course, the recognition that the authors accrue. But the very act of creating a bundle that can be used by the AEC has further advantages:
- The same bundle can be distributed to third parties, and we believe this will foster science in general.
- A bundle can be used subsequently for later experiments (e.g., on new parameters).
- The bundle simplifies re-running the system later, for example when responding to a journal reviewer's questions.
- The bundle is more likely to survive being put in storage between the departure of one student and the arrival of the next.
To maintain a wall of separation between paper review and the artifacts, authors will be asked to submit their artifacts only after their papers have been (conditionally) accepted for publication at WOOT. Of course, authors can and should prepare their artifacts well in advance, and provide the artifacts to the PC via supplemental materials, as many authors already do.
The authors of all conditionally accepted papers will be asked whether they intend to have their artifact evaluated and, if so, to submit the artifact via a separate HotCRP instance, which keeps the artifact evaluation process clearly separated from the regular review process. Authors are welcome to indicate that they do not wish to participate. The artifact evaluation does not interfere with the shepherding process at all.
After artifact submission, at least one member of the AEC will download and install the artifact (where relevant) and evaluate it. Since we anticipate small glitches with installation and use, reviewers may communicate with authors to help resolve glitches while preserving reviewer anonymity. The AEC will complete its evaluation and notify authors of the outcome. There is approximately one week between feedback from the AEC and the deadline for the camera-ready versions of accepted papers. This is intended to allow authors sufficient time to include the feedback from the AEC as they deem fit. We are aware that this is a short deadline, but scheduling constraints with the new submission model, unfortunately, prevent us from having more time.
For the camera-ready version, authors who have successfully passed the evaluation process can add a special badge to their papers to show that the paper has passed this additional evaluation. We also encourage authors to make their artifacts available so that others can replicate the results.
Finally, the PC Chair's report will include a brief discussion of the artifact evaluation process and we plan to report on the experience of this first artifact evaluation process during the workshop.
To avoid excluding some papers, the AEC will try to accept any artifact that authors wish to submit. These can be software, hardware, data sets, survey results, test suites, mechanized proofs, and so on. Given the experience in other communities, we decided not to accept paper proofs in the artifact evaluation process, because the AEC lacks the time and often the expertise to review them carefully. The better the artifact is packaged, the more likely it is that the AEC can actually work with it during the evaluation process.
While we encourage open research, submission of an artifact does not imply tacit permission to make its content public. All AEC members will be instructed that they may not publicize any part of your artifact during or after the evaluation, nor retain any part of it after evaluation. Thus, you are free to include models, data files, proprietary binaries, etc. in your artifact. Also, note that participating in the AEC experiment does not require you to later publish your artifacts, but of course we strongly encourage you to do so.
In addition, we strongly encourage you to anonymize any data files that you submit. We recognize that some artifacts may attempt to perform malicious operations by design. These cases should be boldly and explicitly flagged in detail in the readme so AEC members can take appropriate precautions before installing and running these artifacts. The evaluation of exploits and similar results might pose additional hurdles, and we are still gathering experience on how best to handle them. Please contact us if you have concerns, for example about evaluating bug-finding tools or other types of artifacts with special requirements.
The AEC will consist of about 5–10 members. Apart from the chairs, we intend the members to be a combination of senior graduate students, postdocs, and researchers, identified with the help of the WOOT Program Committee.
Qualified graduate students are often in a much better position than many researchers to handle the diversity of systems expectations we will encounter. In addition, these graduate students represent the future of the community, so involving them in this process early will help push this process forward. At the same time, participation in the AEC can provide useful insight into both the value of artifacts and the process of artifact evaluation, and can help establish community norms for artifacts. We therefore seek to include a broad cross-section of the WOOT community on the AEC.
Naturally, the AEC chairs will devote considerable attention to both mentoring and monitoring the junior members of the AEC, helping to educate the students on their power and responsibilities.