Implications of Ceiling Effects in Defect Predictors

Loading...
Thumbnail Image

View/Download File

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Implications of Ceiling Effects in Defect Predictors

Published Date

2008

Publisher

Type

Report

Abstract

Context: There are many methods that input static code features and output a predictor for faulty code modules. These data mining methods have hit a "performance ceiling"; i.e., some inherent upper bound on the amount of information offered by, say, static code features when identifying modules which contain faults. Objective: We seek an explanation for this ceiling effect. Perhaps static code features have "limited information content"; i.e. their information can be quickly and completely discovered by even simple learners. Method:An initial literature review documents the ceiling effect in other work. Next, using three sub-sampling techniques (under-, over-, and micro-sampling), we look for the lower useful bound on the number of training instances. Results: Using micro-sampling, we find that as few as 50 instances yield as much information as larger training sets. Conclusions: We have found much evidence for the limited information hypothesis. Further progress in learning defect predictors may not come from better algorithms. Rather, we need to be improving the information content of the training data, perhaps with case-based reasoning methods.

Keywords

Description

Associated research group: Critical Systems Research Group

Related to

Replaces

License

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

PROMISE '08 Proceedings of the 4th international workshop on Predictor models in software engineering

Suggested citation

Menzies, Tim; Turhan, Burak; Gay, Gregory; Bener, Ayse; Cukic, Bojan; Jiang, Yue. (2008). Implications of Ceiling Effects in Defect Predictors. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/217361.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.