Analysis of Students' Programming Knowledge and Error Development

Ella Albrecht


Learning to program is a hard task since it involves different types of specialized knowledge. Students need not only knowledge about the programming language and its concepts, but also knowledge from the problem domain and general problem-solving abilities. Knowing how students develop programming knowledge and where they struggle may help in developing suitable teaching strategies. However, the ever-increasing number of students makes it more and more difficult for educators to identify students’ needs, problems, and deficiencies. The goal of this thesis is to gain insights into students’ programming knowledge development based on their solutions to programming exercises. Knowledge is composed of so-called knowledge components (KCs). In this thesis, we focus on KCs on a syntactic level, which can be derived from abstract syntax trees (e.g., loops, comparisons), and on a semantic level, represented by so-called roles of variables. Since knowledge is not directly measurable, skill models are often used to estimate it. However, the programming domain has its own characteristics, which have to be considered when selecting an appropriate skill model. One of the main characteristics of the programming domain is the presence of dependencies between KCs. Hence, we propose and evaluate a Dynamic Bayesian Network (DBN) for skill modeling that models these dependencies explicitly. Besides the choice of a concrete model, certain meta-parameters, e.g., the granularity level of KCs, also have to be set when designing a skill model. Therefore, we evaluate how meta-parameterization affects the prediction performance of skill models and which meta-parameters to choose. We use the DBN to create learning curves for each KC and deduce implications for teaching from them. Not only students’ knowledge but also their “mal-knowledge” is of importance.
Therefore, we manually inspect students’ programming errors and determine the errors’ frequency, duration, and recurrence. We distinguish between the error categories syntactic, conceptual, strategic, sloppiness, misinterpretation, and domain, and analyze how the errors change over time. Moreover, we use k-means clustering to identify different patterns in the development of programming errors. The results of our case studies are promising. We show that choosing the right meta-parameterization has a large effect on the prediction performance of skill models. In addition, our DBN performs as well as the other skill models while providing better interpretability. The learning curves of KCs and the analysis of programming errors provide valuable information that can be used for course improvement, e.g., showing that students require more practice opportunities or are struggling with certain concepts.
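The abstract does not specify the DBN itself, so the following is only a minimal sketch of the general idea behind probabilistic skill modeling and learning curves, not the thesis's model. It implements standard Bayesian Knowledge Tracing (BKT), a simple DBN-style skill model for a single KC that, unlike the proposed DBN, has no dependencies between KCs. It also computes an empirical learning curve, assumed here to be the error rate at the n-th practice opportunity averaged over students. The names `BKTParams`, `update`, `trace`, and `empirical_learning_curve` and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    # Hypothetical per-KC parameters of standard Bayesian Knowledge Tracing.
    p_init: float   # P(L0): prior probability that the KC is already known
    p_learn: float  # P(T): probability of learning the KC at one opportunity
    p_guess: float  # P(G): probability of a correct use without knowing the KC
    p_slip: float   # P(S): probability of an error despite knowing the KC


def predict_correct(p_known, params):
    """Probability that the next attempt applies the KC correctly."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known, correct, params):
    """Condition P(known) on one observed attempt, then apply the learning transition."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def trace(observations, params):
    """Return P(known) before the first opportunity and after each observed attempt."""
    p = params.p_init
    history = [p]
    for correct in observations:
        p = update(p, correct, params)
        history.append(p)
    return history


def empirical_learning_curve(sequences):
    """Error rate at the n-th opportunity, averaged over students who reached it."""
    max_len = max(len(s) for s in sequences)
    curve = []
    for n in range(max_len):
        attempts = [s[n] for s in sequences if len(s) > n]
        errors = sum(1 for correct in attempts if not correct)
        curve.append(errors / len(attempts))
    return curve


if __name__ == "__main__":
    params = BKTParams(p_init=0.2, p_learn=0.15, p_guess=0.2, p_slip=0.1)
    print(trace([False, True, True], params))
    print(empirical_learning_curve([[False, True], [False, False], [True]]))
```

In this sketch each KC is traced independently, which is exactly the limitation the abstract points to: a DBN that adds edges between KC nodes can let evidence about one KC inform the estimate for a dependent KC.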
Keywords: learning analytics; educational data mining; programming knowledge; skill modeling; student modeling; programming errors; learning curve analysis
Document Type: Ph.D. Thesis
2020 © Software Engineering For Distributed Systems Group
