Computer-mediated communication (CMC) competence is a key prerequisite for effective participation in online collaborative learning and problem-based learning (PBL). Prior learning analytics work has shown that CMC competence can be forecast from behavioural traces such as message counts and login frequencies, but these models ignore the rich linguistic information contained in the messages themselves and rely on self-report questionnaires as ground truth. Building on the notion of CMC competence as a multidimensional construct encompassing knowledge, skills, and motivation in technology-mediated environments, this methodological paper proposes and specifies a text-aware forecasting model that integrates behavioural and linguistic features extracted from discussion posts and instant messages in a web-based collaborative PBL environment. The study design combines a validated CMC competence scale with multi-course xAPI logging, natural language processing of message content (including discourse acts, politeness and socio-emotional support markers, and transformer-based embeddings), and interpretable machine learning models. We articulate research questions related to prediction accuracy, added value of text features over behavioural baselines, phase-specific patterns across PBL cycles, and the interpretability of linguistic signals associated with positive communication outcomes. Rather than reporting a completed empirical study, the paper presents a detailed research blueprint, including feature engineering strategies, modelling pipelines, and an illustrative results section describing plausible analytical outcomes. The contribution is twofold: it advances the operationalisation and measurement of CMC competence beyond clickstream counts and provides a reusable analytic framework for researchers and practitioners seeking to design more informative dashboards and targeted communication support in collaborative learning environments.