Forks on social coding platforms such as GitHub are both a technical mechanism for branching development and a social signal of community interest and reuse. Prior work has proposed “fork visibility performance” as a lens on the interoperability and survivability of programming language stacks in open source projects, but existing studies are limited by small samples and simplistic predictive models. In particular, a recent study using \(k\)-nearest neighbours on 38 projects suggested that multi-language interoperability may improve fork visibility, yet could not provide statistically robust evidence or a nuanced view of the relative importance of technical and social factors. This paper revisits fork visibility and language survivability at scale. Using data from a large corpus of GitHub repositories obtained via GHTorrent and the GitHub API, we formulate fork visibility prediction as a supervised learning problem and compare classical baselines to modern tree-based and neural architectures. We construct a rich set of features capturing language composition and interoperability, project scale and activity, social engagement, and language-network centrality. We further complement static prediction with a survival analysis of project activity, modelling the time to repository inactivity as a function of language stacks and socio-technical covariates. Our contributions are threefold: (1) an operationalisation of fork visibility and language survivability suitable for large-scale analysis; (2) an empirical comparison of predictive models and feature families for fork visibility prediction; and (3) an investigation of how language interoperability interacts with social and organisational factors in shaping project survival. The paper concludes with implications for project maintainers and platform designers, and outlines how these models can underpin practical recommendations for language stack design in new projects.