Background. Theoretical models of research data sharing often claim that, beyond a certain level of reuse, openly sharing data becomes time-efficient at the community level. However, empirical tests of these break-even reuse thresholds remain scarce, and key parameters—such as the time required to prepare data for reuse or to integrate external datasets—are rarely quantified for specific disciplines. Objectives. This paper has three main objectives: first, to empirically estimate the time costs of data collection, curation, sharing, and reuse in ecology; second, to calibrate and test a break-even reuse model using these discipline-specific parameters; and third, to extend the model with a hierarchical treatment of heterogeneous datasets, distinguishing high-value from low-value data products. Methods. We conducted a mixed-methods study combining a survey of 163 practicing ecologists on their data-related time investments and sharing practices with repository analytics from a sample of 320 ecological datasets deposited in Dryad, GBIF, and institutional repositories. We used these data to fit hierarchical models of key time parameters and reuse rates. A Monte Carlo simulation framework then propagated parameter uncertainty to obtain posterior distributions of break-even reuse thresholds. We further stratified datasets into high-value and low-value categories and compared the time-efficiency of selective versus universal sharing strategies. Results. Across respondents, the median time required to collect a reusable ecological dataset was 30 person-days, while the additional time to prepare and deposit the dataset for reuse was 5 person-days. The median time for reusers to discover, appraise, and integrate an existing dataset was 3 person-days.
Under these conditions, the median break-even reuse threshold—the minimum number of reuse events per dataset required to avoid a net time loss at the community level—was 0.3 (95% credible interval: 0.1, 0.8). Repository analytics suggested an expected reuse rate of 0.9 (95% credible interval: 0.5, 1.6) reuses per dataset within five years, indicating that, in ecology, current sharing practices are already time-efficient on average. High-value datasets exhibited substantially lower break-even thresholds and higher reuse rates, making them strongly time-efficient even under pessimistic assumptions, while low-value datasets hovered around break-even. Conclusions. Our results provide empirical support for the claim that data sharing in ecology is, on average, time-efficient at the community level, but also reveal considerable heterogeneity across dataset types. The extended model highlights the potential of selective sharing strategies that prioritize high-value datasets, which deliver large efficiency gains with modest curation investments. We close by discussing implications for repository design, funder mandates, and discipline-specific data policies.
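The break-even logic summarized above can be sketched as a small Monte Carlo in the spirit of the simulation framework described under Methods: a shared dataset pays off once the collection time avoided per reuse, net of the reuser's integration overhead, offsets the one-off curation cost. The medians (30, 5, and 3 person-days) come from the survey results; the lognormal spreads, the `sigma=0.5` choice, and the 100,000-draw setup are illustrative assumptions, not the paper's fitted hierarchical models.

```python
import math
import random
import statistics

def breakeven(t_collect, t_share, t_reuse):
    """Minimum reuses per dataset for sharing to save time overall.

    One-off cost of sharing: t_share.
    Time saved per reuse event: t_collect - t_reuse.
    """
    return t_share / (t_collect - t_reuse)

random.seed(1)
draws = []
while len(draws) < 100_000:
    # Medians match the survey; sigma=0.5 is an assumed spread.
    tc = random.lognormvariate(math.log(30), 0.5)  # collect dataset
    ts = random.lognormvariate(math.log(5), 0.5)   # prepare and deposit
    tr = random.lognormvariate(math.log(3), 0.5)   # discover and integrate
    if tc > tr:  # sharing can only pay off if reuse beats re-collection
        draws.append(breakeven(tc, ts, tr))

med = statistics.median(draws)
cuts = statistics.quantiles(draws, n=40)  # 2.5%..97.5% cut points
lo, hi = cuts[0], cuts[-1]
print(f"break-even reuses: median {med:.2f} (95% interval {lo:.2f}, {hi:.2f})")
```

With the survey point estimates alone, the threshold is 5 / (30 − 3) ≈ 0.19 reuses per dataset; propagating the assumed uncertainty widens this into an interval, which is the shape of result the abstract reports (the paper's own posterior uses fitted distributions, so exact numbers will differ from this sketch).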