Transition from Paper-Based to Digital Data Collection in a Field-Based Epidemiological Study: A Case Report from Rural India

Abstract

Electronic Data Capture (EDC) through mobile devices is a transformative approach that significantly enhances data quality. By enabling real-time validation, automating error correction, and improving security, EDC greatly reduces delays in data collection.

Research consistently demonstrates the substantial benefits of digital platforms, particularly in rural and resource-limited areas, where they lead to improved accuracy and faster access to data. However, the transition from traditional pen-and-paper to digital systems presents several challenges. Limited infrastructure, unreliable internet connectivity, device management issues, and the urgent need for effective training for field personnel can hinder progress. In this case report, we outline our research team’s transition to digital data collection, detailing our experiences, the challenges we faced, and the lessons we learned. By sharing these insights, we aim to empower others who are undertaking similar transitions. Embracing digital innovation is not merely an option; it is an essential step toward significantly enhancing public health research and driving impactful outcomes.

Background

Accurate and timely data collection is essential for field-based epidemiological research. With over a decade of experience in conducting epidemiological studies, our research team has traditionally relied on paper-based questionnaires for collecting data. This approach is effective and commonly used because of its low cost and simplicity. However, paper forms are prone to transcription errors and often present limitations such as time-intensive data entry and difficulty monitoring data in real time (Walther et al., 2011). With the advancement of technology and the need for more efficient research methodologies, electronic data capture (EDC) using mobile devices has emerged as a feasible alternative. It offers real-time validation, error correction, automated skip patterns, and improved data security, thereby reducing delays and improving data quality (Wahi et al., 2008).

Several studies have demonstrated the benefits of digital platforms in rural and resource-constrained settings, including enhanced accuracy, faster data availability, and reduced logistical challenges (McLean et al., 2017; Kenny et al., 2020). However, transitioning from paper-based to digital systems is not without challenges, as it requires addressing barriers related to infrastructure, internet connectivity, device management, and training of field staff (Kaboré et al., 2022).

This case report details the shift from paper forms to digital forms during the formative-phase data collection for a community-based cross-sectional study conducted in the rural districts of Srikakulam and Parvathipuram Manyam, Andhra Pradesh, India, from April 2024 to August 2024. Through this study, we sought to understand the current status of existing Heat Action Plans (HAPs) at three key time points (pre-summer, during summer, and post-summer), reflecting the seasonal periods during which HAPs are operational, and at three levels: community, Primary Health Center, and workplace.

Below, we highlight the practical experiences, challenges encountered, and lessons learned by our NIHR GHRC-NCD-EC SRIHER research team. Through this case report, we aim to inform other research teams considering similar shifts in comparable contexts.

1.1 Initial Approach

We developed structured questionnaires for the three time points to capture information relevant to the study’s objectives. The tools were pilot-tested for clarity, cultural appropriateness, and feasibility before being finalized for field deployment. Because of the limited time available for developing and customizing the digital tool and the time-bound nature of data collection, we initially used a traditional paper-based approach in a few villages. The printed questionnaires were administered in the local language, Telugu, by trained research staff and field investigators during household visits. Field supervisors oversaw all field activities to ensure accuracy and completeness. Before the questionnaire was administered, written informed consent was obtained from all participants. Completed forms were securely stored and transported to the study office, where responses were manually entered into a digital database, cleaned, and validated prior to analysis.

Because data entry took place offsite, at the study office, this workflow presented several challenges for field research. The process was time-consuming and increased the likelihood of errors during manual recording and transcription. The lack of automated checks led to incomplete or inconsistent responses, which affected data quality. Delays in digitizing and analyzing the information slowed down our research process. Additionally, it was difficult to monitor progress or address issues in real time, which limited our research team’s responsiveness.

1.2 Transition to Digital Form

We used two different platforms for our data collection: REDCap (Research Electronic Data Capture) for workplace-level data and SMARThealth for community-level data. Workplace-level data were collected through traditional methods and then entered into REDCap, whereas the SMARThealth digital platform was used for real-time, on-site data collection at the community level. Following the initial paper-based phase at the community level, the research team transitioned to the SMARThealth digital data collection system using tablets equipped with structured questionnaires. The platform incorporated in-built checks, skip patterns, and mandatory fields to improve completeness and reduce errors, while enabling near real-time data upload for remote monitoring and timely feedback. Written informed consent was also obtained and digitally recorded for all participants. The development of the first digital tool required approximately six months, as it involved a steep learning curve and the creation of a comprehensive data dictionary. Although this stage was the most challenging, it provided the team with hands-on experience that streamlined the development of subsequent tools. With an iterative learning approach, later tool-building became faster, more efficient, and better aligned with field requirements. This transition enhanced efficiency, reduced transcription errors, and accelerated the availability of datasets for analysis.
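To illustrate the kind of consistency checking that a data dictionary benefits from before it is loaded into a collection tool, the following is a minimal Python sketch. It does not reproduce the SMARThealth or REDCap dictionary format: the `FieldSpec` structure, the field names, and the three rules checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FieldSpec:
    """One row of a hypothetical data dictionary (not a real platform format)."""
    name: str
    kind: str                                    # "text", "number" or "choice"
    required: bool = False
    choices: Tuple[str, ...] = ()
    show_if: Optional[Tuple[str, str]] = None    # (earlier field, expected answer)


def lint_dictionary(fields):
    """Return a list of human-readable problems found in the dictionary."""
    problems = []
    seen = set()
    for f in fields:
        if f.name in seen:
            problems.append(f"duplicate field name: {f.name}")
        if f.kind == "choice" and not f.choices:
            problems.append(f"choice field without options: {f.name}")
        # A skip rule may only depend on a field defined earlier in the form.
        if f.show_if is not None and f.show_if[0] not in seen:
            problems.append(
                f"skip rule on {f.name} refers to unknown or later field: {f.show_if[0]}"
            )
        seen.add(f.name)
    return problems


# Hypothetical entries; the last two contain deliberate mistakes.
DICTIONARY = [
    FieldSpec("works_outdoors", "choice", required=True, choices=("yes", "no")),
    FieldSpec("hours_outdoors", "number", required=True,
              show_if=("works_outdoors", "yes")),
    FieldSpec("water_access", "choice"),
    FieldSpec("shade_hours", "number", show_if=("has_shade", "yes")),
]

PROBLEMS = lint_dictionary(DICTIONARY)
for line in PROBLEMS:
    print(line)
```

Running a check of this kind on each draft of a dictionary is one way to catch structural mistakes before a tool reaches the field.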

1.3 Operationalisation

To operationalise the digital system, structured training sessions were conducted for research staff and field investigators. During the trial phase, each new data collector was paired with an experienced staff member proficient in the regional language (Telugu) to facilitate smooth communication with participants. In the initial phase of implementation, the SMARThealth team worked closely with the research lead, who maintained direct contact with data collectors, most of whom were project research staff, during field visits. This enabled the identification and resolution of challenges in real time. Additionally, open communication channels were maintained among the field teams, researchers, and developers, enabling timely troubleshooting and feedback. The combination of real-time support, structured supervision, and onsite technical assistance was critical in ensuring the successful operationalisation of the digital data collection system.

2. Results

2.1 Benefits of Digital Platform Transition

Compared with our paper-based approach, digital data collection using mobile devices or tablets with pre-programmed forms eliminated the need for printing and manual data entry. While both methods required tool design, pilot testing, and training, digital tools offered real-time data validation, reduced transcription errors, and allowed faster access to data for analysis. Paper forms, however, proved more practical in areas with limited electricity or internet access. Digital collection streamlined the process but required upfront investment in devices and software, as well as technical support for implementation and troubleshooting.

The implementation of digital data capture yielded significant quantifiable improvements across multiple operational metrics.

Time efficiency: The paper-based method required 5–6 days of post-collection data cleaning and validation before statistical analysis. The digital platform validated data in real time during collection, which eliminated the need for this separate cleaning phase. This saved us nearly a week in our research timeline, allowing us to move directly from data collection to analysis. Overall, the time required for post-collection data processing fell by approximately 90%, despite some minor data formatting challenges.

One of our research staff members who administered the digital data collection tool noted,

“I found that the digital platform cut down the time required for administering the tool also I found it easy providing quality data”

Real-time validation: The platform demonstrated enhanced data integrity through integrated quality assurance mechanisms. Field investigators and the research staff were able to review and validate data at the point of collection, reducing data entry errors and inconsistencies.

Improved questionnaire design: Questions with automated skip logic were easily embedded in the digital platform, enabling more efficient navigation through complex questionnaires and minimizing interviewer error. This feature automatically guided interviewers to the next relevant question based on previous responses, eliminating the manual decision-making about which sections to complete or skip, thus reducing human error and ensuring consistent questionnaire administration across all data collection sessions.

Facilitation of follow-up studies: The digital system enabled the secure storage of participant identifiers, allowing for a seamless follow-up study with the same cohort of participants—a process that would have been logistically challenging with paper forms.
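The real-time validation and skip logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the platform's implementation: the question names, the `show_if` convention, and the helper functions are assumptions introduced for the example.

```python
def is_relevant(question, answers):
    """A question is asked only if its skip condition (if any) is met."""
    condition = question.get("show_if")
    if condition is None:
        return True
    field_name, expected = condition
    return answers.get(field_name) == expected


def next_question(questions, answers):
    """Return the name of the first relevant question not yet answered."""
    for q in questions:
        if q["name"] not in answers and is_relevant(q, answers):
            return q["name"]
    return None  # nothing left to ask


def missing_required(questions, answers):
    """List required, relevant questions that still have no answer."""
    return [
        q["name"]
        for q in questions
        if q.get("required")
        and is_relevant(q, answers)
        and answers.get(q["name"]) in (None, "")
    ]


# Hypothetical three-question form with one skip rule.
QUESTIONS = [
    {"name": "works_outdoors", "required": True},
    {"name": "hours_outdoors", "required": True,
     "show_if": ("works_outdoors", "yes")},
    {"name": "heard_of_hap", "required": True},
]

# An interviewer records "no" for outdoor work, so the hours question is skipped.
print(next_question(QUESTIONS, {"works_outdoors": "no"}))
print(missing_required(QUESTIONS, {"works_outdoors": "no"}))
```

In this sketch the form, rather than the interviewer, decides which question comes next, and a record can be blocked from submission while `missing_required` returns a non-empty list.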

2.2 Challenges and Opportunities in the Transition to Digital Data Collection

Despite the advantages, the transition to digital data collection was accompanied by three major challenges:

  1. Development of the first survey tool
    We developed three tools along with their corresponding data dictionaries for the first phase of the project. The data dictionary contains the commands required for integrating each tool into the digital application. During the development of our first tool, we encountered significant challenges and made numerous mistakes in preparing the data dictionary, as this was our first experience with such work. However, based on the lessons learned from the first tool, we created a comprehensive checklist for data dictionary preparation. This checklist proved invaluable during the development of our second tool, significantly reducing our errors and streamlining the development process. The checklist is available at the end of the case study.
  2. Internet Connectivity Issues
    The digital platform required an active internet connection to synchronize data with the central server. During the implementation stage, poor internet connectivity in the remote rural study sites required field staff to travel 5–10 km to upload data, adding operational complexity. This step was crucial to avoid data loss and ensure timely uploads. The problem was resolved through a store-and-forward feature that enabled offline data capture with automatic synchronization once connectivity was available, supported by mobile hotspots where feasible.
  3. Data Extraction Format Challenges
    Data extraction issues arose because the initial export formats were incompatible with the planned statistical analyses, requiring extensive manual restructuring. Customization of the export functions to align with analysis templates addressed this problem.
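The store-and-forward approach mentioned under the connectivity challenge can be sketched as a local queue that is drained whenever an upload succeeds. The Python sketch below is a minimal illustration and not the SMARThealth implementation: the `Outbox` class, the table layout, and the `upload` callback are assumptions, and a local SQLite database stands in for on-device storage.

```python
import json
import sqlite3


class Outbox:
    """Local queue: records are saved on the device first and uploaded later."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "synced INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def save(self, record):
        """Store a completed form locally; needs no connectivity."""
        self.db.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),)
        )
        self.db.commit()

    def pending(self):
        """Return (id, record) pairs that have not been uploaded yet."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox WHERE synced = 0 ORDER BY id"
        ).fetchall()
        return [(row_id, json.loads(payload)) for row_id, payload in rows]

    def sync(self, upload):
        """Upload pending records in order; stop at the first network failure."""
        sent = 0
        for row_id, record in self.pending():
            try:
                upload(record)
            except OSError:  # ConnectionError and timeouts are OSError subclasses
                break        # still offline: keep the rest for the next attempt
            self.db.execute("UPDATE outbox SET synced = 1 WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


def offline_upload(record):
    """Simulates a study site without network coverage."""
    raise ConnectionError("no network")


received = []  # stands in for the central server
outbox = Outbox()
outbox.save({"household": "H001", "heard_of_hap": "yes"})
outbox.save({"household": "H002", "heard_of_hap": "no"})
sent_offline = outbox.sync(offline_upload)   # nothing leaves the device
sent_online = outbox.sync(received.append)   # connectivity is back
```

A record is marked as synced only after its upload call returns, so a failure in between can lead to the same record being sent twice. A design along these lines therefore assumes the server can recognize and discard duplicates.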

2.3 Recommendations for Planning the Transition to Digital Data Collection

Based on the field experience, several practical lessons emerged to guide future transitions from paper-based to digital data collection. These recommendations are summarized in Table 1 across key domains of planning, implementation, and sustainability.

Table 1: Key recommendations for digital data collection

  • Pre-implementation planning
    ✔️ Conduct a thorough feasibility assessment
    ✔️ Develop a comprehensive data dictionary checklist
  • Technology and infrastructure
    ✔️ Ensure offline capabilities
    ✔️ Plan for data format compatibility
  • Team preparation and training
    ✔️ Invest in comprehensive training
    ✔️ Establish technical support systems
  • Implementation strategy
    ✔️ Start with pilot testing
    ✔️ Prepare for initial challenges
  • Long-term considerations
    ✔️ Plan for scalability
    ✔️ Document lessons learned


    3. Discussion

    This case report demonstrates both the transformative potential and implementation challenges of transitioning to electronic data capture in resource-constrained rural settings. Digital platforms delivered substantial improvements in data quality, efficiency, and integrity through automated validation, skip logic, and elimination of transcription errors—outcomes consistent with established literature emphasizing EDC superiority over traditional methods (Walther et al., 2011).

    Three primary implementation barriers emerged: metadata specification challenges, reflecting the steep learning curve for digital tool development; network infrastructure limitations, creating operational delays; and data interoperability issues, requiring extensive manual restructuring. However, adaptive strategies effectively mitigated these challenges: structured validation checklists streamlined subsequent tool development, store-and-forward architecture enabled seamless offline functionality (Kenny et al., 2020), and customized export templates eliminated data wrangling requirements. This iterative problem-solving approach strengthened team capacity and established continuous learning protocols aligned with the principles of implementation science.

    The key recommendations for similar transitions, presented in Table 1, are to conduct a pre-implementation feasibility assessment and adopt a checklist-driven development approach, to integrate offline functionality and ensure compatibility with statistical software, to provide comprehensive team training with structured supervision, and to conduct systematic pilot testing while documenting and integrating lessons learned. While the digital transition introduces operational complexity in low- and middle-income country (LMIC) contexts, strategic planning and adaptive capacity building enable sustainable implementation, with long-term benefits that substantially outweigh the initial challenges.

    Limitations

    This case report is limited by its context-specific nature, short-term observation period, and absence of a formal comparative evaluation. While the lessons offer practical insights, further studies are necessary to evaluate the long-term sustainability, cost-effectiveness, and user perspectives across diverse contexts.

    Conclusion

    The transition from paper-based to electronic data capture in resource-constrained rural settings, while initially challenging, represents a strategic investment in research efficiency and data quality. Our experience suggests that with systematic planning, adaptive problem-solving, and iterative capacity building, digital platforms can reduce post-collection data processing time by approximately 90% while substantially improving accuracy and operational effectiveness. For research teams in similar contexts, the documented lessons and structured implementation framework provide a practical roadmap for successful digital transformation, ultimately advancing the quality and impact of epidemiological research in low-resource environments.

    Acknowledgments

    The authors gratefully acknowledge the NIHR Global Health Research Centre on Non-Communicable Diseases and Environmental Change (NIHR203247) and the Department of Environmental Health Engineering, Faculty of Public Health, Sri Ramachandra Institute of Higher Education and Research, Chennai, for providing the platform for carrying out this work.

    Author Information

    This case study was authored by P.K Latha, with valuable contributions by Dr. Vidhya Venugopal and the team at Sri Ramachandra Institute of Higher Education and Research (SRIHER).


    References

    1. Kaboré SS, Ngangue P, Soubeiga D, Barro A, Pilabré AH, Bationo N, Pafadnam Y, Drabo KM, Hien H, Savadogo GB. Barriers and facilitators for the sustainability of digital health interventions in low and middle-income countries: a systematic review. Frontiers in Digital Health. 2022 Nov 28;4:1014375.
    2. Kenny A, Gordon N, Downey J, Eddins O, Buchholz K, Menyon A, Mansah W. Design and implementation of a mobile health electronic data capture platform that functions in fully-disconnected settings: a pilot study in rural Liberia. BMC Medical Informatics and Decision Making. 2020 Feb 22;20(1):39.
    3. McLean E, Dube A, Saul J, Branson K, Luhanga M, Mwiba O, Kalobekamo F, Geis S, Crampin AC. Implementing electronic data capture at a well-established health and demographic surveillance site in rural northern Malawi. Global Health Action. 2017 Jan 1;10(1):1367162.
    4. Wahi MM, Parks DV, Skeate RC, Goldin SB. Reducing errors from the electronic transcription of data collected on paper forms: a research data case study. Journal of the American Medical Informatics Association. 2008 May 1;15(3):386-9.
    5. Walther B, Hossin S, Townend J, Abernethy N, Parker D, Jeffries D. Comparison of electronic data capture (EDC) with the standard data capture method for clinical trial data. PLoS ONE. 2011 Sep 23;6(9):e25348.

    ——————————————————

    This research was funded by the NIHR (Global Health Research Centre for Non-communicable Diseases and Environmental Change) using UK international development funding from the UK Government to support global health research. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR or the UK government.
