Experimental research on human behavior outside the confines of a laboratory has long been a crucial aspect of psychological research (Grzyb & Doliński, 2021). Field experiments conducted in natural environments such as parks, shops, workplaces, or public spaces provide insights by observing and measuring behavior in settings where participants feel more comfortable and behave more naturally (Bandiera et al., 2011; Brent et al., 2017). This approach offers knowledge of human actions and interactions in diverse contexts, from physical activities (Pourchon et al., 2017) to social behavior (Benz & Meier, 2008).
Conducting such studies comes with technical challenges that may discourage many researchers and suppress a broader adoption of field research. The surge of mobile technology, particularly smartphones, has presented an innovative solution to many of these challenges. Smartphones are an integral part of daily life for most individuals in developed countries. This offers a user-friendly and accessible means for participants to receive instructions and provide feedback outside laboratories. Despite the potential of mobile devices to transform field research, existing mobile research software often falls short regarding ease of use, affordability, and security. Researchers may encounter hurdles such as software complexity, high costs, difficulty in adopting new tools (especially those with steep learning curves or requiring advanced programming knowledge), understanding documentation, or data security and privacy concerns.
Last but not least, available research solutions do not capitalize on the recent rapid growth of Artificial Intelligence (AI), Large Language Models (LLMs) (Vaswani et al., 2017), and their adoption via chat-based interfaces. Given the potential of AI to understand natural language and interact effectively with humans, AI might increasingly be considered a significant help in the research development process, e.g., by assisting in the preparation of research materials. This might streamline researchers' work, allowing them to focus on strategic decisions, challenging tasks, quality control, and other tasks that cannot be delegated to AI. Thus, building text-based research systems compatible with LLMs is a priority that might provide a boost and a long-term breakthrough in research methods.
Recognizing these gaps, we developed the Pocket Lab App (PoLA), a tool designed to address the limitations of existing mobile research applications and to harness the potential of LLMs. PoLA is a freely available, open-source application for Android devices, a platform chosen for its wide adoption and flexibility. Designed with user-friendliness in mind, PoLA simplifies presenting instructions, managing experiment timing, and collecting participant responses. It operates offline and stores data locally on the device, significantly reducing security risks associated with data transmission and storage on external servers. Rather than depending on graphical user interfaces or complex coding (Meers et al., 2020; Peirce et al., 2019), it operates on elementary, straightforward text commands and syntax. This reduces the learning curve for new users. Its simplicity and text-based format also facilitate effective integration with LLMs, which excel in language operations. To this end, we introduce a chat-based AI component that streamlines the creation and revision of experimental protocols through natural language instructions. PoLA's Helpful Assistant (HeLA) provides immediate interactive support to get started, develop, troubleshoot, and get creative with PoLA.
Addressing Limitations of Current Mobile Research Applications
Research applications for smartphones serve as crucial tools for collecting data outside traditional laboratory settings. A range of options is available, each with its own set of features and limitations (Arslan et al., 2018; Meers et al., 2020; https://comparison-to.formr.org). PoLA specifically addresses several limitations often encountered in existing platforms.
Use of Laboratory-Owned Smartphones. Most mobile data collection software relies on participants' own smartphones (Arslan et al., 2018). While necessary for studies that do not involve in-person interaction with the researcher, this approach does not suit all research designs. For instance, when a study requires participants to be physically present in a lab, installing the research software on a laboratory-owned smartphone (lab phone) becomes advantageous (Meers et al., 2020). This strategy offers numerous benefits: enhanced control and standardization of the experiment's technical aspects (screen resolution, brightness, or response times); stable software performance, unaffected by external apps or settings on a personal device; flexibility for participants who may need their own smartphones for other tasks during the study (e.g., geolocation gaming); and inclusion of individuals without smartphones or those hesitant to install research software on their personal devices due to privacy concerns. Systems that run on participants' smartphones can also be installed on laboratory smartphones. Yet, they often still involve installation and configuration complexity that might be unnecessary for systems developed natively for lab phones. For instance, as PoLA was developed for lab phones, it offers an install-and-run approach, allowing data collection immediately after downloading.
Data security. There has been increasing concern about how data is managed in research. Commonly, mobile research tools require sending participants' data to third-party servers, raising concerns about data misuse and breaches. In contrast, PoLA emphasizes local data storage, ensuring that sensitive information remains on the device and is shared exclusively with the research team. This approach not only alleviates participants' potential privacy concerns but also aids in adhering to stringent data protection regulations (like the General Data Protection Regulation in Europe), as the data does not cross borders or get handled by third parties.
Artificial intelligence assistance. Unlike other platforms, PoLA leverages artificial intelligence through customized prompts to streamline interactions with the application environment. This provides improved, faster, and more comfortable adoption and encourages its more creative use. PoLA's AI feature optimizes the research protocol creation process, simplifies understanding documentation, and assists in troubleshooting, making the tool more accessible to researchers without extensive technical expertise and allowing all researchers (regardless of their technical expertise) to focus on tasks that AI cannot perform.
Cost efficiency. PoLA is licensed under the GNU General Public License version 3 (GPLv3). This open-source model ensures that researchers from any location can freely use, modify, and share PoLA, which keeps research costs low. In contrast, several applications for mobile research require considerable payment plans. This financial barrier can be a deterrent for researchers operating with limited funding, including those in the early stages of their careers, researchers from developing countries, or academic institutions with constrained budgets. PoLA, being free, opens up opportunities for a broader range of research projects, particularly for those still in the prototype phase or awaiting grant approvals.
Regional accessibility. Some platforms restrict access to specific regions due to regional agreements around data collection and storage (Evans, 2012). These limitations can suppress international collaboration or prevent researchers in specific locales from accessing the best tools for their needs. PoLA's global accessibility ensures that it can be a valuable resource for researchers worldwide, encouraging a more inclusive and diversified research community. It facilitates international collaboration and the pooling of global insights to enhance its features, ensuring the app evolves with the input of diverse perspectives.
Long-Term Reliability and Stability. Some internet-based platforms can be discontinued, which might be risky for longer projects (e.g., Meers et al., 2020). Moreover, reliance on servers rather than the smartphone internal storage exposes research projects to additional risks, such as server downtime or maintenance shutdowns, potentially interrupting study flow and data collection. PoLA ensures a smoother, more predictable experience for researchers and participants by operating locally and not depending on internet server connections.
Integrated survey and notifications systems. Some mobile research applications focus solely on sending timed notifications but require separate services for survey administration (Shevchenko et al., 2021). This complicates setting up the study data collection and might inflate costs for external services. PoLA integrates these signaling and collecting functions within a single application, reducing complexity and streamlining the research process.
Development
PoLA is an Android application developed in Android Studio Iguana 2023.2.1 and written in the Kotlin programming language. PoLA runs on smartphones with Android 11 or higher, making it suitable for ca. 56% of devices on the market. The main components of the application are responsible for loading a .txt file with the study protocol provided by the user, presenting instructions to participants, setting alarms to signal consecutive phases of the experiment, collecting self-report data, and saving the data locally (along with a backup) in an accessible format that the researcher can easily and quickly transfer to a PC and process (Figure 1). The application is open-source, licensed under GNU GPL version 3, and scalable. Users can inspect, modify, and expand upon the existing code to tailor the app to specific research needs, fostering innovation and adaptability within the research community.
We developed a custom prompt for LLMs, PoLA Helpful Assistant (HeLA). HeLA comprehensively understands the commands and syntax necessary to develop research protocols. The prompt can be copied into chatbots based on LLMs with a large context window, e.g., ChatGPT (OpenAI, US), Gemini (Google AI, US), or Claude (Anthropic, US). This initiates a conversation followed by specific prompts regarding PoLA use. For convenience, we developed a tailored version of ChatGPT (GPTs) using custom GPTs functionality in ChatGPT 4.0. In this case, users open the HeLA window and ask questions immediately because the HeLA prompt (along with additional documentation) has already been integrated into the system.
Researcher's perspective on the PoLA use
The process begins when researchers download and install PoLA onto a laboratory smartphone via a straightforward .apk file download. To ensure the application's uninterrupted operation during the study, it is advised to pin the app to the screen, preventing participants from inadvertently exiting the app or accessing unrelated applications.
Upon its launch, researchers are prompted to load a study-specific protocol formatted as a .txt file outlining the sequence of instructions, timers, and surveys that guide the participants through the experiment and collect responses. This .txt file is easily crafted in any text editor; PoLA translates each line of text into a discrete, actionable slide. The researcher can write the protocol based on the documentation and revise it with HeLA. Alternatively, the researcher can describe the study in natural language, and HeLA will suggest a protocol that needs manual checking and approval after, in most cases, minimal correction (Figure 2).
Below is an example of a bogus study protocol in which comments in natural language (preceded by a double slash) are followed by PoLA-specific commands. It is worth noting that HeLA can generate these formal commands based on the user's natural language comments, considering the overall context while giving suggestions for specific sections. In the examples provided below, HeLA is requested to produce bogus scales, while the instructions, items, and response formats for official scales can be provided.
During the experiment, the researcher starts the session and hands the phone to the participant, who follows instructions and provides responses. The application saves data after each response and generates a more extensive report (along with a backup). If the user wants to integrate the data collected with PoLA with data from other devices (e.g., mobile physiological monitors), we advise time-based synchronization after syncing the clocks of both devices, e.g., using internet-based time synchronization.
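The time-based synchronization mentioned above can be sketched as a nearest-timestamp lookup. This is only an illustration under stated assumptions: the timestamp format, the sample values, and the helper name `nearest_sample` are hypothetical and are not part of PoLA or of any monitor's software.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_sample(sample_times, event_time):
    """Return the index of the sample closest in time to event_time.

    sample_times must be sorted in ascending order.
    """
    if not sample_times:
        raise ValueError("sample_times is empty")
    pos = bisect_left(sample_times, event_time)
    if pos == 0:
        return 0
    if pos == len(sample_times):
        return len(sample_times) - 1
    before, after = sample_times[pos - 1], sample_times[pos]
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return pos if (after - event_time) < (event_time - before) else pos - 1


# Hypothetical timestamps: three monitor samples and one PoLA response.
fmt = "%Y-%m-%d %H:%M:%S"
monitor_times = [
    datetime.strptime(t, fmt)
    for t in ("2024-03-01 10:00:00", "2024-03-01 10:00:05", "2024-03-01 10:00:10")
]
pola_event = datetime.strptime("2024-03-01 10:00:07", fmt)
idx = nearest_sample(monitor_times, pola_event)
```

The lookup is only meaningful if both device clocks were synchronized beforehand, as advised above; any residual clock offset carries over directly into the alignment.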
Once the procedure is completed, the participant returns the lab phone. The researcher navigates to the smartphone's Documents folder and downloads data in widely adopted formats (.csv or .xlsx) to their PC for statistical analysis.
Participant's perspective on the application use
Upon arrival, each participant is briefed on using the lab phone equipped with PoLA, which will guide them through the study's procedures and gather their responses. After reading the instructions and trying PoLA's functions (e.g., switching off alarms or triple-tapping to proceed after critical instructions), the participant can discuss any questions regarding the app's use with the research assistant. During the experiment, the participant carries the lab phone in a pocket, or in a pouch if more standardized settings are required. Usually, the study will involve some baseline measurements followed by an experimental activity. At set intervals, signaled by an alarm from the lab phone, participants are prompted to engage with the device. They dismiss the alarm, review the upcoming instructions, complete self-report surveys, and proceed as directed. Upon completing the experiment's requirements, participants return to the lab and hand the lab phone back to the research assistant. This marks the end of their active role in the study, as the research assistant concludes the session by closing the application.
Protocol
PoLA follows instructions from a .txt protocol file uploaded by the user. A protocol is stored and used continuously until another protocol is uploaded. The protocol can be created in any text editing program, e.g., Notepad or Microsoft Office Word. The most convenient way is to develop the protocol on a personal computer and then either transfer it through a USB cable from the PC to the lab phone or download it to the lab phone via cloud storage. Table X presents available commands. Demo and tutorial protocols are available within the app for educational purposes. In PoLA, the researcher can preview and inspect the loaded protocol with additional graphical features that help confirm the protocol's integrity.
Commands overview
PoLA incorporates commands designed to manage the experiment flow, collect self-reports, and record meta-data or any additional information the researcher might want to include in the output.
Flow. PoLA displays instructions (INSTRUCTION). To avoid accidental progressions to the next slide, participants can be prompted to tap the screen three times (TAP_INSTRUCTION) to reveal the "Continue" button, ensuring deliberate navigation through the app in the key moments that can compromise the procedure. To manage transitions between study periods (e.g., sessions intertwined with reports of emotions), the researcher can set timers (TIMER), which include a count-down timer that signals its end with sound and vibration.
Self-reports. PoLA includes scales with up to nine response buttons for each item. One command (a line in the protocol) can include one item (SCALE) or a list of items (MULTISCALE). The items can be presented in a randomized order (RANDOMIZED_MULTISCALE). Participants provide open-ended responses or other text entries with input fields (INPUTFIELD) for text data. Input fields are also helpful for logging data (e.g., participant ID) by the researcher upon starting the application before handing the lab phone to the participant.
Metadata. For greater clarity, the researcher can log any pre-defined text in the output file (LOG), e.g., the phase of the experiment (instructions, baseline, or the block number if the session is split into blocks). The study's name (STUDY_ID) can also be provided in the protocol so that PoLA names each output file using the name of the study, which helps organize the data collection and integration. Comments in the protocol are accepted in any format (e.g., double slash) as the application skips lines that do not start with any recognized command.
Syntax
PoLA uses simple syntax. Each command takes one line. Elements in each line (command, header, body, button text, response options, etc.) are separated by semicolons. Any line that does not start with any command is ignored, allowing for formatting of the protocol file (e.g., spacing or commenting) according to the researcher's preferences.
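The parsing rules described above (one command per line, semicolon-separated elements, unrecognized lines ignored) can be sketched as follows. This is a minimal illustration, not PoLA's Kotlin implementation: the command names are those mentioned in the text and may not be the complete set, and the field order in the sample protocol is a hypothetical assumption.

```python
# Commands named in the text; the app's full command table may contain more.
RECOGNIZED_COMMANDS = {
    "STUDY_ID", "INSTRUCTION", "TAP_INSTRUCTION", "TIMER", "SCALE",
    "MULTISCALE", "RANDOMIZED_MULTISCALE", "INPUTFIELD", "LOG",
}


def parse_protocol(text):
    """Return a list of (command, fields) tuples, skipping unrecognized lines."""
    slides = []
    for raw_line in text.splitlines():
        parts = [part.strip() for part in raw_line.split(";")]
        command = parts[0]
        if command not in RECOGNIZED_COMMANDS:
            continue  # comments, blank lines, and free-form formatting
        slides.append((command, parts[1:]))
    return slides


# Hypothetical protocol; the order of fields is illustrative only.
example = """\
// Welcome the participant, then start a five-minute rest
STUDY_ID;DEMO
INSTRUCTION;Welcome;Please read the instructions.;Continue

TIMER;Rest;Sit quietly.;300
"""
parsed = parse_protocol(example)
```

Because any line that does not begin with a known command is dropped, comments and blank lines need no special syntax, which is the behavior the text describes.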
Output
Each PoLA launch generates an output file with a name that includes the name of the study (if provided via the STUDY_ID command), date, and time. After asking for permissions, the files are saved in two sub-folders created in the smartphone's Documents folder, i.e., PoLA_Data and PoLA_Backup. One file is in .csv format (tab-separated values), and the other is in MS Excel (.xlsx) format. Both files are automatically backed up in the backup folder, and their filenames are marked as backups. The .csv file includes time-stamped data: the date, time, the command/slide with its details, and the participant's responses in two formats: text (button's label) and numeric (which button was tapped). The Excel file includes the data and additional information in separate sheets, such as the protocol used for the data collection and the smartphone details.
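As a sketch of how the tab-separated .csv output might be loaded for analysis, the example below uses Python's standard `csv` module. The column names, the absence of a header row, and the sample rows are assumptions made for illustration; the text lists the kinds of fields in the file, not their exact labels or order.

```python
import csv
import io

# Hypothetical column names for the fields described in the text.
COLUMNS = ["date", "time", "command", "details", "response_text", "response_number"]


def read_pola_output(handle):
    """Read tab-separated rows into dicts keyed by the assumed column names."""
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(COLUMNS, row)) for row in reader if row]


# Made-up sample standing in for a file from the PoLA_Data folder.
sample = io.StringIO(
    "2024-03-01\t10:15:02\tSCALE\tI feel calm\tAgree\t4\n"
    "2024-03-01\t10:15:09\tSCALE\tI feel tense\tDisagree\t2\n"
)
rows = read_pola_output(sample)
numeric = [int(row["response_number"]) for row in rows]
```

With a real file, `io.StringIO(...)` would be replaced by `open(path, newline="", encoding="utf-8")`, and `COLUMNS` should be adjusted to match the actual output.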
AI-based assistance
Overview
PoLA operates on instructions provided as plain text. Consequently, it is well suited to Large Language Models (LLMs), which display high accuracy in understanding texts and generating novel text based on provided examples and prompts. PoLA uses a relatively short set of commands and simple syntax, which fits the current LLMs' limited context windows. For instance, as of March 2024, OpenAI's GPTs, which are tailored to specific user needs, can accept up to 8000 characters of instructions that are kept in attention while responding to prompts. This amount is sufficient to explain to an LLM how PoLA's protocols are constructed and to provide examples. Consequently, users can leverage LLMs to assist in PoLA's use.
We developed PoLA's Helpful Assistant (HeLA), a custom GPT based on GPTs functionality in ChatGPT 4.0 (OpenAI, US). This assistant has the necessary knowledge to review protocols written by humans, generate a single command or a complete protocol based on a description formulated in natural language, and turn questionnaires (pasted as text or a picture, e.g., screenshot) into PoLA slides. Although more feedback from more varied users is needed, we found HeLA knowledgeable, creative, and accurate in writing study protocols for PoLA. HeLA also provides adequate training in how the app works and how to use it effectively.
As GPTs require a paid service (as of March 2024), we also provide the full prompt (8000 characters) on GitHub. Users can paste this prompt into other LLMs as the initial message and then ask questions, achieving similar functionality.
Scripting in natural language
A unique feature of HeLA is that it generates complete protocols based on a description of the study formulated in natural language. The majority of the protocols HeLA generates are fully functional. Figure 1 presents the prompt, HeLA's response (including the protocol along with informative comments), and slides generated by PoLA based on the protocol copied to a .txt file and uploaded to PoLA. This indicates that HeLA can accurately interpret and transform a study's description into a comprehensive and operational protocol. It also illustrates a case in which the content delivered to participants is specified exclusively in natural language, with HeLA responsible for transforming it into a protocol that PoLA understands. This synergy between HeLA and PoLA indicates the innovative benefits of using LLMs to advance the research process.
Converting scales from text or image to PoLA slides
HeLA generates an adequate PoLA slide based on a copy of the scale pasted as text or as an image. HeLA selects an adequate command, e.g., INPUTFIELD or MULTISCALE, and fills in all necessary command fields. This functionality is beneficial as it saves time and helps choose the most adequate method from the variety of PoLA's commands. It is also less prone to typos and syntax errors than manual entry. Consequently, HeLA's functionality promotes a more streamlined, reliable, and practical approach to slide generation based on prompts that include pre-existing measures.
Creativity in Finding Workarounds
HeLA displays creativity in finding solutions. For instance, we presented HeLA with a picture of a survey with checkboxes, a functionality not present in the current PoLA version. Rather than refusing to translate the request into a PoLA slide, HeLA suggested an effective workaround using a set of INPUTFIELD or MULTISCALE commands to achieve the same aim of collecting binary answers:
HeLA Response: However, for checkbox-style multiple-choice questions as seen in your image, PoLA currently doesn't support this directly in the same way. Each game could be listed as an input field as a workaround, or you could modify the study protocol to capture this information differently, perhaps using multiple MULTISCALE commands with single options. Please note, this code assumes that you'll be inputting text for each game played, which is not how checkboxes work typically, but it's a way to adapt to the current capabilities of PoLA. (Content generated with ChatGPT 4.0 using GPTs with a custom instruction presented in PoLA’s documentation).
HeLA's teaching and support capabilities
Users might discuss several areas related to PoLA use with HeLA. For instance, HeLA can explain whether PoLA is an adequate choice for their research purposes based on the provided study description. We found that HeLA provides adequate information and aims to propose workarounds if a function is not available. However, when asked explicitly, HeLA balances PoLA's pros and cons and offers more suitable software choices. This is because while HeLA has expert knowledge and a comprehensive understanding of PoLA, it also has access to knowledge about other software contained in the LLM. HeLA also explains in detail how to write protocols and transfer them from PC to PoLA. We found HeLA a valuable first line of support, and in most cases we explored, it was sufficient to resolve issues related to PoLA use. This indicates that using LLMs could improve research by providing researchers with immediate, comprehensive, interactive, and generative support, and PoLA is pioneering this type of progress.
Protocol revision
HeLA is efficient in checking for possible errors in the protocol. First, HeLA identifies obvious typos (in the commands, syntax, or item text). Second, HeLA identifies an inadequate choice of scales. Third, HeLA also identifies more complex problems. For instance, it identified a wrong time value in TIMER by comparing the instruction presented to participants with the numeric field that gives the time in seconds. While reviewing the protocol, HeLA wrote: "The command you've provided for the TIMER function in PoLA seems to have an incorrect time value. The time should be in seconds, and for 5 minutes, it should be 300 seconds."
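The TIMER discrepancy described above can also be caught with a simple rule-based check, sketched below as a complement to HeLA's review rather than a description of how HeLA works. The sketch assumes, hypothetically, that the last field of a TIMER line holds the duration in seconds and that the instruction text states the duration in minutes; the function name `check_timer_line` is illustrative.

```python
import re

# Matches phrases such as "5 minutes", "5 minute", or "5-minute".
MINUTES_PATTERN = re.compile(r"(\d+)\s*-?\s*minutes?", re.IGNORECASE)


def check_timer_line(line):
    """Return a warning if the stated minutes and the seconds field disagree, else None."""
    parts = [part.strip() for part in line.split(";")]
    if parts[0] != "TIMER" or len(parts) < 3:
        return None
    # Assumption: the last field is the duration in seconds.
    text, seconds_field = " ".join(parts[1:-1]), parts[-1]
    match = MINUTES_PATTERN.search(text)
    if match is None or not seconds_field.isdigit():
        return None
    expected = int(match.group(1)) * 60
    if int(seconds_field) != expected:
        return (
            f"TIMER is {seconds_field} s but the text says "
            f"{match.group(1)} min ({expected} s)"
        )
    return None


warning = check_timer_line("TIMER;Rest;Please rest for 5 minutes.;5")
no_warning = check_timer_line("TIMER;Rest;Please rest for 5 minutes.;300")
```

A check like this covers only one narrow pattern; it does not replace a review that weighs the protocol's overall intent.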
As with all coding, providing comments in natural language for each command is a good practice for further code modification or reuse. HeLA also benefits from natural language comments while reviewing protocols: it compares the intents expressed in the comments with the subsequent technical commands and identifies discrepancies. For instance, when the researcher's comment stated an intent to present items in alphabetical order but the subsequent command randomized the order of item presentation, HeLA identified this problem and suggested an adequate revision. Such descriptions with natural language comments followed by commands can also be used in further training of HeLA through LLM fine-tuning.
Performance
PoLA was extensively tested in a psychophysiological experiment on geolocation mobile gaming (Kaczmarek et al., 2024). We used lab phones to guide participants through sessions and collect self-reports while they concurrently played Pokémon Go using their private smartphones. This provided data with little interference with the player's gaming experience, as the players did not need to switch the apps on their phones or install any applications on their smartphones. Players volunteered in couples, and each individual participated in two gaming sessions, resulting in 80 cases of PoLA use. Each outdoor session lasted about 40 minutes and was preceded by a 5-minute resting baseline measurement in the lab, also guided by PoLA. This resulted in around 60 hours of PoLA's working time. During each session, participants were notified every ten minutes to answer scales about cognitions and emotions during gaming. Before and after the session, participants responded to longer questionnaires.
Of interest to the scope of the present paper are two control questions regarding the formal aspects of using PoLA, which we collected to learn more about the technical aspect of our study and the feasibility and acceptance of our measurement approach. To measure comfort and convenience, participants rated their agreement with the statement "Using the lab phone was convenient" on a scale from 1 (complete or almost complete disagreement) to 9 (complete or almost complete agreement). As participants had to switch every 10 minutes from their phone to the lab phone (carried in a pouch strapped to their belts, along with a cardiovascular monitor), we also asked to what extent they agreed that the frequent switching between smartphones was acceptable ("The frequency of measurement was acceptable"). We found that the smartphone-guided outdoor experiment with PoLA was convenient, and the participants widely accepted the switching between smartphones (Table 1).
The limitation of these questions is that we did not ask explicitly about PoLA. Instead, we asked about the overall study experience, in which participants used PoLA deployed on a smartphone. However, the combination of these two factors (being guided through the experiment by a smartphone and using the specific application) produced high ratings. This suggests that neither factor was unacceptable to participants; otherwise, the combined rating would have been low.
Table 1. Participant feedback on PoLA's usability
| Item | Range | M | SD |
| --- | --- | --- | --- |
| (Overall convenience) Using the lab phone was convenient | 4-9 | 7.90 | 1.20 |
| (Switching frequency) The frequency of measurement was acceptable | 2-9 | 7.55 | 1.57 |
Note. Scale for each question 1-9.
Limitations
There are some limitations to the application to keep in mind. First, PoLA requires direct contact between the participant and the research assistant, who presents the lab phone to the participant, instructs them on how to use it, and launches the application. The feasibility of such an approach has already been reported (Meers et al., 2020). However, PoLA is not feasible for studies where participants cannot have direct contact with the researcher.
Second, as is the case for most software, crashes can happen, breaking the flow of the experiment. Crashes are plausible given the complex procedures Android introduced for managing energy consumption and privacy. We found such cases marginal and in no way more frequent than for other laboratory-based research software we have used. However, the PI, research assistants, and participants should be aware of this risk. Especially for outdoor studies, participants should be informed of the possibility of an app crash and provided with solutions for such scenarios, e.g., calling or returning to the laboratory once a malfunction is detected. This is imperative because PoLA provides instructions that are otherwise unavailable. Moreover, if the study takes place outside the lab, the participants must be informed in detail on how to behave in different scenarios.
Third, the current version offers a sufficient solution for running moderately complex field experiments. While the app offers flexibility in protocol design and study implementation, there may be constraints on how the app can be customized to meet more complex research needs. Complex experimental designs or highly specialized tasks may require functionalities beyond what the app offers. The application was developed to minimize its layers of complexity. Although this facilitates the app's adoption, it might be necessary to move to a more complex application when more advanced solutions for navigating the study, such as branching, are needed. Moreover, additional slide types for data collection could be made available. For instance, the software does not support pictures and videos, which might be helpful, e.g., to provide more detailed instructions. Depending on researchers' and participants' feedback, future updates might extend this and other functionalities, as the application is scalable.
Fourth, the software generates a straightforward output for each participant. However, aggregating the data for all participants needs manual editing and merging, which can be time-consuming for questionnaires with a randomized order, extensive study protocols, or large numbers of participants. Such data outputs must be carefully processed, inspected, and double-checked for integrity. Future updates might provide a more automatic merging process across participants.
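Until automatic merging is available, the manual aggregation step could be partly scripted. The sketch below stacks rows from all per-participant .csv files in one folder and tags each row with its source file name. It is an illustration under assumptions: the file names, the row layout, and the helper `merge_outputs` are hypothetical, and the demonstration files are created in a temporary folder rather than read from PoLA_Data.

```python
import csv
import tempfile
from pathlib import Path


def merge_outputs(folder):
    """Stack rows from every .csv file in folder, tagging each row with its file name."""
    merged = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # PoLA's .csv files are described as tab-separated.
            for row in csv.reader(handle, delimiter="\t"):
                if row:
                    merged.append([path.stem] + row)
    return merged


# Demonstration with two made-up participant files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "DEMO_p01.csv").write_text("10:00:01\tSCALE\tcalm\t4\n", encoding="utf-8")
    Path(tmp, "DEMO_p02.csv").write_text("10:05:01\tSCALE\tcalm\t7\n", encoding="utf-8")
    merged_rows = merge_outputs(tmp)
```

Keeping the data in this long format, one row per response, means that later grouping can rely on the slide details recorded in each row rather than on row position, which may help with randomized item orders. The merged result should still be inspected for integrity, as noted above.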
Fifth, while the app incorporates measures to ensure data security and participant privacy, the inherent risks associated with digital data collection cannot be eliminated. Potential vulnerabilities could arise from device theft, unauthorized access, or data breaches, posing concerns for the confidentiality of sensitive information. It is imperative to address the data security and privacy concerns according to institutional and regional regulations where the study is conducted.
Sixth, the application allows for the design and running of experiments in all languages thanks to the use of UTF-8. This encoding system can represent characters from all the world's writing systems. However, the interface and documentation are in English. This might be inconvenient to researchers who use other languages.
Finally, HeLA introduces an innovative approach to interfacing with research software through artificial intelligence and natural language. However, the use of HeLA should be considered experimental at the current stage. It is crucial to acknowledge the limitations inherent to LLM technologies that also influence HeLA. HeLA suffers from general LLM shortcomings, such as the risk of providing convincing but inaccurate suggestions or "hallucinations" in some scenarios. For instance, HeLA generates correct protocols when the user's request is within PoLA's capabilities. Nevertheless, HeLA's creative approach to finding workarounds also leads to far-fetched solutions, where denying a particular function would be more correct. For instance, we found that HeLA might suggest ways to run experiments (e.g., a Stroop task), although PoLA lacks such capabilities. Researchers are advised to verify HeLA's final outputs against official documentation, taking full responsibility for the accuracy of the experimental protocols provided to participants. Additionally, LLMs often use user inputs for further training, which raises privacy and intellectual property concerns. Researchers should be cautious not to include copyrighted or sensitive material in their prompts to LLMs to prevent unauthorized use or training on proprietary data.
Further directions
Future updates could involve more complex instructions (e.g., videos), self-reports (e.g., graphical scales), and logic (e.g., branching). Moreover, more graphical customization might be provided as the present version offers two modes only (dark and light). In the long term, future updates of PoLA could utilize smartphone sensors, including accelerometers, gyroscopes, GPS, ambient light sensors, and microphones. Future development should also focus on exploring and optimizing researchers' interactions with research software using natural language. Finally, we hope to increase community involvement and collaborative development of PoLA and HeLA. Collaboration with researchers, developers, and the user community is essential for developing free research resources. This collaborative approach will ensure that PoLA evolves according to the research community's needs in the most beneficial and relevant ways.