What does social support sound like? Challenges and opportunities for using passive episodic audio collection to assess the social environment

doi:10.21203/rs.3.rs-40879/v1

Background: The social environment, including social support, social burden, and quality of interactions, influences a range of health outcomes, including mental health. Passive audio data collection on mobile phones (i.e., episodic recording of the auditory environment without requiring any active input from the phone user) creates new opportunities to understand the social environment. We evaluated the use of passive audio collection on mobile phones as a window onto the social environment within a study of mental health among adolescent mothers in Nepal.

Methods: We enrolled 23 adolescent mothers, who first participated in qualitative interviews to describe their social support and identify sounds potentially associated with that support. Episodic recordings were then collected from the same women for two weeks using an app that captured 30 seconds of audio every 15 minutes from 4 AM to 9 PM. Audio data were processed and classified using a pretrained model, and each classification was accompanied by a predicted probability score. Approximately 10% of the machine-predicted speech and non-speech classifications were manually validated for accuracy.

Results: In qualitative interviews, mothers described a range of positive and negative social interactions and the sounds that accompanied them. Potential positive sounds included adult speech and laughter, baby babbling and laughter, and sounds from baby toys. Sounds characterizing negative stimuli included adult yelling, crying, and screaming, and infant crying. Sounds associated with social isolation included silence and TV or radio noise. Of 7725 passively recorded audio clips, 2318 were classified as speech, accounting for 43% of the clips in the ten most common sound categories (n=5433). Manual validation showed that 27% of machine-predicted speech clips did not contain speech and 62% of machine-predicted non-speech clips did contain speech, suggesting that speech exposure was likely underestimated. Other common sounds included music and vehicle noise.

Conclusions: Passively capturing audio has the potential to improve understanding of the social environment. However, the limited accuracy of the pre-trained model used in this study did not adequately distinguish between positive and negative social interactions. To improve the contribution of passive audio collection to understanding the social environment, future work should improve the accuracy of audio categorization, code for constellations of sounds, and combine audio with other smartphone data collection such as location and activity.

Medical Informatics

Digital health

passive audio data

depression

mental health

adolescents

adolescent mothers

The social environment, including social support, social burden, and quality of interactions, is important for a range of health outcomes, including mental and behavioral health. Social support has been shown to be one of the most important factors for mental health outcomes (1). For example, in maternal depression, social support is one of the most important predictors of severity and duration (2). Social isolation and loss of social group membership are considered risk factors for postpartum depression (3, 4), and changes in perceived social support influence the course of depression (5). Women who have positive and affectionate social support constantly available are significantly less likely to experience postpartum depression than those for whom it is available only sporadically (6). Overall, postpartum support has been reported to improve infant and maternal well-being (7). The transition to motherhood is often psychologically stressful for adolescent women, and social support facilitates a smooth transition and supports mothers’ emotional well-being (8). Social support reduces morbidity and mortality, exposure to health hazards, and psychological stress (9). Ultimately, social support is an important factor in reducing the incidence of postpartum depression and other health issues, particularly among adolescent and first-time mothers.

Investigators have used different methods to assess and measure social support among mothers. For example, functional social support refers to informational, emotional, instrumental, and appraisal support, whereas structural support encompasses both formal (e.g., from doctors, nurses, and midwives) and informal (e.g., from husbands, mothers, fathers, and siblings) support (9). Other approaches assess interactions between family and community members, satisfaction with the support received, and specific types of support, such as child care (10). Another widely used framework in the context of postpartum depression divides social support into three dimensions: emotional, instrumental, and informational support (6, 11, 12).

Assessing social support presents many challenges. Self-report measures are the dominant approach to quantifying social support, but they are often unreliable and subject to social desirability bias. More objective approaches are needed to understand social environments, identify risk factors, and target and track social domains for change over time. In addition, social support is a multidimensional construct without a single, universally agreed-upon definition, which makes it difficult to quantify. This creates challenges when interpreting the outcomes of interventions that specifically target social support as the hypothesized mechanism of action for recovery from maternal psychological distress (10).

Passive sensing data are increasingly used in mental health research (13, 14). Passive sensing refers to capturing information about users without their active input while they go about their daily lives (15, 16). One common approach is smartphone-based passive sensing: Global Positioning System (GPS) location, physical activity and movement, and the amount of time a device or app is used (e.g., web-based social activity) are a few examples of passively collected data (17, 18). Another form of passive sensing is episodic audio recording, which captures brief snippets of the audio environment (19). Such recordings can help identify speech and non-speech audio stimuli around users that may indicate social support and interactions. Machine learning can be used to categorize these sounds, which helps preserve confidentiality because neither transcription nor direct listening to the audio is required. The output of machine learning can then be used to quantify the timing and frequency of different types of audio to understand the social environment (20). Thus, passive audio data can provide important and unique insight into a mother’s social environment through ongoing data collection.

To explore the potential for passive audio collection as a window onto the social environment, we conducted a two-part study. First, we conducted qualitative interviews with adolescent mothers about their social experiences to identify sounds potentially associated with social support. Second, we collected audio for two weeks on mobile phones and used machine learning to quantify the frequency of different types of sounds. We combined the qualitative and passive audio collection findings to identify opportunities and challenges for future research on social support in the home environment.

Setting

The current study was part of a broader initiative to improve mental health services in rural Nepal (21), including a special focus on the use of technology to improve maternal mental health for adolescents (19). We conducted the study in Chitwan, a district in southern Nepal with a population of 579,984. Chitwan has an under-5 mortality rate of 38.6 per 1,000 and a literacy rate of 78.9% (22). Chitwan has been the center of district-wide scaling-up of community-based mental health services in Nepal, and as a result there is an established partnership with the local health system (23). We conducted the study in seven health facilities selected on the basis of the number of postnatal mothers visiting the facility and the availability of mental health treatment. Data were collected between November 2018 and April 2019.

Study population and sampling

We recruited 23 study participants at health posts during infant immunization camps. Only mothers aged 15-25 years with infants younger than 12 months were approached to participate. After obtaining the mothers’ consent, we administered a depression screening tool, the Patient Health Questionnaire (PHQ-9), to assess depression symptoms. The PHQ-9 is a common, well-validated tool that measures depression symptom severity across 9 items, each with four response options scored from 0 to 3, giving a total score from 0 to 27. It has been used in prior studies in Nepal and validated in primary care settings in Chitwan. At a cutoff score of 10 or more, the PHQ-9 has a sensitivity of 94% and a specificity of 80%, with an internal consistency (Cronbach’s alpha) of 0.84 (24). Once the mother had been screened for depression and had consented, a study team member visited her home to obtain family consent. For participants under 18 years of age, we required a written parental permission form and an assent form. The study was approved by the Nepal Health Research Council (#327/2018) and George Washington University’s Institutional Review Board (#051845).

Both depressed and non-depressed mothers were included in the overall study. However, the analyses presented in this paper include only non-depressed adolescent mothers (PHQ-9 < 7), in order to provide a reference point for understanding social support and the auditory environment that can serve as a comparison for later analyses that also include depressed mothers.
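The scoring rule and the two thresholds described above can be made concrete with a short Python sketch. It assumes responses are stored as nine integers coded 0-3; the function names and data layout are hypothetical and are not taken from the study’s tools.

```python
# Minimal sketch of PHQ-9 scoring; names and data layout are hypothetical.
SCREEN_CUTOFF = 10   # screen-positive at a total score of 10 or more
REFERENCE_MAX = 7    # current analyses keep mothers with a total below 7


def phq9_total(responses: list[int]) -> int:
    """Sum nine item scores, each coded 0-3, into a total between 0 and 27."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(responses)


def classify(responses: list[int]) -> dict:
    """Apply the screening cutoff and the reference-group threshold."""
    total = phq9_total(responses)
    return {
        "total": total,
        "screen_positive": total >= SCREEN_CUTOFF,
        "in_reference_group": total < REFERENCE_MAX,
    }


print(classify([1, 0, 1, 0, 2, 0, 1, 0, 0]))
# {'total': 5, 'screen_positive': False, 'in_reference_group': True}
```

With these two thresholds, a mother scoring 7, 8, or 9 is neither screen-positive nor part of the non-depressed reference group.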

Study procedure and data collection

Qualitative data

We conducted in-depth interviews (IDIs) and administered the Day-in-life tool (n=23) to qualitatively assess mothers’ social environments. We contacted mothers within 2-3 days of first meeting them at health posts to conduct the IDIs. Female research assistants used semi-structured interview guides to conduct IDIs, which were intended to last 30-45 minutes. Questions explored the experience of motherhood for adolescent mothers, including their interactions with the child. Mothers were asked about their support system and their social environment, including social responsibilities. We also asked about coping and help-seeking behavior, as well as negative social interactions. Besides IDIs, we administered the Day-in-life tool to understand an adolescent mother’s typical day: we asked mothers what they did the day before and recorded their responses. We also included field notes from each participant encounter in the data analysis. Field notes provided cultural and environmental descriptions of mothers’ surroundings. They also included reflections from two observation tools administered during the study period: the Home Observation for Measurement of the Environment (HOME) (25, 26) and the Observation of Mother Child Interaction (OMCI) (27), which were used to draw impressions of the mother’s surroundings and her interaction with the baby to complement the qualitative findings. The Consolidated Criteria for Reporting Qualitative Research (COREQ) checklist is included as a supplemental file (28).

Passive audio data

Prior to the current study, we evaluated the acceptability and feasibility of passive data collection in low-resource settings (29). Based on this work, we selected passive sensing modalities that would be feasible and would contribute to improving maternal and child mental and behavioral health, including episodic audio recording on mobile phones. For this study, we used the Samsung J2 Ace smartphone to collect passive audio data. At US $160, it was slightly more expensive than the low-end mobile phones commonly used in Nepal, which generally cost US $70-$120, but it was affordable, popular in the study setting, and widely available for purchase within Nepal. It was also the cheapest option that could effectively run all the features and apps required for the study. In the study setting, most individuals already own mobile phones or have family members who own them.

Mothers were provided with the phone for the duration of the study and returned it after data collection ended. To collect audio data, we installed our custom-built Electronic Behavior Monitoring app (EBM version 2.0). The EBM app used the phone’s microphone to passively record a 30-second audio clip every 15 minutes between 4 AM and 9 PM, saving each recording in m4a format. When installed, the EBM app automatically created a folder (Namaste), and all audio data were stored in the folder “Audio” inside it. Every week, a research assistant collected the audio data from the mother’s phone and uploaded it to a secure cloud database, where it was processed for sound classification by a machine learning system. In our previous studies, we produced a video explaining these data collection processes to potential study participants (29), including information on confidentiality management such as deleting audio files (30). In the current data collection, mothers could delete audio files before research assistants removed the data from the phones. Mothers were also told that they could turn off the phone at any point during the day to ensure confidentiality or address other data collection concerns.
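The sampling schedule above sets an upper bound on how much audio a phone can produce per day. The Python sketch below enumerates clip start times under one assumption the text does not settle: that the window is half-open, so a clip starts at 4:00 AM but not at 9:00 PM. The function name and parameters are illustrative and are not part of the EBM app.

```python
from datetime import datetime, timedelta

CLIP_SECONDS = 30  # length of each recorded clip


def recording_slots(day, start_hour=4, end_hour=21, every_min=15):
    """Clip start times in the half-open window [start_hour, end_hour)."""
    t = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = day.replace(hour=end_hour, minute=0, second=0, microsecond=0)
    slots = []
    while t < end:
        slots.append(t)
        t += timedelta(minutes=every_min)
    return slots


slots = recording_slots(datetime(2019, 1, 7))
print(len(slots))                         # 68 clips per day
print(len(slots) * CLIP_SECONDS / 60)     # 34.0 minutes of audio per day
print(slots[0].time(), slots[-1].time())  # 04:00:00 20:45:00
```

If the app also records at 9:00 PM, the ceiling rises to 69 clips. Comparing this ceiling with the number of files actually recovered per day is one possible way to gauge how often phones were switched off or clips were deleted.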

Data analysis

Qualitative data

IDIs were conducted and transcribed in Nepali. A bilingual translator then translated the Nepali transcripts into English, preserving culturally meaningful words, using an established Nepali-English translation procedure and mental health glossary (31). The interview transcripts were combined with field notes and read independently by two researchers (AH, AP) to generate common themes. AH and AP then developed a preliminary codebook, which was modified by AH, DL, and AP and further refined while establishing intercoder agreement; an intercoder agreement of 0.70 was attained. Three researchers (AP, AH, DL) then independently coded the interview transcripts in NVivo 12 (32). After coding, two researchers (CI, AT) wrote code summaries for each domain following a thematic approach (33, 34). Summaries were revised after discussion among several authors (AP, AT, CI, BK). Table 1 shows the domains, themes, and related codes explored in this study.
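The text does not say which statistic the 0.70 intercoder agreement refers to. NVivo’s coding comparison query reports Cohen’s kappa, so, assuming that statistic, the Python sketch below shows how kappa corrects raw percentage agreement for agreement expected by chance. The coder data are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one label per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of segments where both coders chose the same label.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both coders used one identical label for every segment
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical example: 1 = code applied to a transcript segment, 0 = not applied.
coder_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
coder_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.6 despite 80% raw agreement
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance alone, which is why it is the more conservative of the two figures.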

Table 1: Qualitative themes and related codes

Domain	Qualitative themes	Related codes
Interaction	Adult interactions	Positive social interaction; Negative social interactions
	Mother-child interaction	Positive mother-child interaction; Negative mother-child interaction
Support	Emotional support	Mother’s emotional support; Positive emotions; Negative emotions; Coping/help seeking; Isolation/No social interaction; Trust
	Instrumental support	Support in household chores; Childcare support; General help; Coping/help seeking
	Informational support	Informational support; Coping/help seeking
Others	Recreational events	Positive social interactions; Non-human interaction
	Social/religious events	Common social interactions; Positive social interactions; Negative social interactions
	Physical activities/exercise	Movement
	Sporting activities	Movement
	Stressful noises	Violence

Passive audio data

On average, about 40 raw audio files were generated each day. Research assistants downloaded the files from the device and uploaded them to a secure server, where they were processed using a pre-trained machine learning model based on a convolutional neural network architecture (VGGish) available as part of the AudioSet project (35). AudioSet, released by Google in 2017, is a large-scale dataset of audio events constructed from millions of manually annotated 10-second YouTube video clips. Its 632 audio event classes are organized as a hierarchical graph of event categories (e.g., animal -> dog -> barking) covering human and animal sounds, musical instruments, and common environmental sounds. For each clip, our model saved the classification label with the highest probability score, along with date and time stamps, to a comma-separated values (.csv) file, with one file per week of audio clips; these files were used for analysis. The data were then cleaned to remove data from mothers who dropped out of the study and to fix outliers introduced by readings that lacked a valid date-time value. Standard exploratory data analysis was performed, including calculation of measures of central tendency, ranges, and missing value analysis. Plots and tables were produced on the full dataset (4 AM to 9 PM). We manually validated about 10% of the audio clips: a research assistant listened to randomly selected clips to check whether the machine-predicted speech and non-speech labels were accurate.
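The post-processing step (keeping only the highest-probability label per clip and writing it with its time stamp to a weekly .csv file) can be sketched in Python with the standard library. Everything here is an assumption for illustration: the clip dictionaries, participant IDs, time-stamp format, and column names are hypothetical, and the per-class scores would come from the classifier, which is not reproduced. Label strings follow AudioSet class names. Clips with an unparseable time stamp are simply dropped in this sketch, whereas the study reports fixing them.

```python
import csv
import io
from datetime import datetime


def top_label(scores):
    """Return the (label, probability) pair with the highest score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def parse_timestamp(raw):
    """Parse 'YYYY-MM-DD HH:MM:SS'; return None if missing or malformed."""
    try:
        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        return None


def weekly_csv(clips, dropped_participants):
    """Write one row per usable clip and return (csv_text, rows_written)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["participant", "date", "time", "label", "probability"])
    written = 0
    for clip in clips:
        if clip["participant"] in dropped_participants:
            continue  # remove data from mothers who left the study
        ts = parse_timestamp(clip.get("timestamp"))
        if ts is None:
            continue  # this sketch drops clips without a valid date-time
        label, prob = top_label(clip["scores"])
        writer.writerow([clip["participant"], ts.date().isoformat(),
                         ts.time().isoformat(), label, f"{prob:.3f}"])
        written += 1
    return buffer.getvalue(), written


clips = [
    {"participant": "P01", "timestamp": "2019-01-07 04:00:00",
     "scores": {"Speech": 0.62, "Music": 0.21, "Vehicle": 0.05}},
    {"participant": "P01", "timestamp": "",
     "scores": {"Speech": 0.40, "Music": 0.50}},
    {"participant": "P02", "timestamp": "2019-01-07 04:15:00",
     "scores": {"Music": 0.80, "Speech": 0.10}},
]
text, n_rows = weekly_csv(clips, dropped_participants={"P02"})
print(n_rows)  # 1
print(text)
```

Storing only the top label keeps files small but discards the runner-up classes; saving the full score vector would allow later re-thresholding if classification accuracy turns out to be poor.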

Qualitative results

We assessed mothers’ social environments to identify a) the individuals present in mothers’ social environments and b) speech and non-speech sounds that reflect mothers’ interactions, support, and other social domains (Table 1). Codes in parentheses correspond to the qualitative reference numbers for the quotes provided in Additional file 1.

Based on the qualitative interviews, a typical day started at around 6 AM and included taking care of the baby (e.g., changing or breastfeeding) and household chores such as making morning tea for the family. Mid-morning usually included preparing meals for the family. Husbands in nuclear families, and older female family members such as mothers-in-law and sisters-in-law, usually took on infant caretaking when mothers were busy with household chores. Mid-morning to noon usually included washing dishes after breakfast, washing clothes, and cleaning. In the afternoons, mothers usually rested with the baby and watched television or YouTube videos. This was also when mothers interacted with female family members, neighbors, and friends; male household members were typically out of the house working or socializing during this time of day. Female family members, in contrast, were at home throughout the day and interacted in the afternoon after household chores and around afternoon tea. Some mothers reported using their mobile phones to call distant family and friends during the afternoon. Evenings were much like mornings, with mothers cooking and washing dishes until 7 or 8 PM. A typical day ended with mothers putting the baby to sleep and spending time talking to their husbands or family members. Mothers then went to sleep with their infants; co-sleeping was common. For the family as a whole, mornings and evenings were the most social times, when family members came together to eat meals.

Social Interactions

In the qualitative interviews, mothers described their social interactions throughout the day. We coded the qualitative accounts for potential sources of audio stimuli that could be associated with the presence and quality of these interactions.

Adult positive and negative social interactions

Positive social interaction was strongly related to mothers’ increased comfort with and trust of family members. Adult conversations often concerned matters related to childcare (Social interaction code #1, SOC_01). Besides immediate family members, mothers also interacted with their neighbors, usually around babysitting or playing with the infant (SOC_02). Although non-family members were periodically involved in these roles, mothers spent most of their time interacting with immediate family members. Negative social interactions such as yelling and screaming were reported in households where husbands or other family members had alcohol use problems (SOC_03). In such households, besides more frequent negative interactions, there were few positive adult sounds (laughter, adult conversations), because mothers reported limiting their conversations even when family members were sober. Mothers were also likely to turn their mobile phones off when there were arguments at home.

Therefore, audio stimuli that may indicate positive social interactions include adult conversations, gender-specific conversations, and laughter, whereas adult yelling, screaming, and crying may indicate negative interactions.

Mother-Child positive and negative interactions

Most of the positive mother-child interactions mentioned involved the mother playing with the infant (SOC_04). Mothers expressed joy and satisfaction when they played with their baby, often with toys and rattles, and heard their baby laughing or babbling (SOC_05). Besides direct interaction, mothers enjoyed taking pictures and videos of the baby. Negative mother-child interactions often went unreported in recorded interviews because mothers feared they would be considered bad mothers if they said they beat or yelled at their children. These incidents were instead recorded in observational tool assessments and field notes. Through the observational assessments (HOME and OMCI), we also observed mothers’ speech (items 2-8), physical punishment (items 12, 16), and hostile speech (items 14, 15, 17, 18) as sounds present in the social environment. Similarly, the OMCI provided information on mothers’ positive and negative speech (items 1-6, 8-11), along with positive and negative infant reactions (items 14-16), such as laughing, babbling, and squealing, when with their mothers.

Therefore, adult sounds such as the mother’s speech and laughter, in conjunction with child babbling or laughter, can indicate positive mother-child interaction. Negative mother-child interaction can include audio stimuli such as the mother screaming and crying along with the child crying and screaming. Non-speech sounds such as baby rattles and toys, together with adult and infant-related sounds, can also indicate mother-child interaction.

Support

We explored this domain in terms of supportive or non-supportive outcomes of social interactions. Unlike the interaction domain, we studied support beyond the mere presence or absence of certain sounds in the mother’s environment. In this domain, we explored constellations of sounds, that is, combinations of sounds that might indicate a supportive or burdensome social environment for the mother.
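Because the classification pipeline keeps a single label per 30-second clip, a constellation of sounds can only emerge by pooling clips over a longer window. The Python sketch below groups clips into hourly windows and flags a window when every group in a rule is represented. The rules, the one-hour window, and the function names are hypothetical operationalizations of the qualitative findings (for example, adult speech or laughter together with infant babbling or laughter for positive mother-child interaction), not validated definitions; label strings follow AudioSet class names.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rules: a window matches when it contains at least one label
# from every group listed for that constellation.
CONSTELLATIONS = {
    "possible_positive_mother_child": [
        {"Speech", "Laughter"},
        {"Babbling", "Baby laughter"},
    ],
    "possible_negative_mother_child": [
        {"Yell", "Screaming", "Crying, sobbing"},
        {"Baby cry, infant cry"},
    ],
}


def window_start(ts, minutes=60):
    """Floor a time stamp to the start of its window within the same day."""
    floored = (ts.hour * 60 + ts.minute) // minutes * minutes
    return ts.replace(hour=floored // 60, minute=floored % 60,
                      second=0, microsecond=0)


def flag_constellations(clips, minutes=60):
    """Map each window start to the constellations whose groups all appear."""
    labels_by_window = defaultdict(set)
    for ts, label in clips:
        labels_by_window[window_start(ts, minutes)].add(label)
    return {
        window: [name for name, groups in CONSTELLATIONS.items()
                 if all(labels & group for group in groups)]
        for window, labels in sorted(labels_by_window.items())
    }


clips = [
    (datetime(2019, 1, 7, 9, 0), "Speech"),
    (datetime(2019, 1, 7, 9, 30), "Babbling"),
    (datetime(2019, 1, 7, 14, 15), "Speech"),
]
for window, hits in flag_constellations(clips).items():
    print(window.strftime("%H:%M"), hits)
# 09:00 ['possible_positive_mother_child']
# 14:00 []
```

Window length is a trade-off: at one clip every 15 minutes, a one-hour window holds at most four clips, while longer windows risk pairing sounds that were never part of the same interaction.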

Emotional support and burden

Family members, especially husbands, were the primary source of emotional support. Mothers generally considered “talking about her feelings” or “sharing problems with her family” an important coping mechanism (Support Code #1, SUPP_01). Mothers’ conversations with family members, combined with positive sounds such as laughter and singing, may indicate mothers’ emotional environment. Besides family, neighbors and health care professionals were other important sources of emotional support (SUPP_02). The role of neighbors and health care professionals was even more prominent when the mother had strained family relationships and received less emotional support from family members. Frequent conversations between mothers and people outside the family, infant sounds (laughing, crying, squealing) along with the voices of non-family adults, and positive or negative adult sounds (laughter, crying) could indicate that mothers were seeking support outside the family.

Social isolation may be another important indicator of a mother’s emotional burden. Isolation, or a lack of social interaction, was prevalent among mothers who were generally worried about their children’s future, because of either financial difficulty or family problems. If family members were the primary source of stress, mothers were not comfortable sharing these thoughts with them. This emotional burden also led mothers to feel negatively about the baby, especially when the baby did not sleep or cried a lot. A complete lack of human speech for a prolonged period could indicate social isolation (SUPP_03, SUPP_04). The presence of machine-generated speech such as TV or radio without human speech for a prolonged period could be another important indication of isolation.
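A prolonged absence of human speech could be operationalized as the longest run of consecutive clips without a speech label. The Python sketch below computes that run for one day’s chronologically ordered labels and counts how many clips in the run were TV or radio. The label sets are hypothetical choices modeled on AudioSet class names, and the sketch assumes an unbroken 15-minute series; gaps from switched-off phones or deleted files would need to break the run. Given the rate of missed speech found in the manual validation, such runs taken at face value would likely overstate isolation.

```python
# Hypothetical label sets, modeled on AudioSet class names.
HUMAN_SPEECH = {"Speech", "Conversation", "Male speech, man speaking",
                "Female speech, woman speaking", "Child speech, kid speaking"}
MEDIA = {"Television", "Radio"}


def longest_run_without_speech(labels, interval_min=15):
    """Longest run of consecutive clips with no human-speech label.

    Returns (clips_in_run, sampled_minutes, media_clips_in_run), where
    sampled_minutes approximates the time covered as clips x interval.
    """
    best_len, best_media = 0, 0
    run_len, run_media = 0, 0
    for label in labels:
        if label in HUMAN_SPEECH:
            run_len, run_media = 0, 0  # speech ends the current run
            continue
        run_len += 1
        run_media += label in MEDIA  # True counts as 1
        if run_len > best_len:
            best_len, best_media = run_len, run_media
    return best_len, best_len * interval_min, best_media


day = ["Speech", "Music", "Television", "Vehicle", "Speech", "Insect"]
print(longest_run_without_speech(day))  # (3, 45, 1)
```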

Hence, audio stimuli that may indicate emotional support or burden include positive and negative adult speech sounds (conversations, laughter, singing, crying, yelling), frequent adult speech sounds (family members, non-family members), the complete absence of speech sounds, and the presence of machine-generated speech sounds (such as TV or radio) without adult speech.

Instrumental support

Two kinds of instrumental support were important: childcare support and support with household chores. In nuclear families, husbands were strongly involved in providing instrumental support to their wives, while in joint families older female members such as mothers-in-law and sisters-in-law were equally engaged (SUPP_05). To understand instrumental support better, audio data could be used to detect the mother’s speech along with that of other family members in the house, especially around infant sounds (child crying, child laughing, babbling), indicating childcare support, or household chore sounds (washing, doing dishes, sweeping), indicating support with household chores.

As with emotional support, instrumental support was hindered when the mother did not have a good relationship with family members. Family relationships and the role of family in instrumental care were critical mainly because most mothers trusted only family members to take care of the baby in their absence. Strong family involvement was even more critical for young mothers, who were generally inexperienced in childcare and had to learn it either through experience or through demonstration by other family members. If this support was absent, mothers carried a significant instrumental burden in caring for the baby. If family members did not help with household chores, the mother had to take care of her child on top of her regular household work after childbirth. The lack of instrumental support meant that the mother was constantly occupied with both childcare and household chores.

A lack of instrumental support could be assessed by measuring the amount of household chore sounds (cooking, washing, sweeping) and infant-related sounds (breastfeeding, child bathing, oil massaging, child laughing, babbling, crying) in a typical day, especially by comparing how often these sounds occur alone versus together with adult speech, which might indicate the presence of other adults. Observation tools offer another snapshot of the mother’s social environment. For instance, HOME item 20 asks about childcare support provided by one of three regular substitutes, and HOME item 41 explores whether the father provides at least some form of childcare daily.

Therefore, instrumental support or burden can be understood through positive and negative speech sounds (adult conversations, adult laughter, adult yelling, crying) in conjunction with child-related sounds (child babbling, child laughter, oil massaging, child bathing) or household sounds (washing, sweeping, cooking).

Informational support

Given adolescent mothers’ lack of experience in childcare, informational support is another important supportive outcome of social interactions. Family members were crucial in providing instrumental support both in young mothers’ maternal homes (father, mother, sister) and in their husbands’ homes (mother-in-law, father-in-law, sister-in-law). Young mothers generally needed support with feeding, bathing, and oil massaging the baby. Culturally, oil massaging is a big part of infant growth and development in Nepal, and it was the task with which mothers needed the most support. Demonstration by an older female family member (mother-in-law, mother, sister, sister-in-law) (SUPP_06), along with informational support (SUPP_07, SUPP_08), was important in supporting young mothers. One way to capture informational support could be adult conversations within the household in conjunction with infant-related sounds (child bathing, breastfeeding, oil massaging). Very few mothers (n=4) said they learned to take care of the baby without any external informational support from family members, friends, or health care professionals (SUPP_09). Almost all mothers had some form of informational support, primarily from family members but also from health care professionals (n=8) and neighbors (n=6). Therefore, passive audio data that capture mothers’ conversations with non-family members, especially health care professionals, could provide a better understanding of mothers’ informational support outside the family.

Therefore, positive adult speech sounds (adult conversations, laughter) and frequent adult speech sounds (family members, non-family members) along with infant-related sounds (child bathing, breastfeeding, oil massaging) can indicate informational support.

Other social domains

We explored other common domains of mothers’ social environment to better understand the speech and non-speech sounds that constitute them.

Recreation

Young mothers were very comfortable using mobile phones to watch YouTube videos or use social media, mostly Facebook and IMO (a messaging application popular in the study area) (Other domains code #1, OTH_01). Mothers also often recorded videos on their phones or transferred and played songs and videos from friends’ or family members’ phones for recreation. Smartphones were important to mothers for recreation, whether to listen to songs or to converse with family and friends through messaging applications (OTH_02). Mothers also used messaging apps on their smartphones to talk to husbands who were away for employment, and their phones played audio alerts when they received instant messages.

Social or religious events

Social and religious events were a big part of mothers’ environment. These events consisted of religious functions, characterized by bells, singing bowls, and other religious instruments, and social functions such as weddings, with musical instruments and wedding bands. In both types of events, frequent speech from multiple adults can indicate the presence of many people nearby.

Physical and sporting activities

Movement, or physical activity, was a big part of adolescent mothers’ lives. Bicycling and walking around the neighborhood were common when running errands, and some mothers mentioned walking recreationally to reduce mental stress. Movement was highly restricted for the first few months after childbirth for both physical and cultural reasons: culturally, new mothers are often asked to stay home for at least the first six months after childbirth to facilitate breastfeeding and reduce the risk of infection. Environmental noises suggestive of outdoor activities include vehicles, birds, domestic animals, whistles, and activity sounds (bicycling and walking).

Stressful events

Some mothers described violence as part of their lives, and those reporting violence most often attributed it to their husbands. For these mothers, exposure to violence was long-term, often starting early in the marriage and continuing through pregnancy and after childbirth (OTH_03). In terms of the social environment, speech sounds (adults arguing, crying, screaming, yelling) and non-speech sounds (objects being thrashed or hit) could provide some indication of violence and disruptive family behavior. The frequency and amount of these exposures in the mother’s environment could provide stronger evidence of stressful events in her surroundings.

Pilot passive audio data collection

In this pilot study, we aimed to determine whether we could successfully collect passive audio data from adolescent mothers and whether these data could give preliminary insights into the mothers’ social environment. We successfully collected 14 days of audio data using the EBM app, installed with the mothers’ knowledge on the mobile phones provided to them. We used a pre-trained machine learning model to predict sound categories (speech, vehicle, music, etc.) and, from these predictions, determined the most common sounds in mothers’ social environment and their distribution throughout the day.
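The study’s recording protocol (30 seconds of audio every 15 minutes from 4 AM to 9 PM) sets the maximum number of clips each mother could contribute. The sketch below is not the EBM app’s code; it is a minimal Python illustration that enumerates one day’s scheduled clips, assuming that no clip starts at 9 PM itself (the text does not say whether a 9 PM clip was recorded).

```python
from datetime import datetime, timedelta

# Protocol parameters as stated for the study: a 30-second clip every
# 15 minutes from 4 AM to 9 PM. Treating 9 PM as an exclusive end
# (no clip starts at 9 PM) is an assumption made for this sketch.
CLIP_LENGTH = timedelta(seconds=30)
INTERVAL = timedelta(minutes=15)
START_HOUR, END_HOUR = 4, 21


def daily_schedule(day: datetime) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) time of every scheduled clip on `day`."""
    t = day.replace(hour=START_HOUR, minute=0, second=0, microsecond=0)
    stop = day.replace(hour=END_HOUR, minute=0, second=0, microsecond=0)
    slots = []
    while t < stop:
        slots.append((t, t + CLIP_LENGTH))
        t += INTERVAL
    return slots


if __name__ == "__main__":
    slots = daily_schedule(datetime(2000, 1, 1))  # arbitrary date
    print(len(slots))       # 68 scheduled clips per day
    print(len(slots) * 14)  # 952 scheduled clips per participant over 14 days
```

Comparing the clips actually received from each participant against this scheduled maximum gives a simple per-participant completeness check.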

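Table 2: Manual validation of machine-predicted speech and non-speech clips (n = 215 per predicted class)

Machine prediction	True speech	True non-speech
Speech	157 (73.0%)	58 (27.0%)
Non-speech	133 (61.9%)	82 (38.1%)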

A total of 319 sound categories were identified in 7725 audio clips collected from 23 participants, each recruited to participate for 14 days. Among clips assigned to the ten most common categories (n = 5433; Table 3), 42.7% (n = 2318) were human speech, 35.4% music, and 6.3% vehicle sounds. Manual validation of 215 machine-predicted speech clips showed that 73.0% contained true speech (Table 2). Conversely, of 215 machine-predicted non-speech clips, 61.9% contained speech.
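The validation rates can be recomputed directly from the counts in Table 2. Because the validation sample is organized by the model’s prediction (215 clips per predicted class), the rates it supports directly are row-wise: the positive predictive value (PPV) of a speech prediction, the share of speech predictions that were not speech (false discovery rate), and the share of non-speech predictions that were actually speech (false omission rate). A minimal Python sketch:

```python
# Manual validation counts (Table 2): outer keys are the model's prediction,
# inner keys are the manually validated (true) label.
VALIDATION = {
    "speech": {"speech": 157, "non_speech": 58},
    "non_speech": {"speech": 133, "non_speech": 82},
}


def row_rates(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert counts to proportions within each predicted class (row-wise)."""
    rates = {}
    for predicted, counts in table.items():
        total = sum(counts.values())
        rates[predicted] = {true: n / total for true, n in counts.items()}
    return rates


rates = row_rates(VALIDATION)
ppv = rates["speech"]["speech"]                 # predicted speech that was speech
fdr = rates["speech"]["non_speech"]             # predicted speech that was not speech
false_omission = rates["non_speech"]["speech"]  # predicted non-speech that was speech
print(f"PPV {ppv:.1%}, FDR {fdr:.1%}, false omission {false_omission:.1%}")
# prints: PPV 73.0%, FDR 27.0%, false omission 61.9%
```

Conventional false positive and false negative rates condition on the true label instead; estimating them from this design would require weighting each row by the number of clips of that predicted class in the full dataset.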

Table 3: The ten most common predicted sound categories

	Frequency	Percent of top-ten total
Speech	2318	42.67
Music	1920	35.34
Vehicle	340	6.26
Organ	184	3.39
Rail transport	143	2.63
Tubular bells	131	2.41
Fire	125	2.30
Bus	101	1.86
Car	89	1.64
Insect	82	1.51
Total	5433	100
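The percentages in Table 3 sum to 100 across the ten listed categories, so they are shares of the 5433 clips in those categories rather than of all 7725 clips; speech, for instance, is 2318/5433 = 42.67% of the top ten, whereas 2318/7725 is about 30.0%. The following sketch, which assumes each clip is assigned its single highest-scoring label, reports both denominators so the two are not confused:

```python
from collections import Counter


def top_categories(labels: list[str], k: int = 10) -> list[tuple[str, int, float, float]]:
    """Tabulate the k most frequent predicted labels.

    Returns (label, count, percent of the top-k total, percent of all clips).
    Assumes one predicted label per clip.
    """
    counts = Counter(labels)
    top = counts.most_common(k)
    top_total = sum(n for _, n in top)
    return [(label, n, 100 * n / top_total, 100 * n / len(labels)) for label, n in top]


# Toy example with made-up labels, not study data.
clips = ["Speech"] * 5 + ["Music"] * 3 + ["Vehicle"] * 1 + ["Insect"] * 1
for row in top_categories(clips, k=2):
    print(row)
# ('Speech', 5, 62.5, 50.0)
# ('Music', 3, 37.5, 30.0)
```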

Machine-predicted speech was evenly distributed throughout the day (Figure 1). We also assessed music and vehicle noises throughout the day. Music was consistently present (range 21% to 29%), while vehicle noises were more prevalent in the morning (range 4% to 10%) than during the day (range 4% to 6%) or in the evening (range 3% to 6%). Table 3 shows the ten most commonly predicted sounds.

Figure 1: Average distribution of speech, music, and vehicle sounds from 4 AM to 9 PM each day.
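Hourly percentages like those summarized for Figure 1 can be computed by grouping clips by clock hour. This sketch pools clips across days and mothers (an assumption; the figure may instead average per-day or per-mother percentages) and uses made-up example labels:

```python
from collections import defaultdict
from datetime import datetime


def hourly_share(clips: list[tuple[datetime, str]], targets: set[str]) -> dict[int, dict[str, float]]:
    """Percent of clips in each clock hour whose predicted label is a target.

    clips: (timestamp, predicted label) pairs, pooled across days and mothers.
    """
    totals: dict[int, int] = defaultdict(int)
    hits: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for timestamp, label in clips:
        totals[timestamp.hour] += 1
        if label in targets:
            hits[timestamp.hour][label] += 1
    return {
        hour: {t: 100 * hits[hour][t] / totals[hour] for t in sorted(targets)}
        for hour in sorted(totals)
    }


day = datetime(2000, 1, 1)  # arbitrary date
example = [
    (day.replace(hour=4, minute=0), "Speech"),
    (day.replace(hour=4, minute=15), "Vehicle"),
    (day.replace(hour=4, minute=30), "Speech"),
    (day.replace(hour=4, minute=45), "Bird"),
    (day.replace(hour=5, minute=0), "Music"),
]
print(hourly_share(example, {"Speech", "Music", "Vehicle"}))
# {4: {'Music': 0.0, 'Speech': 50.0, 'Vehicle': 25.0}, 5: {'Music': 100.0, 'Speech': 0.0, 'Vehicle': 0.0}}
```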

Table 4: Other social sounds and their prediction frequency

Sound	Frequency	Percent of all clips (n = 7725)
Child speech	9	0.12
Male singing	7	0.09
Whispering	7	0.09
Whimper	4	0.05
Babbling	3	0.04
Female singing	3	0.04
Female speech	3	0.04
Child singing	2	0.03
Children playing	2	0.03
Male speech	2	0.03
Children shouting	1	0.01

Finally, we also looked for audio stimuli that could be strong predictors of the social environment. Because the model was trained on YouTube audio, these nuanced sounds were rarely detected and therefore could not be used as strong predictors of the social environment. Table 4 shows these sounds as predicted by the YouTube-trained model; these predictions were not manually validated.

Discussion

We qualitatively assessed mothers’ social environment and identified audio stimuli that could give a better picture of it. We then collected audio data from adolescent mothers for 14 days and used a YouTube-trained machine learning model to predict sound categories. Qualitatively, mothers’ social environment consisted mostly of family members: mothers were surrounded by their husband, mother-in-law, father-in-law, sister-in-law, and brother-in-law, who played supportive roles, especially during the first few months after childbirth. Adult sounds (laughter, speech) and child sounds (laughter, babbling) were common in mothers’ social environment. The most common non-speech sounds came from household chores (washing, cooking) and childcare activities (bathing the baby, oil massage). Mothers also frequently engaged in recreational activities (watching YouTube videos or TV, listening to the radio). In the passive audio data, speech was the most frequently detected sound, followed by music and vehicles.

Based on our qualitative findings and passive audio collection, we propose a matrix of the social interactions and activities that could potentially be captured through the audio environment (Table 5). Speech and non-speech sounds such as laughter, adult speech, TV noises, and social-function noise could be predictors of positive interactions. Similarly, crying and yelling, along with sounds of beating, thrashing, or objects breaking, could indicate negative interactions. A prolonged presence of any of these positive or negative sounds for a given mother could help characterize her household environment. Previous studies have used the number and duration of conversations to measure social interaction (36). Passive audio data have been used to estimate the number of conversations a person engages in, their duration, the time the individual speaks within each conversation, and speaking rate and pitch variation (37, 38), and to understand social isolation (39).
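One way to operationalize a “prolonged presence” of a sound for a given mother is to count the days on which that sound makes up a large share of her clips. A minimal sketch, assuming one predicted label per clip; the thresholds and the label “Yelling” are hypothetical, not values from the study:

```python
from collections import defaultdict
from datetime import date


def sustained_days(clips, label, min_share=0.3, min_clips=10):
    """Find days on which `label` made up a large share of a mother's clips.

    clips: iterable of (mother_id, day, predicted_label) tuples.
    min_share and min_clips are hypothetical thresholds; days with fewer than
    `min_clips` clips are skipped because shares from a handful of clips are
    unstable.
    Returns {mother_id: [days, ...]} for mothers with at least one such day.
    """
    counts = defaultdict(lambda: [0, 0])  # (mother, day) -> [label hits, all clips]
    for mother, day, predicted in clips:
        entry = counts[(mother, day)]
        entry[1] += 1
        if predicted == label:
            entry[0] += 1
    flagged = defaultdict(list)
    for (mother, day), (hits, total) in sorted(counts.items()):
        if total >= min_clips and hits / total >= min_share:
            flagged[mother].append(day)
    return dict(flagged)


d1, d2 = date(2000, 1, 1), date(2000, 1, 2)  # arbitrary dates, toy data
toy = (
    [("m1", d1, "Yelling")] * 4 + [("m1", d1, "Speech")] * 6
    + [("m1", d2, "Yelling")] + [("m1", d2, "Music")] * 9
    + [("m2", d1, "Yelling")] * 5  # too few clips that day to count
)
print(sustained_days(toy, "Yelling"))  # {'m1': [datetime.date(2000, 1, 1)]}
```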

Based on our initial experience, social support is a more complex concept that requires understanding beyond the presence or absence of audio stimuli. We suggest analyzing “constellations of audio sounds” around the mother to give a better picture of her social support, i.e., drawing conclusions about the social environment from groupings of speech and/or non-speech sounds. This would require integrating audio clips from a given time frame that together predict a mother’s social support. For instance, “instrumental support” could be coded when the following combination is present: a) the mother laughing, b) adult female speech from a family member, and c) dishes being washed. Understanding the nature of social support would also require gender differentiation (male versus female speech). Non-speech sounds, both positive and negative, can help determine support and burden in the mother’s environment. Additional tools such as daily diary elicitation, social support scales (40, 41), and sleep monitoring (42, 43) can give further assessment of her support system.
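The “instrumental support” example above amounts to a co-occurrence rule: every required sound must appear within the same time frame. A minimal sketch, assuming clock-aligned one-hour windows and hypothetical label names (the pretrained model does not provide classes such as the mother’s own laughter or gender-specific speech):

```python
from datetime import datetime, timedelta

# Hypothetical label names for the text's example of "instrumental support".
INSTRUMENTAL_SUPPORT = frozenset({"mother_laughter", "adult_female_speech", "dish_washing"})


def constellation_windows(
    events: list[tuple[datetime, str]],
    required: frozenset[str],
    window: timedelta = timedelta(hours=1),
) -> list[datetime]:
    """Return start times of clock-aligned windows containing every required label."""
    labels_by_window: dict[datetime, set[str]] = {}
    for ts, label in events:
        # Floor the timestamp to the start of its window (e.g., 09:05 -> 09:00).
        start = ts - (ts - datetime.min) % window
        labels_by_window.setdefault(start, set()).add(label)
    return sorted(start for start, labels in labels_by_window.items() if required <= labels)


events = [
    (datetime(2000, 1, 1, 9, 5), "mother_laughter"),
    (datetime(2000, 1, 1, 9, 20), "adult_female_speech"),
    (datetime(2000, 1, 1, 9, 50), "dish_washing"),
    (datetime(2000, 1, 1, 10, 10), "mother_laughter"),
]
print(constellation_windows(events, INSTRUMENTAL_SUPPORT))
# [datetime.datetime(2000, 1, 1, 9, 0)]
```

With one clip every 15 minutes, a one-hour window holds at most four clips, so the window length trades sensitivity against the risk of grouping unrelated sounds.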

In our matrix, we suggest additional data collection tools that can further validate findings from passive audio. For example, application and call logs have been used to measure communication (44). For mother-child interaction, we suggest additional observation tools such as HOME (27, 45, 46) and OMCI (47-49), which have been used in multicultural settings to assess the social environment, interactions, and support of mothers and children. Finally, additional passive data collection methods, such as Bluetooth beacons attached to the child’s clothing, can help determine how much time mother and child spend together (19, 29).

Table 5: Mother’s social environment and potential speech and non-speech sounds

Domain	Subdomain	Speech sounds	Non-speech sounds	Additional assessment
Interaction	Positive mother-child interaction	Baby talk, child laughter, mother laughter, motherese, mother singing, lullaby	Sound of rattles, toys	Proximity beacon, HOME, OMCI, application use [videos and photos of baby]
	Negative mother-child interaction	Adult yelling, adult crying, child screaming, child crying	Physical violence (slapping, thrashing)	Proximity beacon, HOME, OMCI, call logs, application use [photos and videos of baby]
	Positive adult interactions	Adult laughter, adult speech	TV noises, movie theatre, music, social-function noise	Passive data collection tool on family members’ phones, call logs
	Negative adult interactions	Adult yelling, adult crying	Physical violence (slapping, thrashing), objects breaking or thrashing, loud noises of objects dropping	Proximity beacon, passive data collection tool on family members’ phones, call logs
Support versus burden	Emotional support or burden	Gender-specific speech detection, person-specific audio (voice recognition)	TV and radio noises, songs, total silence	HOME, daily diary elicitation, maternal social support scale, sleep monitoring
	Instrumental support or burden	Family members’ voice recognition, voice recognition (family versus outside), lack of human speech, mother doing household chores alone (lack of interaction or integration during household chores)	Mother doing household chores (e.g., washing utensils) along with positive human interaction (laughter, talking), constant household chore noises (washing clothes, washing dishes)	Proximity beacon, OMCI, HOME, daily diary elicitation, NIH Toolbox Adult Social Relationship Scales
	Informational support	Talking	TV, radio	Application usage, browser history
Others	Recreational activities	Individual singing, humming, multiple people singing	Music, TV, YouTube (videos)	Application usage, browser history
	Social events	Adult speech, laughter	Music, instruments (organ, piano, band music)	GPS, daily diary elicitation
	Religious events	Prayers, multiple people singing (bhajans)	Puja bells, Vedic chanting	GPS, daily diary elicitation
	Physical activities/exercise	Adult speech, adult laughter, crowd noises	Outdoor noises, vehicular noises, foot tapping, running/walking noises, sweeping, washing, bicycling	Accelerometer, GPS
	Sporting activities	Crowd noises, cheering	Whistles, bicycling	Accelerometer, GPS
	Stressful noises	Constant loud human speech, persistent human noises for a long duration	Non-stop transportation noise, horns	Environmental pollution indicators (environmental noise), accelerometer, sleep monitoring

Finally, other social noises, such as those from social events, recreational activities, physical activities, and stressful events, can be good predictors of the overall social environment. Capturing the sounds of YouTube videos and messaging apps in passive audio could show when and how mothers spend recreational time. Other passive data, such as application and call logs, can also help us understand her environment. Beyond passive audio, the Global Positioning System (GPS) (50) and accelerometers (38, 44) have been used successfully to measure movement and activity. Passive audio data could also be valuable for identifying domestic violence: intimate partner violence (IPV) has been associated with depressive symptoms (51, 52) but remains highly underreported in Nepal (53). Passive audio data could therefore give valuable insight into stressful events in mothers’ lives, including domestic violence.

Additional assessment methods such as Ecological Momentary Assessment (EMA) (54) can be triggered when certain sounds are detected, asking the participant whether the sound was actually present in her environment. Such methods can validate passive audio clips. Mood tracking tied to speech and non-speech sounds is another way to understand a mother’s mood in relation to particular audio stimuli (44). Studies have also explored alternative coding approaches for culturally diverse audio (55).
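A sound-triggered EMA prompt needs a rule for when to ask. A minimal sketch, assuming per-clip classifier scores are available on the phone; the target labels, score threshold, cooldown period, and question wording are all hypothetical:

```python
from datetime import datetime, timedelta


class SoundTriggeredPrompt:
    """Decide whether a detected sound should trigger an EMA prompt.

    Hypothetical design: prompt when any target label's classifier score
    reaches `threshold`, but at most once per `cooldown` to limit burden.
    """

    def __init__(self, targets, threshold=0.5, cooldown=timedelta(hours=2)):
        self.targets = set(targets)
        self.threshold = threshold
        self.cooldown = cooldown
        self.last_prompt = None  # time of the most recent prompt, if any

    def check(self, timestamp, scores):
        """Return a yes/no question for the participant, or None.

        scores: mapping of predicted label -> classifier score for one clip.
        """
        hits = [label for label, score in scores.items()
                if label in self.targets and score >= self.threshold]
        if not hits:
            return None
        if self.last_prompt is not None and timestamp - self.last_prompt < self.cooldown:
            return None  # still within the cooldown period
        self.last_prompt = timestamp
        best = max(hits, key=lambda label: scores[label])
        return f"Did you just hear {best.lower()} nearby? (yes/no)"


prompter = SoundTriggeredPrompt({"Laughter", "Crying"})
t0 = datetime(2000, 1, 1, 9, 0)  # arbitrary time
print(prompter.check(t0, {"Laughter": 0.8, "Speech": 0.9}))        # asks about laughter
print(prompter.check(t0 + timedelta(minutes=15), {"Laughter": 0.9}))  # None (cooldown)
```

Participants’ yes/no answers could then be compared with the model’s predictions as a form of in-the-moment validation.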

Additional validation tools combined with passive audio data can provide important insight into mothers’ lives, especially in domains such as mother-child interaction and domestic violence, which are difficult to assess, particularly in low- and middle-income settings (48, 56). Studies of passive sensing data often use additional validation measures to ensure accuracy (13), and we suggest a similar approach for domains such as interaction and support. Passive audio data can generate evidence on the quantity of sounds, but qualitative assessments and/or multiple passive data methods can provide a more reliable picture of the social environment. It is also important to consider which measures of audio data are most meaningful. While domains such as interaction can be assessed by the presence or absence of human speech, domains such as support might require the total number and frequency of speech, time-specific contact, predictability of interactions, and the amount of audio stimuli observed versus expected. When combined with appropriate qualitative and observational tools, such as daily diary elicitation, ecological momentary assessment, HOME, and OMCI, and with additional passive data collection methods, such as GPS (50), Bluetooth beacons (19, 29), sleep pattern detection (43), accelerometers (38, 44), call logs (44), application usage (57), and device activity (58), passive audio data can provide unique and innovative insight into mothers’ social environment.
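Several of the measures listed above (total number and frequency of speech, time-specific contact, predictability of interactions) can be computed from timestamped predictions. The sketch below computes three simple per-mother summaries; the “regularity” score is a hypothetical stand-in for predictability of interactions, not a validated measure:

```python
from collections import defaultdict
from datetime import datetime


def speech_features(clips: list[tuple[datetime, str]], speech_label: str = "Speech") -> dict:
    """Summarize one mother's speech exposure from (timestamp, label) pairs.

    total      - number of clips predicted as speech
    per_day    - speech clips per recorded day (frequency)
    regularity - hypothetical predictability score: mean Jaccard similarity
                 between the sets of clock hours containing speech on
                 consecutive recorded days (1.0 = same hours every day);
                 None if it cannot be computed
    """
    speech_hours = defaultdict(set)  # day -> clock hours with speech
    days = set()
    total = 0
    for ts, label in clips:
        days.add(ts.date())
        if label == speech_label:
            total += 1
            speech_hours[ts.date()].add(ts.hour)
    ordered = sorted(days)
    similarities = []
    for a, b in zip(ordered, ordered[1:]):
        union = speech_hours[a] | speech_hours[b]
        if union:  # skip pairs of days with no speech at all
            similarities.append(len(speech_hours[a] & speech_hours[b]) / len(union))
    return {
        "total": total,
        "per_day": total / len(days) if days else 0.0,
        "regularity": sum(similarities) / len(similarities) if similarities else None,
    }
```

For example, speech at 9 AM on one day and at 9 AM and 11 AM on the next gives a regularity of 0.5 for that pair of days.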

Limitations

The major limitation of our study was the accuracy of the pre-trained model: the YouTube-trained model was one of the main obstacles to accurately characterizing the passive audio data. The social and cultural environments of adolescent mothers differ across settings, so before passive audio data collection it is essential to record culturally relevant sounds from the study setting and use them to train the model. Similarly, a model that reliably distinguishes individual-specific audio stimuli, such as adult versus child speech, female versus male speech, and adult versus child laughter, could help identify positive and negative adult conversations as well as positive and negative mother-child interactions.
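As a minimal illustration of adapting a pretrained model to locally recorded sounds, the sketch below fits a nearest-centroid classifier on clip embeddings. It assumes embeddings have already been extracted from the pretrained audio network and that a small set of locally recorded clips has been hand-labeled; the label names are hypothetical. Nearest-centroid is a deliberately simple stand-in; fine-tuning the network or training a logistic-regression head are common alternatives.

```python
import math


def train_centroids(embeddings: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Average the embedding vectors of each locally recorded label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids: dict[str, list[float]], vec: list[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy 2-D embeddings for two hypothetical local sound classes.
centroids = train_centroids(
    [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]],
    ["singing_bowl", "singing_bowl", "pressure_cooker", "pressure_cooker"],
)
print(predict(centroids, [1.0, 1.0]))  # singing_bowl
```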

Moreover, social support and social interaction cannot be fully understood without better models to identify positive and negative sounds, such as polite versus hostile conversation, yelling, or laughter. These audio stimuli are culturally specific and should be recorded before the study to improve prediction of the environment. It is equally important to distinguish machine-generated speech (TV, radio) from live adult speech, particularly when studying social isolation, because continuous exposure to machine-generated speech can have a very different impact from that of live adult speech.

Another limitation of our study is the sampling technique, which could affect the generalizability of the findings. We used purposive sampling, so our findings might not be representative of the population. We had to repeatedly modify how we explained the study to the mothers and their families, which led to uneven data capture, with less data collected at the beginning of the study and more toward the end. We only collected audio data between 4 AM and 9 PM, so we could have missed important speech and non-speech sounds outside this window. As cellular networks expand in rural Nepal, Android-based smartphones are becoming more popular, and we anticipate greater acceptance of mHealth initiatives such as this one in the future. A better understanding of the typical adolescent mother’s lifestyle and the cultural barriers to her use of technology will be integral to the successful implementation of mobile health and passive data studies in Nepal.

Conclusion

Using passive audio data to capture the auditory environment of adolescent mothers offers a unique opportunity to improve our understanding of their social environment. Beyond information on total speech and non-speech exposure, it can help assess the quality and frequency of these sounds. Current methods could not reliably distinguish positive from negative social interactions, detect nuanced sounds such as child laughter, child crying, and adult laughter, or separate machine (TV, radio) from live adult speech; a stronger model trained on culturally appropriate sounds could better predict a mother’s auditory environment. We recommend a more accurate machine learning model, combined with techniques such as coding constellations of sounds and validation with observational and qualitative tools, for a stronger indication of the mother’s environment. Combining additional passive sensing data, such as location and activity, can also be integral to understanding mothers’ social support and interactions.

List of abbreviations

COREQ Consolidated Criteria for Reporting Qualitative Research

EBM Electronic Behavior Monitoring

EMA Ecological Momentary Assessment

GPS Global Positioning System

HOME Home Observation for Measurement of the Environment

IDI In-depth Interviews

OMCI Observation of Mother-Child Interaction

PHQ-9 Patient Health Questionnaire - 9

Ethics approval and consent to participate

Ethical approval was received from the Nepal Health Research Council (#327/2018) and George Washington University Institutional Review Board (#051845). We obtained written informed consent from participants over 18 years, written assent and parental permission from participants under 18 years, and verbal informed consent from adult members of their household. Mothers who reported psychological distress or exposure to violence were provided support through a collaborating organization in Nepal and were given referral information for additional services.

Consent for publication

The participants provided written consent for the publication.

Availability of data and materials

Data will be made publicly available upon publication of the final study results.

Competing interests

The authors declare that they have no competing interests.

Funding

The study was funded by the Bill and Melinda Gates Foundation (Grant OPP1189927; PI: B.A. Kohrt).

Authors' contributions

AP and BAK drafted the manuscript. AP, AH, and DL conducted the qualitative data analysis. AH supervised the qualitative data collection and analysis. AvH developed the EBM app. AvH and PB developed the StandStrong app. AvH and AP conducted the quantitative data analysis. SMM supervised data collection and onsite study implementation. BAK, AvH, and AH conceptualized the study and study design. All authors revised the manuscript.

Acknowledgements

We would like to thank Damaris Lopez for her contributions to the qualitative data analysis. We would like to thank the mothers and their families who participated in this study. We would also like to thank the field study team (Aasha Mahato, Bhagwati Sapkota, Sabita Lohani, Bindu Aryal, Kendra Chaudhary, Bibek KC, Sirjana Panday) and colleagues at Transcultural Psychosocial Organization Nepal.

References

1. Wang J, Mann F, Lloyd-Evans B, Ma R, Johnson S. Associations between loneliness and perceived social support and outcomes of mental health problems: a systematic review. BMC Psychiatry. 2018;18(1):156.
2. Zheng X, Morrell J, Watts K. Changes in maternal self-efficacy, postnatal depression symptoms and social support among Chinese primiparous women during the initial postpartum period: A longitudinal study. Midwifery. 2018;62:151-60.
3. Ganann R, Sword W, Thabane L, Newbold B, Black M. Predictors of Postpartum Depression Among Immigrant Women in the Year After Childbirth. J Womens Health (Larchmt). 2016;25(2):155-65.
4. Seymour-Smith M, Cruwys T, Haslam SA, Brodribb W. Loss of group memberships predicts depression in postpartum mothers. Soc Psychiatry Psychiatr Epidemiol. 2017;52(2):201-10.
5. de Camps Meschino D, Philipp D, Israel A, Vigod S. Maternal-infant mental health: postpartum group intervention. Arch Womens Ment Health. 2016;19(2):243-51.
6. Chojenta C, Loxton D, Lucke J. How do previous mental health, social support, and stressful life events contribute to postnatal depression in a representative sample of Australian women? J Midwifery Womens Health. 2012;57(2):145-50.
7. Stapleton LR, Schetter CD, Westling E, Rini C, Glynn LM, Hobel CJ, et al. Perceived partner support in pregnancy predicts lower maternal and infant distress. J Fam Psychol. 2012;26(3):453-63.
8. Leahy-Warren P, McCarthy G, Corcoran P. First-time mothers: social support, maternal parental self-efficacy and postnatal depression. J Clin Nurs. 2012;21(3-4):388-97.
9. House JS. Social support and social structure. Sociological forum. 2: Kluwer Academic Publishers; 1987. p. 135-46.
10. Secco ML, Moffatt ME. A review of social support theories and instruments used in adolescent mothering research. J Adolesc Health. 1994;15(7):517-27.
11. Boothe AS, Brouwer RJ, Carter-Edwards L, Ostbye T. Unmet social support for healthy behaviors among overweight and obese postpartum women: results from the Active Mothers Postpartum Study. J Womens Health (Larchmt). 2011;20(11):1677-85.
12. Evans M, Donelle L, Hume-Loveland L. Social support and online postpartum depression discussion groups: a content analysis. Patient Educ Couns. 2012;87(3):405-10.
13. Trifan A, Oliveira M, Oliveira JL. Passive Sensing of Health Outcomes Through Smartphones: Systematic Review of Current Solutions and Possible Limitations. JMIR Mhealth Uhealth. 2019;7(8):e12649.
14. Triguero-Mas M, Donaire-Gonzalez D, Seto E, Valentin A, Martinez D, Smith G, et al. Natural outdoor environments and mental health: Stress as a possible mechanism. Environ Res. 2017;159:629-38.
15. Insel TR. Digital Phenotyping: Technology for a New Science of Behavior. JAMA. 2017;318(13):1215-6.
16. Campbell AT, Eisenman SB, Lane ND, Miluzzo E, Peterson RA, Lu H, Zheng X, Musolesi M, Fodor K, Ahn GS. The rise of people-centric sensing. IEEE Internet Computing. 2008;12(4):12-21.
17. Huang K, Ding X, Xu J, Chen G, Ding W. Monitoring Sleep and Detecting Irregular Nights through Unconstrained Smartphone Sensing. 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom); Beijing: IEEE; 2015. p. 36-45.
18. Cornet VP, Holden RJ. Systematic review of smartphone-based passive sensing for health and wellbeing. J Biomed Inform. 2018;77:120-32.
19. Poudyal A, van Heerden A, Hagaman A, Maharjan SM, Byanjankar P, Subba P, et al. Wearable Digital Sensors to Identify Risks of Postpartum Depression and Personalize Psychological Treatment for Adolescent Mothers: Protocol for a Mixed Methods Exploratory Study in Rural Nepal. JMIR Res Protoc. 2019;8(8):e14734.
20. Mohr DC, Zhang M, Schueller SM. Personal Sensing: Understanding Mental Health Using Ubiquitous Sensors and Machine Learning. Annu Rev Clin Psychol. 2017;13:23-47.
21. Jordans MJD, Luitel NP, Kohrt BA, Rathod SD, Garman EC, De Silva M, et al. Community-, facility-, and individual-level outcomes of a district mental healthcare plan in a low-resource setting in Nepal: A population-based evaluation. PLoS Med. 2019;16(2):e1002748.
22. United Nations Statistics Division. National Population and Housing Census 2011: National Report. 2011.
23. Jordans MJD, Luitel NP, Garman E, Kohrt BA, Rathod SD, Shrestha P, et al. Effectiveness of psychological treatments for depression and alcohol use disorder delivered by community-based counsellors: two pragmatic randomised controlled trials within primary healthcare in Nepal. Br J Psychiatry. 2019;215(2):485-93.
24. Kohrt BA, Harper I. Navigating diagnoses: understanding mind-body relations, mental health, and stigma in Nepal. Cult Med Psychiatry. 2008;32(4):462-91.
25. Bradley RH. HOME measurement of maternal responsiveness. New Dir Child Dev. 1989(43):63-73.
26. Bradley RH, Caldwell BM. Home observation for measurement of the environment: a validation study of screening efficiency. Am J Ment Defic. 1977;81(5):417-20.
27. Scherer E, Hagaman A, Chung E, Rahman A, O'Donnell K, Maselko J. The relationship between responsive caregiving and child outcomes: evidence from direct observations of mother-child dyads in Pakistan. BMC Public Health. 2019;19(1):252.
28. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349-57.
29. Kohrt BA, Rai S, Vilakazi K, Thapa K, Bhardwaj A, van Heerden A. Procedures to Select Digital Sensing Technologies for Passive Data Collection With Children and Their Caregivers: Qualitative Cultural Assessment in South Africa and Nepal. JMIR Pediatr Parent. 2019;2(1):e12366.
30. van Heerden A, Wassenaar D, Essack Z, Vilakazi K, Kohrt BA. In-Home Passive Sensor Data Collection and Its Implications for Social Media Research: Perspectives of Community Women in Rural South Africa. J Empir Res Hum Res Ethics. 2020;15(1-2):97-107.
31. Acharya B, Basnet M, Rimal P, Citrin D, Hirachan S, Swar S, et al. Translating mental health diagnostic and symptom terminology to train health workers and engage patients in cross-cultural, non-English speaking populations. Int J Ment Health Syst. 2017;11:62.
32. QSR International. NVIVO qualitative data analysis software. 10 ed. Doncaster, Australia: QSR International Pty Ltd.; 2012.
33. Bernard HR. Analyzing Qualitative Data: Systematic Approaches: SAGE Publications Inc; 2016. 576 p.
34. Guest GS. Applied thematic analysis: SAGE Publications, Inc; 2011.
35. Hershey S, Chaudhuri S, Ellis DPW, et al. CNN architectures for large-scale audio classification. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2017.
36. Wang R, Wang W, daSilva A, Huckins JF, Kelley WM, Heatherton TF, et al. Tracking depression dynamics in college students using mobile phone and wearable sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies; 2018.
37. Wyatt D, Choudhury T, Bilmes J, Kitts JA. Inferring colocation and conversation networks from privacy-sensitive audio with implications for computational social science. ACM Transactions on Intelligent Systems and Technology. 2011;7.
38. Abdullah S, Matthews M, Frank E, Doherty G, Gay G, Choudhury T. Automatic detection of social rhythms in bipolar disorder. J Am Med Inform Assoc. 2016;23(3):538-43.
39. Rabbi M, Ali S, Choudhury T, Berke E. Passive and In-situ Assessment of Mental and Physical Well-being using Mobile Sensors. Proc ACM Int Conf Ubiquitous Comput. 2011;2011:385-94.
40. Dalgard OS, Dowrick C, Lehtinen V, Vazquez-Barquero JL, Casey P, Wilkinson G, et al. Negative life events, social support and gender difference in depression: a multinational community survey with data from the ODIN study. Soc Psychiatry Psychiatr Epidemiol. 2006;41(6):444-51.
41. Van Lente E, Barry MM, Molcho M, Morgan K, Watson D, Harrington J, et al. Measuring population mental health and social well-being. Int J Public Health. 2012;57(2):421-30.
42. Ben-Zeev D, Scherer EA, Wang R, Xie H, Campbell AT. Next-generation psychiatric assessment: Using smartphone sensors to monitor behavior and mental health. Psychiatr Rehabil J. 2015;38(3):218-26.
43. Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, et al. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing; 2014.
44. Ma Y, Xu B, Bai Y, Sun G, Zhu R. Daily mood assessment based on mobile phone sensing. 9th international conference on wearable and implantable body sensor networks: IEEE; 2012. p. 142-7.
45. Sikander S, Ahmad I, Bates LM, Gallis J, Hagaman A, O'Donnell K, et al. Cohort Profile: Perinatal depression and child socioemotional development; the Bachpan cohort study from rural Pakistan. BMJ Open. 2019;9(5):e025644.
46. Turner EL, Sikander S, Bangash O, Zaidi A, Bates L, Gallis J, et al. The effectiveness of the peer delivered Thinking Healthy Plus (THPP+) Programme for maternal depression and child socio-emotional development in Pakistan: study protocol for a three-year cluster randomized controlled trial. Trials. 2016;17(1):442.
47. Obradovic J, Yousafzai AK, Finch JE, Rasheed MA. Maternal scaffolding and home stimulation: Key mediators of early intervention effects on children's cognitive development. Dev Psychol. 2016;52(9):1409-21.
48. Rasheed MA, Yousafzai AK. The development and reliability of an observational tool for assessing mother-child interactions in field studies: experience from Pakistan. Child Care Health Dev. 2015;41(6):1161-71.
49. Jeong J, McCoy DC, Yousafzai AK, Salhi C, Fink G. Paternal Stimulation and Early Child Development in Low- and Middle-Income Countries. Pediatrics. 2016;138(4).
50. Grunerbl A, Muaremi A, Osmani V, Bahle G, Ohler S, Troster G, et al. Smartphone-based recognition of states and state changes in bipolar disorder patients. IEEE J Biomed Health Inform. 2015;19(1):140-8.
51. Lindhorst T, Oxford M. The long-term effects of intimate partner violence on adolescent mothers' depressive symptoms. Soc Sci Med. 2008;66(6):1322-33.
52. Thomas JL, Lewis JB, Martinez I, Cunningham SD, Siddique M, Tobin JN, et al. Associations between intimate partner violence profiles and mental health among low-income, urban pregnant adolescents. BMC Pregnancy Childbirth. 2019;19(1):120.
53. Oshiro A, Poudyal AK, Poudel KC, Jimba M, Hokama T. Intimate partner violence among general and urban poor populations in Kathmandu, Nepal. J Interpers Violence. 2011;26(10):2073-92.
54. Shiffman S, Stone AA, Hufford MR. Ecological momentary assessment. Annu Rev Clin Psychol. 2008;4:1-32.
55. SturtzSreetharan C. Citizen Sociolinguistics: A Data Collection Approach for Hard-to-capture Naturally Occurring Language Data. Field Methods. 2020.
56. Satyanarayana VA, Chandra PS. Should mental health assessments be integral to domestic violence research? Indian J Med Ethics. 2009;6(1):15-8.
57. Asselbergs J, Ruwaard J, Ejdys M, Schrader N, Sijbrandij M, Riper H. Mobile Phone-Based Unobtrusive Ecological Momentary Assessment of Day-to-Day Mood: An Explorative Study. J Med Internet Res. 2016;18(3):e72.
58. Saeb S, Zhang M, Karr CJ, Schueller SM, Corden ME, Kording KP, et al. Mobile Phone Sensor Correlates of Depressive Symptom Severity in Daily-Life Behavior: An Exploratory Study. J Med Internet Res. 2015;17(7):e175.

Additional file: SocialInteractionQuotations.xlsx


Status:

Version 1
