Context
PHRESH is an ongoing study of two low-income and predominantly African American communities in Pittsburgh, PA, chosen because of their similarities. The Hill District covers approximately 1.37 square miles with a population of approximately 10,000, while Homewood covers 1.45 square miles with a population of approximately 8,000. Both are residential neighborhoods. We are examining features of the built and social environment that correlate with health, as well as documenting the extent to which changes affect residents' health and well-being, including diet, exercise, sleep, heart health, and cognitive health. The PHRESH study follows a cohort of individuals and their surrounding physical and social environment to evaluate these questions. Details on the study design have been described elsewhere (43, 44). To systematically measure change, we conducted assessments of the environment at three timepoints (2012, 2015 and 2017). We modified the Bridging the Gap/Community Obesity Measures Project (BTG-COMP) Street Segment Observation form (45-47), which draws from validated instruments used by other major studies assessing neighborhood features correlated with walking and overall physical activity (38, 40, 41, 48-50). All study protocols were approved by the organization’s Institutional Review Board.
Audit Tool
The PHRESH Street Segment Audit (SSA) tool is a detailed assessment of neighborhood-level physical and social features related to health behaviors, with an emphasis on physical activity and sleep. As seen in Table 1, our tool includes (i) Land use mix, capturing the diversity of land use; (ii) Physical activity (PA) facilities, including spaces for play or physical activity; (iii) Walking/cycling environment, including the presence of sidewalks, shoulders, and bike lanes; (iv) Safety signs, including traffic calming and control features; and (v) Amenities and litter, including features that make a segment appealing and pedestrian friendly. Two subjective assessments (perceived safety of walking; perceived attractiveness for walking) complement the objective assessments. To the existing BTG-COMP audit tool, we added Environment (e.g. trees, cliffs/ravines) and Gathering places (e.g. restaurants, barbershop, church). In the last data collection round (2017), we added Social disorder items (e.g. presence of police, people selling illegal drugs), a single item on Noise pollution, and Physical disorder items (e.g. amount of beer or liquor bottles, abandoned cars), as these features have been shown to be related to health behaviors such as sleep (51-53). See Supplemental Table 1 for a full list of items.
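For readers who work with the resulting data, the tool's domain structure can be sketched compactly as a mapping from domains to items. The Python representation below is illustrative only; the item names are abbreviated stand-ins, not the exact instrument wording in Table 1 and Supplemental Table 1.

```python
# Illustrative sketch of the SSA tool's domain structure; item names
# are abbreviated stand-ins, not the exact instrument wording.
SSA_DOMAINS = {
    "land_use_mix":      ["residential", "commercial", "institutional"],
    "pa_facility":       ["playground", "sports_field", "park"],
    "walking_cycling":   ["sidewalk", "shoulder", "bike_lane"],
    "safety_signs":      ["traffic_calming", "traffic_control"],
    "amenities_litter":  ["benches", "lighting", "litter"],
    "subjective":        ["perceived_safety", "perceived_attractiveness"],
    # Domains added to the BTG-COMP base tool:
    "environment":       ["trees", "cliffs_ravines"],
    "gathering_places":  ["restaurant", "barbershop", "church"],
    # Domains added in the 2017 wave:
    "social_disorder":   ["police_presence", "drug_selling"],
    "noise_pollution":   ["noise_level"],
    "physical_disorder": ["bottles", "abandoned_cars"],
}
```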
Street Segment Selection
The two neighborhoods are residential, with almost no arterial segments. Given the homogeneity among street segments within a concentrated geographic area, and to reduce costs, we audited a random, representative sample from each of the study neighborhoods. To draw a representative sample, we constructed a complete listing (n=2,027) of all segments within a quarter mile of the neighborhood boundaries. The listing was compiled using a geographic shapefile provided by ESRI (ESRI, 2011) and was supplemented with street network information provided by the City of Pittsburgh’s GIS department, Google Maps, and personal inspection. The decision to draw a random 25% sample was informed by an earlier published study (54). Accordingly, 511, 585 and 586 segments were sampled in 2012, 2015 and 2017, respectively.
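As a concrete illustration of this sampling step, the sketch below draws a simple random 25% sample within each neighborhood. It is a minimal sketch under assumed segment records and an arbitrary seed; only the listing size (n=2,027) and the 25% fraction come from the study design.

```python
import random
from collections import defaultdict

def draw_stratified_sample(segments, fraction=0.25, seed=0):
    """Draw a simple random sample of street segments within each
    neighborhood. A minimal sketch: the record layout and seed are
    assumptions; the 25% fraction follows the study design."""
    rng = random.Random(seed)
    by_neighborhood = defaultdict(list)
    for seg in segments:
        by_neighborhood[seg["neighborhood"]].append(seg)
    sample = []
    for segs in by_neighborhood.values():
        sample.extend(rng.sample(segs, round(len(segs) * fraction)))
    return sample

# Illustrative listing; the real listing (n=2,027) carried attributes
# from the ESRI shapefile and City of Pittsburgh GIS data.
listing = ([{"id": f"HD{i}", "neighborhood": "Hill District"} for i in range(1000)]
           + [{"id": f"HW{i}", "neighborhood": "Homewood"} for i in range(1027)])
print(len(draw_stratified_sample(listing)))  # 507 segments (~25% of 2,027)
```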
Whenever possible, a street segment was followed over time. However, planned changes in the study neighborhoods affected the nature and existence of some streets. We saw significant changes in areas with public housing (often old, dating back decades). Between 2011 and 2018, $136.5 million and $54.3 million in residential development (including some HOPE VI grants) came into the Hill District and Homewood, respectively. In and around public housing, entire street blocks were demolished, and in about five areas the street grids themselves changed (not just the buildings on the streets); these changes are shown in figure 1. We therefore established consistent rules to address such changes. Specifically, if a sampled segment did not exist at a follow-up wave, a randomly selected segment from the same neighborhood served as a replacement. If a sampled segment was bisected, both parts were included. If a segment was lengthened, the new attributes (including revised length) of the segment were recorded for follow-up audits.
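These rules can be summarized in code form. The sketch below assumes hypothetical segment records with `id`, `neighborhood`, and optional `bisected_into` fields; in practice, the tracking was done by the field team using maps and inspection.

```python
import random

def carry_forward_sample(sampled, current, seed=0):
    """Apply the follow-up rules to one wave's sampled segments.
    A sketch, assuming `current` maps segment id -> its current
    record, with missing ids meaning the segment no longer exists."""
    rng = random.Random(seed)
    next_wave = []
    for seg in sampled:
        cur = current.get(seg["id"])
        if cur is None:
            # Segment demolished or removed from the street grid:
            # replace with a random segment from the same neighborhood.
            pool = [s for s in current.values()
                    if s["neighborhood"] == seg["neighborhood"]]
            next_wave.append(rng.choice(pool))
        elif cur.get("bisected_into"):
            # Segment bisected: include both parts.
            next_wave.extend(cur["bisected_into"])
        else:
            # Unchanged or lengthened: carry forward the revised
            # attributes (including revised length).
            next_wave.append(cur)
    return next_wave
```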
Data collection
All data collectors were community members and were therefore familiar with the neighborhoods; some participated in two waves (2015, 2017) of data collection. Training was conducted by an experienced trainer and consisted of three parts: (i) in-class presentations, including examples and photographs (figure 2), with discussions about highlighted characteristics to look for; (ii) field practice on ‘live’ street segments around the training site; and (iii) a certification exercise in which the data collectors and the trainer independently rated the same street segment and compared ratings to test each data collector’s understanding of the tool, observation skills, and data recording technique. Data collectors were given a comprehensive manual containing the safety protocol and detailed descriptions of audit tool items accompanied by photo examples (figure 3), as well as a summary sheet answering frequently asked questions (FAQ). Each street segment was audited by a team of two data collectors (hereafter, DC pair), which has been shown to improve the reliability of ratings (41). The DC pair walked the street segments together and made a single joint rating for each item, resolving disagreements about proposed ratings through discussion in real time. A field coordinator oversaw data collection and assigned data collectors to street segments using maps. In each year, audits were conducted between August and October.
Reliability testing
A random sub-sample of the full sample of street segments was subjected to reliability testing (n=60 in 2012 and 2015; n=100 in 2017). We drew a sub-sample of about 10% because this size was considered reasonable from both a cost and an estimation standpoint. While there were not enough segments in the sub-sample to test reliability within each neighborhood separately, pooling across neighborhoods allowed us to assess overall reliability. Each segment in the reliability sub-sample was audited twice within a one-week period. Different DC pairs conducted the two ratings, so that no individual rated the same segment twice (a constraint illustrated in the sketch below). In 2017, the two ratings were also matched on day and time because these factors were considered important for the new physical and social disorder items (see Table 1) added to the 2017 audit tool. Our reliability statistics were chosen to accommodate the response categories used in the SSA tool. About half the items had three response categories (“neither”, “either”, “both sides of the street”), while the rest were mostly binary, noting whether a feature was present or absent on that street segment. A few items (e.g. physical disorder) had more than three response categories (e.g. none, a few (1-3), some (4-6), a lot (7 or more)).
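As an illustration of the re-audit constraint, the sketch below assigns each reliability segment a second DC pair that shares no member with the first. The names and assignment logic are hypothetical; actual assignments were made by the field coordinator.

```python
import random

def assign_second_rating(first_pair_by_segment, dc_pairs, seed=1):
    """For each reliability segment, pick a second DC pair with no
    member in common with the first, so that no individual rates
    the same segment twice. A sketch with hypothetical inputs."""
    rng = random.Random(seed)
    second = {}
    for segment, first_pair in first_pair_by_segment.items():
        eligible = [p for p in dc_pairs if not set(p) & set(first_pair)]
        second[segment] = rng.choice(eligible)
    return second

# Hypothetical example: three DC pairs, two reliability segments.
pairs = [("DC1", "DC2"), ("DC3", "DC4"), ("DC5", "DC6")]
firsts = {"seg_017": ("DC1", "DC2"), "seg_204": ("DC3", "DC4")}
print(assign_second_rating(firsts, pairs))
```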
Reliability analysis included calculation of prevalence, percentage inter-observer agreement (hereafter, PO) (55, 56), and Krippendorff’s alpha (hereafter, KA) (57-60). Reliability statistics such as KA are sensitive to base or prevalence rates. Therefore, while KA is more rigorous and indicates whether agreement exceeded chance levels, we also computed the PO statistic as a supplemental index of interrater reliability for all items. PO indicates the proportion of street segments on which the DC pairs were in exact agreement (e.g. both rated “no” for the same street segment). For figure 4, we used the following classification for PO: PO > 90% indicates excellent agreement, PO between 75% and 90% indicates good agreement, and PO < 75% combines moderate and fair to poor agreement (61, 62). Consistent with prior research, KA > .75 indicates excellent agreement, KA between .4 and .75 indicates intermediate to good agreement, and KA < .40 indicates poor agreement (63). The reliability statistics indicate whether an audit tool item has good to excellent agreement at a single timepoint; in turn, items with good to excellent agreement at every timepoint demonstrate stability, making them appropriate for detecting change.
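For concreteness, both statistics can be computed for a single item as follows. This is a minimal sketch for the two-rater, nominal, no-missing-data case with made-up ratings; it is not the study’s analysis code, and published software (e.g. the krippendorff Python package) offers a fuller implementation.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """PO: proportion of segments with exact agreement between the
    two DC-pair ratings."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def krippendorff_alpha_nominal(r1, r2):
    """Krippendorff's alpha for two raters, nominal data, no missing
    values: alpha = 1 - D_o / D_e (observed vs. expected disagreement)."""
    n = 2 * len(r1)                          # total pairable values
    counts = Counter(r1) + Counter(r2)       # marginal value frequencies
    d_o = 2 * sum(a != b for a, b in zip(r1, r2)) / n
    d_e = (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))
    return 1.0 if d_e == 0 else 1 - d_o / d_e

# Made-up ratings for one three-category item on ten reliability segments.
first  = ["neither", "either", "both", "neither", "either",
          "neither", "both", "either", "neither", "either"]
second = ["neither", "either", "both", "neither", "both",
          "neither", "both", "either", "neither", "either"]
print(percent_agreement(first, second))                      # 0.90 -> good PO
print(round(krippendorff_alpha_nominal(first, second), 2))   # 0.85 -> excellent KA
```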