Action Potential Features: Computation and Spike Sorting of Human C-Nociceptor Action Potentials as obtained via Microneurography Recordings

doi:10.21203/rs.3.rs-4693883/v1

This work is licensed under a CC BY 4.0 License

Version 1

Spike sorting represents a persistent challenge in electrophysiology, particularly in extracellular nerve recordings containing signals from several nerve fibers. This issue is exacerbated in microneurography recordings from peripheral unmyelinated afferents in awake humans, which are responsible for pain sensation. This is due to the similarity of spike shapes originating from different fibers, low signal-to-noise ratios, and shape-distorting overlaying signals.

Here, we present the first systematic assessment of morphology-based spike sorting in multiple recordings from two microneurography laboratories. We created dedicated ground truth datasets by employing semi-manual labelling methods enabling the comparison of supervised and unsupervised sorting methods for different feature sets. A strong advantage of the supervised approach was observed, while no single feature set showed a global advantage. Further, the high diversity of the results was linked to the per-recording fiber number and spike morphologies. To extend this first systematic assessment of the spike sorting problem in microneurography, our open-source pipeline enables reproducible sortability analysis of any extracellular recordings of neuronal activity if electrical stimulation of the nerve fibers is possible. The achieved advancement of spike sorting for microneurography lays the foundation for gaining insights into the neural coding of pain and itch signals in a clinical context.

Biological sciences/Physiology/Neurophysiology

Biological sciences/Computational biology and bioinformatics/Data processing

Biological sciences/Computational biology and bioinformatics/Machine learning

action potential shape

microneurography

spike sorting

classification

pain

neuropathy

Microneurography is an electrophysiological method enabling the extracellular recording of single action potentials (APs) of peripheral nerve fibers in awake humans [1, 2]. It offers an exclusive approach to examining neuronal discharges leading to itch and pain sensations in both healthy volunteers and patients suffering from chronic pain and itch [3]. Previous microneurography studies showed, for example, the links between spontaneous activity in C-fibers and neuropathic pain in humans [4].

In many neuroscientific communities, discharge patterns of neurons are known to be important for shaping synaptic transmission, signal encoding, and many other biological processes [5]. However, in the field of pain and itch research, discharge pattern analysis has so far been disregarded, and the simplistic paradigm “more spikes with higher frequency means more pain sensation” is still the only one in use. This stagnation is caused by a lack of reliable spike sorting algorithms for microneurography data, especially for the unmyelinated C-nerve fibers, which play an important role in pain and itch signaling. To gain a deeper understanding of the underlying mechanisms of chronic pain and itch and to support the development of novel treatments, a comprehensive analysis and quantification of discharge patterns is crucial.

The single-needle electrode with only one recording site typically captures activity from multiple fibers simultaneously, as C-fibers are bundled in clusters known as Remak bundles. The simultaneous recording of activity from many nerve fibers makes it challenging to automatically identify action potentials and sort them to their corresponding fiber. However, through a specialized indirect analysis method called the “marking method”, reliable but only semi-quantitative information can be obtained from single nerve fibers [6].

The marking method makes use of the property that unmyelinated nerve fibers conduct signals with a constant velocity when electrically activated at a low frequency, e.g., 0.25 Hz. In our manuscript, we call this stimulation background stimulation. When additional action potentials are evoked by extra stimuli, there is a slowing in the conduction velocity of the next action potential, known as activity-dependent slowing (ADS) [7]. The magnitude of the decrease in velocity correlates roughly with the number of previously elicited action potentials. The sudden increase in response latency is called a “marking” of the extra stimuli on the background stimulation. In many cases, spikes become identifiable due to the almost constant latency of action potentials. This method allows for the reliable detection of evoked spikes according to their timing after the electrical stimulus, enabling the characterization of C-fibers by grouping the action potentials from each fiber into tracks. Nevertheless, this technique does not enable the identification or sorting of action potentials in response to extra stimuli or spontaneous discharges. This limitation prevents the extraction of discharge patterns during spontaneous bursts or chemical stimuli resembling real-life situations.
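The latency-based tracking described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `follow_track`, the tolerance window `tol_ms`, and the toy latencies are all hypothetical.

```python
def follow_track(latencies_per_stimulus, start_latency_ms, tol_ms=2.0):
    """Follow one fiber's track across successive background stimuli.

    latencies_per_stimulus: one list per stimulus, holding the latencies
    (ms after the stimulus) of all spikes detected after that stimulus.
    Returns one latency per stimulus, or None where no spike lies within
    tol_ms of the most recently tracked latency.
    """
    track = []
    expected = start_latency_ms
    for latencies in latencies_per_stimulus:
        candidates = [lat for lat in latencies if abs(lat - expected) <= tol_ms]
        if candidates:
            best = min(candidates, key=lambda lat: abs(lat - expected))
            track.append(best)
            expected = best  # latency drifts slowly, so update the expectation
        else:
            track.append(None)
    return track


# Toy data: two fibers near 100 ms and 130 ms. In the third sweep the first
# fiber shows a sudden latency increase (a "marking"), then recovers.
sweeps = [[100.0, 130.1], [100.2, 130.0], [101.5, 130.2], [101.0, 129.9]]
track_a = follow_track(sweeps, start_latency_ms=100.0)
track_b = follow_track(sweeps, start_latency_ms=130.0)
```

Spikes that fall outside every tolerance window, such as spontaneous discharges or responses to extra stimuli, are left unassigned by this scheme, which is exactly the gap that shape-based spike sorting is meant to fill.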

Spike sorting is therefore required for a comprehensive analysis, but the reliability of spike sorting through conventional clustering methods remains uncertain, as observations indicate insufficient sorting reliability in many recordings. Moreover, the quality of spike sorting algorithms has never been systematically assessed.

In recent years, numerous studies have been dedicated to the development of extracellular spike sorting methods tailored for multi-electrode data and, more recently, especially for high-density probes [8, 9]. These studies prioritize scalability to larger probes, a high density of recording points, and more spikes to sort due to high discharge frequencies. Our experimental setup, in contrast, uses only a single electrode, which means that no spatial information is available, and the discharge rates of unmyelinated peripheral nerve fibers are much lower. This limitation differs from the scalability concerns in multi-electrode setups.

Various automatic pipelines and algorithms exist for extracellular single-electrode data. Examples include CED's Spike2, a data acquisition and analysis tool, and SpikeInterface [10], a Python-based framework that offers a comprehensive spike sorting pipeline incorporating multiple algorithms for spike detection and clustering. However, in the context of microneurography, known algorithms designed for single electrodes often fail because they rely on consistent spike morphology. It is a known phenomenon in the microneurographic community that action potentials from different fibers can have the same shape and that the shapes from a single fiber can change during the recording [11]. Additionally, spike morphology in extracellular recordings is heavily dependent on the positioning of the recording electrode, which can change due to the movements of the volunteer.

For microneurography data, only one targeted solution for full spike sorting has previously been developed, by Forster and Handwerker [12]. In this approach, the authors detect spikes via thresholding, which is only feasible when the signal-to-noise ratio is high. The algorithm is template-based and is carried out in two phases. During the first phase, the templates are generated. The first spike establishes the first template, and subsequent spikes are compared against all templates to identify the closest match in terms of standard deviation. If no closely fitting template exists, a new template is formed. In the second phase, all spikes are compared to these templates. If a spike is within a certain range of a template, it is sorted to that cluster. If a spike cannot be classified, it is labelled “no template”. The authors emphasize that the advantage of the algorithm is that it is easy to understand visually, enabling the clear association of spikes with corresponding templates and the reliable removal of electrical artifacts and EMG activity. The algorithm requires the manual adjustment of several parameters, which was later automated by Turnquist et al. [11]. However, even with this improvement, the algorithm's results are sensitive to the signal-to-noise ratio, and if two fibers have similar shapes in terms of amplitude and width, there is a chance they will be associated with the same template. Importantly, the resulting classifications have not been evaluated against ground truth data.
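The two-phase template scheme described above can be sketched as follows. This is a simplified illustration, not the algorithm of [12]: the root-mean-square distance, the running-mean template update, the two thresholds, and all function names are assumptions made for this example.

```python
import numpy as np


def rms_distance(spike, template):
    """Root-mean-square difference between a spike and a template."""
    return float(np.sqrt(np.mean((spike - template) ** 2)))


def build_templates(spikes, new_template_threshold):
    """Phase 1: the first spike founds a template; later spikes either
    update their closest template or found a new one."""
    templates, counts = [], []
    for spike in spikes:
        distances = [rms_distance(spike, t) for t in templates]
        if distances and min(distances) <= new_template_threshold:
            i = int(np.argmin(distances))
            counts[i] += 1
            templates[i] += (spike - templates[i]) / counts[i]  # running mean
        else:
            templates.append(np.array(spike, dtype=float))
            counts.append(1)
    return templates


def assign_spikes(spikes, templates, match_threshold):
    """Phase 2: label each spike with a template index, or -1 for "no template"."""
    labels = []
    for spike in spikes:
        distances = [rms_distance(spike, t) for t in templates]
        i = int(np.argmin(distances))
        labels.append(i if distances[i] <= match_threshold else -1)
    return labels


# Toy waveforms: two shapes that differ in amplitude, plus one artifact.
shape_a = np.array([0.0, 1.0, -1.0, 0.0])
shape_b = np.array([0.0, 3.0, -3.0, 0.0])
spikes = [shape_a, shape_b, shape_a * 1.05, shape_b * 0.95]
templates = build_templates(spikes, new_template_threshold=0.5)
artifact = np.array([9.0, 9.0, 9.0, 9.0])
labels = assign_spikes(spikes + [artifact], templates, match_threshold=0.5)
```

In this toy case the two shapes are separable only because their amplitudes differ clearly. Two fibers with similar amplitude and width would fall within the same threshold and share a template, which is the failure mode noted above.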

To make use of more modern approaches for reliable spike sorting in microneurography, it is necessary to enhance the comprehension of spike shapes and features associated with a certain spike shape. Additionally, there is a need to develop robust automated processes capable of assessing highly variable datasets efficiently.

Through an interdisciplinary collaboration, an infrastructure for automatic analysis of microneurography data was built. This work involved introducing a metadata standard [13] tailored for microneurography experiments employing odML and odML-tables [14, 15], creating a Python library to export data from one important data acquisition system [16], and developing openMNGlab [17], an open-source analytical framework. The first attempts to automate the data analysis required more effort than usual because of the processing of individual fibers [18], but the development of the infrastructure made it possible to speed up the process. Thus, the analysis of different approaches for spike sorting on representative datasets can now begin.

In this study, we investigate the effectiveness of different sets of representations (features) of spike morphology for spike sorting in microneurography in 26 recordings from the microneurography laboratories in Aachen (datasets labelled with A) and Bristol (datasets labelled with B). Focusing on spikes tracked with the marking method, and hence creating ground truth labels, allows us to explore and compare both unsupervised clustering and supervised classification techniques. We have incorporated seven feature sets to quantify action potential morphologies. In addition to conventional feature sets, such as spike amplitude and width, we have implemented features derived from the spike sorting based on shape, phase, and distribution features (SS-SPDF) method, as presented by Caro-Martín et al. [19]. This method, designed for extracellular recordings, has exhibited superior performance compared to other neurophysiological techniques. Detailed definitions of these feature sets are provided in the Methods section.

Our emphasis is placed on employing straightforward and easily interpretable methodologies for feature extraction, clustering, and classification. This approach allows us to assess the effectiveness of the sorting pipeline for individual recordings and comprehend the challenges linked to single-electrode extracellular signals.

Thus, the objective of the reported work is to gain insights into the spike morphology variability of microneurography data and to better understand the domain-specific sorting challenges before applying more advanced but less transparent sorting approaches. This is achieved through the creation of a harmonized and representative ground truth data collection and the use of transparent methods, made reproducible by process documentation and open-source code.

2.1 Spike sorting pipeline

We have developed an open-source pipeline for the automated and systematic analysis of action potential shapes derived from microneurographic recordings (see Digital C-fiber repository: https://github.com/Digital-C-Fiber). This pipeline serves as a framework for comparing various feature extraction methods and sorting approaches. Our investigation involved the examination of 26 datasets collected from volunteers across two laboratories. The pipeline encompasses both unsupervised and supervised methodologies (see Fig. 1).

For both approaches, the common step of spike detection is made via the marking method and only spikes lying on the tracks are regarded for further analyses. Table 1 provides a summary of all the datasets, presenting information on the count of tracked fibers, and the total number of action potentials for both raw and pre-processed data.

Table 1

Datasets overview including the number of active fibers and the number of action potentials before and after automatic and manual outlier removal. Datasets labelled with A are from the lab in Aachen and datasets labelled with B are from Bristol.
Dataset	Number of fibers	Number of APs before pre-processing	Number of APs after pre-processing
A1	2	258	224
A2	2	536	455
A3	2	346	300
A4	3	1966	1399
A5	2	340	305
A6	2	206	151
A7	3	258	203
A8	3	300	244
A9	2	230	183
A10	3	419	340
A11	5	465	371
A12	6	1932	1405
A13	4	508	397
A14	2	310	230
A15	2	283	205
A16	5	800	631
A17	5	963	756
A18	5	1036	823
A19	4	816	678
A20	3	342	296
A21	2	526	475
A22	2	288	219
B1	3	1082	976
B2	5	981	663
B3	3	513	445
B4	2	366	357

In the unsupervised approach, we used k-means clustering to test how well microneurography data can be sorted without prior human intervention, assuming that spikes have been detected successfully. K-means is advantageous due to its simplicity and because the desired number of clusters can be specified, which suits our setting since the number of active fibers is already known. The supervised solution used a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel. SVMs cope well with high-dimensional feature sets and small datasets and tend to generalize well. We computed seven sets of features (F1-F7), based on the spike morphology, with different complexity levels (see Table 2).
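The unsupervised branch can be illustrated with a minimal sketch using scikit-learn's `KMeans`. The feature definitions below (peak-to-peak amplitude, and width as the time between maximum and minimum) are assumptions for this example and may differ from the definitions in the Methods section; the standardization step, the synthetic waveforms, and the function names are likewise hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def simple_features(waveforms, sampling_rate_hz):
    """Return (amplitude, width in ms) per spike, one row per waveform.

    Assumed definitions: amplitude is the peak-to-peak voltage, width is
    the time between the waveform maximum and minimum.
    """
    amplitude = waveforms.max(axis=1) - waveforms.min(axis=1)
    width_samples = np.abs(waveforms.argmax(axis=1) - waveforms.argmin(axis=1))
    width_ms = 1000.0 * width_samples / sampling_rate_hz
    return np.column_stack([amplitude, width_ms])


def cluster_spikes(features, n_fibers, seed=0):
    """Cluster spikes into n_fibers groups; the seed fixes the initialization."""
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    scaled = (features - features.mean(axis=0)) / std
    kmeans = KMeans(n_clusters=n_fibers, n_init=10, random_state=seed)
    return kmeans.fit_predict(scaled)


# Synthetic example: two fibers with the same shape but different amplitudes.
rng = np.random.default_rng(0)
base = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 30))
gains = np.concatenate([rng.normal(1.0, 0.05, 20), rng.normal(3.0, 0.05, 20)])
waveforms = gains[:, None] * base
labels = cluster_spikes(simple_features(waveforms, sampling_rate_hz=10_000), n_fibers=2)
```

Passing the known number of tracked fibers as `n_clusters` mirrors the rationale given above, and fixing `random_state` makes the otherwise initialization-dependent result reproducible.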

Evaluation of clustering results is conducted through the Fowlkes–Mallows index (FMI), using available labels as ground truth. The FMI measures the similarity of two clustering results using predicted and true labels and is defined as the geometric mean of precision and recall. For classification, we computed accuracy as the main evaluation metric. Additionally, we employed other clustering-specific metrics and classification-specific metrics, such as precision, recall, and the macro-averaged F1-score (see Supplementary Materials).
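To make the definition concrete, the FMI can be computed by counting pairs of spikes: a pair is a true positive if both spikes share a track and a cluster, a false positive if they share only a cluster, and a false negative if they share only a track. The sketch below is a plain-Python illustration with made-up labels; in practice, scikit-learn's `fowlkes_mallows_score` computes the same quantity.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(true_labels, predicted_labels):
    """FMI = sqrt(precision * recall), computed over all pairs of spikes."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_track = true_labels[i] == true_labels[j]
        same_cluster = predicted_labels[i] == predicted_labels[j]
        if same_track and same_cluster:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_track:
            fn += 1
    if tp == 0:
        return 0.0
    return sqrt((tp / (tp + fp)) * (tp / (tp + fn)))


# Made-up example: six spikes from two tracks, one spike placed in the wrong cluster.
true_tracks = [0, 0, 0, 1, 1, 1]
clusters = [1, 1, 0, 0, 0, 0]
score = fowlkes_mallows(true_tracks, clusters)
```

Here 4 pairs are true positives, 3 are false positives, and 2 are false negatives, so the FMI is sqrt((4/7)·(4/6)), roughly 0.62. Because only pair co-membership is compared, the score does not depend on how the clusters are numbered.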

In the supervised classification method, similarly, the tracks serve as class labels. We performed 5-fold cross-validation, dividing the action potentials into 80% training data and 20% testing data.
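A minimal sketch of this evaluation with scikit-learn is shown below. The feature scaling, the stratified and shuffled folds, the default SVM hyperparameters, and the synthetic data are assumptions for illustration; the source specifies only an RBF-kernel SVM and 5-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cross_validated_accuracy(features, track_labels, seed=0):
    """Mean and per-fold accuracy of an RBF-kernel SVM under 5-fold CV.

    Each fold trains on 80% of the spikes and tests on the remaining 20%.
    """
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, features, track_labels, cv=folds, scoring="accuracy")
    return float(scores.mean()), scores


# Synthetic stand-in for a feature matrix: two well-separated tracks.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(4.0, 1.0, (50, 4))])
track_labels = np.array([0] * 50 + [1] * 50)
mean_accuracy, fold_accuracies = cross_validated_accuracy(features, track_labels)
```

Putting the scaler inside the pipeline means it is fitted on the training folds only, so no information from the test spikes leaks into the model.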

Table 2

Overview of feature sets and corresponding identifiers for simplification. Different feature sets are extracted through domain-agnostic dimension reduction as well as through spike approaches, using the raw waveform and computed features on the waveforms. The feature sets include methodologies, such as principal component analysis (PCA) and the features from the spike sorting approach based on shape, phase, and distribution features (SS-SPDF) by Caro-Martín et al. [19].
Identifier	Feature Set
F1	Amplitude and width (“simple features”)
F2	PCA of SS-SPDF features [19] (2-comp)
F3	PCA of SS-SPDF features [19] (3-comp)
F4	Raw SS-SPDF features [19]
F5	PCA of raw waveform features (2-comp)
F6	PCA of raw waveform features (3-comp)
F7	Raw waveform features
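As an illustration of how the PCA-based sets relate to the raw waveform, the sketch below projects waveforms onto their first principal components using a singular value decomposition. It is a generic PCA under stated assumptions (mean-centering only, random stand-in waveforms, hypothetical function name), not the exact feature computation of the pipeline.

```python
import numpy as np


def pca_features(waveforms, n_components):
    """Project mean-centered waveforms onto their first principal components."""
    centered = waveforms - waveforms.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Stand-in data: 40 waveforms with 30 samples each (in the notation of
# Table 2, this matrix would correspond to F7).
rng = np.random.default_rng(0)
raw_waveforms = rng.normal(size=(40, 30))
two_components = pca_features(raw_waveforms, n_components=2)    # analogous to F5
three_components = pca_features(raw_waveforms, n_components=3)  # analogous to F6
```

The three-component projection contains the two-component projection as its first two columns, so the step from two to three components adds exactly one extra dimension of shape information.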

It is important to note that acceptable scores may vary depending on the analysis objective. In some instances, classifying only a subset of action potentials correctly is sufficient, such as when estimating the activity level in a yes/no dichotomy or roughly estimating the activation magnitude. Conversely, some burst quantifiers, such as the maximum/minimum inter-spike interval (ISI), could be extremely sensitive to missing or extra spikes in the spike train, so high accuracy is critical.
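A small worked example shows why ISI-based quantifiers are so sensitive to sorting errors. The spike times below are made up for illustration.

```python
def inter_spike_intervals(spike_times_s):
    """Differences between consecutive spike times, in seconds."""
    times = sorted(spike_times_s)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# A regular 10 Hz spike train, the same train with one missed spike,
# and the same train with one spike wrongly added from another fiber.
true_train = [0.00, 0.10, 0.20, 0.30, 0.40]
missed_one = [0.00, 0.10, 0.30, 0.40]
extra_one = true_train + [0.105]

max_isi_true = max(inter_spike_intervals(true_train))
max_isi_missed = max(inter_spike_intervals(missed_one))
min_isi_true = min(inter_spike_intervals(true_train))
min_isi_extra = min(inter_spike_intervals(extra_one))
```

A single missed spike doubles the maximum ISI from about 0.1 s to about 0.2 s, and a single extra spike reduces the minimum ISI from about 0.1 s to about 0.005 s, although in each case only one action potential was sorted incorrectly.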

2.2 Clustering results

The results for specific datasets revealed the absence of a universal feature extraction method that optimally suits every dataset (see Fig. 2a). Consequently, the most effective feature set differs from dataset to dataset. With clustering, only two of the 26 datasets, A1 (2 fibers, different amplitude) and A5, were sorted with an FMI ≥ 0.71 (see Fig. 2a, dots near 1 in F1, F5-F7 and around 0.75 in F5-F7).

Moreover, it is important to note the correlation between increasing numbers of fibers/tracks in a recording and a corresponding decline in scores (for example, A16-A19, 4–5 fibers per recording) for both clustering and classification methods and that in general SVM outperforms k-means clustering (see Fig. 3).

The “simple features” set of amplitude and width (F1) emerges as the feature extraction method with the highest FMI for 17 datasets (A2 0.58, A3 0.61, A5 0.99, A6 0.65, A9 0.54, A10 0.44, A11 0.29, A13 0.41, A14 0.68, A15 0.66, A16 0.40, A17 0.33, A18 0.38, A19 0.50, A21 0.54, A22 0.68, B2 0.35) (see Fig. 4a). However, the FMI itself was rather low in this group, with a median of 0.54. Next are the raw waveform (F7) and the PCA of the raw waveform with two and three components (F5 and F6, respectively), which achieve the highest score for 6 other datasets (A1 0.75, A4 0.47, A8 0.49, A20 0.65, B1 0.49, B3 0.64), with a median FMI of 0.65 (see Fig. 4a).

The PCA of the SS-SPDF features with two components (F2) exhibited the lowest FMI in 11 datasets. Some of those 11 datasets achieve their highest scores with F1 (“simple features”) (see Fig. 4a). In contrast, the “simple features” (F1) show the lowest score for B4, whereas all features related to the SS-SPDF method (F2-F4) lead to its highest score (see Fig. 4a). This highlights the dataset dependency in determining the most effective feature set. Additionally, the scores exhibit a largely uniform distribution across all features within each dataset.

2.3 Classification

Using the classification method, we achieve generally higher scores with a median larger than 0.6 (see Fig. 2b) for feature sets F1 (simple features) and F4-F6 (raw features from SS-SPDF method and all features related to the raw waveform), than the clustering method with median FMIs around 0.4–0.5. Only the feature sets F2 and F3 (PCA from SS-SPDF features) show lower values for classification with a median of 0.4, when compared to the other feature sets. The accuracy distribution for feature sets F2 and F3 (PCA of SS-SPDF features) indicates consistent underperformance (accuracy < 0.7) across all datasets.

When F4 (raw SS-SPDF features) is used as input, two datasets demonstrate exceptionally high performance with accuracies of 0.97 and 0.99, resulting in outliers (see Fig. 2b, red circle). Conversely, one dataset displays a performance of approximately 0.3 for F7 (raw waveform features), identified as a negative outlier (see Fig. 2b, red circle). This outlier is further explored in Section 2.4.4.

As mentioned before, computing PCA (F2 and F3) from the SS-SPDF features consistently demonstrates inadequate performance for spike sorting (see Fig. 4b). Once again, we observe the best results for A5 (2 fibers, very different amplitudes), with accuracies close to 1 for F1 and F4-F7. As illustrated in Fig. 2b for F7, A12 (6 fibers, very similar amplitudes) emerges as a negative outlier.

The raw waveform features (F7) consistently yield the best results. F2 and F3 (PCA from SS-SPDF features) exhibit the lowest average accuracies.

In summary, a clear trend for F7 being the best in most datasets (19 of 26 datasets) and F2 and F3 being the worst in most datasets (18 and 17, respectively of 26 datasets) emerges regarding classification. However, we would like to emphasize that no single feature extraction method universally succeeds across all selected recordings and that it is an individual choice depending on the experimental conditions and data quality.

Further, we analyze several examples to illustrate the results in more detail.

2.4 Exemplary results

2.4.1 Well-separable dataset – A5

The dataset A5 stands out with nearly flawless scores across several features for both the clustering and classification pipeline. Examining the action potential templates (computed through averaging spikes per track, see Methods) in Fig. 5a, the reason behind this exceptional performance becomes evident. The distinct difference in amplitude between the two fibers allows for straightforward clustering and classification, where simple features such as amplitude and width suffice to accurately differentiate and classify all action potentials to their respective tracks. It is worth noting that this represents a rare exception, as among the 26 datasets collected from two different laboratories, A5 was the only dataset to achieve such high scores.

2.4.2 Similar shapes – A6

We now turn to the pessimistic scenario for two fibers. Upon examining the templates, it becomes apparent that the shapes for both tracks are nearly identical (see Fig. 5b). Clustering proves to be ineffective (with a maximal FMI of 0.65), with all feature sets performing slightly better than random guessing or even worse. Conversely, in terms of classification, the most favorable outcome was achieved with F4 (0.69, raw SS-SPDF features). Hence, utilizing the 23 raw features from the SS-SPDF method could provide some insights, particularly in instances where the spikes have very similar shapes.

2.4.3 Mixed shapes for multiple fibers – A8

Moving on to recordings containing activity from more than two fibers, such as in A8 with 3 fibers, additional challenges emerge. The action potentials from the three specific fibers are labelled as Track3, Track4, and Track7, respectively. Track4 and Track7 exhibit templates that are nearly identical on average, causing significant difficulties in their differentiation (see Fig. 5c). Our best result, achieved with F7 (raw waveform features) as input and classification, yielded a score of 0.65. Given that F7 achieves the optimal outcome, we present the confusion matrix for the third fold to illustrate the challenges inherent in these scenarios. Due to the similarity between templates, the majority of action potentials from Track4 and Track7 are mislabelled and mismatched (see Fig. 5d). In this context, relying solely on a shape-based approach proves to be insufficient. Depending on the research question, this result could still be sufficient, as we can correctly classify almost all action potentials for one fiber (Track3, see Fig. 5d). In this scenario, we would gain valuable and reliable information from one nerve fiber and would have to ignore the data of the other two nerve fibers.
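The kind of per-fiber reading described above can be reproduced from any set of true and predicted labels. The sketch below builds a confusion matrix and the per-track recall in plain Python; the label vectors are invented toy values chosen to mimic the qualitative pattern (one separable track, two confused tracks) and are not the A8 results.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true tracks, columns are predicted tracks."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true track's spikes that were assigned to that track."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Invented toy labels: Track3 is separable, Track4 and Track7 are confused.
classes = ["Track3", "Track4", "Track7"]
true_labels = ["Track3"] * 4 + ["Track4"] * 4 + ["Track7"] * 4
predicted_labels = (
    ["Track3", "Track3", "Track3", "Track3"]
    + ["Track4", "Track7", "Track7", "Track4"]
    + ["Track4", "Track7", "Track4", "Track4"]
)
matrix = confusion_matrix(true_labels, predicted_labels, classes)
recall = per_class_recall(matrix)
```

With these toy labels the overall accuracy is only 7/12, yet the recall for Track3 is 1.0. A modest global score can therefore hide one fiber that is sorted reliably, which is why class-specific measures are worth inspecting alongside accuracy.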

2.4.4 Multiple fibers with similar shapes – A12

A12 presents a challenging scenario with six fibers, all exhibiting remarkably similar shapes in the templates (see Fig. 5e). Consequently, it comes as no surprise that the results are among the least promising. Employing clustering, the highest FMI achieved is 0.21, which is not adequate for sorting. In terms of classification, the optimal outcome was attained with F7, yielding an accuracy of 0.3. However, the same issue persists, since the accuracy achieved remains insufficient. Successful sorting and differentiation of the action potentials appear to be highly unlikely for recordings, such as A12. It is worth noting that although the number of action potentials is relatively small, having more data could potentially enhance performance, as observed in cases such as A16 or A17, but the tracks are also more distinguishable.

2.4.5 Three distinct shapes – B3

We examined an ideal scenario involving three active fibers in one recording. In this case, all three fibers can be distinguished by their template shapes (see Fig. 5f). Utilizing F7 as input yields the highest scores for both clustering and classification (0.64 and 0.91, respectively). Once more, this underscores the superiority of classification over clustering when applied to microneurography data. However, the shapes differ considerably from those observed in Aachen, indicating the variability inherent in microneurography recordings.

2.4.6 Multiple fibers with similar shapes – B2

As a concluding example, we present a recording from Bristol with 4 active fibers, each template exhibiting similarity to the others (see Fig. 5g). The clustering results range from 0.26 to 0.35 with the best score achieved for the amplitude and width (F1).

Using the classification pipeline, the accuracy improves to 0.67, especially when using the F7 feature extraction method.

In summary, clustering did not show good potential for spike sorting in microneurography, but classification seems promising for many recordings. In our clustering approach, F1 (amplitude and width) had the best scores for 17 out of 26 datasets. The raw waveform (F7) yielded the best performance for 19 datasets using SVM classification. The features of the SS-SPDF method could provide additional information beyond the use of shape alone, which is important for fiber responses with very similar shapes in the same recording. To conclude, the classification approach empowered by the marking method shows higher potential than clustering.

The results strongly suggest that the identification of the most suitable input feature set requires individualized exploration for each dataset, although algorithmic approaches can be proposed for decision optimization.

In addition to the optimized sorting process, we can establish criteria to determine the general sortability of a recording. For instance, if the accuracy computed on the tracked spikes is below a predefined threshold, then the file can be marked as non-sortable. This threshold should depend on the specific research question and experimental objective, which would require an evaluation of the acceptable level of misclassifications. The first electrical stimulation protocol that is used for characterizing nerve fiber types in a microneurography experiment uses monotonous electrical stimulation with low frequency and can be employed for classification purposes. Thus, it can serve as training data and for the sortability check.
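Such a sortability check could look like the following sketch. The function name, the per-fold accuracies, and the thresholds are hypothetical; as stated above, the threshold has to be chosen for the research question at hand.

```python
def assess_sortability(fold_accuracies_by_feature_set, threshold):
    """Pick the feature set with the highest mean cross-validated accuracy
    and report whether it reaches the required threshold."""
    mean_by_set = {
        name: sum(accuracies) / len(accuracies)
        for name, accuracies in fold_accuracies_by_feature_set.items()
    }
    best = max(mean_by_set, key=mean_by_set.get)
    return best, mean_by_set[best], mean_by_set[best] >= threshold


# Hypothetical per-fold accuracies for two feature sets of one recording.
fold_accuracies = {
    "F1": [0.60, 0.62, 0.58, 0.61, 0.59],
    "F7": [0.82, 0.85, 0.80, 0.84, 0.83],
}
best_set, best_accuracy, sortable = assess_sortability(fold_accuracies, threshold=0.80)
_, _, sortable_strict = assess_sortability(fold_accuracies, threshold=0.90)
```

The same recording counts as sortable for a question that tolerates about 20% misclassification and as non-sortable under the stricter 0.90 threshold, which illustrates why the decision cannot be made independently of the analysis goal.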

This work marks a significant step by introducing the first systematic spike shape analysis and reporting the challenges of microneurography data. It stands out as the first study that considers this amount of data from two different locations and hardware setups. What sets our study apart from the previous studies is the ability to evaluate the data against ground truth, facilitated by the marking method applied in microneurography.

Incorporating 26 datasets from two different locations allowed us to capture the broad variability present in microneurography recordings. Our selection aimed to encompass a wide spectrum of recording durations, fiber counts, and track counts, excluding recordings where fibers overlapped, as these instances could cause even more challenges for sorting.

Our spike sorting pipeline presents an opportunity to conduct microneurography analyses on a broader scale, employing straightforward methods that enhance transparency regarding the sorting process and its limitations within the context of microneurography. Our clustering and classification methods are adjustable, allowing for the incorporation of more advanced approaches suited to our task.

Signal pre-processing was essential for all datasets and involved techniques such as up-sampling for the Aachen data and smoothing for the Bristol data due to the different sampling frequencies of 10 and 30 kHz, respectively. While these pre-processing steps may alter the signal and potentially the spike shape, they were necessary to facilitate the computation of action potential derivatives for feature extraction using the SS-SPDF method.

Given the considerable variability and the large number of action potentials, we implemented automated outlier removal utilizing the computed templates. However, this approach may be too restrictive, particularly over extended recording periods, where action potential shapes can evolve and potentially fall outside the template range yet still belong to the tracks. Alternatively, we propose considering multiple templates per track, subsequently merging them, or implementing a manual upper limit for outlier removal when excessively many action potentials are filtered out, as, for example, observed in A4 with 1966 action potentials before filtering and 1399 after (see Table 1).
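The idea of template-based outlier removal can be sketched as follows. The rejection rule used here (root-mean-square distance to the track template larger than a fixed multiple of the median distance) is an assumption chosen for illustration and is not necessarily the criterion used in the pipeline; `max_factor` plays the role of the adjustable limit discussed above.

```python
import numpy as np


def track_template(waveforms):
    """Template of a track: the sample-wise mean of its spikes."""
    return waveforms.mean(axis=0)


def remove_outliers(waveforms, max_factor=3.0):
    """Drop spikes whose RMS distance to the template exceeds
    max_factor times the median distance (assumed rule)."""
    template = track_template(waveforms)
    distances = np.sqrt(((waveforms - template) ** 2).mean(axis=1))
    keep = distances <= max_factor * np.median(distances)
    return waveforms[keep], keep


# Synthetic track: 20 similar spikes plus one large artifact.
rng = np.random.default_rng(0)
base = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 30))
spikes = base + rng.normal(0.0, 0.02, size=(20, 30))
artifact = 5.0 * base
cleaned, keep = remove_outliers(np.vstack([spikes, artifact[None, :]]))
```

A single global template makes this rule strict for long recordings in which the spike shape drifts. Computing the template over a sliding window of neighboring spikes, or keeping several templates per track, would be one way to relax it.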

Considering the features utilized in the SS-SPDF method, our data deviates from the recording conditions presented in the literature. Specifically, the sampling frequencies for the Aachen and Bristol datasets (10 kHz and 30 kHz, respectively) are notably lower than those reported in the referenced paper (44 kHz). Moreover, microneurography recordings are sensitive to disruptions such as electrode movement, sudden overlay with other spikes originating from nerve fibers from the autonomic nervous system which bring spikes from the spinal cord to the periphery, and environmental electrical noise, which can impact the recording quality.

We decided on k-means clustering due to its interpretability and ability to specify the number of clusters, which aligns with our knowledge of the number of tracks present. A common challenge with k-means clustering is its sensitivity to initialization. Different initializations yield different results. To address this, we utilized a seed value to ensure reproducibility. Under favorable conditions, as observed in dataset A5, clustering alone suffices, and in such rare instances, the training process of the supervised approach could be disregarded.

For classification, we employed SVM classifiers with RBF kernels due to their robust performance and ability to effectively handle high-dimensional data. The challenge with using accuracy for evaluation arises when dealing with imbalanced datasets. In such cases, it is essential to consider class-specific measurements to ensure accurate evaluation. In this work, we reported the average results obtained through 5-fold cross-validation for accuracy and added class-specific measurements to the Supplementary Materials. For future work, we plan to utilize the cross-validation approach to identify the most suitable features for a specific dataset. After the identification, we intend to use all tracked action potentials for training and to classify action potentials that are not on the tracks as well as responses to additional stimulation, such as those illustrated by the orange spikes in Fig. 6. Overall, our classification approach achieved successful classification rates of 60–70% for most action potentials. This is not sufficient for detailed pattern analyses in microneurography, and certain recordings may not even reach this score due to similarities in action potential shapes across fibers or excessive activity from multiple fibers in a single recording. However, specific feature sets in individual datasets produce an acceptable sorting rate of above 80%. Thus, our pipeline indicates whether the action potentials in a given file can be sorted with sufficient precision for a certain research question, and with which method and feature set the highest accuracy is achieved.

The variability in the number of fibers is a challenge when comparing clustering and classification performance across single recordings. Each fiber has a distinct random-choice probability, which must be considered when evaluating different results. The more fibers a recording contains, the lower the success rate. This means that, during the experimental phase, careful consideration is necessary when picking the right recording for a specific scientific question. However, as demonstrated for dataset A8, it might be sufficient to sort the action potentials correctly to one specific nerve fiber and disregard the information from the other, non-separable fibers, as shown in the confusion matrix.
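The dependence of the chance level on the number of fibers and on the spike counts per fiber can be made explicit with a short calculation. The two baselines below (uniform guessing and always predicting the largest track) are common reference points chosen for this illustration; the spike counts are invented.

```python
from collections import Counter


def chance_levels(track_labels):
    """Return (uniform guessing accuracy, majority-class accuracy)."""
    counts = Counter(track_labels)
    uniform = 1.0 / len(counts)
    majority = max(counts.values()) / len(track_labels)
    return uniform, majority


# Invented label vectors: a balanced two-fiber recording and an
# imbalanced five-fiber recording.
two_fibers = ["T1"] * 100 + ["T2"] * 100
five_fibers = ["T1"] * 200 + ["T2"] * 50 + ["T3"] * 50 + ["T4"] * 50 + ["T5"] * 50
uniform_2, majority_2 = chance_levels(two_fibers)
uniform_5, majority_5 = chance_levels(five_fibers)
```

An accuracy of 0.6 is barely above chance for the balanced two-fiber recording (0.5) but well above both baselines for the five-fiber recording (0.2 and 0.5), so raw accuracies from recordings with different fiber counts are not directly comparable.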

Another observation was the lack of a direct correlation between increased data amount in terms of recording time and improved performance, which additionally supports the idea that certain recordings are inherently more sortable than others.

This indicates that conventional spike sorting algorithms should be used with care and, whenever possible, validated against ground truth for at least part of the data, in order to estimate the reliability of the sorting process for a given recording and a specific nerve fiber.

In our future work, we aim to integrate more advanced classification methods, such as deep learning techniques. This needs, however, a sufficient amount of data. Therefore, we have also developed open-source software [17] and a metadata infrastructure [13] to improve data retrieval processes. As previously mentioned, this study serves as a proof-of-concept for the effectiveness of feature extraction methods in spike sorting. While our current focus has been on sorting spikes on tracks, we aspire to extend our analysis to include spikes elicited by extra stimuli, such as responses to pain- and itch-inducing substances.

Further, we expect that integrating latency information, given the unique activity-dependent slowing property of C-fibers, could enhance spike sorting by providing probabilities for action potentials to be affiliated with specific fibers. Together with advanced machine learning models that enable automatic feature extraction, this is a promising direction for further improving spike sorting in microneurography, which is necessary to advance our knowledge of pain and itch signaling.

Conclusions

This study represents the first structured investigation of the challenges of spike sorting within microneurography. We employ the marking method as a partial labelling approach for action potentials and demonstrate the significant potential of using classification for spike sorting in highly complex microneurography datasets. The results vary strongly with the shape differences between the fibers and the number of active fibers, so the choice of feature sets has to be adjusted per recording (calibration), in the best case algorithmically. Whether a dataset is sortable with sufficient accuracy for a certain research question, and with which feature set, must therefore be decided individually for each dataset.

Our future work will delve into more sophisticated computational approaches, including the exploration of transfer learning and other integrative methodologies. The implications of this research extend beyond microneurography to other single extracellular electrode recordings. Key considerations include the necessity for ground truth data by recording stimulated action potentials and using spike classification to enhance spike sorting results.

Methods

Microneurography

The studies involving human participants were reviewed and approved by the Ethics Board of the University Hospital RWTH Aachen (Vo-Nr. EK141-19) and by the Faculty of Life Sciences Research Ethics Committee at the University of Bristol (reference number: 51882). The participants provided their written informed consent, and the study was conducted according to the Declaration of Helsinki.

A microelectrode is inserted into the superficial peroneal nerve (see Fig. 6a), while the volunteer or patient is awake and responsive. The receptive field of C-fibers is determined through transcutaneous electrical stimulation utilizing a constant current stimulator. Once the receptive field is identified, C-fibers are repetitively stimulated at a low frequency using the marking method. A more detailed description of microneurography can be found here [20].

Marking Method

In microneurography, experimenters employ the marking method to observe nerve fiber responses in the form of action potentials. The method utilizes the characteristic that C-fibers have an almost constant conduction velocity in response to repetitive low-frequency stimulation (e.g., 0.125–0.25 Hz), described here as background stimulation. When raw signal segments are plotted sequentially and vertically, where each segment starts at the onset of the background stimulus, the action potential responses evoked by the background stimulation are vertically aligned (see red and green spikes in Fig. 6b). This alignment enhances the visibility of spikes, even when the signal-to-noise ratio is poor, as illustrated in Fig. 6c for an exemplary spike.
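The vertical alignment can be illustrated with a minimal sketch that cuts the raw signal into one segment per background stimulus and stacks the segments as rows of a "waterfall" matrix. The function name `waterfall_segments`, the synthetic signal, and the latency of 300 ms are assumptions made for this illustration and do not reproduce the Dapsys or APTrack implementations.

```python
import numpy as np

def waterfall_segments(signal, stim_onsets, seg_len):
    """Stack one segment per background stimulus, each starting at stimulus onset."""
    rows = [signal[s:s + seg_len] for s in stim_onsets if s + seg_len <= len(signal)]
    return np.vstack(rows)

# Synthetic example: 10 kHz sampling, 0.25 Hz background stimulation,
# one fiber responding with a constant latency of 3000 samples (300 ms).
fs = 10_000
period = 4 * fs                        # one stimulus every 4 s
stim_onsets = np.arange(5) * period
signal = np.zeros(5 * period)
signal[stim_onsets + 3000] = -1.0      # idealized negative deflection

segments = waterfall_segments(signal, stim_onsets, seg_len=5000)
latencies = segments.argmin(axis=1)    # identical latencies form a vertical track
```

With a constant latency, the response sits in the same column of every row, which is what makes the track visible as a vertical line.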

Different C-fibers show distinctive conduction velocities, facilitating the differentiation of multiple fibers within a single recording [11]. When further stimulation is applied, in the form of extra electrical pulses or natural stimuli, the conduction velocity slows and the latency to the subsequent background stimulus increases. This phenomenon is known as activity-dependent slowing (ADS) [7] and “marks” the fiber. ADS is useful not only to distinguish fibers but also to classify them, as different physiological classes of C-fibers, such as mechanosensitive (CM) or mechanoinsensitive (CMi) C-fibers, exhibit differing degrees of slowing in response to low- and high-frequency electrical stimulation [20]. Fig. 6b presents a representative “waterfall” plot illustrating the trajectories of two tracked fibers. In this manuscript, we call them tracks. When the stimulation remains constant (indicated by blue rectangles), the responses align vertically (lines 1–5). However, after stimulating the fibers with two extra pulses (as seen in line 6), we can observe ADS in both fibers. Through repetitive consistent stimulation, latencies recover to their initial values. Accurately sorting spontaneous firing or chemically induced activity remains challenging when the action potentials have a similar shape and ADS is observed in more than one fiber (see line 11). The orange-marked activity cannot be reliably sorted.

Recording and tracking with Dapsys

In this work, the primary software employed within the laboratory in Aachen for recording and analyzing fiber activity is the Data Acquisition Processor System (Dapsys) [21] with a sampling frequency of 10,000 Hz. Within Dapsys, Turnquist et al. developed a technique designed to facilitate offline post hoc semi-automatic tracking of vertically aligned action potentials [11]. This method involves filtering the signal to enhance the track quality and the automatic identification of action potentials through local linearization of their trajectories. The experimenter can manually change the identified spikes belonging to a specific track within the graphical user interface.

Recording and tracking with Open Ephys and APTrack

In the experiment conducted in Bristol, the data were recorded using Open Ephys [22] with a sampling frequency of 30,000 Hz, and the spikes were detected and tracked with APTrack [23]. SpikeSpy [24] filters the raw data offline post hoc and, once the tracking algorithm has been started, detects the aligned action potentials through amplitude thresholding. The experimenter can manually change the identified spikes belonging to a specific track within the graphical user interface.

Data

In this work, we evaluate the methodology with evoked action potentials that were elicited through electrical stimuli. Twenty-two datasets were recorded at the microneurography laboratory in Aachen, while four datasets were acquired at the laboratory in Bristol. Details are listed in Table 1.

Pre-processing for feature extraction

To ensure uniformity and comparability across all datasets, we have agreed on utilizing the HDF5/NIX [25, 26] format, a well-established standard in the field of electrophysiology.

To handle the datasets generated by Dapsys, we use our Python package PyDapsys [16]. This package gives us free access to electrophysiological recordings stored in Dapsys' proprietary data format. To analyze an individual recording, we retrieve the raw signal, the onset timestamps of all tracked action potentials, and the onset timestamps of stimulation events. The data are then stored in pandas DataFrames.

To facilitate the extraction of spike feature sets, our next step involves obtaining the tracked spike waveforms from the raw signal. As illustrated in Fig. 6c, the red line represents the reference point in time, which is typically located near the center of the waveform. Considering an extracellular C-fiber action potential width of approximately 3 ms and a sampling frequency of 10,000 Hz for Dapsys files, it is necessary to encompass 30 datapoints from the raw signal to adequately capture the spike. For instance, a spike occurring at position t would lie within the data slice window [t – 15, t + 15]. Conversely, for Bristol files with a sampling frequency of 30,000 Hz, we consider 90 datapoints, thereby expanding the window range to [t – 30, t + 60], as the reference point is slightly shifted to the left. During the extraction process, we enforce the alignment of negative peaks of all spikes.
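The windowing step can be sketched as follows. The window sizes follow the text (15 datapoints before and after the reference point for Aachen, 30 before and 60 after for Bristol). How exactly the alignment of negative peaks is enforced is not specified here, so re-centring the window on its minimum is an assumption, and the function names are hypothetical.

```python
import numpy as np

def extract_spike(signal, t, pre, post):
    """Return signal[t - pre : t + post], or None if the window leaves the recording."""
    if t - pre < 0 or t + post > len(signal):
        return None
    return signal[t - pre:t + post]

def align_to_negative_peak(signal, t, pre, post):
    """Re-centre the window so that its minimum sits at index `pre` (assumed scheme)."""
    window = extract_spike(signal, t, pre, post)
    if window is None:
        return None
    t_min = t - pre + int(np.argmin(window))  # sample index of the negative peak
    return extract_spike(signal, t_min, pre, post)

# Synthetic example: the tracked timestamp (500) is 3 samples before the true minimum.
signal = np.zeros(1000)
signal[503] = -1.0
aachen = align_to_negative_peak(signal, t=500, pre=15, post=15)   # 30 datapoints
bristol = align_to_negative_peak(signal, t=500, pre=30, post=60)  # 90 datapoints
```

Spikes too close to the start or end of the recording return `None` in this sketch and would have to be skipped by the caller.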

To gain deeper insights into the characteristic shape of each track, we compute a “template” by averaging all the individual spikes associated with that specific track. This averaging process results in a representative waveform that effectively shows the distinctiveness or similarity in the shapes of the tracks. In Fig. 5, the templates from several datasets are shown.

We have implemented automatic outlier removal by comparing each spike to its corresponding template. A spike is removed if its minimum value falls outside a $\:\pm\:30\%$ range of the template's minimum value. The full pre-processing steps can be found in Table 3.
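A minimal sketch of the template computation and the ±30% criterion is given below. It assumes that the spikes of one track are already aligned and stored as rows of an array, and that the template is computed from all spikes before filtering; the function name `template_and_filter` is hypothetical.

```python
import numpy as np

def template_and_filter(spikes, tolerance=0.30):
    """Average aligned spikes into a template and drop spikes whose minimum
    deviates by more than `tolerance` (relative) from the template minimum."""
    spikes = np.asarray(spikes, dtype=float)
    template = spikes.mean(axis=0)
    t_min = template.min()
    deviation = np.abs(spikes.min(axis=1) - t_min) / abs(t_min)
    keep = deviation <= tolerance
    return template, spikes[keep], keep

# Synthetic track: nine spikes with a minimum of -10 and one outlier with -20.
base = np.zeros(30)
base[15] = -10.0
spikes = np.tile(base, (10, 1))
spikes[-1, 15] = -20.0
template, kept, keep = template_and_filter(spikes)
```

In this example the template minimum is -11, so the accepted range is roughly -14.3 to -7.7 and only the last spike is removed.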

Table 3

Pre-processing pipeline for Aachen and Bristol data. For the Aachen sampling frequency (10 kHz), 30 datapoints correspond to 3 ms in time. For the Bristol frequency (30 kHz), 90 datapoints correspond to 3 ms.
Pre-processing Aachen	Pre-processing Bristol
Create NIX file from Dapsys file	Create NIX file from raw Open Ephys file or use H5 file
Read-in signal from raw Dapsys file, read-in stimuli and tracks from NIX file	Read-in signal, stimuli, and tracks from NIX file
	Smooth signal (running mean with window size 4) and invert signal (Bristol signal is flipped)
Create dataframe with stimuli	Create dataframe with stimuli
Get signal values for each AP based on timestamp t in a window [t − 15, t + 15]	Get signal values for each AP based on timestamp t in a window [t − 30, t + 60]
Find minimum in signal piece to align spikes	Find minimum in signal piece to align spikes
Increase sampling frequency by resampling 30 datapoints to 60
Compute template and determine threshold for each template for automatic filtering of spikes	Compute template and determine threshold for each template for automatic filtering of spikes
Filter by threshold and drop APs that are outside of range	Filter by threshold and drop APs that are outside of range
Compute components for complex features by taking the max value of the first derivative between 20:40 position	Compute components for complex features by taking the max value of the first derivative after 30th data point
Compute all feature sets	Compute all feature sets
Plot AP shapes with template	Plot AP shapes with template

Feature set extraction

In our analysis, we explore various feature extraction techniques aimed at quantifying the characteristics of spike waveforms. We compare the disparities between simpler and more complex feature sets. These extracted feature sets serve as inputs for clustering and classification methods, and we evaluate their effectiveness in characterizing and discriminating spikes from multiple tracks.

Raw signal

Our initial feature set is the raw signal itself. We do not process the signal segments further and, instead, directly employ the 30 and 90 datapoints, respectively, as input for each spike waveform. As a result, for a given spike denoted as S, the feature vector is denoted as follows with $\:{s}_{x}$ describing the voltage at position x:

$$\:{S}_{Aachen}=\left[{s}_{1},{s}_{2},\dots\:,\:{s}_{30}\right]$$

$$\:{S}_{Bristol}=\left[{s}_{1},{s}_{2},\dots\:,\:{s}_{90}\right]$$

Amplitude and width (“simple features”)

We refer to “simple features” when taking the most fundamental characteristics of a waveform. Here, we examine two specific features: amplitude a and width w. Amplitude is defined as the peak value of the spike, while width is measured at half of the amplitude. Consequently, each spike can be described by a two-dimensional feature vector. Formally, the feature vector for a spike S is denoted as:

$$\:S=\left[a,\:w\right]$$
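As an illustration, the two simple features could be computed as sketched below. The sketch rests on two assumptions that are not stated above: the amplitude is taken as the magnitude of the negative peak (the spikes are aligned on their negative peaks), and the width is the duration for which the waveform stays beyond half of that peak value. The function name `simple_features` is hypothetical.

```python
import numpy as np

def simple_features(spike, fs):
    """Return [amplitude, width in ms] for one spike sampled at fs Hz."""
    spike = np.asarray(spike, dtype=float)
    i_peak = int(np.argmin(spike))
    amplitude = -spike[i_peak]        # magnitude of the negative peak
    half = spike[i_peak] / 2.0
    # Walk outwards from the peak while the signal stays beyond half amplitude.
    left = i_peak
    while left > 0 and spike[left - 1] <= half:
        left -= 1
    right = i_peak
    while right < len(spike) - 1 and spike[right + 1] <= half:
        right += 1
    width_ms = (right - left + 1) / fs * 1000.0
    return np.array([amplitude, width_ms])

# Synthetic spike sampled at 10 kHz with a peak of -10.
spike = np.zeros(30)
spike[13:18] = [-2.0, -6.0, -10.0, -6.0, -2.0]
a, w = simple_features(spike, fs=10_000)
```

Counting whole samples limits the width resolution to one sampling interval; interpolating the half-amplitude crossings would be a possible refinement.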

Shape-, phase- and distribution-based (SPDF) features

In 2018, Caro-Martín et al. developed a comprehensive spike sorting pipeline for extracellular recordings that outperforms contemporary methods used in neurophysiology [19]. They tested their algorithm on two synthetic datasets as well as on real extracellular recordings of neural activity within the rostral-medial prefrontal cortex from rabbits. Within their methods, they extracted 24 distinct features from each action potential on shape-, phase-, and distribution-based characteristics. Shape features describe the waveform of the action potential’s first derivative in the time domain, while phase features relate the amplitudes of the first and the second derivative to points in the phase space. Additionally, distribution features take into account the amplitude distributions of both the first and second derivatives of each action potential.

Caro-Martín et al. employed a modified k-means clustering algorithm that finds the optimal number of clusters and clustering results, which also addresses the issue of overlapping spikes and evolving waveforms over time. They introduced their own validity score and error-index to evaluate their method and compare it with alternative spike sorting methods. Furthermore, the authors emphasize the method's computational efficiency and highlight the enhanced physiological interpretability of their feature-based approach compared to algorithms relying on dimensionality reduction techniques.

In this work, we have implemented these features in Python. They represent the more advanced approach and serve as feature vectors for our clustering and classification algorithms.

Feature computation

In our implementation, we primarily followed the definitions given by Caro-Martín et al. [19]. However, we had to make certain necessary adjustments.

First, we compute both the first and second derivatives for each action potential. Six fundamental points characterize an action potential, forming the base for the feature computations. In some of our detected instances, the first fundamental point (the first zero-crossing of the first derivative) was undefined. Consequently, we automatically excluded these spikes from our data. Furthermore, to enhance the precision of the derivative computations, we have resampled the action potentials originally recorded with Dapsys from 30 to 60 datapoints with SciPy's [27] resample function.
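The resampling and differentiation steps can be sketched as follows. `scipy.signal.resample` is the function named above; computing the derivatives as finite differences with `numpy.diff` and detecting the zero-crossing through a sign change are assumptions of this sketch, as are the function names.

```python
import numpy as np
from scipy.signal import resample

def upsample_and_differentiate(spike, n_out=60):
    """Resample a spike to n_out datapoints and return it with its derivatives."""
    up = resample(np.asarray(spike, dtype=float), n_out)  # Fourier-based resampling
    fd = np.diff(up)        # first derivative, n_out - 1 values
    sd = np.diff(up, n=2)   # second derivative, n_out - 2 values
    return up, fd, sd

def first_zero_crossing(fd):
    """Index of the first sign change of the first derivative, or None if undefined."""
    fd = np.asarray(fd, dtype=float)
    change = np.where(np.sign(fd[:-1]) * np.sign(fd[1:]) < 0)[0]
    return int(change[0]) if change.size else None

# Synthetic spike with 30 datapoints: a negative Gaussian deflection.
t = np.arange(30)
spike = -np.exp(-0.5 * ((t - 15) / 3.0) ** 2)
up, fd, sd = upsample_and_differentiate(spike)
p1 = first_zero_crossing(fd)  # spikes with p1 is None would be excluded
```

A waveform whose first derivative never changes sign yields `None`, which corresponds to the case in which the first fundamental point is undefined.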

For the final implementation, we had to modify two feature definitions. In the case of Feature 4, it was unclear how the reference waveform was computed. Therefore, we decided to exclude Feature 4 from our final vector.

For Feature 8, the original description referred to it as the "root-mean-square of the amplitudes before the FD event of the action potential" [19]. However, after closer examination of the mathematical formula, it became apparent that the values were not squared, and the authors did not define the variable m. We redefined Feature 8 as $\:{f}_{8}=\sqrt{\frac{\sum\:_{i=s}^{P1}{a}_{{FD}_{i}}^{2}}{P1-s}}$, where s is the beginning of the window.
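The redefined Feature 8 translates directly into code. The sketch below follows the formula as written, with an inclusive sum from s to P1 and the denominator P1 − s. The zero-based indexing, the default s = 0, and the function name `feature_8` are assumptions made for illustration.

```python
import numpy as np

def feature_8(fd, p1, s=0):
    """Root-mean-square of the first-derivative amplitudes from index s to P1."""
    fd = np.asarray(fd, dtype=float)
    if p1 <= s:
        raise ValueError("P1 must lie after the window start s")
    window = fd[s:p1 + 1]   # inclusive upper limit, as in the sum
    return float(np.sqrt(np.sum(window ** 2) / (p1 - s)))

# Worked example: amplitudes 3, 4, 0 up to P1 = 2 give sqrt(25 / 2).
f8 = feature_8([3.0, 4.0, 0.0, 5.0], p1=2)
```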

The final feature vector of spike S is denoted as follows:

$$\:S=[{f}_{1},{f}_{2},{f}_{3},{f}_{5},\dots\:,{f}_{24}]$$

PCA features of raw signal and complex feature sets

Due to the discrepancy in the dimensionalities of the initially presented feature set vectors, we employ principal component analysis (PCA) on the raw signal and the feature vector defined by Caro-Martín et al. We conduct PCA with both two and three components to reduce the feature vector from its original 23, 30, or 90 dimensions, respectively. As a result, we generated four additional feature vectors for each spike, all of which serve as input for the spike sorting process. The definitions of these feature vectors for the spike are denoted as follows:

$$\:{S}_{2\:components}=\left[PC1,\:PC2\right]\:\text{and}\:{S}_{3\:components}=\left[PC1,\:PC2,\:PC3\right].$$
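The projection onto two or three principal components can be sketched with a singular value decomposition of the mean-centred feature matrix. This is a NumPy illustration of the technique on random data, not necessarily the PCA implementation used in the pipeline; the function name `pca_features` is hypothetical.

```python
import numpy as np

def pca_features(X, n_components):
    """Project the rows of X (spikes x features) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    return Xc @ Vt[:n_components].T

# Random stand-in for 200 raw Aachen waveforms with 30 datapoints each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
two = pca_features(X, 2)    # [PC1, PC2]
three = pca_features(X, 3)  # [PC1, PC2, PC3]
```

The same function applies to the 23-dimensional SPDF vectors and to the 90-dimensional Bristol waveforms, which yields the four additional feature sets per recording.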

Clustering

Similar to the methods employed by Caro-Martín et al. and other research, we cluster our action potentials with the k-means algorithm. We utilize the implementation provided by the scikit-learn package [28]. Given the algorithm's sensitivity to initialization, we employ the k-means++ initialization scheme to enhance the robustness of our results. To make results reproducible, we set a specific random seed. The number of clusters k is set to the number of tracks observed in each dataset. As we have presented seven distinct feature set vectors, we apply the algorithm individually to each of them. This approach allows us to compare which input feature sets are most effective for clustering within the current dataset, assuming we do not have any prior knowledge of the clusters.
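A minimal sketch of this clustering step with scikit-learn is shown below. The seed value, the `n_init` setting, the function name `cluster_feature_set`, and the synthetic two-track data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_feature_set(features, n_tracks, seed=0):
    """Cluster one feature set with k-means++ initialization; k = number of tracks."""
    model = KMeans(n_clusters=n_tracks, init="k-means++", n_init=10, random_state=seed)
    return model.fit_predict(np.asarray(features, dtype=float))

# Synthetic stand-in for two well-separated tracks in a 2-D feature space.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                      rng.normal(10.0, 0.5, size=(50, 2))])
labels = cluster_feature_set(features, n_tracks=2)
```

Applying the same function to each of the seven feature sets of a recording gives one label vector per feature set, which can then be compared against the track labels.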

Classification

As an alternative to common clustering, we explore the potential of our extracted feature sets through a supervised learning approach. This option is feasible due to the marking method and the resulting tracks. Our choice of support vector machines (SVMs) as classification models is motivated by their effectiveness in high-dimensional feature spaces, particularly when a clear separation between classes is possible. The models aim to identify a line or hyperplane within the feature space to distinguish between different classes. To assess the accuracy and generalizability of our models, we employ 5-fold cross-validation, which repeatedly partitions the data into training and test sets. We adhere to the default parameters of the scikit-learn implementation.
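The supervised step can be sketched with scikit-learn as follows. `SVC()` with default parameters uses an RBF kernel, and `cross_val_score` with `cv=5` performs 5-fold cross-validation. The function name `mean_cv_accuracy` and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def mean_cv_accuracy(features, track_labels, folds=5):
    """Average accuracy of a default SVM over k-fold cross-validation."""
    clf = SVC()  # scikit-learn defaults, including the RBF kernel
    scores = cross_val_score(clf, features, track_labels, cv=folds, scoring="accuracy")
    return float(scores.mean())

# Synthetic stand-in: two well-separated tracks with 50 labelled spikes each.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                      rng.normal(10.0, 0.5, size=(50, 2))])
track_labels = np.repeat([0, 1], 50)
acc = mean_cv_accuracy(features, track_labels)
```

Running this once per feature set gives the per-recording comparison of feature sets described above.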

Evaluation

To assess our spike sorting results and compare the performance of various feature sets and methodologies, we employed evaluation methods tailored to each approach. Here, we present two metrics, one for evaluating clustering results and another for classification performance.

Fowlkes-Mallows Index (FMI)

The Fowlkes-Mallows Index (FMI) is a metric for evaluating a clustering result against known labels, so its computation requires both true and predicted labels. It counts pairs of spikes: true positives (TP) are pairs that share a cluster in both the true and the predicted labelling, false positives (FP) are pairs that share a cluster only in the predicted labelling, and false negatives (FN) are pairs that share a cluster only in the true labelling. Defined as the geometric mean of precision and recall, the FMI measures the similarity between the two clusterings, and a higher FMI indicates a better clustering result:

$$\:FMI=\:\sqrt{\frac{TP}{TP+FP}\times\:\frac{TP}{TP+FN}}$$
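The formula can be made concrete with a short pair-counting sketch in which TP, FP, and FN are counted over all pairs of spikes. The function name `fowlkes_mallows` is hypothetical; scikit-learn provides `fowlkes_mallows_score` in `sklearn.metrics` for practical use.

```python
import numpy as np

def fowlkes_mallows(true_labels, pred_labels):
    """FMI from pair counts: sqrt(precision * recall) over all pairs of spikes."""
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    same_true = t[:, None] == t[None, :]
    same_pred = p[:, None] == p[None, :]
    upper = np.triu(np.ones((len(t), len(t)), dtype=bool), k=1)  # each pair once
    tp = np.sum(same_true & same_pred & upper)
    fp = np.sum(~same_true & same_pred & upper)
    fn = np.sum(same_true & ~same_pred & upper)
    if tp == 0:
        return 0.0
    return float(np.sqrt(tp / (tp + fp) * tp / (tp + fn)))

# Cluster numbering does not matter: swapped labels still give a perfect score.
perfect = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
merged = fowlkes_mallows([0, 0, 1, 1], [0, 0, 0, 0])  # everything in one cluster
```

The pairwise matrices need memory quadratic in the number of spikes, so this form is only meant for small examples.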

Accuracy

Accuracy is a metric for evaluating classification results and describes the fraction of correct predictions. Higher accuracy indicates a better-performing classification model. It is usually expressed as a percentage and is defined as the ratio of correctly predicted instances to the total instances:

$$\:Accuracy=\:\frac{Number\:of\:correct\:predictions}{Total\:number\:of\:predictions}$$
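The following sketch computes accuracy together with a class-specific measure (recall per track) to show why accuracy alone can be misleading for imbalanced recordings. The function name `accuracy_and_recall` and the example labels are hypothetical.

```python
import numpy as np

def accuracy_and_recall(true_labels, pred_labels):
    """Return overall accuracy and the recall of each true class."""
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    accuracy = float(np.mean(t == p))
    recall = {c: float(np.mean(p[t == c] == c)) for c in np.unique(t).tolist()}
    return accuracy, recall

# Imbalanced example: 8 spikes of track 0, 2 spikes of track 1,
# and a classifier that always predicts track 0.
true_labels = [0] * 8 + [1] * 2
pred_labels = [0] * 10
accuracy, recall = accuracy_and_recall(true_labels, pred_labels)
```

Here the accuracy is 80% although no spike of track 1 is recognized, which is why class-specific measurements are reported in addition to accuracy.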

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. The source code and selected test data are available on https://github.com/Digital-C-Fiber/SpikeSortingPipeline.

Competing Interests

B. N. is receiving consulting fees from Vertex. The other authors declare no competing interests.

Author Contribution

E. K. and B. N. conceptualized the work; B. N., A. F. and J. D. performed the experiments and acquired the data; A. F. and J. D. curated the data; A. T., E. K., B. N., A. P. G. and P. K. analyzed the data; A. T., P. K. and A. P. G. developed the computation pipeline; R. R., B. N., J. D. and E. K. interpreted the data; All authors critically discussed the results; A. T. wrote the initial version of the manuscript and prepared the figures; All authors have substantially revised the manuscript and approved the final version.

Acknowledgement

This project was supported by a grant from the Interdisciplinary Center for Clinical Research within the Faculty of Medicine at the RWTH Aachen University. We acknowledge Aidan Nickerson and Danxia Bao for their technical support.

References

1. Vallbo, A. B., Hagbarth, K. E. Activity from skin mechanoreceptors recorded percutaneously in awake human subjects. Exp Neurol. 21(3), 270–89 (1968).
2. Torebjörk, H. E., Hallin, R. G. Responses in human A and C fibres to repeated electrical intradermal stimulation. J Neurol Neurosurg Psychiatry. 37(6), 653–64 (1974).
3. Kutafina, E., Becker, S., Namer, B. Measuring pain and nociception: Through the glasses of a computational scientist. Transdisciplinary overview of methods. Frontiers in Network Physiology. 3, (2023).
4. Kleggetveit, I. P., Namer, B., Schmidt, R., Helås, T., Rückel, M., Ørstavik, K., et al. High spontaneous activity of C-nociceptors in painful polyneuropathy. Pain. 153(10), 2040–7 (2012).
5. Krahe, R., Gabbiani, F. Burst firing in sensory systems. Nat Rev Neurosci. 5(1), 13–23 (2004).
6. Schmelz, M., Forster, C., Schmidt, R., Ringkamp, M., Handwerker, H. O., Torebjörk, H. E. Delayed responses to electrical stimuli reflect C-fiber responsiveness in human microneurography. Exp Brain Res. 104(2), 331–6 (1995).
7. Serra, J., Campero, M., Ochoa, J., Bostock, H. Activity-dependent slowing of conduction differentiates functional subtypes of C fibres innervating human skin. J Physiol. 515(Pt 3), 799–811 (1999).
8. Rey, H. G., Pedreira, C., Quian Quiroga, R. Past, present and future of spike sorting techniques. Brain Research Bulletin. 119, 106–17 (2015).
9. Buccino, A. P., Garcia, S., Yger, P. Spike sorting: new trends and challenges of the era of high-density probes. Prog Biomed Eng. 4(2), 022005 (2022).
10. Buccino, A. P., Hurwitz, C. L., Garcia, S., Magland, J., Siegle, J. H., Hurwitz, R., et al. SpikeInterface, a unified framework for spike sorting. eLife. 9, e61834 (2020).
11. Turnquist, B., RichardWebster, B., Namer, B. Automated detection of latency tracks in microneurography recordings using track correlation. Journal of Neuroscience Methods. 262, 133–41 (2016).
12. Forster, C., Handwerker, H. O. Automatic classification and analysis of microneurographic spike data using a PC/AT. Journal of Neuroscience Methods. 31(2), 109–18 (1990).
13. Troglio, A., Nickerson, A., Schlebusch, F., Röhrig, R., Dunham, J., Namer, B., et al. odML-Tables as a Metadata Standard in Microneurography. Stud Health Technol Inform. 307, 3–11 (2023).
14. Sprenger, J., Zehl, L., Pick, J., Sonntag, M., Grewe, J., Wachtler, T., et al. odMLtables: A User-Friendly Approach for Managing Metadata of Neurophysiological Experiments. Frontiers in Neuroinformatics. 13, (2019).
15. Grewe, J., Wachtler, T., Benda, J. A Bottom-up Approach to Data Annotation in Neurophysiology. Frontiers in Neuroinformatics. 5, (2011).
16. Konradi, P., Troglio, A., Pérez Garriga, A., Pérez Martín, A., Röhrig, R., Namer, B., et al. PyDapsys: an open-source library for accessing electrophysiology data recorded with DAPSYS. Frontiers in Neuroinformatics. 17, (2023).
17. Schlebusch, F., Kehrein, F., Röhrig, R., Namer, B., Kutafina, E. openMNGlab: Data Analysis Framework for Microneurography - A Technical Report. Stud Health Technol Inform. 283, 165–71 (2021).
18. Kutafina, E., Troglio, A., de Col, R., Röhrig, R., Rossmanith, P., Namer, B. Decoding Neuropathic Pain: Can We Predict Fluctuations of Propagation Speed in Stimulated Peripheral Nerve? Frontiers in Computational Neuroscience. 16, (2022).
19. Caro-Martín, C. R., Delgado-García, J. M., Gruart, A., Sánchez-Campusano, R. Spike sorting based on shape, phase, and distribution features, and K-TOPS clustering with validity and error indices. Sci Rep. 8(1), 17796 (2018).
20. Fiebig, A., Leibl, V., Oostendorf, D., Lukaschek, S., Frömbgen, J., Masoudi, M., et al. Peripheral signaling pathways contributing to non-histaminergic itch in humans. Journal of Translational Medicine. 21(1), 908 (2023).
21. Turnquist, B. DAPSYS (Data Acquisition Processor System). http://dapsys.net/.
22. Siegle, J. H., López, A. C., Patel, Y. A., Abramov, K., Ohayon, S., Voigts, J. Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. J Neural Eng. 14(4), 045003 (2017).
23. Nickerson, A. P., Newton, G. W. T., O’Sullivan, J. H., Martinez-Perez, M., Sales, A. C., Williams, G., et al. Open-Source Real-Time Closed-Loop Electrical Threshold Tracking for Translational Pain Research. J Vis Exp. 194, (2023).
24. Dunham, J., Nickerson, A. SpikeSpy. https://github.com/Microneurography/SpikeSpy.
25. Fortner, B. HDF: The hierarchical data format. J Software Tools Prof Program. (1998).
26. Stoewer, A., Kellner, C. J., Benda, J., Wachtler, T., Grewe, J. File format and library for neuroscience data and metadata. Frontiers in Neuroinformatics. 8, (2014).
27. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 17(3), 261–72 (2020).
28. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. Scikit-learn: Machine Learning in Python. arXiv. (2018).
