Standardized Specification for the SWC File Format
An SWC file is a delimited text file that stores the digital reconstruction of a neural morphology (or specific portions thereof) as a data table with rows and columns (Fig. 1a). Every line in the text file (row) contains seven ordered data fields (columns) delimited by whitespace (Table 1) and represents a sample point traced along the neural tree (Fig. 1b).
Table 1. Each traced sample point in an SWC reconstruction file is characterized by seven data field values ordered by column.
Column | Field | Description | Data Type
1 | Index | Sample identifier | Sequential positive integer
2 | Type | Structure identifier | Positive integer [1-7]
3 | X | 3D Cartesian coordinate of X-axis | Floating point value
4 | Y | 3D Cartesian coordinate of Y-axis | Floating point value
5 | Z | 3D Cartesian coordinate of Z-axis | Floating point value
6 | R | Radius of segment at sample point | Floating point value
7 | Parent | Parent sample identifier | Positive integer or -1
The first data field, the Index, is a positive integer uniquely identifying each sample point. The first sample point in the SWC file is assigned Index 1, and the Index of each subsequent sample increases sequentially (1, 2, 3, …). The second data field, Type, is a positive integer between 1 and 7 defining the structural domain of the sample point: soma (1), axon (2), (basal) dendrite (3), apical dendrite (4), custom (5), unspecified neurite (6), and glia processes (7). The ‘custom’ Type is available for users to interpret at will, for example to denote oblique dendrites. The unspecified neurite Type accommodates either a lack of identifying features, as is often the case in cultures, or a lack of biological differentiation, as early in development or in certain invertebrate neural systems.
The next three data fields are the X, Y, and Z Cartesian coordinates of the sample point in three-dimensional space, respectively. The sixth field, Radius, equals half the thickness of the neural tree segment at the location specified by the X, Y, Z coordinates. The standard specification requires that these four data fields be expressed as floating-point values. Unless otherwise noted, they are typically interpreted (e.g., in NeuroMorpho.Org) as micrometers. The final data field is the Parent index, which defines how the sample points are connected to each other. The first point in the file must be the root and has a Parent of -1. Every other point must have a previously declared Index as its Parent. Note that, in a tree, each point can have only one Parent, but multiple points can share the same Parent.
The text file is ASCII-encoded and has a .swc file extension. The file allows optional header and footer sections in which each line starts with the hash symbol (#). The header section can be used to organize and store metadata about the reconstruction (digitization source, authors, brain region, cell type, version number, recording date, etc.). The optional footer section provides backward compatibility with extended multi-signal SWC (.eswc) files and can be used to include information such as the signal intensity of multiple imaged channels or time-dependent changes in structure [30].
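To make the field, header, and connectivity rules concrete, the following Python sketch parses SWC text and enforces them strictly. It is a minimal illustration written for this description, not the xyz2swc implementation; the names `Sample`, `parse_swc`, and `TYPE_NAMES` are hypothetical, and the parser rejects (rather than corrects) any line that breaks the rules.

```python
from __future__ import annotations

from dataclasses import dataclass

# Structure identifiers for the Type field (1-7), as listed above.
TYPE_NAMES = {
    1: "soma", 2: "axon", 3: "(basal) dendrite", 4: "apical dendrite",
    5: "custom", 6: "unspecified neurite", 7: "glia processes",
}


@dataclass
class Sample:
    index: int
    type: int
    x: float
    y: float
    z: float
    radius: float
    parent: int


def parse_swc(text: str) -> list[Sample]:
    """Parse SWC text, skipping '#' header/footer lines and blank lines,
    and reject files that violate the root-first, sequential-Index, and
    parent-declared-before-child rules."""
    samples: list[Sample] = []
    declared: set[int] = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/footer comment
        fields = line.split()  # any run of whitespace delimits fields
        if len(fields) != 7:
            raise ValueError(f"line {lineno}: expected 7 fields, got {len(fields)}")
        s = Sample(int(fields[0]), int(fields[1]), float(fields[2]),
                   float(fields[3]), float(fields[4]), float(fields[5]),
                   int(fields[6]))
        if s.index != len(samples) + 1:
            raise ValueError(f"line {lineno}: Index {s.index} is not sequential")
        if not samples and s.parent != -1:
            raise ValueError("the first sample must be a root (Parent -1)")
        if samples and s.parent not in declared:
            raise ValueError(f"line {lineno}: Parent {s.parent} not declared earlier")
        if s.type not in TYPE_NAMES:
            raise ValueError(f"line {lineno}: Type {s.type} outside 1-7")
        declared.add(s.index)
        samples.append(s)
    return samples


example = """# example header
1 1 0.0 0.0 0.0 5.0 -1
2 3 5.0 0.0 0.0 1.0 1
3 3 10.0 2.0 0.0 0.8 2
"""
cells = parse_swc(example)
print(len(cells), TYPE_NAMES[cells[1].type])  # prints "3 (basal) dendrite"
```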
The soma in a standardized SWC can consist of a single root point or multiple points, the first of which must be a root. The single-point representation approximates the soma as a sphere of radius R that is centered at X, Y, Z. The multi-point representation approximates the soma as a sequence of nodes akin to a neurite branch.
We have made the SWC file format specification publicly available (https://swc-specification.readthedocs.io), following and promoting FAIR principles, and initiated version management to encourage communal development. The first version (v.1.0.0), described above, was discussed and approved by the BRAIN Initiative Cell Census Network (BICCN) Anatomy and Morphology Working Group in 2022.
The xyz2swc Conversion and Standardization Service
We have developed the xyz2swc (RRID:SCR_023317) conversion software, which allows convenient conversion of neural digital reconstructions from any common format into the SWC standard format. The tool is deployed as a publicly available online service and has two main functionalities: (1) importing reconstructions in different formats and exporting them as standardized SWC files; and (2) importing existing SWC files for verification (and correction if needed) to ensure they meet the standardized specification of this format.
The xyz2swc service is freely accessible from any common internet browser via a user-friendly web-based graphical interface at https://neuromorpho.org/xyz2swc/ui/ (Fig. 2a). To convert their digital tracings, users simply upload the reconstruction files (either individually or as a zipped archive) and select the “Convert/Standardize” option. The service automatically detects the format of the uploaded files, performs the data conversion, and provides the converted standardized SWC files for download. Using the application requires no prior knowledge of either the format specification of the original reconstruction files or the SWC format. For uploaded files that are already in SWC format, the service also provides a “Check” only option, which verifies (without converting) whether the file meets the standard specification and returns a summary log of any non-standard formatting issues.
We have developed a representational state transfer (REST) Application Programming Interface (API) for xyz2swc that facilitates software interaction and programmatic use of the service. The API allows convenient public access to all conversion and standardization capabilities from almost any modern programming language. Using the API does not require a software license, nor does it require the user to install any part of the xyz2swc service onto their local computer. A description of the API along with a list of possible commands is available at https://neuromorpho.org/xyz2swc/docs.
The xyz2swc service is deployed as a Docker container (Fig. 2b) to facilitate version control, rapid updates, reduced maintenance downtime, and effective scaling during periods of higher load. In addition, the published Docker image (https://hub.docker.com/r/neuromorpho/xyz2swc) contains the latest stable version of the source code, libraries, modules, and all other dependencies needed to install and run the service locally, e.g., on a private server if desired.
Mass Validation of SWC Conversion Robustness Using NeuroMorpho.Org Data
The xyz2swc service supports SWC conversion of 23 different file formats and over 68 format variations thereof (see Methods and Suppl. Table 1). These include reconstructions generated by popular open-source tracing software (e.g., SNT, KNOSSOS, NeuronJ), by commercial closed-license programs (e.g., Neurolucida, Imaris, Amira), and by morphological and electrophysiological modeling applications (e.g., TREES toolbox, Neuron, Genesis, PSICS). To our knowledge, xyz2swc covers all neural reconstruction formats described in the peer-reviewed literature or on the internet, including more recent formats developed for open data sharing (NeuroML, SWC+) and legacy formats originally designed by individual labs (Arbor, Nevin).
These file formats differ in how they represent neural morphology. For example, Neurolucida uses a hierarchical tree structure, TREES toolbox represents arbors as a directed adjacency matrix, and Amira adopts a lattice point representation. The formats also differ in their digital encoding of the reconstruction data, ranging from ASCII text (NeuronJ .ndf files) to compressed binary (SNT .traces files). Furthermore, the file structure of a single format can vary across the software programs that produce it. For instance, the formatting of HOC files differs between those generated by Neuron, Imaris, and Eutectic.
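As an illustration of how an adjacency-matrix representation maps onto the SWC Parent field, the sketch below converts a directed adjacency matrix into 1-based Parent values and 7-field rows. The orientation convention (`adj[i][j] == 1` when node j is the parent of node i), the plain-list matrix, and the function names are assumptions made for this example; it does not read actual TREES toolbox files, and it assumes nodes are already ordered so that parents precede children.

```python
def parents_from_adjacency(adj):
    """Return 1-based SWC Parent values from a directed adjacency matrix.

    Assumed (hypothetical) convention: adj[i][j] == 1 means node j is the
    parent of node i, so each row has at most one nonzero entry; an all-zero
    row marks the root, which receives Parent -1.
    """
    parents = []
    for i, row in enumerate(adj):
        cols = [j for j, value in enumerate(row) if value]
        if len(cols) > 1:
            raise ValueError(f"node {i + 1} has more than one parent")
        parents.append(cols[0] + 1 if cols else -1)
    return parents


def to_swc_rows(coords, radii, adj, sample_type=6):
    """Combine per-node coordinates, radii, and topology into 7-field rows.

    Nodes are assumed to be ordered so that parents precede children.
    sample_type defaults to 6 (unspecified neurite) because the matrix
    alone carries no structural-domain labels.
    """
    parents = parents_from_adjacency(adj)
    return [
        (i + 1, sample_type, x, y, z, radius, parent)
        for i, ((x, y, z), radius, parent) in enumerate(zip(coords, radii, parents))
    ]


# Three nodes: node 1 is the root and branches into nodes 2 and 3.
adj = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]
for row in to_swc_rows([(0, 0, 0), (5, 0, 0), (0, 5, 0)], [2.0, 1.0, 1.0], adj):
    print(*row)  # one whitespace-delimited SWC line per node
```

If a source matrix uses the transposed convention, reading columns instead of rows gives the same result.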
The xyz2swc support for such diverse inputs derives from its modular integration of novel code with custom modifications of existing open-source programs written in several languages (Fig. 3). This design allows individual software components to be added or updated to accept new formats or variations, or to include further features, without affecting the overall service operation.
To test xyz2swc, we performed an automated mass validation using all digital reconstructions available in the NeuroMorpho.Org (version 8.4) repository. This online database is to date the largest collection of publicly accessible neural tracings. While all morphologies are converted into SWC when ingested into NeuroMorpho.Org [30], this resource also makes the original data files available for download as provided by the contributors, in the format in which they were initially collected. Of 232,029 digital reconstructions, 143,131 spanned 23 non-SWC reconstruction formats and over 68 variations from 24 software programs (Table 2). The other 88,898 files were native SWC files generated by 36 different software programs.
Xyz2swc successfully converted 142,605 of the 143,131 non-SWC files from NeuroMorpho.Org v.8.4, an overall success rate of 99.63% (a detailed report of the mass validation results for all files is included in Suppl. Table 1). These data corroborate the robustness of xyz2swc as a universal converter. The files that failed to convert had format variations that differed from the documented version in ways that could not be reliably interpreted.
When the imported format lacks information pertaining to one of the SWC data fields, xyz2swc automatically inserts default values. For example, the NeuronJ NDF format does not store the thickness or depth of each branch, capturing the arbor as a two-dimensional linear projection. The resulting SWC file is given a uniform Radius of 0.5 and Z coordinates of 0 (Fig. 4a). Similarly, the TREES Toolbox MTR format does not specify the structural domain (e.g., axon, dendrites, and soma), so the converted SWC files are assigned Type 6 (unspecified neurite) for all sample points (Fig. 4b). In all cases, visual checks confirm the physical integrity of the neural tree before and after conversion. Conversely, certain formats include additional features that are not supported by SWC. For example, Neurolucida DAT, ASC, and XML formats provide options to annotate subcellular structures (such as spines, varicosities, and puncta), anatomical boundaries, and user-defined textual markers. In these cases, xyz2swc converts all data supported by the SWC specification and omits the rest.
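Default insertion for missing fields can be sketched as follows. The default values (Radius 0.5, Z 0, Type 6) are those stated above; representing a sample as a dictionary with the keys `x`, `y`, `z`, `radius`, and `type` is a hypothetical choice for this example, not how xyz2swc stores its data.

```python
# Defaults stated in the text for information the source format lacks.
DEFAULT_Z = 0.0        # depth missing, e.g., 2D projections
DEFAULT_RADIUS = 0.5   # thickness missing
DEFAULT_TYPE = 6       # structural domain missing: unspecified neurite


def fill_missing(point: dict) -> dict:
    """Return a copy of a partially specified sample in which absent SWC
    fields (missing key or None) are replaced by the defaults above.
    Fields that are present are left untouched."""
    filled = dict(point)
    if filled.get("z") is None:
        filled["z"] = DEFAULT_Z
    if filled.get("radius") is None:
        filled["radius"] = DEFAULT_RADIUS
    if filled.get("type") is None:
        filled["type"] = DEFAULT_TYPE
    return filled


# A 2D point with no depth or thickness, and a 3D point with no Type.
print(fill_missing({"x": 12.0, "y": 3.5, "type": 3}))
print(fill_missing({"x": 0.0, "y": 0.0, "z": 1.0, "radius": 2.0}))
```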
Table 2. Mass validation for the most popular reconstruction formats on NeuroMorpho.Org.
Software Application | File Format | File Count | Conversion Rate (%)
Neurolucida | .dat | 59,268 | 100.00
Neurolucida | .asc | 19,673 | 99.86
Neurolucida | .nrx | 2,810 | 100.00
Neurolucida | .xml | 96 | 100.00
Amira | .am | 16,536 | 99.21
NeuronJ | .ndf | 7,844 | 100.00
TREES Toolbox | .mat | 962 | 100.00
TREES Toolbox | .mtr | 260 | 100.00
KNOSSOS | .nml | 3,261 | 100.00
KNOSSOS | .xml | 837 | 100.00
PyKNOSSOS | .nmx | 1,021 | 100.00
SNT | .traces | 13,146 | 100.00
Imaris | .ims | 12,020 | 99.81
Imaris | .hoc | 3,049 | 99.86
Eutectic | .nts | 563 | 100.00
NeuronStudio | .eswc | 118 | 100.00
Others | various | 1,667 | 79.54
Of note, representing the soma as a contour tracing of the cell body perimeter, or as a series of contours approximating the somatic surface, is not consistent with the SWC standard. The xyz2swc standardization module automatically detects soma contours in the imported file and converts them into an equivalent standardized representation (Fig. 5). Specifically, the program measures the curvature of all soma sections in the morphological representation (Fig. 5a). If the curvature angle is 90° or larger (θ ≥ 90°), the soma section is interpreted as a series of stacked frustums and retained as is. If the curvature angle is acute (θ < 90°), the soma section is deemed to be a contour and replaced. A single soma contour (Fig. 5b) is replaced by a single root point whose X, Y, Z coordinates correspond to the center of the contour, i.e., the average of all contour points. The Radius is computed as the average distance of each contour point from this center. If multiple contours are present (Fig. 5c), each contour is converted into one point by the above procedure, yielding a series of points.
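The contour replacement rule (center at the mean of the contour points, Radius equal to the mean distance of those points from the center) translates directly into code. This is a minimal sketch with hypothetical function names; it does not reproduce the curvature-angle test that decides whether a soma section is a contour, nor how the resulting points of a multi-contour soma are connected.

```python
import math


def contour_to_point(contour):
    """Collapse one soma contour, given as a list of (x, y, z) points, into a
    single sample: center = mean of the contour points, radius = mean
    Euclidean distance of the contour points from that center."""
    n = len(contour)
    if n == 0:
        raise ValueError("empty contour")
    cx = sum(p[0] for p in contour) / n
    cy = sum(p[1] for p in contour) / n
    cz = sum(p[2] for p in contour) / n
    center = (cx, cy, cz)
    radius = sum(math.dist(p, center) for p in contour) / n
    return center, radius


def contours_to_points(contours):
    """Apply the same reduction to each contour of a multi-contour soma."""
    return [contour_to_point(c) for c in contours]


# A square contour of side 2 in the z = 0 plane.
center, radius = contour_to_point([(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)])
print(center, round(radius, 4))  # prints "(1.0, 1.0, 0.0) 1.4142"
```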
SWC Standardization Robustness
The xyz2swc service supports standardizing all SWC format variations represented in the original files of NeuroMorpho.Org v.8.4. Specifically, xyz2swc performs a series of checks to evaluate whether the imported SWC file meets the standardized specification (Table 3). These checks include common formatting errors such as parent samples being referred to before they are defined, negative radius values, and contour soma representations. Other checks handle non-standard formatting styles used by popular software programs. For example, SWC files generated by NeuronStudio use Type values of 5 and 6 to indicate bifurcations and terminals, respectively, which is inconsistent with the standardized specification. An SWC file is considered standardized only if it passes all checks; if not, xyz2swc can correct the detected errors (Table 3). The “Check” option only outputs a log file of any detected errors, allowing users to verify whether their file meets the standardized specification without correcting it. Alternatively, the “Convert/Standardize” option corrects the errors and exports the data in the standardized format.
Certain errors cannot be corrected, for example when an entire column of data is missing or the supposed SWC file is not ASCII-encoded. These errors often indicate file corruption or a non-SWC file incorrectly given a .swc extension. In those cases, xyz2swc still outputs the log file to help users address the identified issues. The standardization success rate on the tested data was 100%: all 88,898 SWC-formatted original files in NeuroMorpho.Org v.8.4 either already met the specified standard or were successfully standardized by xyz2swc.
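The Type correction for bifurcation and terminal points (the NeuronStudio convention described above, and the last item of the Non-Standard Type check in Table 3) can be sketched as below. The sketch assumes rows are (index, type, x, y, z, radius, parent) tuples already sorted so that each parent precedes its children, and that the file is already known to use this convention: applied blindly, it would also overwrite legitimate Type 5 (custom) or Type 6 (unspecified neurite) labels at branch points and tips. Function and constant names are hypothetical.

```python
from collections import Counter

# Hypothetical constant: NeuronStudio-style node Types (5 = bifurcation,
# 6 = terminal). Only meaningful for files known to use this convention.
NODE_TYPES = {5, 6}


def remap_node_types(rows):
    """Give bifurcation and terminal points that carry NeuronStudio-style
    Types the (already corrected) Type of their parent."""
    n_children = Counter(row[6] for row in rows)
    new_type = {}
    out = []
    for idx, typ, x, y, z, r, parent in rows:
        # A terminal has 0 children and a bifurcation has 2 or more.
        is_branch_or_tip = n_children[idx] != 1
        if typ in NODE_TYPES and is_branch_or_tip and parent in new_type:
            # Parents come first, so the parent's Type is already final.
            typ = new_type[parent]
        new_type[idx] = typ
        out.append((idx, typ, x, y, z, r, parent))
    return out


rows = [
    (1, 1, 0, 0, 0, 5, -1),    # soma root
    (2, 3, 5, 0, 0, 1, 1),     # dendrite
    (3, 5, 10, 0, 0, 1, 2),    # bifurcation tagged 5
    (4, 6, 15, 5, 0, 1, 3),    # terminal tagged 6
    (5, 6, 15, -5, 0, 1, 3),   # terminal tagged 6
]
print([row[1] for row in remap_node_types(rows)])  # prints [1, 3, 3, 3, 3]
```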
Table 3. List of specification checks and corresponding error correction (in the order of execution) to ensure SWC standardization.
Check | Action/Correction
Missing Field | If the SWC points matrix does not have seven columns, return an error. All further checks are omitted.
Number of Lines | Generate an error if no samples are detected; all further checks are omitted. If the file has fewer than 20 lines, generate a warning to check file integrity.
Number of Soma Samples | Generate a warning if no soma samples are detected.
Invalid Parent | If the Parent points to an Index value that does not exist, make the sample with the invalid Parent a root point and generate a warning to check file integrity.
Index/Parent Integer | If Index and/or Parent are float-formatted integers (e.g., “1.00”), format them as integers. If they are non-integer values (e.g., “1.34”) or non-numerical entries (e.g., “abc”), generate an error.
XYZ Double | Ensure X, Y, and Z coordinates are float/double values. Any NaN or NA values detected in the ASCII text file are treated as 0.0, and a warning is issued to check file integrity.
Radius Positive Double | Ensure sample Radius is a float/double value. If Radius is negative or zero, set it to 0.5. Set any NaN or NA Radius values to 0.5 and generate a warning to check file integrity.
Non-Standard Type | If Type is a float-formatted integer, format it as an integer; if it is a non-integer value or non-numerical entry, change it to Type 6 ('unspecified neurite'). If Type is 0 or an integer greater than 7, set it to Type 6. If bifurcation and terminal points have non-standard Types, set them to the Type of their parent.
Sequential Index | If the Index values are not in sequential order (starting from 1), sort and reset Index and Parent numbering.
Sorted Order | If parent samples are referred to before being defined, sort and reset Index and Parent numbering. Sort indices to ensure that the first sample in the file is a root point; if no sample point is a root, generate an error.
Soma Contours | Detect soma contour(s) and replace each with a single point.
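The Sequential Index and Sorted Order corrections amount to reordering the samples so that every parent precedes its children and then renumbering from 1. The sketch below does this with an iterative depth-first traversal from the roots; the traversal order, the tuple layout (index, type, x, y, z, radius, parent), and the function name are illustrative choices rather than the xyz2swc algorithm. Following the Invalid Parent check, a Parent that refers to a nonexistent Index is treated as a root.

```python
from collections import defaultdict


def sort_and_renumber(rows):
    """Reorder samples so every parent precedes its children, then assign
    Index values 1..N and update Parent references to match.

    Rows may be in any order. A Parent of -1, or one that names a
    nonexistent Index, marks a root (its new Parent is -1).
    """
    by_index = {row[0]: row for row in rows}
    children = defaultdict(list)
    roots = []
    for row in rows:
        if row[6] in by_index:
            children[row[6]].append(row[0])
        else:
            roots.append(row[0])
    if not roots:
        raise ValueError("no root sample found")
    new_index = {}
    order = []
    stack = list(reversed(roots))  # depth-first, first root visited first
    while stack:
        idx = stack.pop()
        if idx in new_index:
            continue
        new_index[idx] = len(order) + 1
        order.append(idx)
        stack.extend(reversed(children[idx]))
    if len(order) != len(rows):
        raise ValueError("cycle or duplicate Index: not all samples reachable")
    out = []
    for old in order:
        _, typ, x, y, z, r, parent = by_index[old]
        out.append((new_index[old], typ, x, y, z, r, new_index.get(parent, -1)))
    return out


# Out-of-order, non-sequential input: sample 10 refers to parent 5 before 5 is defined.
rows = [(10, 3, 1, 0, 0, 1, 5), (5, 1, 0, 0, 0, 5, -1), (7, 3, 2, 0, 0, 1, 10)]
for row in sort_and_renumber(rows):
    print(*row)
```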
Note that SWC files that nominally meet the standardized specification can still represent erroneous or inexact reconstructions of the original neural structure. Some of these errors, especially mistakenly connected branches, may be evident on visual inspection. Other software programs provide functionality to detect and correct such issues [28,31].