Figure 3c maps the displacement response of the target window as a function of sound intensity (sound pressure) from 140 Hz to 15 kHz. The upper horizontal axis gives the equivalent sound pressure converted from the sound intensity on the lower horizontal axis. The sound intensity is taken as the peak intensity measured in the frequency domain. Sound signals are detected only from 140 Hz upward because of the relatively high background noise and the low output of the driving speaker below this frequency. At every frequency, the detected signal is linearly proportional to the input sound pressure and hence to the square root of the sound intensity. Across the eight frequencies shown in Fig. 3c, the typical displacement sensitivity to the acoustic input is determined to be sub-nm/Pa. The differences in response between frequencies arise from the mechanical transfer function of the target window. In addition, we note that since the PZT used in this study covers a displacement of ± 330 µm, our system can detect much louder sounds within the servo bandwidth. Taking the background noise level (marked with open circles) as the lower bound, the measurable sound level range at each frequency is plotted with dashed lines. At the higher frequencies (2 kHz to 15 kHz), where the error signal is used, the minimum detectable sound level is ≈ 40 dB, corresponding to a sound pressure of ≈ 2 mPa. In this range, we define the maximum measurable level as the ± λ/4 vibration level reduced by a safety factor of 2, which corresponds to a dynamic range of ≈ 100 dB in our measurements. At the lower frequencies (140 Hz to 1 kHz), where the control signal is used, the minimum detectable sound level varies from 40 dB to 65 dB, equivalent to 2 mPa to 36 mPa.
In this range, the maximum measurable level is estimated to be much higher than at the higher frequencies, since the optical delay line range of the PZT actuator is ± 330 µm. At these lower frequencies, a dynamic range of up to ≈ 100 dB is estimated. We further note that vibration amplitudes larger than λ/2 can break the lock, so tighter locking is required to measure loud sounds with the control signal, and the maximum measurable level at the lower frequencies is therefore less precisely defined. Since a sound level of 60 dB is typical for conversations, our scheme is sufficient to detect human voices through a remote window. If our system is operated over distances shorter than 60 meters, the detectable sound level would be lower than 60 dB since the background noise decreases.
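The decibel-to-pressure conversions quoted above follow from the standard definition of sound pressure level. A minimal sketch, assuming the conventional 20 µPa reference pressure in air (the text does not state the reference explicitly) and using illustrative function names:

```python
import math

# Assumed reference pressure for sound pressure level in air (20 µPa).
P_REF_PA = 20e-6


def spl_to_pressure_pa(level_db):
    """Convert a sound pressure level in dB to an RMS pressure in Pa."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


def pressure_to_spl_db(pressure_pa):
    """Convert an RMS pressure in Pa to a sound pressure level in dB."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


# Minimum detectable levels quoted in the text.
for level_db in (40, 65):
    print(f"{level_db} dB -> {spl_to_pressure_pa(level_db) * 1e3:.1f} mPa")
```

Under this assumption, 40 dB and 65 dB evaluate to about 2 mPa and 36 mPa, in agreement with the values given above.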
With the Hz-level laser metrology in place, we subsequently recorded and reconstructed several music pieces in the laboratory environment using a high-speed 16-bit oscilloscope to show the feasibility of remote and covert sound detection. Figure 4 shows the real-time sound detection of “UCLA fight song” and its spectrogram, with Fig. 4a the original sound waveform and Fig. 4b its spectrogram over 10 s. Figures 4b and 4c show the recorded and reconstructed waveforms from the control (blue) and error (pink) signals, respectively. The slowly varying drift of the control signal is suppressed by RF high-pass filtering with a 400 Hz cutoff frequency. Even though the error signal is locked to the zero point, its unsuppressed components carry sound information. Since the control signal has more information below 1.5 kHz, where most of the human voice is distributed, the control-signal-based sound is clearer than the error-signal-based sound. However, the error-signal-based sound includes higher-frequency overtones than the control-signal-based one, as seen from the impulse response measurements noted in Supplementary Information Section S2. As shown in Supplementary multimedia 1, the reconstructed lyrics of “UCLA fight song” are clearly audible in both the control- and error-signal-based recordings. In the frequency domain, the error-signal-based recording shows a relatively clearer sound signal over the locking bandwidth, as described in the previous section. A comparison of our mm-thickness beamsplitter with a pellicle beamsplitter a few micrometers thick is also given in Supplementary Information Section S3. Sound above 5 kHz is attenuated since it lies beyond the mechanical transfer function of the target window, but the signal is still sufficient to receive and distinguish male and female voices remotely, as illustrated in Supplementary multimedia 1.
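As a rough illustration of the drift suppression described above, the sketch below applies a first-order digital high-pass filter with a 400 Hz cutoff. The filter order, the sample rate passed in, and the function name are assumptions for illustration; the text specifies only the cutoff frequency.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz=400.0):
    """First-order RC-style high-pass filter (discrete-time approximation).

    Assumed form: y[n] = alpha * (y[n-1] + x[n] - x[n-1]),
    with alpha = RC / (RC + dt) and RC = 1 / (2 * pi * cutoff).
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [0.0]  # start from rest so a constant offset is rejected entirely
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

With this form, a slow drift of a few Hz is strongly attenuated, while tones in the kHz range pass almost unchanged.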
Figure 5 shows further examples of the real-time music recording waveform reconstructions and their corresponding spectrograms. The left panels (a,c,e) are control-signal-based results and the right panels (b,d,f) are error-signal-based results. Figures S4a-d are songs from a female singer, and Figures S4e-f are from a male singer. All data are converted into “.wav” file format (see supplementary multimedia), and these .wav files are converted into spectrograms as shown in Figures S4 b,d,e. As described in the main text, the control-signal-based waveforms have a stronger signal than the error-signal-based waveforms, while the error-signal-based waveforms contain higher-frequency components. From these results, we confirm that our metrology can record both male and female voice overtones at the remote site.
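The conversion of a recorded trace into a .wav file can be sketched with Python's standard `wave` module. The peak normalization, the mono 16-bit PCM layout, and the function name are assumptions; the text does not describe the conversion tool or its settings.

```python
import io
import struct
import wave


def to_wav_bytes(samples, sample_rate_hz):
    """Encode a list of float samples as a mono 16-bit PCM .wav byte string."""
    # Normalize to the recorded peak so the loudest sample uses the full range.
    peak = max((abs(s) for s in samples), default=0.0)
    scale = 32767.0 / peak if peak > 0 else 0.0
    pcm = struct.pack(
        "<%dh" % len(samples),
        *(int(round(s * scale)) for s in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The returned bytes can be written to disk with an ordinary binary file write and then opened in any audio or spectrogram tool.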
In summary, we have shown remote and local sound detection via picometric homodyne laser interferometry. A Fabry-Perot cavity stabilized Hz-level linewidth laser with 10⁻¹⁵ fractional frequency instability enables picometric displacement measurement over remote distances of ≈ 30 meters. Our precision homodyne laser interferometer achieves a displacement noise background of 1.5 pm/√Hz near 10 kHz, limited by laser frequency noise. We show the capability of sound detection up to 100 kHz at remote locations about 30 m away, with the measurement range extended using high-speed electronics. The measurement method demonstrated in this study supports long-term operation via stabilization of the homodyne signal, regardless of phase wrapping caused by long-term drift. We measure sounds from 140 Hz to 15 kHz to verify the frequency-dependent displacement intensities, with acoustic sensing sensitivities at the sub-nm/Pa level across conversational frequencies and their overtones. We confirm that our methodology is able to measure sounds ranging from 2 mPa to 2 kPa, with a dynamic range between 60 dB and 100 dB, within the laser λ/4 displacement. With the noise floors and sensitivities determined, we successfully recorded and recreated several music pieces, including female and male voices, behind a window at typical conversation volumes. We believe that our proposed platform has potential for laser sound sensing, ultrasound sensing, and the practical realization of optical frequency standards for acoustic measurements.