Design and theoretical analysis
The piezoelectric MEMS speaker used in the current proof-of-concept demonstration consists of six identical triangular composite cantilever beams that act as the electro-acoustic transduction structure. Each cantilever beam is built from a sandwich-like piezoelectric actuator stack of Mo (top electrode), ScAlN (piezoelectric layer) and Mo (bottom electrode) deposited onto a silicon support layer. The beams are separated by air gaps and arranged into a hexagon, with their fixed ends anchored to the same SOI substrate, as shown in Fig. 1a. In this design, PDMS is used as the sealing material and is applied only to the air gaps between the cantilever beams, which minimizes its impact on the mechanical characteristics of the structure. The gap-filling process, which relies on the capillary effect, is also illustrated schematically in Fig. 1a. A liquid PDMS droplet is placed onto the edge of the air gap region using a syringe dispensing system. Under the influence of surface tension, a certain amount of the liquid PDMS is drawn automatically into the narrow air gaps. As the PDMS flows, the gaps are filled effectively and selectively, and the process stops by itself as determined by the liquid viscosity. This sealing method is therefore easy to perform and does not require strict control of the amount of PDMS applied. By repeating this operation, all of the air gaps in the speaker can be filled sequentially. Afterwards, the device is placed in an oven to cure the PDMS. The detailed structural parameters used in the current proof-of-concept demonstration are also provided in the figure.
Figure 1b shows a triangular cantilever beam anchored at its base, with thickness, length and width denoted as H, L and W, respectively. With the base of the cantilever beam placed at the coordinate origin, the cross-sectional width of the triangular beam decreases linearly from the base to the tip:
$$\begin{array}{c}b\left(x\right)=W\frac{L-x}{L} \left(1\right)\end{array}$$
When a uniformly distributed load force F in the z-direction is applied onto the cantilever beam surface, the resultant bending moment can be expressed as:
$$\begin{array}{c}M\left(x\right)=\overrightarrow{r}\times \overrightarrow{F}=\int F\frac{{\left(L-x\right)}^{2}}{{L}^{2}}dx=-\frac{1}{3}\frac{F}{{L}^{2}}{\left(L-x\right)}^{3} \left(2\right)\end{array}$$
where \(M\left(x\right)\) represents the bending moment at the coordinate x, \(\overrightarrow{r}\) is the moment arm that is perpendicular to the applied force \(\overrightarrow{F}\).
For a variable cross-section triangular cantilever beam, its moment of inertia can be expressed as:
$$\begin{array}{c}I\left(x\right)=\frac{b\left(x\right){H}^{3}}{12} \left(3\right)\end{array}$$
The deflection follows from the relationship between deflection and bending moment [22], subject to the clamped boundary conditions at the base of the cantilever beam:
$$\begin{array}{c}\frac{{d}^{2}w}{d{x}^{2}}=-\frac{M\left(x\right)}{EI\left(x\right)} \left(4\right)\end{array}$$
$$\begin{array}{c}\frac{dw}{dx}{|}_{x=0}=0 \left(5\right)\end{array}$$
$$\begin{array}{c}w{|}_{x=0}=0 \left(6\right)\end{array}$$
The deflection curve of the triangular beam can be determined:
$$\begin{array}{c}w\left(\stackrel{-}{x}\right)={w}_{max}\times \frac{1}{3}\left[{\left(1-\stackrel{-}{x}\right)}^{4}-4\left(1-\stackrel{-}{x}\right)+3\right] \left(7\right)\end{array}$$
$$\begin{array}{c}{w}_{max}=\frac{F{L}^{3}}{EW{H}^{3}} \left(8\right)\end{array}$$
where \(w\left(\stackrel{-}{x}\right)\) represents the deflection of the triangular beam, \(\stackrel{-}{x}=x/L\) is the normalized position along the beam, \({w}_{max}\) is the deflection at the tip of the beam, and E is the Young's modulus of the beam.
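The intermediate steps between Eqs. (1) to (6) and Eqs. (7) to (8) can be written out as follows. This is a sketch that takes Eqs. (1) to (6) at face value; the only assumption is that the normalized position is \(\stackrel{-}{x}=x/L\). Substituting Eqs. (1) to (3) into Eq. (4) cancels one factor of \(\left(L-x\right)\):

$$\frac{{d}^{2}w}{d{x}^{2}}=\frac{F{\left(L-x\right)}^{3}}{3{L}^{2}}\cdot \frac{12L}{EW\left(L-x\right){H}^{3}}=\frac{4F}{EW{H}^{3}L}{\left(L-x\right)}^{2}$$

Integrating once and applying Eq. (5):

$$\frac{dw}{dx}=\frac{4F}{3EW{H}^{3}L}\left[{L}^{3}-{\left(L-x\right)}^{3}\right]$$

Integrating again and applying Eq. (6):

$$w\left(x\right)=\frac{F}{3EW{H}^{3}L}\left[{\left(L-x\right)}^{4}+4{L}^{3}x-{L}^{4}\right]$$

With \(x=L\stackrel{-}{x}\), the bracket becomes \({L}^{4}\left[{\left(1-\stackrel{-}{x}\right)}^{4}-4\left(1-\stackrel{-}{x}\right)+3\right]\), which gives Eq. (7) with the prefactor \(F{L}^{3}/\left(EW{H}^{3}\right)\) of Eq. (8). At \(\stackrel{-}{x}=1\) the bracket equals 3, so the tip deflection is exactly \({w}_{max}\).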
The first-order resonant frequency of the triangular beam is given by [23]:
$$\begin{array}{c}{f}_{tri}=0.3295\frac{H}{{L}^{2}}\sqrt{\frac{E}{\rho }} \left(9\right)\end{array}$$
where ρ is the density of the beam material. From Eq. (9), the resonant frequency of the triangular cantilever beam is inversely proportional to the square of its length. Figure 1c illustrates the relationship between the resonant frequency of the cantilever beam and its length. For the currently adopted cantilever design, with a width of 1500 µm and a length of approximately 1299 µm, the resonant frequency is sensitive to the beam length: a variation of 50 µm in length results in a frequency shift of 1.44 kHz. This implies that any non-uniformity in the back cavity etching during manufacturing can easily lead to different lengths among the six cantilever beams, and hence to inconsistent resonant frequencies. Because the phase of a structure's response reverses as the driving frequency crosses its resonant frequency, driving all of the cantilever beams with the same control signal causes beams with different resonant frequencies to vibrate asynchronously. As a result, the sound signals generated by these cantilever beams cancel each other out, introducing a remarkable dip in the SPL frequency spectrum [7].
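The length sensitivity implied by Eq. (9) can be explored with a short script. The following is a minimal sketch, not the calculation behind Fig. 1c: it assumes a homogeneous silicon beam (the values of `E_SI` and `RHO_SI` are typical textbook figures, not taken from this work), a total thickness of 11.4 µm obtained by adding the 10 µm Si device layer and the 1.4 µm Mo/ScAlN/Mo stack, and an equilateral-triangle layout so that the length follows from the 1500 µm width. All function names are illustrative.

```python
import math

# Assumed material values for a homogeneous silicon beam (not from the source).
E_SI = 169e9        # Young's modulus in Pa (assumed)
RHO_SI = 2330.0     # density in kg/m^3 (assumed)
H = 11.4e-6         # total thickness in m (assumed: Si layer plus Mo/ScAlN/Mo stack)
W = 1500e-6         # beam width in m (from the text)


def triangle_length(width):
    """Height of an equilateral triangle of side `width` (assumed hexagon layout)."""
    return width * math.sqrt(3.0) / 2.0


def f_tri(length, thickness=H, youngs=E_SI, density=RHO_SI):
    """First resonant frequency of a triangular cantilever from Eq. (9), in Hz."""
    return 0.3295 * thickness / length**2 * math.sqrt(youngs / density)


def shift_for_length_change(length, delta):
    """Frequency change in Hz when the beam length changes by `delta` metres."""
    return f_tri(length + delta) - f_tri(length)


L0 = triangle_length(W)
print(f"L = {L0 * 1e6:.0f} um, f = {f_tri(L0) / 1e3:.2f} kHz")
for d_um in (-50, -20, 20, 50):
    df = shift_for_length_change(L0, d_um * 1e-6)
    print(f"dL = {d_um:+d} um -> df = {df / 1e3:+.2f} kHz")
```

Because the frequency scales as the inverse square of the length, the relative shift is roughly twice the relative length error, independent of the assumed material values; under these assumptions a 50 µm change gives a shift of the same order as the 1.44 kHz quoted above.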
Numerical simulation
A simulation model of the proposed piezoelectric MEMS speaker within a simplified ear canal environment is established in COMSOL, as depicted in Fig. 2a. A 1/4 symmetric model is employed to reduce the computational burden and memory demands of the 3D finite element simulation. The MEMS speaker is represented by only two layers, namely the piezoelectric layer and the support layer. A "fixed constraint" boundary condition is applied at the base of the cantilever beams.
To emulate the ear canal, a specially designed cylindrical structure with similar acoustic characteristics is used. The lateral and end sides of this "simplified ear canal" are set as acoustic impedance boundaries that match those of the human ear canal. To account for the frictional losses caused by air within the gaps and to accurately estimate the acoustic losses between the two sides of the cantilever beam through the gaps, the air gaps and their adjacent regions (cylindrical domain) are defined as a "thermo-viscous acoustic" domain. Based on this model, a transient simulation is conducted to calculate the sound pressure distribution on both sides of the cantilever beam, as illustrated in Fig. 2b. For clarity, the figure retains only one cantilever and one air gap and displays results from the thermo-viscous acoustic domain near the air gap. When the cantilever is driven to vibrate, sound pressures of opposite sign are excited on its front and back sides. Because the two sides are connected through the air gap, an acoustic short circuit arises and induces acoustic loss.
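One way to see why the gaps call for a thermo-viscous treatment is to compare the gap width with the viscous boundary-layer thickness of air, \(\delta_v=\sqrt{2\mu /\left(\rho \omega \right)}\). The sketch below evaluates this standard expression; the air properties `MU_AIR` and `RHO_AIR` are assumed room-temperature values and are not taken from the simulation setup described here.

```python
import math

# Assumed room-temperature air properties (not given in the source).
MU_AIR = 1.81e-5    # dynamic viscosity in Pa*s (assumed)
RHO_AIR = 1.2       # density in kg/m^3 (assumed)


def viscous_boundary_layer(freq_hz, mu=MU_AIR, rho=RHO_AIR):
    """Viscous penetration depth sqrt(2*mu/(rho*omega)) in metres."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 * mu / (rho * omega))


for f in (20.0, 1e3, 20e3):
    delta = viscous_boundary_layer(f)
    print(f"f = {f:8.0f} Hz: delta_v = {delta * 1e6:7.1f} um")
```

Under these assumptions the boundary layer is thicker than a gap of a few micrometres over the whole audio band, so the air in the gaps is dominated by viscous effects that a lossless pressure-acoustics model would not capture.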
The SPL generated at 20 Hz under a 40 Vpp driving voltage by the proposed speaker before sealing is studied as a function of the air gap width, as shown in Fig. 2c. The SPL reduction induced by acoustic loss is negligible for air gap widths below 5 µm, as reported by others [14]. Beyond that, the decrease in SPL becomes distinct as the air gap widens. According to the experimental results presented later, the air gap width increases from the initially designed 5 µm to 6.8 µm due to the chlorine-based dry etching process for ScAlN, which causes an SPL drop of nearly 1.82 dB. In addition, the residual stress and the stress gradient present in the deposited composite layer stack deform the cantilever beam and widen the air gap further, especially around the tip region. Their effect on the device performance is therefore also studied. Figure 2d shows the change in the air gap between the cantilever tips and the corresponding SPL at 20 Hz under different residual stresses. As the residual stress increases, the air gap widens, followed by a monotonic decrease in SPL. For a residual stress of 320 MPa in the as-deposited layer stack, the tip air gap increases from 6.8 µm to 8.2 µm, which results in an additional SPL reduction of nearly 0.65 dB.
The SPL frequency responses of the speaker in the audio range from 20 Hz to 20 kHz, before and after the sealing treatment with PDMS, are also studied, as shown in Fig. 3a. For clarity, the SPL curve from 20 Hz to 100 Hz is magnified and presented as an inset. The SPL in the low-frequency range is improved after sealing, with an SPL enhancement of 2.82 dB at 20 Hz. In contrast, the SPL response in the mid- and high-frequency ranges remains nearly unchanged, which is mainly attributed to the lower material strength of the PDMS as well as to the selective sealing.
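To give the quoted SPL changes a more intuitive scale, they can be converted into sound-pressure ratios using the definition of SPL as 20 log10 of a pressure ratio. The snippet below is a small illustration only; the dictionary labels are descriptive and the helper name is hypothetical.

```python
def db_to_pressure_ratio(delta_db):
    """Sound-pressure ratio corresponding to an SPL change given in dB."""
    return 10.0 ** (delta_db / 20.0)


# SPL changes at 20 Hz quoted in the text, in dB.
changes = {
    "gap 5 -> 6.8 um (etching)": -1.82,
    "gap 6.8 -> 8.2 um (residual stress)": -0.65,
    "PDMS sealing": +2.82,
}

for label, db in changes.items():
    ratio = db_to_pressure_ratio(db)
    print(f"{label}: {db:+.2f} dB -> pressure x {ratio:.3f}")
```

For example, the 1.82 dB drop corresponds to roughly 81% of the original sound pressure, while the 2.82 dB gain after sealing corresponds to a pressure increase of about 38%.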
Moreover, the issue caused by inconsistent mechanical characteristics of the cantilever beams arising from the fabrication process is also studied. Figure 3c shows the resonant vibration mode of the speaker at 19 kHz in the ideal case in which all of the cantilever beams are identical, where synchronized vibration can be observed. To simulate the process-induced dimension change, four of the cantilever beams are intentionally designed to be 20 µm longer than the other two. In this case, the speaker exhibits two closely spaced resonant frequencies instead, namely 18.45 kHz and 19 kHz. When all of the cantilever beams are driven by the same 40 Vpp control signal at a frequency between these two resonant frequencies, distinct asynchronous vibration between the cantilever beams occurs, as shown in Fig. 3d, inducing a significant dip of more than 20 dB in the SPL response (see Fig. 3b). After the air gaps are sealed with PDMS, all of the cantilever beams are mechanically coupled into a single combined structure with a resonant frequency of 18.9 kHz, which forces them into synchronous vibration, as shown in Fig. 3e. As a result, the above-mentioned SPL dip is eliminated. Furthermore, the sealing treatment leads to only a slight decrease of 0.1 kHz (0.5%) in the resonant frequency of the speaker, indicating that the proposed sealing method has little impact on the mechanical performance of the existing speaker structure.
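The cancellation mechanism can be illustrated with a lumped model that is far simpler than the finite element simulation. In the sketch below, each beam group is treated as an independent single-degree-of-freedom resonator and the radiated sound is taken to be proportional to the summed displacement. The quality factor `Q` is a hypothetical value, the sealed case is idealized as six beams locked into one resonator at 18.9 kHz, and none of the function names come from the original model, so the dip depth it prints is not expected to match the simulated 20 dB.

```python
import cmath
import math

Q = 100.0           # quality factor (hypothetical, not from the source)
F_LONG = 18.45e3    # resonance of the four longer beams in Hz (from the text)
F_SHORT = 19.0e3    # resonance of the two shorter beams in Hz (from the text)
F_SEALED = 18.9e3   # resonance of the coupled structure in Hz (from the text)


def response(f, f0, q=Q):
    """Normalized displacement response of a single-degree-of-freedom resonator."""
    r = f / f0
    return 1.0 / complex(1.0 - r * r, r / q)


def unsealed(f):
    """Summed response of four long and two short beams moving independently."""
    return 4.0 * response(f, F_LONG) + 2.0 * response(f, F_SHORT)


def sealed(f):
    """Idealized sealed case: six beams locked into a single resonator."""
    return 6.0 * response(f, F_SEALED)


def phase_gap_deg(f):
    """Phase lead of the short beams over the long beams, in degrees."""
    d = cmath.phase(response(f, F_SHORT)) - cmath.phase(response(f, F_LONG))
    return math.degrees(d)


# Scan from 18.45 kHz to 19.0 kHz in 5 Hz steps and locate the weakest output.
freqs = [F_LONG + 5.0 * k for k in range(111)]
f_dip = min(freqs, key=lambda f: abs(unsealed(f)))
depth_db = 20.0 * math.log10(abs(unsealed(f_dip)) / abs(sealed(f_dip)))

print(f"dip at {f_dip / 1e3:.3f} kHz, phase gap {phase_gap_deg(f_dip):.0f} deg")
print(f"unsealed output relative to sealed: {depth_db:.1f} dB")
```

Between the two resonances the longer beams are driven above their resonance and the shorter beams below theirs, so the two groups move with a phase difference well beyond 90 degrees and their contributions largely cancel. As a side check, Eq. (9) predicts a frequency ratio of (1299/1319)² ≈ 0.970 for a 20 µm length increase, which is in line with the simulated pair of 18.45 kHz and 19 kHz.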
To assess the operational reliability of the speaker after sealing, the stress distribution in the speaker driven at the resonant frequency with 40 Vpp is extracted, as shown in Fig. 4. Owing to the amplification of vibration at resonance, the tip deflection of the cantilever beams reaches its maximum, producing peak internal stresses of 103 MPa in the cantilever beams and 19.4 MPa in the PDMS. Both of these stresses are far below the safety limit, ensuring reliable operation.
Fabrication
The fabrication process of the proposed MEMS speaker is illustrated in Fig. 5a. The process starts with an 8-inch SOI (silicon-on-insulator) wafer with a 10 µm thick Si device layer and a 0.5 µm thick buried oxide layer. First, a series of layers is deposited on the SOI wafer by magnetron sputtering: 0.2 µm of Mo (molybdenum) as the bottom electrode, 1 µm of piezoelectric ScAlN (scandium-doped aluminum nitride, with a Sc content of 9.6%) and a further 0.2 µm of Mo as the top electrode. Subsequently, dry etching is employed to pattern the top electrode layer. This is followed by patterning of the piezoelectric layer, growth of the passivation layer, and opening of vias to the top and bottom electrodes. Next, patterned Al (aluminum) electrodes with a thickness of 1 µm, which connect to the top and bottom electrodes of the speaker, are fabricated using a lift-off process. Then, both the bottom electrode and the Si device layer are selectively etched to form the cantilever beams as well as the air gaps. Finally, deep silicon etching of the back cavity and release of the buried oxide layer are performed. After dicing, individual speaker devices are obtained.