Heterogeneous 2D Memristor Array and Silicon Selector for Compute-in-Memory Hardware in Convolution Neural Networks

doi:10.21203/rs.3.rs-3172508/v1

This work is licensed under a CC BY 4.0 License

Version 1

Memristor crossbar arrays (CBAs) based on two-dimensional (2D) materials have emerged as a potential solution to the energy consumption and latency limitations of the conventional von Neumann architecture. However, current 2D memristor CBAs face specific challenges, such as limited array size, high sneak path current, and lack of integration with peripheral circuits for hardware compute-in-memory (CIM) systems. In this work, we demonstrate a hardware CIM system that leverages the heterogeneous integration of scalable 2D hafnium diselenide (HfSe2) memristors and silicon (Si) selectors, as well as the integration of memristive CBAs with peripheral control and sensing circuits. The integrated 32 × 32 one-selector-one-memristor (1S1R) array effectively mitigates sneak current and exhibits a high yield (89%) with notable uniformity. The integrated CBA shows a marked improvement in energy efficiency and response time over state-of-the-art selector-integrated 2D materials-based memristors. To exploit the low latency of the devices for low-energy operation, the CBA is read out with time-domain sensing circuits, which consume 2.5 times less power than analog-to-digital converters (ADCs). Moreover, the implemented full-hardware binary convolution neural network (CNN) achieves high accuracy (97.5%) in a pattern recognition task. Additionally, analog computing and in-built activation functions are demonstrated within the system, further improving energy efficiency. This silicon-compatible heterogeneous integration approach, together with the energy-efficient CIM system, presents a promising hardware solution for artificial intelligence (AI) applications.

Physical sciences/Nanoscience and technology/Nanoscale devices/Electronic devices

Physical sciences/Materials science/Materials for devices/Electronic devices

The introduction of CIM architecture has provided a promising solution to address the inherent inefficiencies related to data movement in the traditional von Neumann architecture^1–4. Simultaneously, it has enhanced parallel computing capabilities and energy efficiency within the domain of artificial intelligence (AI) applications^5–8. A crucial component within this architecture is the multiply-and-accumulate (MAC) unit^9–11. Memristors have garnered significant attention as viable candidates for MAC operations in CIM due to their compact size¹², high array density^13,14, low energy requirements^15–18, and compatibility with back-end-of-line (BEOL) integration^19–21. In particular, memristors based on 2D materials have generated substantial research interest due to their exceptional properties that are essential for AI applications, especially the ultrathin thickness and low switching voltage for energy-efficient computing (Supplementary Table 1).

However, the advancement of 2D materials-based memristive crossbar arrays (CBAs) faces several challenges. The synthesis and integration of 2D materials remain difficult^22–25, although recent advances have demonstrated successful wafer-scale and monolithic 3D integration of 2D materials with remarkable performance in logic and memory applications^{73–77,46,39}. Nevertheless, many 2D memristor demonstrations still rely on methods that are impractical for large-scale integration with silicon platforms. Scalable methods such as chemical vapor deposition (CVD), molecular beam epitaxy (MBE), and atomic layer deposition (ALD) struggle to achieve growth processes compatible with BEOL technology, where the growth temperature must remain below 450°C²⁶. An alternative approach is to grow 2D materials on a substrate without strict temperature constraints and then transfer them onto the desired substrate. However, conventional transfer processes may introduce impurities and doping effects into the 2D material due to the use of polymers and liquid media^27,28.

Furthermore, the implementation of CIM hardware using 2D materials-based CBAs faces additional challenges, including the lack of process integration, limited array size, speed constraints, and the absence of integration with peripheral control and sensing circuits. To enhance device selectivity within the CBA and mitigate leakage current, incorporating access selectors or transistors is crucial in the process integration^29,30. Nevertheless, existing 2D materials-based memristor arrays predominantly use memristor-only (1R) configurations, which limits their scalability^{22,23,27,28,31–33}. Array size presents another obstacle, as large-scale CBAs are indispensable for the weighted layers of neural networks. Fully-connected (FC) layers, for instance, require hundreds of neurons, while convolution layers require hundreds of channels, both of which rely on large arrays for efficient parallel computing^34,35. Although some heterogeneous integrations of 2D materials-based memristors have been reported, such as the graphene transistor/hexagonal boron nitride (h-BN) memristor-based 0.5T0.5R single cell³⁶, the MoS₂ transistor/h-BN memristor-based 1T1R single cell³⁷, the Si transistor/MoS₂ memristor-based 2T1R single cell³⁸, and the Si transistor/h-BN memristor-based 1T1R 5 × 5 CBA⁴⁸, these demonstrations suffer from limited array sizes and cannot accommodate the complete weight mapping of a neural network. Moreover, slow programming remains a challenge: state-of-the-art heterogeneous Si/2D integrated 1T1R arrays report a program time of 232 µs, far longer than the tens of nanoseconds required for compute-in-memory operation^48,40. The hardware-level integration of 2D materials-based memristive CBAs with peripheral circuits for CIM systems has not been adequately addressed.
Previous demonstrations have primarily focused on device features extracted from small CBAs without integrated peripheral circuits and have evaluated array functionality using simulation approaches^48,41–43.

Variable resistive states have been observed in HfSe₂-metal compounds, showing potential for memristive devices^44,45. Our previous study showed that MBE can achieve wafer-scale growth of 2D HfSe₂ thin films, enabling large-scale 1R memristor arrays⁵³. HfSe₂ therefore offers advantages in scalability and in integration with silicon-based devices for constructing 1S1R arrays. The narrower bandgap of semiconducting HfSe₂ compared with h-BN allows lower set and reset voltages, improving energy efficiency⁴⁷. In addition, the thinner semiconductor layer (3 nm, as shown in the device cross-section TEM in Supplementary Fig. 26) shows better switching characteristics than thicker h-BN layers (6 nm)⁴⁸.

In this work, we demonstrate a hardware CIM system that leverages heterogeneous integration technology and incorporates low-energy peripheral circuits. The integrated crossbar array uses a 1S1R configuration, in which each cell integrates a Si-based selector and a HfSe₂-based memristor. To enable this integration, a low-temperature three-dimensional (3D) stacking process is employed, involving wafer-scale 2D HfSe₂ synthesis and wafer-scale metal-assisted transfer, which positions the 2D memristor above the Si selector and ensures compatibility with the complementary metal-oxide-semiconductor (CMOS) process. As a result, we fabricated a 1-kilobit (32 × 32) 1S1R CBA. This integrated array exhibits significantly reduced sneak current and enhanced endurance compared with other 2D materials-based memristors that lack access devices. Compared with current state-of-the-art 2D materials-based memristors integrated with selectors, our integrated array shows a marked improvement in both response time and energy efficiency^48,49. These metrics are also comparable to those of conventional oxide-based memristors^9,21,50–52, with the added advantages of lower switching voltage, faster switching, and reduced thickness (Supplementary Table 11). Additionally, time-domain sensing peripheral circuits based on a time-to-digital converter (TDC) are designed for energy-efficient reading and computing, taking advantage of the fast device speed. The CIM hardware system fully integrates the CBA with peripheral circuits and demonstrates high-accuracy CNN inference. Leveraging the nonlinear behavior of the TDC circuits, analog computing capabilities and in-built activation functions are developed, enhancing energy efficiency during CNN computation.
Finally, this work suggests that semiconductors with narrow bandgaps enable lower operating voltages and higher energy efficiency, opening routes for future low-voltage, high-speed memristor applications in logic and storage.

System architecture and device heterogeneous integration

The fully integrated CIM system based on a 1S1R CBA is shown in Fig. 1a. Implemented on a printed circuit board (PCB), the system consists of several modules: a power supply module, a digital-to-analog converter (DAC) module, a time-domain sensing module, a field-programmable gate array (FPGA) control module, and a MAC computing module. Figure 1b illustrates the role of the Advanced RISC Machine (ARM) controller on the FPGA evaluation board, which encodes input data and decodes output data. The FPGA control unit can be programmed in program, read, and compute modes; it generates encoded control signals for the DAC and collects the output signals of the TDCs with 16 parallel 16-bit counters. Further information on the FPGA programming is provided in Supplementary Fig. 1. Acting as a voltage generator, the DAC module provides voltage pulses to the central CBA for program, read, and MAC operations. The MAC compute module performs the analog vector-matrix multiplication (VMM) operations in CNNs, leveraging the structure of the CBA²¹. Peripheral sensing circuits, including the current subtractor and TDC blocks, are integrated to measure the output currents from the central CBA. Detailed information about the system setup can be found in Supplementary Note 1.

The MAC module comprises a heterogeneously integrated 1S1R CBA that is wire-bonded to the PCB. In this work, a 32 × 32 CBA is used, as depicted in Fig. 1c. A zoomed-in optical image of the central region of the CBA is presented in Fig. 1d, showing the word lines (WLs) as the horizontal top electrodes (TEs) and the bit lines (BLs) as the vertical bottom electrodes (BEs). Figure 1e shows a schematic cross-sectional view of a single cell within the CBA, highlighting the distinctive 3D-stacked structure of the Si-based selector and the HfSe₂-based memristor. Corresponding cross-sectional transmission electron microscopy (TEM) images of the region marked by the red dotted rectangle are displayed in Fig. 1f; the inset shows the 2D HfSe₂ layer with a thickness of 3 nm. The stacked structure is clearly visible in the low-magnification image, consisting of the BE/p-type Si/middle electrode (ME) stack that forms the Si-based selector and the ME/HfSe₂/TE memristor stack. Energy-dispersive X-ray spectroscopy (EDX) confirms the elemental composition of each layer in the stacked structure (Supplementary Fig. 2). Further information on the fabrication process of the Si selector is provided in Supplementary Fig. 3.

The schematic 3D structure of a single 1S1R cell within the CBA is depicted in Fig. 1g. For the Si-based selector, titanium nitride (TiN) is used to form a Schottky barrier at the interface between the BE and the p-type Si, while nickel silicide (NiSi) forms an Ohmic contact between the ME and the p-type Si. The HfSe₂-based memristor follows the same Ti/HfSe₂/Au structure as described in a previous study, in which Ti filaments are formed or erased under external bias, resulting in resistive switching (RS) behavior⁵³. Accordingly, the memristor operates in set mode when the selector is reverse-biased and in reset mode when the selector is forward-biased, as illustrated in the equivalent circuit diagram in Fig. 1h. This operation mode differs from the one-diode-one-memristor (1D1R) structure, which conventionally denotes a unipolar memristor whose set and reset operations take place at the same voltage polarity (Supplementary Table 2). This distinctive operation is why we classify our integration as 1S1R rather than 1D1R. In addition, 1S1R integration offers typical advantages over 1D1R for artificial neural network (ANN) applications, including real-valued weight implementation, backward matrix-vector multiplication, and high resistance state stability (Supplementary Note 2). An optical image of the fabricated Si-based selector array is shown in Fig. 1i, while Fig. 1j displays a two-inch polycrystalline HfSe₂ thin film grown at a high temperature of 750 °C⁵³. Raman spectroscopy and X-ray photoelectron spectroscopy (XPS) confirm the chemical composition of the 2D HfSe₂ (Supplementary Figs. 4 and 5). X-ray diffraction and high-resolution TEM further confirm the layer-by-layer structure of the 2D HfSe₂ (Supplementary Fig. 6).
To mitigate impurities and doping effects due to the utilization of polymers and liquid media during transfer, we adopt a metal-assisted dry transfer technique, wherein the 2D material is shielded by a metal layer (Supplementary Fig. 7). The maximum transfer temperature is controlled at 150°C to minimize any impact on the underlying Si-based selector. Notably, many existing transfer methods rely on manual craftsmanship rather than automated and controlled procedures, posing challenges for achieving wafer-scale 2D material transfer⁵⁴. In this study, we successfully implement a wafer-scale transfer process facilitated by a wafer de-bonder machine (Supplementary Fig. 8). Furthermore, the Raman mapping results depict a uniform and consistent distribution of the A_1g peak of HfSe₂ before and after the transfer, thereby substantiating the existence of continuous and uniform HfSe₂ thin film before and after the transfer (Supplementary Fig. 4). The complete CBA is finalized through interlayer oxide (SiO₂) deposition, via etching, and WL metallization (Fig. 1k). This low-temperature integration process ensures compatibility with the BEOL processes in Si-based CMOS process flows. For detailed information on the fabrication process, refer to Methods.

Device characteristics of 1S1R CBA

To gain insight into the RS behavior of the integrated 1S1R CBA, we begin by analyzing the characteristics of individual 1S1R devices. Figure 2a depicts the current-voltage (I-V) curve of a representative Si-based selector. Here, a voltage is applied between the ME and the BE, and the resulting current through the bulk p-type Si is measured. Subsequently, we investigate the memristor at the top of the stack by applying a voltage to the TE while keeping the ME grounded. The measured RS I-V curve (Fig. 2b) is consistent with previously reported values for the same structure, including the switching voltage (−0.6 V), switching ratio (50 times), and reset current (1 mA)⁵³, confirming the proper functionality of the integrated memristor. Further RS characteristics of the integrated device are presented in Fig. 2c, where the dotted lines represent DC sweep measurements over multiple cycles. Notably, during the reset process in the positive voltage regime, the I-V curves overlap significantly, indicating minimal cycle-to-cycle variation when reading the resistance states of the device. The reverse current engineering of the Si selector is summarized in Supplementary Table 3; it shows that choosing a Si-based selector that balances a sufficiently large rectification ratio against an appropriate reverse current is essential for the success of our integration. The Si-based selector at the bottom of the stack effectively limits the current during the set operation, leading to self-compliance behavior. The set voltage of the integrated 1S1R device is higher because the reverse-biased Si diode drops a significant portion of the applied voltage during the set process. Nevertheless, Supplementary Fig. 9 shows the small switching voltages of the standalone memristors (1R), with minimal device variation, indicating the low-voltage advantage of 2D HfSe₂-based memristors (Supplementary Table 1). The primary objective of this study is to demonstrate heterogeneous integration between 2D material memristors and Si selectors, with a focus on the statistical behavior of devices within the 1S1R CBA (discussed later). Si-based diodes were chosen over transistors for practical reasons, particularly their ease of fabrication in university laboratories. Further improvements could be achieved with foundry-fabricated transistors in a 1T1R integration approach (a 1T1R demonstration using a foundry FinFET is shown in Supplementary Fig. 24). In such a configuration, the on-state transistor would incur a much lower voltage drop than a reverse-biased diode. In addition, 1T1R integration is better suited to suppressing cross-talk at high array densities, such as at the 10 nm technology node (Supplementary Table 4).
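Both the self-compliance and the higher set voltage of the 1S1R cell follow from placing a diode in series with the memristor. The sketch below solves that series circuit at a few operating points. It is a minimal illustration, not a model of the fabricated device: the Schottky selector is replaced by an ideal Shockley diode, and the saturation current `I_S`, ideality factor `N_IDEAL`, and the −2 V and +1 V biases are hypothetical; only the 13 µS and 330 µS conductances come from the text.

```python
import math

# Hypothetical selector parameters (assumptions, not extracted from the paper).
I_S = 10e-6      # reverse saturation current of the selector, A
N_IDEAL = 1.5    # diode ideality factor
V_T = 0.02585    # thermal voltage near 300 K, V


def diode_current(v_d: float) -> float:
    """Ideal Shockley diode current; v_d > 0 is forward bias."""
    return I_S * (math.exp(v_d / (N_IDEAL * V_T)) - 1.0)


def solve_1s1r(v_applied: float, r_mem: float, iters: int = 200) -> tuple[float, float]:
    """Split v_applied between the diode and a series memristor resistance.

    Solves v_d + r_mem * I(v_d) = v_applied by bisection. The left-hand side
    increases monotonically with v_d, so the root lies between 0 and v_applied.
    Returns (cell current, voltage across the memristor).
    """
    lo, hi = min(v_applied, 0.0), max(v_applied, 0.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + r_mem * diode_current(mid) > v_applied:
            hi = mid
        else:
            lo = mid
    i = diode_current(0.5 * (lo + hi))
    return i, i * r_mem


# Set polarity (negative TE bias): the selector is reverse-biased.
i_set_hrs, v_mem_hrs = solve_1s1r(-2.0, r_mem=1 / 13e-6)   # cell in HRS
i_set_lrs, v_mem_lrs = solve_1s1r(-2.0, r_mem=1 / 330e-6)  # cell after set
# Reset polarity (positive TE bias): the selector is forward-biased.
i_reset, v_mem_reset = solve_1s1r(+1.0, r_mem=1 / 330e-6)

for name, i, v in [("set, HRS", i_set_hrs, v_mem_hrs),
                   ("set, LRS", i_set_lrs, v_mem_lrs),
                   ("reset, LRS", i_reset, v_mem_reset)]:
    print(f"{name:10s}  I = {i * 1e6:8.2f} uA   V_mem = {v:+.3f} V")
```

With the selector reverse-biased, the current saturates near `I_S` regardless of the memristor state, so once the cell reaches the LRS almost all of the applied voltage falls across the diode, which stops further switching. With forward bias, the diode drops only a fraction of a volt and a much larger reset current flows.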

Our HfSe₂-based 1R memristor responds quickly, with switch and read times as short as 1 ns (Figs. 2d and 2e and Supplementary Fig. 10). For the integrated 1S1R cell, the switch and read times are 45 ns and 60 ns, respectively (Supplementary Fig. 11). Although such speeds are typical for standalone memristors, they are notably fast for 2D materials-based 1T1R devices and self-rectifying memristors. To support this claim, we compared our work with other system-level implementations that use memristors in 1S1R or 1T1R configurations; the benchmarks are provided in Supplementary Tables 5 and 6. We attribute the high speed to two main factors. First, the fast diffusivity of Ti ions enables filament formation within 20 ns, as also reported in other 2D materials^53,55. Second, the 1S1R CBA design has a small overlap area and low parasitic capacitance, which avoids additional latency from the integrated devices (Supplementary Fig. 12 and Supplementary Note 3)⁵⁶. In addition, the Si-based selector in the 1S1R cell effectively controls the current during the set process. This prevents current overshoot, in which the current escalates to excessively high levels and may cause deep low resistance states (LRS) or device breakdown. As shown in Fig. 2f, precise conductance adjustment during device programming is achieved through a single-boundary closed-loop pulsing scheme. When the device conductance is below the set target conductance (330 µS), a negatively ramped staircase of pulses is applied to set the cell.
Once the conductance exceeds the set target, positive reset pulses are applied to reduce the conductance to the reset target (13 µS). The detailed single-boundary closed-loop pulse scheme is given in Supplementary Note 4. To verify that the conductance changes arise from Ti within the device, conductive atomic force microscopy (CAFM) measurements were conducted with different electrodes (Supplementary Fig. 27). Supplementary Fig. 13 shows that the observed change in conductance results from the combined effect of an increase/decrease in the number of filaments and the growth/erasure of each individual filament. To further control the resistance variation of the high resistance state (HRS) and LRS, a double-boundary closed-loop pulse scheme was developed (Supplementary Note 5). This closed-loop scheme was repeated for 26,500 cycles without device failure, and the endurance results are presented in Fig. 2g. Notably, the endurance of our integrated 1S1R cell surpasses that of other 2D materials-based memristors without selector integration^23,27,57,58. In comparison, a HfSe₂ memristor with the same structure but without a selector breaks down easily because it lacks an appropriate compliance current during pulse programming (Supplementary Fig. 14), underscoring the improved endurance of our integrated 1S1R cell. Furthermore, Supplementary Fig. 15 shows that individual devices exhibit a substantial switching ratio of 40 times under double-boundary closed-loop programming, and this behavior is reproducible across ten distinct devices. The primary reason for the improved endurance is the high switching speed of the memristor during set/reset cycles, which requires a series device such as a selector or transistor to limit the current rapidly. We validated this by applying 1 ns program pulses of ±1 V to a 1R device, achieving an endurance of 1 million cycles (Supplementary Fig. 25).
The device also demonstrates stable non-volatile retention (up to 10⁴ s) at 85 ºC, highlighting its potential for CIM applications, where weights stored in the CBA require reliable retention (Supplementary Fig. 16)²¹. A comparison between our device and other conventional memristors is given in Supplementary Table 7 and Supplementary Note 6.
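The single- and double-boundary closed-loop schemes described above are program-and-verify loops. The sketch below runs both against a toy stochastic device purely to show the control flow. The 330 µS set target and 13 µS reset target come from the text; the device model (`ToyMemristor`), pulse amplitudes, staircase step, and the 300 to 360 µS double-boundary window are hypothetical. The actual schemes are detailed in Supplementary Notes 4 and 5.

```python
import random
from dataclasses import dataclass, field

G_SET_TARGET = 330e-6      # S, set target quoted in the text
G_RESET_TARGET = 13e-6     # S, reset target quoted in the text
G_MIN, G_MAX = 1e-6, 1e-3  # S, hypothetical limits of the toy device


@dataclass
class ToyMemristor:
    """Hypothetical stochastic device, not fitted to HfSe2 data.

    Each pulse scales G by a random factor that grows with |amplitude|:
    negative pulses set (raise G), positive pulses reset (lower G).
    """
    g: float
    rng: random.Random = field(default_factory=random.Random)

    def pulse(self, amplitude_v: float) -> None:
        factor = 1.0 + abs(amplitude_v) * self.rng.uniform(0.2, 0.6)
        self.g = self.g * factor if amplitude_v < 0 else self.g / factor
        self.g = min(max(self.g, G_MIN), G_MAX)


def program_single_boundary(dev, to_lrs, v_set0=-0.5, v_step=-0.05,
                            v_set_limit=-1.5, v_reset=0.8, max_pulses=200):
    """Program-and-verify against one boundary; returns the pulses applied.

    Set: a staircase of increasingly negative pulses until G >= G_SET_TARGET.
    Reset: fixed positive pulses until G <= G_RESET_TARGET.
    """
    v_set = v_set0
    for n in range(max_pulses):
        if to_lrs and dev.g < G_SET_TARGET:
            dev.pulse(v_set)
            v_set = max(v_set + v_step, v_set_limit)  # ramp, clamped at the limit
        elif not to_lrs and dev.g > G_RESET_TARGET:
            dev.pulse(v_reset)
        else:
            return n
    return max_pulses


def program_double_boundary(dev, lo, hi, v_fine=0.3, max_pulses=500):
    """Verify-and-correct until lo <= G <= hi; returns the pulses applied."""
    for n in range(max_pulses):
        if dev.g < lo:
            dev.pulse(-v_fine)   # too low: small set pulse
        elif dev.g > hi:
            dev.pulse(+v_fine)   # too high: small reset pulse
        else:
            return n
    return max_pulses


dev = ToyMemristor(g=20e-6, rng=random.Random(42))
n_set = program_single_boundary(dev, to_lrs=True)
g_after_set = dev.g
n_reset = program_single_boundary(dev, to_lrs=False)
g_after_reset = dev.g
n_fine = program_double_boundary(dev, lo=300e-6, hi=360e-6)
g_final = dev.g
print(n_set, n_reset, n_fine, f"{g_final * 1e6:.1f} uS")
```

The single-boundary loop stops as soon as one threshold is crossed, so the final conductance can overshoot it; this is one source of the HRS/LRS spread mentioned in the text. The double-boundary loop converges in this toy model because the window ratio (360/300 = 1.2) is wider than the largest single-pulse step at 0.3 V (1 + 0.3 × 0.6 = 1.18), so a correction pulse cannot jump over the window.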

We then investigated the RS behavior within the CBA. To the best of our knowledge, integration on this scale combining 2D materials and Si platforms has not been reported, and the performance characteristics of such large-scale integration have not been thoroughly examined to date. With the Si-based selector, we effectively address the sneak current issue (Fig. 2h): without adequate control of leakage current, programming a selected cell inadvertently disturbs the half-selected and non-selected cells in the CBA (Fig. 2i)²⁹. As shown in Fig. 2h, only the selected cell changes conductance under external pulses, while the rest of the CBA remains unaffected. The detailed voltage pulsing program is described in Supplementary Fig. 17. Notably, in our memristor array without selectors, nearby cells are still disturbed even when the V/2 voltage scheme is applied to the 1R array (Supplementary Fig. 18), which demonstrates the importance of the Si-based selector in mitigating sneak current⁵⁹. Moreover, by suppressing sneak current, the Si-based selector enables precise programming of individual devices within the CBA. This precision makes detailed statistical analysis possible, including device variation and the yield of the as-fabricated CBA. Such analyses are challenging in passive 1R memristor arrays, where uncontrollable sneak currents disturb devices during the measurement of their neighbors. To assess the 1S1R CBA, we first programmed a "2D-Si NUS" pattern of weights across the entire array. In Fig. 2j, the bright cells represent those set to the LRS, while the background cells correspond to the HRS.
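The V/2 scheme referred to above can be illustrated by computing the voltage each cell sees while one cell is programmed. This is a generic sketch of the standard V/2 biasing scheme, not the authors' pulsing program (Supplementary Fig. 17); the 32 × 32 size matches the CBA, while the −2 V programming amplitude and the selected coordinates are hypothetical.

```python
from collections import Counter


def v_half_bias(n_rows, n_cols, sel_row, sel_col, v_prog):
    """Voltage across each cell (word line minus bit line) in a V/2 scheme.

    The selected word line is driven to v_prog, the selected bit line is held
    at 0 V, and every other line is held at v_prog / 2.
    """
    wl = [v_prog if r == sel_row else v_prog / 2 for r in range(n_rows)]
    bl = [0.0 if c == sel_col else v_prog / 2 for c in range(n_cols)]
    return [[wl[r] - bl[c] for c in range(n_cols)] for r in range(n_rows)]


bias = v_half_bias(32, 32, sel_row=5, sel_col=7, v_prog=-2.0)
counts = Counter(v for row in bias for v in row)
print(sorted(counts.items()))
# [(-2.0, 1), (-1.0, 62), (0.0, 961)]
```

Even under V/2 biasing, the 62 half-selected cells on the selected word and bit lines sit at half the programming voltage, with the same polarity as the selected cell. In a passive 1R array nothing in series limits the current through them, whereas in the 1S1R array each half-selected cell's selector is reverse-biased in the set polarity and limits that current.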
After programming this pattern, we measured the resistance of 273 cells in different columns across the array, as shown in Fig. 2k. Discrepancies between the ideal and measured patterns arise primarily from analog noise (discussed in the subsequent section) and from the HRS and LRS variations that result from single-boundary closed-loop programming (Supplementary Note 4). These variations can be reduced by fine-tuning the set and reset target ranges using double-boundary closed-loop logic^60,61. Supplementary Fig. 19 illustrates the variation of 40 devices randomly selected from the CBA, each switched for 50 cycles with the double-boundary closed-loop logic. The extracted device-to-device variation is small compared with that of conventional oxide-based memristors, indicating high array uniformity (Supplementary Table 8). The consistent HRS and LRS across these cycles show a uniform RS ratio of 7.5 times with minimal device variation. Supplementary Table 9 benchmarks the RS ratio of our 1S1R CBA against other system-level memristor CBA demonstrations, and our RS ratio is comparable. For wafer-level integration, we expect the variation within a single die to remain manageable; the primary source of variation is likely to be die-to-die differences arising from the transfer process, which can be mitigated by adapting the pulse programming range for each die. Figure 2l presents a fault map of the devices in the 32 × 32 1S1R CBA, revealing a yield of 89.0% across 992 devices. As shown in Supplementary Table 5, our work significantly improves both response time and energy efficiency compared with other 2D material-based memristors with selectors.
This advancement helps close the performance gap between 2D material-based memristors and oxide-based memristors. Furthermore, owing to the low-voltage advantage of the ultrathin 2D HfSe₂ film, calculations for a 1T1R configuration indicate a potential energy efficiency of up to 1309.1 TOPS/W, which would surpass that of oxide-based memristors (detailed energy efficiency calculations are presented in Supplementary Tables 5 and 10 and Supplementary Note 7). In contrast to conventional technologies that offer thousands of conductance states⁶², our device demonstrates 3-bit weight programming capability (Supplementary Fig. 20). We contend that 3–4 bits suffice for our intended applications and that higher precision would be superfluous. Moreover, resolving higher-precision weights would require a very high-precision analog-to-digital converter (ADC), inevitably increasing power consumption.
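An efficiency figure in TOPS/W converts directly into energy per operation, since 1 TOPS/W equals 10¹² operations per joule, that is, 1 pJ per operation. The helpers below make that conversion explicit. Counting one MAC as two operations (`OPS_PER_MAC = 2`) is a common convention assumed here, not a detail taken from Supplementary Note 7.

```python
OPS_PER_MAC = 2  # assumed convention: one multiply plus one accumulate


def tops_per_watt(energy_per_op_j: float) -> float:
    """TOPS/W from the energy of one operation in joules.

    1 TOPS/W = 1e12 operations per joule, i.e. 1 pJ per operation.
    """
    return 1.0 / (energy_per_op_j * 1e12)


def tops_per_watt_from_mac(energy_per_mac_j: float) -> float:
    """TOPS/W from the energy of one MAC, counted as OPS_PER_MAC operations."""
    return tops_per_watt(energy_per_mac_j / OPS_PER_MAC)


def energy_per_op_fj(tops_w: float) -> float:
    """Energy per operation in femtojoules for a given TOPS/W figure."""
    return 1e3 / tops_w  # 1 TOPS/W -> 1 pJ = 1000 fJ per operation


print(f"{energy_per_op_fj(1309.1):.3f} fJ per operation")  # 0.764 fJ per operation
```

Read this way, the projected 1309.1 TOPS/W corresponds to well under 1 fJ per operation.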

Time-domain parallel array readout

To enhance the functionality and energy efficiency of the peripheral circuits in the CIM system, we designed dedicated sensing circuits comprising current subtractor and sensing units. A TDC approach was chosen instead of an ADC approach because current summation in the array columns enables low-power readout in the time domain⁶³. As shown in Fig. 3a and Supplementary Fig. 21, each pair of columns is equipped with a current subtraction unit that enables differential reading, facilitating the implementation of negative weights in CNN kernels^23,53,64,65. The subtracted currents are then processed by the TDC unit. We employ a time-delay technique with explicit lumped capacitors to extract the current subtraction results: in our hardware implementation, an FPGA-based TDC counter measures the time taken for the voltage on the column lumped capacitor to discharge from the charged state to a reference voltage (V_ref). Figure 3b shows an optical image of the fabricated PCB containing the current subtractor and TDC units. As our CBA consists of 32 columns, a total of 16 current subtractors and 16 TDC-based sensing circuits are integrated into the CIM hardware system (Fig. 1a).
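The time-delay readout can be summarized by a constant-current discharge model: the subtracted column current I⁺ − I⁻ discharges the lumped capacitor from its precharge level to V_ref, and the FPGA counter converts the elapsed time into a count. The sketch assumes an ideal constant current and a comparator with no delay; the capacitance `C_COL`, voltages `V_PRE` and `V_REF`, and clock `F_CLK` are hypothetical, and only the 16-bit counter width comes from the text.

```python
import math

C_COL = 10e-12            # lumped column capacitance, F (hypothetical)
V_PRE = 1.0               # precharge voltage, V (hypothetical)
V_REF = 0.5               # comparator reference V_ref, V (hypothetical)
F_CLK = 200e6             # FPGA counter clock, Hz (hypothetical)
COUNTER_MAX = 2**16 - 1   # 16-bit counter, as in the text


def discharge_time(i_plus: float, i_minus: float) -> float:
    """Seconds for the capacitor to fall from V_PRE to V_REF.

    Assumes the subtracted current I+ - I- is constant during the discharge,
    so t = C * (V_PRE - V_REF) / (I+ - I-). A non-positive net current never
    trips the comparator.
    """
    i_net = i_plus - i_minus
    if i_net <= 0:
        return math.inf
    return C_COL * (V_PRE - V_REF) / i_net


def tdc_count(i_plus: float, i_minus: float) -> int:
    """Counter value latched at the stop signal, saturating at COUNTER_MAX."""
    t = discharge_time(i_plus, i_minus)
    if math.isinf(t):
        return COUNTER_MAX
    return min(int(t * F_CLK), COUNTER_MAX)


for i_p, i_m in [(100e-6, 50e-6), (200e-6, 100e-6), (60e-6, 55e-6)]:
    t = discharge_time(i_p, i_m)
    print(f"I_net = {(i_p - i_m) * 1e6:5.1f} uA -> t = {t * 1e9:7.1f} ns, "
          f"count = {tdc_count(i_p, i_m)}")
```

Doubling the net current halves the sensed time, which is the inverse, and therefore nonlinear, current-to-time relation the text describes. Larger summed currents also finish sooner, which is why the TDC benefits from large MAC currents.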

Figure 3c presents the sensed time, on the order of tens of nanoseconds, as a function of the output current from the CBA, demonstrating fast sensing with 2.5 times lower power than conventional ADC sensing circuits (Fig. 3d)^10,66. In addition, energy consumption in the time domain is comparatively low because only a single sample is required, as opposed to multiple samples in the frequency domain. The efficiency of the TDC relies on a sufficiently large summed current from the MAC operation, because the conversion time is inversely related to the current. By combining fast device operation and sensing speeds with low-power sensing circuits, low-energy sensing in the nanojoule range is achieved. The detailed energy consumption of the TDC sensing circuit is shown in Supplementary Fig. 22. However, in the actual PCB implementation, data-dependent errors may occur due to factors such as distributed bit-cell capacitances and variations in the discrete components. Figure 3e depicts the measured time-current relationship, with the extrapolated curve exhibiting a typical nonlinear decreasing trend. Although TDC sensing has a nonlinear response to current, we observed that the impact of this nonlinearity on CNN accuracy in practical applications is negligible (discussed in the subsequent section).
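Because the conversion time falls off as the inverse of current, a measured time-current curve such as the one in Fig. 3e can be linearized by regressing time against 1/I and then inverted to recover current from a sensed time. The sketch below assumes the curve follows t ≈ a/I + b, where a lumps C·(V_pre − V_ref) and b is a fixed offset (for example, comparator and counter latency). This model, the least-squares fit, and the synthetic calibration points are illustrative assumptions, not the authors' extrapolation.

```python
def fit_inverse_model(currents, times):
    """Least-squares fit of t = a / I + b, which is linear in x = 1 / I."""
    xs = [1.0 / i for i in currents]
    n = len(xs)
    x_mean = sum(xs) / n
    t_mean = sum(times) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxt = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, times))
    a = sxt / sxx
    b = t_mean - a * x_mean
    return a, b


def time_to_current(t, a, b):
    """Invert the fitted model to estimate the current behind a sensed time."""
    return a / (t - b)


# Synthetic, noise-free calibration points (hypothetical a and b).
A_TRUE, B_TRUE = 5e-12, 8e-9   # C*(V_pre - V_ref) in coulombs, offset in s
cal_currents = [20e-6, 50e-6, 100e-6, 200e-6, 400e-6]
cal_times = [A_TRUE / i + B_TRUE for i in cal_currents]

a, b = fit_inverse_model(cal_currents, cal_times)
print(f"a = {a:.3e}, b = {b * 1e9:.2f} ns")
print(f"t = 40 ns -> I = {time_to_current(40e-9, a, b) * 1e6:.1f} uA")
```

For classification by the largest output, such a calibration is optional: the inverse relation is monotonic, so the ranking of columns by sensed time already matches their ranking by current.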

Biological nervous systems rely on the parallel processing of multiple receptor neurons, which generate outputs simultaneously (Fig. 3f)^67,68. In our CIM system, we implemented a biologically inspired parallel computing scheme: current readings of all columns start simultaneously when the TDC receives a start signal and terminate at different stop signals according to the current-dependent delay (Fig. 3g). On the other hand, the diverse properties of different kernels in a neural network make it challenging to optimize a single type of CBA that handles all of them efficiently. For example, the MAC output current through a larger kernel is generally larger, and therefore produces a faster response, than that through a smaller kernel. Hence, the optimal utilization of the CBA requires further investigation (Fig. 3h)⁶⁹.
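Because every column starts at the same instant and stops when its capacitor crosses V_ref, the stop events arrive in order of decreasing current. The sketch below simulates that event ordering with a priority queue, assuming a constant-current discharge time t = C·ΔV/I; the first column to stop is the one with the largest output current, which is the neuron the CNN selects. The capacitance, voltage swing, and column currents are hypothetical.

```python
import heapq


def parallel_stop_order(column_currents, c_col=10e-12, dv=0.5):
    """Simulate simultaneous start and current-dependent stop events.

    Each column with positive current stops after t = c_col * dv / I.
    Returns (column, stop time) pairs in the order the stop signals arrive.
    """
    events = [(c_col * dv / i, col) for col, i in enumerate(column_currents) if i > 0]
    heapq.heapify(events)
    order = []
    while events:
        t, col = heapq.heappop(events)
        order.append((col, t))
    return order


# Hypothetical differential output currents of four neurons, in amperes.
currents = [42e-6, 118e-6, 7e-6, 65e-6]
order = parallel_stop_order(currents)
winner = order[0][0]
print([col for col, _ in order], "winner:", winner)  # [1, 3, 0, 2] winner: 1
```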

Hardware implementation of CNN

To demonstrate the capabilities of our hardware CIM system for AI applications, we constructed a complete-hardware CNN consisting of four 3 × 3 convolution kernels and four output neurons (Fig. 4a). The input images are binary representations of English letters. To evaluate the classification accuracy, we directly compare the network's outputs with the labels of the input patterns. The input patterns include four stylized letters ('n', 'u', 'z', and 's') and, for each letter, nine noisy versions, each formed by flipping one pixel of the original image (Fig. 4b). Because of the limited size of the dataset, these 40 patterns are used for both training and testing. Notably, previous experimental demonstrations of ANNs, particularly hardware implementations, have employed similarly small datasets of 30 images or fewer for both training and testing⁷⁰. In the present study, our emphasis lies in showcasing the potential of the 2D material platform for a fully hardware compute-in-memory demonstration. We acknowledge that using identical datasets for training and testing carries a risk of overfitting. Future investigations should expand the array size, or use multiple CBAs within the same system, to accommodate more complex datasets²¹. The flowchart in Fig. 4c illustrates the training and inference process used in this CNN demonstration. In our hardware implementation, we first perform ex-situ training of the CNN and extract the weights of the convolution kernels (see the Methods section for details). To program the analog weights into our binary-capable 1S1R CBA hardware, we perform weight quantization, in which the weight values are mapped to '-1', '0', and '1'. The quantized weight matrix is shown in Fig. 4d.
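The text does not specify how the trained weights are mapped to '-1', '0', and '1'. One common heuristic, used in ternary weight networks, zeroes weights whose magnitude falls below a threshold proportional to the mean absolute weight and keeps the sign of the rest. The sketch below assumes that heuristic with a scale of 0.7; the function name `ternarize` and the example kernel values are made up for illustration.

```python
def ternarize(kernel, delta_scale=0.7):
    """Quantize real-valued weights to {-1, 0, +1}.

    Threshold heuristic (assumed): delta = delta_scale * mean(|w|).
    Weights with |w| <= delta become 0; the rest keep their sign.
    """
    flat = [w for row in kernel for w in row]
    delta = delta_scale * sum(abs(w) for w in flat) / len(flat)
    return [[0 if abs(w) <= delta else (1 if w > 0 else -1) for w in row]
            for row in kernel]


# Hypothetical trained 3 x 3 kernel.
kernel = [[0.90, -0.10, 0.80],
          [-0.70, 0.05, -0.60],
          [0.85, -0.20, 0.75]]
print(ternarize(kernel))  # [[1, 0, 1], [-1, 0, -1], [1, 0, 1]]
```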

After extraction, the kernel weights are transferred onto our hardware by programming selected cells in the CBA to the conductance values corresponding to the weights. The CBA uses the column differential pair method to represent the kernel weights: two neighboring cells in adjacent columns represent each weight as $W_{1i} = G_{1i}^{+} - G_{1i}^{-}$, where $G_{1i}^{+}$ and $G_{1i}^{-}$ are the conductances of the two cells (Fig. 4e). The subtraction is performed by the current subtractor in the peripheral circuits (Fig. 3a). In this demonstration, we used four 3 × 3 kernels, requiring a total of 72 (4 × 9 × 2) cells in the CBA to form the 36 differential pairs that represent the weight values. Figure 4f provides an example of the hardware inference process within the CBA. The input voltage vectors, encoded from the input patterns, are applied to the CBA, and the MAC output currents of the different columns are obtained simultaneously. The current subtraction between column pairs, such as I₁⁺ and I₁⁻, occurs outside the CBA.
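The differential-pair mapping and the off-array subtraction can be sketched as an idealized model under stated assumptions: each binary cell is either `G_ON` or `G_OFF`, ternary weights map to (G⁺, G⁻) pairs as in the hypothetical `PAIR` table, pixels are encoded as 0 V or `V_READ`, and sneak currents and device variation are ignored. All conductance and voltage values, and the example kernels, are hypothetical. Under these assumptions $I_k^{+} - I_k^{-} = (G_{ON} - G_{OFF})\,V_{READ}\sum_i x_i w_{ki}$, so the subtracted current is proportional to the ideal dot product.

```python
# Hypothetical device and read parameters (illustrative only).
G_ON = 100e-6   # low-resistance-state conductance, siemens
G_OFF = 1e-6    # high-resistance-state conductance, siemens
V_READ = 0.2    # read voltage applied for a '1' pixel, volts

# Ternary weight -> (G+, G-) for one column differential pair.
PAIR = {1: (G_ON, G_OFF), 0: (G_OFF, G_OFF), -1: (G_OFF, G_ON)}


def map_kernels(kernels):
    """Map K flattened ternary kernels (9 weights each) onto a 9 x 2K
    conductance matrix; columns 2k and 2k+1 hold G+ and G- of kernel k."""
    n_rows, n_cols = len(kernels[0]), 2 * len(kernels)
    g = [[0.0] * n_cols for _ in range(n_rows)]
    for k, kernel in enumerate(kernels):
        for i, w in enumerate(kernel):
            g[i][2 * k], g[i][2 * k + 1] = PAIR[w]
    return g


def crossbar_mac(g, pixels):
    """Ideal crossbar read: column current I_j = sum_i V_i * G_ij (Kirchhoff),
    followed by the peripheral subtraction I_k = I_k+ - I_k-."""
    v = [V_READ * p for p in pixels]
    cols = [sum(v[i] * g[i][j] for i in range(len(v))) for j in range(len(g[0]))]
    return [cols[2 * k] - cols[2 * k + 1] for k in range(len(cols) // 2)]


kernels = [  # hypothetical quantized kernels
    [1, 1, 1, 1, -1, 1, 1, -1, 1],
    [1, -1, 1, 1, -1, 1, 1, 1, 1],
    [1, 1, -1, -1, 1, -1, -1, 1, 1],
    [-1, 1, 1, 0, 1, 0, 1, 1, -1],
]
g = map_kernels(kernels)
print(len(g) * len(g[0]))  # 72
print(crossbar_mac(g, [1, 1, 1, 1, 0, 1, 1, 0, 1]))
```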

We now summarize the results of CNN inference on our hardware system. Figures 4g and 4h map the kernel weights in terms of differential conductance within the hardware CBA. Before the weight transfer step, the weights are randomly distributed (Fig. 4g); after the weight transfer, the conductance of each cell is programmed to the desired value based on the extracted kernel weights (Fig. 4h). Next, we present the results of CNN inference in Fig. 4i–l. Figure 4i shows the output currents sensed directly from our CIM system, where each color represents the current from one column differential pair, corresponding to one neuron output in the final layer of the CNN (Fig. 4a). The CNN recognizes the input pattern by identifying the neuron with the highest output current. Figure 4j provides examples of correct and incorrect classification. In the upper panel, for the letter 'n', the first neuron exhibits the highest output current, so the letter 'n' is correctly recognized. In the bottom panel, by contrast, the largest current occurs at the neuron corresponding to the letter 'z', whereas the true label of the input is 's'. Figure 4k shows the error of the measured MAC output currents relative to arithmetically calculated values. Although errors exist owing to sensing noise and device variation, they do not significantly affect the classification accuracy, because the classified pattern is determined only by the index of the maximum output. Figure 4l summarizes the classification results for pure software training, a CNN with quantized kernels, and hardware CNN inference. The hardware CNN achieves the same accuracy as the software implementation with quantized weights (97.5%, i.e., 39 of the 40 patterns), implying that the additional misclassification in this CNN implementation, relative to pure software training, is attributable to weight quantization.
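The winner-take-all readout and the accuracy metric can be sketched as below, together with a simple bound that explains why moderate MAC errors rarely change the result: if every output current is perturbed by less than half the gap between the largest and second-largest outputs, the index of the maximum cannot change. The helper names and the example currents are hypothetical.

```python
def classify(output_currents):
    """Index of the output neuron with the largest current (winner-take-all)."""
    return max(range(len(output_currents)), key=lambda k: output_currents[k])


def safe_error_bound(output_currents):
    """Half the gap between the top two outputs: any per-neuron error smaller
    than this cannot change the argmax."""
    top, second = sorted(output_currents, reverse=True)[:2]
    return (top - second) / 2


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


ideal = [12e-6, 4e-6, 7e-6, 5e-6]      # hypothetical ideal MAC outputs (A)
noisy = [11e-6, 4.5e-6, 8e-6, 5.2e-6]  # same outputs with sensing/device error
print(safe_error_bound(ideal))          # half of the top-two gap
print(classify(ideal), classify(noisy))  # 0 0
print(accuracy([0] * 39 + [2], [0] * 39 + [3]))  # 0.975
```

Here every perturbation (at most 1 µA) stays below the 2.5 µA bound, so the predicted class is unchanged; one error in 40 patterns gives 97.5%.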
Importantly, this CNN hardware demonstration performs all data encoding, MAC operations and output sensing within our CIM system, without any postprocessing or simulation after classification. This showcases a complete hardware implementation of the CNN based on our CIM system. The high accuracy achieved by this CNN demonstration, combined with the fast devices and low-latency sensing circuits, highlights the potential of our CIM system for real-time processing applications⁷¹.

To meet the requirements of neural networks in real-world scenarios, where analog input images are common and data transfer is resource-intensive, we extended our CIM system to include analog computing and in-built activation functions⁷². Figure 5a depicts the CNN structure used to evaluate analog computing and the hardware activation functions. The hardware implementation is confined to the first layer of the network, which transforms the input image (28 × 28) into feature maps (4 × 26 × 26). This layer uses four 3 × 3 kernels, requiring a total of 72 devices for weight mapping onto the CBA hardware, following the same method as in Fig. 4; each kernel requires 9 × 2 devices because of the column differential method. The subsequent layers of the CNN in Fig. 5 are simulated in Python using TensorFlow. Figure 5b shows the outputs of the analog convolution operation obtained with software and with our hardware CBA. The analog hardware CBA output consists of the output currents measured directly from the hardware CBA during the convolution operation, and therefore includes both device-to-device and cycle-to-cycle variations within the CBA. Moreover, this output is obtained from the array itself rather than by exploiting only the nonlinear properties of a single device. In contrast, the analog software output is simulated from single-device conductance data using an ideal arithmetic multiply-and-accumulate operation. The close resemblance between the software-processed images and the output images from the hardware CBA validates the accuracy of the analog computing. Discrepancies between the analog hardware CBA output and the software output arise from nonideal sneak currents in the CBA and from programming errors in the conductance of each device.
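The mapping of the first convolution layer onto a 9 × 8 array (72 devices) can be sketched by unrolling each 3 × 3 image patch into a 9-element row-voltage vector, an im2col-style scheme. The linear pixel-to-voltage encoding with `V_MAX`, the signed-weight-to-conductance mapping with `G_MIN`/`G_MAX`, and all values are hypothetical; the authors' actual pixel quantization and voltage encoding are given in Supplementary Fig. 23. Sneak currents and programming errors are not modeled, so this sketch corresponds to the ideal software output rather than the measured hardware output.

```python
import random

V_MAX = 0.2                  # hypothetical full-scale read voltage, volts
G_MIN, G_MAX = 1e-6, 100e-6  # hypothetical conductance range, siemens


def weight_to_pair(w):
    """Map a signed weight in [-1, 1] to (G+, G-) with G+ - G- = w*(G_MAX - G_MIN)."""
    span = G_MAX - G_MIN
    return G_MIN + max(w, 0.0) * span, G_MIN + max(-w, 0.0) * span


def map_conv_layer(kernels):
    """K flattened 3 x 3 kernels -> 9 x 2K conductance matrix (paired columns)."""
    g = [[0.0] * (2 * len(kernels)) for _ in range(9)]
    for k, kernel in enumerate(kernels):
        for i, w in enumerate(kernel):
            g[i][2 * k], g[i][2 * k + 1] = weight_to_pair(w)
    return g


def crossbar_conv(image, g, ksize=3):
    """Valid convolution: every ksize x ksize patch drives the rows as analog
    voltages; each differential column pair yields one feature-map pixel."""
    n_k = len(g[0]) // 2
    out_h, out_w = len(image) - ksize + 1, len(image[0]) - ksize + 1
    fmap = [[[0.0] * out_w for _ in range(out_h)] for _ in range(n_k)]
    for r in range(out_h):
        for c in range(out_w):
            patch = [image[r + i][c + j] for i in range(ksize) for j in range(ksize)]
            v = [V_MAX * p for p in patch]  # analog pixel in [0, 1] -> row voltage
            cols = [sum(v[i] * g[i][j] for i in range(len(v))) for j in range(2 * n_k)]
            for k in range(n_k):
                fmap[k][r][c] = cols[2 * k] - cols[2 * k + 1]
    return fmap


random.seed(0)
image = [[random.random() for _ in range(28)] for _ in range(28)]  # stand-in input
kernels = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(4)]
g = map_conv_layer(kernels)
fmap = crossbar_conv(image, g)
print(len(g) * len(g[0]), len(fmap), len(fmap[0]), len(fmap[0][0]))  # 72 4 26 26
```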
Additional information on the configuration for analog computing, including pixel quantization and voltage encoding, is provided in Supplementary Fig. 23.

Additionally, given the nonlinear, monotonically decreasing response of our TDC-based sensing circuits, we use these circuits to implement in-built hardware activation functions within the CNN. Integrating the activation functions in hardware reduces data transfer during forward propagation in artificial neural networks and enhances the overall energy efficiency of the system⁷². Figure 5c illustrates the effectiveness of our activation function based on this nonlinear response. We compare the recognition accuracy obtained with the rectified linear unit (ReLU), the Sigmoid function and our TDC-based activation function, keeping the CNN structure described in Fig. 5a and all other layers identical during software simulation. Our in-built activation function achieves a higher accuracy (95.48%) than the Sigmoid function (Fig. 5c and 5d). Following the software simulation in Fig. 5c, we extract the four 3 × 3 kernels and transfer their weights into the CBA to evaluate the accuracy of hybrid computing, which combines hardware and software components. In hybrid computing with TDC-based activation, the analog convolution and the subsequent TDC-based activation are executed in hardware, while the remaining layers of the CNN are implemented in software. In hybrid computing with ReLU or Sigmoid activation, by contrast, only the analog convolution is performed in the hardware CBA, and the activation functions are defined by mathematical equations in software. Figures 5e and 5f show that introducing hardware analog convolution has a negligible impact on CNN recognition accuracy, which remains high at 95.02% compared with 95.48% in software simulation.
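One way to use a measured, monotonically decreasing TDC transfer curve as an activation function in software simulation is piecewise-linear interpolation over calibration points, as sketched below. The table `TDC_CURVE` is entirely hypothetical (a real table would come from calibrating the sensing circuits), as is the choice of linear interpolation with clamping outside the measured range; the text does not give the authors' exact formulation.

```python
import bisect

# Hypothetical calibration table: (MAC current in uA, normalized TDC output).
TDC_CURVE = [(0.0, 1.00), (1.0, 0.80), (2.0, 0.55), (4.0, 0.30), (8.0, 0.15), (16.0, 0.08)]
XS = [x for x, _ in TDC_CURVE]
YS = [y for _, y in TDC_CURVE]


def tdc_activation(current_ua):
    """Piecewise-linear lookup of the TDC response, clamped at the table ends."""
    if current_ua <= XS[0]:
        return YS[0]
    if current_ua >= XS[-1]:
        return YS[-1]
    i = bisect.bisect_right(XS, current_ua)  # XS[i-1] <= current_ua < XS[i]
    x0, x1, y0, y1 = XS[i - 1], XS[i], YS[i - 1], YS[i]
    return y0 + (y1 - y0) * (current_ua - x0) / (x1 - x0)


def activate_feature_map(fmap_ua):
    """Apply the activation element-wise to one 2D feature map (values in uA)."""
    return [[tdc_activation(v) for v in row] for row in fmap_ua]


print(activate_feature_map([[0.5, 3.0], [20.0, -1.0]]))
```

A lookup table keeps the simulated activation tied to measured behavior instead of an idealized closed-form curve such as ReLU or Sigmoid.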
Supplementary Video 1 provides a detailed process flow, starting from input images to the recognized output patterns, and includes additional examples.

In conclusion, we have presented a comprehensive implementation of CNNs using our hardware CIM system. Leveraging heterogeneous 2D–Si integration, the large-scale CBA achieved fast response and high energy efficiency, supported by extensive statistical analysis. By integrating specialized sensing circuits, we demonstrated low-power, time-domain parallel readout. The combination of the central CBA design and the peripheral circuit design enabled efficient MAC operations within the integrated system. Our full-hardware CNN implementation successfully classified binary input images, highlighting the potential of our CIM system for AI applications. Moreover, the incorporation of analog computing and in-built activation functions expanded the capabilities of the system, enabling effective handling of analog inputs and reducing data transfer during forward propagation. The hardware-based recognition process maintained high accuracy, demonstrating the robustness of our system. These results show the potential of our CIM system as a real-time accelerator that combines high-accuracy CNN inference with low energy consumption. This work advances hardware-based AI systems by demonstrating the effectiveness of 2D materials-based memristive CBAs and their fully integrated CIM hardware for efficient and accurate neural network computation.

Device fabrication

First, the bottom Si-based selector array was fabricated using the process flow described in Supplementary Fig. 3. Subsequently, the as-grown HfSe₂ thin film was transferred onto the Si-based selector array using the previously reported metal-assisted transfer method (Supplementary Figs. 7 and 8)⁵³. A Ti/Au (10/30 nm) layer was then deposited on the transferred film to serve as an etch stop layer. Mask aligner lithography was then used to define the 5 × 5 µm² memristor regions, and the excess transferred film was removed by reactive ion etching (RIE, Oxford) with Cl₂ and Ar gases. A 100-nm SiO₂ interlayer oxide was deposited over the sample by plasma-enhanced chemical vapor deposition (PECVD, Oxford), and vias approximately 2 µm in diameter were etched using RIE. Complete etching of the SiO₂ was confirmed by atomic force microscopy (AFM, Park Systems). Finally, the word lines (WLs) were patterned using mask aligner lithography, followed by deposition of a Ti layer (350 nm).

Material characterization

The HfSe₂ was characterized by Raman spectroscopy (Renishaw inVia), X-ray photoelectron spectroscopy (XPS, Quantera PHI II), transmission electron microscopy (TEM) and energy-dispersive X-ray spectroscopy (EDX, Talos F200X). All optical microscope images were taken using a VHX Digital Microscope. Before TEM characterization, thin lamellae were prepared using a focused ion beam (FEI, Helios NanoLab).

Electrical measurement

For individual device measurements, we used a B1500A semiconductor analyzer. Staircase pulses with closed-loop logic were applied under computer control, with the analyzer connected to the computer via a GPIB cable. The program and erase loop code was implemented in C. For the CBA measurements, all data, covering the program, read and compute modes, were acquired directly from the hardware CIM system. To ensure accurate readings, the TDC sensing circuits were calibrated by connecting standard resistors and comparing the measured output with the ideal current calculated using Ohm's law, thereby validating the TDC performance.
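The closed-loop staircase scheme can be sketched as a program-and-verify loop: read back, and if the target is not yet reached, apply a pulse and raise the next pulse amplitude by one step, until the target conductance is reached or a voltage limit is hit. The sketch is in Python for consistency with the other examples (the authors' loop was written in C and drove the B1500A over GPIB). `ToyMemristor`, its threshold and its update rule are hypothetical stand-ins for the instrument and device; no real instrument API is used.

```python
class ToyMemristor:
    """Hypothetical device model for illustration only: a SET pulse above
    V_SET_TH raises the conductance in proportion to the overdrive."""

    V_SET_TH = 0.8  # volts (illustrative)

    def __init__(self, g=1e-6, g_max=100e-6):
        self.g, self.g_max = g, g_max

    def set_pulse(self, amplitude):
        if amplitude > self.V_SET_TH:
            self.g = min(self.g_max, self.g + 20e-6 * (amplitude - self.V_SET_TH))

    def read(self):
        return self.g


def staircase_program(device, g_target, v_start=0.5, v_step=0.05, v_max=2.0):
    """Closed-loop SET: verify, pulse, and step up the amplitude until the
    read conductance reaches g_target. Returns (success, pulses_applied)."""
    n_pulses = 0
    v = v_start
    while device.read() < g_target and v <= v_max + 1e-9:
        device.set_pulse(v)
        n_pulses += 1
        v += v_step
    return device.read() >= g_target, n_pulses


dev = ToyMemristor()
ok, n = staircase_program(dev, g_target=50e-6)
print(ok, n, dev.read())
print(staircase_program(ToyMemristor(), g_target=200e-6)[0])  # False: above g_max
```

An erase loop would follow the same structure with RESET pulses of increasing magnitude and a decreasing conductance target.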

Ex-situ training of CNN

A software-based CNN comprising an input layer, a convolution layer and an output layer was implemented using TensorFlow 2.10.0 in Python 3.9.16. The input layer has dimensions (40, 3, 3, 1), denoting the number of input images, image height, image width and number of image channels, respectively. The convolution layer has dimensions (4, 3, 3, 1), denoting the number of kernels, kernel height, kernel width and number of kernel channels, respectively. The output layer has dimensions (40, 4, 1, 1), denoting the number of output vectors, number of kernels, output height and output width, respectively; the output is 1 × 1 because each 3 × 3 kernel covers the entire 3 × 3 input. During training, cross-entropy is used as the cost function, and optimization is performed with the Adam algorithm using a batch size of 10 for 100 epochs. After training, the weights are quantized and transferred to the hardware CBA for inference and analysis.
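The text does not specify the quantization rule, so the sketch below swaps in a common heuristic from ternary weight networks: weights whose magnitude exceeds Δ = 0.7 × mean(|W|) become ±1 and the rest become 0. The threshold factor, the helper names and the example weights are assumptions. `valid_output_size` is a small sanity check on the layer dimensions listed above and on the 28 × 28 to 26 × 26 mapping of Fig. 5.

```python
def ternary_quantize(kernels, factor=0.7):
    """Map trained float weights to {-1, 0, 1} using one layer-wide threshold
    delta = factor * mean(|w|) (a ternary-weight-network style heuristic)."""
    flat = [w for kernel in kernels for w in kernel]
    delta = factor * sum(abs(w) for w in flat) / len(flat)
    return [[1 if w > delta else -1 if w < -delta else 0 for w in kernel]
            for kernel in kernels]


def valid_output_size(in_size, k_size, stride=1):
    """Spatial output size of a convolution without padding."""
    return (in_size - k_size) // stride + 1


trained = [[1.0, -1.0, 0.1, 0.0, 0.5, -0.2, 0.05, 0.9, -0.6]]  # hypothetical weights
print(ternary_quantize(trained))                          # [[1, -1, 0, 0, 1, 0, 0, 1, -1]]
print(valid_output_size(3, 3), valid_output_size(28, 3))  # 1 26
```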

Data availability

The data that support the plots within this article and other findings of this study are available from the corresponding author upon reasonable request.

Acknowledgement

This work is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Competitive Research Program (NRF-CRP24-2020-0002).

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore

Samarth Jain, Sifan Li, Haofei Zheng, Lingqi Li, Xuanyao Fong & Kah-Wee Ang

Contributions

This project was supervised and directed by K.-W.A. and X.F. S.J., S.L. and K.-W.A. conceived this work. S.J. and S.L. contributed equally to this project. S.J., S.L. and K.-W.A. designed the experiments. S.J. and S.L. performed the device fabrication. H.Z. and L.L. contributed to the transfer of the 2D material. S.J. conducted the circuit and FPGA design and validation. S.J. and S.L. performed the electrical measurements. All authors contributed to the discussion and analysis of results. S.J., S.L., X.F. and K.-W.A. wrote the manuscript.

Corresponding authors

Correspondence to Xuanyao Fong and Kah-Wee Ang.

Competing interests

The authors declare no competing interests.

Additional information

Supplementary information is available for this paper at

Correspondence and requests for materials should be addressed to K.-W.A.

Chen W-H et al (2019) CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat Electron 2:420–428
Ielmini D, Wong H-SP (2018) In-memory computing with resistive switching devices. Nat Electron 1:333–343
Wong H-SP, Salahuddin S (2015) Memory leads the way to better computing. Nat Nanotechnol 10:191–194
Zidan MA, Strachan JP, Lu WD (2018) The future of electronics based on memristive systems. Nat Electron 1:22–29
Fuller EJ et al (2019) Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing. Science 364:570–574
Xu X et al (2021) 11 TOPS photonic convolutional accelerator for optical neural networks. Nature 589:44–51
Xu X et al (2018) Scaling for edge inference of deep neural networks. Nat Electron 1:216–222
Mennel L et al (2020) Ultrafast machine vision with 2D material neural network image sensors. Nature 579:62–66
Huo Q et al (2022) A computing-in-memory macro based on three-dimensional resistive random-access memory. Nat Electron 5:469–477
Cai F et al (2019) A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat Electron 2:290–299
Li C et al (2018) Analogue signal and image processing with large memristor crossbars. Nat Electron 1:52–59
Pi S et al (2019) Memristor crossbar arrays with 6-nm half-pitch and 2-nm critical dimension. Nat Nanotechnol 14:35–39
Li C et al (2017) Three-dimensional crossbar arrays of self-rectifying Si/SiO2/Si memristors. Nat Commun 8:15666
Lin P et al (2020) Three-dimensional memristor circuits as complex neural networks. Nat Electron 3:225–232
Choi BJ et al (2016) High-Speed and Low-Energy Nitride Memristors. Adv Funct Mater 26:5290–5296
Yoon JH et al (2017) Truly Electroforming-Free and Low-Energy Memristors with Preconditioned Conductive Tunneling Paths. Adv Funct Mater 27:1702010
Prakash A et al (2015) Demonstration of Low Power 3-bit Multilevel Cell Characteristics in a TaOx-Based RRAM by Stack Engineering. IEEE Electron Device Lett 36:32–34
Zhang Z-C et al (2021) An Ultrafast Nonvolatile Memory with Low Operation Voltage for High-Speed and Low-Power Applications. Adv Funct Mater 31:2102571
Chen P-Y, Yu S (2015) Compact Modeling of RRAM Devices and Its Applications in 1T1R and 1S1R Array Design. IEEE Trans Electron Devices 62:4022–4028
Hu M et al (2018) Memristor-Based Analog Computation and Neural Network Classification with a Dot Product Engine. Adv Mater 30:1705914
Yao P et al (2020) Fully hardware-implemented memristor convolutional neural network. Nature 577:641–646
Li Y et al (2021) Anomalous resistive switching in memristors based on two-dimensional palladium diselenide using heterophase grain boundaries. Nat Electron 4:348–356
Li Y et al (2022) In-memory computing using memristor arrays with ultrathin 2D PdSeOx/PdSe2 heterostructure. Adv Mater 34:2201488
Pam ME et al (2022) Interface Modulated Resistive Switching in Mo-irradiated ReS2 for Neuromorphic Computing. Adv Mater 34:2202722
Huh W et al (2018) Synaptic Barristor Based on Phase-Engineered 2D Heterostructures. Adv Mater 30:1801447
Wang S et al (2022) Two-dimensional devices and integration towards the silicon lines. Nat Mater 21:1225–1239
Chen S et al (2020) Wafer-scale integration of two-dimensional materials in high-density memristive crossbar arrays for artificial neural networks. Nat Electron 3:638–645
Shen Y et al (2021) Variability and Yield in h-BN-Based Memristive Circuits: The Role of Each Type of Defect. Adv Mater 33:2103656
Yu S (2018) Neuro-inspired computing with emerging nonvolatile memorys. Proc IEEE 106:260–285
Li H et al (2021) Memristive Crossbar Arrays for Storage and Computing Applications. Adv Intell Syst 3:2100017
Wang Y et al (2019) High on/off ratio black phosphorus based memristor with ultra-thin phosphorus oxide layer. Appl Phys Lett 115:193503
Tang B et al (2022) Wafer-scale solution-processed 2D material analog resistive memory array for memory-based computing. Nat Commun 13:3037
Afshari S et al (2023) Dot-product computation and logistic regression with 2D hexagonal-boron nitride (h-BN) memristor arrays. 2D Mater 10:035031
Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, Cham, pp 234–241. doi:10.1007/978-3-319-24574-4_28
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324
Yeh C-H, Zhang D, Cao W, Banerjee K (2020) 0.5T0.5R - Introducing an Ultra-Compact Memory Cell Enabled by Shared Graphene Edge-Contact and h-BN Insulator. In: 2020 IEEE International Electron Devices Meeting (IEDM), pp 12.3.1–12.3.4. doi:10.1109/IEDM13553.2020.9371902
Wang C-H et al (2018) 3D Monolithic Stacked 1T1R cells using Monolayer MoS2 FET and hBN RRAM Fabricated at Low (150°C) Temperature. In: 2018 IEEE International Electron Devices Meeting (IEDM), pp 22.5.1–22.5.4. doi:10.1109/IEDM.2018.8614495
Su CJ et al (2020) 3D Integration of Vertical-Stacking of MoS2 and Si CMOS Featuring Embedded 2T1R Configuration Demonstrated on Full Wafers. In: 2020 IEEE International Electron Devices Meeting (IEDM), pp 12.2.1–12.2.4. doi:10.1109/IEDM13553.2020.9371988
Tang B et al (2022) Wafer-scale solution-processed 2D material analog resistive memory array for memory-based computing. Nat Commun 13:3037
The International Roadmap for Devices and Systems (2022) The International Roadmap for Devices and Systems: 2022. https://irds.ieee.org/images/files/pdf/2022/2022IRDS_BC.pdf
Ning H et al (2023) An in-memory computing architecture based on a duplex two-dimensional material structure for in situ machine learning. Nat Nanotechnol 18:493–500
Roldan JB et al (2022) Spiking neural networks based on two-dimensional materials. npj 2D Mater Appl 6:63
Yuan J et al (2021) Reconfigurable MoS₂ Memtransistors for Continuous Learning in Spiking Neural Networks. Nano Lett 21:6432–6440
Albagami M et al (2020) Anomalous Conductivity Switch Observed in Treated Hafnium Diselenide Transistors. Adv Electron Mater 6:1901246
Pleshchev VG, Selezneva NV, Baranov NV (2012) Influence of copper intercalation on the resistive state of compounds in the Cu-HfSe2 system. Phys Solid State 54:716–721
Migliato Marega G et al (2023) A large-scale integrated vector–matrix multiplication processor based on monolayer molybdenum disulfide memories. Nat Electron 6:991–998
Teja Nibhanupudi SS et al (2024) Ultra-fast switching memristors based on two-dimensional materials. Nat Commun 15:2334
Zhu K et al (2023) Hybrid 2D–CMOS microchips for memristive applications. Nature 618:57–62
Sun L et al (2019) Self-selective van der Waals heterostructures for large scale memory array. Nat Commun 10:3161
Wan W et al (2022) A compute-in-memory chip based on resistive random-access memory. Nature 608:504–512
Hung J-M et al (2021) A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat Electron 4:921–930
He W et al (2020) 2-Bit-Per-Cell RRAM-Based In-Memory Computing for Area-/Energy-Efficient Deep Learning. IEEE Solid-State Circuits Lett 3:194–197
Li S et al (2022) Wafer-Scale 2D Hafnium Diselenide Based Memristor Crossbar Array for Energy-Efficient Neural Network Hardware. Adv Mater 34:2103376
Huyghebaert C et al (2018) 2D materials: roadmap to CMOS integration. In: 2018 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, pp 22.1.1–22.1.4. doi:10.1109/IEDM.2018.8614679
Shi Y et al (2018) Electronic synapses made of layered two-dimensional materials. Nat Electron 1:458–465
Lee H-S et al (2020) Dual-Gated MoS2 Memtransistor Crossbar Array. Adv Funct Mater 30:2003683
Feng X et al (2019) A Fully Printed Flexible MoS₂ Memristive Artificial Synapse with Femtojoule Switching Energy. Adv Electron Mater 1900740. doi:10.1002/aelm.201900740
Xie J, Afshari S, Sanchez Esqueda I (2022) Hexagonal boron nitride (h-BN) memristor arrays for analog-based machine learning hardware. npj 2D Mater Appl 6:50
Le Gallo M et al (2018) Mixed-Precision In-Memory Computing. Nat Electron 1:246–253
Kim GH et al (2017) Four-Bits-Per-Cell Operation in an HfO₂-Based Resistive Switching Device. Small 13:1701781
Ryu JJ et al (2019) Fully Erase-free Multi-Bit Operation in HfO₂-Based Resistive Switching Device. ACS Appl Mater Interfaces 11:8234–8241
Rao M et al (2023) Thousands of conductance levels in memristors integrated on CMOS. Nature 615:823–829
Jung S et al (2022) A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601:211–216
Oh S Energy-efficient Mott activation neuron for full-hardware implementation of neural networks. Nat Nanotechnol 9
Yeon H et al (2020) Alloying conducting channels for reliable neuromorphic computing. Nat Nanotechnol 15:574–579
Hsieh ER et al (2019) High-Density Multiple Bits-per-Cell 1T4R RRAM Array with Gradual SET/RESET and its Effectiveness for Deep Learning. In: 2019 IEEE International Electron Devices Meeting (IEDM), pp 35.6.1–35.6.4. doi:10.1109/IEDM19573.2019.8993514
Zhu X, Li D, Liang X, Lu WD (2019) Ionic modulation and ionic coupling effects in MoS2 devices for neuromorphic computing. Nat Mater 18:141–148
Tian H et al (2016) Anisotropic Black Phosphorus Synaptic Device for Neuromorphic Applications. Adv Mater 28:4991–4997
Hall M, Betz V (2020) HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs. In: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Association for Computing Machinery, New York, NY, USA, p 320. doi:10.1145/3373087.3375380
Prezioso M et al (2015) Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521:61–64
Jayachandran D et al (2020) A low-power biomimetic collision detector based on an in-memory molybdenum disulfide photodetector. Nat Electron 3:646–655
Peng X, Huang S, Jiang H, Lu A, Yu S (2020) DNN + NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training. Preprint at http://arxiv.org/abs/2003.06471
Pendurthi R et al (2024) Monolithic three-dimensional integration of complementary two-dimensional field-effect transistors. Nat Nanotechnol 19:970–977
Kang J-H et al (2023) Monolithic 3D integration of 2D materials-based electronics towards ultimate edge computing solutions. Nat Mater 22:1470–1477
Kwon J et al (2024) 200-mm-wafer-scale integration of polycrystalline molybdenum disulfide transistors. Nat Electron 7:356–364
Jayachandran D et al (2024) Three-dimensional integration of two-dimensional field-effect transistors. Nature 625:276–281
Zhu J et al (2023) Low-thermal-budget synthesis of monolayer molybdenum disulfide for silicon back-end-of-line integration on a 200 mm platform. Nat Nanotechnol 18:456–463
