The ongoing transformation toward AI-driven technologies has already produced new, more efficient, and innovative solutions, including ones that enhance everyday life. However, in many practical scenarios, deploying AI systems still requires immense computational power. To address this challenge, current research is exploring novel non-von Neumann analog computing architectures, with a particular focus on in-memory computing based on crosspoint arrays and on neuromorphic computing. These architectures draw inspiration from biological systems to perform tasks such as pattern recognition, sensory processing, and adaptive learning, all essential components of human cognition.
One of the key challenges in implementing analog computing arrays in hardware is the non-ideal behavior of devices, which often leads to significant deviations from their expected performance. To mitigate these accuracy issues, researchers have proposed and studied locally coupled, programmable memristive computing arrays. An important development in this area is the extension [1] of the Cellular Nonlinear Network (CellNN) paradigm to the memristive Cellular Nonlinear Network (M-CellNN). In this approach, memristors are directly integrated into the cell structure and used to form synaptic connections. This has enabled the creation of high-performance computing architectures that leverage analog dynamics while remaining resilient to the effects of non-ideal hardware behavior.
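To make the CellNN idea concrete, the sketch below simulates the standard CellNN state equation on a small 1-D array, with the template entries standing in for memristive synaptic conductances. All numerical values, the 1-D topology, and the template choices are illustrative assumptions for this sketch; they are not taken from [1].

```python
import numpy as np

def cellnn_step(x, u, A, B, z, dt=0.05):
    """One forward-Euler step of the standard CellNN state equation
    dx/dt = -x + A*y + B*u + z, with output y = 0.5*(|x+1| - |x-1|).
    A and B are local coupling templates; in an M-CellNN these weights
    would be realized by memristor conductances (illustrative here)."""
    y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))     # piecewise-linear output
    feedback = np.convolve(y, A, mode="same")      # coupling to neighbors
    feedforward = np.convolve(u, B, mode="same")   # input coupling
    return x + dt * (-x + feedback + feedforward + z)

# Illustrative 1-D network with 16 cells and simple templates.
rng = np.random.default_rng(0)
x = rng.uniform(-0.1, 0.1, 16)                 # initial cell states
u = (rng.uniform(size=16) > 0.5) * 2.0 - 1.0   # binary input pattern
A = np.array([0.0, 2.0, 0.0])                  # self-feedback template
B = np.array([-1.0, 2.0, -1.0])                # feedforward template
for _ in range(400):
    x = cellnn_step(x, u, A, B, z=0.0)
# After the transient, cells saturate into the nonlinear region of y.
```

With a self-feedback entry larger than 1, each cell's state is driven into saturation, which is the bistable behavior that CellNN-based processing exploits.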
This paper presents and discusses in detail recent advancements in crossbar-based M-CellNN systems.
References:
[1] R. Tetzlaff, A. Ascoli, I. Messaris and L. O. Chua, "Theoretical Foundations of Memristor Cellular Nonlinear Networks: Memcomputing with Bistable-Like Memristors," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 2, pp. 502-515, Feb. 2020, doi: 10.1109/TCSI.2019.2940909.
More than ever, modern electronic systems require semiconductor memories [1]. The rapidly increasing use of artificial intelligence in electronic systems is a major driver of this trend [2]. At the same time, new physical storage mechanisms such as ferroelectric polarization, magnetoresistance, phase change, and various resistive switching effects are receiving increasing attention, because traditional charge-based memory devices are facing serious scaling limits and are becoming harder to integrate into modern high-performance CMOS processes that use high-k metal gate technology as well as non-planar device geometries [1].
Compared to the other options mentioned, ferroelectric polarization has two important unique selling points. First, the switching is field-driven and, therefore, the energy required for writing is the lowest of all options that offer nonvolatility. Second, there are three fundamentally different options for the readout: direct sensing of the switched charge in a ferroelectric capacitor (DRAM-like sensing), coupling of the polarization to the gate of a field-effect transistor (Flash-like sensing), and modulation of the resistance of a tunnel junction (resistive-switching-like sensing). This offers high flexibility to tailor devices toward the application requirements while still using the same physical mechanism and material system.
Well-known ferroelectric materials like lead zirconate titanate or strontium bismuth tantalate are difficult to integrate into state-of-the-art electronic fabrication processes. About 15 years ago, ferroelectricity in hafnium oxide was discovered, and its first publication 12 years ago [3] opened the path toward exploring ferroelectricity in materials having the fluorite structure. With this innovation, CMOS-compatible ferroelectric materials suddenly became available. Moreover, the recent discovery of ferroelectricity in AlScN [4] added wurtzite-structure ferroelectrics as a further valuable option.
The talk will first explain the different approaches to realizing high-quality ferroelectric hafnium-oxide-based materials, as well as a few aspects of AlScN. Based on this, the status of integrating such materials into devices utilizing the three different readout mechanisms described above will be introduced and discussed. Finally, applications beyond pure memory operation will be illustrated. An outlook toward future developments will conclude the talk.
References:
[1] T. Schenk, M. Pešić, S. Slesazeck, U. Schroeder and T. Mikolajick, Memory technology—a primer for material scientists, Rep. Prog. Phys. 83, 086501 (2020)
[2] T. Mikolajick, M. H. Park, L. Begon‐Lours, and S. Slesazeck, From Ferroelectric Material Optimization to Neuromorphic Devices, Adv. Mater. 35, 2206042 (2023)
[3] T.S. Böscke, J. Müller, D. Bräuhaus, U. Schröder, and U. Böttger, Ferroelectricity in hafnium oxide thin films, Appl. Phys. Lett. 99, 102903 (2011)
[4] S. Fichtner, N. Wolff, F. Lofink, L. Kienle and B. Wagner, AlScN: A III-V semiconductor based ferroelectric, Journal of Applied Physics 125, 114103 (2019)
[5] U. Schroeder, M. H. Park, T. Mikolajick, and C. S. Hwang, The fundamentals and applications of ferroelectric HfO2, Nature Reviews Materials 7, 653–669 (2022)
Modern computing workloads, exemplified by AI model inference, are increasingly limited by the latency and energy costs of memory access. Emerging non-volatile memory (NVM) devices such as resistive random-access memory (RRAM) and magnetic random-access memory (MRAM) have shown potential to enable efficient computing architectures, as data can be mapped to the conductance values of NVM devices and computation can be achieved directly in-memory by applying input activations as voltage pulses. In a typical compute-in-memory (CIM) configuration, vector-matrix multiplications (VMMs) can be performed in the analog domain, in place and in parallel, thus achieving high energy efficiency during operation. In this presentation, I will discuss representative CIM array implementations. Challenges such as quantization effects, finite array size, and device non-idealities will be analyzed, and techniques such as fine-grained structured pruning and tensor-train factoring will be explored to address memory capacity concerns. At the software level, efficient execution of AI workloads requires co-designed compilers to map the network graph onto the tiled weight-stationary architecture and balance latency among layers. I will discuss examples of co-designed production-scale dataflow compilers that minimize data movement between tiles and achieve high hardware utilization for diverse workloads.
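The analog VMM principle can be sketched numerically: weights are mapped to device conductances (a differential pair per weight so that negative values can be represented), inputs are applied as voltages, and each column current sums the per-device contributions via Ohm's and Kirchhoff's laws. The 4-bit quantization and 2% programming noise below are illustrative stand-ins for the device non-idealities mentioned above, not measured values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Ideal weight matrix of a small layer (values are illustrative).
W = rng.normal(size=(8, 4))
w_max = np.abs(W).max()

# Map weights to conductances of a differential pair (G+ - G-)
# so that negative weights can be represented.
g_max = 1.0                                   # arbitrary conductance unit
G_pos = np.clip(W, 0, None) * g_max / w_max
G_neg = np.clip(-W, 0, None) * g_max / w_max

# Illustrative non-idealities: 4-bit conductance quantization and
# multiplicative programming noise.
levels = 2**4 - 1
quantize = lambda G: np.round(G / g_max * levels) / levels * g_max
G_pos = quantize(G_pos) * (1 + 0.02 * rng.normal(size=G_pos.shape))
G_neg = quantize(G_neg) * (1 + 0.02 * rng.normal(size=G_neg.shape))

# Apply input activations as voltages; column currents implement the VMM
# (Ohm's law per device, Kirchhoff's current law per column).
v = rng.uniform(-1, 1, size=8)
i_out = v @ (G_pos - G_neg)                   # analog result, current units
y_analog = i_out * w_max / g_max              # rescale to weight units
y_ideal = v @ W
err = np.max(np.abs(y_analog - y_ideal))      # deviation from the ideal VMM
```

The deviation `err` illustrates how quantization and noise propagate into the accumulated column outputs, which is the effect the pruning and factoring techniques above have to cope with.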
Beyond VMM, the internal dynamics of memory devices can be used to natively process temporal information embedded in the inputs. Examples of temporal processing in the form of reservoir computing systems will be discussed. These systems can be directly integrated with neuromorphic sensors such as event-based cameras, or potentially form efficient bio-electronic interfaces.
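The reservoir-computing idea can be sketched in a few lines: the fading memory of a bank of dynamical nodes turns a temporal input into a rich state vector, and only a linear readout is trained. Here a leaky-integrator node is an illustrative stand-in for the internal dynamics of a memory device; the delay-reconstruction task and all parameters are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def reservoir_states(u, n_nodes=20):
    """Drive a bank of leaky nonlinear nodes (stand-in for device
    dynamics) with the input sequence u; return per-step states."""
    w_in = rng.uniform(-1, 1, n_nodes)        # random input coupling
    leak = rng.uniform(0.1, 0.95, n_nodes)    # per-node memory constants
    x = np.zeros(n_nodes)
    states = []
    for u_t in u:
        x = leak * x + np.tanh(w_in * u_t)    # leaky nonlinear update
        states.append(x.copy())
    return np.array(states)

# Toy temporal task: reconstruct the input delayed by 2 steps.
u = rng.uniform(-1, 1, 500)
X = reservoir_states(u)
target = np.roll(u, 2)

# Only the linear readout is trained (ridge regression on states).
X_tr, y_tr = X[10:400], target[10:400]
w_out = np.linalg.solve(X_tr.T @ X_tr + 1e-3 * np.eye(X.shape[1]),
                        X_tr.T @ y_tr)
pred = X[400:] @ w_out
err = np.mean((pred - target[400:]) ** 2)     # held-out reconstruction error
```

Because the untrained dynamics do the temporal feature extraction, only the lightweight readout needs programming, which is what makes such systems attractive for direct coupling to event-based sensors.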
Memory data transfer is becoming the main limiting factor in the execution of AI workloads at the edge. To overcome the speed and power limitations imposed by the classical von Neumann architecture, new computational paradigms are being proposed as possible alternatives. In-Memory-Computing is one of the emerging approaches widely praised in the scientific community thanks to the absence of data transfer between the memory and the computational units, with computation executed directly in the memory array. In this work we assess the performance and reliability of In-Memory-Computing exploiting embedded PCM (ePCM) for AI applications at the edge. We consider two different approaches: first, digital In-Memory-Computing executed in SRAM, with the ePCM serving as low-latency storage for neural network weights; and second, In-Memory-Computing executed directly in the ePCM in the analog domain. The two approaches offer complementary benefits, so a final solution merging the two techniques is envisioned as a possible path for future NPUs at the edge.
We thank our sponsors for making CCMCC possible.