Sejong University

Research

Research News


Neuromorphic-Based On-Device AI

2025.12.24


1. Introduction

On-device AI refers to technology that performs artificial intelligence computations directly on edge devices without relying on cloud servers. Conventional cloud-based AI inevitably introduces delays during data transmission and reception, so in application fields that require millisecond (ms)-scale real-time processing, such as autonomous driving and Advanced Driver Assistance Systems (ADAS), communication latency makes safety difficult to guarantee. For this reason, on-device AI is essential in fields such as vehicles, smartphones, and industrial robots that demand real-time computation and immediate response.


However, since most edge devices are battery-powered, available power is limited. Unlike cloud servers, which enjoy a virtually unconstrained power supply, edge devices suffer from increased battery consumption and heat generation as the computational load grows, which in turn causes system instability and shortens device lifespan. AI hardware that maintains high performance while operating at low power is therefore essential.


Neuromorphic computing has attracted attention as an effective approach to realizing such low-power operation. Neuromorphic chips operate based on Spiking Neural Networks (SNNs) and have an event-driven structure in which computations are performed only when neurons generate spikes. This eliminates unnecessary computations and can significantly reduce power consumption.


Therefore, this research aims to implement low-power on-device AI through the design of a neuromorphic processor that utilizes this SNN structure.


2. Overview of Neuromorphic Computing

2.1 Concept and Background of Neuromorphic Computing

Neuromorphic is a compound term formed from Neuro and Morphic, referring to a computing approach that mimics the structure and operating principles of the human brain. The human brain consists of approximately 86 billion neurons and over 100 trillion synapses. It efficiently processes information in parallel through this vast neural network. Neuromorphic computing represents a new paradigm that seeks to overcome the energy efficiency limitations of conventional computer systems by implementing the structural and functional characteristics of the brain in hardware.


2.2 Information Processing Principles of the Brain: Neurons and Synapses

A neuron is the basic unit of the brain and nervous system. It is a cell that transmits information through electrical and chemical signals. Neurons continuously receive external stimuli, and accordingly, the membrane potential changes. When this membrane potential exceeds a threshold, it generates an electrical signal called a spike, transmitting information to other neurons.

Meanwhile, a synapse is the junction between neurons, playing the role of regulating the transmission strength (weight) of signals. This regulation of synaptic connection strength serves as a fundamental mechanism for learning and memory.


Figure 1. Neurons and Synapses


2.3 Operating Principles of Neuromorphic Chips

A neuromorphic chip is a hardware implementation of the brain's signal-transmission and learning principles. Neuromorphic chips emulate the brain's mode of operation in an energy-efficient way through Spiking Neural Networks (SNNs). Like biological neurons, SNN neurons accumulate input signals to update their membrane potential and fire a spike when a threshold is exceeded. The generated spike is transmitted to downstream neurons, changing their membrane potentials, and the magnitude of this influence depends on the synaptic strength. By repeating this process over multiple time steps, SNNs process information in the time domain.
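The spiking dynamics described above can be sketched in a few lines. This is a minimal behavioral model, not the processor's implementation; the leak factor, threshold, and reset value are illustrative assumptions.

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) neuron as described
# above. The leak factor, threshold, and reset value are illustrative
# assumptions, not the actual hardware parameters.

def lif_step(v, weighted_input, leak=0.9, threshold=1.0, v_reset=0.0):
    """One time step: apply the leak, integrate the weighted input,
    and fire a spike when the membrane potential crosses the threshold."""
    v = v * leak + weighted_input   # accumulate input into membrane potential
    if v >= threshold:              # threshold crossing -> emit a spike
        return v_reset, 1           # reset the potential after firing
    return v, 0

# Drive the neuron over several time steps with a constant weighted input.
v, spikes = 0.0, []
for t in range(10):
    v, s = lif_step(v, 0.3)
    spikes.append(s)
print(spikes)
```

Repeating `lif_step` over time steps is what lets the network encode information in the timing of spikes rather than in dense activations.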


Figure 2. Spiking Neural Network


2.4 Structural Characteristics and Advantages of Neuromorphic Systems

Neuromorphic systems operate in an event-driven manner, maximizing energy efficiency. In other words, signals are transmitted and computations occur only when neurons fire, so there is no need to update every neuron simultaneously as conventional AI chips do. As a result, unnecessary computations are eliminated and power consumption drops significantly.


Figure 3. Von Neumann Architecture


Figure 4. Neuromorphic Architecture


Conventional Von Neumann-based hardware has separate processing units (CPU/GPU) and memory, inevitably causing data bottlenecks due to data movement. In contrast, in neuromorphic architectures, neurons and synapses are physically coupled and simultaneously perform computation, storage, and learning through extensive information exchange. Therefore, all processing occurs locally without data movement, fundamentally overcoming the limitations of the Von Neumann architecture.


3. NoC-Based On-Chip Learning Neuromorphic Processor 

Conventional neuromorphic processors utilize NoC (Network on Chip), a technology that connects multiple processing cores within a chip via high-speed networks to maximize data transmission efficiency and parallel processing, thereby enhancing the scalability of neuron cores. However, commonly used 2D-Mesh-based NoC topologies suffer from increasing latency as the number of cores grows, since spike data must traverse multiple hops [1]. Our research team addressed this issue by implementing a star routing topology-based NoC. The star routing topology connects multiple cores to a single router through high-speed interfaces. When a spike packet is input to the router, it is delivered to the connected cores in a multicasting manner. This approach enables flexible communication reconfiguration and achieves low latency with a fixed 4-hop delay.
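The star-routing behavior described above can be sketched as follows: a single central router multicasts each incoming spike packet to every attached core, so delivery cost does not grow with hop count as in a 2D mesh. The class and method names are illustrative, not the processor's actual interfaces.

```python
# Behavioral sketch of the star routing topology described above: every
# spike packet passes through one central router, which multicasts it to
# all attached neuron cores. Names and packet fields are illustrative.

class NeuronCore:
    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = []          # packets delivered to this core

    def receive(self, packet):
        self.inbox.append(packet)

class StarRouter:
    def __init__(self):
        self.cores = []

    def attach(self, core):
        self.cores.append(core)

    def route(self, packet):
        # Multicast: one packet is delivered to every attached core, so
        # the delivery path length is fixed regardless of the core count.
        for core in self.cores:
            core.receive(packet)

router = StarRouter()
cores = [NeuronCore(i) for i in range(4)]
for c in cores:
    router.attach(c)

router.route({"src_neuron": 7, "time_step": 3})
print([len(c.inbox) for c in cores])
```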


Figure 5. Overall Architecture of Neuromorphic Processor, Consisting of Router, Neuron Core, and Network Interface


3.1 Router

The router receives output packets, detects spike events, stores them in the output buffer, and then multicasts the data to all neuron cores. The arbiter within the router monitors the FIFO status and, when the buffer is full, sends a stall signal to the external interface through a handshake protocol to prevent overflow.
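The backpressure mechanism above can be modeled as a bounded FIFO whose full condition asserts a stall. This is a behavioral sketch only; the buffer depth and names are assumptions, not the router's actual parameters.

```python
# Sketch of the stall/handshake behavior described above: when the
# router's output FIFO is full, stall is asserted so the sender holds
# its packet instead of overflowing the buffer. Depth is illustrative.

from collections import deque

class OutputBuffer:
    def __init__(self, depth=4):
        self.fifo = deque()
        self.depth = depth

    def stall(self):
        # Arbiter view: assert stall while the FIFO is full.
        return len(self.fifo) >= self.depth

    def push(self, packet):
        # Handshake: a write is accepted only when stall is deasserted.
        if self.stall():
            return False
        self.fifo.append(packet)
        return True

buf = OutputBuffer(depth=2)
results = [buf.push(p) for p in range(3)]  # third write is refused
print(results)
```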


3.2 Network Interface

The network interface acts as a bridge between the router and the neuron core clusters, ensuring the integrity of data transmitted via the handshake interface. A packet consists of 39 bits and carries time-step information, so the neuron core can use the time step directly without maintaining a separate counter.
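Embedding the time step in the packet amounts to simple bit-field packing. The text specifies only the 39-bit total width; the field widths below (8-bit time step, 10-bit neuron index) are assumptions chosen purely to illustrate the idea.

```python
# Sketch of packing spike information into a fixed-width packet. The
# 39-bit width comes from the text; the field split (8-bit time step,
# 10-bit neuron index) is an illustrative assumption.

TS_BITS, IDX_BITS = 8, 10

def pack(time_step, neuron_index):
    assert time_step < (1 << TS_BITS) and neuron_index < (1 << IDX_BITS)
    # Place the time step above the neuron index in the bit layout.
    return (time_step << IDX_BITS) | neuron_index

def unpack(packet):
    # Recover both fields with a shift and a mask.
    return packet >> IDX_BITS, packet & ((1 << IDX_BITS) - 1)

pkt = pack(time_step=5, neuron_index=300)
print(unpack(pkt))
```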


Figure 6. Internal Block Diagram of Network Interface and Spike Packet Structure


3.3 Neuron Core

The neuron core consists of a single neuron cell that performs 512 time-multiplexed Leaky Integrate-and-Fire (LIF) operations. It retrieves synaptic weights from the Weight SRAM and membrane potentials from the Neuron SRAM, updates each membrane potential with an accumulator, and applies the leak value. The updated membrane potential is compared with a threshold to determine whether the neuron spikes; the result is encoded together with the neuron index and transmitted to the lateral inhibition block, which helps the network capture input patterns and improves inference performance.
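Time multiplexing means one arithmetic unit serves all 512 neurons in turn, reading and writing per-neuron state from memory each cycle. The sketch below models the SRAMs as plain arrays; the leak and threshold values are illustrative assumptions, not the chip's parameters.

```python
# Sketch of the time-multiplexed neuron core described above: a single
# compute path iterates over all 512 neurons, with arrays standing in
# for the Weight SRAM and Neuron SRAM. Constants are illustrative.

N_NEURONS = 512
LEAK, THRESHOLD = 0.05, 1.0

weight_sram = [0.2] * N_NEURONS   # synaptic weight per neuron
neuron_sram = [0.9] * N_NEURONS   # stored membrane potentials

def core_step(input_spike):
    spikes = []
    for idx in range(N_NEURONS):      # one neuron processed per cycle
        v = neuron_sram[idx]          # read membrane potential
        if input_spike:
            v += weight_sram[idx]     # accumulate the weighted input
        v -= LEAK                     # apply the leak value
        if v >= THRESHOLD:            # compare with the threshold
            spikes.append(idx)        # encode the spiking neuron's index
            v = 0.0                   # reset after firing
        neuron_sram[idx] = v          # write back the updated potential
    return spikes

fired = core_step(input_spike=True)
print(len(fired))
```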

 

Figure 7. Internal Blocks and Data Flow of Neuron Core


3.4 STDP Learning Block

Input spike information decoded from the Network Interface is stored in a shift register, retaining up to five pre-spike traces relative to the current time step. When the winner neuron spikes, the block records the spike time and uses a lookup table (LUT) indexed by the time difference to determine and apply the weight change, avoiding complex exponential operations.
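The LUT-based update can be sketched as follows. The five-entry trace depth follows the text; the LUT values, which stand in for a precomputed exponentially decaying STDP curve, are illustrative assumptions.

```python
# Sketch of the LUT-based STDP update described above: pre-spike history
# is kept in a 5-entry shift register, and the age of each pre-spike
# indexes a small lookup table instead of evaluating an exponential.
# The LUT values are illustrative assumptions.

TRACE_DEPTH = 5
# Precomputed weight increments for time differences 0..4 (newest first),
# approximating an exponentially decaying STDP curve.
STDP_LUT = [0.50, 0.30, 0.18, 0.11, 0.07]

pre_trace = [0] * TRACE_DEPTH   # shift register of recent pre-spikes

def record_pre_spike(spiked):
    # Shift in the newest pre-spike bit; the oldest falls off the end.
    pre_trace.insert(0, 1 if spiked else 0)
    pre_trace.pop()

def on_winner_spike(weight):
    # For every pre-spike still in the trace, add the LUT entry
    # corresponding to its age (the pre/post time difference).
    for age, bit in enumerate(pre_trace):
        if bit:
            weight += STDP_LUT[age]
    return weight

record_pre_spike(True)    # pre-spike arrives at time step t
record_pre_spike(False)   # no pre-spike at t+1
w = on_winner_spike(0.0)  # winner neuron fires at t+1
print(w)
```

Replacing the exponential with a table lookup trades a few stored constants for the multipliers and exponent hardware the exact curve would require.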

 

Figure 8. Learning Module Architecture


4. Effective Power Consumption Reduction through Unified Refractory Time

Conventional neuron algorithms impose a refractory period on each neuron after it fires, during which that neuron receives no input and performs no computations. Under this scheme, neurons that fail to capture input features never fire and thus never rest, so meaningless LIF operations are repeated throughout the entire computation period. In this study, a unified refractory time is applied instead: whenever a winner neuron fires under WTA (Winner Takes All), the refractory period is imposed on all neurons collectively. The proposed method achieved a 30% reduction in computational load compared to conventional methods.
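The operation-count saving can be illustrated with a toy schedule. This sketch only contrasts the two policies; the network size, spike times, and refractory window are made-up values, so the exact percentage differs from the 30% measured on the real workload.

```python
# Toy comparison of the conventional per-neuron refractory period and the
# unified refractory time described above: when any winner fires, ALL
# neurons skip LIF updates for the window. All constants are illustrative.

N, T, REFRACTORY = 8, 20, 3
winner_fires_at = {4, 10, 16}      # hypothetical winner spike times

def count_ops(unified):
    ops = 0
    refract_until = [0] * N        # per-neuron refractory deadline
    for t in range(T):
        for n in range(N):
            if t < refract_until[n]:
                continue           # refractory: LIF update skipped
            ops += 1               # otherwise one LIF operation runs
        if t in winner_fires_at:
            if unified:
                # Unified: every neuron enters the refractory window.
                refract_until = [t + 1 + REFRACTORY] * N
            else:
                # Conventional: only the winner (neuron 0 here) rests.
                refract_until[0] = t + 1 + REFRACTORY
    return ops

print(count_ops(unified=False), count_ops(unified=True))
```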


 

Figure 9. Comparison of Conventional Refractory Time and Proposed Unified Refractory Time


5. Verification and Results

To validate the performance-enhancement techniques introduced above, the research team conducted synthesis and functional verification on FPGA and ASIC platforms. The FPGA implementation demonstrated the resource efficiency of the time-multiplexed structure and achieved 84.7% accuracy in on-chip learning-based inference. Compared with a software model of the same structure written in C, it also computed more than 12 times faster. The single-core neuromorphic processor fabricated as an ASIC demonstrated an energy efficiency of 22.95 pJ/SOP, meeting the target performance.

 

Figure 10. ASIC Chip Layout and Specifications (Samsung 28nm FD-SOI MPW)


Based on these results, the research team presented at the IEEE International Conference on Consumer Electronics (ICCE) 2025, receiving the Best Session Award as shown in Figure 11, and participated in the 26th Korea Semiconductor Design Contest, winning a Corporate Special Award. These achievements demonstrate that high accuracy and energy efficiency can be achieved in on-device environments alone.


  

Figure 11. Sejong University IQLAB Neuromorphic Research Team Achievement 1 (ICCE Best Session Award)


Figure 12. Sejong University IQLAB Neuromorphic Research Team Achievement 2 (Semiconductor Design Contest Corporate Special Award)


6. Conclusion

Recently, artificial intelligence has been rapidly transitioning from cloud server dependence to on-device AI, which performs computations directly on edge devices. On-device AI enables real-time responses without communication delays but faces the technical challenge of maintaining high performance under limited power environments. Accordingly, the research team aimed to achieve both low-power operation and high-efficiency learning through an SNN-based neuromorphic processor.


To this end, NoC was implemented as the on-chip communication structure, and a star routing topology was newly designed to significantly improve data transmission efficiency and scalability, achieving a fixed 4-hop delay. Additionally, by introducing a unified refractory time technique to eliminate unnecessary computations, the proposed method achieved approximately 30% reduction in computational load compared to conventional methods. Through FPGA- and ASIC-based verification, high energy efficiency and inference accuracy were demonstrated, and these achievements reflected technical excellence, earning the IEEE ICCE 2025 Best Session Award and the Korea Semiconductor Design Contest Corporate Special Award.


This research demonstrates that neuromorphic-based on-device AI has the potential for expansion into various real-time application fields such as autonomous driving, robotics, and mobile AI, serving as a core hardware platform for implementing low-power, high-efficiency AI.



References

[1] J. Xue, L. Xie, F. Chen, L. Wu, Q. Tian, Y. Zhou, R. Ying, and P. Liu, "EdgeMap: An Optimized Mapping Toolchain for Spiking Neural Network in Edge Computing," Sensors, vol. 23, p. 6548, 2023.

[2] S. Cha, S. Na, and D. Kim, "A Fully Digital Neuromorphic AI Processor for Industrial and Consumer Applications," 2025 IEEE International Conference on Consumer Electronics (ICCE), IEEE, 2025.


