Adaptive Grasping in Robotic Manipulation through Learning-Driven Multi-Modal Sensor Fusion

Jae-hyuk Moon

Abstract

Robust grasping in unstructured environments remains a fundamental challenge for robotic manipulation systems due to uncertainty in object geometry, occlusion, and sensor noise. This paper proposes a deep learning-based adaptive grasping framework that integrates visual and tactile information through a multi-modal fusion architecture. Specifically, a dual-stream convolutional neural network (CNN) extracts features from RGB-D images and force-feedback signals, and a cross-attention module then aligns and fuses the two feature streams. The model is trained on a dataset of 45,000 grasping trials collected with a 6-DOF robotic arm across 120 object categories. Experimental results show that the proposed method achieves a grasp success rate of 91.3%, outperforming vision-only (84.7%) and tactile-only (79.2%) baselines. In cluttered environments with partial occlusion, the success rate remains at 87.5%, indicating strong robustness. Real-time deployment shows an average inference latency of 32 ms per frame, making the system practical for industrial settings. Ablation studies confirm that the cross-attention fusion module contributes a +5.8% improvement in grasp accuracy. These results highlight the effectiveness of multi-modal deep learning for enhancing robotic manipulation in complex scenarios.
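
No implementation accompanies this abstract, so the following is a minimal sketch of the described pipeline, assuming PyTorch. The module names (VisualStream, TactileStream, CrossAttentionFusion, GraspNet), the layer sizes, the 6-channel force input, and the 256-dimensional token width are illustrative assumptions, not details taken from the paper.

# Minimal sketch (not the authors' code): dual-stream encoders with
# cross-attention fusion, assuming PyTorch. All shapes are illustrative.
import torch
import torch.nn as nn

class VisualStream(nn.Module):
    """CNN encoder for 4-channel RGB-D input -> sequence of spatial tokens."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, rgbd):                 # (B, 4, H, W)
        f = self.conv(rgbd)                  # (B, dim, H/8, W/8)
        return f.flatten(2).transpose(1, 2)  # (B, H*W/64, dim)

class TactileStream(nn.Module):
    """1-D CNN encoder for a force/tactile time series -> token sequence."""
    def __init__(self, channels=6, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(128, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, force):                # (B, channels, T)
        f = self.conv(force)                 # (B, dim, T/4)
        return f.transpose(1, 2)             # (B, T/4, dim)

class CrossAttentionFusion(nn.Module):
    """Visual tokens attend to tactile tokens (queries = vision, keys/values = touch)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis, tac):
        fused, _ = self.attn(query=vis, key=tac, value=tac)
        return self.norm(vis + fused)        # residual connection + layer norm

class GraspNet(nn.Module):
    """End-to-end grasp scorer: fused multi-modal features -> success logit."""
    def __init__(self, dim=256):
        super().__init__()
        self.vision = VisualStream(dim)
        self.touch = TactileStream(dim=dim)
        self.fusion = CrossAttentionFusion(dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, rgbd, force):
        fused = self.fusion(self.vision(rgbd), self.touch(force))
        return self.head(fused.mean(dim=1))  # pool tokens, score the grasp

# Smoke test with dummy inputs.
model = GraspNet()
logit = model(torch.randn(2, 4, 128, 128), torch.randn(2, 6, 64))
print(logit.shape)  # torch.Size([2, 1])

Using the visual tokens as queries and the tactile tokens as keys and values is one plausible reading of "feature alignment and fusion"; the reverse direction, or bidirectional attention, would be equally consistent with the abstract.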

Article Details

How to Cite
Moon, J.-hyuk. (2026). Adaptive Grasping in Robotic Manipulation through Learning-Driven Multi-Modal Sensor Fusion. Journal of Computer Science and Software Applications, 6(4). Retrieved from https://www.mfacademia.org/index.php/jcssa/article/view/269