Modern agricultural practices increasingly rely on automation to address labor shortages and enhance operational efficiency. However, accurately detecting fruits in real-world environments remains challenging due to variable lighting, occlusions, and complex backgrounds.
This paper introduces Hidden Fruit, a portable, multimodal detection framework. First, we present a large-scale dataset containing RGB, depth, NIR, and thermal imagery along with pose information. Then, we detail our compact, Jetson-powered handheld platform designed for data collection and in-field fruit detection.
Our approach synchronizes multiple sensing modalities and applies offline calibration to align heterogeneous data sources. Experimental results show improved detection performance under difficult conditions such as occlusion, clutter, and night settings.
Hidden Fruit contributes an open dataset and methodology to support future research in agricultural automation, multimodal fusion, and object detection.
Figure: image comparisons of Digital ↔ Thermal and Digital (180°) ↔ Thermal (180°).
Thermal cameras are widely used for their robustness in harsh conditions and their ability to capture temperature-related information, which has motivated a number of thermal datasets. MultiSpectralMotion provides indoor and outdoor thermal images with ground-truth depth captured from handheld devices.
ViViD++ offers outdoor imagery from vehicle-mounted and handheld platforms. SubT-MRS covers a variety of platforms under degraded conditions but lacks wildland forest imagery.
Stereo datasets such as STheReO, MS2, and FIReStereo focus mainly on urban driving: MS2 includes rain scenes but retains typical driving constraints, while FIReStereo supports depth estimation for small UAS in visually degraded environments.
Overview of the system and the computer vision pipeline.
Hidden Fruit is a portable data-capture rig that synchronously collects RGB, NIR, LWIR, and depth imagery along with 6-DoF pose. Built around a Jetson Orin Nano, the system provides synchronized capture, sensor calibration with checkerboard targets visible in the thermal and NIR bands, and structured data storage.
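To illustrate the checkerboard-based calibration step, the sketch below runs a standard OpenCV intrinsic-calibration loop. It is not the Hidden Fruit calibration code: the image path, board geometry, and square size are placeholder assumptions, and for the LWIR camera the board must produce visible contrast in that band (e.g., a heated target).

```python
import glob
import cv2
import numpy as np

# Minimal intrinsic-calibration sketch using OpenCV's checkerboard detector.
# Path, board size, and square size are placeholders, not Hidden Fruit values.
BOARD_COLS, BOARD_ROWS = 9, 6          # inner corners of the checkerboard
SQUARE_SIZE_M = 0.025                  # square edge length in meters

# 3D coordinates of the board corners in the board frame (z = 0 plane).
objp = np.zeros((BOARD_COLS * BOARD_ROWS, 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD_COLS, 0:BOARD_ROWS].T.reshape(-1, 2) * SQUARE_SIZE_M

obj_points, img_points = [], []
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)

for path in sorted(glob.glob("calib/thermal/*.png")):   # placeholder path
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue
    found, corners = cv2.findChessboardCorners(gray, (BOARD_COLS, BOARD_ROWS), None)
    if not found:
        continue
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    obj_points.append(objp)
    img_points.append(corners)

# Intrinsic matrix K and distortion coefficients for this modality.
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None
)
print(f"RMS reprojection error: {rms:.3f} px")
```

Running the same loop per modality would yield per-camera intrinsics; extrinsics between camera pairs could then be recovered with cv2.stereoCalibrate on views where the board is detected in both images.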
Platform evolution and field integration
Our platform evolved from an acrylic-mounted prototype using FLIR and RealSense cameras to a tightly integrated handheld system. Benchmarks across datasets show our method performs well under challenging lighting and occlusion scenarios, particularly in night-time and foliage-heavy settings.
We present the HiddenHeatedFruit dataset, which contains synchronized RGB, thermal, NIR, and depth imagery along with pose metadata. Offline calibration and timestamp alignment ensure accurate multimodal fusion. Annotations are semi-automated using custom tools to support 3D fruit localization.
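As a rough sketch of offline timestamp alignment, the snippet below matches each frame of a slower stream to the nearest frame of a reference stream within a tolerance. The frame rates, offset, and tolerance are illustrative assumptions, not values from the released dataset.

```python
import numpy as np

def match_nearest(ref_ts: np.ndarray, query_ts: np.ndarray, tol_s: float = 0.02):
    """For each query timestamp, return the index of the closest reference
    timestamp, or -1 if no reference frame lies within `tol_s` seconds."""
    idx = np.searchsorted(ref_ts, query_ts)
    idx = np.clip(idx, 1, len(ref_ts) - 1)
    left, right = ref_ts[idx - 1], ref_ts[idx]
    nearest = np.where(np.abs(query_ts - left) <= np.abs(query_ts - right),
                       idx - 1, idx)
    ok = np.abs(ref_ts[nearest] - query_ts) <= tol_s
    return np.where(ok, nearest, -1)

# Example (illustrative rates): align a ~8.7 Hz thermal stream to a 30 Hz RGB stream.
rgb_ts = np.arange(0.0, 2.0, 1 / 30)
thermal_ts = np.arange(0.0, 2.0, 1 / 8.7) + 0.004   # small fixed host-clock offset
pairs = match_nearest(rgb_ts, thermal_ts)
print(pairs)                                        # RGB index per thermal frame
```

Frames that fail the tolerance test are dropped rather than interpolated, which keeps only frame sets where all modalities were captured close together in time.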
Comparison of RGB, NIR, and Thermal Fusion
The Hidden Fruit system has applications beyond agriculture, including search-and-rescue, surveillance, and industrial automation. Our fusion techniques enhance object localization and robustness, especially in cluttered or degraded environments, offering broad potential for sustainable, automated systems.
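Fusion can take many forms; as one simple illustration (not necessarily the configuration used in our experiments), the sketch below stacks calibration-registered RGB, NIR, and thermal frames into a single multi-channel tensor for a detector with a widened input layer. The arrays are random placeholders standing in for registered frames.

```python
import numpy as np

# Early-fusion sketch: assumes NIR and thermal frames have already been warped
# into the RGB camera frame using the offline calibration. Shapes are placeholders.
H, W = 480, 640
rgb = np.random.rand(H, W, 3).astype(np.float32)       # registered RGB
nir = np.random.rand(H, W, 1).astype(np.float32)       # registered NIR
thermal = np.random.rand(H, W, 1).astype(np.float32)   # registered LWIR

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each modality to [0, 1] so no single band dominates the fused input."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + 1e-8)

fused = np.concatenate(
    [normalize(rgb), normalize(nir), normalize(thermal)], axis=-1
)  # (H, W, 5) tensor, ready for a detector with a 5-channel input layer
print(fused.shape)
```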
Special thanks to postdoctoral researcher Andy (Tuan-Anh) Vu for mentorship, Professor Jawed Khalid for lab support, and Professor Yuchen Cui for additional CS guidance. Harris Song was the sole contributor, building the dataset, codebase, experiments, paper, video, and this website.
Overview of the HiddenHeatedFruit dataset
@article{song2025hiddenfruit,
  author      = {Harris Song},
  title       = {Hidden Fruit: A Multimodal Framework for Fruit Detection},
  journal     = {COM SCI 188 Final Project},
  year        = {2025},
  institution = {University of California, Los Angeles}
}