Hidden Fruit: A Multimodal Framework for Fruit Detection

Harris Song¹
¹University of California, Los Angeles

Hidden Fruit uses synchronized RGB, thermal, and depth imaging to detect fruit in challenging outdoor conditions.

Abstract

Modern agricultural practices increasingly rely on automation to address labor shortages and enhance operational efficiency. However, accurately detecting fruits in real-world environments remains challenging due to variable lighting, occlusions, and complex backgrounds.

This paper introduces Hidden Fruit, a portable, multimodal detection framework. First, we present a large-scale dataset containing RGB, depth, NIR, and thermal imagery along with pose information. Then, we detail our compact, Jetson-powered handheld platform designed for data collection and in-field fruit detection.

Our approach synchronizes multiple sensing modalities and applies offline calibration to align heterogeneous data sources. Experimental results show improved detection performance under difficult conditions such as occlusion, clutter, and nighttime scenes.

Hidden Fruit contributes an open dataset and methodology to support future research in agricultural automation, multimodal fusion, and object detection.

Figure: Side-by-side digital (RGB) and thermal views of the same scene, including an alternate camera angle and 180° versions.

1. Problem Statement

Thermal cameras are widely used because they remain robust in harsh conditions and capture temperature-related information, and many thermal datasets have been released as a result. MultiSpectralMotion provides indoor and outdoor thermal images with ground-truth depth, captured from handheld devices.
ViViD++ offers outdoor imagery from vehicle-mounted and handheld platforms. SubT-MRS covers varied platforms under degraded conditions but lacks wildland forest imagery. Stereo datasets such as STheReO and MS2 mainly focus on urban driving; MS2 includes rain scenes but retains typical driving constraints. FIReStereo supports depth estimation for small UAS in degraded environments.

Comparison of RGB, NIR, and Thermal Fusion

Overview of the system and the computer vision pipeline.

2. System Design / Methodology

Hidden Fruit is a portable data-capture rig that synchronously collects RGB, NIR, LWIR (thermal), and depth imagery along with 6-DoF pose. Built around the Jetson Orin Nano, the system handles synchronized capture, supports sensor calibration with thermal and NIR checkerboards, and stores the data in a structured layout.
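The description above does not say how frames from the different cameras are paired. As a minimal sketch, assuming each stream delivers sorted capture timestamps in seconds, one common approach matches every frame of a reference stream (hypothetically RGB here) to the nearest-in-time frame of each other stream and discards frames that have no partner within a tolerance. The names `nearest_index` and `synchronize`, the 10 ms tolerance, and the timestamps are illustrative assumptions, not the project's code or data.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def nearest_index(timestamps: List[float], t: float) -> int:
    """Hypothetical helper: index of the timestamp closest to t.

    Assumes `timestamps` is non-empty and sorted ascending (seconds).
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer of the two neighbours around the insertion point.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def synchronize(reference: List[float],
                others: Dict[str, List[float]],
                tolerance: float) -> List[Tuple[int, Dict[str, int]]]:
    """Hypothetical nearest-timestamp matcher (not the Hidden Fruit code).

    Each reference frame is kept only if every other stream has a frame
    within `tolerance` seconds of it; otherwise it is dropped.
    """
    matched = []
    for ref_idx, t in enumerate(reference):
        picks: Dict[str, int] = {}
        for name, stamps in others.items():
            if not stamps:
                break  # an empty stream can never be matched
            j = nearest_index(stamps, t)
            if abs(stamps[j] - t) > tolerance:
                break  # no partner close enough in this stream
            picks[name] = j
        else:
            # Loop finished without `break`: every stream matched.
            matched.append((ref_idx, picks))
    return matched


if __name__ == "__main__":
    # Illustrative timestamps in seconds; not real capture data.
    rgb = [0.000, 0.033, 0.066, 0.100]
    streams = {
        "thermal": [0.005, 0.040, 0.105],
        "depth": [0.001, 0.034, 0.067, 0.101],
    }
    # RGB frame 2 is dropped: its closest thermal frame is 26 ms away.
    for ref_idx, picks in synchronize(rgb, streams, tolerance=0.010):
        print(ref_idx, picks)
```

Matching against a single reference stream keeps the output aligned with one camera's frame rate; a hardware trigger, if the rig has one, would make this software matching largely unnecessary.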

Platform Comparison

Platform evolution and field integration

3. Evaluation & Dataset

Our platform evolved from an acrylic-mounted prototype using FLIR and RealSense cameras into a tightly integrated handheld system. Benchmarks across datasets indicate that our method remains robust under challenging lighting and occlusion, particularly in nighttime and foliage-heavy scenes.

We present the HiddenHeatedFruit dataset, which contains synchronized RGB, thermal, NIR, and depth imagery along with pose metadata. Offline calibration and timestamp alignment ensure accurate multimodal fusion. Annotations are semi-automated using custom tools to support 3D fruit localization.
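The text credits offline calibration with aligning the modalities but does not give the geometry. A minimal sketch under the standard pinhole model (lens distortion ignored) shows how a pixel in the depth image could be mapped into the thermal image: back-project it to 3D with the depth camera's intrinsics, move it into the thermal camera's frame with calibrated extrinsics (R, t), and project it with the thermal intrinsics. The function names and every intrinsic and extrinsic value below are hypothetical placeholders, not Hidden Fruit calibration results.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pixel (u, v) with metric depth -> 3D point in that camera's frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform(R: Matrix3, t: Vec3, p: Vec3) -> Vec3:
    """Rigid transform p' = R p + t (e.g. depth-to-thermal extrinsics)."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i]
                 for i in range(3))


def project(p: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """3D point in a camera frame -> pixel coordinates (no distortion)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # Hypothetical calibration: identity rotation, 5 cm horizontal baseline.
    R = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    t = (0.05, 0.0, 0.0)
    # Hypothetical depth-camera pixel of a detected fruit, 2 m away.
    p_depth = back_project(380, 240, 2.0, 600.0, 600.0, 320.0, 240.0)
    p_thermal = transform(R, t, p_depth)
    print(project(p_thermal, 400.0, 400.0, 160.0, 120.0))  # → (210.0, 120.0)
```

Applying the same `transform` with the rig's 6-DoF pose (camera-to-world rotation and translation) would express the 3D point in a fixed world frame, which is one plausible way to accumulate fruit positions across views for 3D localization.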

Comparison of RGB, NIR, and Thermal Fusion

4. Discussion & Contributions

The Hidden Fruit system could also be applied beyond agriculture, for example in search-and-rescue, surveillance, and industrial automation. Our fusion techniques improve object localization and robustness, especially in cluttered or degraded environments, which makes them relevant to a broad range of sustainable, automated systems.

Special thanks to Postdoc Andy (Tuan-Anh) Vu for mentorship, Professor Jawed Khalid for lab support, and Professor Yuchen Cui for additional CS guidance. Harris Song was the sole contributor, responsible for the dataset, codebase, experiments, paper, video, and this website.

Dataset Overview Table

Overview of the HiddenHeatedFruit dataset

BibTeX

@article{song2025hiddenfruit,
  author    = {Harris Song},
  title     = {Hidden Fruit: A Multimodal Framework for Fruit Detection},
  journal   = {COM SCI 188 Final Project},
  year      = {2025},
  institution = {University of California, Los Angeles}
}