Data Set

Raw Data

The raw data from each collection run onboard the robot was recorded as a ROS bag file. These files are very large (10+ GB) and contain time-synchronized messages from the cameras, the onboard fiber-optic inertial measurement unit (IMU), and the point spectrometers. Because of the volume of data and the human effort required to segment all 12,874 images, we have left the majority of them unlabeled but available to the research community.

Bag File List

Bag	Description	# HSI Images
olin_collection_05_15_23_parcel_b.bag	Off-road collection, daytime	2068
olin_collection_05_15_23_parcel_b_v2.bag	Off-road collection, daytime	923
olin_collection_05_15_23_parcel_b_v3.bag	Off-road collection, daytime	1117
olin_collection_05_15_23_parcel_b_v4.bag	Off-road collection, daytime	1294
olin_collection_05_16_23_campus.bag	On-road collection, mid-morning	3590
olin_collection_05_16_23_parcel_b.bag	Off-road collection, dawn	2556
olin_collection_05_16_23_parcel_b_v2.bag	Off-road collection, dawn	1326

After downloading a bag file, start a roscore instance and begin playback of the messages with rosbag play <<BAGFILENAME>>

These data cubes currently exist as digital count measurements. They will be converted to reflectance in the near future via an in-development method that leverages the included spectrometer measurements.

Labeled data

504 images, evenly distributed across the above bag files, have been extracted and labeled.

The file label_map.json contains the mapping from semantic label to numeric value of the mask area. Pixels with label 0 are unlabeled.
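As a minimal sketch of how the label map might be used, the snippet below inverts the mapping so that mask values can be turned back into label names, then counts labeled pixels while skipping label 0. It assumes label_map.json is a flat JSON object of label name to integer; the class names and the tiny mask are hypothetical stand-ins, and a real mask would first be decoded from its PNG with an image library.

```python
import json
from collections import Counter


def invert_label_map(label_map):
    """Turn {label_name: class_index} into {class_index: label_name}."""
    return {int(index): name for name, index in label_map.items()}


def count_labeled_pixels(mask_rows, index_to_name):
    """Count pixels per label name, skipping index 0 (unlabeled)."""
    counts = Counter()
    for row in mask_rows:
        for value in row:
            if value != 0:
                counts[index_to_name.get(value, f"unknown_{value}")] += 1
    return dict(counts)


# Hypothetical stand-in for the contents of label_map.json (assumed flat).
example_json = '{"grass": 1, "gravel": 2}'
index_to_name = invert_label_map(json.loads(example_json))

# A tiny 2x3 mask standing in for a decoded SEGMENTATIONS_RAW image.
example_mask = [[0, 1, 1],
                [2, 0, 1]]
pixel_counts = count_labeled_pixels(example_mask, index_to_name)
```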

The directory structure for the labeled data follows a structured format:

│
├── INSTANCE_LABELS/
│   ├── xxx.json # Image labels according to the ATLAS ontology
├── RGB_RAW/
│   ├── xxx.jpg # Raw RGB image of scene
├── SEGMENTATIONS_RAW/
│   ├── xxx.png # Semantic segmentation masks for each image, where each pixel is the class index
├── SOLAR_SPECTRA/
│   ├── xxx.npy # Spectra of observed solar illumination in numpy format
├── SWIR_RAW/
│   ├── xxx.npy # SWIR hyperspectral datacube in numpy format
└── VNIR_RAW/
    ├── xxx.npy # VNIR hyperspectral datacube in numpy format

Each file is named with a structured format across each of the data folders.

bagfile_name_index.file_extension

Essentially, each data item is named in such a way that it can be traced back to the bag file it came from and to the number of the sample within that bag file. Correlating that name across the folders allows retrieval of the full data record.
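The naming scheme can be sketched in code as follows. This is an illustration only: it assumes the index is the last underscore-separated token of the file name and that the bag name appears without its .bag suffix. The example file name and the data root are hypothetical; the folder names and extensions are taken from the directory tree above.

```python
from pathlib import Path

# Folder name -> file extension, as listed in the directory tree.
FOLDERS = {
    "INSTANCE_LABELS": ".json",
    "RGB_RAW": ".jpg",
    "SEGMENTATIONS_RAW": ".png",
    "SOLAR_SPECTRA": ".npy",
    "SWIR_RAW": ".npy",
    "VNIR_RAW": ".npy",
}


def split_sample_name(file_name):
    """Split 'bagfile_name_index.ext' into (bagfile_name, index).

    The index is kept as a string so any zero padding is preserved.
    """
    stem = Path(file_name).stem
    bag_name, _, index = stem.rpartition("_")
    return bag_name, index


def record_paths(root, file_name):
    """Build the path of the same sample in every data folder."""
    stem = Path(file_name).stem
    return {folder: Path(root) / folder / f"{stem}{ext}"
            for folder, ext in FOLDERS.items()}


# Hypothetical file name following the documented pattern.
example = "olin_collection_05_15_23_parcel_b_v2_17.npy"
bag_name, index = split_sample_name(example)
paths = record_paths("data", example)
```

Because bag names themselves contain underscores, splitting on the last underscore is what separates the index from the bag name in this sketch.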

Generating Registered Data

For the purposes of data compression, the data is stored in an unregistered format. Running the following files locally will create a registered hyperspectral datacube for each of the labeled instances.

The registration process generates datacubes that are ~80 MB each. Altogether, the full labeled dataset is ~40 GB. Make sure you have enough free space locally before executing the script!
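The storage figure follows from 504 labeled images at roughly 80 MB each. A minimal sketch of a pre-flight check is shown below; the function names are hypothetical and are not part of the repository, and the 80 MB per-cube size is the approximate value quoted above.

```python
import shutil

NUM_LABELED_IMAGES = 504   # labeled images in the dataset
MB_PER_CUBE = 80           # approximate size of one registered datacube


def estimated_output_gb(num_images=NUM_LABELED_IMAGES, mb_per_cube=MB_PER_CUBE):
    """Approximate total size of the registered dataset in GB (1 GB = 1000 MB)."""
    return num_images * mb_per_cube / 1000


def has_enough_space(target_dir=".", required_gb=None):
    """Return True if target_dir has at least required_gb of free space."""
    if required_gb is None:
        required_gb = estimated_output_gb()
    free_gb = shutil.disk_usage(target_dir).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    needed = estimated_output_gb()
    status = "OK" if has_enough_space(".") else "not enough free space"
    print(f"Registered dataset needs about {needed:.1f} GB: {status}")
```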

The Jupyter Notebook Register_HSI.ipynb contains the functions needed to process the two hyperspectral datacubes into a single datacube. This file also appropriately crops the high-res RGB image and segmentation labels.

Regenerating System Homographies

Registering the hyperspectral cameras is a challenging process, as their resolution is an order of magnitude lower than that of the RGB camera. The folder calibration contains images from all three cameras with a precision checkerboard present. We used keypointgui to manually find corresponding features. The two output homographies are saved as .txt files in the calibration folder.
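To illustrate what a saved homography does, the sketch below parses a 3x3 matrix from text and maps a pixel coordinate through it. It assumes each .txt file holds nine whitespace- or comma-separated numbers in row-major order, which is not confirmed here; the example matrix is made up, and which camera maps to which is left unspecified.

```python
def parse_homography(text):
    """Parse nine numbers (row-major) into a 3x3 list-of-lists matrix."""
    values = [float(token) for token in text.replace(",", " ").split()]
    if len(values) != 9:
        raise ValueError(f"expected 9 values for a 3x3 matrix, got {len(values)}")
    return [values[0:3], values[3:6], values[6:9]]


def apply_homography(H, x, y):
    """Map pixel (x, y) through H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous scale to return to pixel coordinates.
    return xh / w, yh / w


# Made-up matrix (a pure translation), standing in for a calibration file.
example_text = "1 0 5\n0 1 -2\n0 0 1"
H = parse_homography(example_text)
mapped = apply_homography(H, 10, 20)
```

In practice the text would be read from one of the files in the calibration folder, for example with Path(...).read_text(), before being passed to parse_homography.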