Snow entity learning example

This example uses hyperspectral images in the visible range. The images come from the Real-World Hyperspectral Images Database (see the reference at the end). They show everyday objects and scenes: books, chairs, trees, buildings, and so on. However useful a hyperspectral image in the visible range is, it cannot by itself sort out real-world objects: a car can have any color, and the same is true for many other objects. The question is: if we learn a model for an object from one cube, can we use the model on other cubes? The answer, as we will see, is 'yes and no', and we need more information, for example the shape of the object. It is a well-known problem in the visible range.

The images used are taken from the Real-World Hyperspectral Images Database web site, which describes them as follows: a database of fifty hyperspectral images of indoor and outdoor scenes under daylight illumination, and an additional twenty-five images under artificial and mixed illumination. The images were captured using a commercial hyperspectral camera (Nuance FX, CRI Inc.) with an integrated liquid crystal tunable filter capable of acquiring a hyperspectral image by sequentially tuning the filter through a series of thirty-one narrow wavelength bands, each with approximately 10 nm bandwidth and centered at steps of 10 nm from 420 nm to 720 nm. The camera is equipped with an apochromatic lens and the images were captured with the smallest viable aperture setting, thus largely avoiding chromatic aberration. All the images are of static scenes, with labels to mask out regions with movement during exposure.
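
Each cube therefore has 31 bands, and band i is centered at 420 + 10*i nm; this is the same mapping the data_mine() function below uses to label the spectra plots. A quick check:

# Band centers for the 31-band cubes: 420 nm to 720 nm in 10 nm steps
wavelengths = [420 + 10 * i for i in range(31)]
print(wavelengths[0], wavelengths[-1], len(wavelengths))  # 420 720 31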

For this example we use outdoor scenes, with and without snow accumulation. We learn a snow model on one cube using a Support Vector Machine and classify six others with the same model, exercising along the way the scikit-learn functionality wrapped by pysptools.skl. Adding machine learning to hyperspectral processing gives more information, and that can sometimes be useful.

In [10]:
%matplotlib inline

from __future__ import print_function
import os
import os.path as osp
from pysptools.util import load_mat_file, shrink, display_linear_stretch
from pysptools.skl import HyperEstimatorCrossVal, HyperSVC
from pysptools.skl.examples import DataMine, Mask


# Use PySptools endmember extraction and NNLS
# Extract 4 endmembers
def data_mine(M, sample_name):
    ax = {}
    ax['wavelength'] = [(x*10)+420 for x in range(M.shape[2])]
    ax['x'] = 'Wavelength (nm)'
    ax['y'] = 'Reflectance'
    dm = DataMine(M, 4, sample_name)
    dm.display_endmembers(axes=ax)
    dm.display_abundances()
    return dm.get_abundances()


# Create a mask for snow with the fourth endmember
# the threshold is 0.79 
def create_mask(M, amaps):
    m = Mask('Snow')
    m.put1(M, amaps[:,:,3], 0.79)
    m.display(colorMap='Paired')
    return m.get_roi()


# Use the sklearn module (scikit-learn) Cross Validation functionality
def tune(M, mask, params, title):
    ecv = HyperEstimatorCrossVal(HyperSVC, params)
    ecv.fit_cube(M, mask)
    ecv.print(title)
    return ecv.get_best_params()


# Get the snow model
def train(M, mask, C, gamma, class_weight, feature_name):
    model = HyperSVC(C=C, gamma=gamma, class_weight=class_weight)
    model.fit_rois(M, mask)
    return model


# And classify some cubes with the snow model
def batch_classify(spath, model, samples):
    for s in samples:
        M = load_mat_file(osp.join(spath, s))   
        Ms = shrink(shrink(shrink(M)))
        display_linear_stretch(Ms, 19, 13, 3, suffix='shrink_'+s)
        model.classify(Ms)
        model.display(suffix=s)

        
home_path = os.environ['HOME']
source_path = osp.join(home_path, 'data/CZ_hsdb')

sample = 'img1'
# M_img1 is used to create the model
M_img1 = load_mat_file(osp.join(source_path, sample))

# The images are too large for my humble computer and take too long to process.
# The solution is to reduce the number of pixels. Each call to shrink() halves
# each spatial dimension and saves hours of processing.
M_img1_s = shrink(shrink(shrink(M_img1)))
print('Initial image dimension:', M_img1.shape)
print('Shrunk image dimension:', M_img1_s.shape)
Initial image dimension: (1040, 1392, 31)
Shrunk image dimension: (130, 174, 31)
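
The shrink() helper comes from pysptools.util. As a mental model only (an assumption, not necessarily the library's actual implementation), halving each spatial dimension by simple decimation looks like this:

def shrink_sketch(M):
    # Keep every other pixel along both spatial axes; the band axis is untouched.
    # Applied three times, (1040, 1392, 31) becomes (130, 174, 31).
    return M[::2, ::2, :]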

First, we show the hyperspectral image used to create the learned model.

In [11]:
from IPython.core.display import Image
Image(filename=osp.join(home_path, 'data/CZ_hsdb_img/linear_stretch_img1_348_260.png'))
Out[11]:

We isolate the snow on the roof. EM4 is the snow spectrum.

In [12]:
amaps = data_mine(M_img1_s, sample)
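
DataMine is a small helper shipped with the examples; according to its comment above, it runs a PySptools endmember extraction followed by NNLS. A sketch of what that presumably looks like with the public PySptools API (the choice of NFINDR as the extractor is an assumption):

from pysptools.eea import NFINDR
from pysptools.abundance_maps import NNLS

def data_mine_sketch(M, q=4):
    U = NFINDR().extract(M, q)   # q endmember spectra, shape (q, n_bands)
    amaps = NNLS().map(M, U)     # per-pixel abundances, shape (rows, cols, q)
    return amaps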

And create a mask with it using endmember #4.

In [13]:
mask = create_mask(M_img1_s, amaps)
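
Mask.put1() presumably thresholds the abundance map at the given value (an assumption based on its use here); the NumPy equivalent of the masking step is a one-liner:

# Pixels whose EM4 abundance exceeds 0.79 are labeled snow (1), the rest 0
snow = (amaps[:, :, 3] > 0.79).astype(int)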

A Support Vector Machine is used to create the model.

For the next step, we run a cross-validation on the whole cube to tune the C and gamma hyperparameters.

In [14]:
# After some trial and error, here is a good starting grid for the crossval function:
p_grid = {'C': [5,10,20,30,50], 'gamma': [0.1,0.5,1.0,10.0]}
best = tune(M_img1_s, mask.get_mask(), p_grid, sample)
================================================================
Cross validation results for: img1
Param grid: {'gamma': [0.1, 0.5, 1.0, 10.0], 'C': [5, 10, 20, 30, 50]}
n splits: 2
Shuffle: True
================================================================
Best score: 0.994606542882
Best params: {'gamma': 0.5, 'C': 30}
================================================================
All scores
{'gamma': 0.1, 'C': 5} , score: 0.993280282935 , std: 0.000353669319187
{'gamma': 0.5, 'C': 5} , score: 0.994341290893 , std: 8.84173297966e-05
{'gamma': 1.0, 'C': 5} , score: 0.994385499558 , std: 0.000221043324492
{'gamma': 10.0, 'C': 5} , score: 0.993545534925 , std: 0.0
{'gamma': 0.1, 'C': 10} , score: 0.993457117595 , std: 0.000353669319187
{'gamma': 0.5, 'C': 10} , score: 0.994341290893 , std: 0.00026525198939
{'gamma': 1.0, 'C': 10} , score: 0.994473916888 , std: 0.000397877984085
{'gamma': 10.0, 'C': 10} , score: 0.993545534925 , std: 0.0
{'gamma': 0.1, 'C': 20} , score: 0.993457117595 , std: 8.84173297966e-05
{'gamma': 0.5, 'C': 20} , score: 0.994473916888 , std: 0.000397877984085
{'gamma': 1.0, 'C': 20} , score: 0.994429708223 , std: 0.000176834659593
{'gamma': 10.0, 'C': 20} , score: 0.993545534925 , std: 0.0
{'gamma': 0.1, 'C': 30} , score: 0.993722369584 , std: 8.84173297966e-05
{'gamma': 0.5, 'C': 30} , score: 0.994606542882 , std: 0.00053050397878
{'gamma': 1.0, 'C': 30} , score: 0.994164456233 , std: 8.84173297966e-05
{'gamma': 10.0, 'C': 30} , score: 0.993545534925 , std: 0.0
{'gamma': 0.1, 'C': 50} , score: 0.993722369584 , std: 8.84173297966e-05
{'gamma': 0.5, 'C': 50} , score: 0.994252873563 , std: 0.000353669319187
{'gamma': 1.0, 'C': 50} , score: 0.994031830239 , std: 0.000221043324492
{'gamma': 10.0, 'C': 50} , score: 0.993545534925 , std: 0.0
================================================================
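
HyperEstimatorCrossVal is a thin convenience layer over scikit-learn. With plain scikit-learn the same search could be written roughly as below (flattening the cube to a pixel-by-band matrix is an assumption about what fit_cube does internally; the 2-fold shuffled split matches the report above):

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

def tune_sketch(M, mask, p_grid):
    n_bands = M.shape[2]
    X = M.reshape(-1, n_bands)   # one row per pixel, one column per band
    y = mask.ravel()             # binary labels: 1 = snow, 0 = background
    cv = StratifiedKFold(n_splits=2, shuffle=True)
    gs = GridSearchCV(SVC(), p_grid, cv=cv)
    gs.fit(X, y)
    return gs.best_params_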

We use the cross-validation result to train a snow model on the whole cube.

In [15]:
mo = train(M_img1_s, mask, best['C'], best['gamma'], {0:1,1:1}, sample)
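
Here class_weight={0:1, 1:1} weights the background and snow classes equally. Assuming HyperSVC follows the sklearn.svm.SVC interface (which its constructor arguments suggest), the underlying fit is roughly equivalent to:

from sklearn.svm import SVC

clf = SVC(C=best['C'], gamma=best['gamma'], class_weight={0: 1, 1: 1})
clf.fit(X, y)   # X, y built from the cube and the mask, as in the tuning sketch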

Last, we test the model using different cubes to verify the presence of snow.

In [16]:
samples = ['img1','img2','imga1','imgc7','imgb1','imgb6','imga7']

batch_classify(source_path, mo, samples)
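
model.classify(Ms) presumably predicts one label per pixel and reshapes the result back into an image (an assumption about the wrapper); in plain scikit-learn terms:

# Predict one label per pixel, then restore the image shape
h, w, n_bands = Ms.shape
labels = clf.predict(Ms.reshape(-1, n_bands)).reshape(h, w)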

Comments on results

  • img1, this cube is used to create the model, good
  • img2, snow detected, good
  • imga1, false positive, bad
  • imgc7, snow detected, good
  • imgb1, false positive, bad
  • imgb6, no snow detected, good (with some threshold)
  • imga7, no snow detected, good (with some threshold)

We need more than the visible range. For example, SPECIM compiled a mineral identification table in relation to wavelengths (see http://www.specim.fi/downloads/SisuROCK_Datasheet-ver1-14.pdf). As it shows, the visible range is poor at identifying minerals.

On the other hand, we can see that, for hyperspectral images, support vector machines have a good generalization capability.

Reference

The images came from the Real-World Hyperspectral Images Database and you can get them here: http://vision.seas.harvard.edu/hyperspec/download.html

The full reference is: Ayan Chakrabarti and Todd Zickler, "Statistics of Real-World Hyperspectral Images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.