Scikit-learn Interface (alpha)¶
This module supports an interface between hyperspectral algorithms and scikit-learn.
- Cross Validation
- HyperAdaBoostClassifier
- HyperBaggingClassifier
- HyperExtraTreesClassifier
- HyperGaussianNB
- HyperGradientBoostingClassifier
- HyperKNeighborsClassifier
- HyperLogisticRegression
- HyperRandomForestClassifier
- Support Vector Supervised Classification (HyperSVC)
- Unsupervised clustering using KMeans
- Utility functions (hyper_scale, shape_to_XY)
See also
See the example file nbex_skl_snow for a use of HyperEstimatorCrossVal and HyperSVC. See test_sklearn for an example.
Note
This is an alpha version. This module will certainly grow with time and anything can change: class names, class interfaces and so on.
Cross Validation¶
class pysptools.skl.HyperEstimatorCrossVal(estimator, param_grid)
Do a cross validation on a hypercube or a concatenation of hypercubes, using scikit-learn KFold and GridSearchCV.
fit(X, y)
Run the cross validation.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
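A minimal sketch of a cross validation run on random stand-in data (the cube M, the class map cmap and the grid values are illustrative, not taken from the pysptools examples); whether the estimator is passed as a class or as an instance should be checked against the nbex_skl_snow example:

    import numpy as np
    from pysptools.skl import HyperEstimatorCrossVal, HyperSVC, shape_to_XY

    # Stand-in data: a (m x n x p) cube and a (m x n) class map.
    M = np.random.rand(20, 20, 50)
    cmap = np.random.randint(0, 3, (20, 20))   # 0 = background, 1..2 = classes

    X, y = shape_to_XY([M], [cmap])
    # Grid over two HyperSVC constructor parameters.
    param_grid = {'C': [1.0, 10.0, 100.0], 'gamma': [0.001, 0.01]}
    cv = HyperEstimatorCrossVal(HyperSVC, param_grid)
    cv.fit(X, y)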
HyperAdaBoostClassifier¶
class pysptools.skl.HyperAdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
Apply scikit-learn AdaBoostClassifier on a hypercube.
For the __init__ constructor parameters, see the sklearn.ensemble.AdaBoostClassifier class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)
Display the feature importances. The output can be split into n graphs.
Parameters:
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
fit(X, y, sample_weight=None)
Same as the sklearn.ensemble.AdaBoostClassifier fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
- sample_weight – array-like of shape = [n_samples], optional. Sample weights. If None, the sample weights are initialized to 1 / n_samples.
fit_rois(M, ROIs)
Fit the HS cube M with the use of ROIs.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
- ROIs – ROIs type. Regions of interest instance.
plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)
Plot the feature importances. The output can be split into n graphs.
Parameters:
- path – string. The path where to save the plot.
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
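A hedged usage sketch on random stand-in data (variable names and the output path are illustrative); the same fit/classify/plot_feature_importances pattern applies to the other ensemble classifiers below:

    import numpy as np
    from pysptools.skl import HyperAdaBoostClassifier, shape_to_XY

    M = np.random.rand(20, 20, 50)             # stand-in (m x n x p) cube
    cmap = np.random.randint(0, 3, (20, 20))   # stand-in training class map

    X, y = shape_to_XY([M], [cmap])
    model = HyperAdaBoostClassifier(n_estimators=50)
    model.fit(X, y)
    amap = model.classify(M)                   # (m x n x 1) class map
    # Plot the band importances, 20 labels per graph, sorted.
    model.plot_feature_importances('./results', n_labels=20, sort=True)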
HyperBaggingClassifier¶
class pysptools.skl.HyperBaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=None, verbose=0)
Apply scikit-learn BaggingClassifier on a hypercube.
For the __init__ constructor parameters, see the sklearn.ensemble.BaggingClassifier class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
fit(X, y, sample_weight=None)
Same as the sklearn.ensemble.BaggingClassifier fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
- sample_weight – array-like, shape = [n_samples] or None. Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
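A sketch of fitting with sample weights on stand-in data (the weighting scheme is illustrative); the default tree base estimator supports sample weighting:

    import numpy as np
    from pysptools.skl import HyperBaggingClassifier, shape_to_XY

    M = np.random.rand(20, 20, 50)
    cmap = np.random.randint(0, 3, (20, 20))
    X, y = shape_to_XY([M], [cmap])

    # Illustrative weighting: down-weight background pixels (label 0).
    w = np.where(y == 0, 0.5, 1.0)
    model = HyperBaggingClassifier(n_estimators=20)
    model.fit(X, y, sample_weight=w)
    result = model.classify(M)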
HyperExtraTreesClassifier¶
class pysptools.skl.HyperExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
Apply scikit-learn ExtraTreesClassifier on a hypercube.
For the __init__ constructor parameters, see the sklearn.ensemble.ExtraTreesClassifier class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)
Display the feature importances. The output can be split into n graphs.
Parameters:
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
fit(X, y, sample_weight=None)
Same as the sklearn.ensemble.ExtraTreesClassifier fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
- sample_weight – array-like, shape = [n_samples] or None. Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.
fit_rois(M, ROIs)
Fit the HS cube M with the use of ROIs.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
- ROIs – ROIs type. Regions of interest instance.
plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)
Plot the feature importances. The output can be split into n graphs.
Parameters:
- path – string. The path where to save the plot.
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
HyperGaussianNB¶
class pysptools.skl.HyperGaussianNB(priors=None)
Apply scikit-learn GaussianNB on a hypercube.
For the __init__ constructor parameters, see the sklearn.naive_bayes.GaussianNB class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
fit(X, y, sample_weight=None)
Same as the sklearn.naive_bayes.GaussianNB fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
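A sketch of training on a concatenation of two labeled cubes (all data are random stand-ins), which shape_to_XY makes straightforward:

    import numpy as np
    from pysptools.skl import HyperGaussianNB, shape_to_XY

    # Two stand-in labeled cubes with the same number of bands.
    M1 = np.random.rand(20, 20, 50)
    M2 = np.random.rand(30, 20, 50)
    cmap1 = np.random.randint(0, 3, (20, 20))
    cmap2 = np.random.randint(0, 3, (30, 20))

    X, y = shape_to_XY([M1, M2], [cmap1, cmap2])
    gnb = HyperGaussianNB()
    gnb.fit(X, y)
    result = gnb.classify(M1)   # (m x n x 1) class map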
HyperGradientBoostingClassifier¶
class pysptools.skl.HyperGradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_split=1e-07, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')
Apply scikit-learn GradientBoostingClassifier on a hypercube.
For the __init__ constructor parameters, see the sklearn.ensemble.GradientBoostingClassifier class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)
Display the feature importances. The output can be split into n graphs.
Parameters:
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
fit(X, y)
Same as the sklearn.ensemble.GradientBoostingClassifier fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
fit_rois(M, ROIs)
Fit the HS cube M with the use of ROIs.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
- ROIs – ROIs type. Regions of interest instance.
plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)
Plot the feature importances. The output can be split into n graphs.
Parameters:
- path – string. The path where to save the plot.
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
HyperKNeighborsClassifier¶
class pysptools.skl.HyperKNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)
Apply scikit-learn KNeighborsClassifier on a hypercube.
For the __init__ constructor parameters, see the sklearn.neighbors.KNeighborsClassifier class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
fit(X, y)
Same as the sklearn.neighbors.KNeighborsClassifier fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
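Since the class is instrumented for cross validation, a natural sketch (random stand-in data, illustrative grid) is tuning n_neighbors with HyperEstimatorCrossVal:

    import numpy as np
    from pysptools.skl import (HyperEstimatorCrossVal,
                               HyperKNeighborsClassifier, shape_to_XY)

    M = np.random.rand(20, 20, 50)
    cmap = np.random.randint(0, 3, (20, 20))
    X, y = shape_to_XY([M], [cmap])

    cv = HyperEstimatorCrossVal(HyperKNeighborsClassifier,
                                {'n_neighbors': [3, 5, 7]})
    cv.fit(X, y)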
HyperLogisticRegression¶
class pysptools.skl.HyperLogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
Apply scikit-learn LogisticRegression on a hypercube.
For the __init__ constructor parameters, see the sklearn.linear_model.LogisticRegression class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
fit(X, y)
Same as the sklearn.linear_model.LogisticRegression fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
HyperRandomForestClassifier¶
class pysptools.skl.HyperRandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
Apply scikit-learn RandomForestClassifier on a hypercube.
For the __init__ constructor parameters, see the sklearn.ensemble.RandomForestClassifier class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
classify(M)
Classify a hyperspectral cube.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)
Display the feature importances. The output can be split into n graphs.
Parameters:
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
fit(X, y)
Same as the sklearn.ensemble.RandomForestClassifier fit call.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
fit_rois(M, ROIs)
Fit the HS cube M with the use of ROIs.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
- ROIs – ROIs type. Regions of interest instance.
plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)
Plot the feature importances. The output can be split into n graphs.
Parameters:
- path – string. The path where to save the plot.
- n_labels – string or integer. The number of labels to output per graph. If the value is 'all', only one graph is generated.
- height – float [default 0.2]. The bar height (in fact, the width).
- sort – boolean [default False]. If True, the feature importances are sorted.
- suffix – string [default None]. Add a suffix to the file name.
Support Vector Supervised Classification (HyperSVC)¶
See test_HyperSVC.py for an example.
class pysptools.skl.HyperSVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)
Apply scikit-learn SVC on a hypercube.
For the __init__ constructor parameters, see the sklearn.svm.SVC class parameters.
The class is instrumented to be used with the scikit-learn cross validation. It uses the plot and display methods from the class Output.
Note: the class always does a preprocessing.scale before any processing.
Note: the C parameter is set to 1. The result of this setting is that the class_weight values are relative to C and that the first value of class_weight is the background. An example: if you wish to fit two classes "1" and "2" with the help of one ROI for each, you declare class_weight like this (see the sketch after this note):
- class_weight={0:1, 1:10, 2:10}
where 0 is always the background and is set to 1, 1 is the first class and 2 is the second. A value of 10 for both classes gives good results to start with.
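A sketch of that class_weight declaration on random stand-in data (two classes plus background; the weights are the starting values suggested above):

    import numpy as np
    from pysptools.skl import HyperSVC, shape_to_XY

    M = np.random.rand(20, 20, 50)
    cmap = np.random.randint(0, 3, (20, 20))   # 0 = background, classes 1 and 2
    X, y = shape_to_XY([M], [cmap])

    # C keeps its default of 1; the background class 0 keeps a weight of 1.
    model = HyperSVC(class_weight={0: 1, 1: 10, 2: 10})
    model.fit(X, y)              # preprocessing.scale is applied internally
    result = model.classify(M)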
classify(M)
Classify a hyperspectral cube. Does a preprocessing.scale first.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
Returns: numpy array. A class map (m x n x 1).
fit(X, y)
Same as the sklearn.svm.SVC fit call, but with a preprocessing.scale call first.
Parameters:
- X – numpy array. An array (n_samples, n_features) where each row of n_features values is a spectrum.
- y – numpy array. Target values (n_samples,). A zero value is the background. A value of one or more is a class value.
Unsupervised clustering using KMeans¶
See the file test_kmeans.py for an example.
class pysptools.skl.KMeans
KMeans clustering algorithm adapted to hyperspectral imaging.
display(interpolation='none', colorMap='Accent', suffix=None)
Display the cluster map.
Parameters:
- interpolation – string [default none]. A matplotlib interpolation method.
- colorMap – string [default 'Accent']. A color map element of ['Accent', 'Dark2', 'Paired', 'Pastel1', 'Pastel2', 'Set1', 'Set2', 'Set3']; 'Accent' is the default and it falls back on 'Jet'.
- suffix – string [default None]. Add a suffix to the title.
plot(path, interpolation='none', colorMap='Accent', suffix=None)
Plot the cluster map.
Parameters:
- path – string. The path where to put the plot.
- interpolation – string [default none]. A matplotlib interpolation method.
- colorMap – string [default 'Accent']. A color map element of ['Accent', 'Dark2', 'Paired', 'Pastel1', 'Pastel2', 'Set1', 'Set2', 'Set3']; 'Accent' is the default and it falls back on 'Jet'.
- suffix – string [default None]. Add a suffix to the file name.
predict(M, n_clusters=5, n_jobs=1, init='k-means++')
KMeans clustering algorithm adapted to hyperspectral imaging. It is a simple wrapper to the scikit-learn version.
Parameters:
- M – numpy array. A HSI cube (m x n x p).
- n_clusters – int [default 5]. The number of clusters to generate.
- n_jobs – int [default 1]. Taken from the scikit-learn doc: The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
- init – string or array [default 'k-means++']. Taken from the scikit-learn doc: Method for initialization, defaults to 'k-means++'. 'k-means++' selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See the Notes section in k_init for more details. 'random': choose k observations (rows) at random from the data for the initial centroids. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
Returns: numpy array. A cluster map (m x n x c), where c is the number of clusters.
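A short sketch on a random stand-in cube (the output path is illustrative):

    import numpy as np
    from pysptools.skl import KMeans

    M = np.random.rand(20, 20, 50)              # stand-in (m x n x p) cube
    km = KMeans()
    cluster_map = km.predict(M, n_clusters=5)
    km.plot('./results', colorMap='Accent')     # saves the cluster map plot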
hyper_scale¶
shape_to_XY¶
pysptools.skl.shape_to_XY(M_list, cmap_list)
Receives as input a list of hypercubes and the corresponding list of masks. The function reshapes and concatenates both to create the X and Y arrays.
Parameters:
- M_list – numpy array list. A list of HSI cubes (m x n x p).
- cmap_list – numpy array list. A list of class maps (m x n); as usual the classes are numbered: 0 for the background, 1 for the first class …
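A sketch with two stand-in cubes of different heights; the pixel counts add up in X and y:

    import numpy as np
    from pysptools.skl import shape_to_XY

    M1 = np.random.rand(20, 20, 50)
    M2 = np.random.rand(30, 20, 50)
    cmap1 = np.random.randint(0, 3, (20, 20))
    cmap2 = np.random.randint(0, 3, (30, 20))

    X, y = shape_to_XY([M1, M2], [cmap1, cmap2])
    # X: (20*20 + 30*20, 50) = (1000, 50) spectra; y: the matching labels.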